Hi,
I am writing a software library in which some algorithms are implemented in two different ways: an OpenCL implementation and a plain C++ implementation. The problem I am facing, among many others, is the following.
Suppose the algorithm I am writing (call it A) takes an RGB image as its input parameter and does the following computations:
- RGB->gray scale->binary image
- component labeling
The second part is done to detect all connected components in the binary image. The plain C++ algorithm returns an array of ConnectedComponent structs:
struct Pixel
{
int x, y;
};
struct ConnectedComponent
{
int contour_len;
Pixel *contour;
};
So, if A finds N connected components, it allocates an array of N ConnectedComponent elements and, for every element, allocates memory for the contour pointer.
This is not a good solution for OpenCL: one needs to avoid the Pixel* pointer, since a host pointer stored inside a buffer means nothing on the device. I see two possible solutions here.
First option: suppose N is the number of connected components and M is the sum of the contour_len variables. Then one could allocate a single linear buffer like:
ConnectedComponent *ccs = (ConnectedComponent *) new char[N*sizeof(ConnectedComponent) + M*sizeof(Pixel)];
You need to set up the contour fields appropriately and use each one not as a pointer but as an offset relative to ccs. However, this design does not work well when you find one connected component at a time and append it to the list of components found so far, because N and M are not known up front.
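To make the offset idea concrete, here is a rough sketch of what I have in mind. It is a variation on the single char buffer: it uses two growable arrays (one for the records, one for all pixels) so that components can be appended one at a time. The names FlatComponent, ComponentList, contour_offset, and add are placeholders I made up for illustration.

```cpp
#include <cstddef>
#include <vector>

struct Pixel
{
    int x, y;
};

// Hypothetical pointer-free record: contour_offset is an index into the
// shared pixel array instead of a Pixel* pointer.
struct FlatComponent
{
    int contour_len;
    int contour_offset;
};

// Hypothetical container: both arrays are contiguous, so each could be
// copied to a device buffer as-is.
struct ComponentList
{
    std::vector<FlatComponent> components;
    std::vector<Pixel> pixels;

    // Append one component; its contour is copied to the end of the pixel array.
    void add(const Pixel *contour, int len)
    {
        FlatComponent rec{len, static_cast<int>(pixels.size())};
        pixels.insert(pixels.end(), contour, contour + len);
        components.push_back(rec);
    }

    // Resolve the offset of component i back to a host pointer.
    const Pixel *contour(std::size_t i) const
    {
        return pixels.data() + components[i].contour_offset;
    }
};
```

Since the records store offsets rather than addresses, they stay valid when the pixel array is reallocated while growing.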
Second option: define the ConnectedComponent structure as follows:
struct ConnectedComponent
{
IMemoryObject *contour;
};
where the IMemoryObject interface encapsulates a cl_mem object (or an alternative simple implementation when OpenCL is not used).
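A minimal sketch of how such an interface might look, showing only the plain C++ fallback. The method names size, read, and write and the class HostMemoryObject are my own assumptions, not an existing API. An OpenCL-backed implementation would presumably hold a cl_mem and forward read and write to clEnqueueReadBuffer and clEnqueueWriteBuffer.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical interface: hides where the bytes actually live.
class IMemoryObject
{
public:
    virtual ~IMemoryObject() = default;
    virtual std::size_t size() const = 0;
    // Copy the contents into host memory
    // (a device->host transfer in an OpenCL backend).
    virtual void read(void *dst, std::size_t bytes) const = 0;
    // Copy host memory into the object
    // (a host->device transfer in an OpenCL backend).
    virtual void write(const void *src, std::size_t bytes) = 0;
};

// Simple fallback used when OpenCL is not available: plain host storage.
class HostMemoryObject : public IMemoryObject
{
public:
    explicit HostMemoryObject(std::size_t bytes) : data_(bytes) {}

    std::size_t size() const override { return data_.size(); }

    void read(void *dst, std::size_t bytes) const override
    {
        // Clamp the request so we never read past the end of the storage.
        std::memcpy(dst, data_.data(), bytes < data_.size() ? bytes : data_.size());
    }

    void write(const void *src, std::size_t bytes) override
    {
        // Clamp the request so we never write past the end of the storage.
        std::memcpy(data_.data(), src, bytes < data_.size() ? bytes : data_.size());
    }

private:
    std::vector<unsigned char> data_;
};
```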
How would you address the design in this situation?
And what if I want ConnectedComponent to be a class?
What I am saying is that I want to use algorithm A without caring whether the implementation is actually OpenCL or plain C++. But how should I handle the input/output parameters of algorithm A? One thing to keep in mind is the following:
Suppose you have algorithms A, B, and C. You want to execute A, B, and C in sequence, using the results of A as the input parameters of B and the results of B as the input parameters of C. I want to do this while minimizing device<->host memory transfers.
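To illustrate the chaining I am after, here is a host-only sketch in which stages exchange opaque buffer handles and only the caller copies data out at the very end. Buffer, Stage, run_pipeline, and map_stage are made-up names; in an OpenCL backend the handle would presumably own a cl_mem, so intermediate results would never leave the device.

```cpp
#include <functional>
#include <memory>
#include <vector>

// Hypothetical opaque buffer handle. Here it owns host data; an OpenCL
// backend would own a device buffer instead.
struct Buffer
{
    std::vector<int> data;

    // The only place where data is copied out to the caller
    // (the device->host transfer in an OpenCL backend).
    std::vector<int> read_to_host() const { return data; }
};

using BufferPtr = std::shared_ptr<Buffer>;

// A stage consumes a handle and produces a new handle, never raw host data.
using Stage = std::function<BufferPtr(const BufferPtr &)>;

// Run the stages in order, feeding each output handle to the next stage.
BufferPtr run_pipeline(BufferPtr input, const std::vector<Stage> &stages)
{
    for (const Stage &stage : stages)
        input = stage(input);
    return input;
}

// Toy stage factory standing in for real algorithms such as A, B, and C:
// applies f to every element of the input buffer.
Stage map_stage(std::function<int(int)> f)
{
    return [f](const BufferPtr &in) {
        auto out = std::make_shared<Buffer>();
        out->data.reserve(in->data.size());
        for (int v : in->data)
            out->data.push_back(f(v));
        return out;
    };
}
```

With this shape, the caller of the pipeline decides when a transfer to the host happens, and the stages themselves never force one.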
Any ideas or suggestions?