Hello all! I am trying to write a FAST (corner detection algorithm) function in OpenCL, but I am finding that just copying the memory to the OpenCL buffer and running an empty kernel takes 1-2 milliseconds. I feel like I am doing something wrong (I'm pretty new to OpenCL), but I'm stumped. I was hoping someone could give me some direction or pointers.
// Non-blocking writes of the image bytes and the result counter
clEnqueueWriteBuffer(commands, input,
                     CL_FALSE, 0, DATA_SIZE,
                     Image->data(), 0, NULL, NULL);
clEnqueueWriteBuffer(commands, outputSize,
                     CL_FALSE, 0, sizeof(int),
                     numResults, 0, NULL, NULL);
// Stride of the image data
err  = clSetKernelArg(kernel, 3, sizeof(unsigned int), &Stride);
err |= clSetKernelArg(kernel, 5, sizeof(unsigned char), &Threshhold);
err |= clSetKernelArg(kernel, 6, sizeof(int), &Height);
ErrorCheck(err, "Error: Failed to set kernel arguments! ");
// Block until the queued writes have completed
clFinish(commands);
This particular piece of code takes 0.5-4 milliseconds (usually closer to 1) with exactly the same size of data every time (a byte array holding a 1280x720 image). That is troubling, because the single-threaded CPU function takes 1 millisecond to run the whole FAST algorithm. Am I just not going to be able to match the speed of the CPU version? Or am I just passing data around wrong? I'd be glad to post any other pieces of code that may be relevant; I just didn't want to flood the thread with my whole function XD
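For scale, here is a rough sketch of the arithmetic behind those timings. It assumes the image is 8-bit, one byte per pixel, with tightly packed rows, so DATA_SIZE would be 1280 * 720 bytes. It also attributes the whole measured time to the copy, which overstates the copy cost, since the timed span includes the clSetKernelArg calls and clFinish as well. The names kFrameBytes and effectiveMBPerSec are made up for illustration and are not part of my actual code.

```cpp
#include <cstddef>

// Assumption: 8-bit grayscale frame with no row padding (Stride == width).
constexpr std::size_t kWidth = 1280;
constexpr std::size_t kHeight = 720;
constexpr std::size_t kFrameBytes = kWidth * kHeight;  // 921,600 bytes

// Effective throughput in MB/s (1 MB = 1e6 bytes) if `bytes` are moved
// in `milliseconds`. This is plain arithmetic, not an OpenCL measurement.
constexpr double effectiveMBPerSec(std::size_t bytes, double milliseconds) {
    return (static_cast<double>(bytes) / 1e6) / (milliseconds / 1e3);
}
```

Under those assumptions, effectiveMBPerSec(kFrameBytes, 1.0) comes out at about 921.6 MB/s for the typical 1 ms case, and the 0.5 ms and 4 ms extremes correspond to roughly 1843 MB/s and 230 MB/s.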