Hello all, I’m a computer science student at Edinboro University of PA working on a senior project and I’m running out of time to debug this.
I am struggling to understand work-group sizes, global size, and local size.
The situation is that I have an 800*800 image I need to work on. Originally I tried to set my global work size in clEnqueueNDRangeKernel to 800*800. This caused display driver crashes. Then I found out about max kernel work group sizes, so I tried enqueuing over and over, once per work-group-sized chunk, for as many iterations as I needed. This worked, but it was very slow.
I am currently trying to set global items to my iterations and local items to my workgroup size and I’m having problems.
How does one divide an 800*800 problem up so that it can be done in one enqueue? Multiple enqueues result in very bad performance, negating the reason for using OpenCL.
Here is my code to grab max work group sizes based on the kernel.
error = clGetKernelWorkGroupInfo(createImageKernelCL, platformDevices[0]->deviceIds[0], CL_KERNEL_WORK_GROUP_SIZE, sizeof(size_t), &maxWorkgroupSize, NULL);
Here is my enqueuing code.
unsigned int iterations = (xyPixels * xyPixels) / (unsigned int)maxWorkgroupSize;
size_t leftOverWorkgroupSize = (xyPixels * xyPixels) % (unsigned int)maxWorkgroupSize;
for(unsigned int i = 0 ; i < iterations ; i++)
{
    globalIDOffset = i * maxWorkgroupSize;
    clEnqueueNDRangeKernel(commandQueueIds[0], createImageKernelCL, 1, &globalIDOffset, &maxWorkgroupSize, NULL, 0, NULL, NULL);
}
if(leftOverWorkgroupSize)
{
    globalIDOffset = iterations * maxWorkgroupSize;
    clEnqueueNDRangeKernel(commandQueueIds[0], createImageKernelCL, 1, &globalIDOffset, &leftOverWorkgroupSize, NULL, 0, NULL, NULL);
}
This is very slow. Please help a noob understand these sizes and how to break up problems.
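For reference, here is a rough sketch of the size arithmetic a single enqueue would seem to need. It assumes the OpenCL 1.x rule that, when a local size is given, the global size must be a multiple of it. The helper names (round_up_to_multiple, single_enqueue_sizes, launch_sizes) are made up for illustration and are not part of OpenCL or of my project.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCL call): round value up to the
   next multiple of `multiple`. */
static size_t round_up_to_multiple(size_t value, size_t multiple)
{
    assert(multiple != 0);
    size_t remainder = value % multiple;
    return remainder == 0 ? value : value + (multiple - remainder);
}

/* Hypothetical container for the two sizes one enqueue would take. */
typedef struct {
    size_t global_size; /* total work-items, padded to a multiple of local_size */
    size_t local_size;  /* work-items per work-group */
} launch_sizes;

/* Assumption: use the kernel's max work-group size as the local size
   and pad the global size so it divides evenly. */
static launch_sizes single_enqueue_sizes(size_t total_items,
                                         size_t max_workgroup_size)
{
    launch_sizes sizes;
    sizes.local_size = max_workgroup_size;
    sizes.global_size = round_up_to_multiple(total_items, max_workgroup_size);
    return sizes;
}
```

The idea would be to pass global_size as the global_work_size argument and local_size as the local_work_size argument of one clEnqueueNDRangeKernel call, with a NULL offset. If the global size ends up padded, the kernel would presumably need a guard such as returning early when get_global_id(0) is at or past the real pixel count.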