Specify the number of compute units for execution of kernel.

Hello,
My GPU has 10 compute units, but using all of them at once makes the system unresponsive while the kernel runs. After execution, everything returns to normal. I would like to use only 5 compute units so that my system stays responsive during kernel execution. How do I specify the number of compute units to use?

Try submitting less work each time so that overall the machine is still responsive. For example, instead of executing 100,000 work-groups in one call to clEnqueueNDRangeKernel(), make ten calls and run only 10,000 work-groups in each of them. The "global_work_offset" parameter of clEnqueueNDRangeKernel() should come in handy.

The method above will work well even on older hardware.
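
The splitting described above can be sketched as a pair of small host-side helpers. This is only a minimal sketch in plain C: pass_count() and pass_range() are hypothetical names, not part of OpenCL, and the sketch assumes a one-dimensional range. It also shortens the last pass when the total is not a multiple of the chunk size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of OpenCL): number of passes needed to
   cover `total` items when each pass launches at most `chunk` items. */
size_t pass_count(size_t total, size_t chunk)
{
    assert(chunk > 0);
    return (total + chunk - 1) / chunk;   /* ceiling division */
}

/* Hypothetical helper: offset and size of pass number `pass` (0-based).
   The last pass is shortened when `total` is not a multiple of `chunk`. */
void pass_range(size_t total, size_t chunk, size_t pass,
                size_t *offset, size_t *size)
{
    assert(chunk > 0 && pass < pass_count(total, chunk));
    *offset = pass * chunk;
    *size = (total - *offset < chunk) ? total - *offset : chunk;
}
```

For each pass, &offset would be passed as global_work_offset and &size as global_work_size in the call to clEnqueueNDRangeKernel(). One caveat: if local_work_size is not NULL, OpenCL 1.x requires global_work_size to be a multiple of it, so a shortened last pass may need padding or a NULL local_work_size.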

Thanks for your help. I understand what you are saying, but can you give me an example of how to use global_work_offset in clEnqueueNDRangeKernel()? Suppose I have 100 work-items and I want to split them into two halves, 0-49 and 50-99. Assume work_dim=1.

You can try something like this:

int workSize = 100; // total number of work-items
int passes = 2; // this value is obvious in this example
size_t globalWorkSize[1] = {50}; // work-items per pass
size_t globalWorkOffset[1] = {0};

for(int i=0; i<passes; i++)
{
    // run work-items globalWorkOffset[0] .. globalWorkOffset[0]+49
    clEnqueueNDRangeKernel(GPUCommandQueue, OpenCL, 1, globalWorkOffset, globalWorkSize, NULL, 0, NULL, NULL);
    // read results by clEnqueueReadBuffer() with blocking set to CL_TRUE
    globalWorkOffset[0] += globalWorkSize[0];
}

Hope code is OK :wink:

Of course take a look at:
http://www.khronos.org/registry/cl/sdk/ … ernel.html
http://www.khronos.org/registry/cl/sdk/ … uffer.html

Yeah, something like what ilektrik suggests. If I may make a couple of small changes:


int passes = 2; // this value is obvious in this example
size_t globalWorkSize = 50; // work-items per pass
size_t globalWorkOffset = 0;

for(int i=0; i<passes; i++)
{
    clEnqueueNDRangeKernel(GPUCommandQueue, OpenCL, 1, &globalWorkOffset, &globalWorkSize, NULL, 0, NULL, NULL);
    globalWorkOffset += globalWorkSize;
}
// Here you can read results by clEnqueueReadBuffer()
// with blocking set to CL_TRUE

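Just make sure that your kernel source does not treat get_global_size(0) as the total problem size, since it will now return 50 instead of 100. With a non-NULL global_work_offset (OpenCL 1.1 and later), get_global_id(0) already includes the offset, so it runs from 0 to 49 in the first pass and from 50 to 99 in the second. If the kernel needs the offset itself, it can call get_global_offset(0).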
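
To see what the kernel side observes in each pass, here is a rough host-only emulation in plain C. It is not OpenCL code and makes no OpenCL calls. kernel_body(), emulate_pass() and run_split_example() are hypothetical names, and the element-doubling "kernel" is just a stand-in. The assumption is that work-item IDs start at the offset, as they do when global_work_offset is non-NULL.

```c
#include <assert.h>
#include <stddef.h>

#define TOTAL 100   /* full problem size from the example above */

/* Stand-in for a kernel body: each work-item doubles one element.
   `gid` plays the role of get_global_id(0). */
void kernel_body(size_t gid, const int *in, int *out)
{
    assert(gid < TOTAL);
    out[gid] = 2 * in[gid];
}

/* Emulates one enqueue on the host: work-items receive the IDs
   offset, offset+1, ..., offset+size-1. */
void emulate_pass(size_t offset, size_t size, const int *in, int *out)
{
    for (size_t i = 0; i < size; i++)
        kernel_body(offset + i, in, out);
}

/* Runs the two-pass split (0-49, then 50-99) and returns the sum of
   the output, so the caller can check that every element was processed. */
int run_split_example(void)
{
    int in[TOTAL], out[TOTAL] = {0};
    size_t size = 50;
    int sum = 0;

    for (int i = 0; i < TOTAL; i++)
        in[i] = i;
    for (size_t offset = 0; offset < TOTAL; offset += size)
        emulate_pass(offset, size, in, out);
    for (int i = 0; i < TOTAL; i++)
        sum += out[i];
    return sum;
}
```

Because each pass indexes the buffers by the offset-based ID, the two passes together touch every element exactly once, which is the same result a single 100-item launch would give.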

Thanks a lot, guys. The technique works great. :smiley: