11-19-2009, 06:56 AM
I have Mac OS X 10.6.2 (Snow Leopard) and an ATI Radeon HD4870. For this card, CL_DEVICE_MAX_WORK_GROUP_SIZE=1024. However, I cannot use any work-group size larger than 256: for example, if I try 512x1x1, clEnqueueNDRangeKernel returns CL_INVALID_WORK_GROUP_SIZE. Any idea why this happens? Could it be something inherent to ATI Stream?

Here's my code (error handling stripped, since none of the earlier calls report an error):

clGetPlatformIDs(max_num_platforms, platforms, &num_platforms);
clGetDeviceIDs( platforms[0], CL_DEVICE_TYPE_GPU, num_entries, devices, &num_devices);
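device = devices[0];
context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL); /* context creation was missing from the listing */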
cmd_queue = clCreateCommandQueue(context, device, 0, NULL);
program = clCreateProgramWithSource( context, 1, &kernel_str, NULL, NULL);
clBuildProgram(program, 1, &device, NULL, NULL, NULL);
kernel = clCreateKernel(program, "inc", NULL);
cl_mem memobj = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, sizeof(float)*n, srcA, NULL);
clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&memobj );

size_t gws = 512, lws = 512;
err = clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, &gws, &lws, 0, NULL, NULL);

The kernel is very simple:

__kernel void inc (__global const double *a) {
int k = get_global_id(0);

11-19-2009, 07:26 AM
You should use the work-group size value returned by clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE, ...). Can you check what this value is?

Note that CL_DEVICE_MAX_WORK_GROUP_SIZE is the maximum work-group size the device supports, but the maximum for a particular kernel can be smaller, depending on the resources that kernel uses. You should always use the value returned by clGetKernelWorkGroupInfo.
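
As a rough sketch of how that value might then be used: once the per-kernel limit is known, the local size has to stay at or below it and, in OpenCL 1.0, also divide the global size evenly. The helper below is hypothetical (choose_local_size is not part of OpenCL), and it assumes a 1-D range; it simply picks the largest local size that satisfies both conditions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an OpenCL API.
 * Returns the largest local work size that is <= kernel_max and
 * divides global_size evenly (1-D range assumed). */
size_t choose_local_size(size_t global_size, size_t kernel_max)
{
    assert(global_size > 0 && kernel_max > 0);

    /* Start from the smaller of the two limits. */
    size_t lws = kernel_max < global_size ? kernel_max : global_size;

    /* Step down until the size divides the global range; 1 always does. */
    while (global_size % lws != 0)
        --lws;

    return lws;
}
```

Here kernel_max would be the size_t filled in by clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE, sizeof(size_t), &kernel_max, NULL). With the numbers from this thread (global size 512, kernel limit 256) the helper gives a local size of 256, which can be passed to clEnqueueNDRangeKernel in place of 512.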

11-19-2009, 07:47 AM
Thank you, this explains everything. It reports max available work group size = 256 for my kernel.

11-19-2009, 08:46 AM
With an ATI Radeon 5870 I have the same problem (on NT).
The reported maximum is 1024, but it only works with sizes up to 256.
I think it's just a bug in the AMD driver for the moment.