specify work group sizes

One can specify work-group sizes with the kernel attribute reqd_work_group_size and/or with the local_work_size argument of clEnqueueNDRangeKernel(). If reqd_work_group_size is specified, then local_work_size must match it. I wonder what the rationale behind this is. Why can't local_work_size be NULL?

The intent is that if reqd_work_group_size is specified, you cannot pass a different local_work_size value. NULL is OK, as it will use the values specified with reqd_work_group_size. What you cannot do is pass a non-NULL local_work_size whose values differ from those specified with reqd_work_group_size.

Actually, the NULL value won't work, at least with NVIDIA's implementation, and this behavior is also implied by the OpenCL 1.0 specification.

I need to correct my previous post. A NULL value cannot be used when reqd_work_group_size is specified; local_work_size must be set to the values specified with reqd_work_group_size. The reason is that global_work_size must be a multiple of local_work_size, which in this case means a multiple of reqd_work_group_size. The OpenCL implementation may also have optimized the program executable (and kernel code) for this work-group size. Therefore neither NULL nor any value other than reqd_work_group_size can be used.
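To make the rule concrete, here is a minimal sketch in plain C that models the check as described in this post. It is not the OpenCL API: the function name check_required_sizes and the status codes are hypothetical, chosen only for illustration, and the sketch assumes every entry of reqd is nonzero.

```c
#include <stddef.h>

/* Hypothetical status codes for this sketch; these are NOT the real
   OpenCL error constants. */
enum size_check { SIZE_OK = 0, SIZE_MISMATCH = 1, SIZE_NOT_MULTIPLE = 2 };

/* Models the rule described in the post: when a kernel declares
   reqd_work_group_size, local must be non-NULL and equal to reqd in
   every dimension, and global must be a multiple of reqd.
   Assumes every reqd[d] is nonzero. */
static enum size_check check_required_sizes(size_t work_dim,
                                            const size_t *global,
                                            const size_t *local,
                                            const size_t *reqd)
{
    if (local == NULL)
        return SIZE_MISMATCH;          /* NULL is rejected under this rule */

    for (size_t d = 0; d < work_dim; ++d) {
        if (local[d] != reqd[d])
            return SIZE_MISMATCH;      /* differs from the required size */
        if (global[d] % reqd[d] != 0)
            return SIZE_NOT_MULTIPLE;  /* global not evenly divisible */
    }
    return SIZE_OK;
}
```

With a required size of {8, 8}, a global size of {64, 32} passes only when local is also {8, 8}; passing NULL or {16, 4} is reported as a mismatch, and a global size of {60, 32} is reported as not a multiple.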

That makes some sense. Alternatively, the standard could allow the kernel to be queued even with local_work_size=NULL, as long as global_work_size is a multiple of reqd_work_group_size, and make clEnqueueNDRangeKernel() fail otherwise.
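The alternative suggested here could be sketched as follows, again in plain C rather than against the real API. The name check_sizes_relaxed and the status codes are hypothetical, and this models the proposal only, not what any OpenCL implementation does. It assumes every entry of reqd is nonzero.

```c
#include <stddef.h>

/* Hypothetical status codes for this sketch; these are NOT the real
   OpenCL error constants. */
enum relaxed_check {
    RELAXED_OK = 0,
    RELAXED_MISMATCH = 1,
    RELAXED_NOT_MULTIPLE = 2
};

/* Models the proposed relaxation: a NULL local is accepted and treated
   as "use reqd", so only divisibility of global by reqd is checked.
   A non-NULL local must still equal reqd in every dimension.
   Assumes every reqd[d] is nonzero. */
static enum relaxed_check check_sizes_relaxed(size_t work_dim,
                                              const size_t *global,
                                              const size_t *local,
                                              const size_t *reqd)
{
    for (size_t d = 0; d < work_dim; ++d) {
        if (local != NULL && local[d] != reqd[d])
            return RELAXED_MISMATCH;      /* explicit size must match */
        if (global[d] % reqd[d] != 0)
            return RELAXED_NOT_MULTIPLE;  /* enqueue would fail here */
    }
    return RELAXED_OK;
}
```

Under this variant, NULL with a global size of {64, 32} and a required size of {8, 8} is accepted, while NULL with {60, 32} is still rejected because 60 is not a multiple of 8.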

Agree with duanmu.