Cannot enqueue > ~65k threads

I am trying to enqueue a kernel with the following statement (where SIZE = 256 and queue is of type cl::CommandQueue, as defined in the OpenCL C++ bindings):

queue.enqueueNDRangeKernel(
kernel, 
cl::NullRange,
cl::NDRange((SIZE)*(SIZE)),
cl::NDRange(1, 1), 
NULL, 
&event); 

The call returns the error CL_INVALID_VALUE (-30). The problem does not occur if SIZE < 256 (where SIZE is a positive integer).

Q: What is responsible for this error? Why can more than ~65k threads not be created? Is this a limitation of the NVIDIA OpenCL runtime, or the result of some form of EBCAK (error between chair and keyboard)?
Thanks in advance.

You should always be able to enqueue a global size of 65k, as long as your local size divides it exactly. My guess is that with a local size of 1 and a global size of 65k you’re running into a bug in the implementation. Try setting the local size to 256 (or NULL, which lets the implementation choose) and see if it works. If it does, you should file a bug with the vendor.
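To make the suggestion concrete, here is a rough sketch of how a local size that divides the global size exactly could be picked, and how many work-groups each choice produces. pickLocalSize and workGroupCount are hypothetical helper names of my own, not part of OpenCL or the C++ bindings, and the cap of 256 is just the value suggested above; a real program should query the device for its maximum work-group size.

```cpp
#include <cstddef>

// Hypothetical helper (not an OpenCL API): returns the largest local size
// that is <= maxLocal and divides globalSize exactly. Falls back to 1.
std::size_t pickLocalSize(std::size_t globalSize, std::size_t maxLocal) {
    for (std::size_t local = maxLocal; local > 1; --local) {
        if (globalSize % local == 0) {
            return local;
        }
    }
    return 1;
}

// Hypothetical helper: number of work-groups for a 1D range, assuming
// localSize divides globalSize exactly.
std::size_t workGroupCount(std::size_t globalSize, std::size_t localSize) {
    return globalSize / localSize;
}

// Intended use with the call from the question (assumes SIZE, kernel,
// queue and event are set up as in the original post):
//
//   std::size_t global = SIZE * SIZE;                // 65536 for SIZE = 256
//   std::size_t local  = pickLocalSize(global, 256); // 256 here
//   queue.enqueueNDRangeKernel(kernel, cl::NullRange,
//                              cl::NDRange(global), cl::NDRange(local),
//                              NULL, &event);
//
// Passing cl::NullRange as the local size instead leaves the choice to
// the implementation.
```

With SIZE = 256, a local size of 1 means 65536 work-groups, while a local size of 256 means only 256 work-groups over the same global range.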

Changing the local size to a non-1 divisor of the global size resolved the problem, so it seems you are correct about it being a driver issue. A bug report has been filed with NVIDIA. Thank you for your assistance in this matter of grave importance.