3-D vs 1-D Global workspace

This is yet another query about global workspace.

Basically, I’m getting a graphics card crash when I specify a large number of threads in the global workspace.
When I specify a 1-D global workspace, I have noticed that, in my case, I can specify a number up to 2^32. Doing so, the kernel runs absolutely fine. It takes a while to get through the number of threads, but all is good. If I use (2^32) + 1 threads, then I get a crash (which I would expect).

My problem arises when I have a 3-D global workspace, where the number of threads is still very large but is in fact quite a bit less than 2^32.
For example, specifying the workspace as global(837, 1098, 352) and local(1, 1, 1)
will cause a crash, with an invalid command queue error.
But (837 * 1098 * 352) < 2^32…
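For reference, the arithmetic above can be checked with a few lines of Python. This is only a sanity check of the numbers quoted in the post; the helper name `total_work_items` is made up for illustration and is not part of any OpenCL API.

```python
# Plain integer arithmetic, no OpenCL needed.
LIMIT_1D = 2 ** 32  # largest 1-D global size the poster reports as working


def total_work_items(global_size):
    """Multiply the per-dimension global sizes into one total."""
    total = 1
    for dim in global_size:
        total *= dim
    return total


total = total_work_items((837, 1098, 352))
print(total)             # 323497152
print(total < LIMIT_1D)  # True
```

So the 3-D launch asks for roughly 323 million work items, about 13 times fewer than the 2^32 that works in the 1-D case.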

I have tried removing all code from the kernel, and I still get a crash when specifying this size of global workspace.

My max work item sizes are [1024, 1024, 64].
I have tried using [1050, 1050, 100] and this works fine, but, say, [1050, 1050, 500] will not.

Any ideas?

This seems like a bug in the OpenCL implementation you are running on. I suggest you contact the vendor and file a bug.

It did sound like it to me, but I was hoping it wouldn’t be. Thanks for your reply.

(Using OpenCL 1.1, NVIDIA Quadro 2000, on version 311.15 drivers)

You put the reason right in your message. Your device only accepts maximum dimensions of [1024, 1024, 64], yet you are passing [837, 1098, 352]. Since 352 > 64, you are asking for something the device cannot do.
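The per-dimension comparison described in this reply can be sketched as follows. It is a rough illustration only: `dims_over_limit` is a hypothetical helper, and the limits are simply the values quoted in the question rather than anything queried from a device.

```python
# Values reported in the question for the max work item sizes.
MAX_WORK_ITEM_SIZES = (1024, 1024, 64)


def dims_over_limit(size, limits=MAX_WORK_ITEM_SIZES):
    """Return the indices of the dimensions where size exceeds the limit."""
    return [i for i, (s, m) in enumerate(zip(size, limits)) if s > m]


for size in [(837, 1098, 352), (1050, 1050, 100), (1050, 1050, 500), (1, 1, 1)]:
    print(size, dims_over_limit(size))
```

Note that this comparison also flags [1050, 1050, 100], which the original poster reports as working, so on its own it does not separate the working cases from the failing ones. The OpenCL specification describes CL_DEVICE_MAX_WORK_ITEM_SIZES as the per-dimension limit on the work-group (local) size, and the local size used here, [1, 1, 1], is within it.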

Furthermore, you are setting a local work group size of [1,1,1], which means your GPU is mostly idle: you are running over 323 million work items (837 * 1098 * 352), each in its own single-item work group. That’s not going to be very fast.
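To put the local size point in numbers, here is a small sketch that counts work groups for two local sizes. The local size (8, 8, 4) is just an assumed example (256 work items per group, which would still have to fit within the device's CL_DEVICE_MAX_WORK_GROUP_SIZE), and both helper names are hypothetical. Since OpenCL 1.1 requires the global size to be a multiple of the local size in each dimension, the sketch pads the global size up, which means the kernel would need a bounds check to skip the extra work items.

```python
def pad_global_size(global_size, local_size):
    """Round each global dimension up to the next multiple of the local dimension."""
    return tuple(((g + l - 1) // l) * l for g, l in zip(global_size, local_size))


def work_group_count(global_size, local_size):
    """Number of work groups launched after padding the global size."""
    count = 1
    for g, l in zip(pad_global_size(global_size, local_size), local_size):
        count *= g // l
    return count


requested = (837, 1098, 352)
print(work_group_count(requested, (1, 1, 1)))   # 323497152 groups of 1 work item
print(pad_global_size(requested, (8, 8, 4)))    # (840, 1104, 352)
print(work_group_count(requested, (8, 8, 4)))   # 1275120 groups of 256 work items
```

With the assumed (8, 8, 4) local size the launch needs about 1.3 million work groups instead of 323 million, at the cost of a few padded work items that do nothing.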