This is yet another query about the global workspace.
Basically, I'm getting a graphics card crash when I specify a large number of threads in the global workspace.
With a 1-D global workspace, I have noticed that in my case I can specify up to 2^32 threads and the kernel runs absolutely fine. It takes a while to get through that many threads, but all is good. If I specify (2^32) + 1 threads, I get a crash (which I would expect).
My problem is with a 3-D global workspace, where the total number of threads is still very large but is in fact quite a bit less than 2^32.
For example, specifying the workspace as global(837, 1098, 352) and local(1, 1, 1)
causes a crash, with an "invalid command queue" error.
But (837 * 1098 * 352) = 323,497,152, which is well below 2^32…
I have tried removing all code from the kernel, and I still get a crash when specifying this size of global workspace.
My max work item sizes are [1024, 1024, 64].
I have tried using [1050, 1050, 100] and this works fine, but [1050, 1050, 500], say, does not.
Any ideas?