clEnqueueNDRangeKernel returns -6 CL_OUT_OF_HOST_MEMORY

I have already found out that this error code may signify a lack of any host resource, not necessarily memory (the name suggests memory only for legacy reasons).

How do I find which resource?

Is there any way to turn on extended debug output so that the actual reason can be diagnosed exactly? Is there some tweak that enables it?

When I comment out parts of the kernel code more or less arbitrarily, clEnqueueNDRangeKernel succeeds, which makes me think the problem has something to do with the kernel size. Information on the maximum kernel size is scarce, but I recall it is 2,000,000 instructions, and my kernel is far below this limit.

Any suggestions?

Platform: GeForce 570, driver 301.42, Windows 7 64 bit, NVIDIA GPU Computing SDK 4.2

Alternatively, is there an OpenCL developer runtime with an abundance of checks and verbose messages?

Well, it appears to have something to do with the amount of per-thread local memory the kernel uses (local memory in the CUDA sense, i.e. what OpenCL calls private memory). The card declares compute capability 2.0, which means 512K of local memory per thread, and so the compiler happily allows as much as 100K to be used.

But either the card does not conform to CC 2.0, or the driver, the OpenCL implementation, or something else on the host has a bug: once the kernel uses about 44K of local memory or more, clEnqueueNDRangeKernel starts returning CL_OUT_OF_HOST_MEMORY. The kernel in this particular case is a test kernel with little more than an array declared and some simple operations performed on it. The only difference between the working kernel and the failing kernel is the size of the array.

BTW, is 512K of per-thread local memory required to conform to CC 2.0, or is it just a “recommendation” that card manufacturers do not have to follow?
If the latter is the case, how do I find out (or query) the actual amount of per-thread local memory available?

The actual card is Palit GeForce GTX 570 Sonic Platinum.
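
A possible explanation for the roughly 44K threshold, offered as an unconfirmed assumption rather than a diagnosis: if the driver has to reserve per-thread local memory in device memory for every thread that can be resident on the GPU at once, the practical ceiling depends on the card's memory size and not only on the 512K figure. The sketch below just does that arithmetic. The function names are hypothetical, and the inputs used afterwards (15 multiprocessors and 1280 MB for a stock GTX 570, 1536 resident threads per multiprocessor for compute capability 2.0) are assumptions about this card, not values read from it.

```c
#include <stdint.h>

/* Device memory needed to back per-thread local (OpenCL: private) memory,
 * ASSUMING the driver reserves space for every thread that can be
 * resident on the device at the same time. This is a hypothesis. */
uint64_t private_backing_bytes(uint64_t bytes_per_thread,
                               uint32_t compute_units,
                               uint32_t max_resident_threads_per_cu)
{
    uint64_t threads = (uint64_t)compute_units * max_resident_threads_per_cu;
    return bytes_per_thread * threads;
}

/* Largest per-thread size that fits into a given device memory budget
 * under the same assumption. Returns 0 if the thread count is 0. */
uint64_t max_private_bytes_per_thread(uint64_t device_mem_budget,
                                      uint32_t compute_units,
                                      uint32_t max_resident_threads_per_cu)
{
    uint64_t threads = (uint64_t)compute_units * max_resident_threads_per_cu;
    return threads ? device_mem_budget / threads : 0;
}
```

With the assumed numbers, private_backing_bytes(44 * 1024, 15, 1536) is 1038090240 bytes (990 MiB), and max_private_bytes_per_thread(1280 MiB, 15, 1536) is 58254 bytes, i.e. just under 57K even before any buffers or the Windows desktop take their share. The memory size and multiprocessor count can be read with clGetDeviceInfo (CL_DEVICE_GLOBAL_MEM_SIZE, CL_DEVICE_MAX_COMPUTE_UNITS), and clGetKernelWorkGroupInfo with CL_KERNEL_PRIVATE_MEM_SIZE (OpenCL 1.1) is the standard query for a kernel's private memory use, although whether NVIDIA's value covers large private arrays is something to check.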