Infinite loop invalidating command queue

Hello all,
As mentioned in a few recent posts, I’m working on a prime factoring algorithm for very large numbers in OpenCL. I’ve recently run into a problem that I believe is caused by OpenCL incorrectly treating a long-running loop as an infinite loop.

I have a loop that should execute exactly 29,936,901 times in the case I have found that makes it fail. Essentially, the loop starts with a variable set to 3 and counts up by one to the square root of a larger number (in this case the square root happens to be 29,936,904), then exits and writes some variables to a buffer. If I count up by one, I get an INVALID_COMMAND_QUEUE error when reading from the buffers. However, if I count up by any number >= 3, there is no problem with the command queue and the variables are set correctly.
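The kernel itself isn't posted, so the following is only a host-side C sketch of the loop as described, not the actual OpenCL code. The function name `count_iterations` and the exclusive upper bound are assumptions, chosen so that the iteration count matches the 29,936,901 quoted above. It shows how much work each step size implies:

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's loop: count from `start` up to
   (but not including) `limit` in increments of `step`, and return the
   number of iterations performed. */
uint64_t count_iterations(uint64_t start, uint64_t limit, uint64_t step)
{
    uint64_t n = 0;
    for (uint64_t i = start; i < limit; i += step) {
        n++; /* the real kernel would do its factoring work here */
    }
    return n;
}
```

Calling `count_iterations(3, 29936904, 1)` gives the full 29,936,901 iterations, while a step of 3 needs only about a third as many, which would be consistent with a time limit rather than an iteration limit being the trigger.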

Does anyone know if there is a limit to the number of times a loop may be run? Based on my research, this kind of failure is usually due to TDR (timeout detection and recovery) on the GPU… but it doesn’t seem that Linux does that (I’m trying this on Slackware 13.37). Any other ideas?

As far as I know there is no standardized TDR in X11, but individual vendors might still implement it. For example, this is from the README in NVIDIA drivers:

Option "Interactive" "boolean"

This option controls the behavior of the driver's watchdog, which attempts
to detect and terminate GPU programs that get stuck, in order to ensure
that the GPU remains available for other processes. GPU compute
applications, however, often have long-running GPU programs, and killing
them would be undesirable. If you are using GPU compute applications and
they are getting prematurely terminated, try turning this option off.

Default: on. The driver will attempt to detect and terminate GPU programs
that cause excessive delays for other processes using the GPU.
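As a rough sketch of how that option might be turned off, assuming it goes in the Device section of xorg.conf for the NVIDIA driver (the Identifier value below is a made-up placeholder, and your existing section will have other entries):

```
Section "Device"
    Identifier "nvidia-gpu"            # hypothetical name; keep your existing one
    Driver     "nvidia"
    Option     "Interactive" "false"   # disable the driver's watchdog
EndSection
```

The X server has to be restarted for the change to take effect. With the watchdog off, a kernel that really is stuck may leave the display unresponsive.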

Thanks for the help, that did the trick!