Timeouts for GPU kernels

codedivine · October 10, 2012, 5:33am

When you run a kernel on the GPU, it sometimes ends up running for a very long time. Depending on the vendor and OS combination, this can lead to anything from application crashes to system freezes. This is obviously very undesirable.

Consider how an API like C++ AMP deals with this. C++ AMP kernels do not lead to system hangs. If a kernel runs for too long, the system terminates it and throws an exception, which the application can catch and respond to. Even if the developer is sloppy and does not handle the exception, the result is only an application crash, not a crash of the whole system.

Thus, I want a well-defined way to enforce a maximum timeout on GPU kernels, along with well-defined error codes for the case where the timeout does occur.

The current behaviour is completely unacceptable.

thanks,
Rahul Garg
PhD student (CS), McGill University

codedivine · October 10, 2012, 12:36pm

As far as I can tell, some big ISVs have faced the GPU timeout issue as well.
For example, see Adobe’s presentation at the SIGGRAPH 2012 OpenCL BOF. Link: http://www.khronos.org/assets/uploads/d … _Aug12.pdf

See page 17. They mention “Win/Mac timeout issues on low-end cards” as one of the challenges they faced.

system · January 5, 2013, 10:34am

We need a flag, set when initializing the OpenCL context, to indicate that we explicitly want to disable the watchdog, without having to touch the Windows registry or affect the GPU drawing APIs.