Debugging in OpenCL

I am trying to write a ray tracer for a school research project and I am running into some strange issues with my code.

Normally I would be fairly confident I could figure out what is going wrong (I already have the algorithm working in C#), but I cannot figure out how to debug the OpenCL code.

I thought about trying ATI's Stream SDK with GDB, but I am using Image2d/Image3d textures, which AMD does not appear to support on CPUs. I have heard very good reports about gDEBugger CL and applied to their free beta program, but I have not heard back from them yet and have no idea what the turnaround time might be.

Any suggestions on how to go about debugging OpenCL programs/kernels?

(For what it is worth, my code is currently implemented in C# with the Cloo OpenCL wrapper library. I could probably port it to C++ if that would make debugging easier.)

At present my approach is to start on a CPU device and call printf from the kernel to output work-item state so I can see what is going on. Usually I try to isolate a single work-item at a time so that I get a manageable amount of output. This can mean doing many enqueues of one work-group each instead of one massive enqueue. Once I have it working on the CPU device, I switch to the GPU device and am careful to make incremental changes that I can verify individually. Sometimes the switch to GPU brings unexpected crashes where the CPU worked. I've seen this happen when my code makes poor assumptions about memory alignment or order of execution, or reads/writes out of bounds of a buffer.
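To illustrate the idea of tracing a single work-item, here is a minimal sketch in plain host C rather than OpenCL C. It emulates a 1D range with a loop and passes the work-item id as a parameter; in a real kernel that id would come from get_global_id(0). The names scale_item, run_scale, and DEBUG_GID are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical: the one work-item whose state we want to see. */
#define DEBUG_GID 2

/* Stand-in for a kernel body. In OpenCL C, gid would come from
   get_global_id(0) instead of being passed in.
   Returns 1 if this work-item printed a trace line, else 0. */
int scale_item(size_t gid, const float *in, float *out, float factor)
{
    out[gid] = in[gid] * factor;

    /* Print only for the selected work-item to keep output manageable. */
    if (gid == (size_t)DEBUG_GID) {
        printf("gid=%zu in=%f out=%f\n", gid, in[gid], out[gid]);
        return 1;
    }
    return 0;
}

/* Emulates a 1D enqueue by looping over all global ids.
   Returns how many work-items printed a trace line. */
int run_scale(const float *in, float *out, size_t n, float factor)
{
    int traced = 0;
    assert(in != NULL && out != NULL);
    for (size_t gid = 0; gid < n; ++gid)
        traced += scale_item(gid, in, out, factor);
    return traced;
}
```

With four input elements, only the work-item with id 2 prints, no matter how large the range grows. The same guard works inside an actual kernel on platforms whose CPU device supports printf.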

If you really need to get output from your GPU code to figure out what is going on, consider passing in an extra buffer that you can write debug info to.
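The extra debug buffer could look roughly like the following. This is again a plain host C sketch that emulates work-items with a loop, not real kernel code: in OpenCL C the debug pointer would be an additional __global kernel argument and gid would come from get_global_id(0). The names square_item, run_square, and DEBUG_SLOTS are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: number of values each work-item records. */
#define DEBUG_SLOTS 2

/* Stand-in for a kernel body. `debug` is the extra buffer; each
   work-item writes only to its own slots, so no synchronization
   between work-items is needed. */
void square_item(size_t gid, const float *in, float *out, float *debug)
{
    float x = in[gid];
    float y = x * x;
    out[gid] = y;

    debug[gid * DEBUG_SLOTS + 0] = x;  /* value read by this work-item */
    debug[gid * DEBUG_SLOTS + 1] = y;  /* value it computed */
}

/* Emulates a 1D enqueue over n work-items.
   `debug` must hold at least n * DEBUG_SLOTS floats. */
void run_square(const float *in, float *out, float *debug, size_t n)
{
    assert(in != NULL && out != NULL && debug != NULL);
    for (size_t gid = 0; gid < n; ++gid)
        square_item(gid, in, out, debug);
}
```

On a real device, the host would read the debug buffer back after the kernel finishes (for example with clEnqueueReadBuffer) and inspect the slots for the work-item of interest. Sizing the buffer as one fixed block per work-item keeps the indexing simple and avoids out-of-bounds writes.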

As products like gDEBugger CL become available and the vendor drivers & tools mature, the debugging situation should improve significantly.