Tool for viewing thread information

A few topics in this forum and the CUDA forum discuss CPU busy-waiting after a clEnqueueNDRangeKernel call. People said the busy wait occurs on some devices, and someone said OpenCL creates a background thread that spins while waiting for the GPU to return. I’m curious whether my program also has this busy wait, but I don’t know how to get this information. What tool should I use to view thread information?

I tried using gdb (info threads) right after the call, but the info threads command doesn’t print anything until the kernel finishes running (I noticed a very long wait before it printed the information). So by the time it prints, the background thread is already gone.

Another approach I tried was using gdb’s single-instruction stepping to find out whether it calls pthread_create, but I didn’t see anything that creates a new thread (unless I missed something).

You could use a profiler like Instruments.