busy wait when executing kernel

I am trying to let the host continue to work while the GPU is busy executing a kernel, but the CPU simply waits until the kernel has been executed. I haven’t blocked the kernel execution in any way.
I found a similar remark on http://forums.nvidia.com/index.php?showtopic=201041 .

Does this mean that currently, with my NVIDIA Quadro FX 2700M, using OpenCL and Visual Studio 2008, there is no way to have my CPU and GPU running at the same time? That would contradict the whole purpose of OpenCL…

Can you show us the API calls you made? Or alternatively, can you write a small program that reproduces the issue? We don’t have enough information to give a meaningful answer.

Here’s the part of the code that matters:

host_timing();
ciErr1 = clEnqueueNDRangeKernel(cqCommandQueue, gpu_satsolver, 2, NULL, szGlobal, szLocal, 0, NULL, NULL);
host_timing();
host_timing();

host_timing() simply prints the time (in seconds) since the program started.
The result is:

Current time: 11.874000
Current time: 11.874000
Current time: 66.129000

So the CPU still executes one print command after it has enqueued the kernel, and only then waits for the kernel to finish.
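For anyone who wants to reproduce the measurement: the body of host_timing() was not posted, so the following is only a minimal sketch of what such a helper might look like. It assumes a POSIX system with clock_gettime(); host_timing_init() and host_elapsed_seconds() are made-up names, and on Windows something like QueryPerformanceCounter would have to take the place of clock_gettime().

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: POSIX clock_gettime() is available */
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for the poster's host_timing(); the original was not shown. */
static struct timespec program_start;

/* Record the reference point. Call once at the top of main(). */
void host_timing_init(void)
{
    clock_gettime(CLOCK_MONOTONIC, &program_start);
}

/* Wall-clock seconds elapsed since host_timing_init(). */
double host_elapsed_seconds(void)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (double)(now.tv_sec - program_start.tv_sec)
         + (double)(now.tv_nsec - program_start.tv_nsec) / 1e9;
}

/* Print the elapsed time in the same format as the output quoted above. */
void host_timing(void)
{
    printf("Current time: %f\n", host_elapsed_seconds());
    fflush(stdout);  /* flush right away so buffering does not reorder the output */
}
```

A monotonic wall clock is used here on purpose: it keeps advancing while the host thread is stalled, which is what makes the 50-second gap visible.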

I’m sorry but that’s still not enough information to help you. In the code you posted there are two consecutive calls to host_timing() and yet 50+ seconds elapse between them. Something is missing.

That’s exactly the point, nothing is missing. Those lines of code appear in this way in the program.
The first call to host_timing() is executed before the kernel is enqueued, then the kernel is enqueued and the function clEnqueueNDRangeKernel() returns. Then the second call to host_timing() is executed. Then, apparently, the CPU waits for the GPU to finish executing the kernel (which takes about 50 seconds), and only after that executes the third call to host_timing().

The fact that the second call to host_timing() is executed before the kernel has finished shows that the enqueue call itself is not blocking. It seems that the CPU is kept busy, waiting for the GPU to finish.
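One way to check whether the host is really spinning during those 50 seconds, rather than sleeping somewhere inside the driver, would be to compare wall-clock time with process CPU time across the stalled region. The sketch below is not from my program; time_sample, sample_now(), sample_diff() and looks_like_busy_wait() are made-up names, and the 50% threshold is an arbitrary choice. It assumes a POSIX system, where clock() reports CPU time consumed by the whole process. With the Microsoft C runtime clock() reports elapsed wall-clock time instead, so on Windows a different CPU-time source (GetProcessTimes, for instance) would be needed.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: POSIX clock_gettime() is available */
#include <time.h>

/* Hypothetical helper type: one reading of both clocks, in seconds. */
typedef struct {
    double wall;  /* monotonic wall-clock time */
    double cpu;   /* CPU time used by this process (POSIX meaning of clock()) */
} time_sample;

/* Read both clocks now. */
time_sample sample_now(void)
{
    struct timespec ts;
    time_sample s;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    s.wall = (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
    s.cpu  = (double)clock() / (double)CLOCKS_PER_SEC;
    return s;
}

/* Time that passed between two readings. */
time_sample sample_diff(time_sample start, time_sample end)
{
    time_sample d;
    d.wall = end.wall - start.wall;
    d.cpu  = end.cpu  - start.cpu;
    return d;
}

/* Heuristic: the region looks like a busy wait if the process burned CPU
 * for at least half of the wall-clock time it spent there. */
int looks_like_busy_wait(time_sample elapsed)
{
    return elapsed.wall > 0.0 && elapsed.cpu >= 0.5 * elapsed.wall;
}
```

Taking one sample right after clEnqueueNDRangeKernel() and another after the third host_timing() call would give the two readings to compare. If the CPU time comes out close to the 50 seconds of wall time, the host is spinning; if it is near zero, the host is blocked and sleeping.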