completion of the kernel to retrieve buffer from device

Hi,

As I understand it, clEnqueueNDRangeKernel() enqueues the kernel and returns. If the kernel writes to memory that I have to retrieve, I will have to wait for the kernel to complete before I can copy the data from device memory to host memory. So how will one come to know whether the kernel has finished execution? Will clEnqueueReadBuffer() wait for the kernel that is using the buffer object it is trying to retrieve to finish execution before retrieving the buffer object?

One way to accomplish that is by using what OpenCL defines as an “event”. When you place your work in the command queue with clEnqueueNDRangeKernel(), you may provide a pointer to an event variable (the last argument; see the documentation) that will be updated upon completion of the work. After the clEnqueueNDRangeKernel() call you can wait for execution to complete with clWaitForEvents(1, &eventp), then proceed with your buffer read/copy.

An easier way is just to set blocking_read to CL_TRUE in clEnqueueReadBuffer(). As long as you are running on an in-order queue this will work fine, because the read does not start until the kernel enqueued before it has finished. Remember that to get the best performance you want to make sure you queue up enough work to keep the device busy. If you have to synchronize with your application by waiting for each kernel to finish, you’ll have periods of time where the device is idle. You should consider double-buffering strategies to get around this.