I am developing a program with OpenCL 1.2 and have a problem synchronizing the host (CPU) with my kernel. In my program, the host does some calculations on a global array that is shared with the kernel. The kernel should start only after the host has finished its operations on the array and passed it to the kernel, so I use a user event and clSetUserEventStatus() to control the process. However, this does not work. My code is below:
//initializing code
…
a = (cl_float*)clEnqueueMapBuffer(cmdQueue, buffer_a, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE, 0, sizeof(cl_float) * N, 0, NULL, NULL, NULL);
err = clSetKernelArg(test, 0, sizeof(cl_mem), &buffer_a);
cl_event userEvt = clCreateUserEvent(context, &err);
for (int x = 0; x < N; x++)
    *(a + x) = x;
clSetUserEventStatus(userEvt, CL_COMPLETE);
err = clEnqueueNDRangeKernel(cmdQueue, test, 2, NULL, globalWorkSize, localWorkSize, 1, &userEvt, &fEvt);
clWaitForEvents(1, &fEvt);
//the kernel
const char *test[] = {
"__kernel void test(__global float *a, __global float *d, __global float *out)\n"
"{\n"
//code…
"}\n"
};
In this program, the values of array “a” that the kernel “test” reads are wrong. It seems the kernel starts before the host has finished its operations on array “a”. However, if I add a Sleep() call after the for loop, the values of array “a” that the kernel reads are correct. The modified code is below:
//initializing the program
…
a = (cl_float*)clEnqueueMapBuffer(cmdQueue, buffer_a, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE, 0, sizeof(cl_float) * N, 0, NULL, NULL, NULL);
err = clSetKernelArg(test, 0, sizeof(cl_mem), &buffer_a);
cl_event userEvt = clCreateUserEvent(context, &err);
for (int x = 0; x < 8; x++)
    *(a + x) = x;
Sleep(30);
clSetUserEventStatus(userEvt, CL_COMPLETE);
err = clEnqueueNDRangeKernel(cmdQueue, test, 2, NULL, globalWorkSize, localWorkSize, 1, &userEvt, &fEvt);
clWaitForEvents(1, &fEvt);
I also tried removing the clSetUserEventStatus(userEvt,CL_COMPLETE) call. The program then deadlocks: the test kernel waits forever for userEvt to complete, so clWaitForEvents() never returns. I am confused. It seems the host runs clSetUserEventStatus(userEvt,CL_COMPLETE) before it has finished the for loop, although the order of the statements does not suggest this. Could anyone please tell me what is wrong with my code?
I would also like to know how to synchronize the host and the kernel, and how to force a kernel to start only after a certain point in the host code. I would be grateful if anyone could help me figure this out.
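For reference, here is a minimal sketch of the ordering I am after. It rests on an assumption I have not verified in my program: that a region mapped for writing must be unmapped before a kernel may read it, which is how I read the clEnqueueMapBuffer notes in the OpenCL specification. The helper name fill_then_launch and its parameters are hypothetical, the kernel's other arguments are assumed to be set elsewhere, and the header path may differ by platform (for example OpenCL/opencl.h on macOS).

```c
#include <CL/cl.h>

/* Hypothetical helper (not from my program): fill a mapped buffer on the host,
   unmap it, and launch the kernel only after the unmap has completed. */
cl_int fill_then_launch(cl_command_queue queue, cl_mem buffer_a, cl_kernel kernel,
                        size_t n, const size_t *global_size, const size_t *local_size)
{
    cl_int err;
    cl_event unmap_evt = NULL, kernel_evt = NULL;

    /* Blocking map for writing: returns once the pointer is usable on the host. */
    cl_float *a = (cl_float *)clEnqueueMapBuffer(queue, buffer_a, CL_TRUE, CL_MAP_WRITE,
                                                 0, sizeof(cl_float) * n,
                                                 0, NULL, NULL, &err);
    if (err != CL_SUCCESS)
        return err;

    /* Host-side work on the mapped region. */
    for (size_t x = 0; x < n; x++)
        a[x] = (cl_float)x;

    /* Hand the region back to the runtime; unmap_evt completes when that is done. */
    err = clEnqueueUnmapMemObject(queue, buffer_a, a, 0, NULL, &unmap_evt);
    if (err != CL_SUCCESS)
        return err;

    /* Only argument 0 is set here; the remaining arguments are assumed to be set elsewhere. */
    err = clSetKernelArg(kernel, 0, sizeof(cl_mem), &buffer_a);

    /* The kernel waits on the unmap event instead of a user event. */
    if (err == CL_SUCCESS)
        err = clEnqueueNDRangeKernel(queue, kernel, 2, NULL, global_size, local_size,
                                     1, &unmap_evt, &kernel_evt);

    /* Block the host until the kernel has finished. */
    if (err == CL_SUCCESS) {
        err = clWaitForEvents(1, &kernel_evt);
        clReleaseEvent(kernel_evt);
    }
    clReleaseEvent(unmap_evt);
    return err;
}
```

In this sketch the dependency is expressed through the event returned by clEnqueueUnmapMemObject, so no user event or Sleep() is involved. With an in-order command queue the explicit wait list is probably redundant, but it makes the intended ordering visible.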
Many thanks!
Tan