Hi all,
I am studying how to create an interactive connection between a CPU and a discrete GPU in OpenCL.
I create a buffer (unsigned long) with the CL_MEM_USE_HOST_PTR flag and pass it as a kernel argument.
According to the CodeXL report, the buffer is pinned in host memory.
The GPU kernel is an infinite loop that polls the buffer; when the buffer becomes nonzero, the loop breaks.
After enqueuing the kernel, the CPU-side program waits for a while and then sets the buffer to 1.
I expect the GPU to notice that the buffer is no longer zero and exit the loop.
However, the loop never terminates.
I suspect this is because the GPU has cached the contents of the buffer, so the value it sees is still zero.
Will the hardware or the driver maintain cache coherence?
If not, can I flush the GPU cache?
Partial code:
Main thread:
// Create a command queue
cl_command_queue command_queue = clCreateCommandQueue(context, device_id,
CL_QUEUE_PROFILING_ENABLE ,
&ret);
cl_command_queue memop_queue = clCreateCommandQueue(context, device_id,
CL_QUEUE_PROFILING_ENABLE ,
&ret);
.....
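cl_ulong *buf_flag = malloc(sizeof(cl_ulong)); // one 64-bit flag, matching the kernel's unsigned long
*buf_flag = 0; // start cleared so the kernel spins until the host sets the flag
cl_mem flags = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, sizeof(cl_ulong), buf_flag, &ret);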
...
ret = clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&flags);
ret = clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL,
&global_item_size, &local_item_size, 0, NULL,
&ndrEvt);
...
while (1) {
    *buf_flag = 1;        // signal the kernel
    while (*buf_flag);    // wait until the kernel resets the flag to 0
    printf("kernel ack\n");
}
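A side note on sizing the flag buffer: sizeof applied to a pointer variable gives the size of the pointer, not of the object it points to, so the size passed to clCreateBuffer should be taken from the flag type. Below is a small plain-C sketch of the difference. The names flag_t, flag_buffer_size and size_via_pointer are hypothetical, and uint64_t is used as a stand-in for cl_ulong so that the snippet does not need the OpenCL headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for cl_ulong (assumption: a 64-bit unsigned integer). */
typedef uint64_t flag_t;

/* The flag must be 8 bytes to match the kernel's 64-bit unsigned long. */
static_assert(sizeof(flag_t) == 8, "flag must be 64 bits wide");

/* Size to pass when creating the buffer: the size of one flag. */
static size_t flag_buffer_size(void)
{
    return sizeof(flag_t);
}

/* What sizeof yields when applied to the pointer variable itself:
   the pointer size (4 or 8 bytes depending on the platform). */
static size_t size_via_pointer(char *p)
{
    return sizeof(p);
}
```

On a 64-bit host both values happen to be 8, which can hide the mistake; on a 32-bit host the buffer would be too small for the kernel's 64-bit access.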
===
Kernel part
__kernel void test_host(__global unsigned long *flags) {
    while (!flags[0]);   // spin until the host sets the flag
    /* do something */
    flags[0] = 0;        // acknowledge by clearing the flag
}
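To make the intended handshake explicit, here is a minimal single-threaded model of it in plain C. This is only a sketch of the protocol (host sets the flag, device sees it, does its work, and clears it); it is not OpenCL code and says nothing about whether a discrete GPU actually observes host writes while a kernel is running. The function names are hypothetical, and the volatile qualifier is my assumption, added so that the compiler reloads the flag on every access instead of keeping it in a register.

```c
#include <assert.h>
#include <stdint.h>

/* Model of one polling step of the kernel.
   Returns 1 if the flag was set and has been acknowledged, 0 otherwise. */
static int device_poll_once(volatile uint64_t *flags)
{
    if (flags[0] == 0)
        return 0;          /* nothing posted; the real kernel would keep spinning */
    /* "do something" would go here */
    flags[0] = 0;          /* acknowledge by clearing the flag */
    return 1;
}

/* Model of the host posting a request. */
static void host_post(volatile uint64_t *flags)
{
    assert(flags[0] == 0); /* the previous request must already be acknowledged */
    flags[0] = 1;
}

/* Model of the host checking for the acknowledgement. */
static int host_acked(const volatile uint64_t *flags)
{
    return flags[0] == 0;
}
```

In this model one round trip is host_post, then device_poll_once returning 1, then host_acked returning 1. In the real setup the two sides run concurrently, so the same sequence only works if each side's writes become visible to the other, which is exactly what the question is about.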