No need to “post complete code”; the one line with the clCreateBuffer call was all I wanted to see. However, you have explained much more about your application and your use of OpenCL - thanks. I now understand that you are trying to use both the CPU and the GPU, operating on one or more buffers that are shared between them.
Background: The only valid choices for the clCreateBuffer “HOST_PTR” flags are the following combinations (ignoring the “READ/WRITE” flags, which are orthogonal):
[ul]
[li]none[/li]
[li]CL_MEM_COPY_HOST_PTR[/li]
[li]CL_MEM_ALLOC_HOST_PTR[/li]
[li]CL_MEM_ALLOC_HOST_PTR | CL_MEM_COPY_HOST_PTR[/li]
[li]CL_MEM_USE_HOST_PTR[/li]
[/ul]
The use of each combination depends upon a number of factors or strategies. For example, here is one (there are more; if you are interested, please ask and I’ll write more).
[ul]
[li]Your host application has already allocated and computed some data outside of OpenCL, for example, 1M floats, that is,
```
float array[1024*1024];
```
or
```
float *array = (float *)malloc(1024*1024 * sizeof(float));
```
I think this might be what you are doing.[/li]
[li]Now you wish to access it from your OpenCL kernel, so you should issue a clCreateBuffer with CL_MEM_USE_HOST_PTR. This call takes a pointer to your existing application data, array. For example,
```
cl_mem buffer = clCreateBuffer(context, CL_MEM_USE_HOST_PTR, 1024*1024*sizeof(float), array, &error);
```
This is preferred when the application has already allocated the data; any other flag choice causes an allocation. Once the buffer is created, you should assume that the host application NO longer has access to the data, that is, only the OpenCL devices can access the data until you release the buffer.[/li]
[li]For the CPU device, the runtime uses the data directly in the array and invokes the kernel with a pointer to this data. There should be no need to move the data or make copies of it. This is what I think you’re trying to accomplish. For example,
```
error = clSetKernelArg(kernel, 0, sizeof(buffer), &buffer);
```
and
```
error = clEnqueueTask(cpucommandqueue, kernel, 0, NULL, NULL);
```
[/li]
[li]For the GPU, however, the runtime MUST transfer the array from host memory to device memory and invoke the kernel using the data that is now in device memory. Naturally this transfer takes time, depending upon how much data there is. For example,
```
error = clEnqueueTask(gpucommandqueue, kernel, 0, NULL, NULL);
```
[/li]
[li]If another kernel is enqueued on the GPU device for this data, then the runtime knows that the data is already on the device, so no data transfer should happen. For example,
```
error = clEnqueueTask(gpucommandqueue, kernel2, 0, NULL, NULL);
```
[/li]
[li]After the GPU device kernel completes execution, the runtime can transfer the data back to host memory when requested by either the host application or the CPU device.[/li]
[li]If the CPU device kernel needs this data, then the runtime must transfer it from device memory to host memory (incurring the transfer time) and invoke the kernel with a pointer to the data. For example,
```
error = clEnqueueTask(cpucommandqueue, kernel2, 0, NULL, NULL);
```
[/li]
[li]If the application needs this data, then the runtime may or may not transfer it, depending on whether it is already in host memory. In general, for buffers created with CL_MEM_USE_HOST_PTR it is best to use clEnqueueMapBuffer, because if the data is already in host memory there is no need to transfer it back. For example,
```
void *mapaddr = clEnqueueMapBuffer(cpucommandqueue, buffer, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE, 0, 1024*1024*sizeof(float), 0, NULL, NULL, &error);
```
access the data at mapaddr, then
```
error = clEnqueueUnmapMemObject(cpucommandqueue, buffer, mapaddr, 0, NULL, NULL);
```
[/li]
[li]If you are done using OpenCL, then release the buffer to regain access to the data. For example,
```
error = clReleaseMemObject(buffer);
```
[/li]
[/ul]
Note: I have not compiled any of this code, so there might be some typos in it.