I’m confused about how to load read-only array values into the device’s constant memory space rather than its global memory space. The moderately sized array nodes_x in the kernel below meets all the criteria for constant memory, so some hardware should perform better if the array is placed there instead of in global memory.
__kernel void sequenceKernel_00( constant const float * nodes_x, global float * const restrict density_n)
{ /* etc. */ }
My code already works with the first nodes_x argument being declared global. I have been passing the flag CL_MEM_READ_ONLY to clCreateBuffer().
I’m surprised that clCreateBuffer() does not take an additional argument that specifies the desired memory space.
So it seems insufficient to me that the only change needed to move the nodes_x array into constant memory is switching the kernel argument declaration from global to constant.
As an experiment, I changed the flag used to create my nodes_x buffer to CL_MEM_WRITE_ONLY. Even so, clSetKernelArg() still reports CL_SUCCESS when I set the constant nodes_x argument to that buffer. I would have expected a write-only buffer to be rejected for a constant argument, so this makes me think I’m missing something.
Since cl_mem buffers are created independently of any particular kernel, and clEnqueueWriteBuffer() can be called before clSetKernelArg(), I don’t see how any late binding can be in play.
(running Mac OS X 10.8 and an NVIDIA Tesla GPU)