What’s the best way to pass an arbitrary-length array of floats (say, between 2 and 200 values) to a kernel?
The array is the same for all work items, so I want to compute it on the host and then pass the data as
an argument to the kernel. I tried various things like this:
but this gives a CL_INVALID_ARG_SIZE from clSetKernelArg when I try to pass the weights arg to it.
Trying “__local float *weights” gives the same error.
I’m setting up the arg like this: clSetKernelArg(kernel, 2, sizeof(float) * 16, weights);
where weights is a float *.
I suppose I could allocate a global memory buffer, copy the weights to that, and then pass that to the kernel,
but I think it should be faster and easier using constant memory somehow (?)
The values will vary per kernel call, so I can’t hard-code them into the kernel code as a constant array.
That’s exactly what you need to do. You have to create a buffer (i.e. a cl_mem object) and pass that as an argument to your kernel. On GPUs, constant memory normally resides in global memory, as far as I know, but unlike “standard” global memory, constant memory can be cached on-chip.
Ok, thanks! Any more specific tips for how to correctly create and use constant memory?
It works if I use a regular global buffer as follows:
cl_mem weights = clCreateBuffer(ocl_context, CL_MEM_READ_ONLY, size, NULL, NULL);
clEnqueueWriteBuffer(ocl_command_queue, weights, CL_TRUE, 0, size, host_weights_ptr, 0, NULL, NULL);
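On the kernel side, a minimal sketch of using the same buffer as constant memory: the host code stays as it is, and only the address space qualifier of the parameter changes to __constant. The kernel name apply_weights, the other parameters, and the weighted-sum body are assumptions made up for illustration, not taken from the original code.

```c
/* OpenCL C kernel source (hypothetical kernel; only the `weights`
 * parameter corresponds to the buffer created on the host). */
__kernel void apply_weights(__global const float *input,
                            __global float *output,
                            __constant float *weights, /* arg index 2 */
                            const int num_weights)
{
    const size_t gid = get_global_id(0);
    float acc = 0.0f;

    /* Every work item reads the same weights, which is the access
     * pattern constant memory is meant for. Assumes `input` holds at
     * least global_size + num_weights - 1 elements. */
    for (int i = 0; i < num_weights; ++i) {
        acc += weights[i] * input[gid + i];
    }
    output[gid] = acc;
}
```

The weights argument is still set with clSetKernelArg(kernel, 2, sizeof(cl_mem), &weights), and num_weights is passed by value as an ordinary int argument. The size limit for a constant buffer can be queried with clGetDeviceInfo and CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE; 200 floats (800 bytes) should fit comfortably on typical devices.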