How to pass an arbitrary-length constant array to a kernel?

What’s the best way to pass an arbitrary length array of floats to a kernel? (say between 2 and 200 values)
The array is the same for all work items, so I want to compute them on the host and then pass the data as
an argument to the kernel. I tried various things like this:

__kernel void xblur(__global const float *source,
                    __global float *dest,
                    __constant float *weights,
                    const int radius) {

}

but this gives a CL_INVALID_ARG_SIZE from clSetKernelArg when I try to pass the weights arg to it.
Trying “__local float *weights” gives the same error.
I’m setting up the arg like this: clSetKernelArg(kernel, 2, sizeof(float)*16, weights);
where weights is a float*.

I suppose I could allocate a global memory buffer, copy the weights to that, and then pass that to the kernel,
but I think it should be faster and easier using constant memory somehow (?)
The values will vary per kernel call, so I can’t hard-code them into the kernel code as a constant array.

Thanks!

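That’s exactly what you need to do. You have to create a buffer (i.e. a cl_mem object) and pass that as an argument to your kernel. For a __global or __constant pointer argument, clSetKernelArg expects sizeof(cl_mem) as the size and a pointer to the cl_mem handle as the value, not the byte size of the data and a host pointer, which is why your call fails with CL_INVALID_ARG_SIZE. On GPUs, constant memory normally resides in global memory, as far as I know, but unlike “standard” global memory, constant memory can be cached on-chip.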

Ok, thanks! Any more specific tips for how to correctly create and use constant memory?

It works if I use a regular global buffer as follows:
cl_mem weights = clCreateBuffer(ocl_context, CL_MEM_READ_ONLY, size, NULL, NULL);
clEnqueueWriteBuffer(ocl_command_queue, weights, CL_TRUE, 0, size, host_weights_ptr, 0, NULL, NULL);

__kernel void my_kernel(__global float *dest,
                        __global float *source,
                        __global float *weights,
                        int n_weights) {
}

However… when I change the kernel argument above from
__global float *weights to
__constant float *weights
I get different and incorrect results.

Is there a more correct way to specify the constant memory buffer?
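
One way to pin down which variant is wrong is to compare both device outputs against a plain host implementation. Below is a minimal sketch in C, assuming the kernel computes a 1D weighted sum per pixel with 2*radius+1 weights and clamped edges. That behaviour is an assumption, since the kernel body is not shown, and xblur_reference, clamp_index and max_abs_diff are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Clamp an index into [0, n - 1] so taps past the edge reuse the border value. */
int clamp_index(int i, int n)
{
    if (i < 0) return 0;
    if (i >= n) return n - 1;
    return i;
}

/* Host-side reference for one row (assumed blur definition):
 * dest[x] = sum over k in [-radius, radius] of
 *           weights[k + radius] * source[clamp(x + k)].
 * The weights array is assumed to hold 2 * radius + 1 values. */
void xblur_reference(const float *source, float *dest, int width,
                     const float *weights, int radius)
{
    assert(source != NULL && dest != NULL && weights != NULL);
    assert(width > 0 && radius >= 0);
    for (int x = 0; x < width; ++x) {
        float sum = 0.0f;
        for (int k = -radius; k <= radius; ++k)
            sum += weights[k + radius] * source[clamp_index(x + k, width)];
        dest[x] = sum;
    }
}

/* Largest absolute difference between two arrays of n floats,
 * e.g. the data read back from the device and the host reference. */
float max_abs_diff(const float *a, const float *b, int n)
{
    float worst = 0.0f;
    for (int i = 0; i < n; ++i) {
        float d = a[i] - b[i];
        if (d < 0.0f) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

Run it on the same source row and weights that get uploaded, then read the dest buffer back after each kernel variant. If only the __constant variant shows a large max_abs_diff, the problem is in how the weights reach the kernel rather than in the blur arithmetic.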

It looks like you’re doing it the right way…

How big is your buffer? Constant memory is limited in size. The OpenCL full profile guarantees at least 64 KB, but the actual limit is device-specific; you can check it by querying CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE.

Did you check for errors?
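
For the buffer-size question, the arithmetic is simple enough to check on the host before creating the buffer. A small sketch, assuming the limit has already been read with clGetDeviceInfo and CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE (that query is not shown here); weights_fit_in_constant is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if n_weights floats fit in max_constant_bytes, else 0.
 * max_constant_bytes is assumed to be the value the host obtained by
 * querying CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE. */
int weights_fit_in_constant(size_t n_weights, uint64_t max_constant_bytes)
{
    /* An OpenCL float is 32 bits; this sketch assumes the host float matches. */
    assert(sizeof(float) == 4);
    /* Divide instead of multiplying so a huge n_weights cannot overflow. */
    return n_weights <= max_constant_bytes / sizeof(float);
}
```

With the 2 to 200 weights mentioned in the question, the buffer is at most 800 bytes, so on a device that reports 64 KB the size limit is unlikely to be the cause of the wrong results.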