What's the best way to pass an arbitrary-length array of floats (say, between 2 and 200 values) to an OpenCL kernel?

The array is the same for all work items, so I want to compute its values on the host and then pass the data to the kernel as an argument. I've tried various things like this:

__kernel void xblur(__global const float *source,
                    __global float *dest,
                    __constant float *weights,
                    const int radius) {
    ...
}

but clSetKernelArg returns CL_INVALID_ARG_SIZE when I try to set the weights argument.

Declaring it as "__local float *weights" instead gives the same error.

I'm setting the argument like this: clSetKernelArg(kernel, 2, sizeof(float)*16, weights); where weights is a float*.

I suppose I could allocate a global memory buffer, copy the weights into it, and pass that to the kernel, but I suspect that using constant memory somehow would be faster and easier (?).

The values will vary per kernel call, so I can't hard-code them into the kernel code as a constant array.
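For context, the host-side part (computing a fresh weight array for each call) might look something like the minimal C sketch below. It assumes a 1-D blur with 2*radius + 1 taps and uses triangle ("tent") weights purely for illustration; the post doesn't say how the weights are actually computed, and the helper names weight_count and compute_tent_weights are hypothetical. The resulting float array is the per-call data that has to reach the kernel's weights argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: a 1-D blur of the given radius has 2*radius + 1 taps. */
static size_t weight_count(int radius)
{
    return (size_t)(2 * radius + 1);
}

/*
 * Hypothetical helper: fill `out` (length 2*radius + 1) with normalized
 * triangle ("tent") weights centered at index `radius`. The tent shape is an
 * assumption for illustration only; any per-call weights would be produced
 * and passed the same way.
 */
static void compute_tent_weights(float *out, int radius)
{
    assert(out != NULL && radius >= 0);

    /* Raw weights rise linearly to radius + 1 at the center, so they sum to (radius + 1)^2. */
    const float sum = (float)((radius + 1) * (radius + 1));
    for (int i = -radius; i <= radius; ++i) {
        int dist = i < 0 ? -i : i;
        out[i + radius] = (float)(radius + 1 - dist) / sum;
    }
}
```

With radius = 1 this produces the three weights 0.25, 0.5, 0.25; with radius = 99 it produces 199 weights, near the top of the 2 to 200 range mentioned above.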

Thanks!