What's the best way to pass an arbitrary-length array of floats to a kernel (say, between 2 and 200 values)?
The array is the same for all work items, so I want to compute it on the host and then pass the data as
an argument to the kernel. I tried various things like this:

__kernel void xblur(__global const float *source,
                    __global float *dest,
                    __constant float *weights,
                    const int radius) {

but this gives CL_INVALID_ARG_SIZE from clSetKernelArg when I try to pass the weights arg to it.
Trying "__local float *weights" gives the same error.
I'm setting up the arg like this: clSetKernelArg(kernel, 2, sizeof(float)*16, weights);
where weights is a float*.

I suppose I could allocate a global memory buffer, copy the weights to that, and then pass that to the kernel,
but I think using constant memory should be faster and easier somehow (?)
The values will vary per kernel call, so I can't hard-code them into the kernel code as a constant array.
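
For reference, the global-buffer fallback I mentioned would look roughly like the sketch below. It is untested and only an illustration: set_weights_arg is a hypothetical helper name, ctx and kernel are assumed to have been created already, and arg index 2 is taken from the position of weights in the signature above.

```c
#include <stddef.h>
#include <CL/cl.h>   /* on macOS the header is <OpenCL/opencl.h> */

/* Hypothetical helper: copies `count` host floats into a read-only device
 * buffer and binds that buffer to kernel argument 2 (the weights slot).
 * Returns the buffer on success; the caller releases it with
 * clReleaseMemObject once the kernel has finished. Returns NULL on failure,
 * with the OpenCL error code stored in *err. */
static cl_mem set_weights_arg(cl_context ctx, cl_kernel kernel,
                              const float *weights, size_t count,
                              cl_int *err)
{
    /* CL_MEM_COPY_HOST_PTR copies the host data when the buffer is created,
     * so the host array may be reused or freed afterwards. */
    cl_mem buf = clCreateBuffer(ctx,
                                CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                count * sizeof(float),
                                (void *)weights, err);
    if (*err != CL_SUCCESS)
        return NULL;

    /* A buffer argument is passed as the cl_mem handle, so the size given
     * here is sizeof(cl_mem) rather than the size of the data. */
    *err = clSetKernelArg(kernel, 2, sizeof(cl_mem), &buf);
    if (*err != CL_SUCCESS) {
        clReleaseMemObject(buf);
        return NULL;
    }
    return buf;
}
```

Since the values change on every call, this would mean creating (or rewriting) the buffer before each enqueue, which is the overhead I was hoping to avoid.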