I want to read an arbitrary element of a float16. The kernel code below uses array subscript syntax, "weights[i]". It works on Apple's OpenCL implementation, but fails to compile on Nvidia's Linux implementation with the error "subscripted value is not an array, pointer, or vector".
I'm not sure whether this is valid OpenCL syntax or whether Apple just happens to support it.
Code:

```c
__kernel void convolve_x16(__global const float *source,
                           __global float *dest,
                           const int2 image_dims,
                           const float16 weights) {
  int x = get_global_id(0);
  int y = get_global_id(1);
  if ((x < image_dims.x) && (y < image_dims.y)) {
    int index = y * image_dims.x + x;
    float d = 0.0f;
    float wsum = 0.0f;
    float w;
    int imax = min(16, image_dims.x - x);
    for (int i = 0; i < imax; i++) {
      w = weights[i];  // <---- this is the syntax in question
      d += w * source[index + i];
      wsum += w;       // accumulate the weights actually used
    }
    dest[index] = d / wsum;  // normalize by the summed weights
  }
}
```

My question: is there a valid OpenCL way to index into a float16 with a runtime index? (I don't want to use weights.s0, weights.s1, etc. and unroll the loop.)

If not, is there another way to pass an arbitrarily sized chunk of constant memory as a kernel argument (such as an array of 16 floats) that can be indexed like this? I know I could create a read-only global memory buffer, fill it using clEnqueueWriteBuffer, and then pass it in as a float* kernel argument, but I'm wondering whether I'm missing a simpler way (more like the float16).