How to read elements from a float16 (or other vector type) using an index?

ksi · April 27, 2011, 12:05pm

I want to read an arbitrary element from a float16. The kernel code below, which uses array subscript syntax (“weights[i]”), works on Apple’s OpenCL implementation, but Nvidia’s Linux implementation rejects it with the error “subscripted value is not an array, pointer, or vector”.
I’m not sure whether this is valid OpenCL syntax or whether Apple just happens to support it.


__kernel void convolve_x16(__global const float *source,
                           __global float *dest,
                           const int2 image_dims,
                           const float16 weights) {
  int x = get_global_id(0);
  int y = get_global_id(1);
  if ((x < image_dims.x) && (y < image_dims.y)) {
    int index = y * image_dims.x + x;
    float d = .0f;
    float wsum = .0f;
    float w;
    int imax = min(16, image_dims.x - x);
    for (int i = 0; i < imax; i++) {
      w = weights[i];  // <---- this is the syntax in question
      d += w * source[index + i];
      wsum += w;  // accumulate the weights so the result can be normalized
    }
    dest[index] = d / wsum;
  }
}

My question: is there another way to index into a float16 that is correct OpenCL syntax? (I don’t want to use weights.s0 etc and unroll the loop.)

If not, is there another way to pass an arbitrary-sized chunk of constant memory as a kernel argument (such as an array of 16 floats) that can be indexed like this? I know I could create a read-only global memory buffer, fill it using clEnqueueWriteBuffer, and then pass that in as a float* kernel argument, but I’m wondering if I’m missing a simpler way (more like the float16).

Thanks!

david.garcia · April 27, 2011, 2:16pm

My question: is there another way to index into a float16 that is correct OpenCL syntax? (I don’t want to use weights.s0 etc and unroll the loop.)

I thought that accessing vector elements using array indexing notation, as in your example, had been legal since CL 1.1, but my memory must be failing because I can’t find that in the spec. Can a language lawyer clarify this?

There are ways to work around that, though. You can use either a cast or a union. I would use a cast for convenience:


float *sweights = (float*)&weights;
sweights[i];

ibbles · April 27, 2011, 11:49pm

When writing ordinary host code, at least with gcc, accessing data of one type through a pointer of another type gives warnings about type-punned pointers and strict aliasing rules. Is the same true for OpenCL kernel code? Or in other words, could


float *sweights = (float*)&weights;
// Do a bunch of stuff to float16 weights.
// Dereference float* sweights.

produce unexpected results if weights wasn’t const and could be written to?

david.garcia · April 28, 2011, 5:03am

Sorry, I didn’t notice that it was const. Also, you are correct about strict aliasing. My bad!

There’s yet another workaround: use a union like this:


union
{
    float   s[16];
    float16 v;
} uweights;

...

uweights.v = weights;

// Here you can read from uweights.s[i] safely thanks to section 6.2.4.1
// Notice that this is a nice property of OpenCL C that does not exist in C99
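
For reference, here is a sketch of how the union could slot into the kernel from the original post. It has not been compiled against any particular implementation. The kernel name convolve_x16_union is made up for illustration, and normalizing by the accumulated weight sum is my reading of what the original loop intended. It also assumes the weights do not sum to zero.

```c
// OpenCL C kernel (sketch only). convolve_x16_union is a hypothetical name.
__kernel void convolve_x16_union(__global const float *source,
                                 __global float *dest,
                                 const int2 image_dims,
                                 const float16 weights) {
  // Copy the vector into a union so its components can be read as an array.
  union {
    float   s[16];
    float16 v;
  } uweights;
  uweights.v = weights;

  int x = get_global_id(0);
  int y = get_global_id(1);
  if ((x < image_dims.x) && (y < image_dims.y)) {
    int index = y * image_dims.x + x;
    float d = 0.0f;
    float wsum = 0.0f;
    // Do not read past the end of the current row.
    int imax = min(16, image_dims.x - x);
    for (int i = 0; i < imax; i++) {
      float w = uweights.s[i];  // indexed read through the array member
      d += w * source[index + i];
      wsum += w;
    }
    // Assumption: wsum is nonzero for the weights being passed in.
    dest[index] = d / wsum;
  }
}
```

The copy into uweights happens once per work-item, before the loop, so the loop body only ever touches the array member.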

That said, I will ask around because it feels a bit silly that we don’t allow reading from vector elements using array subscripting syntax.
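
On the second part of the original question (an indexable chunk of constant memory): one option is to declare the argument as a __constant pointer, which can be subscripted like any other pointer. The sketch below is untested. The kernel name convolve_const and the num_weights parameter are invented for illustration. On the host side this would still need a buffer object, for example one created with clCreateBuffer using CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR and bound with clSetKernelArg, so it is not as lightweight as passing a float16 by value.

```c
// OpenCL C kernel (sketch only). convolve_const and num_weights are
// hypothetical names, not part of the original post.
__kernel void convolve_const(__global const float *source,
                             __global float *dest,
                             const int2 image_dims,
                             __constant float *weights,
                             const int num_weights) {
  int x = get_global_id(0);
  int y = get_global_id(1);
  if ((x < image_dims.x) && (y < image_dims.y)) {
    int index = y * image_dims.x + x;
    float d = 0.0f;
    float wsum = 0.0f;
    // Do not read past the end of the current row.
    int imax = min(num_weights, image_dims.x - x);
    for (int i = 0; i < imax; i++) {
      float w = weights[i];  // ordinary pointer subscripting
      d += w * source[index + i];
      wsum += w;
    }
    // Assumption: wsum is nonzero for the weights being passed in.
    dest[index] = d / wsum;
  }
}
```

A side benefit of this form is that the number of weights is no longer tied to a vector width, though the total size is limited by the device's constant buffer size (CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE).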