Reverse (.rev) Vector Components

I ran into a case where it would be really useful to reverse the components of a vector. To create fully compliant BLAS functions, one must handle the case where the step between elements (e.g., incx) is negative. Since hardware vendors may not support implicit vectorization, elements may need to be explicitly packed into vector types.

To handle negative and non-unit increments, I copy the relevant elements from low-to-high global memory addresses to low-to-high local memory addresses. With a negative increment, the first logical element sits at the highest address, so the copied elements end up in reverse order. I then do a vload from local memory into private memory, where I currently shuffle (if needed), compute, and shuffle again (if needed) before doing a vstore to local memory and then writing back to global memory.
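To show why the reversal is needed, here is a minimal host-side C sketch, assuming the reference-BLAS convention in which, for incx < 0, the first logical element x_0 is stored at the highest address. Plain float arrays stand in for global and local memory, and gather_low_to_high, to_logical_order, and blas_elem are hypothetical helper names used only for illustration.

```c
/* Hypothetical reference: logical element i of a strided BLAS vector.
 * Reference-BLAS convention: for incx < 0, x_0 is at x[(1 - n) * incx],
 * i.e. the highest address, and later elements move toward lower addresses. */
static float blas_elem(const float *x, int n, int incx, int i)
{
    int start = incx < 0 ? (1 - n) * incx : 0;
    return x[start + i * incx];
}

/* Hypothetical helper: copy n strided elements into buf, walking memory
 * from low to high addresses (the global-to-local copy described above). */
static void gather_low_to_high(const float *x, int n, int incx, float *buf)
{
    int step = incx < 0 ? -incx : incx;   /* |incx|: distance between elements */
    for (int i = 0; i < n; ++i)
        buf[i] = x[i * step];             /* lowest address first */
}

/* Reverse buf in place: the per-element work a .rev suffix would express. */
static void reverse(float *buf, int n)
{
    for (int i = 0, j = n - 1; i < j; ++i, --j) {
        float t = buf[i];
        buf[i] = buf[j];
        buf[j] = t;
    }
}

/* Put buf into logical BLAS order (x_0, x_1, ...): only negative
 * increments need the reversal, hence the "(if needed)" shuffles. */
static void to_logical_order(float *buf, int n, int incx)
{
    if (incx < 0)
        reverse(buf, n);
}
```

For example, with n = 4 and incx = -2 over an array holding 0 through 7, the low-to-high copy yields 0, 2, 4, 6, while the logical order is 6, 4, 2, 0; one reversal reconciles the two.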

Since the OpenCL specification already includes the .hi, .lo, .even, and .odd suffixes, I think .rev would be a natural addition, and their presence in the spec makes a reasonable argument for including it as well. Of course, I can continue to just use the built-in shuffle function, but then I need to create a separate reverse mask for each vector length.
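The reverse mask differs only by length: component i of the result comes from component n - 1 - i. In OpenCL C the float4 case would be shuffle(a, (uint4)(3, 2, 1, 0)), which matches a.wzyx. As a host-side illustration in plain C (so it runs without an OpenCL device), the sketch below builds that mask for any power-of-two length and emulates the one-input shuffle's indexing. make_reverse_mask and shuffle_emulated are hypothetical helpers, not OpenCL API, and the emulation assumes input and output have the same length n and that, as the spec describes, only the low log2(n) bits of each mask element are used.

```c
#include <stddef.h>

/* Hypothetical helper: reverse mask for an n-component vector,
 * mask[i] = n - 1 - i. For n = 4 this is {3, 2, 1, 0}. */
static void make_reverse_mask(unsigned *mask, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        mask[i] = (unsigned)(n - 1 - i);
}

/* Hypothetical emulation of a one-input shuffle for power-of-two n
 * (2, 4, 8, 16): out[i] = x[mask[i]], keeping only the low log2(n)
 * bits of each mask element, which for power-of-two n is mask[i] & (n - 1). */
static void shuffle_emulated(const float *x, const unsigned *mask,
                             float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = x[mask[i] & (n - 1)];
}
```

The mask itself is trivial to generate; the inconvenience is that each length needs its own literal (uint2, uint4, uint8, uint16) in kernel code.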

Instead of using ‘shuffle()’, you can also just do something like:


float4 a = (float4)(1.0f, 2.0f, 3.0f, 4.0f);  // OpenCL C vector literal
float4 b;

b = a.wzyx;  // reverses a: b is (4.0f, 3.0f, 2.0f, 1.0f)

If that doesn’t look explicit enough, just write a macro to do the same.

Reversing built-in vector types using shuffles and swizzles is explicit, but it is not readily portable across vector lengths, because the OpenCL C kernel language does not support overloading of user-defined functions. For example, .wzyx reverses a float4, but a float8 needs .s76543210. Of course, it’s not too difficult to reverse them manually, but .rev seems like a simple addition to the specification that would make working with swizzles easier, faster, and less error-prone.