I want to apply a transform to a bunch of points. I see that OpenCL has floatnxm types, but I can't find any function that takes this data type as an argument. Furthermore, I'm using the ATI Stream SDK, and declaring float4x4 myMatrix; gives the error "identifier undefined". I don't know whether I'm using it wrong or whether ATI doesn't support it, even though I don't see this type listed as optional.

So is there a built-in way to do an affine transform? If I have to write my own, what's a good way to load the matrix into local memory for all threads? For example, is there a way to load the matrix once per work group, rather than having each thread parse the float* argument into a data structure before doing the transform?
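
To make it concrete, this is roughly the per-point math I'd be writing by hand, shown as a plain C sketch rather than an actual kernel. The point4 struct, the function names, and the row-major 16-float matrix layout are my own assumptions for illustration, not anything taken from the OpenCL spec or the ATI Stream SDK.

```c
#include <stddef.h>

/* Hypothetical point type: homogeneous coordinates (x, y, z, w). */
typedef struct { float x, y, z, w; } point4;

/* Assumed layout: m holds a 4x4 matrix as 16 floats in row-major order,
 * so row r, column c lives at m[4 * r + c]. */
point4 transform_point(const float m[16], point4 p)
{
    point4 r;
    r.x = m[0]  * p.x + m[1]  * p.y + m[2]  * p.z + m[3]  * p.w;
    r.y = m[4]  * p.x + m[5]  * p.y + m[6]  * p.z + m[7]  * p.w;
    r.z = m[8]  * p.x + m[9]  * p.y + m[10] * p.z + m[11] * p.w;
    r.w = m[12] * p.x + m[13] * p.y + m[14] * p.z + m[15] * p.w;
    return r;
}

/* Applies the same matrix to every point. In a kernel, each loop
 * iteration would correspond to one thread handling one point. */
void transform_points(const float m[16], const point4 *in,
                      point4 *out, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        out[i] = transform_point(m, in[i]);
}
```

Every thread reads the same 16 floats, which is why I'd like to load them once per work group instead of once per thread.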