Does the OpenCL spec address the issue of row-major versus column-major ordering for work-items in 2D and 3D work ranges, or is this up to the implementation?
Since OpenCL is a C-based language, I've assumed that work-items are assigned to compute units such that the highest dimension index varies fastest, i.e. for a 2x2 ND-range, a compute unit processes the work-items in the order [0][0], [0][1], [1][0], [1][1].
This is important because if I want to access elements of a 2D or 3D array in global memory based on a work-item's global ID, I want the array elements to follow the same ordering scheme.
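To make the layout in question concrete, here is a minimal host-side sketch in plain C of the flat index that C's row-major ordering gives for 2D and 3D arrays. The helper names index2d and index3d are made up for illustration and are not part of OpenCL.

```c
#include <stddef.h>

/* Flat index of element [row][col] in a row-major array with `cols` columns.
   The last index (col) varies fastest in memory. */
static size_t index2d(size_t row, size_t col, size_t cols)
{
    return row * cols + col;
}

/* Flat index of element [slice][row][col] in a row-major array with
   `rows` rows and `cols` columns per slice. Again, col varies fastest. */
static size_t index3d(size_t slice, size_t row, size_t col,
                      size_t rows, size_t cols)
{
    return (slice * rows + row) * cols + col;
}
```

For a 2x2 array, index2d visits [0][0], [0][1], [1][0], [1][1] at flat offsets 0, 1, 2, 3, which is the ordering I assumed the work-items would follow.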
A little experimentation and some code in the NVIDIA OpenCL Best Practices Guide suggest that the opposite ordering is the natural one:
int row = get_global_id(1);
int col = get_global_id(0);
c[row*N+col] = sum;
But could another vendor make a different choice?
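The difference between the two mappings can be emulated on the host. The sketch below is plain C, not OpenCL. It assumes, for illustration only, that work-items with consecutive get_global_id(0) values run side by side, which is what my experiment suggests for NVIDIA but which I have not confirmed for other vendors. The function names are hypothetical.

```c
#include <stddef.h>

/* Mapping from the snippet: dimension 0 is the column, dimension 1 the row. */
static size_t addr_dim0_as_col(size_t gid0, size_t gid1, size_t n)
{
    return gid1 * n + gid0;
}

/* Mapping I originally assumed: dimension 0 is the row, dimension 1 the column. */
static size_t addr_dim0_as_row(size_t gid0, size_t gid1, size_t n)
{
    return gid0 * n + gid1;
}

/* Distance, in elements, between the addresses touched by two work-items
   whose dimension-0 IDs differ by one (dimension-1 ID held at 0). */
static size_t stride_dim0(size_t (*addr)(size_t, size_t, size_t), size_t n)
{
    return addr(1, 0, n) - addr(0, 0, n);
}
```

With N = 4, stride_dim0(addr_dim0_as_col, 4) is 1 and stride_dim0(addr_dim0_as_row, 4) is 4. Under the stated assumption, only the first mapping gives neighbouring work-items neighbouring addresses in global memory.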