Best way to pass a set of 2D arrays

For the code I’m trying to write, I plan on having each workgroup manipulate a set of 2D arrays. What is generally considered the easiest way to code this?

Right now, what I’m trying is flattening a 3D matrix and then addressing it with a scheme like: matrix[(numGroups*localSize*wgroup)+localSize*x+y]

I’d like each work-group to copy its matrix into a local matrix and work on it from there.

Am I headed in the right direction? Are there any good examples of how to do this?

Here’s some example host code:


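    /* One work-group per N x N matrix: N*N work-items per group, computeUnits groups. */
    size_t global_worksize=N*N*computeUnits;
    size_t local_worksize=N*N;
    error=clEnqueueNDRangeKernel(cq, k_matTest, 1, NULL, &global_worksize, &local_worksize, 0, NULL, NULL);
    error=clFinish(cq);
    /* CL_TRUE makes the read blocking, so matrix is filled when the call returns. */
    error=clEnqueueReadBuffer(cq, mem, CL_TRUE, 0, global_worksize*sizeof(int), matrix, 0, NULL, NULL);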

N is the side length of the matrix (it’s square), and computeUnits is the number of work-groups I want (for now, the value of CL_DEVICE_MAX_COMPUTE_UNITS).

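And my kernel code is:

    __kernel void matrixTest( __global int *matrix) {
        const size_t numGroups = get_num_groups(0);  // number of work-groups in the NDRange
        const size_t localSize = get_local_size(0);  // number of work-items per work-group

        size_t wgroup = get_group_id(0);  // index of this work-group
        size_t x = get_local_id(0);       // index of this work-item within its group
        size_t y = 0;

        matrix[(numGroups*localSize*wgroup)+localSize*x+y]++;
    }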

I initialize the matrix to all zeros, so I expect each matrix to end up with a “1” in the first column of every row, but that’s not what I get. In my kernel, I’m expecting localSize to equal N. Is that correct?

I’m really new to this, so I apologize if it’s totally wrong.

Thanks!

I’ve modified my test kernel to verify some of the assumptions I’ve made.


    __kernel void matrixTest( __global int *matrix) {
        const size_t numGroups = get_num_groups(0);
        const size_t localSize = (int)sqrt((float)get_local_size(0));  // side length, assuming N*N work-items per group

        size_t wgroup = get_group_id(0);
        size_t x = get_local_id(0);
        size_t y = 0;

        matrix[(numGroups*localSize*wgroup)+localSize*x+y]=x;                                  // intended: column 0 of row x
        matrix[(numGroups*localSize*wgroup)+localSize*0+(localSize-1)]=wgroup;                // intended: top-right corner
        matrix[(numGroups*localSize*wgroup)+localSize*(localSize-1)+(localSize-1)]=numGroups; // intended: bottom-right corner
    }

Right now, I’m using N=5, so 5x5 matrices. The kernel above should output 0-4 in the first column, the work-group number in the top-right corner, and the total number of work-groups in the bottom-right corner.

I get that for the first few matrices, but then the output suddenly turns into garbage:


    Workgroup 0:
    ----
     0 0 0 0 0
     1 0 0 0 0
     2 0 0 0 0
     3 0 0 0 0
     4 0 0 0 14
    Workgroup 1:
    ----
     0 0 0 0 1
     1 0 0 0 0
     2 0 0 0 0
     3 0 0 0 0
     4 0 0 0 14
    Workgroup 2:
    ----
     14 0 0 0 2
     1 0 0 0 0
     2 0 0 0 0
     3 0 0 0 0
     4 0 0 0 14
    Workgroup 3:
    ----
     14 0 0 0 3
     15 0 0 0 0
     16 0 0 0 0
     17 0 0 0 0
     18 0 0 0 14
    Workgroup 4:
    ----
     0 0 0 0 4
     1 0 0 0 0
     2 0 0 0 0
     3 0 0 0 0
     4 0 0 0 14
    Workgroup 5:
    ----
     337 0 -1610595835 1142949760 -1610593271
     69207936 -1879047667 1073743744 268438533 654362496
     553649161 117442432 -1879046643 1920 -1610611187
     -1945745536 1074070544 1074136596 1610876433 67456
     1611072533 83840 806357009 -1005582464 -1610612711
    ....

I have 14 compute units, so the output above is truncated. Starting from workgroup 2, there seems to be some overlap that I don’t understand.

Workgroup 3 is clearly wrong as well. I have a rough idea of where my mistake is: local_worksize = N*N = 25, so x runs from 0 to 24, which would make the values 14-18 somewhat sensible.

Thanks!