Hello,
I am having trouble understanding what a simple 1D kernel
is doing with local memory.
The kernel does nothing useful at the moment because I simplified
it for this post, but it still exhibits behaviour that seems
strange to me.
Kernel code:
#define STENCILSIZE (2)
#define x11_size_x (20)
__kernel void fluxes_x11_fast3(
    __global double* rhoe_in,
    __global double* rhoe_out )
{
    int ind, num_point;
    // 1D version
    int main_ref = get_global_id(0);
    //Identification of workgroup
    int i = get_group_id(0);
    //Identification of work item inside workgroup
    int idX = get_local_id(0);
    int sizeX = get_local_size(0);
    __local double lrhoe[x11_size_x+2*STENCILSIZE];
    //Copy submatrixes to local memory. One element copied per work item
    lrhoe[idX+STENCILSIZE] = rhoe_in[main_ref];
    // "Edge" effects on the left
    if (idX == 0 && i >= 1) {
        for (ind = 0; ind <= STENCILSIZE-1; ind++) {
            lrhoe[idX+ind] = rhoe_in[main_ref-STENCILSIZE+ind];
        }
    }
    // "Edge" effects on the right
    if (idX == x11_size_x-1 && i <= (int)get_num_groups(0)-2) {
        for (ind = 1; ind <= STENCILSIZE; ind++) {
            lrhoe[idX+STENCILSIZE+ind] = rhoe_in[main_ref+ind];
        }
    }
    // synchronise all the local memory
    barrier(CLK_LOCAL_MEM_FENCE);
    // main computation bit
    if (main_ref >= STENCILSIZE && main_ref <= DATA_SIZE_X-1-STENCILSIZE) {
        rhoe_out[main_ref] = lrhoe[idX+STENCILSIZE];
    }
}
The kernel has one input array, and one output array.
All I’m trying to do for the moment is pull the input array into local
memory, while taking “edge” effects into account (i.e. the local array
of each work-group also holds the STENCILSIZE neighbouring points on
each side of the block that the group copies).
Each work item then writes one value from the local array to the output array.
What I don’t understand are the values I read back from the local array (see the
assignment to rhoe_out at the end of the kernel):
- if I take lrhoe[idX+STENCILSIZE], I get the desired result, i.e. the input value.
- if I take lrhoe[idX+STENCILSIZE-1], I expect the input array shifted by one to the
right, but instead I get the input array shifted by 6.
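The expectation above can be made concrete with a minimal serial sketch in plain C that mimics the kernel one work-group at a time. This is not the host code. The names TILE, NUM_GROUPS, DATA_SIZE, emulate and shift_matches are made up for this illustration. The sketch assumes that the local work size equals x11_size_x (20) and that DATA_SIZE_X, which is not defined in the posted snippet, is the total number of points.

```c
#include <assert.h>

#define STENCILSIZE 2
#define TILE 20        /* stands in for x11_size_x; assumed equal to the local work size */
#define NUM_GROUPS 3   /* hypothetical group count, chosen only for this sketch */
#define DATA_SIZE (TILE * NUM_GROUPS)  /* assumed meaning of DATA_SIZE_X */

/* Serial emulation of the kernel: for every interior point, store
   tile[idX + STENCILSIZE + offset] into out[]. */
static void emulate(const double *in, double *out, int offset)
{
    assert(offset >= -STENCILSIZE && offset <= STENCILSIZE);
    for (int g = 0; g < NUM_GROUPS; g++) {
        double tile[TILE + 2 * STENCILSIZE] = {0.0};

        /* Phase 1: each work item copies its own element;
           the first and last items also copy the halo points. */
        for (int l = 0; l < TILE; l++) {
            int gid = g * TILE + l;
            tile[l + STENCILSIZE] = in[gid];
            if (l == 0 && g >= 1)
                for (int k = 0; k <= STENCILSIZE - 1; k++)
                    tile[l + k] = in[gid - STENCILSIZE + k];
            if (l == TILE - 1 && g <= NUM_GROUPS - 2)
                for (int k = 1; k <= STENCILSIZE; k++)
                    tile[l + STENCILSIZE + k] = in[gid + k];
        }

        /* Phase 2 (after the barrier in the real kernel): read back. */
        for (int l = 0; l < TILE; l++) {
            int gid = g * TILE + l;
            if (gid >= STENCILSIZE && gid <= DATA_SIZE - 1 - STENCILSIZE)
                out[gid] = tile[l + STENCILSIZE + offset];
        }
    }
}

/* Returns 1 if out[gid] == in[gid + offset] at every interior point, else 0. */
static int shift_matches(int offset)
{
    double in[DATA_SIZE], out[DATA_SIZE];
    for (int n = 0; n < DATA_SIZE; n++) {
        in[n] = (double)n;   /* distinct values make any shift visible */
        out[n] = -1.0;
    }
    emulate(in, out, offset);
    for (int gid = STENCILSIZE; gid <= DATA_SIZE - 1 - STENCILSIZE; gid++)
        if (out[gid] != in[gid + offset])
            return 0;
    return 1;
}
```

Under these assumptions shift_matches(-1) returns 1, i.e. reading lrhoe[idX+STENCILSIZE-1] should give rhoe_in[main_ref-1] at every interior point. The sketch relies on the local work size being exactly x11_size_x, which is the same assumption the kernel makes in its right-edge test.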
Can anyone explain this behaviour, or suggest what I might be doing wrong?
I can also post host code if this is deemed useful.
Many thanks!
Olivier