Hello there, I’m having some trouble with the Array Reduction example in the specification.
In the final stages of its reduction, the kernel sums up the partial sums stored in local memory. At first, each work-item checks whether it is in the lower half of the pairs being summed:
#if (GROUP_SIZE >= 512)
if (lid < 256)
    shared[lid] += shared[lid + 256];
barrier(CLK_LOCAL_MEM_FENCE);
#endif
That's fine by me. What twists my nugget is that the kernel stops checking whether the work-item is in the lower half once the local ID is smaller than 32:
if (lid < 32)
{
#if (GROUP_SIZE >= 64)
    shared[lid] += shared[lid + 32];
    barrier(CLK_LOCAL_MEM_FENCE);
#endif
#if (GROUP_SIZE >= 32)
    shared[lid] += shared[lid + 16];
    barrier(CLK_LOCAL_MEM_FENCE);
#endif
    .....
}
I found the Apple example code here, and their kernel still performs the lower-half check.
Can someone please tell me whether I've missed something?