First of all I want to present myself, I am from Chile and this is my first post on the forum. I am Ph.D. in Physics and some time ago I started this adventure on OpenCL.
The question is the following. I am studying an example of a Histogram from AMD webpage ( http://developer.amd.com/tools/heteroge … les-demos/ ) and I have an “inner conflict” with the dimensions of the data array on the kernel, the code is the following:
#define LINEAR_MEM_ACCESS
#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable
#define BIN_SIZE 256
/**
-
@brief Calculates block-histogram bin whose bin size is 256
-
@param data input data pointer
-
@param sharedArray shared array for thread-histogram bins
-
@param binResult block-histogram array
/
__kernel
void histogram256(__global const uint data,
__local uchar* sharedArray,
__global uint* binResult)
{
size_t localId = get_local_id(0);
size_t globalId = get_global_id(0);
size_t groupId = get_group_id(0);
size_t groupSize = get_local_size(0);/* initialize shared array to zero */
for(int i = 0; i < BIN_SIZE; ++i)
sharedArray[localId * BIN_SIZE + i] = 0;barrier(CLK_LOCAL_MEM_FENCE);
/* calculate thread-histograms */
for(int i = 0; i < BIN_SIZE; ++i)
{
#ifdef LINEAR_MEM_ACCESS
uint value = data[groupId * groupSize * BIN_SIZE + i * groupSize + localId];
#else
uint value = data[globalId * BIN_SIZE + i];
#endif // LINEAR_MEM_ACCESS
sharedArray[localId * BIN_SIZE + value]++;
}barrier(CLK_LOCAL_MEM_FENCE);
/* merge all thread-histograms into block-histogram */
for(int i = 0; i < BIN_SIZE / groupSize; ++i)
{
uint binCount = 0;
for(int j = 0; j < groupSize; ++j)
binCount += sharedArray[j * BIN_SIZE + i * groupSize + localId];binResult[groupId * BIN_SIZE + i * groupSize + localId] = binCount;
}
}
In the “/calculate thread-histograms/” part each element of the data array is called in the following way:
data[groupId * groupSize * BIN_SIZE + i * groupSize + localId]
Now comes my conflict. The dimensions of the data array is 10241024=1048576, the number of workgroups is 4096 and the local size is 128 (as defined by the host program). If one calculates the index for the “last thread” of groupId=4095 and localId=127 one would have an index of 4095128256+i128+127=134185087+i*128, which clearly exceeds the dimension of the data array. So how does the program handle this issue?
Well, a last question, what does LINEAR_MEM_ACCESS mean?, I have been trying to find the answer without luck… thanks for your help!