Right know I try to handle a problem, and my google-Research did not work out. I do some calculations of Voronoi-Cells (nearest neighbor-problem). Therefore, I write into my Result array, which is quite big (Number of Points > 250.000).
I only want to store True/False to keep the information of closest neighbors. (i.e. Pnt 12 is Neighbour of Pnt 13 => 12/13 = 1)
I compressed this information to a uint8 linear array (Values 0-255) and I do bit manipulations to store 8 Neighbor Informations.
Furthermore the array grows with the Gaussian sum:
int index = (cur_compare_pnt-1)*cur_compare_pnt/2 + cur_pnt;
int main_index = index / 8;
int sub_index = index % 8;
nachbar_buf[main_index] |= 1<<sub_index;
I am not sure, if this was understandable… :shock: I hope so. My problem is, that this process is not vectorizable, since I write with different Point Tupels into the same uint8 Entry.
My question is: Is there a way to use a bool-linear- array in OpenCL? Or is there a better way at all?
Is it even worth compressing? 250k is nothing on modern hardware. Of course, if this grouping makes the next stage of processing that much more efficient that is another matter.
But you have some options:
do a full bytes worth in each thread. less parallelism but with a big enough problem it should still be ok
use atomic_or to merge results from each group of N threads
use a reduction algorithm to perform the bit merging without atomics
For atomics you probably want to use local memory as a staging area, as with the reduction algorithm. The reduction algorithm might be better as atomics can be slow - although most hardware has specialised units which make certain operations fast.
thanks for your answer. The Problem is not the 250k Points, the problem is that i have to store the nearest neighbor information. The amout of data is given by
size = 1/2 * N * ( N -1). This is for 250k ca. 31G Connections (so. ca. 31 GB)
Right now I try to work with semaphors and atomic in generell but somehow I am to unskilled to be successful:-) I simply dont understand the idea of barriers and local groups.
Does somebody has a nice tutorial for this kind of problems?
int main_index = index / 8;
if(id0 == 1)
{
big_ones[main_index] |= 1<<sub_index;
}
}
}
This code work perfectly parallel. But this idea is not appliable for my Nearest Neighbour problem, since I cannot check a specific connection (i.e. Point 12 to Point 14 ? Neighbors). I start the code with a specific point and get a list of Neighbors. So, I do not know which byte will be modified in advance.
:shock: I hope that this is somehow understandable to anouther person :shock: