OK, so I’m trying to get my head around global and local NDRanges.
Briefly: I’m trying to calculate whether or not two polygons intersect one another, and I’m doing this by testing each line segment of one polygon against each line segment of the other.
Let’s say polygon1 has 5 points, and polygon2 has 10 points.
My device’s max work item sizes are 1024, 1024, 64, and the preferred work group size multiple is 32.
So here’s the pseudo code:
int IntersectionTest(float *p1, float *p2, float *q1, float *q2)
{
    int rv = 0;
    // do the intersection test
    return rv;
}

kernel void CL_IntersectionTest(global float2 *polygon1, global float2 *polygon2, int numPoints1, int numPoints2, global int *doesIntersect)
{
    int i = get_global_id(0);   // edge index in polygon1
    int j = get_global_id(1);   // edge index in polygon2
    if (i < numPoints1 - 1 && j < numPoints2 - 1)
    {
        float p1[2] = {polygon1[i].x, polygon1[i].y};
        float p2[2] = {polygon1[i+1].x, polygon1[i+1].y};
        float q1[2] = {polygon2[j].x, polygon2[j].y};
        float q2[2] = {polygon2[j+1].x, polygon2[j+1].y};
        int res = IntersectionTest(p1, p2, q1, q2);
        // if res > 0 it means it intersects
        if (res > 0)
            doesIntersect[0] = 1;
    }
}
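The body of IntersectionTest is left out above. To have something on the host to compare the GPU result against, a minimal C++ sketch of a standard orientation-based segment test might look like the following. The names Pt, cross, segmentsIntersect and polygonsIntersect are hypothetical, and the sketch assumes that touching endpoints and collinear overlap count as "no intersection", which may not match the real test.

```cpp
#include <cstddef>
#include <vector>

struct Pt { float x, y; };  // hypothetical host-side stand-in for float2

// Signed area of triangle (a, b, c): > 0 counter-clockwise, < 0 clockwise, 0 collinear.
static float cross(Pt a, Pt b, Pt c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Returns 1 if segment p1-p2 properly crosses segment q1-q2, else 0.
// Assumption: touching endpoints and collinear overlap return 0.
static int segmentsIntersect(Pt p1, Pt p2, Pt q1, Pt q2) {
    float d1 = cross(q1, q2, p1);
    float d2 = cross(q1, q2, p2);
    float d3 = cross(p1, p2, q1);
    float d4 = cross(p1, p2, q2);
    bool pStraddles = (d1 > 0 && d2 < 0) || (d1 < 0 && d2 > 0);
    bool qStraddles = (d3 > 0 && d4 < 0) || (d3 < 0 && d4 > 0);
    return (pStraddles && qStraddles) ? 1 : 0;
}

// Same pairs as the kernel's guard: i < n1 - 1 and j < n2 - 1.
static int polygonsIntersect(const std::vector<Pt>& a, const std::vector<Pt>& b) {
    for (std::size_t i = 0; i + 1 < a.size(); ++i)
        for (std::size_t j = 0; j + 1 < b.size(); ++j)
            if (segmentsIntersect(a[i], a[i + 1], b[j], b[j + 1]))
                return 1;
    return 0;
}
```

Like the kernel, this only tests edges between consecutive points and does not close either polygon with a last-to-first edge.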
So my hope is to run this intersection routine as optimally as possible.
Currently, I do the following:
cl::NDRange globalws(5, 9);
cl::NDRange localws(1,1);
cl_int status = queue.enqueueNDRangeKernel(intersectKernel, cl::NullRange, globalws, localws);
and when I run this, I get the results I would expect.
But I THINK this means that the GPU is only running 1 thread at a time, which is obviously not what I want to be doing.
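To spell out what I believe the two sizes mean: as far as I understand it, the global size is the total number of work items in each dimension, and the local size is how many of them are grouped into one work group. The small C++ sketch below just does that arithmetic on the host; LaunchShape and describeLaunch are hypothetical names, and it assumes the local size divides the global size evenly in both dimensions.

```cpp
#include <cstddef>

// Hypothetical helper type: how a 2-D NDRange is split into work groups.
struct LaunchShape {
    std::size_t groups;         // total number of work groups
    std::size_t itemsPerGroup;  // work items inside each work group
};

// Assumes lx divides gx and ly divides gy evenly.
static LaunchShape describeLaunch(std::size_t gx, std::size_t gy,
                                  std::size_t lx, std::size_t ly) {
    LaunchShape s;
    s.groups = (gx / lx) * (gy / ly);
    s.itemsPerGroup = lx * ly;
    return s;
}
```

By this arithmetic, a 5 x 9 global range with a 1 x 1 local range gives 45 work groups of one work item each, while a 5 x 9 local range gives a single work group of 45 work items.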
Given that the number of points is much less than the max work item sizes, I would have thought I could do the following:
cl::NDRange globalws(5, 9);
cl::NDRange localws(5,9);
cl_int status = queue.enqueueNDRangeKernel(intersectKernel, cl::NullRange, globalws, localws);
Doing this, I don’t get any OpenCL errors, but I do get different results… which is obviously not what I want. Is it a problem with the NDRange I have chosen?
Or could it be because the value of doesIntersect[0] is changed by several work items at the same time, thus causing problems? I’ve adapted the code so that all it does is change the value from 0 to 1 if we find an intersection (it is initialised to 0 originally), and I wouldn’t have thought this would be the reason (but then what do I know?).
I know this isn’t GPU intensive, but I’m trying to do these simpler routines before jumping in at the deep end.
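For reference, the constraints on the local size as I understand them: each local dimension has to stay within the max work item sizes, the product of the local dimensions has to stay within the device’s maximum work group size (CL_DEVICE_MAX_WORK_GROUP_SIZE, which I have not listed above), and in OpenCL 1.x the global size has to be a multiple of the local size in every dimension. A minimal C++ sketch of padding the global size to satisfy the last rule follows; roundUpToMultiple and localSizeFits are hypothetical helper names, and the limits are passed in by the caller rather than queried from a device.

```cpp
#include <cstddef>

// Round a global size up to the next multiple of the local size, so that
// global % local == 0 in that dimension.
static std::size_t roundUpToMultiple(std::size_t global, std::size_t local) {
    return ((global + local - 1) / local) * local;
}

// True if a 2-D local size respects the per-dimension limits and the
// total work group size limit. All limits are supplied by the caller.
static bool localSizeFits(std::size_t lx, std::size_t ly,
                          std::size_t maxItemX, std::size_t maxItemY,
                          std::size_t maxGroupSize) {
    return lx >= 1 && ly >= 1 &&
           lx <= maxItemX && ly <= maxItemY &&
           lx * ly <= maxGroupSize;
}
```

With a local size of 8 x 4 (32 work items, matching the preferred multiple), the 5 x 9 global range would be padded to 8 x 12. The extra work items should do nothing, since the kernel already checks i and j against numPoints1-1 and numPoints2-1.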