A question on NDRange

OK - so I’m trying to get my head around the global and local NDRanges.

Briefly - I’m trying to calculate whether or not two polygons intersect one another, and I’m doing this by testing each line segment of one polygon against each line segment of the other.

Let’s say polygon1 has 5 points, and polygon2 has 10 points.
My max work item sizes are 1024, 1024, 64, and the preferred work group size multiple is 32.

So here’s the pseudo code:



int IntersectionTest(float *p1, float *p2, float *q1, float *q2)
{
  int rv = 0;

 // do the intersection test

  return rv;
}

kernel void CL_IntersectionTest(global float2 *polygon1, global float2 *polygon2, int numPoints1, int numPoints2, global int *doesIntersect)
{
  int i = get_global_id(0);
  int j = get_global_id(1);

  if (i < numPoints1-1 && j < numPoints2-1)
  {
     // segment i of polygon1: point i to point i+1
     float p1[2] = {polygon1[i].x, polygon1[i].y};
     float p2[2] = {polygon1[i+1].x, polygon1[i+1].y};

     // segment j of polygon2: point j to point j+1
     float q1[2] = {polygon2[j].x, polygon2[j].y};
     float q2[2] = {polygon2[j+1].x, polygon2[j+1].y};

     int res = IntersectionTest(p1, p2, q1, q2);
     // if res > 0 it means it intersects
     if (res > 0)
       doesIntersect[0] = 1;
  }
}
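
For reference, the kind of test the "do the intersection test" placeholder stands for could look roughly like this in plain host-side C++. This is only a sketch under my own assumptions, not the actual routine: the names Point2, orient and segmentsIntersect are made up for illustration, and it only counts proper crossings (segments that merely touch or lie on the same line are reported as not intersecting).

```cpp
// Hypothetical 2D point type, standing in for float2 on the host.
struct Point2 { float x, y; };

// Twice the signed area of triangle (a, b, c):
// > 0 if c is to the left of a->b, < 0 if to the right, 0 if collinear.
static float orient(Point2 a, Point2 b, Point2 c)
{
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Returns 1 if segment p1-p2 properly crosses segment q1-q2, otherwise 0.
// Assumption: touching endpoints and collinear overlap count as 0.
int segmentsIntersect(Point2 p1, Point2 p2, Point2 q1, Point2 q2)
{
    float d1 = orient(q1, q2, p1);
    float d2 = orient(q1, q2, p2);
    float d3 = orient(p1, p2, q1);
    float d4 = orient(p1, p2, q2);

    // p1 and p2 must lie on opposite sides of q1-q2, and vice versa.
    bool pStraddles = (d1 > 0 && d2 < 0) || (d1 < 0 && d2 > 0);
    bool qStraddles = (d3 > 0 && d4 < 0) || (d3 < 0 && d4 > 0);
    return (pStraddles && qStraddles) ? 1 : 0;
}
```

With this sketch, the two diagonals of a square cross each other, while two parallel horizontal segments do not.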


So - my hope is to do this intersection routine as optimally as possible.

Currently, I do the following:


cl::NDRange globalws(5, 9);
cl::NDRange localws(1,1);

cl_int status = queue.enqueueNDRangeKernel(intersectKernel, cl::NullRange, globalws, localws);

and when I run this, I get the results I would expect.

But I THINK that this means that the GPU is only running 1 thread at a time, which is obviously not what I want to be doing.
Given that the number of points is much less than the max work item sizes, I would have thought I could have done the following:
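
To make the relationship between the two ranges concrete, here is a small host-side sketch of the arithmetic as I understand it. The helper names numGroups and roundUp are hypothetical, not part of the OpenCL API. It assumes the OpenCL 1.x rule that the global size must be evenly divisible by the local size in each dimension. Note that the product of the local sizes is also limited by the device's max work group size, which is a separate value from the per-dimension max work item sizes.

```cpp
#include <cassert>
#include <cstddef>

// Number of work groups along one dimension.
// Assumes the global size is an exact multiple of the local size.
std::size_t numGroups(std::size_t global, std::size_t local)
{
    assert(local > 0 && global % local == 0);
    return global / local;
}

// Smallest multiple of `multiple` that is >= value, e.g. for padding a
// global size up to a multiple of the chosen local size.
std::size_t roundUp(std::size_t value, std::size_t multiple)
{
    assert(multiple > 0);
    return ((value + multiple - 1) / multiple) * multiple;
}
```

With these, global (5, 9) and local (1, 1) gives numGroups(5, 1) * numGroups(9, 1) = 45 work groups of one work item each, while local (5, 9) gives a single work group of 45 work items. If the global size were padded with roundUp, the extra work items would need to be skipped by a bounds check like the one already in the kernel.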


cl::NDRange globalws(5, 9);
cl::NDRange localws(5,9);

cl_int status = queue.enqueueNDRangeKernel(intersectKernel, cl::NullRange, globalws, localws);

Doing this, I don’t get any OpenCL errors, but I do get different results, which is obviously not what I want. Is it a problem with the NDRange I have chosen?
Or could it be because the value of doesIntersect[0] is being changed by several work items at the same time, and that is causing problems? I’ve adapted the code so that all it does is change it from 0 to 1 if we find an intersection (it is initialised as 0 originally), so I wouldn’t have thought this would be the reason (but then what do I know?).
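
One way to find out which of the two runs is right would be a serial reference on the host that walks the same 2D index space and applies the same guard as the kernel. The following is only a sketch: referenceDoesIntersect is a made-up name, and the test callback is a hypothetical stand-in for the real segment test, taking the segment indices i and j.

```cpp
#include <cstddef>
#include <functional>

// Serial stand-in for the 2D NDRange: visits every (i, j) the kernel would
// see and applies the same bounds guard before calling the test.
int referenceDoesIntersect(std::size_t global0, std::size_t global1,
                           int numPoints1, int numPoints2,
                           const std::function<int(int, int)>& test)
{
    int doesIntersect = 0;  // initialised to 0, as in the original setup
    for (std::size_t gi = 0; gi < global0; ++gi) {
        for (std::size_t gj = 0; gj < global1; ++gj) {
            int i = static_cast<int>(gi);
            int j = static_cast<int>(gj);
            if (i < numPoints1 - 1 && j < numPoints2 - 1) {
                // Only ever flips the flag from 0 to 1, like the kernel.
                if (test(i, j) > 0)
                    doesIntersect = 1;
            }
        }
    }
    return doesIntersect;
}
```

Because the loop order cannot affect a flag that only ever goes from 0 to 1, the value returned here is what both launch configurations should produce for the same input.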

I know this isn’t GPU intensive, but I’m trying to do these simpler routines before jumping in at the deep end.