Ok, so I'm trying to get my head around the global and local NDRanges.

Briefly: I'm trying to calculate whether or not two polygons intersect one another, and I'm doing this by testing each line segment of one polygon against each line segment of the other.

Let's say polygon1 has 5 points, and polygon2 has 10 points.

My max work item sizes are 1024, 1024, 64, and the preferred work-group size multiple is 32.

So here's the pseudo code:

Code :
int IntersectionTest(float *p1, float *p2, float *q1, float *q2)
{
    int rv = 0;
    // do the intersection test
    return rv;
}

kernel void CL_IntersectionTest(global float2 *polygon1,
                                global float2 *polygon2,
                                int numPoints1,
                                int numPoints2,
                                global int *doesIntersect)
{
    int i = get_global_id(0); // segment index in polygon1
    int j = get_global_id(1); // segment index in polygon2

    // segment k runs from point k to point k+1, so the last valid index is numPoints-2
    if (i < numPoints1-1 && j < numPoints2-1)
    {
        float p1[2] = {polygon1[i].x, polygon1[i].y};
        float p2[2] = {polygon1[i+1].x, polygon1[i+1].y};
        float q1[2] = {polygon2[j].x, polygon2[j].y};
        float q2[2] = {polygon2[j+1].x, polygon2[j+1].y};

        int res = IntersectionTest(p1, p2, q1, q2);

        // if res > 0 it means it intersects
        if (res > 0)
            doesIntersect[0] = 1;
    }
}

So, my hope is to run this intersection routine as optimally as possible.

Currently, I do the following:

Code :
cl::NDRange globalws(5, 9);
cl::NDRange localws(1, 1);
cl_int status = queue.enqueueNDRangeKernel(intersectKernel, cl::NullRange, globalws, localws);

and when I run this, I get the results I would expect.
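For reference, the "expected" results can be worked out on the CPU with something like the sketch below. It is only an illustration: `Point`, `segmentsIntersect` and `polygonsIntersectReference` are made-up names, and the orientation-based segment test is a stand-in for whatever `IntersectionTest` really does. It pairs segments the same way the kernel does (segment i of polygon1 against segment j of polygon2, without the closing edge from the last point back to the first). Comparisons are exact, so nearly collinear input may behave differently in floating point.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for float2.
struct Point { float x, y; };

// Sign of the cross product (b - a) x (c - a):
// +1 = counter-clockwise turn, -1 = clockwise turn, 0 = collinear.
static int orientation(Point a, Point b, Point c) {
    float v = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
    return (v > 0.0f) - (v < 0.0f);
}

// True if c lies inside the bounding box of segment ab (only meaningful when collinear).
static bool withinBox(Point a, Point b, Point c) {
    return std::min(a.x, b.x) <= c.x && c.x <= std::max(a.x, b.x) &&
           std::min(a.y, b.y) <= c.y && c.y <= std::max(a.y, b.y);
}

// Returns 1 if segments p1p2 and q1q2 share at least one point, else 0.
int segmentsIntersect(Point p1, Point p2, Point q1, Point q2) {
    int o1 = orientation(p1, p2, q1);
    int o2 = orientation(p1, p2, q2);
    int o3 = orientation(q1, q2, p1);
    int o4 = orientation(q1, q2, p2);
    if (o1 != o2 && o3 != o4) return 1;              // endpoints on opposite sides
    if (o1 == 0 && withinBox(p1, p2, q1)) return 1;  // collinear and overlapping cases
    if (o2 == 0 && withinBox(p1, p2, q2)) return 1;
    if (o3 == 0 && withinBox(q1, q2, p1)) return 1;
    if (o4 == 0 && withinBox(q1, q2, p2)) return 1;
    return 0;
}

// Same pairing as the kernel: every segment of a against every segment of b.
int polygonsIntersectReference(const std::vector<Point>& a, const std::vector<Point>& b) {
    for (std::size_t i = 0; i + 1 < a.size(); ++i)
        for (std::size_t j = 0; j + 1 < b.size(); ++j)
            if (segmentsIntersect(a[i], a[i + 1], b[j], b[j + 1]))
                return 1;
    return 0;
}
```

Running this on the same two point lists and comparing against `doesIntersect[0]` would show which of the two launches is giving the wrong answer.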

But I think this means that the GPU is only running 1 thread at a time, which is obviously not what I want to be doing.

Given that the number of points is much less than the max work item sizes, I would have thought I could do the following:
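To make the arithmetic behind that worry concrete, here is a small sketch of how a 2D global range splits into work-groups. `LaunchShape` and `describeLaunch2D` are made-up names for illustration. The code only counts groups and items; how many groups actually run at the same time is up to the device.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical summary of a 2D launch.
struct LaunchShape {
    std::size_t groups;        // number of work-groups
    std::size_t itemsPerGroup; // work-items in each work-group
};

// Assumes each global size is an exact multiple of the matching local size.
LaunchShape describeLaunch2D(std::size_t globalX, std::size_t globalY,
                             std::size_t localX, std::size_t localY) {
    assert(localX > 0 && localY > 0);
    assert(globalX % localX == 0 && globalY % localY == 0);
    LaunchShape shape;
    shape.groups = (globalX / localX) * (globalY / localY);
    shape.itemsPerGroup = localX * localY;
    return shape;
}
```

With a global range of (5, 9), a local range of (1, 1) gives 45 work-groups of 1 work-item each, while a local range of (5, 9) gives 1 work-group of 45 work-items. Either way 45 work-items are launched in total.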

Code :
cl::NDRange globalws(5, 9);
cl::NDRange localws(5, 9);
cl_int status = queue.enqueueNDRangeKernel(intersectKernel, cl::NullRange, globalws, localws);

Doing this, I don't get any OpenCL errors, but I do get different results, which is obviously not what I want. Is it a problem with the NDRange I have chosen?

Or could it be because the value of doesIntersect[0] is changed by several work-items at the same time, thus causing problems? I've adapted the code so that all it does is change the value from 0 to 1 if we find an intersection (it is initialised to 0 originally), and I wouldn't have thought this would be the reason (but then what do I know?).

I know this isn't GPU intensive, but I'm trying to do these simpler routines before jumping in at the deep end.
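One way to rule the NDRange in or out is to check the launch constraints on the host. The sketch below covers the OpenCL 1.x rules as I understand them: each local size must fit the per-dimension limit, each global size must be a multiple of the local size, and the total work-group size must not exceed the device or kernel limit. `roundUpToMultiple` and `isValidLaunch2D` are made-up helper names, and the `maxGroupSize` value is an assumption, since the post only gives the per-dimension limits.

```cpp
#include <cassert>
#include <cstddef>

// Round n up to the next multiple of m (m must be non-zero).
std::size_t roundUpToMultiple(std::size_t n, std::size_t m) {
    assert(m > 0);
    return ((n + m - 1) / m) * m;
}

// Checks a 2D launch against the OpenCL 1.x work-group constraints.
// maxItemSizes: per-dimension limits (CL_DEVICE_MAX_WORK_ITEM_SIZES).
// maxGroupSize: total limit (CL_DEVICE_MAX_WORK_GROUP_SIZE), assumed by the caller here.
bool isValidLaunch2D(const std::size_t global[2], const std::size_t local[2],
                     const std::size_t maxItemSizes[2], std::size_t maxGroupSize) {
    for (int d = 0; d < 2; ++d) {
        if (local[d] == 0 || local[d] > maxItemSizes[d]) return false;
        if (global[d] % local[d] != 0) return false; // global must divide evenly
    }
    return local[0] * local[1] <= maxGroupSize;
}
```

By this check a (5, 9) global range with a (5, 9) local range passes if the work-group limit is at least 45, so the range itself looks legal. If a local size such as (8, 8) were wanted instead, the global range would need padding to (8, 16) via `roundUpToMultiple`, and the existing `i < numPoints1-1 && j < numPoints2-1` guard in the kernel would skip the extra work-items.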