jazpearson

01-30-2013, 06:47 AM

OK, so I'm trying to get my head around global and local NDRanges.

Briefly: I'm trying to work out whether two polygons intersect, by testing each edge of one polygon against each edge of the other.

Let's say polygon1 has 5 points, and polygon2 has 10 points.

My device's max work-item sizes are 1024, 1024, 64, and the preferred work-group size multiple is 32.

So here's the pseudo code:

int IntersectionTest(float *p1, float *p2, float *q1, float *q2)
{
    int rv = 0;
    // do the intersection test
    return rv;
}

kernel void CL_IntersectionTest(global float2 *polygon1, global float2 *polygon2, int numPoints1, int numPoints2, global int *doesIntersect)
{
    int i = get_global_id(0);  // edge index in polygon1
    int j = get_global_id(1);  // edge index in polygon2

    // skip work-items that fall outside the valid edge range
    if (i < numPoints1-1 && j < numPoints2-1)
    {
        float p1[2] = {polygon1[i].x, polygon1[i].y};
        float p2[2] = {polygon1[i+1].x, polygon1[i+1].y};
        float q1[2] = {polygon2[j].x, polygon2[j].y};
        float q2[2] = {polygon2[j+1].x, polygon2[j+1].y};

        int res = IntersectionTest(p1, p2, q1, q2);

        // if res > 0 it means it intersects
        if (res > 0)
            doesIntersect[0] = 1;
    }
}
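The body of IntersectionTest is left as a placeholder above. As a point of reference only, here is a minimal host-side C++ sketch of one common approach, the orientation (cross product) test. It is not necessarily what the real routine does. The Point struct, the cross helper and the intersectionTest name are all hypothetical, and the sketch assumes that segments which merely touch at an endpoint or overlap collinearly do not count as intersecting.

```cpp
#include <cassert>

// Hypothetical 2D point type standing in for OpenCL's float2.
struct Point { float x, y; };

// Twice the signed area of triangle (a, b, c):
// > 0 if c lies to the left of the directed line a->b,
// < 0 if it lies to the right, 0 if the three points are collinear.
static float cross(Point a, Point b, Point c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Returns 1 if segment p1-p2 properly crosses segment q1-q2, else 0.
// Assumption: touching endpoints and collinear overlaps return 0.
int intersectionTest(Point p1, Point p2, Point q1, Point q2) {
    float d1 = cross(q1, q2, p1);
    float d2 = cross(q1, q2, p2);
    float d3 = cross(p1, p2, q1);
    float d4 = cross(p1, p2, q2);
    // p1 and p2 must lie on opposite sides of the line through q1-q2 ...
    bool pStraddles = (d1 > 0 && d2 < 0) || (d1 < 0 && d2 > 0);
    // ... and q1 and q2 on opposite sides of the line through p1-p2.
    bool qStraddles = (d3 > 0 && d4 < 0) || (d3 < 0 && d4 > 0);
    return (pStraddles && qStraddles) ? 1 : 0;
}
```

For example, the diagonals (0,0)-(2,2) and (0,2)-(2,0) cross, so the function returns 1 for them, while two parallel horizontal segments give 0.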

So, my hope is to run this intersection routine as efficiently as possible.

Currently, I do the following:

cl::NDRange globalws(5, 9);
cl::NDRange localws(1, 1);
cl_int status = queue.enqueueNDRangeKernel(intersectKernel, cl::NullRange, globalws, localws);

When I run this, I get the results I would expect.

But I THINK this means the GPU is only running one thread at a time, which is obviously not what I want.
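To make the global/local relationship concrete, here is a small host-side C++ model of the index arithmetic OpenCL uses when the global offset is zero: global ID = group ID * local size + local ID. It only mimics the indexing and says nothing about how a device actually schedules work. enumerateGlobalIds and workGroupCount are hypothetical helpers, not OpenCL APIs, and both assume the global size is an exact multiple of the local size in each dimension.

```cpp
#include <array>
#include <cstddef>
#include <set>
#include <utility>

using Size2 = std::array<std::size_t, 2>;

// Number of work-groups in a 2D NDRange (assumes global is a multiple of local).
std::size_t workGroupCount(Size2 global, Size2 local) {
    return (global[0] / local[0]) * (global[1] / local[1]);
}

// Set of (get_global_id(0), get_global_id(1)) pairs the work-items would see,
// built from group IDs and local IDs with a zero global offset.
std::set<std::pair<std::size_t, std::size_t>> enumerateGlobalIds(Size2 global, Size2 local) {
    std::set<std::pair<std::size_t, std::size_t>> ids;
    const std::size_t groupsX = global[0] / local[0];
    const std::size_t groupsY = global[1] / local[1];
    for (std::size_t gx = 0; gx < groupsX; ++gx)
        for (std::size_t gy = 0; gy < groupsY; ++gy)
            for (std::size_t lx = 0; lx < local[0]; ++lx)
                for (std::size_t ly = 0; ly < local[1]; ++ly)
                    ids.insert({gx * local[0] + lx, gy * local[1] + ly});
    return ids;
}
```

In this model a global size of (5, 9) with a local size of (1, 1) gives 45 work-groups of one work-item each, while a local size of (5, 9) gives a single work-group of 45 work-items. The set of (i, j) pairs is identical in both cases, so the local size changes how work-items are grouped, not which indices the kernel receives.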

Given that the number of points is much smaller than the max work-item sizes, I would have thought I could do the following:

cl::NDRange globalws(5, 9);
cl::NDRange localws(5, 9);
cl_int status = queue.enqueueNDRangeKernel(intersectKernel, cl::NullRange, globalws, localws);

Doing this, I don't get any OpenCL errors, but I do get different results, which is obviously not what I want. Is it a problem with the NDRange I have chosen?

Or could it be because the value of doesIntersect[0] is changed by several work-items at the same time, and that causes problems? I've adapted the code so that all it does is change the value from 0 to 1 if we find an intersection (it is initialised to 0), so I wouldn't have thought this would be the reason (but then what do I know?).

I know this isn't GPU intensive, but I'm trying to get these simpler routines working before jumping in at the deep end.
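Coming back to the NDRange sizes: in OpenCL 1.x, when a local size is given explicitly, the global size has to be evenly divisible by it in each dimension. A common pattern is to pick a local size whose total is a multiple of the preferred work-group size multiple, round the global size up to fit, and rely on a bounds check in the kernel (like the if on i and j above) to discard the padding work-items. A minimal sketch of the rounding step, where roundUpToMultiple is a hypothetical helper and the 8 x 4 local size is only an illustrative choice:

```cpp
#include <cstddef>

// Smallest multiple of `multiple` that is >= value.
// Assumes multiple is non-zero.
std::size_t roundUpToMultiple(std::size_t value, std::size_t multiple) {
    return ((value + multiple - 1) / multiple) * multiple;
}
```

With a local size of (8, 4), which is 32 work-items per group, a problem size of (5, 9) would be rounded up to a global size of (8, 12). Whether that performs better than letting the runtime choose the local size (by passing cl::NullRange for it) would need measuring on the actual device.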
