Serial to Parallel question

Hey everyone,
My question today is more of an algorithmic one. Suppose I have the following code:


// In main kernel
for(a = 0; a < gSize; a++)
{
   for(b = 0; b < gSize; b++)
   {
       if(b!=a)
       {
           // Do something using a, b, and c
       }
   }
}

// Sub kernel
__kernel void DoStuffWithAandB(a, b, c)

What I am wondering is how I could make this parallel. Basically, if these are points, I need to look at each pair in both directions, (a->b) and (b->a), which is why the inner loop covers every b and only skips b == a. So what are your thoughts?
Thanks!
Chris

Can you change your problem slightly so you have a wavefront that is diagonal across the a’s and b’s? This is a common approach for extracting parallelism where certain indices need to be executed before others.

I guess to state the problem correctly, I have to be a bit more specific about the code. The code this is running determines whether a point is to the left of a line defined by points (a,b).
There is no order requirement per se. My real question is how to remove either the inner loop or both loops, as I believe they defeat the parallelism. The most effective approach I can see is several wavefronts, each with a fixed a and a group size equal to the number of items to check, but is it possible to share data across groups?
The idea is that once I determine that there are no points to the left of the line, I want to be able to make an OpenGL texture (this is to come later :D) and display the line without having to transfer anything back to the host.
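For illustration, a left-of-line test like the one described above is commonly written with the sign of a 2D cross product. This is only a sketch in plain C: the `Point2` type and the `is_left_of` name are made up for the example and are not taken from the actual code, and it assumes that points exactly on the line do not count as "left".

```c
#include <stdbool.h>

/* Hypothetical 2D point type; the real data layout is not shown in the thread. */
typedef struct { float x, y; } Point2;

/* Returns true when p lies strictly to the left of the directed line a -> b.
   The sign of the 2D cross product (b - a) x (p - a) gives the side:
   positive = left, negative = right, zero = collinear. */
static bool is_left_of(Point2 a, Point2 b, Point2 p)
{
    float cross = (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
    return cross > 0.0f;
}
```

Swapping a and b flips the sign of the cross product, which is why the (a->b) and (b->a) directions have to be checked separately.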
Thanks for your reply!
Chris

I don’t really understand what you’re trying to do, but I can answer one part of your question. The only way to synchronize across work-groups in OpenCL 1.0 is to do so between kernel executions. Within one kernel execution you cannot synchronize between work-groups. You can share data through global memory, but there is no guarantee that one work-group will execute before another, so they can’t wait for each other.
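One way to sidestep the synchronization issue, sketched here under assumptions rather than as a definitive answer, is to give every ordered pair (a, b) its own work-item in a 2D index space and have each work-item write only to its own output slot. Nothing then has to wait on another work-group, and any step that combines the results can go in a second kernel launch. The plain C below emulates that layout serially. `Point2`, `pair_work_item`, `run_all_pairs`, and the `no_left` output array are all hypothetical names, and the left-of test is an assumed cross-product check.

```c
#include <stddef.h>

/* Hypothetical 2D point type; the real data layout is not shown in the thread. */
typedef struct { float x, y; } Point2;

/* Body of one emulated work-item for the ordered pair (a, b).
   In an OpenCL kernel, a and b would come from get_global_id(0) and
   get_global_id(1) instead of being passed in as arguments. */
static void pair_work_item(size_t a, size_t b,
                           const Point2 *pts, size_t n,
                           unsigned char *no_left)
{
    unsigned char ok = 1;

    if (a == b) {                 /* same check as the original if(b!=a) */
        no_left[a * n + b] = 0;
        return;
    }

    for (size_t c = 0; c < n; c++) {
        if (c == a || c == b)
            continue;
        /* Assumed test: positive cross product means c is left of a -> b. */
        float cross = (pts[b].x - pts[a].x) * (pts[c].y - pts[a].y)
                    - (pts[b].y - pts[a].y) * (pts[c].x - pts[a].x);
        if (cross > 0.0f) {
            ok = 0;
            break;
        }
    }

    /* Each work-item writes only its own slot, so no cross-group sync is needed. */
    no_left[a * n + b] = ok;
}

/* Serial stand-in for enqueueing an n x n index space on the device. */
static void run_all_pairs(const Point2 *pts, size_t n, unsigned char *no_left)
{
    for (size_t a = 0; a < n; a++)
        for (size_t b = 0; b < n; b++)
            pair_work_item(a, b, pts, n, no_left);
}
```

After the run, `no_left[a * n + b]` is 1 only for pairs with no point to their left. A later kernel launch, or the rendering step, could read that array directly from device memory.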