Best way for solving an optimization problem

Let’s suppose you’ve a (large) collection of 2D points, and you’ve to find the “best” point among them, where “best” means evaluating the cost of each point, and finding which one has the smallest cost.

The function for evaluating the cost is relatively complex (it needs the coordinates of all the points, because the cost of a single point depends on all the other points too), but despite of this it fits in a kernel, and I successfully compiled it into a kernel both on my NVIDIA and AMD devices.

But now… since it’s not possible to implement spin-locks in OpenCL, how should I find the “best” point? Allocating the cost of all points in global memory, returning them all, and letting the host find the smallest one isn’t an option, because this would mean sending hundreds of MB from the device to the host…

You could use a parallel reduction with the min operator (instead of sum). Your reduction would simply keep track of the minimum value and the index of that value. If you only care about the minimum point, not the others, then you could do the first stage of the reduction, i.e. finding the minimum and its index within a work group, at the end of your kernel that evaluates the points.

I agree. This is a parallel reduction problem. OpenCL 2 even has operators for reduction, but you can write it in OpenCL 1.x yourself. I’d write the values to global memory and then run the reduction kernel on it separately; that way you can debug the reduction separate from the point evaluation. You can merge them later if it makes sense.