Find the max element of a huge array in a kernel

_MASTER · April 15, 2017, 1:51pm

Hi guys. Can anyone give an example of code that finds the maximum element of an array in OpenCL?

struct Item
{
    cl_float3 color = {0.0f, 0.0f, 0.0f};
    // more data below
};
…
clCreateBuffer(context, CL_MEM_READ_WRITE, over100millions * sizeof(Item), nullptr, &error);
…
__kernel void findMaxColor(__global Item* in_items, __global Item* out_maxElement)
Now I want to find the max element on the GPU, using a criterion like this:

float3 maxElement = (float3)(0.0f, 0.0f, 0.0f);
if(it.x > maxElement.x || it.y > maxElement.y || it.z > maxElement.z)
    maxElement = it;
and so on.

So, how do I compute maxElement and write the value to out_maxElement?
P.S.: If possible, the code should be very fast. Can it use local memory?
Need help…

HadrienG · April 16, 2017, 1:57am

For this kind of data reduction task, the typical plan is this:

[ul]
[li]Compute the max of each workgroup, moving the inputs into local memory at the beginning and using a binary tree for maximal concurrency.[/li]
[li]Put the max of each workgroup into another array in global memory.[/li]
[li]Recursively invoke the same kernel on the output until you only have one element left.[/li]
[li]This element is your global maximum.[/li]
[/ul]
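
The steps above can be sketched on the CPU in plain C, which may help to check the logic before porting it to a kernel. This is only an illustration under some assumptions: the names GROUP_SIZE, max_f, reduce_max_pass and reduce_max are made up for this example, the keys are plain floats rather than Item, and the loops over lid stand in for the work-items of one workgroup. Note that a criterion like the "||" test in the question is not a strict ordering, so with it the result may depend on the order in which elements are combined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define GROUP_SIZE 256 /* hypothetical work-group size; must be a power of two */

/* Comparison used by the reduction; swap in your own criterion here. */
static float max_f(float a, float b) { return a > b ? a : b; }

/* One pass: every "work-group" reduces up to GROUP_SIZE inputs to one value.
   Returns the number of values written to out (one per group). */
static size_t reduce_max_pass(const float *in, size_t n, float *out)
{
    size_t groups = (n + GROUP_SIZE - 1) / GROUP_SIZE;
    for (size_t g = 0; g < groups; ++g) {
        float scratch[GROUP_SIZE]; /* stands in for __local memory */

        /* Each work-item loads one element; the tail of the last group is
           padded with in[0], a valid input that cannot change the maximum. */
        for (size_t lid = 0; lid < GROUP_SIZE; ++lid) {
            size_t gid = g * GROUP_SIZE + lid;
            scratch[lid] = gid < n ? in[gid] : in[0];
        }
        /* A kernel needs barrier(CLK_LOCAL_MEM_FENCE) here ... */

        /* Binary tree: halve the number of active work-items each step. */
        for (size_t stride = GROUP_SIZE / 2; stride > 0; stride /= 2) {
            for (size_t lid = 0; lid < stride; ++lid)
                scratch[lid] = max_f(scratch[lid], scratch[lid + stride]);
            /* ... and here, after every step. */
        }
        out[g] = scratch[0]; /* work-item 0 writes the group's maximum */
    }
    return groups;
}

/* Runs passes until one value is left. Stores the maximum in *result and
   returns the number of passes, or 0 if allocation fails. Requires n >= 1. */
static int reduce_max(const float *data, size_t n, float *result)
{
    assert(n >= 1);
    size_t cap = (n + GROUP_SIZE - 1) / GROUP_SIZE;
    float *a = malloc(cap * sizeof *a); /* ping-pong buffers, playing the role */
    float *b = malloc(cap * sizeof *b); /* of two arrays in global memory      */
    int passes = 0;
    if (a && b) {
        size_t count = reduce_max_pass(data, n, a);
        passes = 1;
        while (count > 1) {
            count = reduce_max_pass(a, count, b);
            float *tmp = a; a = b; b = tmp; /* output becomes the next input */
            ++passes;
        }
        *result = a[0];
    }
    free(a);
    free(b);
    return passes;
}
```

Calling reduce_max(values, n, &maximum) gives the maximum of n floats. With a group size of 256, 100 million inputs shrink to 390,625 values, then 1,526, then 6, then 1, so four passes are enough.
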
Here’s a good resource on how to optimize the performance of a different kind of reduction, summing instead of computing the max: http://developer.download.nvidia.com/assets/cuda/files/reduction.pdf . It’s NVIDIA- and CUDA-based, but the general concepts map quite well to equivalent OpenCL constructs.

In general, reduction is memory-bound, so your performance target should be to max out the RAM bandwidth of your GPU.
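
To get a feeling for that target, here is a rough back-of-the-envelope helper in C. It is only a sketch: the function name min_read_time_ms is made up, and the bandwidth you pass in is whatever your own GPU's spec sheet says. It assumes every input is read from global memory exactly once and ignores the much smaller later passes.

```c
#include <assert.h>

/* Lower bound, in milliseconds, on the time needed to read every input once.
   bandwidth_gb_per_s is the device's memory bandwidth in GB/s (10^9 bytes/s). */
static double min_read_time_ms(double n_items, double bytes_per_item,
                               double bandwidth_gb_per_s)
{
    assert(bandwidth_gb_per_s > 0.0);
    double total_bytes = n_items * bytes_per_item;
    return total_bytes / (bandwidth_gb_per_s * 1e9) * 1e3;
}
```

For example, a cl_float3 occupies 16 bytes, so 100 million colors are 1.6 GB. On a hypothetical GPU with 200 GB/s of memory bandwidth, min_read_time_ms(100e6, 16.0, 200.0) comes out at about 8 ms, and a well-tuned reduction should land near that figure. If the kernel has to read the whole Item, use sizeof(Item) instead of 16.
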

_MASTER · April 16, 2017, 5:43am

Thanks, I will try something like this ;)