If at the end of a computation I want to add up all the elements in a one-dimensional array A, how would I best go about doing this?
I figured I could do this in O(log(n)) time, by halving the field repeatedly and doing
A[i] := A[i] + A[i + n/2]
for all i.
Here is the idea (serial) in pseudocode (for simplicity I’m assuming the size of the field is a power of two):
n := n / 2
while (n > 0) do
for i = 1 to n do
A[i] := A[i] + A[i+n]
end
n := n / 2
end
I want each iteration of the for loop to be done by one work item, however, I am not sure which of the following two approaches would be best or most idiomatic here:
[ol]
[li]Enqueing the kernel log(n) times with an ever decreasing work size.[/18gjs991][/li][]Putting the while loop (with log(n) iterations) inside the kernel and having some sort of synchronisation point inside the loop so that the additions don’t conflict with each other.[/*18gjs991][/ol]