If at the end of a computation I want to add up all the elements in a one-dimensional array A, how would I best go about doing this?

I figured I could do this in O(log(n)) time, by halving the field repeatedly and doing

A[i] := A[i] + A[i + n/2]

for all i.

Here is the idea (serial) in pseudocode (for simplicity I'm assuming the size of the field is a power of two):

Code :n := n / 2 while (n > 0) do for i = 1 to n do A[i] := A[i] + A[i+n] end n := n / 2 end

I want each iteration of the for loop to be done by one work item, however, I am not sure which of the following two approaches would be best or most idiomatic here:

- Enqueing the kernel log(n) times with an ever decreasing work size.[/*:m:18gjs991]
- Putting the while loop (with log(n) iterations) inside the kernel and having some sort of synchronisation point inside the loop so that the additions don't conflict with each other.[/*:m:18gjs991]