Hi,
how would more experienced devs approach this problem:
I want to calculate a financial math problem called “Ichimoku” on GPU.
The actual problem boils down to:
- you have a price series array, let's say an array of 10,000 doubles, indexed 0 to 9,999
Calculating Ichimoku basically involves running the following task 2-3 times with different widths, plus a few minor challenges. All major calculations are independent of the previous / next one, so the outer loop is perfectly parallel. The inner loop is a min/max reduction over the X previous values:
perfectly parallel outer loop:
- do the inner loop (kernel) for each array value, independent of the prev / next value
inner loop:
(int) argument X = 26
calculating the result of array index I for width X:
- find the low of index I to index (I - X) = LOW
- find the high of index I to index (I - X) = HIGH
- result for I = (LOW + HIGH) / 2.0
so for X = 26 and array_index = 100:
- find the low of array[100] to array[100-26-1] (inclusive)
- find the high of array[100] to array[100-26-1] (inclusive)
- global result[100] = (low+high)/2.0
- of course only calculate for index values > X argument values
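To pin down the inner loop described above, here is a minimal plain-C sketch of the per-index calculation. The name window_midpoint is hypothetical, and the sketch assumes the window runs from index I - X to index I inclusive, as in the first description; if the extra element from the worked example (100-26-1) is intended, the lower bound would need adjusting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: midpoint of the lowest and highest price in
 * prices[i - x] .. prices[i], inclusive (assumed window bounds).
 * Requires i >= x so the window does not run off the front. */
double window_midpoint(const double *prices, size_t i, size_t x)
{
    assert(i >= x);

    double low  = prices[i - x];
    double high = prices[i - x];

    /* Sequential scan over the remaining elements of the window. */
    for (size_t k = i - x + 1; k <= i; ++k) {
        if (prices[k] < low)  low  = prices[k];
        if (prices[k] > high) high = prices[k];
    }
    return (low + high) / 2.0;
}
```

For the series {5, 1, 4, 9, 2} with X = 2, index 2 sees the window {5, 1, 4}, so low = 1, high = 5 and the result is 3.0.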
I could simply write a kernel that is launched with one work item per array element and computes the high/low sequentially inside the kernel. I would gain over a traditional CPU implementation because I can run that kernel for every array value perfectly in parallel, but the main workload of the inner loop would still be sequential.
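As a rough illustration of that simple variant, the sketch below emulates it in plain C: midpoint_work_item is a hypothetical stand-in for the kernel body, and the host loop plays the role of the parallel launch (in an OpenCL kernel, gid would come from get_global_id(0) instead). It assumes the window is prices[gid - X] .. prices[gid] inclusive and leaves indices with too little history untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel body: the work done by ONE work-item for index gid. */
static void midpoint_work_item(const double *prices, double *result,
                               size_t n, size_t x, size_t gid)
{
    /* Skip out-of-range ids and indices without a full window. */
    if (gid >= n || gid < x)
        return;

    double low  = prices[gid - x];
    double high = prices[gid - x];

    /* The inner loop is still sequential inside each work-item. */
    for (size_t k = gid - x + 1; k <= gid; ++k) {
        if (prices[k] < low)  low  = prices[k];
        if (prices[k] > high) high = prices[k];
    }
    result[gid] = (low + high) / 2.0;
}

/* Host-side emulation of the launch: every gid is independent, so on a
 * GPU these iterations would run in parallel. */
void ichimoku_naive(const double *prices, double *result, size_t n, size_t x)
{
    assert(prices != NULL && result != NULL);
    for (size_t gid = 0; gid < n; ++gid)
        midpoint_work_item(prices, result, n, x, gid);
}
```

Each work-item reads X + 1 values, so the total work is roughly n * (X + 1) comparisons, but no work-item depends on another.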
How could I do a min/max reduction within the kernel? Launch array_size * X work items and keep track of which work items are supposed to take part in a local-memory min/max reduction at a given stage and do nothing at the later stages?
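One way to picture that idea is a tree reduction with one work-group per output index, emulated here in plain C. This is only a sketch under assumptions: group_reduce_midpoint and GROUP_SIZE are hypothetical names, the group size is assumed to be a power of two and at least X + 1, the window is assumed to be prices[i - X] .. prices[i], and the arrays lo/hi stand in for __local scratch memory. In a real kernel a barrier(CLK_LOCAL_MEM_FENCE) would be needed where the comments say so.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP_SIZE 32   /* assumed work-group size: power of two, >= x + 1 */

/* Emulates one work-group reducing the window of index i to (low+high)/2. */
double group_reduce_midpoint(const double *prices, size_t i, size_t x)
{
    assert(i >= x && x + 1 <= GROUP_SIZE);

    double lo[GROUP_SIZE], hi[GROUP_SIZE];   /* stand-ins for __local memory */

    /* Load stage: work-item lid fetches one window element. Surplus
     * work-items repeat prices[i], which cannot change the min or max. */
    for (size_t lid = 0; lid < GROUP_SIZE; ++lid) {
        double v = (lid <= x) ? prices[i - x + lid] : prices[i];
        lo[lid] = v;
        hi[lid] = v;
    }
    /* (barrier here in a kernel) */

    /* Each stage halves the number of active work-items; the rest idle. */
    for (size_t stride = GROUP_SIZE / 2; stride > 0; stride /= 2) {
        for (size_t lid = 0; lid < stride; ++lid) {   /* active work-items */
            if (lo[lid + stride] < lo[lid]) lo[lid] = lo[lid + stride];
            if (hi[lid + stride] > hi[lid]) hi[lid] = hi[lid + stride];
        }
        /* (barrier between stages in a kernel) */
    }
    return (lo[0] + hi[0]) / 2.0;
}
```

With GROUP_SIZE = 32 the reduction takes 5 stages instead of a 26-step sequential scan, at the cost of launching 32 work-items per output value and reading overlapping windows repeatedly from global memory. Whether that beats the simple per-index loop for a window this small is something I would measure rather than assume.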
Help is very much appreciated.