Hello,
I am new to Opencl, I use to code using numpy and I am dealing with stack of images. At the end I have 3D array of float.
I made my first customized kernel and it works like a charm and it’s so fast, I love it!
Now I am converting other algorithm and many of them has operation over a given axis.
For instance I am stuck with cumsum along a given axis : I have a 3D set of data (time,row,col), I would like to compute like in numpy : numpy.cumsum(data, axis=0) to perform for every pixel of my 3D stack a cumsum along time
In the Pyopencl documentation : Parallel Algorithms - pyopencl 2022.2.4 documentation, there is an example :
knl = InclusiveScanKernel(context, np.int32, "a+b")
n = 2**20-2**18+5
host_data = np.random.randint(0, 10, n).astype(np.int32)
dev_data = cl_array.to_device(queue, host_data)
knl(dev_data)
assert (dev_data.get() == np.cumsum(host_data, axis=0)).all()
This code works for 1D input data, but I do not know how to input more than 1D data. I’d like to have 3D data as input and perform operation along a given axis.
Is the good method to have two loops in python and enqueue task?
I hope someone could help me and teach the right “opencl” approach for such operation
Thanks,
Jérôme