I used clCreateFromD3D10Texture2DKHR to bind a shared Direct3D texture to a cl_mem. Each frame I copy a texture into this shared texture, then run a Gaussian blur on it. Pseudocode is as follows:

cl_mem g_src_mem = clCreateFromD3D10Texture2DKHR(context, CL_MEM_READ_ONLY,
                                                 sharedTexture, 0, &err);

while (running)
{
    StretchRect(texture, sharedTexture);   // D3D10-side copy into the shared texture

    // ensure the D3D10 copy has finished before OpenCL touches the texture
    clEnqueueAcquireD3D10ObjectsKHR(queue, 1, &g_src_mem, 0, NULL, NULL);

    // ... run the Gaussian blur kernel ...

    clEnqueueReleaseD3D10ObjectsKHR(queue, 1, &g_src_mem, 0, NULL, NULL);
}

Is anything wrong with the pseudocode above? Is explicit synchronization needed? In other words, could OpenCL start using the texture before StretchRect has completed?

I'm very new to the whole OpenCL world, so I'm following some beginner tutorials. I'm trying to combine this and this to compare the time required to add two arrays together on different devices. However, I'm getting confusing results. Since the code is too long to post, I made this GitHub Gist.

On my Mac I have 1 platform with 3 devices. When I assign the j in

Code :

cl_command_queue command_queue = clCreateCommandQueue(context, device_id[j], 0, &ret);

I can set j to any higher number and it still seems to run on the GPU.

When I put all the calculation in a loop, the time measured for all the devices is the same as calculating on the CPU (as I presume).

The times required to do the calculation for all j > 0 are suspiciously close. I wonder whether the kernels are really being run on different devices.

I clearly have no clue about OpenCL, so I would appreciate it if you could take a look at my code and let me know what my mistake(s) are and how I can solve them. Or maybe point me towards a good example that runs a calculation on different devices and compares the times.
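One common cause of suspiciously identical timings is measuring only how long the enqueue call takes rather than the kernel execution itself (OpenCL enqueues are asynchronous), or forgetting a warm-up run so that one-time costs dominate. As a CPU-side sketch of the timing methodology only — plain NumPy, no OpenCL, with hypothetical names — the pattern looks like this; with OpenCL you would call clFinish(command_queue) at the marked point before stopping the timer:

```python
import time
import numpy as np

def time_add(a, b, repeats=10):
    """Time an array addition; discard the first (warm-up) run and
    report the best of `repeats` timed runs."""
    _ = a + b  # warm-up: pays one-time costs (allocation, compilation, transfers)
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        c = a + b
        # With OpenCL, call clFinish(command_queue) HERE before reading the
        # clock; otherwise you only measure how long the enqueue took.
        samples.append(time.perf_counter() - t0)
    return min(samples), c

a = np.random.rand(1_000_000).astype(np.float32)
b = np.random.rand(1_000_000).astype(np.float32)
best, c = time_add(a, b)
print(f"best of 10 runs: {best * 1e3:.3f} ms")
```

Also note that clCreateCommandQueue reports an error through its last argument when the device is invalid; if ret is never checked, an out-of-range j can fail silently, which would explain why any j "seems to run".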

I am new to OpenCL. I usually code with NumPy, and I am dealing with stacks of images, so in the end I have a 3D array of floats.

I made my first customized kernel and it works like a charm and it's so fast, I love it!

Now I am converting other algorithms, and many of them have operations over a given axis.

For instance, I am stuck on cumsum along a given axis: I have a 3D set of data (time, row, col), and I would like to compute, as in NumPy, numpy.cumsum(data, axis=0), i.e., a cumulative sum along time for every pixel of my 3D stack.

In the PyOpenCL documentation (https://documen.tician.de/pyopencl/a...edefined-scans) there is an example:

Code :

import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array
from pyopencl.scan import InclusiveScanKernel
context = cl.create_some_context()
queue = cl.CommandQueue(context)
knl = InclusiveScanKernel(context, np.int32, "a+b")
n = 2**20 - 2**18 + 5
host_data = np.random.randint(0, 10, n).astype(np.int32)
dev_data = cl_array.to_device(queue, host_data)
knl(dev_data)
assert (dev_data.get() == np.cumsum(host_data, axis=0)).all()

This code works for 1D input data, but I do not know how to feed it more than 1D data. I'd like to pass in 3D data and perform the scan along a given axis.
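For what it's worth, here is a plain-NumPy sketch (no PyOpenCL required, function name is mine) of the data layout involved: an axis-0 cumsum over a (time, row, col) stack is just row*col independent 1D scans, one per pixel. Reshaping to (time, row * col) makes that explicit, and it is also how the problem maps onto a GPU — each pixel's time series is scanned independently, in parallel, rather than via Python loops:

```python
import numpy as np

def cumsum_axis0(data):
    """Cumulative sum along axis 0 of a 3D (time, row, col) stack,
    computed as independent 1D scans, one per pixel."""
    t, r, c = data.shape
    flat = data.reshape(t, r * c)   # each column is one pixel's time series
    out = np.empty_like(flat)
    for pixel in range(r * c):      # on a GPU, these scans run in parallel
        out[:, pixel] = np.cumsum(flat[:, pixel])
    return out.reshape(t, r, c)

data = np.random.rand(8, 4, 5).astype(np.float32)
assert np.allclose(cumsum_axis0(data), np.cumsum(data, axis=0))
```

The Python loop here is only to show the decomposition; the point is that after the reshape, the scan axis is a set of independent columns, which is the shape of work a GPU kernel wants.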

Is the right method to have two loops in Python and enqueue a task for each 1D slice?

I hope someone can help me and teach me the right "OpenCL" approach for such an operation.

Thanks,

Jérôme