Using a buffer on multiple devices.

PhilTillet · August 12, 2012, 12:28pm

Hello everybody,

I have searched the web for similar questions but couldn’t find the answer I need…

I am writing a dynamic scheduler (following AMD’s OpenCL Guide) to handle multiple GPUs. However, I am having some trouble with the way OpenCL handles memory…

Basically, I have 5 buffers; I’ll just call them A, B, C, D, E.
I am executing two kernels on two devices:

Device 1 : A = f(B,C) [ does not modify B or C ]
Device 2 : D = f(B,E) [ does not modify B or E ]

I create one host thread per queue, and there is only one queue per device.
The problem is that if Device 1 executes first, Device 2 does not start its task until B is available (i.e. until Device 1 is done). So, in the end, everything is serialized.
I have tried READ_ONLY and WRITE_ONLY buffers to tell the OpenCL implementation that B is not modified, but I get the same behavior.
Is there any way, compatible with both AMD and NVIDIA, to enqueue these two tasks concurrently without having to duplicate B?

Thank you very much!

Edit: my tests were done on an NVIDIA platform.

affie · August 13, 2012, 10:27pm

Just wanted to confirm: when you enqueue the kernel on device 2 (the one that does not modify B or E), you are not passing the event for the kernel enqueued on device 1 (the one that does not modify B or C) in the event_wait_list argument, right?

If no event dependencies are specified, both kernels should execute in parallel. You should take this up with the folks at AMD on their developer forum.

Dithermaster · August 28, 2012, 6:14pm

Did you try using a duplicate of B to see if it works like you want?