Using a buffer on multiple devices.

Hello everybody,

I have searched the web for similar questions but couldn’t find the answer I need…

I am writing a dynamic scheduler (following AMD’s OpenCL Guide) to handle multiple GPUs. However, I am having some trouble with the way OpenCL handles memory…

Basically, I have 5 buffers; I’ll just call them A, B, C, D, E.
I am executing two kernels on two devices:

Device 1 : A = f(B,C) [ does not modify B or C ]
Device 2 : D = f(B,E) [ does not modify B or E ]

I create one host thread per queue, and there is only one queue on each device.
The problem is that, if Device 1 executes first, Device 2 does not execute its task until B is available (i.e. until Device 1 is done). So, in the end, everything ends up being serialized.
I have tried using CL_MEM_READ_ONLY and CL_MEM_WRITE_ONLY buffers to indicate to the OpenCL implementation that B is not modified, but I experienced the same problem…
Is there any AMD-and-NVIDIA-compatible way of concurrently enqueueing these two tasks without having to duplicate B?

Thank you very much !

Edit: my tests were done on an NVIDIA platform.

Just wanted to confirm: when you enqueue the kernel on Device 2 (which does not modify B or E), you are not passing the event from the kernel enqueued on Device 1 (which does not modify B or C) in the event_wait_list argument, correct?

If no event dependencies are specified, both kernels should execute in parallel. You should take this up with the folks at AMD on their developer forum.

Did you try using a duplicate of B to see if it works like you want?