Hello everybody,
I have searched the web for similar questions but couldn’t find the answer I need…
I am writing a dynamic scheduler (following AMD’s OpenCL Guide) to handle multiple GPUs. However, I am running into trouble with the way OpenCL handles memory.
Basically, I have 5 buffers; I’ll just call them A, B, C, D and E.
I am executing two kernels on two devices:
Device 1 : A = f(B,C) [ does not modify B or C ]
Device 2 : D = f(B,E) [ does not modify B or E ]
I create one host thread per queue, and there is only one queue per device.
The problem is that if Device 1 executes first, Device 2 does not start its task until B is available (i.e. until Device 1 is done). So, in the end, everything is serialized.
I have tried using READ_ONLY and WRITE_ONLY buffers (CL_MEM_READ_ONLY / CL_MEM_WRITE_ONLY) to tell the OpenCL implementation that B is not modified, but I see the same behavior.
Is there any way, compatible with both AMD and NVIDIA, to enqueue these two tasks concurrently without having to duplicate B?
Thank you very much!
Edit: my tests were done on an NVIDIA platform.