multiple command-queues in the same context for one device

Hello everyone!
I have a problem implementing two command queues in the same context. The goal is to develop an application that can work in a “two streams” mode: while the first queue is executing the kernel, the second is copying data from host to device.
It can be implemented like this:


...
cl_command_queue queue[2];
...
for (int i = 0; i < n; i++)
{
  ...
  clEnqueueWriteBuffer(queue[0], buf,***);
  clEnqueueNDRangeKernel(queue[0], kernel,**);
  // then refresh buf on the host
  if (i < n-1)
    clEnqueueWriteBuffer(queue[1], buf,***);
  ...
}

It’s a humble piece of code, written just to present the idea.
So, is it possible to write this kind of program with OpenCL? I know that CUDA lets you develop algorithms like this.

Any help is greatly appreciated!

The goal is to develop an application that can work in a “two streams” mode: while the first queue is executing the kernel, the second is copying data from host to device.

It is possible to create an application with multiple threads. However, you need to take care not to introduce any race conditions. What you describe as “two streams mode” sounds a lot like double buffering.

Notice that when double buffering is used, you must duplicate your surfaces/buffers/images so that, for example, while the NDRange kernel is executing and reading data from buffer0, a concurrent operation is writing data for the next iteration into buffer1. If both of them were reading from and writing to the same buffer, you would have a race condition and the output would be incorrect most of the time.
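To make the buffer alternation concrete, here is a minimal sketch in plain C with no OpenCL calls. It only models which of the two buffers each iteration touches. The names compute_slot, upload_slot and print_plan are hypothetical helpers invented for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helpers for illustration only; none of this is OpenCL API. */

/* Buffer the kernel reads from in iteration i. */
int compute_slot(int i)
{
    assert(i >= 0);
    return i % 2;
}

/* Buffer that the upload for iteration i + 1 writes into
   while the kernel of iteration i is running. */
int upload_slot(int i)
{
    assert(i >= 0);
    return (i + 1) % 2;
}

/* Print which buffer each operation touches over n iterations. */
void print_plan(int n)
{
    for (int i = 0; i < n; i++) {
        printf("iteration %d: kernel reads buffer%d", i, compute_slot(i));
        if (i < n - 1)
            printf(", upload for iteration %d writes buffer%d",
                   i + 1, upload_slot(i));
        printf("\n");
    }
}
```

In every iteration the kernel and the concurrent upload use different buffers, and the buffer filled during iteration i is the one the kernel reads in iteration i + 1. With a single buffer, both operations would target the same memory, which is the race described above.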

Using two different buffers should be pretty self-explanatory, but as far as I understood, the question was: does he need to use two queues, or would it be possible to use double buffering with only one?

Using two different buffers should be pretty self-explanatory

I pointed that out because his example code used only one buffer.

Does he need to use two queues, or would it be possible to use double buffering with only one?

Command queues by default execute commands in order. That means that a command does not start executing until all previously enqueued commands have completed. In that case double buffering cannot improve performance.

To implement double buffering with a chance of performance gains you need either an out-of-order queue or two queues. In either case you will need to set up event dependencies to prevent race conditions. If you are running everything on the same device, it is unlikely (but still possible) that you will see performance gains.
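The two-queue variant with event dependencies could look roughly like the sketch below. It is not a definitive implementation. The function name run_double_buffered and all of its parameters are hypothetical. It assumes both queues were created in the same context for the same device, that kernel argument 0 is the input buffer, and that the host array holds all n chunks back to back and is not modified until the function returns.

```c
#include <stddef.h>
#include <CL/cl.h>   /* <OpenCL/opencl.h> on macOS */

/* Hypothetical helper: uploads n chunks on copy_q and runs `kernel` on each
   chunk on exec_q, alternating between buf[0] and buf[1]. */
cl_int run_double_buffered(cl_command_queue copy_q, cl_command_queue exec_q,
                           cl_kernel kernel, cl_mem buf[2],
                           const char *host, size_t chunk_bytes,
                           size_t global_size, int n)
{
    cl_event kernel_done[2] = { NULL, NULL };  /* last kernel that read buf[s] */
    cl_int err = CL_SUCCESS;

    for (int i = 0; i < n; i++) {
        int s = i % 2;
        cl_event write_done = NULL, done = NULL;
        cl_uint nwait = kernel_done[s] ? 1 : 0;

        /* Non-blocking upload of chunk i. It must not overwrite buf[s]
           before the kernel of iteration i - 2 has finished reading it. */
        err = clEnqueueWriteBuffer(copy_q, buf[s], CL_FALSE, 0, chunk_bytes,
                                   host + (size_t)i * chunk_bytes,
                                   nwait, nwait ? &kernel_done[s] : NULL,
                                   &write_done);
        if (err != CL_SUCCESS)
            break;
        clFlush(copy_q);  /* submit, so the other queue can wait on the event */

        /* The kernel must not start before its input has been uploaded. */
        err = clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf[s]);
        if (err == CL_SUCCESS)
            err = clEnqueueNDRangeKernel(exec_q, kernel, 1, NULL, &global_size,
                                         NULL, 1, &write_done, &done);
        clReleaseEvent(write_done);
        if (err != CL_SUCCESS)
            break;
        clFlush(exec_q);

        if (kernel_done[s])
            clReleaseEvent(kernel_done[s]);
        kernel_done[s] = done;
    }

    /* Wait for everything that was enqueued, then clean up. */
    clFinish(copy_q);
    clFinish(exec_q);
    for (int s = 0; s < 2; s++)
        if (kernel_done[s])
            clReleaseEvent(kernel_done[s]);
    return err;
}
```

With this ordering, the upload of chunk i + 1 is free to run while the kernel for chunk i is still executing, because the two commands sit in different queues and touch different buffers. Whether they actually overlap depends on the device and driver.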

Thanks! I’ll take your advice into account!