Parallel Dispatch

I am using the beta support for OpenCL 2.0 on NVIDIA and targeting high-end GPUs like the 1080 Ti. In my compute pipeline, I sometimes need to dispatch work to image-process several relatively small images independently. In theory, these images should be able to be processed in parallel on a single GPU, because the number of work-groups for a single image won’t saturate all of the GPU’s compute units.

  1. Is this possible in OpenCL? Does this have a name in OpenCL?

  2. If it is possible, is using multiple queues for a single device the only way to do this? Or will the driver look at the event wait list passed to each enqueue call and decide which kernels can be processed in parallel?

  3. Do I need CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE?

You either create additional queues or, yes, use a single out-of-order queue if you want more control.
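
For example, a minimal host-side sketch of both options (it assumes a context and device already exist, skips error handling, and the helper name create_queues is purely illustrative):

[CODE]
#include <CL/cl.h>

/* Sketch only: ctx and dev are assumed to already exist. */
static void create_queues(cl_context ctx, cl_device_id dev)
{
    cl_int err;

    /* Option A: two ordinary in-order queues. Kernels enqueued on
     * different queues have no implicit ordering, so the driver is
     * free to schedule them concurrently. */
    cl_command_queue q0 = clCreateCommandQueueWithProperties(ctx, dev, NULL, &err);
    cl_command_queue q1 = clCreateCommandQueueWithProperties(ctx, dev, NULL, &err);

    /* Option B: one out-of-order queue. Ordering is then expressed
     * explicitly through the event wait list of each enqueue call. */
    cl_queue_properties props[] = {
        CL_QUEUE_PROPERTIES, CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE,
        0
    };
    cl_command_queue ooo = clCreateCommandQueueWithProperties(ctx, dev, props, &err);

    /* ... enqueue work here ... */

    clReleaseCommandQueue(q0);
    clReleaseCommandQueue(q1);
    clReleaseCommandQueue(ooo);
}
[/CODE]

Either way, the spec only allows the driver to overlap independent work; it does not guarantee concurrent execution.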

Although, given that these are

relatively small images,

it may be wiser to use an image array and process the whole batch in a single kernel run.
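
Something along these lines (a device-side sketch only; the kernel name and the per-pixel operation are placeholders, and image arrays require all slices to share the same size and format):

[CODE]
/* One kernel launch covers the whole batch by using a 2D image array
 * and a 3D NDRange: (x, y, image index). */
__kernel void process_batch(__read_only  image2d_array_t src,
                            __write_only image2d_array_t dst)
{
    const sampler_t smp = CLK_NORMALIZED_COORDS_FALSE |
                          CLK_ADDRESS_CLAMP_TO_EDGE |
                          CLK_FILTER_NEAREST;

    int x     = get_global_id(0);
    int y     = get_global_id(1);
    int layer = get_global_id(2);   /* which image in the batch */

    float4 px = read_imagef(src, smp, (int4)(x, y, layer, 0));
    /* Placeholder per-pixel operation: invert the colour. */
    write_imagef(dst, (int4)(x, y, layer, 0), (float4)(1.0f) - px);
}
[/CODE]

You would then launch it once with a 3D global work size whose third dimension is the number of images in the batch.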

[QUOTE=Salabar;42647]You either create additional queues or, yes, use a single out-of-order queue if you want more control.

[/QUOTE]

Is there a flag to query support for this in OpenCL? How can I be sure it is really dispatching in parallel?

Also, if you are using multiple queues, is it correct that you don’t really need wait events?

There is no API mechanism to check whether the hardware can execute multiple dispatches concurrently; the API only lets you submit work in a way that is allowed to run together. You then have to check the hardware vendor’s specs to find a GPU that can actually do it.
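
What you can do is query whether the device supports out-of-order host queues, and use profiling events to see whether two kernels actually overlapped on the device. A rough sketch (it assumes the queue was created with CL_QUEUE_PROFILING_ENABLE and CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE; kernel_a, kernel_b and the helper name are placeholders, and error handling is omitted):

[CODE]
#include <CL/cl.h>
#include <stdio.h>

static void check_overlap(cl_device_id dev, cl_command_queue q,
                          cl_kernel kernel_a, cl_kernel kernel_b,
                          const size_t gws[2])
{
    /* Out-of-order host queues are an optional, queryable capability. */
    cl_command_queue_properties caps = 0;
    clGetDeviceInfo(dev, CL_DEVICE_QUEUE_PROPERTIES, sizeof(caps), &caps, NULL);
    printf("Out-of-order host queues: %s\n",
           (caps & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE) ? "yes" : "no");

    /* Submit two independent kernels with no event dependencies. */
    cl_event ea, eb;
    clEnqueueNDRangeKernel(q, kernel_a, 2, NULL, gws, NULL, 0, NULL, &ea);
    clEnqueueNDRangeKernel(q, kernel_b, 2, NULL, gws, NULL, 0, NULL, &eb);
    clFinish(q);

    /* Compare profiling timestamps: intersecting time ranges mean the
     * two kernels really did execute concurrently on the device. */
    cl_ulong a0, a1, b0, b1;
    clGetEventProfilingInfo(ea, CL_PROFILING_COMMAND_START, sizeof(a0), &a0, NULL);
    clGetEventProfilingInfo(ea, CL_PROFILING_COMMAND_END,   sizeof(a1), &a1, NULL);
    clGetEventProfilingInfo(eb, CL_PROFILING_COMMAND_START, sizeof(b0), &b0, NULL);
    clGetEventProfilingInfo(eb, CL_PROFILING_COMMAND_END,   sizeof(b1), &b1, NULL);
    puts((a0 < b1 && b0 < a1) ? "Kernels overlapped" : "Kernels ran back to back");

    clReleaseEvent(ea);
    clReleaseEvent(eb);
}
[/CODE]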