Parallel Dispatch

PabloBot · September 15, 2017, 12:33pm

I am using NVIDIA's beta support for OpenCL 2.0 and targeting high-end GPUs such as the GTX 1080 Ti. In my compute pipeline, I sometimes need to dispatch work that processes relatively small images independently of each other. In theory, I think these images could be processed in parallel on a single GPU, because the number of work-groups for a single image won't saturate all of the GPU's compute units.

Is this possible in OpenCL? Does this have a name in OpenCL?
If it is possible, is using multiple queues for a single device the only way to do this? Or will the driver look at the “waitEventList” and decide which kernels can be processed in parallel?
Do I need CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE?

Salabar · September 15, 2017, 11:12pm

You either create additional queues or, yes, a single out of order queue if you want more control.

Although, "relatively small images" means it may be wiser to use an image array and process the whole batch in a single kernel run.
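A rough sketch of the sizing behind that suggestion, written in plain C with no OpenCL headers so it stands alone. It assumes the batch is launched as one 3D NDRange whose third dimension selects the image (the array layer). `round_up`, `batch_global_size` and the 16x16 work-group size used below are hypothetical choices for illustration, not something stated in this thread.

```c
#include <stddef.h>

/* Round n up to the next multiple of m (m must be non-zero). */
static size_t round_up(size_t n, size_t m)
{
    return ((n + m - 1) / m) * m;
}

/*
 * Fill global[3] for one launch covering `count` images of width x height.
 * Dimensions 0 and 1 are padded to a multiple of the work-group size;
 * dimension 2 is the image index within the batch (local size 1 there).
 * Returns the total number of work-groups in the launch.
 */
static size_t batch_global_size(size_t width, size_t height, size_t count,
                                size_t local_x, size_t local_y,
                                size_t global[3])
{
    global[0] = round_up(width, local_x);
    global[1] = round_up(height, local_y);
    global[2] = count;
    return (global[0] / local_x) * (global[1] / local_y) * count;
}
```

For example, one 100x60 image with 16x16 work-groups is 7 x 4 = 28 work-groups, and a batch of 8 such images gives 224 work-groups in a single launch. Because of the padding, the kernel would have to bounds-check x and y, and it would take the image index from `get_global_id(2)`.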

PabloBot · September 28, 2017, 1:13pm

[QUOTE=Salabar;42647]You either create additional queues or, yes, a single out of order queue if you want more control.

[/QUOTE]

Is there a flag to query support for this in OpenCL? How can I be sure it is really dispatching in parallel?

Also, if you are using multiple queues, is it correct that you don't really need wait events?

Dithermaster · September 29, 2017, 7:55am

There is no API mechanism to check whether the hardware can execute multiple dispatches at once; the API only lets you submit work that is able to run together. Check the hardware vendor's specs to find a GPU that can do it.
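One way to probe this empirically, offered as a sketch rather than something proposed in this thread: create the queues with `CL_QUEUE_PROFILING_ENABLE`, read each kernel event's `CL_PROFILING_COMMAND_START` and `CL_PROFILING_COMMAND_END` with `clGetEventProfilingInfo`, and compare the two intervals. The helper below is plain C and takes the timestamps (in nanoseconds) as values that have already been read; `kernel_span`, `spans_overlap` and `overlap_ns` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Device-side execution interval of one kernel, in nanoseconds. */
typedef struct {
    uint64_t start; /* value read for CL_PROFILING_COMMAND_START */
    uint64_t end;   /* value read for CL_PROFILING_COMMAND_END   */
} kernel_span;

/* True if the two kernels were executing at the same time for any
 * non-zero duration. Back-to-back intervals do not count as overlap. */
static bool spans_overlap(kernel_span a, kernel_span b)
{
    return a.start < b.end && b.start < a.end;
}

/* Length of the shared interval in nanoseconds, or 0 if there is none. */
static uint64_t overlap_ns(kernel_span a, kernel_span b)
{
    uint64_t lo = a.start > b.start ? a.start : b.start;
    uint64_t hi = a.end < b.end ? a.end : b.end;
    return hi > lo ? hi - lo : 0;
}
```

Overlapping intervals suggest the two kernels ran concurrently. Strictly serialized intervals on a small workload suggest the driver or hardware did not overlap them. The timestamps are only meaningful to compare for events that ran on the same device.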