Async kernel execution and data copy

Hi all,

Is it possible to copy data to the GPU while a kernel is executing? Since there is no concurrent kernel execution so far, this would be better than nothing…

If you have an out-of-order command queue, then OpenCL is free to do this if the hardware and runtime support it. In-order queues could also legally do this if they can prove that the memory model will remain consistent. However, this optimization is entirely up to the vendor, so there is no specific way to request it (short of specifying an out-of-order command queue).
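To see why letting the runtime overlap copies with kernels is worth having, here is a minimal back-of-envelope model in plain C. It is a toy simulation, not OpenCL API code, and the function names are hypothetical. It assumes the data is split into chunks, each needing a host-to-device copy followed by a kernel that reads that chunk; that the device has one copy (DMA) engine and one compute engine; and that every chunk has its own device buffer, so there are no reuse hazards. An in-order schedule serializes everything, while an out-of-order queue on capable hardware can run copy i+1 while kernel i executes.

```c
#include <stddef.h>

/* Toy model (hypothetical, not an OpenCL API): chunk i needs a copy of
   copy_ms[i] followed by a kernel of kernel_ms[i] that depends on that copy. */

/* In-order execution: every command waits for the previous one to finish. */
double in_order_total(const double *copy_ms, const double *kernel_ms, size_t n)
{
    double t = 0.0;
    for (size_t i = 0; i < n; ++i) {
        t += copy_ms[i];   /* copy i runs with the compute engine idle */
        t += kernel_ms[i]; /* kernel i starts only after copy i completes */
    }
    return t;
}

static double max2(double a, double b) { return a > b ? a : b; }

/* Overlapped execution, assuming one copy engine and one compute engine:
   copy i waits only for the copy engine, while kernel i waits for the
   compute engine and for its own copy (the event dependency). */
double out_of_order_total(const double *copy_ms, const double *kernel_ms, size_t n)
{
    double copy_free = 0.0;    /* time at which the copy engine becomes idle */
    double compute_free = 0.0; /* time at which the compute engine becomes idle */
    for (size_t i = 0; i < n; ++i) {
        double copy_done = copy_free + copy_ms[i];
        copy_free = copy_done;
        double start = max2(compute_free, copy_done);
        compute_free = start + kernel_ms[i];
    }
    return compute_free; /* the last kernel finishes last */
}
```

With made-up timings of 4 ms per copy and 6 ms per kernel over 3 chunks, in_order_total returns 30 and out_of_order_total returns 22, which matches c + (N - 1) * max(c, k) + k for uniform chunks. Whether a real runtime achieves anything like this is, as the reply says, up to the vendor.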

I have CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE as one of the queue properties, so I think I’m good.

So what you’re saying is that there’s no way of controlling it and it depends on the implementation? I’m confused, because in CUDA this can be done, but you need to define a couple of streams and, as far as I know, it only works with mapped memory…

That’s correct. The OpenCL model is inherently asynchronous (hence the clEnqueue… commands), so if the implementation is well optimized and the hardware supports it, you should get that automatically.

How can I check whether the hardware supports it? Sorry, just curious, and I’m such a newbie…

You’d have to ask the vendors what the particular device supports. I know many GPUs have at least some DMA capability, but whether that is used by OpenCL is a completely different issue. Given how new OpenCL is, I doubt this optimization has been done yet.
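OpenCL does offer one related standard query: clGetDeviceInfo with CL_DEVICE_QUEUE_PROPERTIES reports whether a device's queues may execute commands out of order. A set bit does not prove that copies will actually overlap kernels, so asking the vendor is still the real answer. What follows is a minimal sketch in plain C so it compiles without the SDK; the MY_* macros and both helper functions are hypothetical stand-ins, and the real OpenCL call is shown only in a comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical local stand-ins so this compiles without the OpenCL SDK.
   cl.h defines CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE as (1 << 0) and
   CL_QUEUE_PROFILING_ENABLE as (1 << 1); cl_command_queue_properties is
   a 64-bit bitfield. In real code, include <CL/cl.h> and use those names. */
#define MY_QUEUE_OUT_OF_ORDER_BIT ((uint64_t)1 << 0)
#define MY_QUEUE_PROFILING_BIT    ((uint64_t)1 << 1)

/* In a real program, device_props would be filled in with:
 *   cl_command_queue_properties device_props;
 *   clGetDeviceInfo(device, CL_DEVICE_QUEUE_PROPERTIES,
 *                   sizeof device_props, &device_props, NULL);
 */

/* True if the device reports that its queues may run commands out of order. */
bool device_allows_out_of_order(uint64_t device_props)
{
    return (device_props & MY_QUEUE_OUT_OF_ORDER_BIT) != 0;
}

/* Request out-of-order execution only when the device advertises it:
   asking for an unsupported property makes clCreateCommandQueue fail
   with CL_INVALID_QUEUE_PROPERTIES. */
uint64_t choose_queue_properties(uint64_t device_props)
{
    return device_allows_out_of_order(device_props) ? MY_QUEUE_OUT_OF_ORDER_BIT : 0;
}
```

For example, a device that reports only profiling support gets a properties value of 0 from choose_queue_properties, i.e. a plain in-order queue.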