When is a command queue actually executed?

All operations which are enqueued into the same command queue are executed as one group, but at which point exactly is a command queue executed?
Is it when the first operation in the queue is enqueued? This is important in the context of the following kind of situations:

  1. Create command queue CQ.
  2. Enqueue operations Ops.
  3. Wait for Ops to have finished.
  4. if (condition on the output of Ops)
       Enqueue operation A in CQ.
     else
       Enqueue operation B in CQ.

This seems impossible, since CQ has to be created as a whole before Ops can be executed, therefore one has to know in advance whether A or B is a part of it. Is this correct? That would make it quite cumbersome to write conditional statements and loops, since a new command queue would have to be created every time.

I think you are mistaking the command queue for a one-time-use item.

Apologies in advance, as I’m in a bit of a rush to really recall the proper terminology here. From my understanding, there are several commands which cause the command queue to be executed (I do not remember all of them - you’ll need to check the spec). In essence, though, you are building a number of items onto the command queue, and then the queue is executed via one of the commands.

In essence, it is completely possible to put host-side loops and branching statements in.

For example,

  1. Create command queue (command_queue).
  Do some loop
    2. Enqueue kernel Ops.
    3. Call clFinish(command_queue) [this executes the kernels enqueued up to that point and, from what I understand, makes the host wait until the command queue has finished executing].
    4. Get condition from kernel Ops.
    5. if (condition)
         5a. Enqueue operation A in command_queue.
       else
         5b. Enqueue operation B in command_queue.
       endif
  End do

You do not have to keep creating new command queues - you can keep adding onto the one that exists. I’d recommend re-reading the command_queue section in the spec. From personal experience, my first use of OpenCL was rather discouraging, because I was not loading up the command queue effectively. I was adding one kernel at a time, executing, and verifying the results (which is probably a good idea for debugging purposes, but not for performance). Eventually, I was enqueueing hundreds of kernels before executing the command queue via clFinish(command_queue).

I hope that this makes sense, but if not, I’ll try and clarify later.

I think it’s important that we all use the standard terminology, or otherwise we won’t understand each other.

A command queue is not executed. A command queue is a container of commands and the only things that are executed are the commands.

Command queues may or may not start executing commands as soon as they are enqueued. It’s an implementation detail that is hidden from the application. If you want to ensure that the commands enqueued so far start to execute, you have to call clFlush(queue), which submits them to the device but does not wait for them to complete. It is not recommended that you call this function unless you have a good reason.

This seems impossible, since CQ has to be created as a whole before Ops can be executed, therefore one has to know in advance whether A or B is a part of it. Is this correct?

No, it’s not. You create your queue first, then enqueue any commands you want in it, one by one. Just like the example you gave. If at any time you need to enqueue some commands conditionally on the result of previous commands, you will have to call an operation like clEnqueueReadBuffer() to find out what was written out by those previous commands. That’s all.
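
To make the pattern concrete without needing an OpenCL runtime, here is a minimal sketch in plain C of a toy in-order queue. Everything in it (toy_queue, toy_enqueue, toy_finish, ops, op_a, op_b, run_loop, the threshold 5) is made up for illustration and is not part of the OpenCL API. toy_enqueue only records a command, roughly the way the clEnqueue* calls do, and toy_finish stands in for a blocking point such as clFinish() or a blocking clEnqueueReadBuffer(). Unlike a real implementation, which may start commands earlier, the toy never runs anything before toy_finish is called.

```c
#include <assert.h>

/* Toy stand-in for an in-order command queue. NOT the OpenCL API:
   every name here is hypothetical and exists only for illustration. */
#define TOY_MAX_PENDING 16

typedef void (*toy_command)(int *device_value);

typedef struct {
    toy_command pending[TOY_MAX_PENDING]; /* enqueued, not yet executed */
    int count;
} toy_queue;

/* Analogous to a clEnqueue* call: records the command and returns. */
static void toy_enqueue(toy_queue *q, toy_command cmd) {
    assert(q->count < TOY_MAX_PENDING);
    q->pending[q->count++] = cmd;
}

/* Analogous to a blocking point (clFinish or a blocking read):
   runs all pending commands in order, then empties the pending list. */
static void toy_finish(toy_queue *q, int *device_value) {
    for (int i = 0; i < q->count; ++i) {
        q->pending[i](device_value);
    }
    q->count = 0;
}

static void ops(int *v)  { *v += 3; } /* stands in for kernel Ops  */
static void op_a(int *v) { *v *= 2; } /* stands in for operation A */
static void op_b(int *v) { *v -= 1; } /* stands in for operation B */

/* One queue, created once and reused by every iteration. */
int run_loop(int iterations) {
    toy_queue q = {0};
    int device_value = 0;

    for (int i = 0; i < iterations; ++i) {
        toy_enqueue(&q, ops);
        toy_finish(&q, &device_value);   /* wait for Ops to complete   */
        int host_copy = device_value;    /* "read back" the result     */
        /* Decide on the host, then enqueue into the SAME queue. */
        toy_enqueue(&q, host_copy < 5 ? op_a : op_b);
    }
    toy_finish(&q, &device_value);       /* drain the last A or B      */
    return device_value;
}
```

With these made-up operations, run_loop(3) returns 10. The A or B enqueued at the end of one iteration simply runs at the next blocking point, ahead of the next Ops, because the queue is in-order.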

Do not create and destroy queues frequently. Ever :) There’s never a good reason to do that.

Thanks a lot, that’s very useful info!

Why is calling clFlush() not recommended? And what’s the preferred alternative (clFinish, clWaitForEvents, ...)?

I shouldn’t have said anything :)

clFlush() has a small cost associated with it. It’s fine to call it if you have a good reason. That said, a lot of applications probably only need a blocking call to clEnqueueRead{Buffer,Image}() to read back the data.

Hah, OK, so basically your program should rely on queues and blocking calls, and OpenCL will take care of the rest under the hood in an optimal fashion…

Thanks again!