What is clEnqueueWaitForEvents for?

Can anyone explain the purpose of this function to me?
After exploring clFinish, clFlush and clWaitForEvents, I’m not really sure where clEnqueueWaitForEvents fits in.
Is it blocking?
Why do I need to specify a wait on a command queue instead of just passing the list of events to complete before a call?
Thanks.

It isn’t very clear to me either.

My understanding is:
clWaitForEvents is blocking.
clEnqueueWaitForEvents blocks at the next enqueued command in in-order mode and is pretty much useless in out-of-order mode.

One motivation for introducing clEnqueueWaitForEvents in the API could have been to decrease contention on multi-device systems… but this is only a wild guess.

My understanding is that clWaitForEvents is blocking on the host, meaning that it is a way to ensure that a set of enqueued commands has finished before the host continues.
clEnqueueWaitForEvents, on the other hand, returns immediately. Instead it causes commands added to the queue after it to wait until all the events listed in clEnqueueWaitForEvents have finished. I guess it is a form of selective barrier.

Regarding in-order and out-of-order queues, I believe that clEnqueueWaitForEvents is useless in an in-order queue, since every enqueued command always waits for prior commands to finish before executing (can someone confirm this?), while an out-of-order queue can launch commands whenever there are resources available. It is thus important to enforce ordering of dependent commands in some way. Now that I think about it, clEnqueueWaitForEvents appears to be a convenient way to avoid adding the list of events to the wait list of every subsequent clEnqueue* call.

But the same thing can be achieved by passing in a list of events that need to complete before a certain command can execute. In other words, this command will wait anyway for the events in the list. This holds true for both in-order and out-of-order command queues. So why do I need to explicitly specify a wait? Sounds a bit like a NOOP to me.
Maybe I’m missing the point altogether here.

Ok I can think of this scenario:

  1. Have an out of order command queue.
  2. Send in a bunch of independent commands.
  3. clEnqueueWaitForEvents.
  4. Send in a second bunch of commands that are independent of each other but rely on the first group.

The advantage of this approach is that you don’t need to explicitly specify an event list for each command in the second group, so the OpenCL scheduler can take better advantage of the HW resources by rearranging the commands inside each group.
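The scenario above can be sketched with a toy model in C. This is not OpenCL code: `ToyCommand`, `toy_can_start` and `MAX_WAIT` are hypothetical names invented for illustration. The rule it encodes is the behaviour described in this thread, not a quote from the specification: a command may start only once every event named by an earlier wait marker in the same queue has completed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAIT 8

/* One entry in a toy out-of-order queue (hypothetical, not an OpenCL type). */
typedef struct {
    bool   is_wait_marker;     /* true: stands in for clEnqueueWaitForEvents */
    int    wait_ids[MAX_WAIT]; /* queue indices of the commands it waits on  */
    size_t num_waits;          /* number of valid entries in wait_ids        */
} ToyCommand;

/* Returns true if queue[idx] is allowed to start, where done[i] is true
 * once queue[i] has completed. Ordinary commands impose no ordering on
 * each other (out-of-order queue); only wait markers placed earlier in
 * the queue hold later commands back. */
bool toy_can_start(const ToyCommand *queue, size_t idx, const bool *done)
{
    for (size_t i = 0; i < idx; ++i) {
        if (!queue[i].is_wait_marker)
            continue;
        assert(queue[i].num_waits <= MAX_WAIT);
        for (size_t w = 0; w < queue[i].num_waits; ++w) {
            if (!done[queue[i].wait_ids[w]])
                return false; /* an awaited event is still pending */
        }
    }
    return true;
}
```

With two commands in the first group, a marker that waits on both, and two commands in the second group, the first group can start in any order, while neither command of the second group is eligible until both of the first have completed. No command in the second group carries its own event list.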

I can imagine situations where an explicit list of cl_event objects is not available at the time of a clEnqueueNDRangeKernel call. Or consider, for example, an application using two different libraries that each wrap a set of OpenCL kernels, where I want the application to first perform some operations provided by library ‘lib1’ and then some operations in ‘lib2’. It will be sufficient to gather the events from ‘lib1’, call clEnqueueWaitForEvents on them, and after that perform the operations required by the second library. The gain is that I won’t have to explicitly send the wait list to each and every kernel launch in the second library, and there is no need for the global synchronization of ALL running kernels that a barrier would impose.

That’s true. In-order queues don’t cause race conditions.

Regarding in-order + clEnqueueWaitForEvents = useless:
It can still be useful if you want to delay the blocking on the host thread to the next command. The question is: do you want that?
IMO there is not much difference whether you have an in-order or an out-of-order queue. You can use both versions (clEnqueueWaitForEvents or clWaitForEvents) in both cases.

clWaitForEvents would produce the same result

(or not?)

Look at clEnqueueWaitForEvents as a barrier with a customized set of command dependencies rather than all outstanding commands on the queue. It is different from its host-side counterpart, clWaitForEvents, in that it is non-blocking to the host.

The usefulness of clEnqueueWaitForEvents on an in-order queue is minimal, but it can provide the ability to synchronize the queue based on events from other queues in the same context. This cannot be done with a barrier.
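To make the cross-queue case concrete, here is a toy timing model in C. It uses no OpenCL calls: `ToyEntry` and `toy_inorder_finish_times` are hypothetical names, and the model assumes the host enqueues everything at time 0 without blocking. A wait marker is represented only by the time at which the last event it waits on (for example, one produced on another queue) completes.

```c
#include <assert.h>
#include <stddef.h>

/* One entry of a toy in-order queue (hypothetical, not an OpenCL type). */
typedef struct {
    double duration;   /* run time of the command; 0 for a wait marker       */
    double wait_until; /* wait marker: completion time of the last awaited
                          event from another queue; 0 for ordinary commands  */
} ToyEntry;

/* Fills finish[i] with the completion time of each entry of an in-order
 * queue that starts executing at time 0. Only the queue's timeline is
 * delayed by a wait marker; the host is assumed to have enqueued all
 * entries up front and moved on. */
void toy_inorder_finish_times(const ToyEntry *q, size_t n, double *finish)
{
    assert(q != NULL && finish != NULL);
    double t = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (q[i].wait_until > t)
            t = q[i].wait_until; /* queue stalls until the event completes */
        t += q[i].duration;
        finish[i] = t;
    }
}
```

For a queue holding a 2-unit command, a marker whose awaited event completes at time 10, and a 3-unit command, the model gives finish times of 2, 10 and 13. If the awaited event had already completed at time 1, the marker costs nothing and the times become 2, 2 and 5.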

If an application does not have dependencies between most of its queued commands, then an out-of-order queue can prove beneficial. The underlying implementation is free to optimize the execution of the queued commands in order to improve command throughput. This is where clEnqueueWaitForEvents is most useful.

Actually no. Well, it would produce the same result but the path to that result is different.
clWaitForEvents blocks the host until the events signal completion.
clEnqueueWaitForEvents blocks a COMMAND QUEUE until the events signal completion. This is useful in case you’re issuing “waits” from the host to several queues but the host needn’t stop at all. Instead it carries on issuing other commands.

The purpose of this function is very clear once you start managing several command queues that need to wait for each other at some point.

I hope this is clear now. Thanks to all for the replies.