Global synchronization

Hello everyone! As is well known, the OpenCL barrier() function only synchronizes work items within a single workgroup, and there is no direct way to synchronize across workgroups. If it is possible at all, what is the best approach to global synchronization today? Atomics, OpenCL 2.0 features, something else?

Because the runtime may run some workgroups to completion before starting others (when the number of workgroups far exceeds what the hardware can run at once), there are no global synchronization functions. In OpenCL 1.x the solution is to use a series of kernels scheduled on the same in-order command queue; each kernel finishes all of its work items before the next one starts. I'm not sure whether OpenCL 2.0 dynamic parallelism (device-side enqueue) would do what you want, but you could study it.