How does OpenCL handle loops?

Consider the following situation:

global_work_size = (128,128)
local_work_size = (16,16)
The result is a workgroup size of 256 (16x16 work items per workgroup) and (128/16) x (128/16) = 64 workgroups.


__kernel void example_kernel(__global int * p) {
   for(int i = 0; i < 10; i++) {
      ...
   }
}

Is it correct that the loop is workgroup-wide, and that each workgroup executes its loop asynchronously? So workgroup (0,1), for example, could finish sooner than workgroup (2,3)?

How does OpenCL handle loops in workgroups?
Can anyone explain that?

It is correct that workgroup (0,1) may finish before workgroup (2,3). The latter may not even have started when the former finishes, or vice versa.

Not sure what you mean by “how does opencl handle loops in workgroups?”. The loop is not shared across the workgroup: each individual work item (thread) runs all iterations of the loop itself, with its own private loop counter. When all work items in a workgroup have finished, that workgroup is finished. When all workgroups have finished, the kernel has also finished.
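
To make the execution model concrete, here is a rough sketch in plain C (not OpenCL, and not how a real device schedules work) that emulates the 128x128 / 16x16 launch serially. The names `example_kernel_item` and `run_ndrange` are made up for illustration, and the `+= 1` loop body is only a placeholder for the elided `...` in the kernel. The point is that the 10-iteration loop sits inside each work item, not around the workgroup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel body. Each work item runs its own
   copy of the loop with its own private counter i. */
static void example_kernel_item(int *p, size_t gx, size_t gy, size_t width)
{
    for (int i = 0; i < 10; i++) {
        p[gy * width + gx] += 1;  /* placeholder for the elided loop body */
    }
}

/* Serial emulation of a 2D launch. A real device may run workgroups in any
   order and concurrently; this fixed order is just one legal schedule.
   Returns the number of workgroups that were executed. */
size_t run_ndrange(int *p, size_t gw, size_t gh, size_t lw, size_t lh)
{
    /* Assumption: the global size is a multiple of the local size. */
    assert(gw % lw == 0 && gh % lh == 0);

    size_t groups = 0;
    for (size_t wy = 0; wy < gh / lh; wy++) {          /* workgroup row    */
        for (size_t wx = 0; wx < gw / lw; wx++) {      /* workgroup column */
            for (size_t ly = 0; ly < lh; ly++) {       /* work items in    */
                for (size_t lx = 0; lx < lw; lx++) {   /* this workgroup   */
                    example_kernel_item(p, wx * lw + lx, wy * lh + ly, gw);
                }
            }
            groups++;  /* workgroup is done once all its work items are */
        }
    }
    return groups;
}
```

Calling `run_ndrange(buf, 128, 128, 16, 16)` on a zeroed buffer of 128*128 ints walks through 64 workgroups, and every element ends up incremented 10 times, once per loop iteration of the work item that owns it.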