Consider the following situation:
global_work_size = (128,128)
local_work_size = (16,16)
This gives a work-group size of 256 (16x16 work-items per work-group) and 64 work-groups (8x8).
__kernel void example_kernel(__global int *p) {
    for (int i = 0; i < 10; i++) {
        /* ... */
    }
}
Is it correct that the loop runs work-group-wide, i.e. that one loop is executed per work-group and the work-groups run asynchronously? So work-group (0,1), for example, could finish before work-group (2,3)?
How does OpenCL handle loops inside work-groups?
Can anyone explain this?