If I understand correctly, the barrier function, whatever argument it is given, only synchronizes the work-items inside a single work-group. Is there a way to synchronize all the work-groups at the same time, other than breaking up my kernel function?
I face the same need… I need some kind of global synchronization between all the work-items / processing elements.
I am trying to simulate two-dimensional diffusion using OpenCL.
The algorithm is quite straightforward.
I have two arrays, CUR and PREV, of dimension DIM. The diffusion process is simulated until time t is reached. The following is simple pseudocode of what I do:
initialize CUR and PREV;
for (time = 0; time < t; ++time) {
    evolve(CUR, PREV);
    exchangePointers(CUR, PREV);
}
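For reference, the loop above can be sketched on the CPU in plain C roughly like this. It is only a minimal sketch: the 5-point stencil, the fixed boundary cells, the alpha coefficient and the simulate helper are assumptions made for illustration, not the actual OpenCL kernel.

```c
#include <assert.h>
#include <stddef.h>

/* One explicit diffusion step on a dim x dim row-major grid.
   Assumption: boundary cells are simply copied from prev (fixed boundary). */
void evolve(float *cur, const float *prev, size_t dim, float alpha)
{
    assert(dim >= 3); /* need at least one interior cell */
    for (size_t y = 0; y < dim; ++y) {
        for (size_t x = 0; x < dim; ++x) {
            size_t i = y * dim + x;
            if (x == 0 || y == 0 || x == dim - 1 || y == dim - 1) {
                cur[i] = prev[i];
                continue;
            }
            /* 5-point stencil: left, right, up and down neighbours */
            cur[i] = prev[i] + alpha * (prev[i - 1] + prev[i + 1]
                                      + prev[i - dim] + prev[i + dim]
                                      - 4.0f * prev[i]);
        }
    }
}

/* Runs `steps` iterations starting from the state in `a`, using `b` as the
   second buffer. Returns the buffer that holds the newest state. */
float *simulate(float *a, float *b, size_t dim, float alpha, unsigned steps)
{
    float *prev = a;
    float *cur = b;
    for (unsigned t = 0; t < steps; ++t) {
        evolve(cur, prev, dim, alpha);
        /* exchangePointers: the freshly written buffer becomes prev */
        float *tmp = cur;
        cur = prev;
        prev = tmp;
    }
    return prev;
}
```

The point of the sketch is that every cell of step n+1 reads neighbours from step n, so a step may only start once the whole previous array has been written. That is the dependency that would need a global synchronization on the device.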
So at each step I compute the current array CUR from the previous state PREV.
For now, every evolve call is a new kernel invocation… which slows performance down drastically.
My solution is to move the for loop over time into the kernel. But then I need to be sure that, before a new iteration starts, all the processing elements have finished their work for the current time step.
So… I would like to know what you mean by two kernel invocations and how I can do this, or how you would implement some sort of global barrier.