A simple question about the code below. It is not real code and it does not compile; it is just a simplified example.

Code:
cl_mem image0;
cl_mem image1;
cl_mem image2;
cl_mem buffer;
// Step1
f(image0, image1, 1);
f(image0, image2, 2);
g(image1, image2, buffer);
// Step2
f(image0, image1, 3);
f(image0, image2, 4);
g(image1, image2, buffer);
// Step3
f(image0, image1, 5);
f(image0, image2, 6);
g(image1, image2, buffer);
We basically have 3 images allocated on the device. The function f(input, output, p) applies a kernel that fills the output with values computed from the input, given a parameter p. For instance, f could be a Gaussian smoothing where p is the variance of the Gaussian kernel.

The function g takes two images as input and a buffer as output. In g, the kernel analyses the two inputs and writes its results to the output buffer. For instance, g could detect the local maxima in both inputs.

Because we apply the "algorithm" 3 times here (3 steps) and because we reuse the same memory at each step (the same images are reused, and buffer grows at each step), I was thinking that maybe I should call clFinish() between steps. I'm afraid that if I don't, step 2 may start before step 1 has finished: the f calls of step 2 could overwrite image1 and image2 while g from step 1 is still reading them, which would make g in step 1 behave incorrectly.
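To make the proposal concrete, here is a minimal sketch in plain C that contains no OpenCL calls at all. It models the command queue as a FIFO of deferred commands. The names Queue, enqueue, finish and run_three_steps are hypothetical stand-ins I made up for the kernel launches and for clFinish(), and the model assumes an in-order queue. Whether a real in-order queue already gives this ordering without the explicit finish is exactly what I am unsure about.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model only: none of this is the OpenCL API. */
#define MAX_CMDS  16
#define TRACE_LEN 128

typedef struct {
    char kernel;  /* 'f' or 'g' */
    int  step;    /* step that enqueued the command */
} Command;

typedef struct {
    Command pending[MAX_CMDS];  /* enqueued but not yet executed */
    int     count;
    char    trace[TRACE_LEN];   /* order in which commands actually ran */
} Queue;

/* Hypothetical stand-in for an asynchronous kernel launch:
   the command is only recorded here, not executed. */
void enqueue(Queue *q, char kernel, int step)
{
    assert(q->count < MAX_CMDS);
    q->pending[q->count].kernel = kernel;
    q->pending[q->count].step = step;
    q->count++;
}

/* Hypothetical stand-in for clFinish(): "runs" every pending command
   in FIFO order (in-order queue assumed), then logs a sync point. */
void finish(Queue *q)
{
    char entry[16];
    for (int i = 0; i < q->count; i++) {
        snprintf(entry, sizeof entry, "%c%d ",
                 q->pending[i].kernel, q->pending[i].step);
        strncat(q->trace, entry, TRACE_LEN - strlen(q->trace) - 1);
    }
    strncat(q->trace, "| ", TRACE_LEN - strlen(q->trace) - 1);
    q->count = 0;
}

/* The three steps from the example, with a finish() after each one. */
void run_three_steps(Queue *q)
{
    memset(q, 0, sizeof *q);
    for (int step = 1; step <= 3; step++) {
        enqueue(q, 'f', step);  /* f(image0, image1, p) */
        enqueue(q, 'f', step);  /* f(image0, image2, p) */
        enqueue(q, 'g', step);  /* g(image1, image2, buffer) */
        finish(q);              /* proposed sync point between steps */
    }
}
```

Calling run_three_steps(&q) on a Queue q and printing q.trace shows the commands of each step grouped between sync points ("|"), which is the ordering I want to guarantee on the real device.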

What do you think?