Looping kernels produce non-constant timings

    Hi OpenCL community,

    I would appreciate it if any of you could help me with the following issue. I have a program in which I run the same kernels over and over inside a "for" loop. The pseudo-code of my program is the following:

    Code :
    Initialize OpenCL (devices, queue, kernels, create buffers, set arguments, etc.)
       read data
       rewrite buffers with CL_TRUE enabled
       run kernel 1
       run kernel n
       read output buffer
       C functions using the output

    Here tic and toc are time measurement functions similar to Matlab's, which I use to profile the performance of my code. I am not using the OpenCL profiling functions because I am working with the Nexus 10 and they do not work properly there.

    My question is the following:
    When I plot the times of all the kernels across iterations, they are not roughly constant as they should be. They start at some value, then randomly jump to a higher time for some iterations, and then settle at a time between the minimum (the expected one) and the maximum. Does anyone have a hint as to what may be causing this?
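To put numbers on this behaviour rather than reading it off a plot, the per-iteration timings can be summarized as below. This is only a sketch: `timing_stats`, `summarize_timings`, and `count_slow` are hypothetical helper names, and the threshold factor is an arbitrary choice.

```c
#include <stddef.h>

/* Minimum, maximum, and mean of a series of timings in milliseconds. */
typedef struct {
    double min_ms;
    double max_ms;
    double mean_ms;
} timing_stats;

/* Summarize n per-iteration timings; n must be at least 1. */
static timing_stats summarize_timings(const double *ms, size_t n) {
    timing_stats s = { ms[0], ms[0], 0.0 };
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (ms[i] < s.min_ms) s.min_ms = ms[i];
        if (ms[i] > s.max_ms) s.max_ms = ms[i];
        sum += ms[i];
    }
    s.mean_ms = sum / (double)n;
    return s;
}

/* Count the iterations that took longer than factor * min_ms. */
static size_t count_slow(const double *ms, size_t n,
                         double min_ms, double factor) {
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (ms[i] > factor * min_ms) ++count;
    }
    return count;
}
```

For example, with timings of 2.0, 2.0, 5.0, and 3.0 ms, the minimum is 2.0 ms, the maximum is 5.0 ms, the mean is 3.0 ms, and two iterations exceed 1.25 times the minimum.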

    I tried replacing clFinish with clFlush, using both, and using neither. Also, when I run only a single iteration with the same input that produced the maximum time, it works fine and yields the minimum expected time. Finally, if I add a sleep(100ms) at the end of the loop, the times are constant (at the minimum value) for all the kernels, as they should be.

    Thanks for your time and advice.

