I have set up two kernels and, depending on the options chosen on the host, one of them is executed. The problem is that if I run one kernel, then run the application again and choose the other one, the results I get the first time the second kernel runs are similar to those of the first kernel. So I have to run a kernel twice to get the “real” results. Could this be a problem with how variables are initialized inside the kernel, with buffers I might be missing, or with flushing?
I can’t really tell what’s going on here since I don’t know where localID or idx come from, but I suspect you have a number of data races. Some of your loops walk over aux and set cvs, and I suspect other work-items could be modifying those same elements at the same time. If that’s the case, your results will depend on the order in which the work-items are executed.
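To make the order dependence concrete, here is a minimal sketch in plain C rather than OpenCL, so it can be run without a device. It is not your kernel: `N`, `work_item`, `run_in_order` and the `shared` array are hypothetical names made up for illustration. Each simulated work-item reads a slot that another work-item writes, which is the kind of access pattern I suspect you have with aux and cvs.

```c
#include <stdio.h>

#define N 4  /* hypothetical number of work-items */

/* Hypothetical work-item body: reads the neighbour's slot, writes its own.
 * On a real device this is a race, because the neighbour may or may not
 * have written its slot yet. */
static void work_item(int id, int *shared)
{
    shared[id] = shared[(id + 1) % N] + 1;
}

/* Simulate one possible schedule by running the work-items sequentially
 * in the order given by `order` (a permutation of 0..N-1). */
static void run_in_order(const int *order, int *shared)
{
    for (int i = 0; i < N; ++i) {
        work_item(order[i], shared);
    }
}

/* Helper to print the contents of the shared array. */
static void print_array(const char *label, const int *shared)
{
    printf("%s:", label);
    for (int i = 0; i < N; ++i) {
        printf(" %d", shared[i]);
    }
    printf("\n");
}
```

Starting from an array of zeros, running the work-items in the order 0, 1, 2, 3 leaves `shared` as 1 1 1 2, while the order 3, 2, 1, 0 leaves it as 4 3 2 1. Same code, same input, different results, purely because of scheduling. If your real kernel has this shape, the usual ways out are to read from one buffer and write to another, or, when all the work-items involved are in the same work-group, to separate the read phase from the write phase with `barrier(CLK_LOCAL_MEM_FENCE)` or `barrier(CLK_GLOBAL_MEM_FENCE)` as appropriate. Keep in mind that a barrier does not synchronize across work-groups.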