Command queue synchronization

Hello,
I’m using OpenCL with the APP SDK. I have n command queues and m stages. All queues should be in the same stage at the same time, so I need some sort of queue synchronization between the stages. First, I tried this:

void sync_queues(cl_command_queue *queues, int queue_count) {
  int k;
  for(k = 0; k < queue_count; k++) {
    clFinish(queues[k]);
  }
}

This works. See the application trace (h**p://picload.org/image/ordiadd/soll.jpeg), created with CodeXL. But I would prefer a version that does not block the main thread, so I tried using events:

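void sync_queues(cl_command_queue *queues, int queue_count) {
  int k;
  cl_event event_a, event_b;

  if(queue_count < 2) {
    return;
  }

  /* Marker on the first queue: completes once its earlier commands are done. */
  clEnqueueMarkerWithWaitList(queues[0], 0, NULL, &event_a);

  /* Chain the remaining queues: each marker waits on the previous marker. */
  for(k = 1; k < queue_count; k++) {
    clEnqueueMarkerWithWaitList(queues[k], 1, &event_a, &event_b);
    clReleaseEvent(event_a); /* the enqueued marker keeps its own reference */
    event_a = event_b;
  }

  /* Close the chain: the first queue waits on the last marker. */
  clEnqueueMarkerWithWaitList(queues[0], 1, &event_a, NULL);
  clReleaseEvent(event_a);
}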

This works too, but after the first call the queues behave in a strange fashion: they no longer work in parallel. You can see my problem in the application trace (h**p://picload.org/image/ordialw/ist.jpeg).
So, what am I doing wrong? Is there a better way to synchronize queues?

PS: Why can’t I post URLs…?
Edit: OK, the second version won’t work for every n, but it should work for n=2.

Problem solved. I had to use clFlush repeatedly.