I have a question about clSetEventCallback() and how callbacks work.
I run the following code:
==============================================================================

#include <CL/cl.h>
#include <stdatomic.h>

atomic_int global_var = 0; // atomic, so the spin loop re-reads it and sees the callback's store

// Callback with the signature clSetEventCallback expects
void CL_CALLBACK call_back(cl_event event, cl_int event_status, void *args)
{
    atomic_store(&global_var, 1);
}

begin_comp_copy_time = rtclock();
clEnqueueNDRangeKernel(..., &compute_event); // enqueue some computation
clEnqueueReadBuffer(data_queue, CL_FALSE, ..., 1, &compute_event, &read_event); // non-blocking read that waits on compute_event
clSetEventCallback(read_event, ..., call_back, args); // register a callback to run once the read finishes
clFinish(data_queue); // blocks until every command in data_queue (and, via compute_event, the kernel) has completed
total_comp_copy_time = rtclock() - begin_comp_copy_time;

begin_callback_time = rtclock();
while (!atomic_load(&global_var)); // spin until the callback sets global_var to 1
total_callback_time = rtclock() - begin_callback_time;

==============================================================================
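To look at the spin-wait in isolation from OpenCL, the same pattern can be sketched with a POSIX thread standing in for whatever thread delivers call_back(). This is only a sketch under an assumption: the OpenCL specification requires callbacks to be thread-safe and may invoke them asynchronously, but it does not say which thread or processor runs them. The names fake_runtime_thread, now_ms and measure_spin_wait_ms are hypothetical, and now_ms plays the role of rtclock().

```c
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Flag written by the "callback" thread and polled by the main thread. */
static atomic_int callback_done = 0;

/* Hypothetical stand-in for the runtime thread that would invoke call_back(). */
static void *fake_runtime_thread(void *arg)
{
    (void)arg;
    atomic_store(&callback_done, 1); /* what call_back() does */
    return NULL;
}

/* Wall-clock time in milliseconds (hypothetical replacement for rtclock()). */
static double now_ms(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

/* Starts the helper thread, busy-waits until it sets the flag, and returns
   the elapsed time in milliseconds (-1.0 if the thread cannot be created). */
double measure_spin_wait_ms(void)
{
    pthread_t tid;
    atomic_store(&callback_done, 0);
    double start = now_ms();
    if (pthread_create(&tid, NULL, fake_runtime_thread, NULL) != 0)
        return -1.0;
    while (!atomic_load(&callback_done))
        ; /* busy-wait, as in the original loop */
    double elapsed = now_ms() - start;
    pthread_join(tid, NULL);
    return elapsed;
}
```

Running measure_spin_wait_ms() on an idle machine and again on a loaded one may give a rough idea of how much of the delay comes from thread scheduling rather than from OpenCL itself.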

When I print the timings, total_comp_copy_time is only 3 ms, but total_callback_time is a whopping 17 ms.

Now, if I remove the "while (!global_var)" check, the time drops to just over 3 ms, but then there is no synchronization when this code runs inside a loop.
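For the loop case, one alternative to spinning is to have the callback signal a condition variable while the host thread sleeps on it, resetting the flag before the next iteration. The sketch below assumes POSIX threads; completion_t, its functions and run_iterations are hypothetical helpers, and a short-lived thread stands in for the OpenCL runtime. In the real program, a pointer to the completion_t would presumably be passed as the user-data argument of clSetEventCallback, and call_back would call completion_signal() on it.

```c
#include <pthread.h>
#include <stdbool.h>

/* Completion flag guarded by a mutex, plus a condition variable to sleep on. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            done;
} completion_t;

void completion_init(completion_t *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

/* Called from the callback: mark the work done and wake the waiter. */
void completion_signal(completion_t *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Called from the host loop: sleep until signalled, then reset for the next iteration. */
void completion_wait_and_reset(completion_t *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    c->done = false;
    pthread_mutex_unlock(&c->lock);
}

static void *signal_thread(void *arg)
{
    completion_signal((completion_t *)arg);
    return NULL;
}

/* Hypothetical driver: n iterations, each waiting for a helper thread
   (standing in for the OpenCL callback) before starting the next one. */
int run_iterations(int n)
{
    completion_t c;
    completion_init(&c);
    int completed = 0;
    for (int i = 0; i < n; i++) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, signal_thread, &c) != 0)
            break;
        completion_wait_and_reset(&c);
        pthread_join(tid, NULL);
        completed++;
    }
    pthread_cond_destroy(&c.cond);
    pthread_mutex_destroy(&c.lock);
    return completed;
}
```

Because the waiting thread blocks in pthread_cond_wait instead of busy-waiting, it does not compete for CPU time with whichever thread ends up delivering the callback.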

I can't work out exactly why this happens. (I suspect the excessive time is due to a process-scheduling issue.)
Can anyone explain precisely what causes these high timings?
On which processor does the callback execute? Is it the same processor as the one that registered the callback?
If the processor is spinning in the while loop, is the callback waiting for the process to be scheduled out? Is that the reason for the 17 ms?