How to ensure kernel execution has ended?

I’m trying to measure the time my kernel takes, but a question came up: what is the best way to ensure its execution has finished? Should I create an event and call event.wait() after the enqueue, or call queue.finish() instead?

P.S. As you might have noticed, I’m working with the C++ bindings.

In C I use profiling to time my kernel. You have to pass the CL_QUEUE_PROFILING_ENABLE flag in the queue properties and then take the event object from clEnqueueNDRangeKernel. After that, pass that event object to clWaitForEvents, and then call clGetEventProfilingInfo with the flags CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END to read their values into two variables of type cl_ulong. To complete the process, take the difference between these two variables and divide the result by 1e9 to get the time in seconds, since the timestamps are in nanoseconds. A good practice is to time the NDRange several times in a loop and then average the results; the technique is the same, and you do not have to rerun the setup steps of your program.
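The arithmetic in that last step can be sketched like this. It is only a minimal sketch of the conversion and averaging, not of the OpenCL calls themselves: the struct and function names (KernelSample, elapsed_seconds, mean_seconds) are made up for illustration, and the timestamps are assumed to have already been read with clGetEventProfilingInfo after waiting on the event.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One profiled kernel run. The two fields stand for the nanosecond
// timestamps reported for CL_PROFILING_COMMAND_START and
// CL_PROFILING_COMMAND_END. KernelSample is a hypothetical name.
struct KernelSample {
    std::uint64_t start_ns;
    std::uint64_t end_ns;
};

// Elapsed device time of one run, converted from nanoseconds to seconds.
double elapsed_seconds(const KernelSample& s) {
    assert(s.end_ns >= s.start_ns);  // end must not precede start
    return static_cast<double>(s.end_ns - s.start_ns) / 1e9;
}

// Mean elapsed time over several runs of the same kernel, in seconds.
double mean_seconds(const std::vector<KernelSample>& samples) {
    assert(!samples.empty());  // averaging needs at least one run
    double total = 0.0;
    for (const KernelSample& s : samples) {
        total += elapsed_seconds(s);
    }
    return total / static_cast<double>(samples.size());
}
```

In a real program you would fill one KernelSample per loop iteration, after the wait on that iteration's event, and call mean_seconds once at the end.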

The most efficient way to do this is to use wait(). finish() is a very heavy command, as it blocks until everything in the command queue has finished, not just your kernel.

Is there a huge difference between the timing methods mentioned, i.e. using profiling and using a timer and wait()?
Using your own timer probably also measures the overhead of calling EnqueueNDRangeKernel() etc., but these operations aren’t very expensive, are they?
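For comparison, a host-side timer might look roughly like the sketch below. The helper name wall_seconds and the callable it takes are hypothetical; the callable is assumed to do the enqueue followed by the wait on the event. Whatever happens inside it, including the enqueue call itself, ends up in the measured time, which is exactly the overhead being asked about.

```cpp
#include <chrono>
#include <functional>

// Wall-clock time, in seconds, spent inside the given callable.
// The callable is assumed to enqueue the kernel and then block until
// it has finished (for example by waiting on its event), so the result
// includes enqueue overhead as well as the kernel's run time.
double wall_seconds(const std::function<void()>& enqueue_and_wait) {
    // steady_clock is monotonic, so the difference is never negative.
    const auto t0 = std::chrono::steady_clock::now();
    enqueue_and_wait();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Timing the same kernel both ways and comparing the numbers would show how much the host-side overhead matters for your case.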

I think enqueueing itself takes only a few ms.