Timing OpenCL kernel code

Is there any way to time just part of the kernel code? Everything I have found online uses either a CPU timer or the GPU event timer (clGetEventProfilingInfo), and if I understand correctly, both are called from the host code. So if I want to measure only a section of code inside a kernel, is there any way to do it?

Thanks for your help!

The only tools and APIs I've seen time a whole kernel. To time separate parts, I split the kernel manually into smaller kernels and time each one.
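As a rough sketch of what that split can look like (the kernel names, the two "phases", and the tmp buffer are all made up for illustration, not taken from any real code): the first kernel does everything in one launch, and the other two do the same work in two launches, handing the intermediate result over through a global buffer. On the host you would then enqueue phase_a and phase_b separately on a queue created with CL_QUEUE_PROFILING_ENABLE, and read CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END for each event with clGetEventProfilingInfo.

```c
// Hypothetical original kernel: both phases run in a single launch,
// so event profiling only reports one combined time.
__kernel void both_phases(__global const float *in, __global float *out)
{
    size_t i = get_global_id(0);
    float t = in[i] * in[i];       // phase A (placeholder work)
    out[i] = sqrt(t + 1.0f);       // phase B (placeholder work)
}

// Split version, part 1: phase A only. The intermediate value that used
// to live in a register is written to a global buffer (tmp) instead.
__kernel void phase_a(__global const float *in, __global float *tmp)
{
    size_t i = get_global_id(0);
    tmp[i] = in[i] * in[i];
}

// Split version, part 2: phase B only, reading the intermediate back.
__kernel void phase_b(__global const float *tmp, __global float *out)
{
    size_t i = get_global_id(0);
    out[i] = sqrt(tmp[i] + 1.0f);
}
```

Keep in mind that the split changes what you measure: the extra global memory traffic for tmp and the second launch add overhead, so the two times will usually add up to more than the original kernel. Treat them as a guide to where the time goes, not as exact numbers.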

One reason these GPUs can run so fast without burning a hole in the floor is that they have limited support for stuff like this, so I can't imagine it will ever be possible on current-generation GPUs.