Profiling and performance counters on Nvidia OpenCL

Is there anything similar to AMD's GPUPerfAPI for Nvidia's OpenCL implementation? What libraries and tools do you use on Nvidia hardware to understand the performance of OpenCL kernels?