Timing with clGetEventProfilingInfo

Hi,

I’m having trouble getting the correct timings from the OpenCL profiling functions. I’m using the CL_DEVICE_PROFILING_TIMER_RESOLUTION property combined with the clGetEventProfilingInfo function to try and get timings in nanoseconds, but following the information presented in the OpenCL spec seems to give incorrect results, unless I’m doing something wrong.

The spec states that:

The CL_DEVICE_PROFILING_TIMER_RESOLUTION specifies the resolution of the timer i.e. the number of nanoseconds elapsed before the timer is incremented.
i.e. One tick on the timer is equal to CL_DEVICE_PROFILING_TIMER_RESOLUTION nanoseconds.

With that in mind, I’m using the following (cut down) code:

// Get timer resolution
size_t resolution;
clGetDeviceInfo(device, CL_DEVICE_PROFILING_TIMER_RESOLUTION, sizeof(size_t), &resolution, NULL);

// Get start & end timer values
cl_ulong start, end;
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &start, NULL);
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &end, NULL);

// Convert to nanos, then to seconds
cl_ulong nanos = (end - start) * resolution;
printf("Time taken = %.4lf seconds\n", nanos * 1e-9);

The timings reported by this code are off by a factor of 1,000 on NVIDIA devices, and off by a factor of 1,000,000 on AMD CPUs. That would suggest, first of all, that NVIDIA and AMD have interpreted the RESOLUTION value differently, and also that either my code above is wrong or both vendors are wrong.

Incidentally, ignoring the resolution and assuming the timer is in nanoseconds gives the right results.

Can anyone shed some light on what I’m doing wrong, if anything?

Thanks.

Incidentally, ignoring the resolution and assuming the timer is in nanoseconds gives the right results.

Isn’t that what the specification requires? I quote:

[CL_PROFILING_COMMAND_START is] A 64-bit value that describes the current device time counter in nanoseconds when the command identified by event starts execution on the device.

What CL_DEVICE_PROFILING_TIMER_RESOLUTION determines is the granularity or accuracy of the values you get from CL_PROFILING_COMMAND_START/CL_PROFILING_COMMAND_END, not the units. The units are always nanoseconds.
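In other words, the corrected version of your snippet would just subtract the two counters directly. A minimal sketch (assuming, as in your code, that the queue was created with CL_QUEUE_PROFILING_ENABLE and that event has completed):

// Start and end counters are already expressed in nanoseconds
cl_ulong start, end;
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &start, NULL);
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &end, NULL);

// Elapsed time in nanoseconds; no multiplication by the timer resolution
cl_ulong elapsed = end - start;
printf("Time taken = %.4lf seconds\n", elapsed * 1e-9);

// The resolution only tells you the step size (in nanoseconds) of those
// counters, i.e. how fine-grained the measurement can be
size_t resolution;
clGetDeviceInfo(device, CL_DEVICE_PROFILING_TIMER_RESOLUTION, sizeof(size_t), &resolution, NULL);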

Ah, that’s what I was missing. I guess I read over that a little too quickly, which is why the statement about CL_DEVICE_PROFILING_TIMER_RESOLUTION seemed confusing.

That makes everything much clearer, thank you very much.