OpenCL research timings and a simple example of using binaries

Hi,

Does anyone know whether the timing results reported in High Performance Computing research, specifically for OpenCL, cover only the execution of the application (excluding compilation time), or the whole time (execution + compilation)?

Does anyone have a simple example of building from binaries instead of compiling all the files every time?

Many thanks,

Luiz Drumond.

I’ve not done much outside of individual multi-card workstations (not HPC in the traditional sense), but you have to read the results very carefully.

Most places will present the time results as the computational execution time only. It looks better that way.
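If you want to see both numbers for your own code, here is a minimal sketch of timing the two phases separately. It is plain C11 with no OpenCL in it: `time_phases`, `demo_build` and `demo_run` are names I made up for illustration, and the two demo functions are only stand-ins for your real build and run steps.

```c
#include <stdio.h>
#include <time.h>

/* Wall-clock time in seconds, using C11 timespec_get. */
static double now_s(void) {
    struct timespec t;
    timespec_get(&t, TIME_UTC);
    return (double)t.tv_sec + (double)t.tv_nsec * 1e-9;
}

typedef struct {
    double build_s; /* time spent in the build phase */
    double run_s;   /* time spent in the run phase   */
} phase_times;

/* Time two phases separately. `build` and `run` are hypothetical callbacks:
   in an OpenCL host program, `build` would read the .cl files and call
   clBuildProgram, and `run` would enqueue the kernels and then call clFinish,
   because the enqueue calls return before the device has finished. */
static phase_times time_phases(void (*build)(void), void (*run)(void)) {
    phase_times t;
    double t0 = now_s();
    build();
    double t1 = now_s();
    run();
    double t2 = now_s();
    t.build_s = t1 - t0;
    t.run_s = t2 - t1;
    return t;
}

/* Print both phases and their sum, so a reader can see what was measured. */
static void print_times(phase_times t) {
    printf("build: %.3f s, run: %.3f s, total: %.3f s\n",
           t.build_s, t.run_s, t.build_s + t.run_s);
}

/* Placeholder workloads standing in for the real build and run steps. */
static void demo_build(void) {
    volatile double x = 0.0;
    for (int i = 0; i < 1000000; i++) x += i;
}

static void demo_run(void) {
    volatile double x = 0.0;
    for (int i = 0; i < 2000000; i++) x += i;
}
```

Calling `print_times(time_phases(demo_build, demo_run));` prints the two phases and their sum. When you write up results, saying which of these numbers you are reporting avoids the ambiguity you are asking about.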

In my personal experience, I had a program that used two .cl files with ~50 functions across 6000 lines of code. My read + compile time with an NVIDIA GeForce GTX 460 (and an Intel i7 920 CPU) was approximately 20-30 seconds.

NVIDIA, though, caches the compiled program in a temp directory, so unless you significantly change the .cl file, it doesn’t recompile. With the cache in place, the read + “compile” step was on the order of 2-3 seconds. Thus, if you can go with pre-compiled binaries, do so. Understand, though, that for commercial codes it might not be feasible, because you’d have to ship a pre-compiled binary for every one of the different architectures. A better solution might be to compile the program the first time it runs and then save the binary. OpenCL does make that possible: clGetProgramInfo with CL_PROGRAM_BINARIES gives you the binary, and clCreateProgramWithBinary loads it again.
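A minimal sketch of that compile-once-then-save approach, not production code. The names `save_blob`, `load_blob`, `get_program` and the `USE_OPENCL` guard are my own. The OpenCL part assumes the context contains exactly one device and is only compiled when you build with -DUSE_OPENCL -lOpenCL, so the file helpers can be tried on their own.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Write `size` bytes to `path`; returns 0 on success, -1 on failure. */
static int save_blob(const char *path, const unsigned char *data, size_t size) {
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t written = fwrite(data, 1, size, f);
    int closed = fclose(f);
    return (written == size && closed == 0) ? 0 : -1;
}

/* Read a whole file into a malloc'd buffer (caller frees it).
   Returns NULL if the file is missing, empty, or unreadable. */
static unsigned char *load_blob(const char *path, size_t *size) {
    assert(path != NULL && size != NULL);
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    fseek(f, 0, SEEK_END);
    long n = ftell(f);
    rewind(f);
    if (n <= 0) { fclose(f); return NULL; }
    unsigned char *buf = malloc((size_t)n);
    if (buf && fread(buf, 1, (size_t)n, f) != (size_t)n) { free(buf); buf = NULL; }
    fclose(f);
    if (buf) *size = (size_t)n;
    return buf;
}

#ifdef USE_OPENCL /* build with: -DUSE_OPENCL -lOpenCL */
#include <CL/cl.h>

/* Return a built program, or NULL on failure.
   Assumption: `ctx` contains exactly one device, `dev`. */
static cl_program get_program(cl_context ctx, cl_device_id dev,
                              const char *src, const char *cache_path) {
    cl_int err;
    size_t size = 0;
    unsigned char *bin = load_blob(cache_path, &size);
    if (bin) {
        /* Fast path: hand the cached device binary back to the driver. */
        const unsigned char *bins[1] = { bin };
        cl_program p = clCreateProgramWithBinary(ctx, 1, &dev, &size, bins, NULL, &err);
        free(bin);
        if (err == CL_SUCCESS && clBuildProgram(p, 1, &dev, NULL, NULL, NULL) == CL_SUCCESS)
            return p;
        if (err == CL_SUCCESS) clReleaseProgram(p);
        /* Stale or incompatible cache: fall through and rebuild from source. */
    }
    /* Slow path: compile from source (src is a NUL-terminated string). */
    cl_program p = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
    if (err != CL_SUCCESS) return NULL;
    if (clBuildProgram(p, 1, &dev, NULL, NULL, NULL) != CL_SUCCESS) {
        clReleaseProgram(p);
        return NULL;
    }
    /* Fetch the device binary and save it for the next run. */
    if (clGetProgramInfo(p, CL_PROGRAM_BINARY_SIZES, sizeof(size), &size, NULL) == CL_SUCCESS
        && size > 0 && (bin = malloc(size)) != NULL) {
        if (clGetProgramInfo(p, CL_PROGRAM_BINARIES, sizeof(bin), &bin, NULL) == CL_SUCCESS)
            save_blob(cache_path, bin, size);
        free(bin);
    }
    return p;
}
#endif
```

The saved binary is specific to the device and driver it was built on, so in practice you would probably want the cache file name to include something like the device name and driver version (both available from clGetDeviceInfo), and delete the cache whenever the .cl source changes. The sketch leaves that part out.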

Hi,

Thanks for your answer.

Can you point me to some articles that report only the execution time? It would be good for me to know how researchers are presenting their results.

About the compile time and execution time: when you used pre-compiled binaries, did the whole time fall to 2-3 seconds?

Do you have a code example showing how I can do this in OpenCL?

Many thanks,

Luiz Drumond.