I have many kernels in my OpenCL file, and I am using the NVIDIA OpenCL implementation to compile the code. Compiling takes a full 40 seconds, whereas a file with just one kernel in it compiles in 0.42 seconds.

I have isolated the slow compile to a single kernel, which alone takes approximately 37 seconds. Is there a way to speed up the compilation of the OpenCL code itself?

Any suggestions will be appreciated.