OpenCL precompiled kernel faster?

Hello,

I’m trying to optimize my OpenCL program, and to that end I thought of precompiling the OpenCL kernel so that I can load it with clCreateProgramWithBinary and run the program. After doing that, however, I see no change in execution time. I’m using OpenCL on an NVIDIA GTX 295, so the binary I’m creating is a .ptx file. Was that a naive expectation? Should the precompiled kernel run faster, or am I missing the point completely?

Thanks in advance.

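I don’t think you should expect precompiled kernels to execute faster. The binary comes from the same compiler that would otherwise build your source at run time, so the code that ends up running on the device should be the same. What you should expect is that they load faster, since the source doesn’t need to be compiled each time your application starts up.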
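To make the "load faster" part concrete, here is a rough sketch in C of the loading step only: reading a cached binary (your .ptx, for example) from disk so it can be handed to clCreateProgramWithBinary. The helper name read_binary_file is made up for this post, and the OpenCL calls are left in a comment because they need a context and device that I am assuming you already create elsewhere.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of OpenCL): read a whole file, e.g. a
   cached .ptx, into a malloc'd buffer. Returns NULL on failure; on
   success stores the byte count in *out_size. Caller frees the result. */
unsigned char *read_binary_file(const char *path, size_t *out_size)
{
    assert(path != NULL && out_size != NULL);

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    /* Find the file size by seeking to the end, then go back to the start. */
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long len = ftell(f);
    if (len < 0) { fclose(f); return NULL; }
    rewind(f);

    /* Allocate at least one byte so an empty file still yields a valid pointer. */
    unsigned char *buf = malloc(len > 0 ? (size_t)len : 1);
    if (buf == NULL) { fclose(f); return NULL; }

    size_t got = fread(buf, 1, (size_t)len, f);
    fclose(f);
    if (got != (size_t)len) { free(buf); return NULL; }

    *out_size = got;
    return buf;
}

/* With an existing cl_context `context` and cl_device_id `device`
   (both assumed here), the buffer would be used roughly like this:

     cl_int status, err;
     cl_program prog = clCreateProgramWithBinary(
         context, 1, &device, &size,
         (const unsigned char **)&buf, &status, &err);
     err = clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
*/
```

Note that clBuildProgram is still called after clCreateProgramWithBinary. If you want to see the difference, time the program creation and build step separately from the kernel launch; any saving should show up in the former, not the latter.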