How many CUDA registers does an OpenCL kernel use?

Hello, I need to know how many registers my OpenCL kernel uses. I am down to executing the kernel in blocks with dimensions 2x4, which is too small to be of practical use. I have written a CUDA equivalent of my kernel, and it is able to run with a block size of 8x32.

Use clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE, …) to determine the maximum work-group size that can be used for the kernel on that device.
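As a minimal sketch of how that limit might be used: once the value has been queried, a 2D local size can be checked against it before the kernel is enqueued. The helper name local_size_fits is hypothetical, and the query itself appears only in a comment so that the snippet compiles without the OpenCL headers.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of OpenCL): returns 1 if an lx-by-ly
 * work-group fits within max_wg work-items, 0 otherwise.
 *
 * max_wg is assumed to come from a query such as:
 *   size_t max_wg;
 *   clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
 *                            sizeof(max_wg), &max_wg, NULL);
 */
int local_size_fits(size_t max_wg, size_t lx, size_t ly)
{
    if (lx == 0 || ly == 0)
        return 0;               /* each dimension needs at least one work-item */
    return lx <= max_wg / ly;   /* same as lx * ly <= max_wg, without overflow */
}
```

With a reported limit of 8, for example, a 2x4 work-group fits but an 8x32 one does not. The device's per-dimension limits apply as well and are not checked here.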

I have exported the kernel’s intermediate PTX code into a text file, and then used ptxas.exe to compile it in verbose mode. This told me what I needed to know in terms of register usage. Now I need to know how to limit the register count, i.e. something similar to the ptxas option “--maxrregcount”. Any ideas?
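A rough sketch of what can be done with that verbose output: the register count can be pulled out of the "Used N registers" line and turned into an upper bound on the block size. The sample line in the comment is illustrative only (it is not from this thread, and the exact wording may differ between toolkit versions), both function names are hypothetical, and the bound ignores register allocation granularity and every other resource limit.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extracts the register count from one line of ptxas verbose output,
 * e.g. "ptxas info    : Used 24 registers, 32+16 bytes smem".
 * Returns the count, or -1 if the line has no "Used N registers" part.
 */
int parse_ptxas_registers(const char *line)
{
    const char *p = strstr(line, "Used ");
    int regs = 0;
    int consumed = 0;   /* set by %n only if " registers" matched too */

    if (p != NULL &&
        sscanf(p, "Used %d registers%n", &regs, &consumed) == 1 &&
        consumed > 0)
        return regs;
    return -1;
}

/*
 * Rough upper bound on threads per block when registers are the only
 * limiting resource. regs_per_block is an assumed device property.
 */
int max_threads_for_registers(int regs_per_block, int regs_per_thread)
{
    if (regs_per_thread <= 0)
        return 0;
    return regs_per_block / regs_per_thread;
}
```

As an illustration with assumed numbers, a register file of 16384 registers and a kernel that uses 64 registers per thread would allow at most 256 threads per block, which is what an 8x32 block needs. On NVIDIA drivers that support the cl_nv_device_attribute_query extension, the register file size can probably be read with CL_DEVICE_REGISTERS_PER_BLOCK_NV.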

You can try the compiler options that OpenCL lets you set at build time.

In my experience, though, those options did not work with the first two releases (maybe they work with the conformant release).
I prefer to wait for the optimized version of the NVIDIA OpenCL release.

I have no idea where to look for the vendor-specific compiler options. I have tried passing --help, but the program exits without giving me a chance to debug. I have also tried adding “--maxrregcount” to the clBuildProgram(…) build options, but the program exits again.
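For what it is worth, NVIDIA documents its vendor-specific build options in the cl_nv_compiler_options extension, which names a register limit option (-cl-nv-maxrregcount) and a verbose option (-cl-nv-verbose). The sketch below only assembles such an options string. It assumes the driver in use actually exposes that extension, and the "=N" spelling should be checked against the extension document for the installed release. The function name format_nv_build_options is hypothetical.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: writes an NVIDIA-specific build options string
 * into buf. Assumes the cl_nv_compiler_options extension is available.
 * Returns 0 on success, -1 if buf is too small.
 */
int format_nv_build_options(char *buf, size_t buf_size,
                            int max_regs, int verbose)
{
    int n = snprintf(buf, buf_size, "-cl-nv-maxrregcount=%d%s",
                     max_regs, verbose ? " -cl-nv-verbose" : "");

    /* snprintf reports the length it wanted; reject truncated output */
    return (n >= 0 && (size_t)n < buf_size) ? 0 : -1;
}
```

The resulting string would be passed as the options argument, e.g. clBuildProgram(program, 1, &device, buf, NULL, NULL). With the verbose option set, the build log returned by clGetProgramBuildInfo with CL_PROGRAM_BUILD_LOG should then contain the register usage report, so the separate ptxas step may not be needed.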