Controlling register usage through clBuildProgram?

Is there any way to control how many registers a kernel uses? My goal is to raise the CL_KERNEL_WORK_GROUP_SIZE value reported for my kernel, so that more work-items can run in parallel and hide memory latency.

Thanks,
Brian

Besides reducing the complexity of your kernel (including any private variables and arrays), you would need a way to tell the compiler to make a different optimization tradeoff. There are no standard compiler flags for doing this, but the NVIDIA driver does have a few custom ones, which might include such an option.

Are those custom NVIDIA flags documented anywhere? I know the PTX assembler has maxrregcount, but how do I pass that through the OpenCL implementation?

Thanks,
Brian

I’d take a look at the NVIDIA OpenCL guide, but that’s only a guess.
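
For what it's worth, here is a rough sketch of how a register cap could be passed through the options string of clBuildProgram. It assumes the device reports the NVIDIA-only cl_nv_compiler_options extension, which as far as I know is where the -cl-nv-maxrregcount flag comes from; I have seen it written both with "=" and with a space before the number, so check the exact spelling against your driver's documentation. The helper names and the USE_OPENCL macro are made up for this example, and the OpenCL part only compiles when that macro is defined and the OpenCL headers are installed.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "-cl-nv-maxrregcount=<max_regs>" into buf.
 * Returns 0 on success, -1 on bad arguments or if buf is too small.
 * The flag is an NVIDIA extension, not part of standard OpenCL. */
int make_maxrregcount_option(char *buf, size_t size, int max_regs)
{
    if (buf == NULL || size == 0 || max_regs <= 0)
        return -1;
    int n = snprintf(buf, size, "-cl-nv-maxrregcount=%d", max_regs);
    return (n > 0 && (size_t)n < size) ? 0 : -1;
}

#ifdef USE_OPENCL /* assumed macro: define it when building against an OpenCL SDK */
#include <CL/cl.h>

/* Hypothetical helper: builds `program` for `device` with a register cap,
 * then reports the resulting CL_KERNEL_WORK_GROUP_SIZE for `kernel_name`. */
cl_int build_with_reg_cap(cl_program program, cl_device_id device,
                          const char *kernel_name, int max_regs,
                          size_t *wg_size)
{
    char ext[8192];
    char opts[64];
    cl_int err;

    /* Only pass the flag if the device advertises the NVIDIA extension;
     * other implementations may reject unknown build options. */
    err = clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, sizeof ext, ext, NULL);
    if (err != CL_SUCCESS)
        return err;
    if (strstr(ext, "cl_nv_compiler_options") == NULL)
        return CL_INVALID_BUILD_OPTIONS;
    if (make_maxrregcount_option(opts, sizeof opts, max_regs) != 0)
        return CL_INVALID_VALUE;

    err = clBuildProgram(program, 1, &device, opts, NULL, NULL);
    if (err != CL_SUCCESS)
        return err;

    /* Query the work-group size limit the implementation now reports. */
    cl_kernel kernel = clCreateKernel(program, kernel_name, &err);
    if (err != CL_SUCCESS)
        return err;
    err = clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                                   sizeof *wg_size, wg_size, NULL);
    clReleaseKernel(kernel);
    return err;
}
#endif
```

Trying a few cap values and comparing the reported work-group size is probably the only way to see whether it helps. Capping registers too low usually forces the compiler to spill to local memory, which can cost more than the extra parallelism gains.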