Is there any way to control how much register space a kernel uses? My goal is to raise the CL_KERNEL_WORK_GROUP_SIZE value reported for my kernel, so that more work-items can run in parallel and hide memory latency.
Thanks,
Brian
Besides reducing the complexity of your kernel (and any private variables/arrays), you would have to tell the compiler to make a different optimization trade-off. There are no standard OpenCL compiler flags for doing this, but the NVIDIA driver does have a few custom ones, which might include such an option.
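To illustrate why register use caps the work-group size, here is a rough sketch of the arithmetic in C. The function name and all of the numbers are made up for illustration; they are not the figures for any particular device, and a real driver also rounds register allocations to some granularity, so treat this as an approximation only.

```c
#include <stddef.h>

/* Rough upper bound on work-items per work-group imposed by registers alone.
 * regs_per_multiprocessor and regs_per_work_item are hypothetical inputs;
 * real hardware applies additional allocation rules that are ignored here. */
static size_t max_work_items_by_registers(size_t regs_per_multiprocessor,
                                          size_t regs_per_work_item,
                                          size_t device_max_work_group_size)
{
    /* A kernel that uses no registers is limited only by the device maximum. */
    if (regs_per_work_item == 0)
        return device_max_work_group_size;

    /* Every work-item in the group needs its own set of registers. */
    size_t limit = regs_per_multiprocessor / regs_per_work_item;

    /* The device-wide maximum still applies on top of the register limit. */
    return limit < device_max_work_group_size ? limit
                                              : device_max_work_group_size;
}
```

With an assumed 16384 registers per multiprocessor, a kernel needing 64 registers per work-item would be capped at 256 work-items, while one needing 32 would be capped at 512. That is why trimming private variables, or asking the compiler to use fewer registers, can raise the reported value.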
Are those custom NVIDIA flags documented anywhere? I know the PTX assembler has a maxrregcount option, but how do I pass that through the OpenCL implementation?
Thanks,
Brian
I’d take a look at the NVIDIA OpenCL guide, but that’s only a guess.
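For what it's worth, here is a sketch of how such an option could be passed, assuming the device reports the cl_nv_compiler_options extension, which as far as I know defines a -cl-nv-maxrregcount build option that is forwarded to the PTX assembler. The helper name make_nv_reg_limit_option is hypothetical, and the exact spelling of the option (with "=" or with a space before the number) should be checked against the extension documentation for your driver.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes "-cl-nv-maxrregcount=<max_regs>" into buf.
 * Returns 0 on success, -1 if the arguments are invalid or buf is too small.
 * Assumption: the driver accepts the "=" form of this NVIDIA-specific option. */
static int make_nv_reg_limit_option(char *buf, size_t buf_size, int max_regs)
{
    if (buf == NULL || buf_size == 0 || max_regs <= 0)
        return -1;

    int written = snprintf(buf, buf_size, "-cl-nv-maxrregcount=%d", max_regs);

    /* snprintf returns the length it wanted to write; detect truncation. */
    if (written < 0 || (size_t)written >= buf_size)
        return -1;
    return 0;
}

/* Intended use with the standard OpenCL host API (not compiled here):
 *
 *   char opts[64];
 *   make_nv_reg_limit_option(opts, sizeof opts, 32);
 *   clBuildProgram(program, 1, &device, opts, NULL, NULL);
 *
 *   size_t wg_size;
 *   clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
 *                            sizeof wg_size, &wg_size, NULL);
 */
```

Querying CL_KERNEL_WORK_GROUP_SIZE after the rebuild shows whether the limit had any effect. Forcing the register count too low may make the compiler spill to slower memory, so it is worth timing the kernel as well as checking the reported size.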