Limiting the number of compute units?

Can I limit the number of compute units assigned to my kernel? For example, the device I choose might report 8 compute units, but I want only 4 of them dedicated to my job. I suppose launching only four work-groups would do it, but many algorithms don't let you force the problem into a fixed number of work-groups.

I don’t think there’s an official way to do this. The only way I can think of is to limit the number of work-groups, as you said.

I think this is an interesting question with regard to NVIDIA's new Fermi architecture, which can execute multiple kernels concurrently. Will there be a way to influence how compute units are mapped to kernels, or is that entirely managed by the hardware scheduler?

The only way to limit this is to launch a total of 4 work-groups (e.g., global size = 4 * work-group size). Since each work-group executes on a single compute unit, this guarantees that at most 4 compute units are used. However, there is no way to launch a large number of work-groups and restrict them to a particular part of the device. As new devices with multi-kernel capabilities come out, the standard will have to evolve to support priorities or some more sophisticated form of resource management.
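A minimal sketch in C of the workaround above, assuming a 1-D NDRange. The host would pass num_groups * local_size as the global work size to clEnqueueNDRangeKernel, and the kernel is written as a grid-stride loop (a common "persistent work-group" pattern, not something proposed in this thread) so that a fixed number of work-groups can still cover a problem of any size. No OpenCL runtime is assumed here, so the launch is emulated sequentially on the CPU; the names global_size_for_groups, kernel_body and emulate_launch are hypothetical. In real OpenCL C, gid and global_size would come from get_global_id(0) and get_global_size(0).

```c
#include <assert.h>
#include <stddef.h>

/* Global work size that launches exactly num_groups work-groups
   of local_size work-items each (hypothetical helper). */
size_t global_size_for_groups(size_t num_groups, size_t local_size)
{
    assert(num_groups > 0 && local_size > 0);
    return num_groups * local_size;
}

/* Kernel body written as a grid-stride loop: each work-item starts at
   its global id and jumps ahead by the total number of work-items, so
   all n elements are covered no matter how few work-groups run.
   In OpenCL C, gid = get_global_id(0), global_size = get_global_size(0). */
static void kernel_body(size_t gid, size_t global_size,
                        const float *in, float *out, size_t n)
{
    for (size_t i = gid; i < n; i += global_size) {
        out[i] = 2.0f * in[i];
    }
}

/* Sequential CPU emulation of a num_groups x local_size launch. */
void emulate_launch(size_t num_groups, size_t local_size,
                    const float *in, float *out, size_t n)
{
    size_t global_size = global_size_for_groups(num_groups, local_size);
    for (size_t group = 0; group < num_groups; ++group) {
        for (size_t lid = 0; lid < local_size; ++lid) {
            size_t gid = group * local_size + lid; /* emulated global id */
            kernel_body(gid, global_size, in, out, n);
        }
    }
}
```

For example, emulate_launch(4, 64, in, out, 1000) runs 256 work-items, each handling every 256th element starting at its own id. This caps how many compute units the kernel can occupy, but it does not let you choose which ones.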