get_compute_unit_id and get_processing_element_id functions

Currently there are only the software-abstracted queries such as get_num_groups and get_local_id. How about functions for querying the hardware those are mapped to, something like get_compute_unit_id or get_processing_element_id (and the corresponding size functions too)? For example, if one were to use atomic operations (I know, they're slow) to perform some kind of reduction, then knowing which compute unit a work-group was mapped to, and which processing element a work-item was mapped to, could permit doing local atomic operations rather than global atomic operations.

I forgot that work-items from different work-groups cannot access the same local memory, so the local atomic operations described above would not work. They would still have to be global atomic operations, but might they be faster than general global atomic operations if each one is confined to a memory location belonging to its own compute unit?
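To make the idea concrete, here is a rough host-side sketch in plain C of what I have in mind. It is not OpenCL code. The function hypothetical_compute_unit_id stands in for the proposed get_compute_unit_id, which does not exist in OpenCL, and the round-robin mapping of work-groups to compute units, the device shape constants, and the name reduce_with_per_cu_slots are all assumptions made up for illustration. The loops run sequentially, so this only shows the data layout (one atomic slot per compute unit, combined at the end), not any performance effect.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical device shape; these numbers do not come from a real device. */
enum { NUM_COMPUTE_UNITS = 4, NUM_GROUPS = 16, LOCAL_SIZE = 8 };

/* Stand-in for the proposed get_compute_unit_id(). OpenCL has no such
   built-in; the round-robin mapping here is purely an assumption. */
static int hypothetical_compute_unit_id(int group_id)
{
    return group_id % NUM_COMPUTE_UNITS;
}

/* Sums input[0..n) by atomically accumulating into one slot per compute
   unit, then combining the per-unit partial sums. The work-groups and
   work-items are simulated with ordinary sequential loops. */
long reduce_with_per_cu_slots(const int *input, size_t n)
{
    /* The sketch only handles exactly one element per simulated work-item. */
    assert(n == (size_t)NUM_GROUPS * LOCAL_SIZE);

    /* One "global memory" accumulator per compute unit. */
    atomic_long partial[NUM_COMPUTE_UNITS];
    for (int cu = 0; cu < NUM_COMPUTE_UNITS; ++cu)
        atomic_init(&partial[cu], 0);

    for (int group = 0; group < NUM_GROUPS; ++group) {
        /* Every work-item in a group targets its compute unit's slot. */
        int cu = hypothetical_compute_unit_id(group);
        for (int local = 0; local < LOCAL_SIZE; ++local) {
            size_t gid = (size_t)group * LOCAL_SIZE + (size_t)local;
            atomic_fetch_add(&partial[cu], (long)input[gid]);
        }
    }

    /* Final pass: combine the per-compute-unit partial sums. */
    long total = 0;
    for (int cu = 0; cu < NUM_COMPUTE_UNITS; ++cu)
        total += atomic_load(&partial[cu]);
    return total;
}
```

Calling reduce_with_per_cu_slots on an array of NUM_GROUPS * LOCAL_SIZE integers returns their sum. Whether splitting the atomics across per-compute-unit slots like this would actually be faster on real hardware is exactly the open question.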