This came up recently and seemed somewhat silly, given that we have 32-bit and 64-bit atomics. It was frustrating to have to send over a much larger buffer when all I needed was 16-bit unsigned shorts.
I doubt that support for 16-bit atomics is widespread on GPU hardware.
Why? 16-bit computations are commonplace these days.
You can always use masking combined with 32-bit atomic ops to work with a 16-bit buffer.
Hmmm… unless I'm missing something, I don't see how that would work with atomic_inc?
All atomic operations can be implemented with atom_cmpxchg(), albeit less efficiently. In particular, combining it with masking, as ljbade suggests, may be a good option.