I know a research group that built an abstraction layer on top of OpenCL to treat multiple identical GPU devices as a single device (assuming they all also have the same PCIe bandwidth). However, I can't imagine how they could make it work if any atomic functions are required for compliance. Are some atomics required? If they were all optional, I can see how to do it without too much pain, and it could be very useful.