The buffer is allocated on both devices, and OpenCL guarantees that the memory is kept consistent between them. For example, given a context that has two associated devices, with command queues cmd1 and cmd2 respectively, the following is valid:
At this point the expression (out_data == in_data) will evaluate to true. The key point to note is that we performed the write on cmd1 and the read on cmd2. More detail on why this works can be found in Appendix A of the OpenCL 1.0 specification.
PCIe point-to-point transfers would certainly be an appealing performance optimization, but it’s up to the individual vendors to implement them, so whether you get them depends on whose OpenCL implementation you are using.