An important thing to realize about pointers in OpenCL is that they differ between devices. If you are running on a CPU device that is the same underlying hardware as the host, you may find that the pointers are identical… but this is a very bad assumption to rely on. It will break badly on any other device, and the spec does not guarantee that it will ever work, even if your host is the exact same hardware as your device.
OpenCL devices may have address spaces that are independent of the host (and of each other). There is no assurance that the physical bytes in a buffer when accessed from the host are the same as the physical bytes in the same buffer when accessed from any other device, or that the same physical bytes will hold the buffer over time. Even if they do happen to be the same physical bytes, they could be mapped to different addresses. When you pass a cl_mem object to a kernel, the system is responsible for converting that object handle into the device’s native pointers. This may include physically copying the data between memory spaces (for example, from host memory to a GPU’s VRAM). Since the bits that make up pointers look just like the bits that make up any other data, the OpenCL runtime has no idea which bits are pointers that need to be remapped… except when they are kernel arguments. Thus you cannot put pointers into your buffers and expect them to work from anywhere else (even a different work-group or kernel on the same device).
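One common way around the "no pointers in buffers" rule is to link elements by index (or byte offset) relative to the start of the buffer, since an index stays meaningful wherever the bytes end up. The following is only a sketch in plain C, with no OpenCL calls: `Node`, `NIL`, `sum_list` and `sum_after_copy` are hypothetical names made up for illustration, and the `memcpy` merely stands in for whatever copy the runtime might perform when it moves a buffer to another memory space.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NIL UINT32_MAX   /* hypothetical "no next node" marker */
#define MAX_NODES 16     /* arbitrary limit for this sketch */

/* A list node that links by index into its own buffer, not by address. */
typedef struct {
    int32_t  value;
    uint32_t next;       /* index of the next node, or NIL */
} Node;

/* Walk the list that starts at index `head` and add up the values. */
static int32_t sum_list(const Node *nodes, uint32_t head)
{
    int32_t total = 0;
    for (uint32_t i = head; i != NIL; i = nodes[i].next)
        total += nodes[i].value;
    return total;
}

/* Stand-in for a transfer: the bytes are copied to a different address,
   and the list is then walked in the copy. The indices still work there;
   raw pointers into `src` would still point at the old location. */
static int32_t sum_after_copy(const Node *src, uint32_t count, uint32_t head)
{
    Node copy[MAX_NODES];
    assert(count <= MAX_NODES);
    memcpy(copy, src, count * sizeof(Node));
    return sum_list(copy, head);
}
```

Calling `sum_list` on the original array and `sum_after_copy` on the same array give the same total, because nothing stored in the nodes depends on where the array lives.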
Even beyond having different addresses, you aren’t even assured that pointers are the same number of bits. If your host is running a 64-bit OS and your application is 64-bit, then its pointers are going to be 64 bits wide. Many GPUs, however, are not 64-bit devices and thus will use 32-bit pointers… or smaller. The different address spaces of a device (global, local, constant, private) can also vary in size, and pointers could potentially be only 16 bits or even smaller for really small memory pools.
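The size mismatch also matters for any struct you place in a buffer: a field of pointer type or `size_t` can change the struct's layout depending on which side compiled it. A hedged sketch in plain C of the usual workaround, fixed-width fields plus an offset that each side resolves locally, is below. `FragileHeader`, `PortableHeader` and `header_data` are hypothetical names for illustration; in real host code the fixed-width fields would typically be `cl_uint`, which corresponds to the 32-bit `uint` type in kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout depends on the pointer width of whoever compiles it:
   not something to store in a buffer shared with a device. */
typedef struct {
    float  *data;
    size_t  count;
} FragileHeader;

/* Fixed-width fields only, so both sides agree on the layout. */
typedef struct {
    uint32_t data_offset;   /* byte offset from the start of the buffer */
    uint32_t count;         /* number of floats at that offset */
} PortableHeader;

_Static_assert(sizeof(PortableHeader) == 8,
               "PortableHeader must be 8 bytes on every side");

/* Each side turns the offset into a pointer in its own address space.
   The caller must ensure the offset is suitably aligned for float. */
static const float *header_data(const void *base, const PortableHeader *h)
{
    return (const float *)((const unsigned char *)base + h->data_offset);
}
```

The pointer returned by `header_data` is only meaningful to the side that computed it; what travels in the buffer is the offset.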
It is also important to understand the memory model that OpenCL provides: it is a “relaxed consistency memory model”. This means there are very carefully defined rules about when memory needs to be made consistent between devices, kernels, work-groups, and work-items… and these rules are defined to require as little synchronization and coupling as possible, thus maximizing the opportunity for concurrency. As a result, you need to be careful about having multiple pieces of code touching the same buffer at the same time. Even if it seems to work on your particular machine when you write the code, any number of factors can cause your assumptions to break (changing hardware, changing OS, changing drivers, and even just changes to the way your own application works). This affects things like when writes to a buffer from one piece of code become visible to reads from the buffer in a different work-item, kernel, device, etc. It also affects what the result is when multiple pieces of code write to the same location “simultaneously” (and I put that in quotes because “simultaneous” is a very hard thing to define and harder to be assured of).
So this all boils down to: do not make assumptions about pointers and do not pass them between devices. The CL memory objects were put into the spec to abstract these details from us. Respect and understand that abstraction.