For CPU and AMD Fusion devices, which share the same memory as the host, there is no point in relying on clCreateBuffer to copy data to the device and back. In such cases it would make sense for clSetKernelArg to accept a (properly aligned) pointer to host memory directly. clSetKernelArg also has very little overhead compared to the "buffer" functions, which were designed specifically with "split" memory scenarios in mind.
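As a rough illustration of what a "properly aligned" host pointer could look like on the host side, here is a minimal C sketch. The post does not say which alignment would be required, so the 4096-byte value is an assumption, and the names HOST_ALIGNMENT, is_aligned, round_up and alloc_host_buffer are hypothetical helpers, not part of OpenCL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed alignment (page-sized); the real requirement would be device-specific. */
#define HOST_ALIGNMENT ((size_t)4096)

/* Returns nonzero if p is a multiple of alignment. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Rounds n up to the next multiple of alignment. */
static size_t round_up(size_t n, size_t alignment)
{
    return (n + alignment - 1) / alignment * alignment;
}

/* Allocates at least `bytes` bytes of host memory aligned to HOST_ALIGNMENT.
 * Returns NULL for a zero-byte request or on allocation failure.
 * Release the buffer with free(). */
static void *alloc_host_buffer(size_t bytes)
{
    /* aligned_alloc expects a power-of-two alignment. */
    assert((HOST_ALIGNMENT & (HOST_ALIGNMENT - 1)) == 0);
    if (bytes == 0)
        return NULL;
    /* C11 requires the size to be a multiple of the alignment. */
    return aligned_alloc(HOST_ALIGNMENT, round_up(bytes, HOST_ALIGNMENT));
}
```

A caller would obtain a buffer with alloc_host_buffer(n), fill it on the host, and could check it with is_aligned before handing the pointer to whatever API accepts it.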