I am trying to understand how memory is handled between host and devices (allocation, lifetime, copy operations, etc.) when using the OpenCL C++ API: when things happen, how they happen, what is guaranteed, and what must be avoided. Assume a context, device, etc. already exist (all initialization omitted):

cl::Context someContext;
cl::Device someDevice;
cl::CommandQueue someQueue;
cl::Kernel someKernel;

First (#1), I create a buffer object and initialize it with memory from the host application:
cl::Buffer buf(someContext, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, ...);

Next (#2), I set this buffer as a kernel argument:
someKernel.setArg(0, buf);

Finally (#3), I invoke the kernel:
someQueue.enqueueNDRangeKernel(someKernel, ...);

First question:
From which stage on (assume I wait for each command to fully complete) i) may memory be allocated on the device side, ii) is memory guaranteed to have been allocated on the device side, iii) is the data in the buffer guaranteed to have been copied?
The only place where a command queue / device comes into play is #3. So I assume it happens at this stage? Or is this unspecified?

Second question: is cl::CommandQueue::enqueueUnmapMemObject guaranteed to release memory? For example, will
someQueue.enqueueUnmapMemObject(buf, ...)
[and waiting for it to finish] ensure that the device has freed the memory again?
If I don't do anything explicitly, when will the device release the memory automatically? When the Buffer's destructor is invoked?

Third question: from which stage on must I stop writing new values to the buffer if I want to be sure what the kernel sees? For example, suppose I do this:

cl::Buffer buf(someContext, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, ...);
someKernel.setArg(0, buf);
someQueue.enqueueWriteBuffer(buf, ...);
someQueue.enqueueNDRangeKernel(someKernel, ...);

Does the OpenCL standard give any guarantees about which memory contents the kernel will see?

Fourth question: suppose that some kernels use a mix of program-wide constant data, i.e. data that never change during the program's lifetime, and "dynamic" data that do change between kernel invocations. Other kernels (invoked in between) have different layouts. Ideally I want to copy the constant data for the given kernels to the device only once and then keep it there permanently. What do I have to do to ensure that, if it is possible at all? My intuition is that keeping the buffer valid over the whole program lifetime and never explicitly releasing the memory should make the device copy the data only once, upon the first kernel invocation, and then reuse it for further kernel invocations. But I really don't know. If there is no general guarantee, what is the "best-chance" approach, practically speaking?

Thanks for your help!