Hi,
I’m wondering how to use memory buffers in the most efficient and correct way. I’ve been reading the specification and searching old forum threads, but I’m still confused.
So, here are my questions.
First, about the flags in clCreateBuffer:
1. CL_MEM_USE_HOST_PTR
According to the spec, although the memory object uses host memory, the implementation is allowed to cache the buffer contents in device memory during kernel execution.
Q1. When does this caching happen? (When clEnqueueNDRangeKernel is called?)
2. CL_MEM_ALLOC_HOST_PTR
Q2. What is the definition of "host accessible memory" as mentioned in the spec? Can I simply understand it as host memory?
3. CL_MEM_COPY_HOST_PTR
Q3. When this flag is used alone, where will the memory object be allocated? I suppose it will be on the device, right?
Q4. When it is used together with CL_MEM_ALLOC_HOST_PTR, will the memory object be allocated on the host?
Second, about clEnqueueWriteBuffer:
If what I’ve said so far is right, then the cases in Q1, Q2 and Q4 must be followed by clEnqueueWriteBuffer to ensure the data reaches the device, while the case in Q3 doesn’t need it.
Q5. Is this correct?
Third, about clSetKernelArg:
The spec says that this call makes a copy of the argument you pass to the kernel.
Q6. Does this mean that the host memory is copied and also transferred to the device as the argument, which would make clEnqueueWriteBuffer unnecessary even when the memory object is allocated in host memory? Or does it mean that the copy is made only so that the argument can be reused immediately?
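To make the second reading in Q6 concrete, here is a minimal sketch in plain C of what I imagine happens. It is only a model, not OpenCL: model_arg_slot, model_set_arg and model_handle_survives_reuse are names I made up, and the float * merely stands in for a cl_mem handle. It assumes the copy covers just the arg_size bytes at arg_value (the handle itself), not the buffer contents behind it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one kernel argument slot. NOT part of OpenCL. */
typedef struct {
    unsigned char bytes[16]; /* room for a small scalar or a handle */
    size_t size;             /* number of valid bytes in 'bytes' */
} model_arg_slot;

/* Models only the "argument is copied" behaviour: arg_size bytes at
 * arg_value are copied into the slot at call time.
 * Returns 0 on success, -1 if the argument does not fit in the slot. */
static int model_set_arg(model_arg_slot *slot, size_t arg_size,
                         const void *arg_value)
{
    if (arg_size > sizeof slot->bytes)
        return -1;
    memcpy(slot->bytes, arg_value, arg_size);
    slot->size = arg_size;
    return 0;
}

/* Returns 1 if the slot still refers to the original data after the
 * caller's handle variable has been overwritten, 0 otherwise. */
static int model_handle_survives_reuse(void)
{
    float host_data[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float *handle = host_data;  /* assumption: plays the role of a cl_mem */
    float *stored = NULL;
    model_arg_slot slot;

    if (model_set_arg(&slot, sizeof handle, &handle) != 0)
        return 0;

    handle = NULL;              /* the variable can be reused right away */

    /* Only the handle value was copied; the data behind it was not. */
    memcpy(&stored, slot.bytes, sizeof stored);
    return stored == host_data;
}
```

If this model is right, calling model_handle_survives_reuse() from main returns 1, and nothing in the "set argument" step moves the buffer contents anywhere, so any transfer to the device would have to happen elsewhere.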
In fact, what I’ve been doing successfully is like this:
clCreateBuffer(CL_MEM_USE_HOST_PTR);
clSetKernelArg();
BTW, I’m using Intel’s integrated GPU as the device.
I’d really like to get this clear. Please correct me if I’m wrong.
Thanks for your help in advance.