clBuffers, synchronised memory and flags

Hi there,

I’m testing my application with nVidias OpenCL Visual Profiler and I’m noticing that memory buffer I’m allocating for the device using this code:

//_bins
_cmDevBins = cl::Buffer(_context,CL_MEM_WRITE_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(cl_int) * ndX64ndY64ndZ*64,outBI,&err);

Is being transferred to and from the device twice, that is it’s getting written to the device memory, and read from the device memory for every kernel iteration.

Now I noticed in the apple guide(http://developer.apple.com/mac/library/ … penCL.html)
it mentions that the two devices will be kept synchronised when using CL_MEM_USE_HOST_PTR , I assume this is also the case for CL_MEM_COPY_HOST_PTR, and I assume this is what I’m seeing here, that the buffer is being automatically syncd, hence why it is being copied out and copied back before and after a kernel executes.

However for my application I don’t want the devices to be synchronised, I only want the memory to be written to the GPU, I never want it to be read back, what flags do I need to set for this when creating the buffer?

Thanks in advance.

Sorry I probably haven’t made myself very clear so a simplified version:

Currently I’ve only learnt OpenCL using Buffer objects to transfer data between the host and the GPU, however as a buffer maps host memory and device memory this means that the memory is copied to and from the GPU before and after a kernel runs.

I’m wanting to simply put some data into the global address space on the GPU device without using a buffer object as I want it to be a one time transfer, however I’m unsure how to do this

Not sure if this answers your question. But if you simply allocate your buffer without CL_MEM_COPY_HOST_PTR and use clEnqueueWriteBuffer to move the data to the GPU it should never be written back out of the GPU.

I think this is a bug in the Nvidia driver. They should only do the copy once. COPY_HOST_PTR should just copy the pointer, not reference it or keep it in sync.

So setting CL_MEM_COPY_HOST_PTR results in it being copied to and from the gpu on each run, I actually don’t need to use clEnqueueWriteBuffer to copy the data over as it seems to copy it over for me,

every single buffer object gets copied back to host regardless of any memory settings,

This is a screenshot of the profiler output, you can see that the first three memcpy are from Host to Device/Async (note Async refers to texture memory as these are image objects), after linerMetricAddtion runs, there are 4 memory copies from D/A to Host (the 4th is a parameter I don’t copy over to the device at any point but is a global memory buffer)

Would be incredibly annoying if this is a driver bug considering nVidia have not released an opencl driver update yet. may have to revert to cuda :frowning: