performance diff clEnqueueWriteBuffer vs clEnqueueMapBuffer

Hi.

Sorry for using shorthands in the topic title. The textbox seems to have a length limit.

I have a buffer object whose contents are modified between kernel invocations (the memory object is passed as an argument to the kernel on every invocation).

In this case, is it more efficient to transfer data to this memory object using clEnqueueWriteBuffer(), or to map the memory object into the host address space using clEnqueueMapBuffer() and then write to the mapped region? When clEnqueueMapBuffer() is called multiple times, is there an overhead from mapping and unmapping the buffer object compared to clEnqueueWriteBuffer()?

The specific performance implications will depend a lot on the device you are using (CPU vs. GPU) and on the particular OpenCL implementation. For example, if you are using a CPU device, you may be able to avoid a lot of data copying by using map/unmap, since host and device typically share the same memory. If you are on a GPU you may avoid some copying, but every transfer over the PCIe bus carries a much higher fixed overhead, so small copies may take disproportionately long. I'm afraid you'll have to experiment to see which is fastest with your particular devices and data sizes. (Remember that small copies will always have a relatively large overhead if the data needs to move to or from the device.)