Mapping Data: remap to change?

Say I have a buffer that I map to an array, unmap, and then use in a kernel. If I want to change the values in the buffer, do I have to remap it and copy them all back in? Or do I use clEnqueueWriteBuffer?

i.e. is mapping intended to be a one-shot kind of thing? Since you have to unmap it before you can actually use it, it seems kinda pointless.

Mapping is all still kinda mumbo jumbo to me, but I have gotten it to work by mapping it just once and then unmapping it every time I want it to be valid for the GPU. That’s for transfer to the GPU, if I am remembering right. For the reads I just map it when the CPU needs it to be valid; I guess that may end up being more efficient than reading the whole thing as a chunk later. I find just a tiny bit of speedup, and most of that is in the part of the tuning curve where other things are poor. It ends up shaving 1 ms off my cycle time compared with my best non-mapped tuning so far, but that is then chunked to the physical device frame rate, so I am gaining nothing.

What I really need to do is learn more about OpenCL/OpenGL interop to save PCI transfer time, but that’s not for this post…

According to the spec, the (un)mapping is required to properly synchronise with the device. When doing i/o to devices the cpu caches have to be invalidated/flushed and stuff like that, otherwise the data could be stale. It would surely be faster to not do it, but fast results aren’t very useful if they’re invalid.

You either use map/unmap or read/write buffer, and in many cases there won’t be any real difference, as it will just copy to/from the device so that the host gets maximum-speed cached access and the device gets maximum-speed access when each needs it.

Whether a map/unmap requires a copy in a given direction really depends on the implementation and hardware, and presumably on the buffer options (e.g. read-only/write-only). For a zero-copy device the map operation may be cheap, and only the data actually read or written is transferred from/to the device. Or if the memory is unified, then map/unmap are basically no-ops or cache flushes (e.g. AMD Fusion?). OTOH it might force the host or device to go through a slower memory interface and be slower overall.

The AMD Accelerated Parallel Processing OpenCL Programming Guide, section 4.5.2, has a bit about how buffers work on their cards. The NVIDIA OpenCL Programming Guide v2.3 (a rather old document, as I’m no longer using their gear), section 3.3, also has a fairly brief bit about NVIDIA’s cards.

Mostly you just can’t control this: if it has to copy to/from the device, it will. All you can do is hope the driver takes the best option it can based on the hints you provide. Even if you ask for pinned memory, you may not get it, since whether such a feature exists depends on the OS and the hardware, and even where it does exist it may be in limited supply.

Unless you’re hitting specific performance bottlenecks and are concerned with one particular piece of hardware on an in-house or turn-key solution, it’s probably not worth worrying about.

You would either use map and unmap, or a read/write command, each time the data must be modified on the host. The latter will always copy your data to or from the pointer you specify, whereas map might return a pointer to a host-side copy of the data that already exists, which might avoid the extra copy. The unmap command is necessary to tell the runtime that it is safe for it to resume using that copy.

If a modified version of the data is resident on a discrete device, then the map command will need to copy the data back to host memory. In this case the map operation might still be more efficient if there is greater overhead associated with setting up the read or write command.

The performance of either approach depends on the device load, whether any of the operands are modified on other devices, the size of the read/write, etc.

That’s not the right way to do it. The right way is: map, read/write to the mapped pointer, then unmap. Reusing an old pointer you got in a previous map call is not a good idea at all.

Thanks!

Yeah, I sort of knew I was being bad (especially when I read that there’s a counter that’s incremented/decremented!). But it did work, so I thought I’d deal with it later. I also didn’t want to add unneeded calls, to save time, so…

This was using CL_MEM_USE_HOST_PTR, by the way. Does that matter much if at all?