OpenCL memory model

OpenCL describes a “relaxed” memory model.
Where can I find more information on this?

Specifically, what is the expected architecture between the host and the device?
Is it expected to be a pipeline over a bus, or actual shared memory
between the host and the device?

Is there any provision to take advantage of true shared memory between the host and the device?
Do the clEnqueueMapBuffer() and clEnqueueUnmapMemObject() functions work with actual shared memory between the host and device?
Will this eliminate the need for block copies using the read and write functions?

Please refer to slide 19 in the OpenCL overview:
http://www.khronos.org/developers/libra … erview.pdf
I am curious to know if the red block labeled “Global Memory” is on the host or on the device,
and what the bidirectional arrow between the “Compute Device” block and the “Compute Device Memory” block represents.

Thank you

I would suggest looking at the OpenCL spec. It covers a fair amount of detail as to what the memory model consists of.

With regard to performance optimizations for mapping vs. reading, those are left to the vendors to implement for their platforms as needed, so the spec doesn’t really say anything about them.

re: dbs2’s comment that the spec “covers a fair amount of detail”

Perhaps it covers a fair amount of detail, but it is not sufficient.

In particular, the spec’s description of USE_HOST_PTR and COPY_HOST_PTR
does not specify the implementation’s obligations regarding updates made
to the host buffer on the host side of a host-GPU interface, nor does it
specify adequately or unambiguously the implementation’s obligations
regarding updates made to the memory object on the GPU side.

This should be evident by the number of queries in these forums relating
to USE_HOST_PTR and the like.

Specifying a memory consistency model is hard. I know, because I’ve been involved in several such efforts. The spec needs to avoid assumptions around the interpretation of words like “MAP” or even “READ” and “WRITE.” For instance:

  1. In an EnqueueMapBuffer operation, if the mapped address region pointed to by the return value is updated by the host [before the call, during the map operation, after the map operation completes] what is the obligation of the implementation to reflect that update to the memory in a GPU’s global memory?

  2. Similar to case 1, if the GPU makes an update to a mapped region, what is the obligation of the implementation to update the host memory if the update happens [before, during, after] the call to EnqueueMapBuffer?

  3. If CL_MEM_COPY_HOST_PTR is set for a clCreateBuffer call, when does the copy operation actually take place? What are the obligations of the implementation relative to host or device updates to the memory object? (Again, this may need to be specified in terms of before, after, or during various other operations.)

The specification needs work.

How are specification issues actually resolved? Who is the source of definitive
interpretations? Why isn’t this documented in the specification itself?

I believe the answers to your questions are:
1: before: undefined; during: undefined; after: update on unmap
2: before: at next map; during: undefined; after: undefined
3: It’s a copy. It should take place when the call is made.

(I’m not the definitive one to answer these, but those are my interpretations of the spec.)

Here’s what the spec says about mapping:

5.2.8.1 Behavior of OpenCL commands that access mapped regions of a memory object
The contents of the regions of a memory object mapped for writing (i.e. CL_MAP_WRITE is set in map_flags argument to clEnqueueMapBuffer or clEnqueueMapImage) are considered to be undefined until this region is unmapped. Reads and writes by a kernel executing on a device to a memory region(s) mapped for writing are undefined.
Multiple command-queues can map a region or overlapping regions of a memory object for reading (i.e. map_flags = CL_MAP_READ). The contents of the regions of a memory object mapped for reading can also be read by kernels executing on a device(s). The behavior of writes by a kernel executing on a device to a mapped region of a memory object is undefined.
Mapping (and unmapping) overlapped regions of a buffer or image memory object for writing is undefined.
The behavior of OpenCL function calls that enqueue commands that write or copy to regions of a memory object that are mapped is undefined.
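
To make the quoted rules easier to check against, here is a minimal sketch in plain C that encodes them as a lookup. It is only my reading of the passage above, not anything the spec or an implementation provides. All names (map_state, access_kind, access_is_defined) are hypothetical and are not part of the OpenCL API. It assumes a region mapped with CL_MAP_WRITE set counts as "mapped for writing" even if CL_MAP_READ is also set, and it treats an unmapped region as unrestricted because the quoted section says nothing about that case.

```c
#include <stdbool.h>

/* Toy model of section 5.2.8.1 as quoted above.
 * Hypothetical names; nothing here calls or mirrors the real OpenCL API. */

/* How the region is currently mapped by the host. */
typedef enum {
    MAP_NONE,   /* not mapped */
    MAP_READ,   /* mapped with CL_MAP_READ only */
    MAP_WRITE   /* mapped with CL_MAP_WRITE set (assumption: with or without CL_MAP_READ) */
} map_state;

/* What is touching the region while it is in that state. */
typedef enum {
    ACC_KERNEL_READ,            /* a kernel on a device reads the region */
    ACC_KERNEL_WRITE,           /* a kernel on a device writes the region */
    ACC_ENQUEUED_WRITE_OR_COPY  /* an enqueued command writes or copies to the region */
} access_kind;

/* Returns true if the quoted text leaves the behavior defined,
 * false if the quoted text calls it undefined. */
bool access_is_defined(map_state state, access_kind acc)
{
    switch (state) {
    case MAP_NONE:
        /* The quoted section only restricts mapped regions. */
        return true;
    case MAP_READ:
        /* Kernels may read a region mapped for reading; kernel writes and
         * enqueued writes or copies to a mapped region are undefined. */
        return acc == ACC_KERNEL_READ;
    case MAP_WRITE:
        /* Kernel reads and writes are undefined, and so are enqueued
         * writes or copies to a mapped region. */
        return false;
    }
    return false; /* unreachable for valid enum values */
}
```

For example, access_is_defined(MAP_READ, ACC_KERNEL_READ) is true, while access_is_defined(MAP_WRITE, ACC_KERNEL_READ) is false. The sketch deliberately leaves out overlapping maps for writing, which the quoted text also calls undefined.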