Issues in OpenCL spec 1.2

  1. Why are cl_mem flags ignored when filling buffers/images?
  2. Why is data transfer compulsory with CL_MAP_WRITE? Shouldn’t it be renamed to CL_MAP_READ_WRITE then? CL_MAP_WRITE_INVALIDATE_REGION could then be CL_MAP_WRITE.
  3. What is the meaning of the value CL_KERNEL_ARG_ACCESS_NONE for CL_KERNEL_ARG_ACCESS_QUALIFIER?
  4. What is the use of CL_KERNEL_ARG_NAME?
  5. Why is an event returned from clEnqueueBarrierWithWaitList when it is a blocking call?

I will try to answer to the best of my ability.

  1. Why are cl_mem flags ignored when filling buffers/images?

Are you asking why clEnqueueFillBuffer() isn’t affected by the read/write flags passed when the buffer was created? The reason is that those flags only affect NDRanges and Tasks, not other APIs such as clEnqueueWriteBuffer() or clEnqueueFillBuffer().
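A minimal sketch of that point (the function name and the passed-in context/queue are mine, not from the spec): a buffer created CL_MEM_READ_ONLY restricts kernel access only, so filling it from the host is still legal.

```c
#include <CL/cl.h>

/* Sketch only: assumes a valid context and command queue are passed in. */
static cl_int fill_read_only_buffer(cl_context ctx, cl_command_queue queue)
{
    cl_int err;
    /* Read-only *for kernels*; fill/write enqueue APIs are unaffected. */
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_ONLY,
                                1024 * sizeof(cl_float), NULL, &err);
    if (err != CL_SUCCESS) return err;

    cl_float zero = 0.0f;
    err = clEnqueueFillBuffer(queue, buf, &zero, sizeof(zero),
                              0, 1024 * sizeof(cl_float), 0, NULL, NULL);
    clReleaseMemObject(buf);
    return err;
}
```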

  2. Why is data transfer compulsory with CL_MAP_WRITE? Shouldn’t it be renamed to CL_MAP_READ_WRITE then? CL_MAP_WRITE_INVALIDATE_REGION could then be CL_MAP_WRITE.

Backwards compatibility with CL 1.0 and CL 1.1.

  3. What is the meaning of the value CL_KERNEL_ARG_ACCESS_NONE for CL_KERNEL_ARG_ACCESS_QUALIFIER?

Isn’t the specification clear? “If argument is not an image type, CL_KERNEL_ARG_ACCESS_NONE is returned.” Only images can be qualified as __read_only or __write_only. It’s illegal to apply that qualifier to any other data type.
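To illustrate with a hypothetical kernel (not from the thread): querying CL_KERNEL_ARG_ACCESS_QUALIFIER on the image arguments returns CL_KERNEL_ARG_ACCESS_READ_ONLY / CL_KERNEL_ARG_ACCESS_WRITE_ONLY, while the buffer argument returns CL_KERNEL_ARG_ACCESS_NONE.

```c
/* Hypothetical kernel: access qualifiers only apply to image arguments. */
__kernel void example(__read_only  image2d_t src,   /* reported as ACCESS_READ_ONLY  */
                      __write_only image2d_t dst,   /* reported as ACCESS_WRITE_ONLY */
                      __global float *scratch)      /* not an image: ACCESS_NONE     */
{
    /* ... */
}
```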

  4. What is the use of CL_KERNEL_ARG_NAME?

It was one of the most demanded features by third-party developers. It’s particularly useful if you are writing middleware.
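For example, a middleware layer can set kernel arguments by name rather than by index. A rough sketch (the helper name is mine; it assumes the program was built with the "-cl-kernel-arg-info" option, which OpenCL 1.2 requires for clGetKernelArgInfo to return data):

```c
#include <CL/cl.h>
#include <string.h>

/* Sketch: look up an argument index by its declared name, then set it. */
static cl_int set_arg_by_name(cl_kernel kernel, const char *name,
                              size_t size, const void *value)
{
    cl_uint num_args = 0;
    clGetKernelInfo(kernel, CL_KERNEL_NUM_ARGS, sizeof(num_args), &num_args, NULL);

    for (cl_uint i = 0; i < num_args; ++i) {
        char arg_name[128];
        if (clGetKernelArgInfo(kernel, i, CL_KERNEL_ARG_NAME,
                               sizeof(arg_name), arg_name, NULL) != CL_SUCCESS)
            continue;
        if (strcmp(arg_name, name) == 0)
            return clSetKernelArg(kernel, i, size, value);
    }
    return CL_INVALID_ARG_INDEX; /* no argument with that name */
}
```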

  5. Why is an event returned from clEnqueueBarrierWithWaitList when it is a blocking call?

clEnqueueBarrierWithWaitList() is not a blocking call. You may be confused by this language: “This command blocks command execution, that is, any following commands enqueued after it do not execute until it completes.” Perhaps it could be worded better, but what it’s saying is that it acts as a barrier (i.e. blocking) on the device, not on the host.
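A short sketch of the host-side behaviour (assumed variable names, not from the spec): the call returns as soon as the barrier command is enqueued, and the host only blocks if it explicitly waits on the returned event.

```c
#include <CL/cl.h>

/* Sketch: `queue` is assumed to be a valid command queue. */
static void barrier_example(cl_command_queue queue)
{
    cl_event barrier_done;

    /* Returns immediately on the host; it does NOT wait for previously
     * enqueued commands to finish. It only orders commands on the device. */
    clEnqueueBarrierWithWaitList(queue, 0, NULL, &barrier_done);

    /* ... host keeps working, possibly enqueuing more commands ... */

    /* Only this call blocks the host, and only because we chose to wait. */
    clWaitForEvents(1, &barrier_done);
    clReleaseEvent(barrier_done);
}
```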

Thanks, David, for clearing those up.

I am okay with all the answers except the second one.
OpenCL 1.1 previously had two flags, CL_MAP_READ and CL_MAP_WRITE. And as far as AMD’s implementation is concerned, CL_MAP_WRITE already did what is now assigned to CL_MAP_WRITE_INVALIDATE_REGION, which seems reasonable to me.

I would have gone with adding a new flag, CL_MAP_READ_WRITE, for which data transfer from device to host would have been mandatory. The name of the flag makes perfect sense in that case. But now, as an OpenCL developer, I have to change the sections of code where I used CL_MAP_WRITE because its meaning has now changed.
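For reference, a minimal sketch of the OpenCL 1.2 distinction as specified (the helper name and passed-in objects are assumptions of mine): CL_MAP_WRITE_INVALIDATE_REGION is the flag to use when the whole mapped region will be overwritten and no device-to-host transfer is wanted, while CL_MAP_WRITE preserves the existing contents for partial updates.

```c
#include <CL/cl.h>
#include <string.h>

/* Sketch: overwrite an entire buffer from host memory. Since every byte is
 * rewritten, the old contents don't matter and no transfer to host is needed. */
static cl_int overwrite_buffer(cl_command_queue queue, cl_mem buf,
                               size_t size, const float *src)
{
    cl_int err;
    float *ptr = (float *)clEnqueueMapBuffer(queue, buf, CL_TRUE,
                                             CL_MAP_WRITE_INVALIDATE_REGION,
                                             0, size, 0, NULL, NULL, &err);
    if (err != CL_SUCCESS) return err;

    memcpy(ptr, src, size);   /* whole region written */

    return clEnqueueUnmapMemObject(queue, buf, ptr, 0, NULL, NULL);
}
```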

I also have a few more questions. I would be happy if someone could clear those up too.

  1. What is the importance of CL_MEM_HOST_NO_ACCESS?
  2. Why is it compulsory to provide a pattern size in clEnqueueFillBuffer?

And as far as AMD’s implementation is concerned, CL_MAP_WRITE already did what is now assigned to CL_MAP_WRITE_INVALIDATE_REGION, which seems reasonable to me.

If that’s the case then AMD’s implementation was not following the specification. The definition of CL_MAP_WRITE has not changed.

Well, okay,
I understand, it wasn’t the spec’s fault. But the implementer had done a nice job by optimizing the unmaps/maps.

It would be nice if some feedback could be given on the remaining questions too :)

But the implementer had done a nice job by optimizing the unmaps/maps.

Violating the specification is not a very good optimization :)

  1. What is the importance of CL_MEM_HOST_NO_ACCESS?

I imagine that on desktop systems the driver can use this flag to determine whether to allocate the buffer in a region of memory that can be accessed directly by the host or not.
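A small sketch of that idea (function name is mine, not from the spec): a device-side scratch buffer the host never reads, writes, or maps.

```c
#include <CL/cl.h>

/* Sketch: `ctx` is assumed to be a valid cl_context. */
static cl_mem make_device_scratch(cl_context ctx, size_t bytes, cl_int *err)
{
    /* CL_MEM_HOST_NO_ACCESS promises the host will never map, read, or write
     * this buffer, so the driver is free to place it in device-only memory. */
    return clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_HOST_NO_ACCESS,
                          bytes, NULL, err);
}
```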

  2. Why is it compulsory to provide a pattern size in clEnqueueFillBuffer?

Can you rephrase the question? If there was no pattern size then how would it be different from clEnqueueWriteBuffer()?
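To illustrate the difference (sketch with assumed names; `buf` is assumed to be at least 1 MiB): the pattern is replicated across the region, so a single cl_float4 can initialise a large buffer without the host allocating a full-sized source array, which clEnqueueWriteBuffer would require.

```c
#include <CL/cl.h>

/* Sketch: fill 1 MiB of `buf` by repeating a 16-byte pattern. */
static cl_int fill_with_pattern(cl_command_queue queue, cl_mem buf)
{
    cl_float4 pattern = {{1.0f, 0.0f, 0.0f, 1.0f}};
    return clEnqueueFillBuffer(queue, buf, &pattern, sizeof(pattern),
                               0, 1 << 20, 0, NULL, NULL);
}
```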

That’s FALSE. The 1.2 specification added the language about “guaranteed to contain the latest bits”. The 1.1 specification didn’t say either way, so AMD’s implementation, in my mind, was quite acceptable.

Now you’re making their implementation wrong, and when they fix it, any code that used that optimization will get slower.

I really don’t understand why this changed.
CL_MAP_READ implied the buffer is guaranteed to contain the latest bits, and should be used for reading, with no expectation of an update on the unmap.
CL_MAP_WRITE didn’t imply latest bits, and should be used for whole-buffer writing.
CL_MAP_READ | CL_MAP_WRITE should be used for sparse updates.

It was simple, it worked, and now it’s broken.

A far better change would have been a way to defer the write decision to the unmap call, which would have allowed more flexible buffer handling. As it stands, I have no way to abandon a mapped buffer (mapped with CL_MAP_WRITE but I later decide I don’t need to) without freeing it and reallocating.