Why set dstAccessMask to *_WRITE_BIT for write-after-read?

My current (and most likely flawed) understanding of srcAccessMask/dstAccessMask is that they are only responsible for flushing/invalidating the caches used by the srcStageMask/dstStageMask stages: srcAccessMask specifies which caches to flush, and dstAccessMask specifies which caches to invalidate.
For read-after-write situations, this understanding makes perfect sense.

However, I am lost when trying to understand what srcAccessMask/dstAccessMask means for the write-after-read case.
For instance, I always see this code in examples:

// pseudo-code
vkCmdPipelineBarrier(cmdBuffer,
    srcStageMask = VK_PIPELINE_STAGE_TRANSFER_BIT,
    dstStageMask = VK_PIPELINE_STAGE_TRANSFER_BIT,
    srcAccessMask = VK_ACCESS_TRANSFER_READ_BIT,
    dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT);

First of all, why would you ever set srcAccessMask to VK_ACCESS_TRANSFER_READ_BIT? The cache should already have been invalidated by a previous pipeline barrier that set dstAccessMask = VK_ACCESS_TRANSFER_READ_BIT, so srcAccessMask should be 0.

Second, what does it mean to set dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT? Since this is a write-after-read situation, I would think the execution dependency defined by srcStageMask/dstStageMask is enough to make sure any reads finish before the writes, so dstAccessMask should also be 0.

Thanks.

Don’t try to map bits to specific pieces of hardware. How these bits get converted into specific cache flush/invalidate/etc operations is essentially irrelevant. It may be that for most (or all) hardware, this combination of bits means that no caches need work. But that’s for the implementation to decide; the job of the barrier is to tell the implementation what you’re doing.

srcAccessMask is for telling the implementation how you accessed the memory before the barrier. You read from it, so you set the bit to read. The same goes for setting dstAccessMask to write; that’s what you’re doing, so you tell the implementation what you’re doing.
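For the write-after-read transfer case above, a minimal sketch of the barrier written that way might look like this (assuming a plain VkMemoryBarrier is enough for the resource involved; the variable names are placeholders):

VkMemoryBarrier barrier = {
    .sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER,
    .srcAccessMask = VK_ACCESS_TRANSFER_READ_BIT,   // how the memory was accessed before the barrier
    .dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT,  // how it will be accessed after the barrier
};
vkCmdPipelineBarrier(cmdBuffer,
    VK_PIPELINE_STAGE_TRANSFER_BIT,  // srcStageMask
    VK_PIPELINE_STAGE_TRANSFER_BIT,  // dstStageMask
    0,                               // dependencyFlags
    1, &barrier,                     // global memory barrier
    0, NULL,                         // no buffer memory barriers
    0, NULL);                        // no image memory barriers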

That’s what I was doing initially, but there is so much conflicting information online.

If you read this: Reads in memory barrier srcAccessMask · Issue #131 · KhronosGroup/Vulkan-Docs · GitHub

They say…

Can any *_READ flag in srcAccess be just substituted for 0 (in all situations)?

Source access masks are intended to determine visibility of write accesses, and nothing else - so yes.

and

So we’ve now completely resolved that READ in srcAccessMask is completely a no-op. If we hadn’t been previously implying that you needed it, then we’d probably make it invalid. Instead we’re just going to leave it as a no-op, and hopefully add a warning to the validation layers that it’s a no-op. This should be clear in the spec in the next couple of weeks.
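
So if I take that literally, the write-after-read barrier from my first post would presumably become something like this (my own sketch, just dropping the read bit from srcAccessMask):

VkMemoryBarrier warBarrier = {
    .sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER,
    .srcAccessMask = 0,                              // reads never need to be made available
    .dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT,
};
vkCmdPipelineBarrier(cmdBuffer,
    VK_PIPELINE_STAGE_TRANSFER_BIT,  // srcStageMask: the prior transfer reads
    VK_PIPELINE_STAGE_TRANSFER_BIT,  // dstStageMask: the subsequent transfer writes
    0, 1, &warBarrier, 0, NULL, 0, NULL);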

I want to understand synchronization, but the documentation and spread-out information are making it hard.

And then look at this code snippet krOoze just gave me,

// Dependency of the first Render Pass Instance
VkSubpassDependency rp1Dependency = {
    .srcSubpass = lastSubpass, // Last subpass the color attachment is used in
    .dstSubpass = VK_SUBPASS_EXTERNAL,
    .srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, // matches storeOp on a color attachment
    .dstStageMask = VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, // to be synchronized later
    .srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT, // matches VK_ATTACHMENT_STORE_OP_STORE
    .dstAccessMask = 0,
};

// Dependency of the second Render Pass Instance
VkSubpassDependency rp2Dependency = {
    .srcSubpass = VK_SUBPASS_EXTERNAL,
    .dstSubpass = firstSubpass, // First subpass the color attachment is used in
    .srcStageMask = VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, // chains to rp1Dependency::dstStageMask stage
    .dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, // matches loadOp on a color attachment
    .srcAccessMask = 0,
    .dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT, // matches VK_ATTACHMENT_LOAD_OP_LOAD
};

He sets rp1Dependency.dstAccessMask = 0 (instead of VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT), and rp2Dependency.srcAccessMask = 0 (instead of VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT again).

I am definitely missing something.
./CONFUSED

It is partly the Specification’s fault. For a long time, synchronization was described quite badly. Then there was a series of updates. For better or worse, the Specification is a living document; it receives updates frequently. And yeah, there are still problematic areas, and a horde of obscure corner cases nobody has had time to think through.

Anyway, at that point it was not clear what access flags should be used, and even some in-spec examples used non-zero flags where zero would suffice. Because of that, the Validation Layers often forced people to use non-zero flags, materials were written using non-zero flags, and the rest is inertia.

The situation now is that a 0 access mask makes sense sometimes, and non-zero flags are allowed (and should not be harmful, I think, unless the Validation Layers have become stricter in the meantime) – some may consider them easier to read and less error-prone.

BTW, as for external materials… I am not sure if I am being unorthodox here, but I would think their job is to get someone up to speed. The Specification’s job is to get someone to do things correctly. So I would say the external materials only need to get you ready to read the spec. From that point on, only the spec should be used, and if you are confused about something in this authoritative document, then probably others are too, and you should submit an Issue (or PR). I don’t think many people think the same way, as there are not that many contributors to the repo compared to how many Vulkan users there must be.

Anyway, that is a long introduction to me basically saying “read the spec”. Synchronization is, I think, exactly the one thing you do not want to learn from a third party. Third-party material may explain it deceptively easily, but in the long run the more formal, potentially harder-to-read description is better.

Now, to the reasoning for the above code snippet (and it may be wrong, because I pulled it out of my ass, or because I misunderstood your situation):
I probably confused you with the Dependency Chaining. It should be equivalent to a single Dependency, which would have non-zero access flags (I provided this alternative in the original thread).
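
Roughly, that single-dependency alternative would look like this (a sketch, not necessarily my exact original code; firstSubpass is the same placeholder as above, and the dependency lives on the second render pass):

// Single external dependency on the second Render Pass Instance,
// replacing the rp1Dependency/rp2Dependency chain above
VkSubpassDependency singleDependency = {
    .srcSubpass = VK_SUBPASS_EXTERNAL,
    .dstSubpass = firstSubpass,
    .srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, // matches storeOp of the first render pass
    .dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, // matches loadOp of the second render pass
    .srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT, // make the attachment write Available
    .dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT,  // and make it Visible to the load
};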

At the risk of being another third party explaining things: the Specification explains Memory Dependencies in terms of Availability and Visibility. The specific state of a memory location (i.e. the modification by a write) must be made Available From the source, and it must be made Visible To the destination.

To make things harder, Availability and Visibility happen as further operations on the Queue (and need to be synchronized too, in a way). Only memory state that has been made Available can become Visible.

So, the above is valid.

The first dependency makes the memory Available From the source. The Availability Op Happens-Before VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT (i.e. the dstStage). I use that stage only as a chaining stage, and it cannot even access memory, so the access flags are 0.

The second dependency has an Execution Dependency on VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT. That means the Availability Op has finished before the Dependency does anything else. That stage still cannot do anything with memory, so the access flags are still 0. Then the dependency takes any Available memory and makes it Visible To VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT.
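
To tie that back to an ordinary pipeline barrier: on a plain vkCmdPipelineBarrier the same two concepts map directly onto the access masks (a sketch, assuming a compute-shader write followed by a transfer read of the same memory):

VkMemoryBarrier rawBarrier = {
    .sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER,
    // Availability Op: writes in this access scope are made Available
    .srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT,
    // Visibility Op: Available memory is made Visible to this access scope
    .dstAccessMask = VK_ACCESS_TRANSFER_READ_BIT,
};
vkCmdPipelineBarrier(cmdBuffer,
    VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,  // srcStageMask
    VK_PIPELINE_STAGE_TRANSFER_BIT,        // dstStageMask
    0, 1, &rawBarrier, 0, NULL, 0, NULL);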