Render pass dependencies issue

Hi,

I am trying to implement a simple render to texture but I can’t make render pass dependencies work correctly.

I have 2 render passes in the same command buffer:

  • First render pass clears color/depth/stencil and draws the scene offscreen
  • Second render pass is a simple post effect that reads the previous color attachment and writes to the swapchain images with loadOp=VK_ATTACHMENT_LOAD_OP_DONT_CARE and no depth stencil buffer

With this configuration I get small artifacts in one corner of my fullscreen quad. I tried every possible combination of render pass dependencies with VK_SUBPASS_EXTERNAL but nothing worked.
The validation layers do not report anything.

https://drive.google.com/open?id=0B_BnxvUKqqRPOURwRGVldTRDXzg

However, any of the following modifications fixes the issue:

  • Add an explicit image memory barrier between the two render passes
  • Use loadOp=VK_ATTACHMENT_LOAD_OP_CLEAR on the second render pass

I was able to reproduce the issue on Sascha Willems’s radial blur sample with a few modifications:

  • Same command buffer for the 2 render passes
  • I removed the blending and used loadOp=VK_ATTACHMENT_LOAD_OP_DONT_CARE on the blur render target
  • I removed the depth stencil buffer in the blur render pass

https://drive.google.com/open?id=0B_BnxvUKqqRPUUpzckZfY0Jyem8

Could it be a driver issue or am I missing something? I tested on a GTX 670 and a GTX 1060.

Thanks,

Eric

I’m a novice, so take this with some salt:

Pretty sure you do need a memory barrier of some kind, because it’s the only thing stopping the 2nd pass from reading pixels that haven’t been written yet.

I’m guessing that the clear op also fixed it because during the time it took to clear the image your initial texture writes were completed.

Are the validation layers meant to be intelligent enough to detect race conditions?

Well, I thought that was the point of specifying render pass dependencies: to let the driver know what the dependencies are so it can implicitly add the required barriers. I have read that implicit barriers are preferred because they let the driver perform barriers more optimally. It also seems far less error-prone to me, as it makes reordering render passes simpler without having to bother with barriers.

Yeah, that sounds right - I just reread the spec and it seems to say that. I’m still a newb and I was also having a bad reading day, sorry :/.

Do the artifacts move around with a static scene?
Are you able to use a Vulkan debugger?
Perhaps post details of the safest VK_SUBPASS_EXTERNAL dependency setup you tried? I don’t think I’ll be much help but maybe someone else here will.

In Sascha’s radial blur code (and base code), the only subpass dependencies I see are tied to VK_ACCESS_MEMORY_READ_BIT (either as the only source or the only destination access flag).
I’m not sure if that’s enough, although I admit I find all this really confusing, mostly because I find the spec a bit terse and ambiguous at present. The spec says:

VK_ACCESS_MEMORY_READ_BIT: Read access via non-specific entities. These entities include the Vulkan device and host, but may also include entities external to the Vulkan device or otherwise not part of the core Vulkan pipeline. When included in a destination access mask, makes all available writes visible to all future read accesses on entities known to the Vulkan device.
The VK_ACCESS_MEMORY_WRITE_BIT entry has sentences for both “included in a [source/destination] mask”, yet the VK_ACCESS_MEMORY_READ_BIT entry only explains destination. Is that deliberate or accidental?

Anyway, is it possible some writes are sneaking through outside the scope of those dependencies? (Does blending count as a read and write?)

Does adding “| VK_ACCESS_MEMORY_WRITE_BIT” to each VK_ACCESS_MEMORY_READ_BIT make any difference?

“Well I thought that was the point of specifying render pass dependencies,”

There is no such thing as a render pass dependency. There are sub-pass dependencies between sub-passes inside a render pass. Draw commands between two different RENDER PASSES do require a barrier.

There are also dependencies between a renderpass’s subpass and things outside that renderpass - hence VK_SUBPASS_EXTERNAL.

The documentation on VK_SUBPASS_EXTERNAL seems to indicate that it can be used to create implicit barriers between draw commands in different renderpasses.

If srcSubpass is equal to VK_SUBPASS_EXTERNAL, the first synchronization scope includes commands submitted to the queue before the render pass instance began.
And similar for dstSubpass.
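For illustration, here is a rough sketch of what such an external dependency could look like on the offscreen render pass from the original question. It assumes the second render pass samples the offscreen color attachment in its fragment shader; the stage and access masks follow from that assumption and are not taken from the original code. The function name offscreen_to_sampling_dependency is hypothetical, and the fallback definitions only mirror the Vulkan 1.0 header so that the sketch compiles without the SDK installed.

```c
#include <assert.h>
#include <stdint.h>

/* Use the real header when it is available. */
#if defined(__has_include)
#  if __has_include(<vulkan/vulkan.h>)
#    include <vulkan/vulkan.h>
#    define SKETCH_HAVE_VULKAN_H 1
#  endif
#endif

#ifndef SKETCH_HAVE_VULKAN_H
/* Stand-ins mirroring the Vulkan 1.0 header (assumption: SDK not installed). */
typedef uint32_t VkFlags;
typedef VkFlags VkPipelineStageFlags;
typedef VkFlags VkAccessFlags;
typedef VkFlags VkDependencyFlags;
#define VK_SUBPASS_EXTERNAL (~0U)
#define VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT 0x00000080
#define VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT 0x00000400
#define VK_ACCESS_SHADER_READ_BIT 0x00000020
#define VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT 0x00000100
typedef struct VkSubpassDependency {
    uint32_t             srcSubpass;
    uint32_t             dstSubpass;
    VkPipelineStageFlags srcStageMask;
    VkPipelineStageFlags dstStageMask;
    VkAccessFlags        srcAccessMask;
    VkAccessFlags        dstAccessMask;
    VkDependencyFlags    dependencyFlags;
} VkSubpassDependency;
#endif

/* Hypothetical helper: dependency from the last subpass that writes the
 * offscreen color attachment to whatever is submitted after the render pass. */
VkSubpassDependency offscreen_to_sampling_dependency(uint32_t last_color_subpass)
{
    /* The source must be a real subpass index of this render pass. */
    assert(last_color_subpass != VK_SUBPASS_EXTERNAL);

    VkSubpassDependency dep = {
        .srcSubpass      = last_color_subpass,
        .dstSubpass      = VK_SUBPASS_EXTERNAL,
        /* Color writes of the offscreen pass must finish and be made available... */
        .srcStageMask    = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
        .srcAccessMask   = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
        /* ...before later fragment shaders read the image (assumed usage). */
        .dstStageMask    = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
        .dstAccessMask   = VK_ACCESS_SHADER_READ_BIT,
        /* No BY_REGION flag: a blur may sample texels outside its own region. */
        .dependencyFlags = 0,
    };
    return dep;
}
```

The resulting struct would go into pDependencies of the VkRenderPassCreateInfo for the first render pass. Whether this exact combination resolves the artifacts described above is not something the sketch can confirm.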

I must admit I’m having trouble understanding why manually specifying subpass dependencies is necessary at all. You pretty much never want to introduce race conditions which means you pretty much always want to wait on earlier writes to an image before using it, or reads before writing to it. The driver/GPU tracks all the necessary image-use info anyway (it has to so that external subpass dependencies know what to wait for) so there’s no additional cost imposed there.

Maybe if you’re doing something like reading from one part of an image while writing to a separate part with no overlap then you don’t need barriers. But this is a pretty rare case and it’d be easier to explicitly specify these cases rather than every other case.
Maybe if you have two attachments and have your rendering set up such that a completed (no pending reads/writes) A guarantees a completed B then you don’t need to wait on B - but the driver can’t deduce this by itself. However, a wait on a completed attachment should be practically instantaneous.

IMHO it’d be easier to just assume default subpass dependencies for everything and allow the developer to explicitly remove them, perhaps via a list of "VkSubpassNonDependency"s.

“The documentation on VK_SUBPASS_EXTERNAL seems to indicate that it can be used to create implicit barriers between draw commands in different renderpasses.”

The fact that you had to create it yourself means that it’s an explicit barrier, not an implicit one.

“You pretty much never want to introduce race conditions which means you pretty much always want to wait on earlier writes to an image before using it, or reads before writing to it.”

OK, but how does the API know that you have issued a write command to an image when it sees that you’re trying to read from it? The write command could be in another command buffer. Therefore the only system that could possibly know when such an event has taken place is the queue, where you execute all of the commands in a well-defined order.

And the GPU queue doesn’t have the ability to detect such things. Which means that executing such a queue would require CPU intervention to create those barriers. And thus, you lose a big part of the advantages of command-buffer-style APIs, since the CPU is basically having to do a lot of the work for something that could otherwise be done entirely by the GPU.

Explicit synchronization is not optional if you want an API where multiple threads can create commands and where, at command creation time, the API is completely blind to what has happened before.

Plus, since you know your workload better than the GPU, you can do specific things like partially clear a region of memory, instead of assuming that the entire region of memory may have been written to. For example, you can write to part of an image. But the API cannot see that’s what you’re doing; only you know that. So if the API had to insert barriers itself, it would have no choice but to insert a full image barrier. Whereas you have the ability to insert a memory barrier that only covers part of the image rather than the whole thing.
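As a small illustration of narrowing a barrier: in core Vulkan an image memory barrier is restricted through a VkImageSubresourceRange, that is, by aspect, mip levels and array layers rather than by an arbitrary rectangle. The sketch below builds a range covering a single mip level of a single-layer color image. The helper name single_mip_color_range and its mip_count parameter are hypothetical, and the fallback definitions only mirror the Vulkan header so the sketch compiles without the SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Use the real header when it is available. */
#if defined(__has_include)
#  if __has_include(<vulkan/vulkan.h>)
#    include <vulkan/vulkan.h>
#    define SKETCH_HAVE_VULKAN_H 1
#  endif
#endif

#ifndef SKETCH_HAVE_VULKAN_H
/* Stand-ins mirroring the Vulkan 1.0 header (assumption: SDK not installed). */
typedef uint32_t VkFlags;
typedef VkFlags VkImageAspectFlags;
#define VK_IMAGE_ASPECT_COLOR_BIT 0x00000001
typedef struct VkImageSubresourceRange {
    VkImageAspectFlags aspectMask;
    uint32_t           baseMipLevel;
    uint32_t           levelCount;
    uint32_t           baseArrayLayer;
    uint32_t           layerCount;
} VkImageSubresourceRange;
#endif

/* Hypothetical helper: range covering exactly one mip level of a
 * single-layer color image, suitable for VkImageMemoryBarrier::subresourceRange. */
VkImageSubresourceRange single_mip_color_range(uint32_t mip_level, uint32_t mip_count)
{
    /* The requested level must exist in the image. */
    assert(mip_level < mip_count);

    VkImageSubresourceRange range = {
        .aspectMask     = VK_IMAGE_ASPECT_COLOR_BIT,
        .baseMipLevel   = mip_level,
        .levelCount     = 1,   /* only this level, not the whole mip chain */
        .baseArrayLayer = 0,
        .layerCount     = 1,   /* assumed: image has a single array layer */
    };
    return range;
}
```

A barrier using this range leaves the other mip levels untouched, which is the kind of knowledge about the workload that only the application has.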

Not to mention that you could read from a different part of the image which hasn’t been written to. The API can’t detect that either. So in this case, you don’t need a barrier at all. But the system cannot possibly detect that.

Subpasses don’t have to be dependent on each other. And if there is no dependency between two subpasses, then the GPU should be free to execute them in the most optimal order, or even interleave their execution, should sufficient resources exist.

“The driver/GPU tracks all the necessary image-use info anyway (it has to so that external subpass dependencies know what to wait for)”

No, you tell the API what to wait for. Subpass dependencies are no different from any other manual barrier in that regard. If an image has changed and you want to make that change visible to the subpass, then you have to add an explicit memory barrier for that.

The only case where that doesn’t apply is dependencies for render targets. And those are treated specially by subpasses anyway. Each subpass says what it is writing and to where, so the system has the information to build those memory dependencies explicitly. For more general dependencies involving arbitrary memory, that is not the case.

Also, render targets during a renderpass are not considered to exist in normal memory anyway.

“IMHO it’d be easier to just assume default subpass dependencies for everything and allow the developer to explicitly remove them, perhaps via a list of "VkSubpassNonDependency"s.”

Vulkan doesn’t exist to make things “easier”. It exists to give you explicit and direct control over as many aspects of GPU behavior as is practical. And control over synchronization is a big part of that.