Strange Artifact on NVIDIA hardware

Hello guys, today I noticed a strange artifact on my NVIDIA laptop.

[Screenshot 1]
[Screenshot 2]

The artifact becomes more noticeable the lower the resolution I render the scene at.
The second image shows the scene rendered at just one pixel and then drawn onto a fullscreen quad.
On my AMD PC everything works fine. Do you have any idea what could cause this?

Thanks in advance
SH

Hard to tell from the screenshots, but I've seen something similar while porting vkQuake to the Shield TV, and in the end it turned out to be a synchronization issue. Do you have proper barriers and synchronization semaphores in place?

That's a good question… I can only say I hope so xD
I am using only one semaphore, which synchronizes acquiring the swapchain image with its subsequent use. Everything else is synchronized with subpass dependencies (via VK_SUBPASS_EXTERNAL).

Are you using proper fencing? How do the subpass dependencies look?

My subpass dependencies (the same for every render pass):

        VkSubpassDependency subpassDependencys[2];
        subpassDependencys[0].srcSubpass        = VK_SUBPASS_EXTERNAL;
        subpassDependencys[0].dstSubpass        = 0;
        subpassDependencys[0].srcStageMask      = VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT;
        subpassDependencys[0].dstStageMask      = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
        subpassDependencys[0].srcAccessMask     = VK_ACCESS_MEMORY_READ_BIT;
        subpassDependencys[0].dstAccessMask     = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
        subpassDependencys[0].dependencyFlags   = VK_DEPENDENCY_BY_REGION_BIT;

        subpassDependencys[1].srcSubpass        = 0;
        subpassDependencys[1].dstSubpass        = VK_SUBPASS_EXTERNAL;
        subpassDependencys[1].srcStageMask      = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
        subpassDependencys[1].dstStageMask      = VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT;
        subpassDependencys[1].srcAccessMask     = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
        subpassDependencys[1].dstAccessMask     = VK_ACCESS_MEMORY_READ_BIT;
        subpassDependencys[1].dependencyFlags   = VK_DEPENDENCY_BY_REGION_BIT;

My draw method (called once per frame, between updates):

    // Submit the recorded command buffers (alternating between per-frame resources) and present the rendered result to the surface
    void RenderingEngine::draw()
    {
        // Calculate new frame-data-index
        frameDataIndex = (frameDataIndex + 1) % frameResources.size();
        FrameData& frameData = frameResources[frameDataIndex];

        // Wait on the frame-data fence if necessary. This guarantees that everything needed this frame can safely be reused
        vkWaitForFences(device0, 1, &frameData.fence, VK_FALSE, UINT64_MAX);
        vkResetFences(device0, 1, &frameData.fence);

        // Record commands into command-buffers
        recordCommandBuffers();

        // Gather all command buffers and submit them all at once
        std::vector<CommandBuffer*> commandBuffers;

        {
            // Add Shadow-Map Rendering Command Buffer
            if (settings.renderShadows)
                commandBuffers.push_back(subRenderer[SHADOW]->getCMD(frameDataIndex));

            // Add the Primary-CMD containing all secondary-cmds which renders the scene
            commandBuffers.push_back(&frameResources[frameDataIndex].primaryCmd);

            // Add Post-Processing Command Buffer
            commandBuffers.push_back(subRenderer[POSTPROCESS]->getCMD(frameDataIndex));

            // Add GUI Command Buffer
            if (settings.renderGUI)
                commandBuffers.push_back(subRenderer[GUI]->getCMD(frameDataIndex));
        }

        // Submit all command buffers in the list at once
        CommandBuffer::submit(graphicQueue, commandBuffers);

        // Submit the rendered result to the presentation engine using the color image
        // from the last post-processed framebuffer (on which the GUI was rendered as well)
        VulkanBase::submitFrame(subRenderer[POSTPROCESS]->getOutputFramebuffer()->getColorImage());
    }

There should probably also be a semaphore between the usage and the presentation.

I hope you are using validation layers.

BTW, if you mean all stages, use VK_PIPELINE_STAGE_ALL_COMMANDS_BIT, not VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT. It is nicer to read and harder to confuse (TOP and BOTTOM have to be swapped between src and dst). Besides, it is not very obvious to me from the spec that all previous stages will automatically be included in the memory dependency if there happens to be one.

Of course I am using validation layers :wink: It would be horrific if not.
There is a semaphore between the usage and the presentation of the swapchain image:

    // Copy the given renderedImage into the appropriate
    // swapchain image and finally present it on screen
    void VulkanBase::submitFrame(const VkImage& renderedImage)
    {
        Swapchain* swapchain = window->getSwapchain();
        FrameData& frameData = frameResources[frameDataIndex];

        // Index of the next swapchain image used for presenting
        uint32_t nextImage;

        // Acquire the next image. The present-complete semaphore gets signaled once the
        // presentation engine has released the image and it can safely be written.
        swapchain->aquireNextImageIndex(UINT64_MAX, frameData.presentCompleteSem, NULL, &nextImage);

        // Copy the rendered image into the appropriate swapchain image
        CommandBuffer& blitCmd = frameData.blitCmd;

        blitCmd.begin(VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT);
        {
            VkImageSubresourceRange subResourceRange = {};
            subResourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
            subResourceRange.layerCount = 1;
            subResourceRange.levelCount = 1;
            blitCmd.setImageLayout(renderedImage, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, subResourceRange);
            blitCmd.setImageLayout(swapchain->getImage(nextImage), VK_IMAGE_LAYOUT_PRESENT_SRC_KHR, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, subResourceRange);

            blitCmd.copyImage({ Window::getWidth(), Window::getHeight(), 1 },
                              renderedImage, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
                              swapchain->getImage(nextImage), VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL);

            blitCmd.setImageLayout(renderedImage, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL, subResourceRange);
            blitCmd.setImageLayout(swapchain->getImage(nextImage), VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_IMAGE_LAYOUT_PRESENT_SRC_KHR, subResourceRange);
        }
        blitCmd.end();

        // Submit the copy command, waiting on the present-complete semaphore (i.e. until the
        // acquired image is available). This is the last submit before presenting,
        // so signal the fence in the frame-data struct.
        blitCmd.submit(graphicQueue, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
                       frameData.presentCompleteSem, NULL, frameData.fence);

        // Submit the swapchain image to the presentation engine
        swapchain->queuePresent(presentingQueue, {}, nextImage);
    }

After reading this, I noticed something. In my mind, the rendering itself and the following copy into the swapchain image are synchronized by a subpass dependency. But do subpass dependencies only synchronize commands within the same command buffer? If so, I can't be sure the blit command buffer won't start its work while the rendering is still in progress, can I? Then I would definitely need a semaphore here.

The spec says:

Each subpass dependency defines an execution and memory dependency between two sets of commands, with the second set depending on the first set. When srcSubpass does not equal dstSubpass then the first set of commands is:

  • All commands in the subpass indicated by srcSubpass, if srcSubpass is not VK_SUBPASS_EXTERNAL.

  • All commands before the render pass instance, if srcSubpass is VK_SUBPASS_EXTERNAL.

While the corresponding second set of commands is:

  • All commands in the subpass indicated by dstSubpass, if dstSubpass is not VK_SUBPASS_EXTERNAL.

  • All commands after the render pass instance, if dstSubpass is VK_SUBPASS_EXTERNAL.

Does this mean, in the case of VK_SUBPASS_EXTERNAL, all commands before/after the render pass instance on the same queue, or only those in the same command buffer?

Should be "same queue", but it is a bit of an ongoing discussion in the GitHub Issues (apparently the synchronization specification needs a bit of cleanup).

That being said, you have dstStageMask = VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT (i.e. do not block). So there's practically no dependency between the rendering and the transfer (unless you rely on something else).

The intent of the specification is that there is no difference between commands before it in the CB and commands before that CB in the queue. “Before” always represents commands that happen before it in the queue, and “after” always represents commands after it in the queue.

I have tried VK_PIPELINE_STAGE_ALL_COMMANDS_BIT, but nothing changed. Hmm, it's so hard to get synchronisation right in Vulkan.

Well, you can always insert vkDeviceWaitIdle() calls to make sure it is a synchronization problem in the first place…

I wonder what that checkered pattern is. Try disabling VK_DEPENDENCY_BY_REGION_BIT too.

I tried both, but the artifact pattern still occurs. Very weird. Maybe it isn't a synchronisation problem?

I finally found some time to investigate this issue, and it was indeed a synchronisation issue.
But why it happened, and why my fix works, is not really clear to me, so any explanation would be awesome!

The problem is that the commands in my post-processing command buffer access the rendered image BEFORE the rendering of my 3D scene has finished.

This is how I submit my command buffers:

        // Gather all command buffers and submit them all at once
        std::vector<const CommandBuffer*> commandBuffers;
        {
            // Add the cmd which renders the scene
            commandBuffers.push_back(&frameResources[frameDataIndex].primaryCmd);

            // Add Post-Processing Command Buffer
            commandBuffers.push_back(subRenderer[POSTPROCESS]->getCMD(frameDataIndex));
        }
        // Submit all command buffers in the list at once
        CommandBuffer::submit(graphicQueue, commandBuffers);

The first fix I found was to submit the scene command buffer and the post-processing command buffer one by one instead of in a single batch:

        frameResources[frameDataIndex].primaryCmd.submit(graphicQueue);
        subRenderer[POSTPROCESS]->getCMD(frameDataIndex)->submit(graphicQueue);

Why does this work?

My second fix was to add a pipeline barrier right before the post-processing renderer records its commands (what seems weird to me is that I don't even have to specify any memory barriers in this command, and it still works):

 vkCmdPipelineBarrier(postProcessingCmd, VK_PIPELINE_STAGE_ALL_COMMANDS_BIT, VK_PIPELINE_STAGE_ALL_COMMANDS_BIT, 0, 0, NULL, 0, NULL, 0, NULL);

But the real question is: why do the commands in the post-processing command buffer access the rendered image too early?
I thought my rendering was synchronized by subpass dependencies.
I have tried every access flag and every stage mask for both subpass dependencies (VK_SUBPASS_EXTERNAL to 0 and vice versa), but it didn't fix this problem.

Hope you can help me understand this :slight_smile:

Thanks
SH

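If you used the subpass dependencies you posted earlier for both command buffers and submitted them in the same batch, then I think it wouldn't work.
With dstStageMask = VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT the outgoing dependency doesn't block anything, so those dependencies assume there will be a semaphore between the two command buffers.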