Hi,
I have been timing portions of my code as part of an attempt to get a better grasp of how the presentation engine behaves. The code I’m using looks something like this:
// imageCount==2 for FIFO, 3 for Mailbox
// minImageCount==2
uint32_t idx;
acquiredImageAvailableSemaphore = device->createSemaphoreUnique({});
device->acquireNextImageKHR(*swapchain, vkt::timeout_infinite, *acquiredImageAvailableSemaphore, {}, &idx);
imageAvailableSemaphores[idx].swap(acquiredImageAvailableSemaphore);
device->waitForFences(1, &*presentationBufferExecutionFences[idx], VK_TRUE, vkt::timeout_infinite);
device->resetFences(1, &*presentationBufferExecutionFences[idx]);
vk::CommandBuffer& cb = *presentationCommandBuffers[idx];
cb.begin(&beginInfo);
cb.beginRenderPass(&renderPassInfo, vk::SubpassContents::eInline);
// I don't actually record any commands here at the moment
cb.endRenderPass();
cb.end();
vk::SubmitInfo submitInfo = {};
const vk::PipelineStageFlags waitStage = { vk::PipelineStageFlagBits::eColorAttachmentOutput };
submitInfo.waitSemaphoreCount = 1;
submitInfo.pWaitSemaphores = &*imageAvailableSemaphores[idx]; // dereference the unique handle to get a vk::Semaphore
submitInfo.pWaitDstStageMask = &waitStage;
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &cb;
submitInfo.signalSemaphoreCount = 1;
submitInfo.pSignalSemaphores = &presentWaitSemaphores[idx];
graphicsQueue.submit(1, &submitInfo, *presentationBufferExecutionFences[idx]);
vk::PresentInfoKHR presentInfo = {};
presentInfo.waitSemaphoreCount = 1;
presentInfo.pWaitSemaphores = &presentWaitSemaphores[idx];
presentInfo.swapchainCount = 1;
presentInfo.pSwapchains = &*swapchain;
presentInfo.pImageIndices = &idx;
presentQueue.presentKHR(&presentInfo);
The timings I get with mailbox look like this ([milliseconds::microseconds], release build, no validation layers):
[ 5089:: 65] > acquiring image
[ 5089:: 72] > acquired image: 0
[ 5089:: 78] > waitForFences start
[ 5089:: 80] > waitForFences end
[ 5089:: 85] > submit
[ 5089::137] > presentKHR
[ 5089::300] > end
[ 5089::323] > acquiring image
[ 5089::330] > acquired image: 1
[ 5089::335] > waitForFences start
[ 5089::336] > waitForFences end
[ 5089::341] > submit
[ 5089::396] > presentKHR
[ 5089::532] > end
[ 5089::536] > acquiring image
[ 5089::558] > acquired image: 2
[ 5089::563] > waitForFences start
[ 5089::565] > waitForFences end
[ 5089::569] > submit
[ 5089::603] > presentKHR
[ 5089::705] > end
[ 5089::710] > acquiring image
[ 5089::715] > acquired image: 0
[ 5089::734] > waitForFences start
[ 5089::736] > waitForFences end
[ 5089::740] > submit
[ 5089::788] > presentKHR
[ 5089::957] > end
...
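For anyone wanting to reproduce these measurements, here is a minimal sketch of a logging helper that emits timestamps in this format. It assumes std::chrono::steady_clock as the time source, and the names formatStamp and logStamp are hypothetical, not taken from the code above.

```cpp
#include <chrono>
#include <cstdio>
#include <string>

// Hypothetical helper: formats an elapsed time as "[ms::us]",
// matching the layout of the log lines above.
std::string formatStamp(std::chrono::microseconds elapsed) {
    const long long total = elapsed.count();
    const long long ms = total / 1000;  // whole milliseconds
    const long long us = total % 1000;  // leftover microseconds
    char buf[32];
    std::snprintf(buf, sizeof(buf), "[%5lld::%3lld]", ms, us);
    return buf;
}

// Hypothetical helper: prints one log line, measured from a fixed start point.
// steady_clock is assumed here because it never jumps backwards.
void logStamp(std::chrono::steady_clock::time_point start, const char* label) {
    const auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::steady_clock::now() - start);
    std::printf("%s > %s\n", formatStamp(elapsed).c_str(), label);
}
```

Calling logStamp(start, "submit") right before each Vulkan call would give one line per step, so the cost of a call is the difference between its line and the next one.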
There are some things I’m wondering about:
- The acquired images always come in consecutive order [0, 1, 2, 0, 1, 2, etc.], though I would expect the presentation engine to be holding one of them for display, resulting in something like [0, 1, 2, 1, 2, 1, 0, 2, 0, 2]. I guess the presentation engine works a bit differently internally and makes a copy of the relevant data?
- Submit takes a bit of time, which makes sense. presentKHR takes significantly more time. Is this normal?
- Am I handling the semaphores correctly?
However, the really odd part was when I used the FIFO present mode. I expected vkAcquireNextImageKHR to block, but what I got instead was this:
[ 7305:: 69] > acquiring image
[ 7305:: 84] > acquired image: 1
[ 7305:: 92] > waitForFences start
[ 7305:: 94] > waitForFences end
[ 7305::106] > submit
[ 7305::166] > presentKHR
[ 7321::533] > end
[ 7321::553] > acquiring image
[ 7321::583] > acquired image: 0
[ 7321::604] > waitForFences start
[ 7321::607] > waitForFences end
[ 7321::620] > submit
[ 7321::676] > presentKHR
[ 7338::135] > end
...
As you can see, acquiring the image is instantaneous. Instead, vkQueuePresentKHR seems to be the synchronization point for my code. Why? Am I doing something wrong? Is this expected (undocumented?) behaviour?
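To put numbers on that, the time spent in each call can be computed from the FIFO log above. This is a minimal sketch of the arithmetic; the Stamp type and the function names are hypothetical, and the values are copied from the log lines.

```cpp
#include <cstdint>

// Hypothetical type: one "[ms::us]" timestamp from the log.
struct Stamp {
    std::int64_t ms;
    std::int64_t us;
};

// Converts a stamp to microseconds so two stamps can be subtracted.
constexpr std::int64_t toMicros(Stamp s) { return s.ms * 1000 + s.us; }

// Elapsed microseconds between two log lines.
constexpr std::int64_t elapsedMicros(Stamp from, Stamp to) {
    return toMicros(to) - toMicros(from);
}

// "presentKHR" -> "end" for the two FIFO frames shown above.
constexpr std::int64_t firstPresent  = elapsedMicros({7305, 166}, {7321, 533});
constexpr std::int64_t secondPresent = elapsedMicros({7321, 676}, {7338, 135});

// "acquiring image" -> "acquired image" for the first FIFO frame.
constexpr std::int64_t firstAcquire = elapsedMicros({7305, 69}, {7305, 84});

static_assert(firstPresent == 16367, "about 16.4 ms inside presentKHR");
static_assert(secondPresent == 16459, "about 16.5 ms inside presentKHR");
static_assert(firstAcquire == 15, "acquire returns within microseconds");
```

Both presents take roughly 16.4 ms while the acquire takes 15 microseconds. That would be close to one refresh interval if the panel runs at 60 Hz (about 16.7 ms), though the refresh rate is an assumption and was not measured here.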
I’m using a G-SYNC-compatible laptop with a GTX 980M. The drivers are approximately one week old and G-SYNC is disabled in the NVIDIA control panel.
Any help and advice is appreciated (relevant to the topic or not)!
Best,