
Rework synchronization #871

Open
SaschaWillems opened this issue Aug 29, 2021 · 14 comments

SaschaWillems (Owner) commented Aug 29, 2021

The current synchronization involves a vkQueueWaitIdle after frame submission. While this makes things a lot easier, it is not optimal and sets a bad example. The samples should be updated to use proper synchronization with per-frame resources where required, and should then ditch the vkQueueWaitIdle.
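To make the difference concrete, here is a minimal sketch of the two patterns (hypothetical names like `inFlightFences` and `currentFrame`; not code from the repository):

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>

// Old pattern: submit, then stall the CPU until the GPU is fully idle.
void submitOld(VkQueue queue, const VkSubmitInfo& submitInfo) {
    vkQueueSubmit(queue, 1, &submitInfo, VK_NULL_HANDLE);
    vkQueueWaitIdle(queue); // serializes CPU and GPU every frame
}

// Target pattern: one fence per frame in flight; the CPU only waits when it
// gets more than maxFramesInFlight frames ahead of the GPU.
void submitPerFrame(VkDevice device, VkQueue queue, const VkSubmitInfo& submitInfo,
                    VkFence* inFlightFences, uint32_t& currentFrame,
                    uint32_t maxFramesInFlight) {
    vkWaitForFences(device, 1, &inFlightFences[currentFrame], VK_TRUE, UINT64_MAX);
    vkResetFences(device, 1, &inFlightFences[currentFrame]);
    vkQueueSubmit(queue, 1, &submitInfo, inFlightFences[currentFrame]);
    currentFrame = (currentFrame + 1) % maxFramesInFlight;
}
```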

This will be a large update, as it affects all samples. Progress will be tracked in this branch.

Progress can be followed at https://docs.google.com/spreadsheets/d/1qS6eg0zsGRRKKUkz2eRqg0qV_CzSloMH6G2BoA0Ucrc/edit?usp=sharing

The second step will be fixing all the issues reported by synchronization validation.
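For readers who want to reproduce those reports: one way to enable synchronization validation is through VK_EXT_validation_features at instance creation (a sketch; the samples may wire this up differently):

```cpp
#include <vulkan/vulkan.h>

// Requires the VK_LAYER_KHRONOS_validation layer to be installed.
VkValidationFeatureEnableEXT enables[] = {
    VK_VALIDATION_FEATURE_ENABLE_SYNCHRONIZATION_VALIDATION_EXT
};

VkValidationFeaturesEXT validationFeatures{};
validationFeatures.sType = VK_STRUCTURE_TYPE_VALIDATION_FEATURES_EXT;
validationFeatures.enabledValidationFeatureCount = 1;
validationFeatures.pEnabledValidationFeatures = enables;

const char* layers[] = { "VK_LAYER_KHRONOS_validation" };

VkInstanceCreateInfo instanceCI{};
instanceCI.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
instanceCI.pNext = &validationFeatures; // chain the validation features in
instanceCI.enabledLayerCount = 1;
instanceCI.ppEnabledLayerNames = layers;
// ... fill in application info and extensions, then call vkCreateInstance()
```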

martin-ejdestig commented Aug 30, 2021

Hello @SaschaWillems, I was just looking at your shadow mapping example on the master branch and wondered why a single shadow map buffer was enough. I found vkQueueWaitIdle() and figured that is why.

But I also happened to notice this issue and saw that on the proper_sync_dynamic_cb branch you first modified the shadowmapping/ example to have more than a single shadow map, but then changed it back to 1. Where is the synchronization that makes it possible to have only a single shadow map?

Is it the following block from createShadowMapObjects()

```cpp
dependencies[0].srcSubpass = VK_SUBPASS_EXTERNAL;
dependencies[0].dstSubpass = 0;
dependencies[0].srcStageMask = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
dependencies[0].dstStageMask = VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT;
dependencies[0].srcAccessMask = VK_ACCESS_SHADER_READ_BIT;
dependencies[0].dstAccessMask = VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;
dependencies[0].dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT;
```

that prevents the shadow map from being written to before the previous frame has finished reading it? Similar to how the subpass dependencies for the main rendering pass allow for using only a single depth buffer? (I have not yet fully understood VkSubpassDependency.)

(Thank you for the wonderful examples btw, learning Vulkan would be much harder without them.)

(I also realize that this is maybe not the proper place to ask, but one could say it is coupled to the proper_sync_dynamic_cb branch. Documenting how this works in code comments is perhaps a bit much if it assumes one fully understands VkSubpassDependency.)

SaschaWillems (Owner) commented

Yes, the subpass dependency handles this. There is no need to duplicate the attachment per frame.
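For context, a read-before-write dependency like the block quoted above is typically paired with a second dependency in the other direction (a sketch mirroring that block, not necessarily verbatim from the sample):

```cpp
// Once the shadow pass has written the depth attachment, fragment-shader
// reads outside this render pass must wait for that write to complete.
dependencies[1].srcSubpass = 0;
dependencies[1].dstSubpass = VK_SUBPASS_EXTERNAL;
dependencies[1].srcStageMask = VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT;
dependencies[1].dstStageMask = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
dependencies[1].srcAccessMask = VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;
dependencies[1].dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
dependencies[1].dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT;
```

Together the two dependencies serialize successive frames' accesses to the single shadow map, which is why it does not need to be duplicated.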

I also plan on adding some additional documentation regarding the reworked synchronization.

kyamant commented Sep 4, 2021

On the Rework Synchronization page, the cloning instruction is
git clone --recursive https://github.com/SaschaWillems/Vulkan.git.
Shouldn't the particular branch (proper_sync_dynamic_cb) be mentioned in that command?
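For reference, a specific branch can be named at clone time (shown purely as syntax; see the reply below regarding whether you should):

```sh
git clone --recursive --branch proper_sync_dynamic_cb https://github.com/SaschaWillems/Vulkan.git
```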

SaschaWillems (Owner) commented Sep 4, 2021

I don't think that makes sense for a branch. That branch is also heavily work-in-progress, and people shouldn't use it.

kyamant commented Sep 6, 2021

@SaschaWillems: Should I open a separate ticket for this? I just want to make sure these problems with the reworked synchronization got your attention.

  • deferred example is giving validation errors.
  • offscreen example is giving validation errors.
  • radialblur example is giving validation errors.
  • pipelinestatistics example is giving validation errors.
  • pbrbasic example has a similar problem.
  • descriptorindexing example has a similar problem, as well as a couple of triangles sticking out from the middle cube.

Disabling the overlay by setting settings.overlay = false; makes the validation errors go away.

Validation Error: [ VUID-vkDestroyBuffer-buffer-00922 ] Object 0: handle = 0x9f1516000000008c, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0xe4549c11 | Cannot free VkBuffer 0x9f1516000000008c[] that is in use by a command buffer. The Vulkan spec states: All submitted commands that refer to buffer, either directly or via a VkBufferView, must have completed execution (https://vulkan.lunarg.com/doc/view/1.2.182.0/windows/1.2-extensions/vkspec.html#VUID-vkDestroyBuffer-buffer-00922)

Validation Error: [ VUID-vkDestroyBuffer-buffer-00922 ] Object 0: handle = 0xd3dd54000000008e, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0xe4549c11 | Cannot free VkBuffer 0xd3dd54000000008e[] that is in use by a command buffer. The Vulkan spec states: All submitted commands that refer to buffer, either directly or via a VkBufferView, must have completed execution (https://vulkan.lunarg.com/doc/view/1.2.182.0/windows/1.2-extensions/vkspec.html#VUID-vkDestroyBuffer-buffer-00922)

Validation Error: [ VUID-vkDestroyBuffer-buffer-00922 ] Object 0: handle = 0xe647ea0000000090, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0xe4549c11 | Cannot free VkBuffer 0xe647ea0000000090[] that is in use by a command buffer. The Vulkan spec states: All submitted commands that refer to buffer, either directly or via a VkBufferView, must have completed execution (https://vulkan.lunarg.com/doc/view/1.2.182.0/windows/1.2-extensions/vkspec.html#VUID-vkDestroyBuffer-buffer-00922)

Validation Error: [ VUID-vkDestroyBuffer-buffer-00922 ] Object 0: handle = 0x8168780000000092, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0xe4549c11 | Cannot free VkBuffer 0x8168780000000092[] that is in use by a command buffer. The Vulkan spec states: All submitted commands that refer to buffer, either directly or via a VkBufferView, must have completed execution (https://vulkan.lunarg.com/doc/view/1.2.182.0/windows/1.2-extensions/vkspec.html#VUID-vkDestroyBuffer-buffer-00922)
...
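All of these errors are instances of the same hazard: a VkBuffer is destroyed while a previously submitted command buffer still references it (here, the UI overlay's per-frame buffers). A minimal sketch of one common remedy, with hypothetical names, is to wait on the owning frame's fence before freeing:

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>

// Sketch: before destroying a buffer that an in-flight command buffer may
// still be reading, wait for the fence signaled by that submission.
void destroyFrameBuffer(VkDevice device, VkFence frameFence,
                        VkBuffer buffer, VkDeviceMemory memory) {
    // Ensure all submitted commands referencing the buffer have completed
    vkWaitForFences(device, 1, &frameFence, VK_TRUE, UINT64_MAX);
    vkDestroyBuffer(device, buffer, nullptr);
    vkFreeMemory(device, memory, nullptr);
}
```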

SaschaWillems (Owner) commented

There is no need to report these. The new synchronization is still very much a work in progress, and the way the UI overlay is handled will be completely reworked too.

filnet commented May 25, 2024

A recent NVIDIA driver update broke my rendering loop. The change apparently made vkQueuePresent() non-blocking.
I was wrongly relying on this behavior... I also had a wait on the "in flight" fence, but it was always signaled when reaching the wait.

Anyways, I went looking around how pros were doing synchronization of their rendering loop and naturally came here :).
I also looked at the official Vulkan samples.

Here is what I gathered:

In the Sascha Willems samples there is a wait for the in-flight fence right after the vkQueueSubmit() call.
This effectively turns the vkQueueSubmit() call into a blocking call.
See:

```cpp
VK_CHECK_RESULT(vkWaitForFences(logicalDevice, 1, &fence, VK_TRUE, DEFAULT_FENCE_TIMEOUT));
```

The Khronos Vulkan samples, on the other hand, wait for the in-flight fence right after calling vkAcquireNextImage().
See https://github.com/KhronosGroup/Vulkan-Samples/blob/8a25cc4b4ab9b02cc68aabcc47f2efdad76230f7/framework/rendering/render_context.cpp#L349.
I believe this is the correct approach as it defers the wait for as long as possible. And the wait will probably be nonexistent most of the time, as the execution of the command buffer (of the N-2 frame) will be long done.

This is a tricky subject, so I am not 100% sure about the above.
Feedback is welcome.
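To make the two orderings concrete, here is a condensed sketch (hypothetical names throughout; only the placement of the fence wait differs):

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>

// Wait-after-submit: the CPU stalls until the GPU finishes this frame.
void waitAfterSubmit(VkDevice device, VkQueue queue,
                     const VkSubmitInfo& submitInfo, VkFence fence) {
    vkQueueSubmit(queue, 1, &submitInfo, fence);
    vkWaitForFences(device, 1, &fence, VK_TRUE, UINT64_MAX); // blocks every frame
}

// Wait-after-acquire: defer the wait until the acquired image's previous
// use must be complete; by then that work is usually long finished.
void waitAfterAcquire(VkDevice device, VkQueue queue, VkSwapchainKHR swapchain,
                      VkSemaphore imageAcquired, const VkSubmitInfo& submitInfo,
                      VkFence* frameFences) {
    uint32_t imageIndex = 0;
    vkAcquireNextImageKHR(device, swapchain, UINT64_MAX, imageAcquired,
                          VK_NULL_HANDLE, &imageIndex);
    vkWaitForFences(device, 1, &frameFences[imageIndex], VK_TRUE, UINT64_MAX);
    vkResetFences(device, 1, &frameFences[imageIndex]);
    vkQueueSubmit(queue, 1, &submitInfo, frameFences[imageIndex]);
}
```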

SaschaWillems (Owner) commented

The correct way, as noted in this very issue, is the one used in the Khronos samples.

filnet commented May 25, 2024

Yep.

kyamant commented May 26, 2024

> In the Sascha Willems samples there is a wait for the in-flight fence right after the vkQueueSubmit() call. This effectively turns the vkQueueSubmit() call into a blocking call. [...] The Vulkan sample, on the other hand, waits for the in-flight fence right after calling vkAcquireNextImage(). [...] This is a tricky subject, so I am not 100% sure about the above. Feedback is welcome.

The piece of code you are quoting from Sascha Willems' examples is flushCommandBuffer(), which is part of a general-purpose utility for, say, updating images. The rendering loop revolves around prepareFrame() and submitFrame(). The VulkanExampleBase class has wait fences (1-to-1 with draw command buffers, which are in turn 1-to-1 with swapchain images), but I do not see them being used at all. @SaschaWillems am I missing something?

Thanks for your input, as I am always trying to enhance my understanding of synchronization.
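For readers following along, the render loop shape referred to here looks roughly like this (paraphrased from the samples' typical structure, not an exact copy):

```cpp
// Paraphrased sketch of a typical sample's draw() in this repository.
// prepareFrame() acquires the next swapchain image; submitFrame() presents
// it and, in the old code, then waits for the queue to go idle.
void draw() {
    VulkanExampleBase::prepareFrame();
    submitInfo.commandBufferCount = 1;
    submitInfo.pCommandBuffers = &drawCmdBuffers[currentBuffer];
    VK_CHECK_RESULT(vkQueueSubmit(queue, 1, &submitInfo, VK_NULL_HANDLE));
    VulkanExampleBase::submitFrame(); // present + vkQueueWaitIdle()
}
```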

filnet commented May 26, 2024

Yes, my bad for pointing to irrelevant code.

The triangle.cpp example is much closer to what the official Vulkan samples do in terms of synchronization, except that waiting for the "in flight" fence is done before acquiring the next image rather than after.
I don't think it makes a difference in terms of correctness, but you want to wait as late as possible.

EDIT: here is the relevant code location:

```cpp
VK_CHECK_RESULT(vkResetFences(device, 1, &waitFences[currentFrame]));
```
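For context, the lines around that location in triangle.cpp follow this shape (paraphrased, not an exact copy):

```cpp
// Wait for the previous use of this frame's resources to finish, then
// reset the fence before submitting work that will signal it again.
VK_CHECK_RESULT(vkWaitForFences(device, 1, &waitFences[currentFrame], VK_TRUE, UINT64_MAX));
VK_CHECK_RESULT(vkResetFences(device, 1, &waitFences[currentFrame]));
// ... then acquire the next swapchain image, submit, and present
```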

filnet commented May 26, 2024

My understanding of vkAcquireNextImage is that it will hand out the index of the image to use, but the image might not be usable yet.
Synchronization of vkAcquireNextImage/vkQueueSubmit/vkQueuePresent is handled by the driver via the semaphores you pass to those calls.

triangle.cpp is not optimal in terms of semaphore usage:

  • vkAcquireNextImage() should signal an acquiredImageSemaphore instead of a presentCompleteSemaphore.
  • vkQueueSubmit() should wait on an acquiredImageSemaphore instead of a presentCompleteSemaphore and should signal a renderCompleteSemaphore (which it does).
  • finally vkQueuePresent() should wait on a renderCompleteSemaphore (which it does).

For the above to work you also need multiple copies of the resources used when building the command buffer, so it requires quite a bit of infrastructure work beforehand (see the sketch below).

But I am pretty sure Sascha knows all this, and the problem is the sheer amount of work it would take to update all the samples.
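A sketch of the wiring described in the list above (hypothetical names; per-frame arrays sized to the number of frames in flight):

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>

// Sketch only: one semaphore pair and one fence per frame in flight.
void renderFrame(VkDevice device, VkQueue queue, VkSwapchainKHR swapchain,
                 VkCommandBuffer* commandBuffers, VkSemaphore* acquiredImageSemaphores,
                 VkSemaphore* renderCompleteSemaphores, VkFence* inFlightFences,
                 uint32_t& frame, uint32_t maxFramesInFlight) {
    uint32_t imageIndex = 0;
    vkAcquireNextImageKHR(device, swapchain, UINT64_MAX,
                          acquiredImageSemaphores[frame], VK_NULL_HANDLE, &imageIndex);

    vkWaitForFences(device, 1, &inFlightFences[frame], VK_TRUE, UINT64_MAX);
    vkResetFences(device, 1, &inFlightFences[frame]);

    VkPipelineStageFlags waitStage = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
    VkSubmitInfo submitInfo{};
    submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    submitInfo.waitSemaphoreCount = 1;
    submitInfo.pWaitSemaphores = &acquiredImageSemaphores[frame];    // wait for the image
    submitInfo.pWaitDstStageMask = &waitStage;
    submitInfo.commandBufferCount = 1;
    submitInfo.pCommandBuffers = &commandBuffers[frame];             // per-frame resources
    submitInfo.signalSemaphoreCount = 1;
    submitInfo.pSignalSemaphores = &renderCompleteSemaphores[frame]; // signal when rendered
    vkQueueSubmit(queue, 1, &submitInfo, inFlightFences[frame]);

    VkPresentInfoKHR presentInfo{};
    presentInfo.sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR;
    presentInfo.waitSemaphoreCount = 1;
    presentInfo.pWaitSemaphores = &renderCompleteSemaphores[frame];  // present after rendering
    presentInfo.swapchainCount = 1;
    presentInfo.pSwapchains = &swapchain;
    presentInfo.pImageIndices = &imageIndex;
    vkQueuePresentKHR(queue, &presentInfo);

    frame = (frame + 1) % maxFramesInFlight;
}
```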

filnet commented May 26, 2024

Here is a very good tutorial about swapchains and synchronization: https://www.intel.com/content/www/us/en/developer/articles/training/api-without-secrets-introduction-to-vulkan-part-2.html

SaschaWillems (Owner) commented May 26, 2024

Just a quick note: this is an issue for tracking the synchronization rework of my samples, not a discussion topic. If you want to discuss this further, feel free to move it elsewhere.
