Vulkan Performance Support

For my final year project I am doing a performance comparison between DirectX 11 and Vulkan on a multitude of different hardware and operating systems. Early signs currently point to DirectX 11 being faster in my implementation.
I was wondering if anybody had any suggestions for Vulkan features or settings to look into? I was hoping that Vulkan would outperform DirectX 11 really…

My Vulkan project was built using: https://vulkan-tutorial.com/

Currently both projects just load in the Sponza scene as an OBJ, with 8 dynamic spotlights and a camera following a pre-determined path.

I am personally working on a laptop with an Intel i7 CPU and an NVIDIA GeForce 840M GPU.

I am using Vulkan 1.0.61.1.

Any advice or suggestions would be welcome; I am happy to post specific pieces of code if that would help.

Thanks!

Right, tough subject. Some random thoughts then, maybe something will be useful to you:

  • Anything you measure may be outdated before you manage to finish your work.

  • There is no such thing as “performance of Vk/DX11” – to get something like that, one would have to try every possible kind of workload and somehow do a weighted average of those results – a humanly impossible task. It is hard to devise a measure (i.e. one that is not complete BS) to responsibly compare the performance of two APIs.

  • It is hard to make a fair implementation to compare APIs, even for a single case. If both use a common codebase, then whichever API is implemented into the codebase second is likely to show worse results. It is probably a two-man job: one guy should try to make a DX implementation from scratch and try his best to optimize it; a second guy should separately try to make a Vulkan implementation (and try his best to optimize it). Anyway, my point is that this is IMO the most common bias: a ThisAPI programmer tried to compare ThisAPI vs OtherAPI – guess what, ThisAPI somehow turned out to be the fastest one in his results. Or: a game in ThisAPI later implements OtherAPI – guess what, the original ThisAPI implementation somehow turns out to be faster.

  • Despite what I said above, you probably want to establish some feel for “real-world” performance. You probably want to compare your results to those of other people trying more complex but realistic uses, things like 3D games, to judge whether your results are “sane”.

  • Some people try to run something at >300 FPS and pretend the results mean anything. That’s potentially a concern with Sponza on modern high-end HW.

  • BTW, your SDK is outdated.

  • Mantle-derived APIs were marketed as having better CPU performance.

    • Your CPU seems overpowered, while your GPU seems weaker. It would be interesting to switch that around (weak CPU + strong GPU).
    • Something like CPU time or power usage could potentially be measured.
    • Trying to do things in a multi-threaded fashion could show additional improvements in Vulkan – but that needs a specific 3D app where single-threaded submission could become a bottleneck for DX11.
  • Vulkan is a bit more low-level and explicit. DX11 potentially has to do some guess-work. It would be interesting to somehow trick DX into making the wrong guess. Unfortunately I don’t have a specific example, but maybe something can be found on the internet (i.e. something that is known to perform badly on DX, which you then reimplement the same way in Vulkan).

  • Vulkan is supposed to be more friendly to mobile GPUs. But I guess hard to compare with DX on the same device.

  • Comparison of APIs is tricky. You potentially want to abuse the API to some degree, to see its weaknesses. E.g. try to make a lot of draw calls. Or maybe try a large number of command buffers (which can be built in a multithreaded way in Vulkan). Or frequent switching of pipelines. Or frequent switching between compute and graphics.

  • DX11 potentially has some home-turf advantage. It may be interesting to try rendering off-screen, without a swapchain.

  • Vulkan has extensions. They require extra coding and work only on some HW, but may offer “free” performance for some specific workload.

  • Mantle-derived APIs are supposed to have better “soft” performance. That is, it should be possible to program in such a way that the performance is more predictable. E.g. it might be interesting to measure individual frame times to see how much hitching there is.

  • ?

Vulkan’s performance will primarily be based on how you use the API. There are many things which the API makes quite clear will kill performance. The fact that you have to explicitly specify synchronization makes it clear that syncs are not cheap. The fact that you have to end render passes to do arbitrary reads from written textures makes it clear that this is probably not as cheap as using input attachments. And so forth.
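
To give a feel for what “explicit synchronization” means in practice, here is a minimal sketch of the kind of barrier Vulkan makes you spell out between rendering to a texture and later sampling it; the image handle and command buffer are placeholders, not anything from your project:

```cpp
// Hypothetical sketch: an explicit image barrier between writing a render
// target and sampling it in a later pass. In D3D11 the driver inserts this
// for you; in Vulkan you pay the cost visibly and pick the narrowest
// stages you can.
VkImageMemoryBarrier barrier{};
barrier.sType               = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
barrier.srcAccessMask       = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
barrier.dstAccessMask       = VK_ACCESS_SHADER_READ_BIT;
barrier.oldLayout           = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
barrier.newLayout           = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.image               = renderTargetImage;   // assumed handle
barrier.subresourceRange    = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 };

vkCmdPipelineBarrier(cmdBuffer,                    // assumed command buffer
                     VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                     VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
                     0, 0, nullptr, 0, nullptr, 1, &barrier);
```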

If you use Vulkan the way you use D3D11, it probably won’t work out very well.

Currently both projects just load in the Sponza scene as an OBJ, with 8 dynamic spotlights and a camera following a pre-determined path.

This is a horrible judge of performance for Vulkan. It’s a purely static scene, with a moving camera. It doesn’t exercise any of the things that make Vulkan a better API.

Vulkan’s primary performance gain is in the ability to thread the construction of commands. In D3D11, you have to issue your commands on a single thread. In Vulkan, you can build command buffers independently on multiple threads. While you do submit them on a single thread, that submission cost is nothing next to the cost of building those CBs.

If you’re rendering single-threaded on Vulkan, you might see only modest performance gains.
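
As a rough sketch of what the threaded path can look like (the helper, handles, and per-thread pools are assumptions, not your code), secondary command buffers are recorded on worker threads and then stitched into the primary command buffer:

```cpp
#include <vulkan/vulkan.h>
#include <thread>
#include <vector>

// Hypothetical helper: records this thread's slice of the draw calls.
void recordDrawsForChunk(VkCommandBuffer cb, uint32_t chunk);

// Each secondary CB must have been allocated from its own VkCommandPool,
// because command pools are not thread-safe.
void buildFrame(VkCommandBuffer primary, VkRenderPass renderPass,
                VkFramebuffer framebuffer,
                const std::vector<VkCommandBuffer>& secondaries)
{
    std::vector<std::thread> workers;
    for (uint32_t t = 0; t < secondaries.size(); ++t) {
        workers.emplace_back([&, t] {
            VkCommandBufferInheritanceInfo inherit{};
            inherit.sType       = VK_STRUCTURE_TYPE_COMMAND_BUFFER_INHERITANCE_INFO;
            inherit.renderPass  = renderPass;
            inherit.framebuffer = framebuffer;

            VkCommandBufferBeginInfo begin{};
            begin.sType            = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
            begin.flags            = VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT;
            begin.pInheritanceInfo = &inherit;

            vkBeginCommandBuffer(secondaries[t], &begin);
            recordDrawsForChunk(secondaries[t], t);
            vkEndCommandBuffer(secondaries[t]);
        });
    }
    for (auto& w : workers) w.join();

    // Cheap single-threaded part. The primary's render pass must have been
    // begun with VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS.
    vkCmdExecuteCommands(primary,
                         static_cast<uint32_t>(secondaries.size()),
                         secondaries.data());
}
```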

Indeed, the main purpose of APIs like Vulkan is that they allow you to do things that you just couldn’t do effectively in immediate APIs. Streaming is a great example. Streaming often works in blocks; you load a fixed-sized block of data into memory, and you pull out the textures/vertex arrays/whatever from that block.

By giving you direct, low-level control over device-accessible memory, you give the user the ability to do things they couldn’t do before. In immediate APIs, you can’t change the size of a texture without requesting a whole new texture object. Which means that you have to design your streaming system around that limitation. Instead of the block being based on a fixed-sized chunk of memory, it has to be based on specific sizes and formats of objects. Each block contains 4 1024x1024 textures and X bytes of vertex arrays.

That limits the ability of your artists to do things. They can’t trade in some vertex array storage to have more textures. They can’t exchange one of the textures with 4 512x512 textures to improve image diversity in this region of the world. And so forth.

Because Vulkan works with memory more explicitly, you can actually do these things. You can create new images that are bound to existing memory. So instead of having to keep the same texture and buffer objects around, you can quickly create new VkImage and VkBuffer objects, but using the same memory behind them. This makes streaming much more flexible for your artists.
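
A minimal sketch of that idea, with all handles, the format, and the offset chosen purely for illustration:

```cpp
#include <vulkan/vulkan.h>

// Illustrative only: create a fresh VkImage and bind it into a memory block
// that was allocated once, up front, for streaming.
VkImage createStreamedImage(VkDevice device, VkDeviceMemory streamingBlock,
                            VkDeviceSize offsetInBlock)
{
    VkImageCreateInfo info{};
    info.sType         = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO;
    info.imageType     = VK_IMAGE_TYPE_2D;
    info.format        = VK_FORMAT_BC1_RGB_UNORM_BLOCK;
    info.extent        = { 1024, 1024, 1 };
    info.mipLevels     = 1;
    info.arrayLayers   = 1;
    info.samples       = VK_SAMPLE_COUNT_1_BIT;
    info.tiling        = VK_IMAGE_TILING_OPTIMAL;
    info.usage         = VK_IMAGE_USAGE_SAMPLED_BIT | VK_IMAGE_USAGE_TRANSFER_DST_BIT;
    info.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;

    VkImage image;
    vkCreateImage(device, &info, nullptr, &image);

    // The caller must ensure offsetInBlock respects req.alignment, that the
    // image fits in the block, and that req.memoryTypeBits matches the
    // block's memory type.
    VkMemoryRequirements req;
    vkGetImageMemoryRequirements(device, image, &req);

    vkBindImageMemory(device, image, streamingBlock, offsetInBlock);
    return image;
}
```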

So unless you’re testing scenarios that actually test Vulkan’s ability to improve performance in real-world situations, your tests are meaningless.

Hi guys,

Firstly, thanks for responding and so quickly at that!

Secondly, answering your questions and concerns:

  • I know measuring performance is incredibly difficult and problematic. I’m going to use frames per second and memory usage with the understanding that these aren’t completely conclusive but that they should be usable enough to check my implementations.

  • I actually currently have 5 different tests, I just didn’t want to put too much into my initial question. They are: Sponza with dynamic lights, Sponza with ambient only, a larger model with ~900,000 triangles and 1 texture, 500+ cubes and 300+ cubes with differing levels of transparency. I’m planning to create a couple more tests at some point to test this further.

  • I know that 1.0.65.0 is the currently available SDK; I am just waiting to make sure it’s completely stable before I update.

  • I will be testing my project on multiple different machines, I just included my personal machine for reference.

  • Trying to find something that DX struggles with is a good suggestion, thank you. :)

  • I am going to build an Android project just to see how easy it is to make Vulkan work on Android.

  • Is rendering without a swapchain and/or rendering off-screen common in games? I understand why it would remove the home-turf advantage but is it a fair test?

  • Again, if they only work on certain hardware, I don’t think I could use them. I don’t think it’d be completely fair?

  • I am measuring individual frame times.

  • I currently have 3 command buffers which are created during the project setup on a single thread and then they’re never re-created. Wouldn’t I only need to multi-thread the creation of them if I was dynamically re-creating them? Or are you saying that I should add a test that re-creates the command buffers?

  • How do I render on multiple threads? Could you possibly send me some code or point me to a URL that talks me through implementing this?

  • I thought Vulkan would be faster at rendering geometry? I understand that they have improved other aspects of the API, but I’m not confident that I can implement all of them in the time frame.

Again, thanks for the support.

I know measuring performance is incredibly difficult and problematic. I’m going to use frames per second

I find these statements to be contradictory. If you are willing to use FPS as a performance metric, then I cannot believe that you fully understand how performance measurement works.

Especially since it is easy to use actual time as a performance metric.
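
For instance, a minimal sketch of collecting per-frame times with a monotonic clock (renderFrame is a stand-in for whatever your draw-and-present loop is); report percentiles of these samples rather than an FPS average:

```cpp
#include <chrono>
#include <vector>

void renderFrame();   // stand-in for the draw + present of either API

// Record each frame's wall-clock time in milliseconds.
std::vector<double> measureFrameTimes(int frameCount)
{
    using clock = std::chrono::steady_clock;
    std::vector<double> frameTimesMs;
    frameTimesMs.reserve(frameCount);

    auto previous = clock::now();
    for (int i = 0; i < frameCount; ++i) {
        renderFrame();
        auto now = clock::now();
        frameTimesMs.push_back(
            std::chrono::duration<double, std::milli>(now - previous).count());
        previous = now;
    }
    return frameTimesMs;   // report median / 95th / 99th percentiles from this
}
```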

They are: Sponza with dynamic lights, Sponza with ambient only, a larger model with ~900,000 triangles and 1 texture, 500+ cubes and 300+ cubes with differing levels of transparency. I’m planning to create a couple more tests at some point to test this further.

OK, so… what do these things actually test?

You can’t just grab a couple of models, render them, measure the performance of that rendering, and then say that those numbers mean something. Performance testing is like a scientific investigation. You have some general notions you want to explore, form falsifiable hypotheses, develop experiments to verify or falsify those hypotheses, perform the experiments, and draw conclusions (with further experiments to come from that).

You just seem to be trying stuff. That’s not going to lead to any meaningful conclusions.

For example, take your “500+ cubes” example. Are these static cubes or are they dynamically moving? How are you making them move dynamically? Are you using instancing or UBO data? How many draw calls are you issuing? Are you doing state changes between those calls, or is it all done through memory? And so forth. Doing each of these will result in different performance results, for different reasons.
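
To make those questions concrete, here is an illustration (every identifier is hypothetical) of two very different ways the same cube scene can be submitted, which stress the CPU side very differently:

```cpp
#include <vulkan/vulkan.h>

// Illustration only: two ways to draw the same "500 cubes" scene.
void drawCubes(VkCommandBuffer cmd, VkPipelineLayout pipelineLayout,
               VkDescriptorSet descriptorSet, uint32_t cubeCount,
               uint32_t cubeIndexCount, uint32_t alignedUboStride,
               bool instanced)
{
    if (!instanced) {
        // Variant A: one draw per cube, each selecting its own transform via
        // a dynamic uniform buffer offset -> cubeCount binds + cubeCount draws.
        for (uint32_t i = 0; i < cubeCount; ++i) {
            uint32_t dynamicOffset = i * alignedUboStride;
            vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS,
                                    pipelineLayout, 0, 1, &descriptorSet,
                                    1, &dynamicOffset);
            vkCmdDrawIndexed(cmd, cubeIndexCount, 1, 0, 0, 0);
        }
    } else {
        // Variant B: a single instanced draw; the shader reads each cube's
        // transform from a storage buffer indexed by gl_InstanceIndex.
        vkCmdDrawIndexed(cmd, cubeIndexCount, cubeCount, 0, 0, 0);
    }
}
```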

Meaningful performance testing is never as simple as “draw a bunch of cubes, see how long it takes”. If you do that, then the numbers you get, even if they’re in time rather than FPS, mean absolutely nothing.

Wouldn’t I only need to multi-thread the creation of them if I was dynamically re-creating them?

Yes. That’s what real applications do. So if you want to have performance numbers that actually mean something to the real world, then you’re going to have to do things as they are done in real applications.

Or at the very least, when doing your performance test, you need to state up-front that you’re using static CBs.

I thought Vulkan would be faster at rendering geometry?

Command buffer-style APIs exist in part because the CPU overhead of immediate APIs had become so great that CPUs were often the bottleneck in large-scale applications.

If all you’re doing is setting up some state and issuing a single draw call to blast something onto the screen, CB APIs should be approximately equivalent in performance to immediate APIs. But that’s not what real-world applications do, so such numbers are essentially useless for comparing the utility of these APIs.

Could you suggest a usable way to measure performance, please? I do understand that there is more to performance measuring than FPS and intend to include other statistics like Virtual Memory used and average time spent on each GPU core, etc.

I only have a couple of months to put this test together, so I am focusing on basic rendering in both APIs. The cubes aren’t being instanced; each will be a separate draw call, using UBO data to rotate them. Again, could you suggest a meaningful test or two, please?

I am looking into multi-threading command buffers now. I agree, that was a massive shortcoming, and now that my application runs, I’m going to look into making it multi-threaded and adding the ability to dynamically re-create command buffers.

That makes sense. Basically, I have come to this forum because I am aware that my project needs some guidance and support from people who understand this sort of thing better than I do.

Thanks again, I do appreciate you taking the time to help me with this. :)