I have some serious rendering performance issues in my engine (using Vulkan), and I’m having a lot of trouble pinning down the cause.
I’ve used a mesh with 29952 vertices and 9984 triangles as reference. I can render the mesh 262 times in the Source engine and my FPS count is still at a steady 90 fps.
That’s ~0.0424 ms per mesh (11.11 ms per frame / 262 meshes), ignoring level geometry and such, which means the actual per-mesh cost would be even lower.
In my engine I already run into massive performance problems if I just render a handful.
I’ve used timestamp queries to time how long it takes to render 36 of them, and it turned out to be ~46.1373 ms (~1.282 ms per mesh). That’s about 30 times slower than the Source engine (and that’s without a texture, lighting effects, etc.!).
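For anyone who wants to double-check the arithmetic, here is a minimal sketch of how such numbers can be derived. The helper names (`ticksToMs`, `perMeshMs`, `frameBudgetMs`) are hypothetical and not taken from the engine. The only Vulkan-specific assumption is that raw timestamp-query values are scaled by `VkPhysicalDeviceLimits::timestampPeriod`, which is given in nanoseconds per tick, and that the end timestamp is not smaller than the begin timestamp.

```cpp
#include <cassert>
#include <cstdint>

// Convert a pair of raw timestamp-query values to milliseconds.
// timestampPeriodNs is assumed to come from
// VkPhysicalDeviceLimits::timestampPeriod (nanoseconds per tick).
inline double ticksToMs(std::uint64_t beginTicks, std::uint64_t endTicks,
                        float timestampPeriodNs)
{
    assert(endTicks >= beginTicks);  // assumption: no counter wrap-around
    const double ticks = static_cast<double>(endTicks - beginTicks);
    return ticks * static_cast<double>(timestampPeriodNs) * 1e-6;  // ns -> ms
}

// Time available per frame at a given frame rate, in milliseconds.
inline double frameBudgetMs(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Average cost of one mesh when meshCount meshes take totalMs in total.
inline double perMeshMs(double totalMs, unsigned meshCount)
{
    assert(meshCount > 0);
    return totalMs / static_cast<double>(meshCount);
}
```

With the numbers from this post, `perMeshMs(46.1373, 36) / perMeshMs(frameBudgetMs(90.0), 262)` comes out at roughly 30, which is the slowdown factor quoted above.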
I’ve already ruled a few things out:
[ul]
[li]State Changes: I can render hundreds of different small objects (= a lot of state changes) just fine, no issues whatsoever.
[/li][li]CPU:
[/li]I did some measurements on the CPU and was able to narrow it down to this function call:
vkDevice.waitForFences(1,&fence,true,std::numeric_limits<uint64_t>::max());
I’m using FIFO present mode with 2 swapchain images (Mailbox isn’t supported on my GPU). For each image there is a fence, to make sure all previous render-calls for the command buffer for that swapchain image have been completed. The above call waits for that fence, and thus waits until the command-buffer has executed all commands in the queue. This is where my program spends most of its time (~95%), which means it’s mostly just waiting for the GPU. (Which concurs with the timestamp measurements I mentioned.)
[li]Shaders: The shaders I’ve used for testing are as simple as can be:
[/li][/ul]
Fragment Shader:
#version 440
#extension GL_ARB_separate_shader_objects : enable
#extension GL_ARB_shading_language_420pack : enable
layout(location = 0) out vec4 fs_color;
void main()
{
	fs_color = vec4(1,0,0,1);
}
Vertex Shader:
#version 440
#extension GL_ARB_separate_shader_objects : enable
#extension GL_ARB_shading_language_420pack : enable
layout(location = 0) in vec3 in_vert_pos;
layout(push_constant) uniform Matrices {
	mat4 MVP;
} u_matrices;
void main()
{
	gl_Position = u_matrices.MVP * vec4(in_vert_pos,1);
}
(Back-facing triangles are discarded by the cull mode.)
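Regarding the CPU measurement mentioned in the list above, the following is a rough sketch of one way to separate the time spent blocked in the fence wait from the rest of the frame. It is not the engine's actual profiling code: `timeBlockingCallMs` and `blockedFraction` are hypothetical helpers, and the callable passed in merely stands in for the `waitForFences` call.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Runs `wait` (a stand-in for the blocking fence wait) and returns how long
// the calling thread spent inside it, in milliseconds.
template <typename WaitFn>
double timeBlockingCallMs(WaitFn&& wait)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<WaitFn>(wait)();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Fraction of the frame time spent blocked (0.95 would match the ~95% figure).
inline double blockedFraction(double waitMs, double frameMs)
{
    assert(frameMs > 0.0);
    return waitMs / frameMs;
}
```

In the render loop this would be used along the lines of `double waitMs = timeBlockingCallMs([&] { vkDevice.waitForFences(1,&fence,true,std::numeric_limits<uint64_t>::max()); });`, with the total frame time measured the same way around the whole frame.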
I’m using Vulkan 1.0.26 and my drivers are up to date.
Everything points towards the GPU struggling to render the meshes, but that doesn’t explain how so many can be rendered in the Source engine (and my GPU definitely should be able to handle it).
I haven’t posted any code because I don’t even know what to look for. What can I try to narrow the problem down further?