Vulkan (vs.)? MultiDrawIndirect

Hi,

I have recently discovered that draw calls are expensive CPU-wise (I did not care before) and that individual uniforms are just criminal (hurrah for UBOs).

Thus I was delighted to discover multi-draw indirect calls and the simplicity with which I could blast out scene geometry while retaining the model structure of the scene (hurrah for gl_DrawID and SSBOs).
Culling with MultiDrawIndirect is easy to do quick and dirty (hurrah for compute shaders and binding the indirect buffer via SSBO).

Of course, as soon as I need different states (alpha blending), I need to dismantle the multi-draw indirect calls again: one for opaque, one for transparent, each with its respective states and shaders.
Add shadow mapping, stereo framebuffer switching, TXAA and hierarchical-Z culling, and the simple pipeline grows rather quickly.
Then comes Vulkan, which promises to pre-record draw calls and probably state changes as well (I have not yet started to toy with it).

But as for my question:

Multi Draw Indirect uses a single draw call; all scene data is on the GPU and can be updated either directly on the GPU via compute or by using buffer uploads. However, the shader requires several indirections to reconstruct model-specific data.

Vulkan allows keeping the shaders simple: each model has its own UBO, no indirections are necessary (multi-draw indirect appears to exist as well), but again all the data is on the GPU and we have no CPU overhead save for the essential buffer updates.

So basically: do I still need multi-draw indirect with Vulkan, and can I modify the command buffer via compute (I would guess not, since the command buffer seems to me like a glorified display list)?

Indirect draws should only be used when the host side cannot predetermine the geometry setup (that is what it is meant for). I doubt indirect is completely free on any current or future hardware: at the very least it requires an additional barrier.

Now, regarding multi-draw. In OpenGL, glMultiDraw* replaces this kind of loop:

foreach mesh in scene using current shader setup:
	glBindSomeState
	...
	glBindSomeState
	glDraw

Because glBind* is expensive, MultiDraw allows the driver to perform this loop more optimally.

In Vulkan, vkCmdDraw is not the same thing as glDraw - it is a primitive that the driver uses in OpenGL to build all sorts of draws, e.g. the MultiDraw commands. To emulate glMultiDraw you can define your own UserDefinedDrawID shader variable inside a “layout (push_constant)” uniform block and record a command buffer like this:

foreach mesh in scene using current shader setup:
	vkCmdPushConstants(DrawID)
	vkCmdDraw

(there are render passes and whatnot).
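To make the recording loop concrete, here is a minimal, device-free sketch. The names “RecordedCall”, “recordScene” and “meshCount” are hypothetical stand-ins; in real code the two recorded entries would be calls to vkCmdPushConstants and vkCmdDrawIndexed, and the shader side would read the ID from a push-constant block such as “layout(push_constant) uniform PushConsts { uint drawID; } pc;”.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the recorded command stream, so the loop
// structure can be shown (and checked) without a Vulkan device.
struct RecordedCall { const char* cmd; uint32_t drawID; };

// One push-constant update plus one draw per mesh; the real calls
// would be vkCmdPushConstants(...) and vkCmdDrawIndexed(...).
std::vector<RecordedCall> recordScene(uint32_t meshCount)
{
    std::vector<RecordedCall> cmds;
    for (uint32_t id = 0; id < meshCount; ++id)
    {
        cmds.push_back({"PushConstant", id}); // upload the per-draw ID
        cmds.push_back({"Draw", id});         // draw mesh 'id'
    }
    return cmds;
}
```

Since this loop is recorded once into a command buffer, the per-draw bookkeeping is paid at record time rather than as per-call validation on every frame, as it would be in GL.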

each model has its own UBO

This is bad. It means you have to call vkCmdBindDescriptorSets for each mesh, which is cheaper than binding multiple individual buffers and textures in OpenGL, but is still not ideal. Something akin to what you already use in OpenGL and what I described above is more optimal.

can I modify the command buffer via compute

You cannot modify command buffers after they are baked (you don’t even know where they reside, actually). But recording them is super cheap: you simply reset them each frame.

Thank you for your quick reply.

I am currently modifying the scenerendering examples of Sascha Willems and finding my way around the descriptor jungle.
https://github.com/SaschaWillems/Vulkan/tree/master/scenerendering

He appears to provide a descriptorSet per model.
The sets are switched and bound during “recording”.

From your reply I take it that this is not optimal: while I do not have the CPU overhead of old-GL-style binding, the actual (pre-recorded) binding is still expensive (or more expensive).

So what I will try next is to “implement” a Multi(Indirect)DrawCall using the Vulkan command buffer and vkCmdDrawIndirect.

Next question: can I modify the buffer of VkDrawIndirectCommand structs and cull geometry by setting VkDrawIndirectCommand.instanceCount = 0?

This would seem possible as I only modify buffer contents, but is it wise?

An additional question to this sample of Willems:

He provides a pipeline per material.
Why not use a single one?
Or can multiple pipelines be parallelized?
Does the pipeline setup influence the binding costs of the descriptor sets?

Best Regards

[QUOTE=Christoph;40453]An additional question to this sample of Willems:

He provides a pipeline per material.[/QUOTE]

That’s not the case. The material only stores a reference to one of the two pipelines created in the example: one for solid materials and one for blended materials. So there are actually only two pipelines. If you use multiple pipelines you’d use pipeline derivatives (like in my pipeline example) to optimize setup and rendering.

As for the rest of the sample (sorry, I’m at work so no detailed answer):
Yes, you could do lots of things differently if you’re only going after performance. The sample e.g. uses per-material descriptor sets and push constants just to show what’s possible. So you may e.g. use only push constants, or render sorted by material, descriptor set, pipeline or something like that. But it’s a basic example that shows one of the ways to do it. How you organize pipelines (and layouts), descriptor sets, bindings, etc. heavily depends on your use case :wink:

Thank you for your reply!!

I am still a little swamped with new information and thus missed the part about the reference.

I want to wrap my head around the descriptor stuff by combining samples and modifying existing code and performance gain is a nice motivator :slight_smile:

But to make sure I understand correctly:

Binding descriptorSets per vkCmdDrawIndexed is (comparatively, as the Vulkan equivalent) just as “bad” as the “old” GL style of rendering in a loop and updating the shader uniforms one by one.
Better to upload the model-specific data beforehand (into a UBO array) and access it via a “modelID” during rendering.
Such a modelID would then be the only information to be uploaded?

Best Regards

Binding descriptorSets per vkCmdDrawIndexed is (comparatively, as the Vulkan equivalent) just as “bad” as the “old” GL style of rendering in a loop and updating the shader uniforms one by one.

No, it is better than the equivalent OpenGL code. That doesn’t mean that it’s good, but you at least avoid the validation overhead OpenGL has.

Better to upload the model specific data beforehand (into an UBO) and accessing it via a “modelID” during rendering.
Such modelID would then be the only information to be uploaded.

You have several alternatives. If your per-model data is always in a UBO, you can use a dynamic UBO. This allows your descriptor to provide a VkBuffer and a size of memory to use, but the actual byte offset is something you provide via vkCmdBindDescriptorSets. So you would not be changing descriptor sets; you’d just be slightly modifying them, which is much cheaper.
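As a sketch of the offset math involved (assuming a hypothetical per-model struct size and the device’s minUniformBufferOffsetAlignment, commonly 256 bytes): each model’s data lives at a fixed stride in one large buffer, and only the byte offset changes per draw.

```cpp
#include <cstdint>

// Round a per-model struct size up to the next multiple of the
// device's minUniformBufferOffsetAlignment (power of two per spec).
uint64_t alignedStride(uint64_t structSize, uint64_t minAlignment)
{
    return (structSize + minAlignment - 1) & ~(minAlignment - 1);
}

// Byte offset of model 'modelIndex' inside one big uniform buffer
// holding all per-model blocks back to back at the aligned stride.
uint64_t dynamicOffsetForModel(uint32_t modelIndex,
                               uint64_t structSize, uint64_t minAlignment)
{
    return modelIndex * alignedStride(structSize, minAlignment);
}
```

The computed offset would go into the pDynamicOffsets array of vkCmdBindDescriptorSets; the descriptor set itself stays bound.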

That’s a good idea if basically the only difference between objects is the contents of a UBO. If you need to fetch more complicated data, then it’s better to use a single push constant, then use that to index into an SSBO to get the mesh’s per-object data.

Binding descriptorSets per vkCmdDrawIndexed is (comparatively, as the Vulkan equivalent) just as “bad” as the “old” GL style of rendering in a loop and updating the shader uniforms one by one.

It’s not AS bad. To put it in perspective: in Direct3D 9 you had to make an API call per uniform, per mesh and per texture (there are workarounds to avoid some of them, but that’s not the point). Direct3D 10 added UBOs and texture arrays, so you needed only two API calls to change all of the shader uniforms and textures of the same size. Descriptor sets are a further development of this idea: they allow changing all vertex data, UBOs, SSBOs and images in one call using a single prebaked object. Binding still has associated costs, but those are seriously mitigated. On the other hand, if you are crazy enough to build your own rendering engine in 2016, you shouldn’t settle on a sub-optimal solution.

Descriptor sets are a further development of this idea: they allow changing all vertex data, UBOs, SSBOs and images in one call using a single prebaked object.

Well, not vertex data, as processed by the input stage.

Thank you for your elaborations on the topic.

All this is purely academic, but to actually understand Vulkan there is no other way. Besides, it’s fun and teaches me stuff I never wanted to know until now.

On topic: I have read that Vulkan is not bindless because of descriptor sets.
However, the nice thing about MultiIndirect & bindless textures is the fact that I can encapsulate my material data, including texture handles, in a struct and push a buffer of structs that is then accessed via multi-draw.
Without bindless, I could use a texture array to a similar effect.

So the question: WhatWouldVulkanDo? I am guessing the array, because I’ve just seen the texture array example @SaschaWillems.
But still, is bindless a dead end or still sensible?

Bindless is a solution for eliminating the CPU overhead of a state change. It is a superior one, but it also requires more complex hardware. Vulkan’s resource binding is good enough, though: a programmer can avoid state changes most of the time, and when they can’t, a state change is not too taxing compared to OpenGL and can be supported by a wide range of existing hardware. Perhaps at some point, as hardware and rendering techniques advance, bindless will become a necessity (or descriptor sets will become a useless abstraction), but for the time being it is not needed.

[QUOTE=Christoph;40463]So the question: WhatWouldVulkanDo? I am guessing the array because I’ve just seen the texture array example @SaschaWillems.
But still, is bindless a dead end or still sensible ?[/QUOTE]

Well, as with most things 3D API related this depends on your use case :wink:

So if you have lots of textures, using texture arrays (combined with sparse allocations if necessary) is a good option IMO that works nicely with indirect drawing. Especially as Vulkan requires a minimum of 256 image array layers to be supported.

[QUOTE=Christoph;40463]So the question: WhatWouldVulkanDo? I am guessing the array because I’ve just seen the texture array example @SaschaWillems.
But still, is bindless a dead end or still sensible ?[/QUOTE]

Using an array of textures (as opposed to array textures; that is, texture2D variable[50] instead of texture2DArray variable) can be a much more capacious solution on hardware that supports shaderSampledImageArrayDynamicIndexing. NVIDIA hardware supports 8192 textures per stage, while AMD hardware supports truly arbitrary numbers. You can also use the array indices sparsely, with some indices left unfilled.

Of course, not all hardware supports that, so if you wish to support such hardware, array textures are a broader solution. Just not as roomy a solution compared to hardware that supports dynamically uniform indexing. That being said, most Vulkan implementations give you at least 1024 array layers.

[QUOTE=Sascha Willems;40465]Well, as with most things 3D API related this depends on your use case :wink:

So if you have lots of textures, using texture arrays (combined with sparse allocations if necessary) is a good option IMO that works nicely with indirect drawing. Especially as Vulkan requires a minimum of 256 image array layers to be supported.[/QUOTE]

And what of hardware that does not support sparse allocations? That can be an expensive prospect, memory-wise.

Again, thank you for your insight!

Does that mean my number of textures is fixed? Aka texture2D variable[50], or can I define something like texture2D variable[]?
Do I have to provide a maximum/fixed number for every scene?

When trying to use a uniform buffer array in GL in the MultiDraw context, I was scolded by the compiler that I should use an SSBO because the index (drawID) was unpredictable or something.
Do I have to care about that when using arrays?

Edit:
https://www.khronos.org/registry/vulkan/specs/misc/GL_KHR_vulkan_glsl.txt

It appears that:

layout(constant_id = 17) const int arraySize = 12;
vec4 data[arraySize]; // legal, even though arraySize might change

might be what I am looking for, I just have to figure out if/how to set the constant from the host.
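For reference, the host-side mechanism is VkSpecializationInfo, attached to VkPipelineShaderStageCreateInfo::pSpecializationInfo at pipeline creation. A self-contained sketch (the two structs below are local mirrors of the Vulkan header definitions so this compiles without vulkan.h; the value 2048 is just an example):

```cpp
#include <cstddef>
#include <cstdint>

// Local mirrors of VkSpecializationMapEntry / VkSpecializationInfo,
// declared here only so the sketch is self-contained.
struct SpecializationMapEntry {
    uint32_t constantID; // matches layout(constant_id = N) in GLSL
    uint32_t offset;     // byte offset into the data blob below
    size_t   size;       // size of this constant in the blob
};
struct SpecializationInfo {
    uint32_t mapEntryCount;
    const SpecializationMapEntry* pMapEntries;
    size_t dataSize;
    const void* pData;
};

// Host-side value for: layout(constant_id = 17) const int arraySize
static const int32_t arraySize = 2048;
static const SpecializationMapEntry entry = { 17, 0, sizeof(int32_t) };
static const SpecializationInfo specInfo = {
    1, &entry, sizeof(arraySize), &arraySize
};
```

In real code you would pass the equivalent VkSpecializationInfo when creating the pipeline, so the SPIR-V is specialized per pipeline rather than recompiled.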

[QUOTE=Christoph;40484]Again, thank your for your insight!

Does that mean my numbers of textures is fixed? Aka texture2D variable[50] or can I define something like texture2D variable[].[/quote]

No, you cannot make it unbounded. As I pointed out, implementations have limits on the number of per-stage sampled images you can use. And you have to respect those.

Well, except for AMD, whose “limits” are MAX_INT…

That’s up to you. Hardware has limits; how you live within them is up to you.

I don’t know what you mean by that. “Or something” doesn’t say much about the error in question, nor is it clear what a “uniform buffer” is.

If you are saying that you were trying to use an array of uniform blocks, and you tried to index it with gl_DrawIDARB, and the compiler failed to compile, then the compiler is broken. ARB_shader_draw_parameters makes it abundantly clear that gl_DrawIDARB is a dynamically uniform value, and you are allowed to access arrays of blocks and opaque types with dynamically uniform values.

Assuming what I said above is what you’re talking about, then this is exactly what I meant when I brought up the shaderSampledImageArrayDynamicIndexing feature. As the Vulkan specification explains, if hardware does not provide this feature, then you can only index arrays of samplers/textures with constant values. If it does have this feature, then you may use a dynamically uniform value.

Thank you for your reply, and please excuse my confusing syntax.

Regarding the array of textures:
It appears to be wrong to use bindings with texture arrays:

layout (set = 1, binding = … ) uniform texture2D diffuseTextures[2048];
As soon as I define the binding I am limited to 76 textures, otherwise: “‘binding’ : sampler binding not less than gl_MaxCombinedTextureImageUnits (using array)”

Does that also mean that I do not have to provide a “descriptorSetLayoutBinding” and use only the set for identification? That seems wrong; where should I put the “VK_DESCRIPTOR_TYPE_SAMPLER” then?

Best regards


Edit, to the post above:

I am currently trying to populate the descriptorSet with texture descriptors analogous to the samples:

	std::vector<VkDescriptorImageInfo> texDescriptors(materials.size());
	for (size_t i = 0; i < materials.size(); i++)
	{
		// Keep the image info in a container that outlives the loop:
		// the write below only stores a pointer, and vkUpdateDescriptorSets
		// reads it later (a loop-local struct would dangle)
		texDescriptors[i] = vkTools::initializers::descriptorImageInfo(
			materials[i].diffuse.sampler,
			materials[i].diffuse.view,
			VK_IMAGE_LAYOUT_GENERAL);

		// Binding 0: Diffuse texture
		writeDescriptorSets.push_back(vkTools::initializers::writeDescriptorSet(
			descriptorSetMaterial,
			VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE,
			0,
			&texDescriptors[i]));
	}

yet this does not seem sensible.
Not only does it require binding information, but it also provides a sampler, which I would expect not to need, as I want to provide an array of textures. Consequently I would expect to provide a separate sampler.

Could you please provide some explanation or point me to an example?
Best Regards.

Edit:
Ok, the compute shader sample suggests providing “VK_NULL_HANDLE” for the sampler.
That leaves the mystery of the binding point.

As soon as i define the binding I am limited to 76 textures, otherwise: “‘binding’ : sampler binding not less than gl_MaxCombinedTextureImageUnits (using array)”

gl_MaxCombinedTextureImageUnits, as the name suggests, is an OpenGL variable. You’re compiling for Vulkan. So the compiler seems broken in that regard, or you are using it wrong in some way.

Not only does it require binding information but also provides a sampler, which I would expect not to need, as I want to provide a texture array. Consequently I would expect to provide a separate sampler.

TYPE_SAMPLED_IMAGE maps to texture* types. These descriptors do not need a sampler, and whatever sampler data you provide to the descriptor update function will be ignored.

TYPE_COMBINED_IMAGE_SAMPLER maps to sampler* types. These use both sampler and image data, and the sampler/image data from the descriptor update function will be used.

TYPE_SAMPLER maps to the sampler type (no *). This takes just the sampler data from the descriptor update.

I compile using the prebuilt validator from the LunarG SDK:
glslangValidator -V scene.frag -o scene.frag.spv

I take it then that I MUST always provide binding information (makes sense) and also build the compiler from source.

Thank you for your advice!

It appears that even the latest glslangValidator still produces the “gl_MaxCombinedTextureImageUnits (using array)” error when the array is bound with 78 or more elements (texture2D variable[78+]).

Is there anything else to do besides running the validator with “-V”?

Best Regards