Occlusion Culling using HiZ test: optimization question.

Hi guys, hope you're having a good day!

I have implemented occlusion culling via ‘transform feedback’ using a Hi-Z map built from the reprojected previous frame.

It works fine, and I’m happy with the results.

But one case produces an interesting result:

If the scene has two cubes, one in front of the other, and the camera can see only the nearest one, then the hidden one is not rejected by the algorithm.

I understand the reason:
Building the Hi-Z map selects the largest depth for each texel from the previous mipmap level. As a result, the higher mipmap levels store depths that are as far as or farther than the levels below them.
For the occlusion test the shader selects a high level of the Hi-Z mipmap (3x2 or so), because the hidden cube is large on screen.

So the algorithm is behaving as designed.
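A minimal CPU-side sketch of the max reduction described above, to show why a coarse level can fail to reject the hidden cube. It assumes the usual depth convention (0 = near, 1 = far); the names `DepthLevel`, `downsampleMax` and `hiddenBehind` are made up for illustration and are not taken from the actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One level of a depth pyramid, row-major. Assumed convention: 0 = near, 1 = far.
struct DepthLevel {
    std::size_t width;
    std::size_t height;
    std::vector<float> texels;
};

// Builds the next coarser level. Each output texel keeps the farthest
// (largest) depth of the block it covers, so it can only be as far as or
// farther than every finer texel beneath it.
DepthLevel downsampleMax(const DepthLevel& src) {
    assert(src.width > 0 && src.height > 0);
    assert(src.texels.size() == src.width * src.height);
    DepthLevel dst;
    dst.width = std::max<std::size_t>(1, src.width / 2);
    dst.height = std::max<std::size_t>(1, src.height / 2);
    dst.texels.assign(dst.width * dst.height, 0.0f);
    for (std::size_t y = 0; y < dst.height; ++y) {
        // The last row/column also absorbs the leftover texel of an odd-sized source.
        const std::size_t yEnd = (y + 1 == dst.height) ? src.height : 2 * y + 2;
        for (std::size_t x = 0; x < dst.width; ++x) {
            const std::size_t xEnd = (x + 1 == dst.width) ? src.width : 2 * x + 2;
            float farthest = 0.0f;
            for (std::size_t sy = 2 * y; sy < yEnd; ++sy)
                for (std::size_t sx = 2 * x; sx < xEnd; ++sx)
                    farthest = std::max(farthest, src.texels[sy * src.width + sx]);
            dst.texels[y * dst.width + x] = farthest;
        }
    }
    return dst;
}

// Conservative test against one texel: the object counts as hidden only if
// its nearest depth lies behind the farthest depth stored in that texel.
bool hiddenBehind(float texelDepth, float objectNearestDepth) {
    return objectNearestDepth > texelDepth;
}
```

With a 2x2 level holding depths 0.2, 0.2, 0.2 and 1.0, the reduced 1x1 level stores 1.0. An object at depth 0.5 is hidden behind three of the four fine texels, yet it passes the test against the coarse texel, which is the situation with the two cubes.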

See screenshot:

  • the frame outlined in green is the base level of the Hi-Z mipmap.
  • the frame outlined in red is a high level of the Hi-Z mipmap.
    It shows that the hidden cube is not covered by the depth map of the selected level.

[Screenshot: HiZ]

But one cube is definitely hidden by the other.
Is there maybe some cheap solution to avoid rendering hidden items like that?

Thank you for any help!

Is there a way to know whether an object was visible after the frame has been rendered, without using transform feedback?

That’s what Occlusion queries are for. There are other ways, but this is probably the simplest.

Hm…
The scene contains several thousand objects, and I have no idea how I could do that without transform feedback and a geometry shader.

But it seems I found something interesting: Coherent Hierarchical Culling.
And I have another question: is it a good idea to implement the octree computation with frustum culling in a compute shader? I mean, how much would it affect the work of the main pipeline?

Occlusion queries with Conditional Rendering would work, but might not perform as well as you’d like for thousands of objects.

Let me ask you this. Are you using a 1x1 Hi-Z sample to test for occlusion, or are you testing a 2x2 sample? (See this post by rastergrid; in particular, this figure). That’ll help but won’t resolve problems in all cases if you’re only checking one level, of course.
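For reference, a small sketch of one common way to pick the level for a 2x2 test: halve the size of the object's screen-space rectangle until it is at most two texels per axis, then compare the object's nearest depth with the farthest of the four fetched depths. The function names are hypothetical, and the depth convention (0 = near, 1 = far) is an assumption.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Picks the Hi-Z mip level at which a rectangle of widthPx x heightPx
// level-0 pixels has shrunk to at most 2 texels per axis.
// maxLevel is the index of the coarsest level in the pyramid.
int hizLevelForRect(float widthPx, float heightPx, int maxLevel) {
    assert(maxLevel >= 0);
    float size = std::max(widthPx, heightPx);
    int level = 0;
    while (size > 2.0f && level < maxLevel) {
        size *= 0.5f;  // one mip level halves the footprint
        ++level;
    }
    return level;
}

// 2x2 test: the object is treated as hidden only if its nearest depth is
// behind the farthest of the four depths fetched around its rectangle.
bool occludedBy2x2(const std::array<float, 4>& samples, float objectNearestDepth) {
    const float farthest = *std::max_element(samples.begin(), samples.end());
    return objectNearestDepth > farthest;
}
```

A 256x100 pixel rectangle ends up at level 7 with this rule. A single far sample among the four is enough to keep the object, which is why a large footprint that pulls in background texels is hard to reject at one level.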

Alternatively, is the case you’re testing failing because the Hi-Z texels you are testing pull in “far” pixels which are around the cube in screen space? In the general case, you’d like to walk down the Hi-Z MIP levels until you resolve whether or not the object you’re considering is occluded. Depending on where that lies relative to the Hi-Z pixels, that might be a few levels down from the level where you’d ideally like to test.
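The walk down the MIP levels could look roughly like the CPU-side sketch below. It assumes a power-of-two pyramid where `pyramid[0]` is the full-resolution level, a depth convention of 0 = near and 1 = far, and a rectangle already clamped to the screen; `HiZLevel`, `PixelRect` and `isOccluded` are illustrative names only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct HiZLevel {
    int width;
    int height;
    std::vector<float> texels;  // row-major, farthest depth per texel
};

// Inclusive rectangle in level-0 pixels, assumed clamped to the screen.
struct PixelRect { int x0, y0, x1, y1; };

// Farthest depth among the texels of one level that the rectangle touches.
float farthestDepthInRect(const HiZLevel& level, int levelIndex, const PixelRect& rect) {
    const int tx0 = std::min(rect.x0 >> levelIndex, level.width - 1);
    const int tx1 = std::min(rect.x1 >> levelIndex, level.width - 1);
    const int ty0 = std::min(rect.y0 >> levelIndex, level.height - 1);
    const int ty1 = std::min(rect.y1 >> levelIndex, level.height - 1);
    float farthest = 0.0f;
    for (int ty = ty0; ty <= ty1; ++ty)
        for (int tx = tx0; tx <= tx1; ++tx)
            farthest = std::max(farthest, level.texels[static_cast<std::size_t>(ty * level.width + tx)]);
    return farthest;
}

// Starts at a coarse level and refines towards stopLevel. A "hidden" verdict
// at any level is final; otherwise the next finer level gets a chance.
bool isOccluded(const std::vector<HiZLevel>& pyramid, const PixelRect& rect,
                float objectNearestDepth, int startLevel, int stopLevel) {
    assert(stopLevel >= 0 && startLevel >= stopLevel);
    assert(static_cast<std::size_t>(startLevel) < pyramid.size());
    for (int level = startLevel; level >= stopLevel; --level) {
        const HiZLevel& current = pyramid[static_cast<std::size_t>(level)];
        if (objectNearestDepth > farthestDepthInRect(current, level, rect))
            return true;
    }
    return false;  // never fully covered by nearer depth: keep the object
}
```

Each finer level touches more texels, so in a shader one would probably cap the number of refinement steps rather than always descending to level 0.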

I’m using a 2x2 sample, exactly as the article’s author does.
The culling is working correctly, and there is no better solution without performance overhead.

I loaded a huge scene and realized OC is not the bottleneck at all.
Too many objects go through the OC query and through Cascaded Shadow Mapping.
I think I have to implement Coherent Hierarchical Culling.

PS: Photon, thank you very much!

I have an annoying bug:

My scene has 3 batches of items. Every batch is based on its own Mesh.
I render those batches using glDrawElementsInstanced.

Frustum culling is done with transform feedback.

I get a wrong result after the 1st step, which writes the instanceIds of visible objects for all meshes into one shared SSBO.

It works correctly if the frustum sees at least one object from the 1st batch.
In that case the SSBO contains correct instanceIds.

But if the frustum does not contain any object from the 1st batch, then the SSBO contains garbage.

Thus, if the 1st glDrawElementsInstanced renders nothing, the SSBO ends up filled with garbage.

I have no idea what to do about that.

Maybe I should initialize the SSBO with some initial value or something like that?

Thank you for any help.

For us to be able to help you, you need to describe in more detail how your frustum culling pass and draw pass are intended to work.

One way to do what you’re talking about is to do a cull pass for all models instanced from a single, shared mesh. The output of the pass is: 1) a list of per-instance data in a buffer object, 2) the number of instances (the number of instances can be captured to a buffer object using ARB_query_buffer_object, among others). Then when you issue the subsequent draw call for those instances, you can feed that primitive count into the draw call using (for instance) ARB_indirect_parameters.
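As a rough illustration of feeding the count into the draw: for indirect draws, the instance count is one field of the command structure that glDrawElementsIndirect reads from the indirect buffer. The struct layout below follows the OpenGL specification; `makeCommand` is a hypothetical helper, and filling the command on the CPU is only an assumption made to keep the sketch self-contained (on the GPU the same field would be written by the cull pass or a query buffer).

```cpp
#include <cstdint>

// Command layout read by glDrawElementsIndirect from the indirect buffer.
struct DrawElementsIndirectCommand {
    std::uint32_t count;          // number of indices per instance
    std::uint32_t instanceCount;  // instances to draw; 0 is valid and draws nothing
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
    std::uint32_t baseInstance;
};

static_assert(sizeof(DrawElementsIndirectCommand) == 20,
              "the command must be five tightly packed 32-bit values");

// Hypothetical helper: builds the command for one mesh from the number of
// instances that survived the cull pass.
DrawElementsIndirectCommand makeCommand(std::uint32_t indexCount,
                                        std::uint32_t visibleInstances) {
    DrawElementsIndirectCommand cmd;
    cmd.count = indexCount;
    cmd.instanceCount = visibleInstances;
    cmd.firstIndex = 0;
    cmd.baseVertex = 0;
    cmd.baseInstance = 0;
    return cmd;
}
```

A mesh whose instances were all culled simply gets `instanceCount == 0`, so no special case is needed on the draw side.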

Now how exactly are you handling the frustum culling and the issuing of the subsequent draw call?
What data is written out by the former and consumed by the latter using gl_InstanceID?
Per cull pass, are you only culling models instanced from a single, shared mesh, or are you doing something different?

If you’re not handling the case where 0 instances may be written out to the buffer, you need to. It’s unclear, but it sounds like that’s what you’re missing in your technique.

Of course.

I have a 2-step algorithm:

  • The 1st step is occlusion culling for the whole scene; it has one output SSBO with the instanceIds of accepted objects.
  • The 2nd step draws all accepted objects.

All objects of the scene are organised as a sorted map ‘Mesh -> array of objects’.
The map is sorted by Mesh to guarantee the same order for the occluding and drawing stages.
The only difference between objects that share a Mesh is the model matrix.

I’ll try to describe both steps in detail, skipping stuff that does not matter.

Occluding:
I use one ‘transform feedback’ (TF) pass for the whole scene.
Within that TF pass I call glDrawElementsInstanced (with GL_POINTS, the centres of the meshes) for each Mesh, each with its own ‘occlusion query’.
For the test I use the center and the parameters that define the bounding box (BB).
Technically the test is the same as in the ‘Mountains with HiZ’ example.
The vertex shader decides whether the object is occluded using the center and the BB, then sends a flag to the geometry shader.
The geometry shader just rejects or accepts the current object (actually just its center) using the flag as input.
The output of the geometry shader is the instanceId of the current object.
So, as a result I have a set of ‘occlusion query’ results, one per Mesh, and the SSBO filled with the instanceIds for each Mesh.
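A CPU reference of what this pass is expected to produce can help when checking a readback of the SSBO. The sketch below is only an emulation under the description above: visibility flags are given as input instead of being computed from the Hi-Z map, and `CullResult` and `cullReference` are made-up names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct CullResult {
    std::vector<std::int32_t> instanceIds;    // expected contents of the shared SSBO
    std::vector<std::uint32_t> countPerMesh;  // expected per-mesh query results
};

// Emulates the cull pass: meshes are processed in map order, and for each
// visible instance its per-mesh instanceId is appended to one shared list.
CullResult cullReference(const std::vector<std::vector<bool>>& visiblePerMesh) {
    CullResult result;
    for (const std::vector<bool>& mesh : visiblePerMesh) {
        std::uint32_t written = 0;
        for (std::size_t id = 0; id < mesh.size(); ++id) {
            if (mesh[id]) {
                result.instanceIds.push_back(static_cast<std::int32_t>(id));
                ++written;
            }
        }
        // A mesh with no visible instances contributes nothing to the list
        // and reports a count of zero.
        result.countPerMesh.push_back(written);
    }
    return result;
}
```

For three meshes with 3, 2 and 2 instances where the whole first mesh is culled, the reference gives the list 0, 1, 0, 1 and the counts 0, 2, 2, which can be compared directly against the buffer read back from the GPU.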

Drawing:
I call glDrawElementsInstanced (with full data) for each Mesh.
Using the ‘occlusion query’ results I can calculate an offset into the SSBO that defines the needed data region.
So, using the current instanceId and the offset for the current Mesh, I can get the accepted instanceId from the SSBO.
Using the accepted instanceId I get all the needed info for the current object from additional buffers.

So basically that is it.

For this case I reduced the scene to keep it simple.
I have 3 meshes: just cubes with different textures.
The first Mesh has 3 objects, and the others have 2 objects each.
So the SSBO result of occluding, if all objects are visible, is: 0, 1, 2, 0, 1, 0, 1;
0, 1, 2 - instanceIds for the 1st mesh (occlusion query result: 3)
0, 1 - instanceIds for the 2nd mesh (occlusion query result: 2)
0, 1 - instanceIds for the 3rd mesh (occlusion query result: 2)
Thus, when I draw the 3rd mesh’s instances, I just calculate the offset from the query results of the previous meshes: 3 + 2.
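The offset calculation amounts to an exclusive prefix sum over the per-mesh counts. A minimal sketch, with hypothetical names (`meshOffsets`, `acceptedInstanceId`) and the SSBO modelled as a plain vector:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Exclusive prefix sum: the offset of a mesh is the total number of
// instanceIds written for all meshes before it.
std::vector<std::uint32_t> meshOffsets(const std::vector<std::uint32_t>& countPerMesh) {
    std::vector<std::uint32_t> offsets;
    offsets.reserve(countPerMesh.size());
    std::uint32_t running = 0;
    for (std::uint32_t count : countPerMesh) {
        offsets.push_back(running);
        running += count;
    }
    return offsets;
}

// Mirrors the lookup in the draw pass: slot = mesh offset + instance index of
// the current draw. at() throws instead of silently reading past the end.
std::int32_t acceptedInstanceId(const std::vector<std::int32_t>& ids,
                                std::uint32_t meshOffset,
                                std::uint32_t drawInstance) {
    return ids.at(static_cast<std::size_t>(meshOffset) + drawInstance);
}
```

Counts of 3, 2, 2 give offsets 0, 3, 5, matching the 3 + 2 above. Counts of 0, 2, 2 give offsets 0, 0, 2, so a fully culled first mesh should leave the second mesh's ids at the very start of the buffer.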

It seems I can describe what is wrong with my occluding step.
If all of the 1st Mesh’s objects are invisible (occlusion query result is zero), then the output SSBO has garbage like:
-383292386, 0, 838237456, 0 (2 objects each for the 2nd and 3rd meshes)
If any object of the 1st Mesh is visible, then everything is OK: 2, 0, 1, 0, 1

Are you using selective emission from the geometry shader in your Occluding pass to determine if/when to write out instance IDs? If so, then you need to ensure that your Drawing pass does not read past the end of the element list written by the Occluding pass. You do this by feeding the number of primitives written count from the Occluding pass into the Drawing pass. 0 is a valid number of primitives that might be written.

Yes.

I’m sure of that. To draw instances I use the exact result of the ‘occlusion query’ from the culling pass.
The problem appears before the drawing pass.
I read back the contents of the SSBO and see garbage, as I wrote in my previous comment.