MRT using FBOs seems to be painfully slow.

I’m delving a bit more into OpenGL, and trying out some deferred rendering using framebuffer objects. I got it all set up and it is working fine, but I’m noticing a huge drop in FPS, especially when using multiple render targets.

This is my code to activate the FBO, which gets about 48 FPS on average with the scene I’m drawing.


gBufferFBO.bind(); // glBindFramebuffer(GL_FRAMEBUFFER, gBufferFBO);

GLenum mrt[] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1 };
glDrawBuffers(2, mrt);
		
glClearColor(0, 0, 0, 0);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
		
gl.shaderManager.getShader("g-buffer")->use(); // Used to save the correct info to each target
render();
gl.shaderManager.unUseProgram();

glBindRenderbuffer(GL_RENDERBUFFER, 0);
glBindFramebuffer(GL_FRAMEBUFFER, 0);
		
glBindTexture(GL_TEXTURE_2D, gBufferFBO.getTargets()[0]); // Color target
glGenerateMipmapEXT(GL_TEXTURE_2D);
glBindTexture(GL_TEXTURE_2D, gBufferFBO.getTargets()[1]); // Depth and normal target
glGenerateMipmapEXT(GL_TEXTURE_2D);
glBindTexture(GL_TEXTURE_2D, 0);

When I comment out all the FBO-related stuff, leaving the following code, I get over 250 FPS.

	
glClearColor(0, 0, 0, 0);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

gl.shaderManager.getShader("g-buffer")->use(); // Used to save the correct info to each target
render();
gl.shaderManager.unUseProgram();

Could anyone provide any insight as to how I might improve this?

Thanks

Yes, rendering to two buffers is slower than rendering to one. And I’m guessing from the comment, “Depth and normal target,” that this render target is at least RGBA16F, if not RGBA32F in size. That’ll hurt performance too.

Depth should come from the depth buffer you used to render the scene, not a color attachment. And normals don’t deserve anything more than an RGB10_A2 image format, and even that’s overkill.

Also, why are you generating mipmaps for your g-buffers? You really, really don’t need mipmaps for these. You don’t even need GL_LINEAR filtering on them; you’re going to be accessing each texel in turn. Your g-buffers are the same size as your screen (right?), so there’s a 1:1 ratio between texels and pixels. Mipmapping buys you nothing, and generating mipmaps wastes performance.
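To make that concrete, something along these lines would be plenty for the two color attachments. This is just a sketch: width and height stand in for your framebuffer size, and I’m using the core FBO entry points rather than the EXT ones.

// Sketch: two cheap G-buffer color targets, level 0 only, no mipmaps, no filtering.
GLuint colorTex, normalTex;
glGenTextures(1, &colorTex);
glBindTexture(GL_TEXTURE_2D, colorTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL); // plain 8-bit color
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

glGenTextures(1, &normalTex);
glBindTexture(GL_TEXTURE_2D, normalTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB10_A2, width, height, 0, GL_RGBA, GL_UNSIGNED_INT_2_10_10_10_REV, NULL); // normals
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

// With your FBO bound, attach them as the two MRT outputs.
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, colorTex, 0);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, normalTex, 0);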

Thanks for the reply. I’m generating mipmaps because I actually couldn’t get any output at all without it! Still not sure why, hopefully you can help.

Also, the performance drop isn’t just when using MRT; with just one buffer enabled it’s about 70-80 FPS, which is still about three times lower than without a custom framebuffer.

In regards to the formats of the textures, I was hoping to somehow pack the data like the Killzone 2 engine does, into 3 or 4 RGBA8 textures, but I’ll ask that question later in another thread.

Thanks.

Edit: also, how would you go about creating the g-buffer efficiently if not the way I’m doing it?

Also, I need the depth buffer to be in screen space in a texture, so that’s why it’s in a color attachment (my g-buffer shader computes it).

I’m generating mipmaps because I actually couldn’t get any output at all without it! Still not sure why, hopefully you can help.

Then that’s the first thing you need to fix.

Also, I need the depth buffer to be in screen space in a texture

Depth buffers are always screen space. And there’s nothing stopping you from making the depth buffer you use for the FBO a texture.

If you don’t get any output without mipmaps, then maybe you are sampling your G-buffer with a texture filter that uses mipmaps. The default texture filter mode uses mipmaps. You can try nearest filtering without mipmaps.
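If that’s the problem, a couple of state changes on the existing targets should be enough. Sketch only, using your own texture handles:

glBindTexture(GL_TEXTURE_2D, gBufferFBO.getTargets()[0]);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST); // default is GL_NEAREST_MIPMAP_LINEAR, which requires mipmaps
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
// repeat for the other attachment(s), then drop the glGenerateMipmapEXT calls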

Not necessarily. It depends on your other requirements. But in the absence of special requirements, yes, it should be.

The OP may not be aware you can 1) use a depth texture as an FBO “depth” buffer and 2) rebind that same texture in the lighting pass to a texture unit and read the window-space Z value from it for position reconstruction.

If you don’t need a separate depth channel in your G-buffer for any particular reason, using the depth buffer you rasterized the G-buffer with in the lighting pass saves you 16-32 bits of write bandwidth per sample rasterizing your G-Buffer.
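Roughly like this for the G-buffer pass; again just a sketch, with width/height as placeholders and depthTex as a made-up name:

// Sketch: a depth texture instead of a renderbuffer, so the lighting pass can read it back.
GLuint depthTex;
glGenTextures(1, &depthTex);
glBindTexture(GL_TEXTURE_2D, depthTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT24, width, height, 0,
             GL_DEPTH_COMPONENT, GL_UNSIGNED_INT, NULL);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

gBufferFBO.bind(); // or however you bind your FBO
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_TEXTURE_2D, depthTex, 0); // depth writes land here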

It’s really puzzling me, but for some reason non-mipmapped FBO-attached textures just started magically working, without me changing anything. shrugs That really brought up the performance, although it’s still certainly sub-optimal.

So, now onto making this better. Does binding a texture as a depth attachment cause it to contain depth values mapped from 0…1 based on the frustum? I’m going to try implementing that now and see if it improves the performance.

Thanks for the help, it’s really appreciated.

Yes, but here’s the catch: as far as I know, you can only bind textures whose base type is GL_DEPTH_COMPONENT or GL_DEPTH_STENCIL as the depth attachment of an FBO (i.e. it has to support depth). I don’t think you can just grab a texture with some arbitrary format. For instance, GL_DEPTH_COMPONENT24 and GL_DEPTH24_STENCIL8 are internal texture formats that check the box.

Then after you’re done rendering to it, you can rebind it to your shader via a sampler2D (or similar) and read the 0…1 window-space depth values out of it.
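On the read side it’s just an ordinary texture bind before the lighting pass. Sketch, with depthTex, lightingProgram and the uDepth sampler being placeholder names:

glActiveTexture(GL_TEXTURE2);                                  // any free unit
glBindTexture(GL_TEXTURE_2D, depthTex);                        // the texture attached as GL_DEPTH_ATTACHMENT
glUniform1i(glGetUniformLocation(lightingProgram, "uDepth"), 2); // lighting shader samples it as a sampler2D
// in the shader, the .r channel of a fetch is the 0…1 window-space depth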

Sorry to reopen this topic, but I have a very related question. If I wanted to imitate a linear depth buffer scale like:

linearDepth = -(position).z * inverseFarPlane,

what would I need to do to correctly map the depth values the FBO saves automatically?

I’m not entirely sure what the function is that OpenGL uses to map the depth to a texture, so I’ve been having trouble accurately modelling the inverse.
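For reference, this is my current attempt at the inverse (a sketch assuming a standard perspective projection with near plane n, far plane f, and the default glDepthRange(0, 1)); I’d appreciate a sanity check:

float linearizeDepth(float d, float n, float f, float inverseFarPlane)
{
    float zNdc = 2.0f * d - 1.0f;                              // window-space 0…1 -> NDC -1…1
    float zEye = (2.0f * f * n) / (zNdc * (f - n) - (f + n));  // undo the projection; zEye is negative in front of the camera
    return -zEye * inverseFarPlane;                            // same scale as linearDepth above
}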