Flip queue (AKA Pre-rendered frames)

It appears that some drivers implement a “flip queue” such that, even with vsync enabled, the first few calls to swap buffers return immediately (queuing those frames for later use). It is only after this queue is filled that buffer swaps will block to synchronize with vblank.

This behavior is detrimental to my application. It creates latency. Does anyone know of a way to disable it or a workaround for dealing with it?

The OpenGL Wiki page on Swap Interval suggests calling glFinish() after the swap, but I’ve had no luck with that trick.

[QUOTE=applejohn;1257418]It appears that some drivers implement a “flip queue” such that, even with vsync enabled, the first few calls to swap buffers return immediately (queuing those frames for later use). It is only after this queue is filled that buffer swaps will block to synchronize with vblank.

This behavior is detrimental to my application. It creates latency. Does anyone know of a way to disable it or a workaround for dealing with it?[/QUOTE]

Put a glFinish() after your SwapBuffers() call. This tells OpenGL: “finish all the work I’ve given you including the buffer swap (which waits on vsync), and don’t return to me until you’re done.”

This also has the benefit of making full-frame timing statistics reasonable, which is important if you want to detect when you overrun a frame (i.e. took too long to render and missed a vsync).
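To make the overrun point concrete, here is a minimal sketch in C. It is not part of any GL or driver API: count_missed_vsyncs is a hypothetical helper name, and it assumes you already have per-frame durations in milliseconds (timestamps taken right after SwapBuffers() plus glFinish()) and that you know the refresh period. A frame that took roughly two refresh intervals is counted as one missed vsync, three intervals as two, and so on.

```c
#include <stddef.h>

/* Hypothetical helper: counts missed vblanks in a list of frame durations.
 * frame_ms    - measured frame-to-frame durations in milliseconds
 * n           - number of entries in frame_ms
 * refresh_ms  - display refresh period, e.g. ~16.7 for 60 Hz (assumed known)
 */
size_t count_missed_vsyncs(const double *frame_ms, size_t n, double refresh_ms)
{
    size_t missed = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Round the duration to the nearest whole number of refresh intervals. */
        long intervals = (long)(frame_ms[i] / refresh_ms + 0.5);
        if (intervals > 1) {
            /* One interval is the normal case; every extra one is a missed vsync. */
            missed += (size_t)(intervals - 1);
        }
    }
    return missed;
}
```

These numbers only mean something if the timestamps really line up with vblank, which is what the glFinish() is supposed to give you.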

Is this true of OpenGL 3.2+? I’ve tried both glFlush() and glFinish() after SwapBuffers() and it appears to have no effect on the flip queue behavior.

How are you gauging the effect on the flip queue behavior?

Also:

[QUOTE]It is only after this queue is filled that buffer swaps will block to synchronize with vblank.[/QUOTE]

This is not what I’ve seen when there is no glFinish() after SwapBuffers(). What I see is that the API blocks at some random point while queuing a future frame, not necessarily at the end of a frame.

I’m taking a timestamp directly after calling SwapBuffers() (and glFinish() but it makes no difference). What you see is that the first few swaps occur almost instantly. Subsequent swaps align with the refresh interval as expected.

The code is nothing more than:

enable vsync
for a few seconds
  SwapBuffers()
  glFinish()
  record timestamp
end
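For anyone who wants to reproduce this, the loop could be sketched in C roughly as follows. This is an assumption-laden sketch, not the code actually used: present_fn and now_ms_fn are hypothetical hooks (on Windows they might wrap SwapBuffers() followed by glFinish(), and a high-resolution timer), and the "shorter than half a refresh period" threshold for spotting queued frames is an arbitrary choice.

```c
#include <stddef.h>

/* Hypothetical hooks so the loop itself has no platform dependency.
 * present: e.g. SwapBuffers(hdc) followed by glFinish()
 * now_ms:  e.g. a high-resolution timer converted to milliseconds */
typedef void   (*present_fn)(void *ctx);
typedef double (*now_ms_fn)(void *ctx);

/* Presents `frames` frames and stores the time between consecutive
 * post-present timestamps in out_ms. Returns the number of intervals
 * written (frames - 1, or 0 if frames is 0). */
size_t record_frame_intervals(present_fn present, now_ms_fn now_ms, void *ctx,
                              double *out_ms, size_t frames)
{
    if (frames == 0) {
        return 0;
    }
    present(ctx);
    double prev = now_ms(ctx);
    size_t count = 0;
    for (size_t i = 1; i < frames; ++i) {
        present(ctx);
        double t = now_ms(ctx);
        out_ms[count++] = t - prev;
        prev = t;
    }
    return count;
}

/* Counts the leading intervals that are shorter than half a refresh
 * period, i.e. swaps that returned without waiting for vblank.
 * The 0.5 factor is an assumed threshold, not a driver constant. */
size_t count_ramp_up_frames(const double *ms, size_t n, double refresh_ms)
{
    size_t i = 0;
    while (i < n && ms[i] < 0.5 * refresh_ms) {
        ++i;
    }
    return i;
}
```

A non-zero result from count_ramp_up_frames on the recorded intervals would correspond to the "first few swaps occur almost instantly" behavior described above.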

Looking at the durations between timestamps, the first few are very small before they jump up to the correct ~16.7 ms.

That’s interesting. A few questions come to mind. Are you waiting until the buffer is mapped first? Are you rendering indirectly through a compositing window manager or directly to a screen window? Windows or Linux? If it’s an NVIDIA card, which yield behavior are you using (sched_yield, usleep(0), or never yield)?

To verify your assumption that it is the flip queue that needs to fill: after, say, frame 50 or so, when you’ve reached a steady state, sleep in your application for a second and then start drawing again. Do you see the same “ramp up” behavior with your frame times?
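One way to read the result of that experiment off a recorded trace, sketched in C under stated assumptions: count_short_bursts is a hypothetical helper, and "short" again means less than half a refresh period, which is an arbitrary threshold. If the queue only fills once, you would expect a single burst of short intervals at the start; if it drains during the sleep and refills afterwards, you would expect a second burst after the long interval caused by the sleep.

```c
#include <stddef.h>

/* Hypothetical helper: counts runs of consecutive "short" intervals,
 * where short means less than half the refresh period (assumed threshold).
 * ms         - frame-to-frame durations in milliseconds
 * n          - number of entries in ms
 * refresh_ms - display refresh period in milliseconds
 */
size_t count_short_bursts(const double *ms, size_t n, double refresh_ms)
{
    size_t bursts = 0;
    int in_burst = 0;
    for (size_t i = 0; i < n; ++i) {
        int is_short = ms[i] < 0.5 * refresh_ms;
        if (is_short && !in_burst) {
            ++bursts; /* a new run of fast-returning swaps starts here */
        }
        in_burst = is_short;
    }
    return bursts;
}
```

A result of 2 for a trace with one sleep in the middle would match the "same ramp up after the sleep" outcome.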

Yes.

Directly to a screen window.

Windows.

It’s an ATI card. From what I’ve read Nvidia actually offers a way to disable the flip queue in their drivers. ATI apparently used to but no longer does.

Yes, I see the same ramp up behavior after sleeping the application for a second and then drawing again.

Interesting. And weird. Seems to suggest that ATI is not doing a glFinish() when you request one.