The OpenGL transformation sequence consists of four steps:
- Multiply the modelview matrix with the model space position to get view space position
- Multiply the projection matrix with the view space position to get clip space position
- Divide the clip space xyz by the clip space w to get normalized device coordinates (NDC). These range from -1 to 1; vertices outside that range are clipped.
- Apply the viewport transformation to get window coordinates.
Vertex shaders replace steps 1 and 2; however, since most vertex shaders still use a projection matrix, we'll ignore that distinction for now. We'll also skip step 1 and start with a view space depth value, following what happens to it:
Step 2
The typical projection matrix as created by glFrustum looks like this:
A  0  B            0
0  C  D            0
0  0  (f+n)/(n-f)  2fn/(n-f)
0  0  -1           0
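As a sketch, here is how that matrix could be built in Python. The full glFrustum expressions are used for the x and y rows (taken from the OpenGL reference pages); only the bottom two rows matter for the depth discussion that follows.

```python
def frustum(l, r, b, t, n, f):
    """Build the 4x4 projection matrix glFrustum would produce,
    as a row-major list of rows. n and f are the near and far
    plane distances (both positive)."""
    return [
        [2.0 * n / (r - l), 0.0, (r + l) / (r - l), 0.0],
        [0.0, 2.0 * n / (t - b), (t + b) / (t - b), 0.0],
        [0.0, 0.0, (f + n) / (n - f), 2.0 * f * n / (n - f)],  # depth row
        [0.0, 0.0, -1.0, 0.0],                                  # w row
    ]
```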
We’ll ignore the x and y components as they are irrelevant for depth. When we multiply this matrix with the view space position we get:
Zclip = ((f+n)/(n-f)) * Zview + (2fn/(n-f)) * Wview
Wclip = -1 * Zview
Wview is usually 1 but it doesn’t hurt to keep it in the equations.
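The two equations above can be checked numerically. In this sketch the near/far values (n=1, f=100) are made up for illustration; note that a point on the near plane sits at Zview = -n, since view space looks down the negative z axis.

```python
def clip_space_depth(z_view, n, f, w_view=1.0):
    """Step 2, depth only: apply the bottom two rows of the
    projection matrix. Returns (z_clip, w_clip)."""
    z_clip = ((f + n) / (n - f)) * z_view + (2.0 * f * n / (n - f)) * w_view
    w_clip = -z_view
    return z_clip, w_clip
```

For a point on the near plane this yields z_clip = -n and w_clip = n, so the later divide lands exactly at -1; the far plane lands at +1.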
Step 3
Divide clip space coordinates by W to get normalized device coordinates. We can ignore clipping for now.
Zndc = Zclip/Wclip = (f+n)/(f-n) + (2fn/(f-n)) * Wview/Zview
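Folding the projection and the divide into one function gives a direct view-space-to-NDC mapping (same made-up n and f as before):

```python
def ndc_depth(z_view, n, f, w_view=1.0):
    """Steps 2 and 3 combined: project the depth, then divide by
    the clip space w. Valid for z_view != 0."""
    return (f + n) / (f - n) + (2.0 * f * n / (f - n)) * (w_view / z_view)
```

The 1/Zview term is what makes NDC depth nonlinear in view space depth.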
Step 4
The viewport transformation remaps depth from [-1, 1] in NDC to [N, F] in window coordinates. N and F are the values set by calling glDepthRange; note that these differ from the near and far plane distances n and f implied by the projection matrix.
Zw = ((F-N)/2) * Zndc + (F+N)/2
N and F default to 0 and 1, respectively, and are both clamped to [0, 1] when you set them. Assuming the default values (as you rarely need to change them), you get:
Zw = Zndc/2 + 1/2
Zw = (fn/(f-n)) * Wview/Zview + [(f+n)/(f-n) + 1]/2

The constant term simplifies to f/(f-n), so equivalently:

Zw = (fn/(f-n)) * Wview/Zview + f/(f-n)
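The whole chain, steps 2 through 4 with the default glDepthRange(0, 1), condenses to one function. Again n=1 and f=100 are illustrative values:

```python
def window_depth(z_view, n, f, w_view=1.0):
    """Steps 2-4 combined, assuming the default glDepthRange(0, 1):
    view space depth -> window space depth in [0, 1]."""
    return (f * n / (f - n)) * (w_view / z_view) + ((f + n) / (f - n) + 1.0) / 2.0
```

Evaluating it shows the familiar precision skew: the near plane maps to 0, the far plane to 1, and a point halfway between them in view space already maps to roughly 0.99, so most of the representable depth values are spent close to the near plane.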
This value lies between 0 and 1. It is what gets linearly interpolated across triangles in window space to produce a per-sample depth. Each interpolated value is compared against the value already in the depth buffer, and written there if the test passes.
How the depth value is actually stored is implementation-dependent. Usually you get a 16- or 24-bit unsigned normalized integer, but it can also be a 32-bit float representation.
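The normalized integer case can be sketched as a simple quantization step; the function below is illustrative, not how any particular driver stores depth:

```python
def quantize_depth(z_w, bits=24):
    """Round a window space depth in [0, 1] to the nearest value
    representable by an unsigned normalized integer of the given
    bit width, and return it mapped back to [0, 1]."""
    max_int = (1 << bits) - 1
    return round(z_w * max_int) / max_int
```

Because the window depth curve flattens out toward the far plane, widely separated view space depths can quantize to the same stored integer there, which is the root cause of z-fighting at a distance.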