depth coordinate

I’m having trouble understanding this line from my book:

In perspective projection, the transformed depth coordinate (like the x and y coordinates) is subject to perspective division by the w coordinate. As the transformed depth coordinate moves farther away from the near clipping plane, its location becomes increasingly less precise. (See Figure 3-18.)

Figure 3-18: Perspective Projection and Transformed Depth Coordinates

Therefore, perspective division affects the accuracy of operations which rely upon the transformed depth coordinate, especially depth-buffering, which is used for hidden surface removal.

First I’m wondering: is the depth coordinate the same as the z-coordinate you originally specify in GLvertex? Second, why does the accuracy decrease? What’s causing that to happen?

The depth values are stored as reciprocals in the depth buffer, with the result that depth buffer resolution is best near the near plane and falls off quickly as you move toward the far plane. I suggest you read this: http://www.sjbaker.org/steve/omniv/love_your_z_buffer.html
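If you want to see that falloff in numbers, here is a minimal sketch in C (the near/far/bit values are my own picks; the constants a and b come from the formula in that article):

#include <stdio.h>

int main(void)
{
    /* Example values (my choice): 16-bit buffer, near = 1, far = 100 */
    const int    N     = 16;
    const double zNear = 1.0, zFar = 100.0;
    const double a = zFar / (zFar - zNear);
    const double b = zFar * zNear / (zNear - zFar);

    /* Invert z_buffer_value = (1<<N) * (a + b/z) to find the eye-space
       distance z that each integer depth value corresponds to. */
    const long values[] = { 0, 1, 32768, 65534, 65535 };
    for (int i = 0; i < 5; i++) {
        double frac = (double)values[i] / (double)(1L << N);
        printf("z_buffer_value %5ld  ->  z = %f\n", values[i], b / (frac - a));
    }
    return 0;
}

Half the integer range is used up before z even reaches 2, and each of the last steps near the far plane covers about 0.15 world units, versus about 0.000015 units for the very first step.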

“is the depth coordinate the same as the z-coordinate you originally specify in GLvertex”

No, it isn’t. You specify a point in WORLD-space, so z there is only your third dimension. The “depth” value (the z-coordinate in SCREEN-space) is the distance from your CURRENT camera position to the rasterized pixel. It is not the actual distance (in world units), though, but a value between 0 and 1, where 0 is exactly on your near plane and 1 is exactly on your far plane.

Additionally, those depth values are not distributed linearly. A value of 0.5 is not exactly halfway between the near and far plane; it is much closer to the near plane. That is exactly what your figure visualizes. The “density” (or distribution) of the depth values is much higher close to the near plane. Therefore, close to the near plane many different distances can be represented (good precision), but close to the far plane only a few different values are representable (bad precision). That is why in games you sometimes see objects flicker (for example, a sliding door inside a wall): there is not enough precision to differentiate between the depth of the door and the wall, so sometimes the door is rendered although it should be hidden inside the wall. But when you come closer, the problem disappears.
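To put a number on that (my own example, using the depth mapping from the article Zengar linked, with near = 1 and far = 100): solving

far * (1 - near/z) / (far - near) = 0.5
(100/99) * (1 - 1/z) = 0.5
z ≈ 1.98

So a depth value of 0.5 is already reached less than one unit beyond the near plane, and the remaining 98 units of the view volume have to share the other half of the range.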

Hope my explanation helps a bit. Keep on reading about it, you will understand it soon.

Jan.

My thanks to Jan, for your explanation, and to you, Zengar, for that link. That article does seem to discuss precisely what I wanted to know, but I am puzzled by this equation:

z_buffer_value = (1<<N) * ( a + b / z )

Where:

 N = number of bits of Z precision
 a = zFar / ( zFar - zNear )
 b = zFar * zNear / ( zNear - zFar )
 z = distance from the eye to the object

…and z_buffer_value is an integer.

I have no idea what the notation (1<<N) stands for! O_o

1<<N is bit-shifting
equals pow(2.0,N)
So, if N=16, (1<<N) = 65536
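A quick sanity check in C, if you want to see it:

#include <stdio.h>
#include <math.h>

int main(void)
{
    int N = 16;
    /* 1<<N shifts the bit pattern of 1 left by N places, doubling it N times */
    printf("%d %.0f\n", 1 << N, pow(2.0, N));   /* prints: 65536 65536 */
    return 0;
}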

AH! Thank you, Ilian!

Okay… this confounds me.

So we store the reciprocal of the depth coordinate in the z_buffer, according to the following formula:

z_buffer = 2^(z_buffer_size) * far * (1 - near/z) / (far - near)

Note I applied some factoring, and I’m using far and near instead of zFar and zNear.

Let’s assume for simplicity that the buffer size is 4, so that 2^(z_buffer_size) is 16. Let near = 3 and far = 10. Our equation becomes:
z_buffer = 16*(10)*(1 - 3/z)/7.

Now, since this equation is supposed to yield the inverse of the depth coordinate, z values close to zNear should output a very large number, since the depth coordinate is supposed to be close to zero (1/large number). Yet plugging in values close to 3 (the near clipping plane) yields values close to 0 (thanks to the (1 - 3/z) factor), and the depth coordinate would therefore be large for items close to zNear!

Likewise, z values close to zFar, around z = 10, should give you close to the reciprocal of 1, or just values close to 1. But plugging in 10 yields 16, whose inverse yields a depth coordinate close to zero for a point near the far plane.

This seems backwards! Is my math not correct? Is the formula posted in that link correct?

(edit) Incidentally, my current understanding is that the calculated z_buffer value is inverted to obtain the depth coordinate for that pixel. At least that’s how I interpreted what Zengar said. Is this correct? My analysis seems to suggest that the z_buffer is scaled by a constant factor to range from 0 to 1, and this scaled-down value is what we call the depth coordinate. But that’s just scaling, not taking the reciprocal!

I’m not so good at this, but I assume:

far*(1 - near/z)/(far - near)

Only this part calculates a value between 0 and 1 (and it is not distributed linearly, but reciprocally).

Now, z_buffer_size is the number of bits your buffer has, such that

2^(z_buffer_size) * depth

is actually the depth value (0 to 1) stored in an integer buffer (well, as BITS somehow). And in your example this range is between 0 and 16. Of course, when you use that value later on, you need to interpret (int) 16 as (float) 1.0, but that’s done by the hardware.
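As a sketch of that interpretation step (my own illustration, not actual driver code), using your 4-bit example:

#include <stdio.h>

int main(void)
{
    int bits = 4;                       /* z_buffer_size from your example */
    double depth = 0.5;                 /* window-space depth in [0, 1]    */
    int stored = (int)(depth * (1 << bits));          /* encode: 0.5 -> 8  */
    double read_back = (double)stored / (1 << bits);  /* decode: 8 -> 0.5  */
    printf("stored = %d, read back = %f\n", stored, read_back);
    return 0;
}

(Real normalized-integer formats scale by 2^bits - 1 so that 1.0 still fits in the available bits; the 2^bits here just keeps the example arithmetic simple.)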

So, no, it is all absolutely correct. It is the final value; no additional scaling to be done.

Jan.

So, no, it is all absolutely correct. It is the final value; no additional scaling to be done.

umm…

Of course, when you use that value later on, you need to interpret (int) 16 as (float) 1.0, but that’s done by the hardware.

Wouldn’t reducing [0,16] to [0,1] constitute scaling? :stuck_out_tongue: That’s exactly what I meant when I asked if the result was scaled.

I THINK I’m getting it now, though. So on a 16-bit buffer, we’d store the z_buffer value as an integer between 0 and 65536 (2^16), and then the hardware divides this by 65536 (which is what I meant by scaling) to produce a depth coordinate between 0 and 1? Is that how it works?

(edit) Ugh… I’m losing track here:
The depth (z) coordinate is encoded during the viewport transformation (and later stored in the depth buffer). You can scale z values to lie within a desired range with the glDepthRange() command. (Chapter 10 discusses the depth buffer and the corresponding uses for the depth coordinate.) Unlike x and y window coordinates, z window coordinates are treated by OpenGL as though they always range from 0.0 to 1.0.

depth coordinate, z coordinate, z values, depth buffer, z buffer, z window coordinates. I’m lost in the terminology here; I can’t tell which one’s which.

Is this maybe a topic I should have posted in the ‘advanced’ forum?

I attempted to understand your problem but I am getting confused…

When you use this formula (without encoding to an integer):

z_buffer_value = far*(1 - near/z)/(far - near)

z values are between near and far
z_buffer_value values are between 0 and 1
when z = near, z_buffer_value = 0
when z = far, z_buffer_value = 1

So what is the problem? Here you get your normalized z value, no?

Converting this value to an integer is just a bit shift (fixed point to integer), and decoding it is another bit shift (integer to fp).

Originally, I was trying to figure out why the depth test gets less accurate as objects move further away, but now I’m generally trying to understand the exact process of hidden surface removal and stored depth values, and resolve a lot of puzzling questions I have.

My main source of confusion was this: we have normalized device coordinates whose z values range from -1 to 1, depth buffer values ranging from 0 to (for instance) 65535, and also this rule that all depth values lie in [0, 1]. These seemed to be three different descriptions of the same thing, and so I was wondering how they fit together. I was also confused because some of the terminology seemed ambiguous.

Fortunately, I found a really good article on the subject, and I THINK what happens is that each pixel on the screen has an associated depth value (between 0 and 1) but ALSO an associated z buffer value, which is the one used for hidden surface removal comparisons. I don’t see why we’d have two sets of depth coordinates, which is probably why I was slow to arrive at this hypothesis, but is this correct?

Another thing that puzzled me was this line from my book: The depth (z) coordinate is encoded during the viewport transformation (and later stored in the depth buffer).

The viewport transformation (I thought) dealt only with normalized device coordinates, whereas the previously mentioned algorithms calculate z buffer values based on eye coordinates. I therefore hypothesized that OpenGL makes a new set of values for each coordinate system, rather than transforming one directly into another. Is THIS correct?

The OpenGL transformation sequence consists of four steps:

  1. Multiply the modelview matrix with the model space position to get view space position
  2. Multiply the projection matrix with the view space position to get clip space position
  3. Divide clip space xyz by clip space w to get normalized device coordinates. These range from -1 to 1; vertices outside that range are clipped.
  4. Apply the viewport transformation to get window coordinates.

Vertex shaders replace steps 1 and 2, however many vertex shaders still use a projection matrix so we’ll ignore that for now. We’ll also ignore step 1 and start with view space depth and look what happens to it:

Step 2
The typical projection matrix as created by glFrustum looks like this:

A 0      B         0
0 C      D         0
0 0 (f+n)/(n-f) 2fn/(n-f)
0 0     -1         0

We’ll ignore the x and y components as they are irrelevant for depth. When we multiply this matrix with the view space position we get:

Zclip = ((f+n)/(n-f)) * Zview + (2fn/(n-f)) * Wview
Wclip = -1 * Zview

Wview is usually 1 but it doesn’t hurt to keep it in the equations.

Step 3
Divide clip space coordinates by W to get normalized device coordinates. We can ignore clipping for now.

Zndc = Zclip/Wclip = (f+n)/(f-n) + (2fn/(f-n)) * Wview/Zview

Step 4
The viewport transformation remaps depth from [-1, 1] in NDC to [N, F] in window coordinates. N and F are the values set by calling glDepthRange, note that these are different from the near and far plane distances n and f which are implied in the projection matrix.

Zw = ((F-N)/2) * Zndc + (F+N)/2

N and F default to 0 and 1, respectively, and are both clamped to [0, 1] when you set them. Assuming the default values (as you rarely need to change them), you get:

Zw = Zndc/2 + 1/2
Zw = (fn/(f-n)) * Wview/Zview + [(f+n)/(f-n) + 1]/2

This value is between 0 and 1. It is the value that gets linearly interpolated across triangles in window space to yield a per-sample window space depth value. This is compared to the value already in the depth buffer, and if the test succeeds it gets written to the depth buffer.
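To make the whole chain concrete, here is a small numeric check (the values n = 1, f = 100 and the default depth range are my own example choices) that runs a few eye-space distances through steps 2 to 4:

#include <stdio.h>

int main(void)
{
    const double n = 1.0, f = 100.0;
    const double dist[] = { 1.0, 2.0, 10.0, 50.0, 100.0 };

    for (int i = 0; i < 5; i++) {
        double Zview = -dist[i];   /* OpenGL eye space looks down -z */
        double Wview = 1.0;
        /* Step 2: third and fourth rows of the glFrustum matrix */
        double Zclip = ((f + n) / (n - f)) * Zview + (2.0 * f * n / (n - f)) * Wview;
        double Wclip = -Zview;
        /* Step 3: perspective division */
        double Zndc = Zclip / Wclip;
        /* Step 4: viewport transformation with N = 0, F = 1 */
        double Zw = Zndc / 2.0 + 0.5;
        printf("eye distance %6.1f  ->  Zw = %f\n", dist[i], Zw);
    }
    return 0;
}

The output shows Zw hitting roughly 0.5 already at a distance of 2 and crowding toward 1 after that, which is exactly the nonuniform distribution discussed earlier in the thread.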

How the depth value is actually stored is implementation dependent. Usually you get a 16 or 24 bit normalized integer, but it can also be a 32 bit float representation.

I think this great explanation should be in the FAQ if it is not already.

Thanks for that detailed explanation, XMas! That’s exactly the kind of detail I’m interested in.

The last bit is the part I don’t get:

This value is between 0 and 1. It is the value that gets linearly interpolated across triangles in window space to yield a per-sample window space depth value. This is compared to the value already in the depth buffer, and if the test succeeds it gets written to the depth buffer.

How the depth value is actually stored is implementation dependent. Usually you get a 16 or 24 bit normalized integer, but it can also be a 32 bit float representation.

Again, we seem to be talking of a window depth coordinate and a z buffer value, which I’m still unsure are the same or different. What puzzles me is how we compare the window depth coordinate, a real number in [0,1], to the z_buffer value, which everyone says is an integer ranging from 0 to 65535 (assuming 16 bits). Obviously we can’t compare the two without some sort of scaling.

I guess my main question is: are the window depth coordinates and the z buffer value the same thing? If they are, why do we say the window coordinates range from 0 to 1 when we store them as an integer in [0, 2^n]? If they aren’t the same, why do we keep two depth values, and how can the window depth value be compared to the z buffer integer?

The value in the depth buffer is a “window depth coordinate”, meaning it represents a depth in window space. If you want to compare two values, like the depth test does, they both need to be in the same coordinate space. So the incoming depth value for the fragment/sample currently being rendered is also a window depth coordinate.

The depth buffer value is not an integer. It is a real value in the range [0, 1]. Don’t confuse this with the binary representation of said value!

How these values are stored as bit patterns is up to the implementation. Most implementations support a fixed point (aka normalized integer) representation with 16 or 24 bits. Some also support 32 bit floating point values.
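For illustration, here is how one and the same depth value might look under two such representations (these encodings are common ones, but the exact format is up to the implementation):

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
    double depth = 0.75;   /* one window-space depth value in [0, 1] */

    /* 16-bit normalized integer: round(value * (2^16 - 1)) */
    uint16_t fixed16 = (uint16_t)(depth * 65535.0 + 0.5);

    /* 32-bit float: the IEEE bit pattern of 0.75f */
    float fdepth = (float)depth;
    uint32_t float_bits;
    memcpy(&float_bits, &fdepth, sizeof float_bits);

    printf("depth %.2f as fixed16: %u (0x%04X)\n", depth, (unsigned)fixed16, (unsigned)fixed16);
    printf("depth %.2f as float32 bits: 0x%08X\n", depth, (unsigned)float_bits);
    return 0;
}

Same value, two different bit patterns; the depth test compares the values, not the raw bits.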

okay…

So if I’m to understand you correctly, all values in the depth buffer represent a real number within [0,1], which we call the window depth coordinate. In other words, a window depth coordinate and its corresponding z buffer value are stored in the same location; the integer idea just describes how it’s represented in binary.

All correct?

Yes. From the beginning to the end, there is only one depth value, which just passes through different spaces.