The OpenGL transformation sequence consists of four steps:
- Multiply the modelview matrix with the model space position to get view space position
- Multiply the projection matrix with the view space position to get clip space position
- Divide the clip space xyz by the clip space w to get normalized device coordinates (NDC). These range from -1 to 1; vertices outside that range are clipped.
- Apply the viewport transformation to get window coordinates.
Vertex shaders replace steps 1 and 2; however, since most vertex shaders still use a projection matrix, we'll ignore that distinction for now. We'll also skip step 1 and start with a view space depth value, following what happens to it:
Step 2
The typical projection matrix as created by glFrustum looks like this:
A  0  B            0
0  C  D            0
0  0  (f+n)/(n-f)  2fn/(n-f)
0  0  -1           0
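As a sketch, here is how that matrix could be built in Python. The full glFrustum expressions are used for the x and y rows (taken from the OpenGL reference pages); only the bottom two rows matter for the depth discussion that follows.

```python
def frustum(l, r, b, t, n, f):
    """Build the 4x4 projection matrix glFrustum would produce,
    as a row-major list of rows. n and f are the near and far
    plane distances (both positive)."""
    return [
        [2.0 * n / (r - l), 0.0, (r + l) / (r - l), 0.0],
        [0.0, 2.0 * n / (t - b), (t + b) / (t - b), 0.0],
        [0.0, 0.0, (f + n) / (n - f), 2.0 * f * n / (n - f)],  # depth row
        [0.0, 0.0, -1.0, 0.0],                                  # w row
    ]
```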
We’ll ignore the x and y components as they are irrelevant for depth. When we multiply this matrix with the view space position we get:
Zclip = ((f+n)/(n-f)) * Zview + (2fn/(n-f)) * Wview
Wclip = -1 * Zview
Wview is usually 1 but it doesn’t hurt to keep it in the equations.
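The two equations above can be checked numerically. In this sketch the near/far values (n=1, f=100) are made up for illustration; note that a point on the near plane sits at Zview = -n, since view space looks down the negative z axis.

```python
def clip_space_depth(z_view, n, f, w_view=1.0):
    """Step 2, depth only: apply the bottom two rows of the
    projection matrix. Returns (z_clip, w_clip)."""
    z_clip = ((f + n) / (n - f)) * z_view + (2.0 * f * n / (n - f)) * w_view
    w_clip = -z_view
    return z_clip, w_clip
```

For a point on the near plane this yields z_clip = -n and w_clip = n, so the later divide lands exactly at -1; the far plane lands at +1.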
Step 3
Divide clip space coordinates by W to get normalized device coordinates. We can ignore clipping for now.
Zndc = Zclip/Wclip = (f+n)/(f-n) + (2fn/(f-n)) * Wview/Zview
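Folding the projection and the divide into one function gives a direct view-space-to-NDC mapping (same made-up n and f as before):

```python
def ndc_depth(z_view, n, f, w_view=1.0):
    """Steps 2 and 3 combined: project the depth, then divide by
    the clip space w. Valid for z_view != 0."""
    return (f + n) / (f - n) + (2.0 * f * n / (f - n)) * (w_view / z_view)
```

The 1/Zview term is what makes NDC depth nonlinear in view space depth.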
Step 4
The viewport transformation remaps depth from [-1, 1] in NDC to [N, F] in window coordinates. N and F are the values set by calling glDepthRange; note that these differ from the near and far plane distances n and f implied by the projection matrix.
Zw = ((F-N)/2) * Zndc + (F+N)/2
N and F default to 0 and 1, respectively, and are both clamped to [0, 1] when you set them. Assuming the default values (as you rarely need to change them), you get:
Zw = Zndc/2 + 1/2
Zw = (fn/(f-n)) * Wview/Zview + [(f+n)/(f-n) + 1]/2

The constant term simplifies to f/(f-n), so equivalently:

Zw = (fn/(f-n)) * Wview/Zview + f/(f-n)
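The whole chain, steps 2 through 4 with the default glDepthRange(0, 1), condenses to one function. Again n=1 and f=100 are illustrative values:

```python
def window_depth(z_view, n, f, w_view=1.0):
    """Steps 2-4 combined, assuming the default glDepthRange(0, 1):
    view space depth -> window space depth in [0, 1]."""
    return (f * n / (f - n)) * (w_view / z_view) + ((f + n) / (f - n) + 1.0) / 2.0
```

Evaluating it shows the familiar precision skew: the near plane maps to 0, the far plane to 1, and a point halfway between them in view space already maps to roughly 0.99, so most of the representable depth values are spent close to the near plane.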
This value lies between 0 and 1. It is what gets linearly interpolated across triangles in window space to produce a per-sample depth. Each interpolated value is compared against the value already in the depth buffer, and written there if the test passes.
How the depth value is actually stored is implementation-dependent. Usually you get a 16- or 24-bit unsigned normalized integer, but it can also be a 32-bit float representation.
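The normalized integer case can be sketched as a simple quantization step; the function below is illustrative, not how any particular driver stores depth:

```python
def quantize_depth(z_w, bits=24):
    """Round a window space depth in [0, 1] to the nearest value
    representable by an unsigned normalized integer of the given
    bit width, and return it mapped back to [0, 1]."""
    max_int = (1 << bits) - 1
    return round(z_w * max_int) / max_int
```

Because the window depth curve flattens out toward the far plane, widely separated view space depths can quantize to the same stored integer there, which is the root cause of z-fighting at a distance.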