DirectX Next

Anyone have any comments on this article? http://www.beyond3d.com/articles/directxnext/

Looks like the super buffers extension will cover a fair chunk of that functionality. The Topology processor sounds interesting. I’m not so sure about the surface tessellator.

I was going to ask the same questions, almost word for word.

I’ve been wanting to use a topology processor for a long time, as I’m a big fan of procedural geometry. Emulating this in a VP has just been too much of a pain so far.

The virtual video memory confuses me a bit, since that’s what we normally do anyway, at least one-way. Maybe it’s just more “managed” for us simpletons. The big question is whether page faults can be absorbed or will they cause stalls. That’s the tricky bit, IMO.

I do like the bit about arraying all scene matrices and letting geometry seamlessly index into those (my interpretation of the bit about instancing). We’ve talked about a couple of ways to do that with current hardware, but I haven’t heard of any GL extension proposal that’ll allow that explicitly. I could use that yesterday.
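
Something along these lines is roughly what I’m picturing, as a glslang sketch. Everything here is made up for illustration - the array size, the sceneMatrices uniform and the matrixIndex attribute - none of it exists today:

// Hypothetical: all scene matrices in one uniform array, indexed per vertex.
uniform mat4 sceneMatrices[64];  // every object-to-clip matrix for the scene
attribute float matrixIndex;     // which matrix this piece of geometry uses

void main()
{
    mat4 m = sceneMatrices[int(matrixIndex)];
    gl_Position = m * gl_Vertex;
    gl_FrontColor = gl_Color;
}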

Avi

final page of meltdown’03 “Future Features” presentation:
Please send in your feature requests

  • What features are still missing?
  • What scenarios does this feature set still not enable?
  • What syntax would you like to see used to express these features?
  • Mail to directx@microsoft.com

translation: VaporwareX Next (R).

MICROS~1.OFT clearly doesn’t know yet what DireX Next will be.

Since DX Next will ship with Longhorn, you’re looking at a 2006 launch timeframe. So Microsoft doesn’t really need to have firm, definitive knowledge of what it will be yet.

Impressive, but OpenGL will be able to do the same thing with extensions…

Well, MS had been planning on DirectX 9 lasting quite a while, a fact made quite obvious by the existence of the very powerful shader version 3.0 specs. During the wait, perhaps video card design will mature further: FP32 on all new cards, plus good FP render target and texture support. In truth there’s still a heck of a lot that can be improved on video cards with the current DirectX specs and OpenGL [ARB] extensions.

A related question, anyone know the status of the super buffers working group? The ARB meeting notes aren’t up yet from the September meeting. And I think there is supposed to be another meeting coming up this week.

I have a comment on the frame buffer access page: Hardware vendors, please please please give us access to the frame buffer from within a fragment program…even if it’s just the fragment we’re about to write to. It’s so important to be able to do customized blending, particularly to higher-precision buffers.
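
To be concrete, something like this would cover most of what I’m after. It’s only a sketch: gl_FBColor is a made-up name for “the destination pixel we’re about to write”, and the texture and uniform names are placeholders:

// Hypothetical custom blend in a fragment program. gl_FBColor (made up) is
// assumed to hold the current frame buffer contents at this pixel.
uniform sampler2D lightTex;

void main()
{
    vec4 src = texture2D(lightTex, gl_TexCoord[0].xy);
    vec4 dst = gl_FBColor;
    // Soft-additive blend for color but a max for alpha - a different blend
    // equation per component, which the fixed-function blend stage can't be
    // configured to do, and it all happens at fragment precision.
    gl_FragColor = vec4(dst.rgb + src.rgb * (1.0 - dst.rgb), max(dst.a, src.a));
}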

If the statement that hardware vendors want to drop this is accurate, could someone please explain why?

If the statement that hardware vendors want to drop this is accurate, could someone please explain why?

Fundamentally, it’s the same reason why uploading a texture is faster than downloading it from hardware. The pipeline works best/fastest when it goes one way. Which is why the blend stage is at the very end of the pipe, and it is also why turning on alpha blending causes a performance drop.

Also, it prevents them from making parallelizing optimizations, where a program might be running such that it is overwriting the same fragment on two different pipelines. The “alpha blending” unit would be responsible for sorting out which data goes where. If this is possible in advanced hardware, a framebuffer read operation would have to cause a full-pipeline stall.

In general, from a performance standpoint, it is an all-around bad idea.

Besides, what do you need custom blend operations for if you have infinite-length fragment programs?

From an article on the inquirer about a leaked ATI document.
“ATI’s PCI Express…will offer bi-directional simultaneous 4GB/s bandwidth” http://www.theinquirer.net/?article=12991

If they are going to give us lots of bidirectional bandwidth then presumably they are going to make use of that bandwidth. Could it mean we are getting fast frame buffer reads soon?

[This message has been edited by Adrian (edited 12-07-2003).]

Originally posted by Korval:
Fundamentally, it’s the same reason why uploading a texture is faster than downloading it from hardware. The pipeline works best/fastest when it goes one way.

I disagree with your analogy. Texture downloading is slow partially because any remaining graphics instructions must be flushed first, and partially for some mysterious reason I have yet to figure out. Texture downloading is broken somehow…it is certainly NOT AGP speed. I have heard that whatever this mysterious problem is, it will be fixed with PCI express, so it’s not fundamental…in fact it probably has something to do with current PC architecture, not some inherent “you’re going the wrong way” problem.

Which is why the blend stage is at the very end of the pipe, and it is also why turning on alpha blending causes a performance drop.

Blending is at the end of the pipe because it has to be. It requires the final about-to-be-written fragment color, which is only available at the end of the pipe. It causes a performance drop simply because there is additional calculation (and, yes, a read) per pixel. The performance drop is not very significant, though.

Also, it prevents them from making parallelizing optimizations, where a program might be running such that it is overwriting the same fragment on two different pipelines.

The “alpha blending” unit would be responsible for sorting out which data goes where. If this is possible in advanced hardware, a framebuffer read operation would have to cause a full-pipeline stall.

This argument is wrong if the fragment program is only allowed to read the pixel it is about to write. I think that’s a fair and understandable limitation. I just want a register that has the color of the pixel at the current location.

In general, from a performance standpoint, it is an all-around bad idea.

I don’t think you’ve supported this point.

Besides, what do you need custom blend operations for if you have infinite-length fragment programs?

Sure, I guess if the fragment program could be infinite length and have infinite storage, I could upload my whole scene to it, sort out the transparent items, and software rasterize the triangles. Nothing’s infinite, though. Blending is necessary ANY time you have two different translucent materials on top of each other in the frame buffer. A (normal, non-infinite) fragment program only deals with 1 fragment from 1 triangle at a time. You can’t get info about other triangles that rasterize to the same point but different depths in screen space, so you can’t use it to do what you would normally use blending to do. Think about seeing through water or glass or rendering a particle system.

I disagree with your analogy. Texture downloading is slow partially because any remaining graphics instructions must be flushed first, and partially for some mysterious reason I have yet to figure out. Texture downloading is broken somehow…it is certainly NOT AGP speed. I have heard that whatever this mysterious problem is, it will be fixed with PCI express, so it’s not fundamental…in fact it probably has something to do with current PC architecture, not some inherent “you’re going the wrong way” problem.

AGP only works one way: to the graphics card. Access from the CPU uses the PCI bus, which is very slow. Apparently, PCI express will solve this.

This argument is wrong if the fragment program is only allowed to read the pixel it is about to write.

My point was that, in an advanced architecture (one that is very different from the current one), another pipe might be running on a fragment that is going to write the same pixel. If it issues a read request, it would have to stall until all other dependencies are finished. If it can’t issue that request, then it is up to later parts of the pipeline to stall, ones that might be designed to do so.

A (normal, non-infinite) fragment program only deals with 1 fragment from 1 triangle at a time. You can’t get info about other triangles that rasterize to the same point but different depths in screen space, so you can’t use it to do what you would normally use blending to do.

But what do you need that functionality for? If you’re doing physically correct blending, then you don’t need a fragment program to do it; the current blending technology is sufficient. If you aren’t doing physically correct blending, then tough.

Think about seeing through water or glass or rendering a particle system.

I don’t see how either of those needs any more blending operations than what they already have.

Originally posted by Korval:
My point was that, in an advanced architecture (one that is very different from the current one), another pipe might be running on a fragment that is going to write the same pixel. If it issues a read request, it would have to stall until all other dependencies are finished. If it can’t issue that request, then it is up to later parts of the pipeline to stall, ones that might be designed to do so.

Of course, in that case, a stall would be caused. What would be wrong with making a system with the understanding that if you read from somewhere other than where you are about to write, either order is not guaranteed or there will be a stall?

But what do you need that functionality for?

I’m not sure why you always take this angle when people talk about feature requests. It shouldn’t matter to you, but I’ll answer it anyway. There is not a single thing that one has to have this feature for. It is simply nice. It would be useful in 1000 different cases and essential in none. You can always draw to a buffer, then bind it back in as a texture and shuttle it through the entire pipeline again. Never mind that doing so takes an additional pass and obfuscates code. Never mind that this may make some scenes almost impossible to get right. Never mind that it seems horribly inefficient. Never mind that you could use this to argue for the removal of any blending unit whatsoever, or of whole classes of OpenGL functionality. Why would anyone need it? Because it would be damn convenient. Same reason there are polygon modes other than GL_FILL, and the same reason there is an XPD instruction in fragment programs.

I don’t see how either of those needs any more blending operations than what they already have.

Because what we have doesn’t support high-precision blending. What we have doesn’t support exponential light decay based on thickness (obtained by being able to read the depth or stencil buffer of the current pixel in a fragment program). What we have doesn’t support using a different blend equation for color and alpha, or for each color channel. An analogy for you: the ability to do blending by reading the frame/depth/stencil buffer from a fragment program is to the current blending model as fragment programs are to register combiners. The former is completely general whereas the latter is nothing more than flipping a few hard-wired switches.
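
Here’s a sketch of the decay case. gl_FBDepth is made up (standing in for “the depth already stored at this pixel”), and the absorption and surfaceColor uniforms are arbitrary; window-space depths are used directly just to keep it short:

// Hypothetical: exponential light decay based on the thickness between the
// surface already in the depth buffer and the fragment being shaded.
uniform vec3 absorption;    // per-channel absorption coefficients
uniform vec4 surfaceColor;  // color of the translucent surface being drawn

void main()
{
    float thickness = abs(gl_FragCoord.z - gl_FBDepth);
    vec3 transmittance = exp(-absorption * thickness);
    gl_FragColor = vec4(surfaceColor.rgb * transmittance, surfaceColor.a);
}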

I’m not sure why you always take this angle when people talk about feature requests.

Because that is the angle that ought to be taken. Rather than inundating hardware developers with random requests for functionality, the functionality in question should come with some justification.

I could just as easily say that hardware should do shadows for you. However, the complicated nature of shadows in a scan converter, coupled with the means of accessing it in a fragment program, makes this too difficult to implement in hardware. As such, even making the request is unreasonable.

The existence of fragment and vertex programs is justified. The existence of “primitive” programs is justified. The existence of floating-point render targets and textures is justified. In each case, there is an argument that justifies the feature. If you can’t justify a feature, it shouldn’t be added.

There is not a single thing that one has to have this feature for. It is simply nice. It would be useful in 1000 different cases and essential in none.

Then name some cases. Justify the necessity of having this functionality in the same way that other features are justified. Or do you want to have a feature simply to have it? That kind of thinking leads to a hardware nightmare, where you just add an opcode because it sounded like a good idea at the time, rather than evaluating the need for a feature.

If you tell me that an entire class of advanced rendering techniques would use this functionality, and without it they would run 20x slower, and that they are crucial towards the ultimate goal of photorealism, then there is sufficient justification for adding the feature. If you can’t do that for this feature, then there is no point in having it.

Because what we have doesn’t support high-precision blending.

Which hardware developers have promised to provide in the future (NV40/R420). So that point is moot.

What we have doesn’t support exponential light decay based on thickness (obtained by being able to read the depth or stencil buffer of the current pixel in a fragment program).

Which, of course, could be passed in as the “alpha” given the above operations. So, once again, hardly a necessity.

What we have doesn’t support using a different blend equation for color and alpha, or for each color channel.

How useful is this, compared to what you already have? And how often will this functionality be required?

Also, EXT_blend_func_separate exists, so at least RGB and ALPHA can be blended separately. Said functionality could be extended to offer independent RGB blend functions.

An analogy for you: the ability to do blending by reading the frame/depth/stencil buffer from a fragment program is to the current blending model as fragment programs are to register combiners. The former is completely general whereas the latter is nothing more than flipping a few hard-wired switches.

That’s not justification; that’s explaining the current situation. Also, it presupposes a certain state of mind: that fixed-function is always bad, and that programmability is always good. This is not the case for all fixed-functionality. Should we start ripping out bilinear/trilinear/anisotropic filtering operations and just let the fragment shader do it? It can, so why waste the hardware? Except for the fact that the texture unit will always be much faster at it than a fragment program.

The justification for fragment programs is pretty obvious; a programmable model is needed in order to support the flexibility of modern and advanced graphics needs. Virtually any advanced graphics application will need fragment programs.

Most of these applications will be just fine with the regular alpha blending ops.

I’m just asking questions that hardware developers ask. No more, no less. It is precisely these questions that lead to hardware vendors telling Microsoft to remove the feature from DX Next.

[This message has been edited by Korval (edited 12-08-2003).]

Distance through fog with FBColor:

// gl_FBColor here is the proposed register holding what's already in the
// frame buffer at this pixel. Assuming the buffer's alpha was cleared to 0.0
// before the fog volume is drawn:

float depth = … ;   // this fragment's depth

if (gl_FBColor.a == 0.0) {
    gl_FragColor = vec4(depth);                      // first fog surface: store its depth
} else {
    gl_FragColor = vec4(abs(depth - gl_FBColor.a));  // second surface: distance through the fog
}

Trying to do the same with standard blending:

  • Create 2 additional render targets
  • Draw two passes with max and min blending
  • Pass both RTs to the fragment program and subtract there (see the sketch after this list).
  • Take care of the case when you’re inside the fog volume.
  • If you want to use your distance through fog for something more complex than what the standard blending offers (likely) you will need another render target.
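
The resolve step from that list ends up being yet another fragment program along these lines (the render target and uniform names are just placeholders):

// The extra resolve pass needed when there is no FBColor access.
uniform sampler2D maxDepthRT;   // back faces of the fog volume, rendered with MAX blending
uniform sampler2D minDepthRT;   // front faces, rendered with MIN blending
uniform vec2 invViewportSize;   // 1.0 / viewport dimensions

void main()
{
    vec2 uv = gl_FragCoord.xy * invViewportSize;
    float thickness = texture2D(maxDepthRT, uv).r - texture2D(minDepthRT, uv).r;
    // ...plus yet another render target if the fog math is more involved than this.
    gl_FragColor = vec4(vec3(thickness), 1.0);
}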

[This message has been edited by Humus (edited 12-08-2003).]

Originally posted by Humus:
Distance through fog with FBColor: …

Humus - doing volume fog is almost like trying to do per-pixel water shading based on line-of-sight depth. Korval explained how to do this above:

Which, of course, could be passed in as the “alpha” given the above operations. So, once again, hardly a necessity.

See, instead of reading from the color or depth buffers, you should make a texture map for your volume fog/water geometry, transform this texture to window space, cast a ray from the viewer through each texel and generate an rgb depth/alpha map for the volume (per frame). Then your blending parameters can be passed in as alpha values and you won’t have to bother the hardware vendors for new redundant features.

In fact, I have come to a new enlightenment. All that I really need is a CPU and a frame buffer. I can’t believe I’ve been wasting my money on fancy graphics cards all these years!

Well, I agree with Zeno.
In my app I came to a point where I needed to blend and then modify the value even more. That’s impossible at the moment - at least in one pass.
So I had to do two passes.
So in general it is not necessary, but if it works fast enough, then it will speed up a lot of programs that use fragment programs. Plus, it makes life a lot easier.

So, we don’t need it, but it could make some stuff “realtime” that is today simply too slow because of too many required rendering passes. If that’s not a good reason, then hardware vendors can drop it.

But I am quite sure that at least one of them will try to make it possible on their hardware - simply because of competition - which would make developers use their hardware in the first place and therefore force the other vendors to add the feature too.

Jan.

Distance through fog with FBColor:

I’m not quite sure what it is that this method is trying to accomplish. The alpha of the destination color seems to be a depth value, but it is also a color (since you’re setting the fragment color to it)?

See, instead of reading from the color or depth buffers, you should make a texture map for your volume fog/water geometry, transform this texture to window space, cast a ray from the viewer through each texel and generate an rgb depth/alpha map for the volume (per frame). Then your blending parameters can be passed in as alpha values and you won’t have to bother the hardware vendors for new redundant features.

I presume the feature you’re interested in is the ability to apply blending based on the eye-radial distance (i.e., z-depth) between the object you’re rendering and the objects that have been previously rendered? So, what you really want isn’t color reads; it’s depth reads. So what you should do is just bind the depth buffer as a render source when you do your fog pass. You’re not going to be using depth buffer writes when you’re doing this fogging, so it makes sense.
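
In sketch form, assuming the depth buffer from the earlier passes is available as a texture (the sampler and uniform names here are placeholders):

// Fog pass: read the scene depth as a texture instead of asking for a new
// frame buffer read path. Depths are linearized with the usual formula.
uniform sampler2D sceneDepthTex;  // depth buffer of the scene, bound as a texture
uniform vec2 invViewportSize;     // 1.0 / viewport dimensions
uniform vec2 nearFar;             // (near, far) planes of the projection

float linearize(float z)
{
    return (nearFar.x * nearFar.y) / (nearFar.y - z * (nearFar.y - nearFar.x));
}

void main()
{
    vec2 uv = gl_FragCoord.xy * invViewportSize;
    float sceneZ = linearize(texture2D(sceneDepthTex, uv).r);
    float fogZ   = linearize(gl_FragCoord.z);
    // Depth covered by fog between the fog surface and whatever is behind it.
    float thickness = max(sceneZ - fogZ, 0.0);
    gl_FragColor = vec4(vec3(thickness), 1.0);  // feed this into the fog falloff of choice
}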

See? You can find features and power that you didn’t even know you had simply by looking for them. This method doesn’t induce much slowdown, and it certainly doesn’t require hardware developers to rebuild the lower-end of the rendering pipeline to make it operate in reverse.

In fact, I have come to a new enlightenment. All that I really need is a CPU and a frame buffer. I can’t believe I’ve been wasting my money on fancy graphics cards all these years!

If you don’t want to justify your requests, then don’t. Just don’t complain when hardware vendors don’t bend over for unjustified requests for features.

Having access to the current frame buffer pixel from a pixel shader would just be damned handy. I can’t count on both hands how many times I’ve wished I had this feature. The lack of it has cost me extra render targets/passes/complexity many times. If it’s really going to cost us more in performance than extra render targets/passes/complexity, then I don’t want it, but I doubt it would (maybe I’m wrong). It would be a nice thing to have, and the HW peeps are in the business of making our lives easier, so why not? I mean, with enough passes and off-screen buffers, you can achieve the effects of any modern shader on 3-year-old hardware, but it’s such a major pain in the ass (and slow) that it’s just not practical. Giving us access to the current pixel would likewise make a lot of things that are “technically possible” now actually practical. My 2 pesos…

deshfrudu - Thanks for clarifying what I was trying to say above. There is no example I can give that can satisfy Korval, because he will always be able to come up with some other multi-pass method of doing the same thing. It’s a convenience feature, I admit it.

After all of this arguing, my question is still not answered, so let me rephrase it:

What is it about the design of a modern graphics card that would make reading from the destination buffer at the position of the current fragment impractical? Also, how would access to that pixel be much different from the read that must already take place in fixed-function blending?