Computer speed - issue or non-issue

Excluding sound and other non-visual elements, since the video card can be used to crunch the vertex data (T&L), the speed of the computer shouldn’t be an issue for game speed. Would this be a correct assumption, in theory? All the real labor is handed off to the GPU, and the CPU is free to do simpler tasks. This assumes the card supports T&L.

If you have only static geometry in display lists and you merely animate the camera and some objects with glTranslate/glRotate, that may be true.
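
Something like this (just a minimal sketch; the camera variables are made up):

[code]
/* Sketch: compile static geometry into a display list once,
   then animate per frame with matrix changes only. */
GLuint scene = glGenLists(1);
glNewList(scene, GL_COMPILE);
/* ... glBegin/glVertex3f calls for the static mesh ... */
glEndList();

/* each frame: only the transform changes, not the geometry */
glLoadIdentity();
glRotatef(camYaw, 0.0f, 1.0f, 0.0f);   /* hypothetical camera angle */
glTranslatef(-camX, -camY, -camZ);     /* hypothetical camera position */
glCallList(scene);
[/code]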

But you’re forgetting about physics, real-time deformations, and dynamic textures. Most of these still cannot be done on the GPU. Some parts can be done with fancy shaders nowadays, but you will need fairly recent cards.

Ever played Far Cry? Even at the lowest graphics details I get slowdowns when blasting rockets, probably because of the physics and rag-doll dynamics.

Of course, it all depends on what you call ‘non-visual elements’…

Is it possible the computer can be faster at T&L than the GPU? Say, for example, all you’re doing is translating/rotating static vertices. Does it make sense to have the GPU do it? What would be a rough guess as to the speed difference between doing your own T&L vs. the GPU?

I would say ‘it depends’ :smiley:

No, really, it depends on the CPU, the GPU, and what scene you are talking about.

I’ve seen some people on this forum doing a lot of vertex transformations on the CPU, using SSE or 3DNow!.

I did some speed tests and culling can take up to twice as long as translating the points. I don’t suppose the GPU also does the culling?

>> culling can take up to twice as long as translating the points

Your statement is almost useful :stuck_out_tongue:
Can you be more precise about the context? Were the operations done on the CPU or the GPU? What is your card/system/OS/RAM/drivers/etc.?

Most T&L GPUs will also do culling per triangle, so it is better to cull only large chunks of geometry on the CPU.
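
Something like this (just a sketch; the Plane/Chunk types and frustum extraction are made up):

[code]
/* Coarse CPU culling: test one bounding sphere per chunk against the
   view frustum, then let the GPU handle per-triangle culling. */
typedef struct { float a, b, c, d; } Plane;   /* ax + by + cz + d = 0 */
typedef struct { float x, y, z, r; GLuint list; } Chunk;

void drawVisibleChunks(const Chunk *chunks, int n, const Plane frustum[6])
{
    for (int i = 0; i < n; ++i) {
        int visible = 1;
        for (int p = 0; p < 6; ++p) {
            float dist = frustum[p].a * chunks[i].x
                       + frustum[p].b * chunks[i].y
                       + frustum[p].c * chunks[i].z
                       + frustum[p].d;
            if (dist < -chunks[i].r) {   /* sphere fully outside this plane */
                visible = 0;
                break;
            }
        }
        if (visible)
            glCallList(chunks[i].list);
    }
}
[/code]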

<< Your statement is almost useful >>

Hmmmm, not sure how to take that.

I’m a throwback from the old days of 3D, when the programmer had to do everything, including projection and rendering, in software.

As you can probably guess, my 3D engine does everything in software except render. Other than starting up OpenGL, all I use it for is glTexCoord2f, glColor3f, and glVertex3f, because I have already done the T&L and culling.

<< Can you be more precise about the context? Were the operations done on the CPU or the GPU? >>

These are software tests on the CPU with my engine. My past tests using the GPU didn’t seem to give me the great speed I was expecting. If it was faster, it was only faster by a hair. These were simple rendering tests using an environment of over 30,000 vertices.

<< What is your card/system/OS/RAM/drivers/etc.? >>

My video card is a GeForce4 MX 420, Win98, 256 MB of RAM, P4 1.5 GHz. Not a great system by any means.

The thing I wonder about with having the video card do the T&L is that you need the transformed vertices and normals back to do other things like collision detection and AI, to name a few. I can only assume the speed benefit brought by using the GPU is lost by asking for all that data back across the bus. One step forward, three steps back.

Another thought: if you have the GPU do the T&L, are you taking away some of the rendering speed? Rendering involves plenty of floating-point calculation as well, much of it per pixel.

So I timed how long it took to translate the vertices and normals against how long it took to cull on the CPU. The culling took a little longer than translating all the points.

I’m still doing tests to figure out the speed benefits, if any.

Originally posted by howie:

As you can probably guess, my 3D engine does everything in software except render. Other than starting up OpenGL, all I use it for is glTexCoord2f, glColor3f, and glVertex3f, because I have already done the T&L and culling.

That sounds like your problem right there…
sending vertex data with glVertex etc. is about the slowest way possible in GL, probably at least an order of magnitude slower than something like VBOs. 3+ function calls per vertex pretty quickly overwhelm any speed benefit of using the hardware.
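
For comparison, a rough sketch of the VBO path (GL 1.5 / ARB_vertex_buffer_object; numVerts and vertexData are placeholders):

[code]
/* Upload the mesh to GPU memory once... */
GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, numVerts * 3 * sizeof(GLfloat),
             vertexData, GL_STATIC_DRAW);

/* ...then draw each frame with one call instead of 3+ calls per vertex */
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, (const GLvoid *)0);
glDrawArrays(GL_TRIANGLES, 0, numVerts);
[/code]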

Originally posted by howie:
[b]

The thing I wonder about with having the video card do the T&L is that you need the transformed vertices and normals back to do other things like collision detection and AI, to name a few. I can only assume the speed benefit brought by using the GPU is lost by asking for all that data back across the bus. One step forward, three steps back.
[/b]
Trying to get transformed data back from the GPU would probably be much slower than doing it yourself, so if you really need it, you are probably better off doing it on the CPU…
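
If you do need it, a minimal sketch of “doing it yourself” (hypothetical helper), applying the current modelview matrix to one point on the CPU:

[code]
/* Transform a point by the current modelview matrix on the CPU,
   instead of reading transformed results back from the GPU. */
void transformPoint(const float v[3], float out[3])
{
    GLfloat m[16];
    glGetFloatv(GL_MODELVIEW_MATRIX, m);   /* OpenGL is column-major */
    out[0] = m[0] * v[0] + m[4] * v[1] + m[8]  * v[2] + m[12];
    out[1] = m[1] * v[0] + m[5] * v[1] + m[9]  * v[2] + m[13];
    out[2] = m[2] * v[0] + m[6] * v[1] + m[10] * v[2] + m[14];
}
[/code]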

It seems likely, though, that AI and collisions could be done without transforming all the vertices in most cases, either by using a lower-detail mesh or by using bounding volumes of some sort, in which case GPU T&L becomes more useful.
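
For example, a sketch of a bounding-sphere overlap test (made-up Sphere type), which needs only each object’s centre and radius rather than every transformed vertex:

[code]
typedef struct { float x, y, z, r; } Sphere;

/* Two objects collide (coarsely) if their bounding spheres overlap;
   comparing squared distances avoids a sqrt per test. */
int spheresOverlap(const Sphere *a, const Sphere *b)
{
    float dx = a->x - b->x, dy = a->y - b->y, dz = a->z - b->z;
    float rr = a->r + b->r;
    return dx * dx + dy * dy + dz * dz <= rr * rr;
}
[/code]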

Originally posted by howie:
[b]

Another thought: if you have the GPU do the T&L, are you taking away some of the rendering speed? Rendering involves plenty of floating-point calculation as well, much of it per pixel.

[/b]
T&L and rasterizing are pretty much independent, so overall speed is generally determined by the slower of the two for a given scene, not by the total time taken by both.

That does bring up another issue you might have missed if you are new to hardware-accelerated OpenGL, which is that you need to make sure the part of the pipeline you are trying to benchmark is actually the part that is controlling the render time…
Try shrinking the window and see if the frame rate improves. If it does, then you are fill-rate limited, and transforming vertices faster won’t help (sending them to the GPU more efficiently is still a good idea though, as it will free CPU time for other tasks).

>> T&L and rasterizing are pretty much independent, so overall speed is generally determined by the slower of the two for a given scene, not by the total time taken by both.
If you set your matrices to identity, it boosts performance. If lighting is also off, it’s like skipping the T&L stage.
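
Something like this (sketch only):

[code]
/* Minimize per-vertex work to see how much the T&L stage costs */
glMatrixMode(GL_PROJECTION);
glLoadIdentity();            /* vertices pass through untransformed */
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
glDisable(GL_LIGHTING);      /* no per-vertex lighting work either */
[/code]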

I don’t know which is faster, the GeForce4 or the P4 1.5 GHz, but the T&L stage is not a big issue.
Having a fast fragment pipeline and a fast rasterizer is more important (old games have very low poly counts).

If you want to boost performance, try rendering from front to back to reduce fill.
It’s one of the main suggestions for improving performance.
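
A rough sketch of that idea (the Object type and depth field are made up): draw opaque objects nearest-first so the depth test rejects hidden fragments before they cost fill.

[code]
#include <stdlib.h>

typedef struct { float depth; GLuint list; } Object;   /* hypothetical */

static int nearestFirst(const void *a, const void *b)
{
    float da = ((const Object *)a)->depth;
    float db = ((const Object *)b)->depth;
    return (da > db) - (da < db);
}

void drawFrontToBack(Object *objects, int numObjects)
{
    /* depth = view-space distance, assumed already filled in per frame */
    qsort(objects, numObjects, sizeof(Object), nearestFirst);
    for (int i = 0; i < numObjects; ++i)
        glCallList(objects[i].list);
}
[/code]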

As far as culling goes, these video cards are extremely efficient at it, and that includes the entire GeForce line.

I recommend you don’t do software T&L anyway.