I’m working on a program that measures various aspects of your OpenGL implementation’s performance. One thing we’re trying to measure is the size of the vertex cache. Currently I’m doing this by iteratively rendering indexed vertex arrays, where the independent variable is the number of distinct vertices referenced by the indices: first we use indices of value 0 only, then indices with values in [0, 1], then [0, 2], and so on. The dependent variable is the vertex processing rate. The idea is that once the set of referenced vertices begins to exceed the size of the cache, you should see a noticeable performance drop-off.

Unfortunately, it looks like performance is constant, independent of the number of vertices referenced (even up to 1024). We’ve tried more expensive per-vertex processing states, such as 8 spot lights and specular materials, in the hope of making a vertex cache more useful, but that didn’t change anything. It looks like my Radeon 9800 may optimize out glLoadIdentity() on the modelview stack while doing the full multiplication for matrices loaded with glLoadMatrixf(identitymatrix), but I’m still not getting a nice performance plateau and then drop-off.
Does anyone have any ideas as to how I could go about measuring this?
That’s exactly what I’ve been doing, but the measured results have been totally bizarre. On all the NVIDIA cards I’ve tried (QuadroFX 1000, GF3, GF4MX), it doesn’t seem to matter how many vertices you hit. On my Radeon, the only difference I’ve been able to see is between glLoadMatrix and glLoadIdentity; it doesn’t matter (in this particular case) what matrix is passed in the glLoadMatrix call. I agree that drivers could optimize away identity matrices no matter how they’re specified, though.
Is there any vendor documentation on exactly how vertex caches work?
Originally posted by namespace:
[b]Found these two lines in NVIDIA’s tristrip library:

//GeForce1 and 2 cache size
#define CACHESIZE_GEFORCE1_2 16

//GeForce3 cache size
#define CACHESIZE_GEFORCE3 24[/b]
That’s interesting and sort of makes sense, since the cards of the late ’90s had a vertex cache size of 4, and some had 8.
We can assume the FX generation has 32.
aegis, you might want to look at one of the DX tools called MeshViewer (or something like that). It comes with the DX SDK, and you can have it run some tests for you, then optimize the mesh.
You mean what the cache scheme is? What the cache line size is?
I don’t think any company documents this.