GL_TRIANGLES vs GL_TRIANGLE_FAN/STRIP

Using Vertex Index Objects, is there an efficiency difference between glDrawArrayElements(GL_TRIANGLES,…) and glDrawArrayElements(GL_TRIANGLE_FAN/STRIP,…)?

I know that in the old OpenGL (glBegin()/glEnd()), there was a big efficiency issue since, using TRIANGLES, the user’d request that OpenGL repeat a series of triangles in a mesh. However, in version 3.3, since you throw all the indexes at once, does OpenGL do some optimizations for you, removing unnecessary repetitions? I’m guessing not, but hope dies last.

If OpenGL doesn’t optimize and there is a significant difference, then here’s the practical problem. My program needs to display a triangle mesh that’s spit out by a third-party program. The output file is nice and simple, basically in the format:

VRTX (vrtx-id) (x-coord) (y-coord) (z-coord)
VRTX (vrtx-id) (x-coord) (y-coord) (z-coord)
VRTX (vrtx-id) (x-coord) (y-coord) (z-coord)
...
TRGL (vrtx-id) (vrtx-id) (vrtx-id)
TRGL (vrtx-id) (vrtx-id) (vrtx-id)
TRGL (vrtx-id) (vrtx-id) (vrtx-id)
...

Where everything in parentheses is a place-holder for values. So, basically, it declares all the vertexes and their coordinates and then lists the different triangles to be made. This makes it quite perfect for use with OpenGL, but there’s just one problem: the triangles are given in a seemingly random order, so there’s no obvious way to implement it using either TRIANGLE_FAN or _STRIP. Is there some common or clever algorithm to rearrange these so that either FAN or STRIP can be used instead of TRIANGLES?

Using Vertex Index Objects, is there an efficiency difference between glDrawArrayElements(GL_TRIANGLES,…) and glDrawArrayElements(GL_TRIANGLE_FAN/STRIP,…)?

Um, yes. The full answer is much more complicated.

Oh, and there’s no “glDrawArrayElements”.

On most meshes, you will have more index data when you use GL_TRIANGLES than with GL_TRIANGLE_STRIPs (fans are almost completely useless. There are a few corner cases where they’re useful, but outside of the obvious, don’t bother). Naturally, this takes up more memory and therefore more bandwidth to read.

This is the only concern you have if you aren’t interested in doing any processing of your data. If all you want to do is spit the triangles out exactly as they were given to you, then this is your only concern.

However, if you’re interested in maximum performance, then you need to process your data. You will need to implement an algorithm that rearranges the order of triangles so that it improves vertex processing time via more efficient usage of the post-T&L cache.

There are algorithms to do so for triangle lists and for triangle strips. The stripping algorithms are not as efficient at the task however, since they still need to form effective strips. They will still be smaller in memory though, which for exceedingly large meshes can be important.

Your hardware’s vertex cache can only work if it’s given indexes to work from. Indexes may consume more bandwidth, but indexed triangles can give much more efficient triangle reduction in your mesh. Modern hardware is more likely to be better optimised for indexed triangles; in particular because they’re what modern games use. One feeds off the other.

Run Doom 3 under GLIntercept some time and have a look at the draw calls it uses; you won’t find one single strip or fan in the entire scene. Everything goes through “glDrawElements (GL_TRIANGLES, …”.

Oh, I’d certainly use indexes. My question is merely if there is a considerable difference between glDrawElements(GL_TRIANGLES,…) and glDrawElements(GL_TRIANGLE_STRIP,…), but I think that has been appropriately answered.

And I have no idea why I wrote it as glDrawArrayElements before…

You can still use glDrawElements (GL_TRIANGLES,…) but order your vertexes as if they were a strip. There should be no difference at all unless the hardware has some really exotic components especially optimised for strips.

mhagain, you mean GL_TRIANGLE_STRIP? Yeah, I agree.

No, I don’t mean it. Wherever did you get that idea from?

OK, by all means use GL_TRIANGLE_STRIP if you want, and have fun fooling around with degenerate triangles and/or primitive restart. And get a best-case vertex reuse of 1 per tri (instead of a common case of 0.65 and best of 0.5) that limits the amount of geometry you can use. And have nothing more than a heavy investment in an overly complex and fragile renderer to show at the end of it. Have fun handling complex models that need to be represented by a combination of strips and fans too.

Otherwise order your vertexes and indexes in the most appropriate layout and just use GL_TRIANGLES with no messing.

Yes, well the problem is exactly that I don’t know how to order the vertexes. The triangles are given with out-of-order indexes (1-2-3 20-40-35 2-67-9), so part of my question is exactly if there is a known algorithm to sort these triangles into handy ordered lists.

Sure; have a look here: http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=257252

A Big Dirty Secret is that if you’re coding on Windows you can actually use D3D to do this too, even in an OpenGL program; the D3D optimization functions don’t require a device to be created. :wink:

It’s worth just using the data as-is and seeing if performance is acceptable enough without optimization though. You might find that you run quite well without needing to commit to ordering the triangles at all.

OK, by all means use GL_TRIANGLE_STRIP if you want, and have fun fooling around with degenerate triangles and/or primitive restart. And get a best-case vertex reuse of 1 per tri (instead of a common case of 0.65 and best of 0.5) that limits the amount of geometry you can use. And have nothing more than a heavy investment in an overly complex and fragile renderer to show at the end of it.

… What?

All of the stuff we’re discussing is on the pre-processing side. It has nothing to do with making a “renderer”, so strips vs. list would introduce neither complexity nor fragilty into the renderer. Nor would either side limit “the amount of geometry you can use.”

Vertex reuse is not the end-all-be-all of vertex processing performance.

I rarely agree with Alfonse, but I do now. At runtime you aren’t going to do anything with triangle lists and triangle strips other than blasting them full speed at the GPU. Well, I don’t know what one could do with the models at runtime, other than blast them to the GPU.

What I meant to say with GL_TRIANGLE_STRIP was, that it is possible to reorder triangles into an optimal (cache-reusing order) and then stripify them. With triangle lists or triangle strips the vertices are sent to the GPU for processing; it can cache tristrip vertices equally well as triangle list vertices. It does take more effort to create the strips though, I agree.

Seems too bothersome to do, but in theory you could get some minimal gains by doing it (less indices to store and less indices to read for the GPU).

What I meant to say with GL_TRIANGLE_STRIP was, that it is possible to reorder triangles into an optimal (cache-reusing order) and then stripify them.

Can you tell me where triangle strips can reuse vertices? Yes, there are some rare situations but usually it won’t reuse any vertices.

Also, if you think about putting all your triangle indices as if you would render triangle strips then you would actually get the same performance as vertex cache miss would be exactly 1 like in case of strips (as mhagain mentioned), except a little bit of overhead due to usually having more indices (however, this may not even be true as you need degenerate triangles or primitive restart indices, thus this advantage is maybe lost as well). But the key point is that triangle lists allow much better reordering than strips allow thus you can reduce the vertex cache miss ratio usually downto about 0.65 (again, as mhagain already mentioned).

Finally, you mentioned to stripify the mesh after reordering the triangles into an optimal (cache-reusing order). Well, stripifying will override the whole order as strips also need special triangle order so your argument is completely invalid.

I don’t say never use triangle strips, but usually you’ll get better performance with triangle lists and it is much less burden from asset preparation point of view. There are, of course, some situations when triangle strips are easy, convenient and more efficient (e.g. terrain rendering), but using strips as a general method for meshes is a definitely wrong direction.

Can you tell me where triangle strips can reuse vertices? Yes, there are some rare situations but usually it won’t reuse any vertices.

If by “reuse vertices” you mean “hit the post-T&L cache,” strips don’t need to explicitly hit the post-T&L cache as often because they’re strips. By definition, the second triangle will use two vertices that are already in the post-T&L cache. As will the third. As will the fourth. Etc. Therefore by definition, they will “reuse” plenty of vertices.

Finally, you mentioned to stripify the mesh after reordering the triangles into an optimal (cache-reusing order). Well, stripifying will override the whole order as strips also need special triangle order so your argument is completely invalid.

Striping will not “override the whole order.” It will override it partially, but that’s a far cry from saying that it will have no or little cache hits. It may not be rendering in the one most optimal order, but it is still getting plenty of post-T&L cache use.

And again, there’s this whole ignoring the fact that strips use no less than 50% fewer indices than lists. That’s really important if you’ve got a lot of indices.

aqnuep, you can stripify an arbitrary sequence of triangles, even if they don’t share any vertices, true you need to use tricks to do it and it may not be worth the effort, but it is possible.

So what I had in mind was stripifying the triangle sequence you would usually draw as GL_TRIANGLES. The vertices would be referenced exactly the same with GL_TRIANGLE_STRIP as with GL_TRIANGLES so there would be exactly the same number of cache hits/misses, but the storage requirements/memory accesses would not be as high.

I am not championing the use of triangle strips, but they are not obsolete, just less convenient to use.

If by “reuse vertices” you mean “hit the post-T&L cache,” strips don’t need to explicitly hit the post-T&L cache as often because they’re strips. By definition, the second triangle will use two vertices that are already in the post-T&L cache. As will the third. As will the fourth. Etc. Therefore by definition, they will “reuse” plenty of vertices.

Yes, exactly you’ve explained. It has a constant cache miss ratio of 1 (in practice more, as there are the degenerate triangles or primitive restart indices).

Striping will not “override the whole order.” It will override it partially, but that’s a far cry from saying that it will have no or little cache hits. It may not be rendering in the one most optimal order, but it is still getting plenty of post-T&L cache use.

Thinking about how small is the post-T&L cache, it is very optimistic to say that it will get “plenty of post-T&L cache use”.

Yes, exactly you’ve explained. It has a constant cache miss ratio of 1 (in practice more, as there are the degenerate triangles or primitive restart indices).

I don’t understand what you’re talking about. Neither degenerate triangles nor primitive restart indices have anything to do with cache missing. Neither of these flush the cache or clear it, so whether the next vertex hits or misses the cache is a function of whether that index is still in the cache or not. And only of that.

How do you compute this “miss ratio of 1” exactly? Is that number of indices vs. number of cache misses? If so, I’m not sure how that’s relevant, since strips also have fewer indices than lists. Of course lists have better a better index-per-miss ratio: they start off with 2-3x as many indices.

But vertexes use more bandwidth than indexes. This is true even of VBOs, you;ve still got bandwidth considerations for moving data from GPU memory to the vertex processor. Even more true if you ever have to deal with dynamic geometry. But it is use-case depedent, OK.

Another thing I forgot to mention is batching, of course.

The last time triangle strips were not obselete was 1998. Unless you’re coding to very specific hardware that prefers triangle strips there is nothing to gain and everything to lose by preferring them. Otherwise I begin to suspect that you have a heavy investment in code that uses strips and you’re unwilling to let it go.

In 1999 Quake 3 Arena was released. John Carmack made a document available for the benefit of driver developers that described the rendering architecture, which is well worth a read; it’s still available here if you’re interested: http://www.gamers.org/dEngine/quake3/johnc_glopt.html

In summary, one of the key points is that 99% of draw calls went through a single function, and THIS is the path for driver developers to optimise. Guess what that path was? “glDrawElements (GL_TRIANGLES, …”

Since then strips have been effectively dead.

But vertexes use more bandwidth than indexes.

That doesn’t mean indicies are free either. Index reads are typically uncached memory reads.

This is true even of VBOs, you;ve still got bandwidth considerations for moving data from GPU memory to the vertex processor.

Which is why attribute inputs are cached memory reads. A cache miss on the post-T&L cache doesn’t mean that you’ll miss on the pre-T&L cache.

Also, you haven’t explained how strips have lots more post-T&L cache misses than lists.

Another thing I forgot to mention is batching, of course.

And what does that have to do with strips vs. lists? They both batch equally, so what’s the problem?

Since then strips have been effectively dead.

Of course. They’re dead because John Carmack, the lord high father of all graphics, decreed it so, not for any actual substantive reason. The source code of the Quake 3 engine is the most perfect rendering engine of all time and everyone should write their engines exactly as he did.

Might I suggest that you try an argument that is something more than an appeal to authority. Doing what Carmack did over a decade ago may be fine for you, but for some of us, we want a coherent explanation for why triangle lists are a priori superior to triangle strips.

I thought I gave one upthread with the example of a model that would otherwise need to be represented with a combination of strips and fans. Or did you miss that?

I thought I gave one upthread with the example of a model that would otherwise need to be represented with a combination of strips and fans. Or did you miss that?

I didn’t see it. Unless it was in an off-thread link or something.

Either way, the possibility of a model that strips poorly does not mean that strips are “obsolete” or otherwise not a worthwhile tool to use.