Performance suggestion -- get rid of <p> and <v>

I was doing some profiling of load times on COLLADA files, and I figured out the major culprit: the <p> and <v> elements. These elements represent the vast majority of elements in a typical COLLADA file. For example, one static mesh test file contained 2507 total elements, 2330 of which were <p> elements (92% of the total). Elements are very slow to initialize and allocate in the XML parser library: each element costs 4 dynamic memory allocations and carries a memory overhead of at least 120 bytes. As a test, I changed all the "<p>" and "</p>" tags to ".p." and "./p." in a file, and the XML parsing speed tripled (the XML parsing time represents 90% of my all-inclusive read time).

The performance difference is huge enough that I think it’s worth it to change the spec to include a concept of an “integer array with separators” that’s a single element (similar to how <array> efficiently holds scalar data). Something like this (using a single character as a separator for space efficiency):


   <polyarray indexCount="14">
      0 1 2 p
      1 2 3 4 p
      5 6 7 H 8 9 10 h 11 p
   </polyarray>

I like the inclusion of "indexCount" (the total number of indices for the entire array) because it allows me to allocate a buffer for the indices without parsing the list first, as I have to do now. Note that the capital and lower-case "H" markers indicate the start and end of holes; single characters keep parsing efficient.
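For illustration, here's roughly how a reader could consume that text in one pass (just a sketch; the struct and function names are made up, and error handling is omitted):

    #include <cctype>
    #include <cstdlib>
    #include <vector>

    // Flat storage for the proposed format: numbers are indices, 'p' closes a
    // polygon, 'H' opens a hole and 'h' closes it.
    struct PolyArray {
        std::vector<int>    indices;    // every index, back to back
        std::vector<size_t> polyEnd;    // one-past-the-end offset of each polygon
        std::vector<size_t> holeStart;  // offsets where holes begin...
        std::vector<size_t> holeEnd;    // ...and where they end
    };

    void parsePolyArray(const char* text, size_t indexCount, PolyArray& out)
    {
        out.indices.reserve(indexCount);  // this is what indexCount buys us
        const char* s = text;
        while (*s) {
            if (isdigit((unsigned char)*s)) {
                char* end = 0;
                out.indices.push_back((int)strtol(s, &end, 10));
                s = end;
            } else {
                if (*s == 'p')      out.polyEnd.push_back(out.indices.size());
                else if (*s == 'H') out.holeStart.push_back(out.indices.size());
                else if (*s == 'h') out.holeEnd.push_back(out.indices.size());
                ++s;  // separator or whitespace
            }
        }
    }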

PLEASE add this or something similar to Collada v1.3!!! I’m tempted to hack it in myself to my own exporter and parser, but I’d rather have an officially supported solution.

Thanks,
Jason Hoerner
Technical Director
Heavy Iron Studios

Can you tell us more about your implementation and your data set?

The COLLADA sample code can import 1M polygons in about 3 seconds on a typical PC so we know parsing can be plenty fast.

Since we chose XML as our markup language for COLLADA, we try to avoid using alternative markups within the data, such as your suggested 'p', 'h', and 'H' encoding.

I decided to put my code where my mouth is, and I modified my writer and parser to use “polyarray” data as I suggested. It only took a couple hours to make the changes, and my code is still backward compatible with old data. It turns out that running my performance tests in the debugger was skewing my results badly, so Collada parsing isn’t nearly as slow as I thought. All tests below were done from the command line with nothing else running.

I'm using the standard Collada 1.2 distribution, modified to load data into a sort of generic internal binary format. I modified "DAE_SceneRead" to accept input in any of three formats: file, memory buffer, or "xmlDocPtr" (pre-parsed XML). I ran 100 scene read / scene free calls in a row for each test. My computer is a 3.6GHz P4 with 2GB RAM. My first test file has 5 "<polygons>" elements with 3 indices per vertex and 2330 total polygons (all tris and quads). The other two test files contain 10k and 100k triangles respectively, with only position data and 1 index per vertex. The three tests (file, memory, pre-parsed XML) are listed as times in seconds; "fast" is with my new polygon index format:

test model fast – 0.0097, 0.0093, 0.0056
test model slow – 0.0174, 0.0157, 0.0067
10k bunny fast – 0.0102, 0.0072, 0.0050
10k bunny slow – 0.0393, 0.0362, 0.0077
100k buddha fast – 0.129, 0.074, 0.050
100k buddha slow – 0.395, 0.360, 0.080
As you can see, with the old format XML parsing represents 60% to 80% of my reader's time depending on the test case, less than the ~90% I erroneously measured before (darn debugger!). Overall parse time improves by around 40% for my example with 3 indices per vertex, and by around 70% for the other cases. The file sizes are also a bit smaller: 10% smaller for the 3-index test model and 20% smaller for the others.
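(For reference, the three input modes above map roughly onto libxml2 like this; this is just a sketch of the idea, not the actual DAE_SceneRead changes:)

    #include <libxml/parser.h>

    // Mode 1: parse straight from disk.
    xmlDocPtr docFromFile(const char* path)            { return xmlParseFile(path); }
    // Mode 2: parse from a buffer already loaded into memory.
    xmlDocPtr docFromMemory(const char* buf, int size) { return xmlParseMemory(buf, size); }
    // Mode 3: the caller passes in an xmlDocPtr it parsed earlier, so the timing
    // covers only the conversion to my binary format, not the XML parse itself.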

There's a substantial advantage to getting rid of the <p> and <v> elements in the format, although it's not as significant as I made it out to be before. Still, I would like to see the idea considered. I know format changes are painful, but this one is relatively simple: it was exactly 10 lines of changes to the Maya exporter to export the new data format (I can send someone the source if they are curious), and around 20 lines added to my parser. I know it breaks the spirit of XML markup, but I think for anything that represents a large array of homogeneous data (such as scalar float arrays or polygon index data), performance considerations become important. The change doesn't hurt human readability in my opinion, which I know is a big reason for using XML.

–Jason

Just wanted to add a clarification: the times I listed above are average seconds per test, not seconds for all 100 tests. If Collada were that fast, I wouldn't complain at all :).

Which XML parser are you using? We use one made with FLEX that gives a callback for each element (no dynamic memory allocation), so parsing XML <p> elements is about as fast as parsing custom markup would be.

I’m using “libxml2” which comes with the Collada distribution. I’m also using their utility libraries (DAE.c, Read.c, Write.c) which are dependent on the libxml2 DOM, thus I can’t switch to a callback-based parser. I may stop using those source files in the future so I can make a faster parser, but for now I’m stuck with it. Parsing custom markup would still be a bit faster though, since you avoid callback and context tracking overhead. Plus the custom markup version results in a smaller file size (mainly due to elimination of leading whitespace).

I still like the idea of including an “indexCount” attribute in “polygons” or “combiner” elements with a count of the total number of indices. This wouldn’t break any existing applications since they could just ignore the attribute if they didn’t care about it. It would just make it more efficient to allocate space when reading in a file if it was there.
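Concretely, for a toy two-polygon example (one index per vertex, inputs omitted, and the attribute name just a placeholder) it could look like this:

    <polygons count="2" indexCount="7">
       <p>0 1 2</p>
       <p>3 4 5 6</p>
    </polygons>

The 7 is simply the total number of values across all the <p> elements, which is exactly the size of the buffer I need to allocate.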

I like that idea too. I think it will help improve the speed of parsing.

You were also saying that reading <v> and <p> is slow, so here is my alternative suggestion.

I have not tested this idea, but I believe that for some cases it will give smaller file size and faster reading. For other cases it may give a slightly bigger file size compared to the current method.

My idea is to keep a unique list of <v> entries.

The <combiner> then holds a reference to one of these <v> entries for each vertex.

e.g.

<source>
   <array type="int" id="weight-vertex-array" count="…">0 2 3 4 5 6 …</array>
   <technique profile="COMMON">
      <accessor source="#…" count="…" stride="2">
         <param name="WEIGHT_INDEX" type="float" flow="OUT"/>
         <param name="JOINT_INDEX" type="float" flow="OUT"/>
      </accessor>
   </technique>
</source>

<combiner count="…">
   <input semantic="WEIGHT_VERTEX" source="#weight-vertex-array"/>
   0 0 1 2 3 4 6 8 9
</combiner>

Each value in the combiner maps to an element in the weight-vertex-array (a WEIGHT_INDEX / JOINT_INDEX pair, given the stride of 2). Each of those values in turn maps into another array which holds the actual data; we know what type of data each index maps to because of the param name.

For skinning this would help where shapes are entirely skinned to a single bone (e.g. a head). For meshes it would help where the same data is shared across faces at a single vertex, i.e. vertex 1 is a member of face 1 and face 2, and for both faces it has the same colour and uv coordinates.
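In rough pseudo-code the double indirection would look something like this (all the names here are invented for illustration):

    #include <vector>

    // Each combiner value selects a row of the stride-2 weight-vertex-array;
    // that row supplies indices into the real weight and joint arrays.
    void decodeCombiner(const std::vector<int>&   combinerValues,
                        const std::vector<int>&   weightVertexArray, // stride 2
                        const std::vector<float>& weights,
                        const std::vector<int>&   joints,
                        std::vector<float>&       outWeights,
                        std::vector<int>&         outJoints)
    {
        for (size_t i = 0; i < combinerValues.size(); ++i)
        {
            int row = combinerValues[i];
            outWeights.push_back(weights[weightVertexArray[row * 2 + 0]]); // WEIGHT_INDEX
            outJoints.push_back(joints[weightVertexArray[row * 2 + 1]]);   // JOINT_INDEX
        }
    }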

Helpful?

I spent the last three days completely rewriting my Collada parser as a SAX parser. I looked at the impact of using "polyarray" data as I originally suggested with a SAX parser model. Here are my latest figures (times in seconds):


6.5 MB real world test scene
0.464 = DOM from disk
0.121 = DOM pre-parsed
0.122 = SAX from disk
0.116 = SAX from memory
0.107 = SAX from disk, no scientific notation
0.051 = SAX from disk, parse time only
0.035 = Disk I/O time only

4.1 MB buddha (100k tris, ~50k verts)
0.494 = DOM, polygons
0.129 = DOM, poly array
0.108 = SAX, polygons
0.101 = SAX, polygons, no scientific notation
0.095 = SAX, poly array
0.090 = SAX, poly array, no scientific notation
0.071 = SAX, parse time only
0.021 = Disk I/O time only

In the final analysis, the "polyarray" implementation is not much faster once you go to SAX parsing. I expect it would only be ~5-10% faster on typical scenes, not enough to worry about. I heavily optimized the <p> and <v> cases in my parser, bypassing the usual stack mechanism I use and just setting a flag instead. I collect all my <p> or <v> data into a single index array plus an array marking the start of each polygon or vertex, rather than, say, having a variable-sized allocation per polygon or vertex, which would be insanely expensive. I made some optimizations to floating point parsing as well. I tested the difference between supporting scientific notation or not (I know there was an earlier thread debating this issue, whether "decimal" or "double" should be used for floating point values). For my real-world test scene, it's around 12%, which is not too significant.
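To give an idea of what that looks like, here is a stripped-down sketch of the <p> handling (not my actual parser; error handling, the <v> case, and the rest of the schema are omitted):

    #include <libxml/parser.h>
    #include <cstdlib>
    #include <cstring>
    #include <string>
    #include <vector>

    // While inside a <p> element, character data is buffered; on </p> the
    // buffered text is split into ints and appended to one flat index array,
    // with polyStart marking where each polygon begins.
    struct IndexAccumulator {
        std::vector<int>    indices;    // all <p> indices, back to back
        std::vector<size_t> polyStart;  // offset of the first index of each polygon
        std::string         text;       // character data of the current <p>
        bool                inP = false;
    };

    static void onStartElement(void* ctx, const xmlChar* name, const xmlChar** /*atts*/) {
        IndexAccumulator* acc = static_cast<IndexAccumulator*>(ctx);
        if (!strcmp(reinterpret_cast<const char*>(name), "p")) {
            acc->inP = true;
            acc->text.clear();
            acc->polyStart.push_back(acc->indices.size());
        }
    }

    static void onCharacters(void* ctx, const xmlChar* ch, int len) {
        IndexAccumulator* acc = static_cast<IndexAccumulator*>(ctx);
        if (acc->inP)   // may be called several times per element, so just buffer
            acc->text.append(reinterpret_cast<const char*>(ch), len);
    }

    static void onEndElement(void* ctx, const xmlChar* name) {
        IndexAccumulator* acc = static_cast<IndexAccumulator*>(ctx);
        if (acc->inP && !strcmp(reinterpret_cast<const char*>(name), "p")) {
            const char* s = acc->text.c_str();
            char* end = 0;
            for (long v = strtol(s, &end, 10); end != s; v = strtol(s, &end, 10)) {
                acc->indices.push_back(static_cast<int>(v));
                s = end;
            }
            acc->inP = false;
        }
    }

    int readIndices(const char* path, IndexAccumulator& acc) {
        xmlSAXHandler sax;
        memset(&sax, 0, sizeof(sax));
        sax.startElement = onStartElement;
        sax.endElement   = onEndElement;
        sax.characters   = onCharacters;
        return xmlSAXUserParseFile(&sax, &acc, path);   // 0 on success
    }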

I still would like to desperately plead for adding an "indexCount" to <polygons> and <combiner> elements to help me allocate space. Right now I potentially have to re-allocate as I'm parsing <p> or <v> elements. I initially allocate more than I think I'll need, but then I'm wasting memory, which isn't good either.

–Jason

In the past when someone has proposed adding a count or index attribute, the result has been that the attribute becomes a required attribute in the schema. This is because implementations then want to rely on it so it must be required, and it must be correct.

This particular suggestion had come up last year and it was shot down because it makes it more difficult to hand edit geometry. If you add or remove <p> elements by hand, then you have to go and correct that index count attribute by hand as well.

The XML schema allows attributes to be declared optional, and a compliant parser should take that into account. Besides, an implementation doesn't need to rely on it; in fact, the attribute could be treated purely as a "hint" which may be inaccurate, and it would still be useful. My index parsing code looks like this (pseudo-code in STL-like syntax; I don't actually use STL for this array):


    // Init code when the first <p> element is found
    // Arbitrary heuristic -- assume 4 vertices average per polygon
    indexList.reserve(numPolygons * 4 * numIdx);

    // ...

    // Code for each index found -- will reallocate if necessary
    indexList.push_back(newIndex);

All I'd have to do if I had an index count hint is just change the line that initially reserves memory. The code would function 100% correctly regardless of how inaccurate the index count was; it would just potentially be less efficient (more re-allocations or some wasted memory), but no more inefficient than my code is now with its arbitrary prediction. The attribute could be called "indexCountHint" so a human reading the file would realize that it's just a hint, although I'd still prefer it to be accurate.
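With the hint, that one reserve line is all that would change, roughly:

    // Use the optional hint if present; otherwise fall back to the old heuristic.
    indexList.reserve(indexCountHint > 0 ? indexCountHint
                                         : numPolygons * 4 * numIdx);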

Lots of things in a Collada file are difficult to hand edit. For instance, if you removed a vertex, you’d have to edit every index in the polygon array, which is not practical to do by hand. While that’s an extreme example, even adding an extra value to the end of an array would require fixing the count in both the array and the accessor. If you add an animation key, you’d have to insert data at the exact same spot in each of the parallel arrays (input, output, tangents), a difficult task.

I happen to think that polygon data is not practically hand-editable anyway. For example, imagine how difficult it would be to do a trivial operation like a single edge collapse on a reasonably sized mesh by hand. Many useful polygon operations require edge/face connectivity information, and the Collada format doesn’t include it.

I’m not saying “everything” in a Collada file needs to be pre-counted. However, for the particular case of polygon and combiner index data, we are dealing with something that will occur in vast quantities in a typical scene, thus having a major influence on memory and parse time.

Thanks,
Jason Hoerner
Technical Director
Heavy Iron Studios, THQ Inc.