URI encoding, FilePathTest in the model bank

Looking at the FilePathTest example in the model bank, the README.txt seems to assert that URIs need to be %-encoded.

However, as far as I can see, the Collada 1.4 and 1.5 PDF specs say that all URIs should be of type xs:anyURI. The anyURI data type is defined in XML Schema Part 2: Datatypes Second Edition, which says:

The mapping from anyURI values to URIs is as defined by the URI reference escaping procedure defined in Section 5.4 Locator Attribute of [XML Linking Language]

My interpretation of this is that an anyURI can contain pretty much any character and it is up to the application to %-encode as needed when using the URI to load the associated resource.

Is my interpretation correct?

Cheers,

Edd

Sorry, nope. See http://tools.ietf.org/html/rfc3986 for the general URI syntax, and note that scheme-specific constraints are allowed.

Hi marcus, thanks for your reply.

Could you point me to the place in the XML spec where it references RFC3986? I can’t see it in the definition of xs:anyURI.

Therefore, I still think my conclusion is correct; note the quote I gave from the spec in the original post. It points to “Section 5.4 Locator Attribute of XML Linking Language”. According to the XML spec this is the algorithm that converts from an anyURI to an actual URI as defined by RFCs 2396 and 2732. So, the input to that algorithm – i.e. the stuff in one’s XML file – does not require %-encoding.
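For anyone following along, here is a rough Python sketch of that escaping procedure as I read Section 5.4 of XLink 1.0. The function name escape_any_uri is my own invention, and the disallowed set is my reading of RFC 2396 section 2.4.3 (control characters, space, delims and "unwise" characters) with '#', '%', '[' and ']' left alone, so treat it as an illustration under those assumptions rather than a reference implementation.

```python
# Sketch of the XLink 1.0 section 5.4 escaping procedure (my reading of it).
# Disallowed ASCII: RFC 2396 section 2.4.3 excluded characters, except that
# '#', '%', '[' and ']' are left untouched. All non-ASCII is also disallowed.
_DISALLOWED_ASCII = (
    set('<>"{}|\\^`')                  # delims and "unwise" characters
    | {chr(c) for c in range(0x21)}    # control characters 0x00-0x1F and space
    | {"\x7f"}                         # DEL
)


def escape_any_uri(value: str) -> str:
    """Map an anyURI lexical value to a URI reference (hypothetical helper)."""
    out = []
    for ch in value:
        if ord(ch) > 0x7F or ch in _DISALLOWED_ASCII:
            # Convert the character to UTF-8 and escape every byte as %HH.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


print(escape_any_uri("textures/my file.png"))
print(escape_any_uri("textures/my%20file.png"))
```

Note that '%' passes through unchanged, so a value that is already %-encoded comes out as it went in, while a value with a raw space gets its space escaped.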

Furthermore, I found additional evidence from people who definitely know far more about XML than I do:

http://markmail.org/thread/exljaglxcnhf6s65
http://www.oasis-open.org/committees/uddi-spec/doc/tn/uddi-spec-tc-tn-anyurihandling.htm
http://xformsinstitute.com/essentials/browse/re57.php

Now I think it’s good for consumers to be liberal and accept %-encoded data in URIs where possible, but I don’t think such a scheme is mandated for producers as far as XML (and therefore Collada) is concerned. In fact, I think such producers would technically be at fault.
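To illustrate what I mean by a liberal consumer, here is a minimal Python sketch that turns a reference read from a document into a path, whether or not the producer %-encoded it. The name reference_to_path is hypothetical, and the sketch assumes the reference is relative or uses the file scheme; it ignores the authority part and Windows drive letters entirely.

```python
from urllib.parse import unquote, urlsplit


def reference_to_path(ref: str) -> str:
    """Return the decoded path part of a URI reference (hypothetical helper).

    Accepts both raw and %-encoded input: unquote() decodes valid %HH
    escapes and leaves everything else, including raw spaces, untouched.
    """
    parts = urlsplit(ref)        # separates scheme, path, query and fragment
    return unquote(parts.path)   # decode %HH escapes in the path only


print(reference_to_path("textures/my file.png"))
print(reference_to_path("textures/my%20file.png"))
```

The catch with this leniency is ambiguity: a file whose real name contains something that looks like an escape (say, a literal "%20") cannot be told apart from an encoded space, which is part of why I would like to know what producers are actually required to write.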

I will happily be proven wrong, as I really want to get to the bottom of this, but I’m finding little that explicitly disagrees with my interpretation of the XML spec.

Cheers,

Edd

Quick follow-up: I’ve just noticed that RFC3986 obsoletes 2396 and 2732 (sorry, didn’t spot that before), but I don’t think that changes the meaning of the XML spec.

I guess I don’t understand your question.

My short answer is that you can put any character in the value string of anyURI that passes XML validation using any validating XML parser. Characters that cause such validation to fail need to be percent encoded.