On Namespaces in JSON

01-21-2010

David Baron wrote a thought-provoking short post on Distributed Extensibility two months ago, which I had been meaning to comment on. I'm particularly interested in how to handle distributed extensibility of "RPC-like" network messages, which these days usually means talking about REST and JSON. A couple years ago, it meant talking about XML Namespaces, which is a topic that often causes cries of dismay among developers, so the design pressure to do something different (and hopefully better) is strong.

A tiny introduction to XML Namespaces (or, XMLNS) and their discontents, for the uninitiated

To create XML documents which contain elements defined by more than one authority, the XMLNS spec allows authors to embed prefix declarations into an XML document. Each declaration binds a short prefix to a namespace, and that prefix can then be used to "namespace-qualify" elements, which indicates that they belong to that namespace.

In theory, this means that you can save lots of room by only declaring an external namespace once (which is good, since namespace identifiers tend to be big URLs). In practice, it means that the tag name of an element in an XML document cannot be reasoned about without constructing a data structure encoding all the parent elements of the element and dealing with a variety of tricky corner cases. This means that simple lexical scanners (i.e. regular expressions) cannot (100%) correctly process a document that contains namespace prefixes.
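To make the prefix problem concrete, here is a minimal Python sketch. The namespace URI, the element names, and both helper functions are invented for illustration. It compares a naive regular-expression scan with a namespace-aware parse of two documents that differ only in which prefix the author chose.

```python
import re
import xml.etree.ElementTree as ET

# Two documents with the same meaning; only the prefix choice differs.
# The namespace URI and element names are hypothetical.
PAY_NS = "http://example.com/pay"
DOC_A = '<order xmlns:p="http://example.com/pay"><p:amount>10</p:amount></order>'
DOC_B = '<order xmlns:q="http://example.com/pay"><q:amount>10</q:amount></order>'


def amounts_by_regex(text):
    # Naive lexical scan: silently assumes the prefix is always "p".
    return re.findall(r"<p:amount>(.*?)</p:amount>", text)


def amounts_by_parser(text):
    # Namespace-aware: ElementTree expands each prefix to "{uri}localname",
    # which requires tracking the declarations on the enclosing elements.
    root = ET.fromstring(text)
    return [el.text for el in root.iter("{%s}amount" % PAY_NS)]
```

The regular expression finds the amount in `DOC_A` and misses it in `DOC_B`, while the parser returns the same result for both, because it resolves the prefix against the declaration on the parent element.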

A number of important XML specifications, most notably XSL, have an uncomfortable relationship with prefixed elements. A number of other important specifications, such as XML Exclusive Canonicalization (which is a critical piece of XML Signature), have to jump through a number of hoops to interact nicely with XMLNS.

Then, a couple days ago, no less an authority than Tim Bray weighed in with some recommendations on how to think about JSON extensibility. The short version is this: when extending JSON, use globally-unique names, which encode the design authority of the extender, and use the unique names everywhere. This implies that receivers should follow a MustIgnore policy for messages they receive, since any message could contain extensions a receiver doesn't understand. (It is significant that the SOAP specification considers the opposite case important enough to define a "mustUnderstand" attribute for the header blocks of its envelope, which is where extensions live: Bray's proposal has no such mechanism. It could easily be handled at a higher level, as part of the envelope-level wrapping of a JSON message, though.)
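Here is a rough Python sketch of what a MustIgnore receiver might look like, with an optional envelope-level check bolted on. Everything named in it is an assumption made for illustration: the reverse-domain spelling of the member names, the `mustUnderstand` and `body` envelope fields, and the `receive` function. None of it comes from Bray's post or from any specification.

```python
import json

# Member names this receiver understands. The reverse-domain spelling of
# the extension name is an assumed convention, not taken from any spec.
KNOWN = {"id", "total", "com.example.shipping.giftWrap"}


class NotUnderstood(Exception):
    """Raised when the sender requires a name this receiver does not know."""


def receive(raw):
    envelope = json.loads(raw)
    # Hypothetical envelope field listing names the sender insists on.
    required = set(envelope.get("mustUnderstand", []))
    missing = sorted(required - KNOWN)
    if missing:
        raise NotUnderstood(", ".join(missing))
    body = envelope["body"]
    # MustIgnore: keep what we recognize and skip the rest without failing.
    understood = {k: v for k, v in body.items() if k in KNOWN}
    ignored = sorted(k for k in body if k not in KNOWN)
    return understood, ignored


MESSAGE = json.dumps({
    "body": {
        "id": 7,
        "com.example.shipping.giftWrap": True,
        "org.example.loyalty.points": 40,
    }
})
```

Calling `receive(MESSAGE)` keeps `id` and the gift-wrap extension and quietly skips the loyalty extension. If the envelope had listed `org.example.loyalty.points` under `mustUnderstand`, the same call would raise `NotUnderstood` instead.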

I spent a long time dealing with corner cases of XML Namespaces, especially when I was implementing XML Security, so I have a lot of personal scar tissue around what that specification does well and what it does not. For exploratory purposes, I thought I'd make a quick table of how these approaches stack up.

| Goal | XML Namespaces | Global Names in JSON |
|---|---|---|
| Globally Unique | Up to each domain host, which is manageable | Same. But terser. |
| Efficient for Transport | Yes: a normalized XML document declares each namespace only once | Not particularly. The namespace identifier is effectively redeclared for each attribute. |
| Self-documenting | Good: The URL can be loaded in a web browser | Not great: There's a hint to the controlling authority and that's it. Is that a problem? You can probably just pop it into a search engine and you're done. |
| Document fragments are legal | Definitely not: Removing a subelement from an XML document with namespaces requires complicated DOM-level manipulation of the document tree. | Yes, trivially |
| Handles versioning reasonably | Not particularly: In most cases, bumping an XML Namespace version means dropping in an entirely new set of element handlers | Maybe: Bray's scheme could handle refinement at the level of a single attribute. |

On balance, I think Bray's proposal comes out ahead. It's verbose, but that's what gzip is for. The ability to process JSON fragments with more-or-less context-free lexical scanners is a big win. And I think we've learned that having namespace URIs that resolve to documents wasn't all that important.
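As a quick illustration of the verbosity and fragment points, here is a small Python sketch. The member names and record layout are made up. It builds a message in which long, globally-unique names repeat in every record, compresses it with gzip, and lifts one record out as a standalone fragment.

```python
import gzip
import json

# Hypothetical globally-unique member names, repeated in every record.
records = [
    {
        "com.example.shipping.giftWrap": i % 2 == 0,
        "com.example.shipping.carrier": "ground",
        "org.example.loyalty.points": i,
    }
    for i in range(200)
]

raw = json.dumps(records).encode("utf-8")
packed = gzip.compress(raw)

# A fragment lifted out of the larger message is still a complete JSON
# document: its names carry their own qualification, so nothing from the
# enclosing message has to travel with it.
fragment = json.loads(json.dumps(records[3]))
```

Comparing `len(packed)` with `len(raw)` shows how much of the repeated-name overhead the compressor absorbs for this particular input. Real messages will vary.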