Metadata for News
Jeff Jarvis reports on a proposal by the Associated Press and the Media Standards Trust for a new metadata standard for news. The impetus appears to be Google's announcement in May that it will display page metadata in its search result previews, a feature it calls "rich snippets". (As an aside, the comments on that page are a wonderful example of what Google is in for on this.)
Commenters on Jarvis' post point out that the International Press Telecommunications Council (IPTC) has been working on news metadata standards for years, with recommendations ranging from Dublin Core XML to NewsML, SportsML, and EventsML. That is hardly surprising: the wonderful thing about metadata standards is that there are so very many to choose from.
Aside: The Media Standards Trust, which I had never heard of before this, is an "independent, UK-based registered charity", funded by the MacArthur Foundation and the Knight Foundation, with staff at MIT and Southampton University. They appear to be trying to put a higher polish on the idea of metadata-enhanced news. Okay, fine.
I'm feeling a strange sense of déjà vu about this. In the late '90s I spent a while at Apple helping to define and promote the Dublin Core metadata standard, an effort to agree on a standard set of cross-domain tags. At the time, I remember a lot of talk about how widespread adoption of these standards was essential for proper search and indexing of the web. Without machine-readable creator, publisher, format, language, and rights tagging on every web page, we said, the web would never become a trusted, global information system.
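To make "machine-readable tagging" concrete: the usual convention for embedding Dublin Core elements in an HTML page is a `link` element declaring the DC schema plus `meta` tags whose names carry a `DC.` prefix. The Python sketch below generates such a header. The function name `dublin_core_head` and the sample values are hypothetical, and rejecting names outside the 15 core Dublin Core elements is an illustrative convenience, not something the standard requires of publishers.

```python
from html import escape

DC_SCHEMA = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_head(fields):
    """Return HTML <head> lines tagging a page with Dublin Core metadata (hypothetical helper)."""
    lines = [f'<link rel="schema.DC" href="{DC_SCHEMA}">']
    for element, value in fields.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        # Escape values so quotes or ampersands cannot break the attribute.
        lines.append(f'<meta name="DC.{element}" content="{escape(value, quote=True)}">')
    return "\n".join(lines)


if __name__ == "__main__":
    print(dublin_core_head({
        "creator": "Jane Doe",          # hypothetical sample values
        "publisher": "Example News",
        "format": "text/html",
        "language": "en",
        "rights": "Copyright 2009 Example News",
    }))
```

Note how little of this describes what the page is *about* or whether anyone trusts it; every field is self-asserted by the publisher.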
In practice, of course, things turned out rather differently. Google started from the premises that subject search was the most important user task and that reputation was the key ranking factor. They identified a crucial piece of metadata that the Dublin Core process had missed entirely: the links to a page from elsewhere. HTML links point one way only, so a page carries no record of what links to it, and the absence of that information was a critical factor in HTTP/HTML's success (for more on alternative futures, read up on Project Xanadu). But reconstructing the list of inbound links by sheer brute force allowed Google to infer both the important subjects and the relative level of community approval for every publicly viewable page on the web. The consequences are, by now, obvious.
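The brute-force step is easy to sketch, even if doing it at web scale is not. The toy Python below assumes a hypothetical crawl (`CRAWL`) that records only each page's outbound links and their anchor text, inverts it into inbound links, and then applies two crude stand-ins: the number of distinct linking pages as a proxy for community approval, and the most frequent words in inbound anchor text as a proxy for subject. This is not Google's actual ranking method (PageRank, for one, weights each link by the rank of the page it comes from); it only illustrates the inversion.

```python
from collections import defaultdict

# Hypothetical crawl output: each page maps to the (target, anchor text) links it contains.
# Pages store only these outbound links; nothing on a page records who links *to* it.
CRAWL = {
    "a.example/": [("c.example/recipes", "great pasta recipes"), ("b.example/", "my friend's blog")],
    "b.example/": [("c.example/recipes", "pasta recipes")],
    "d.example/": [("c.example/recipes", "recipes"), ("b.example/", "b's blog")],
}


def invert_links(crawl):
    """Brute-force inversion: walk every outbound link and file it under its target."""
    inbound = defaultdict(list)
    for source, links in crawl.items():
        for target, anchor in links:
            inbound[target].append((source, anchor))
    return dict(inbound)


def summarize(inbound):
    """Crude proxies: distinct linking pages as 'approval', common anchor words as 'subjects'."""
    summary = {}
    for target, refs in inbound.items():
        counts = defaultdict(int)
        for _, anchor in refs:
            for word in anchor.lower().split():
                counts[word] += 1
        summary[target] = {
            "approval": len({source for source, _ in refs}),
            # Most frequent words first; ties broken alphabetically.
            "top_words": sorted(counts, key=lambda w: (-counts[w], w))[:3],
        }
    return summary


if __name__ == "__main__":
    ranked = sorted(summarize(invert_links(CRAWL)).items(), key=lambda kv: -kv[1]["approval"])
    for page, info in ranked:
        print(page, info)
```

None of the linking pages had to declare anything about `c.example/recipes`; its apparent subject and standing fall out of what other people chose to link to.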
What does this mean for news publishing in 2009? I'm not completely sure yet. But I am certain that anyone who says, "if only we had the right metadata on this digital artifact, we would solve our search / revenue / digital rights management / relevance issues," is barking up the wrong tree.