There has been a fair amount of hype surrounding XML, or the eXtensible Markup Language, over the past year and the hype is slowly but surely growing. I decided it would be a prudent thing to think about, and chewed it over. I came to the following conclusion: XML will fail before it has even started on its most basic premise: openly structured data.
Now careful here, before you take out your pitchforks and hot tar, I’m not saying that XML cannot be useful in a number of arenas. In the backend of websites and in certain limited academic exchanges it may prove the perfect sort of thing to use. But I don’t think that the web will move from HTML to XML.
Why? Because very few people with content want that content to be well-structured externally. One example that is given is of a recipe for cookies. The idea is that if you structure your recipe (author, title, ingredients, directions) then other people can parse that data in a useful way and you could automatically be entered in, say, a recipe database.
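To make the recipe example concrete, here is a minimal sketch of what that kind of parsing might look like. The element names (recipe, author, title, ingredient, directions) and the sample recipe itself are made up for illustration; they don't come from any real recipe format. It uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the element names and the recipe contents are
# invented for this example, not taken from any real recipe standard.
RECIPE_XML = """
<recipe>
  <author>Jane Baker</author>
  <title>Chocolate Chip Cookies</title>
  <ingredients>
    <ingredient>2 cups flour</ingredient>
    <ingredient>1 cup sugar</ingredient>
    <ingredient>1 cup chocolate chips</ingredient>
  </ingredients>
  <directions>Mix, scoop, and bake until golden.</directions>
</recipe>
"""


def parse_recipe(xml_text):
    """Pull the structured fields out of one recipe document."""
    root = ET.fromstring(xml_text.strip())
    return {
        "author": root.findtext("author"),
        "title": root.findtext("title"),
        # Collect the text of every <ingredient> element, in document order.
        "ingredients": [item.text for item in root.iter("ingredient")],
        "directions": root.findtext("directions"),
    }


recipe = parse_recipe(RECIPE_XML)
print(recipe["title"], "by", recipe["author"])
for ingredient in recipe["ingredients"]:
    print(" -", ingredient)
```

A dozen lines is all it takes for a stranger's program to file the recipe into its own database, and nothing about the page it originally appeared on comes along for the ride.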
While that may sound very cool technologically, it’s a really dumb idea in the bigger picture. If you produce content, you do not want other people to automatically be able to take the content without the context! The majority of content-producing sites on the Net (e.g., Yahoo!, news.com, and Suck) are advertiser supported. If you could just grab the author, date, and content from all of their articles and put together a virtual daily newspaper for yourself with none of their advertisements, then you would have deprived them of their sole source of revenue. Since anybody in their right mind who could turn the advertising off would, the site would not have a way of sustaining itself and would be forced to shut down. More likely, the site would realize that making its data easy to export in the first place is an awfully bad idea. As long as the data is hard to extract (but relatively easy to search for), there is a need to come to the page and see the design, the advertisements, the other articles, and all the site has to offer. It gives them a chance to create a website, a homestead, and a destination, not just small chunks of universally amalgamated data.
So what is the answer? I think it is something like Slashdot, in which people point other people to neat and useful things happening on the Net (usually to URLs on other sites that are advertiser supported). In this way, people get access to information they find useful, the sites get plenty of traffic and advertising impressions, and everyone walks out happy.
It’s not just advertising impressions, though. Copyright is important, too. Although I value openness, it is nice to know how and where your work is being used. XML would be the ultimate plagiarist’s tool. So I think that most people who are actually producing content, from the small-scale, non-commercial folks (like me!) to the commercial information service sites, will resist a move to structured data and XML. If websites don’t switch over to XML, browsers will have no incentive to work hard to support it. It becomes an unimportant issue, like asking if the latest release of your product shipped with a Xhosa language pack. XML will be dead on its very premise: content producers do not want their content to be openly structured!