On a daily basis, I happily use a subset of XML that is quite simple, notwithstanding a few small issues with namespaces and qualified names that I can live with. This includes using XML for XHTML, XForms, XSLT, XPL, RSS, DocBook, custom document formats, and whatnot.
But while writing a short introduction to XML for the upcoming book about Web 2.0 that I am writing with Alex Vernet, Eric van der Vlist, Danny Ayers and Joe Fawcett, it struck me again how complex XML actually is. Go back to the XML specification and you realize that XML covers such monstrosities as (gasp) DTDs, entities and (an arguable gasp here) processing instructions. These features alone seem to take up more than half of the XML specification (I haven't counted the lines). Together they build a tricky mess of scary syntax, difficult concepts and processing hell. The one question that pops up when you are done skimming through the spec again is: why?
We actually know why: XML has history. For example, it made sense to migrate SGML DTDs to XML back in 1996-1998 when there was nothing else around. Nowadays, people tend to use XML Schema, for all its shortcomings, or the more user-friendly Relax NG and Schematron. When it comes to validation, there is nothing that DTDs offer that these newer languages don't.
The same goes for entities. Today they are used for various purposes, including specifying characters by name and including external content. The former use certainly needs a new solution (some have already been proposed), and the latter is solved with the cleaner approach of XInclude.
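To make the two uses concrete, here is a minimal Python sketch using only the standard library. It assumes nothing from the book or the articles: the sample documents, the FRAGMENTS table and the load_fragment helper are hypothetical names invented for illustration. It shows that a named character entity is undefined without a DTD while a numeric character reference works everywhere, and that XInclude can pull in a fragment without any entity declaration.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude


def parses(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A named entity such as &eacute; only exists if a DTD declares it,
# so a document without a DTD that uses it is rejected by the parser.
named_ok = parses("<p>caf&eacute;</p>")

# A numeric character reference needs no DTD at all.
numeric = ET.fromstring("<p>caf&#233;</p>")

# Hypothetical in-memory "files" standing in for external documents.
FRAGMENTS = {"legal.xml": "<legal>All rights reserved.</legal>"}


def load_fragment(href, parse, encoding=None):
    """Hypothetical loader: resolve an href against FRAGMENTS."""
    if parse == "xml":
        return ET.fromstring(FRAGMENTS[href])
    return FRAGMENTS[href]


# Inclusion expressed with XInclude instead of an external entity.
doc = ET.fromstring(
    '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    "<title>Example</title>"
    '<xi:include href="legal.xml"/>'
    "</book>"
)
ElementInclude.include(doc, loader=load_fragment)

print(named_ok)                # the DTD-less named entity did not parse
print(numeric.text)            # text with the accented character resolved
print([child.tag for child in doc])
```

The XInclude element is replaced in place by the loaded fragment, so the resulting tree contains no trace of the inclusion mechanism, much like an expanded entity but without a DTD.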
Finally, processing instructions are a contentious subject. I for one can live with them, but I wouldn't complain if they disappeared in an upcoming revision of XML.
What I have realized is that these issues are no revelation in 2006. Below are links to several related articles and proposals dating back to 2002:
Tim Bray's 2002 XML-SW proposal.
Kendall Grant Clark's 2002 XML 2.0 -- Can We Get There From Here? article.
Norm Walsh's 2004 XML 2.0 article, which also points to more resources.
The list of issues raised in those articles and proposals goes further than the two or three issues I tackle above, but DTDs and entities appear to be recurring themes.
So what do you do until a hypothetical XML 2.0 sees the light of day? It's fairly simple: just make sure you don't unnecessarily carry the burden of legacy and compatibility features such as DTDs and entities, and stick with the features of XML that are simpler to understand.
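One way to hold yourself to that advice is to flag documents that still carry a DOCTYPE declaration. The following is only a sketch, assuming Python's standard expat bindings; the uses_dtd function and the DoctypeFound exception are hypothetical names, not part of any library.

```python
import xml.parsers.expat


class DoctypeFound(Exception):
    """Hypothetical marker exception raised when a DOCTYPE is seen."""


def uses_dtd(xml_text):
    """Return True if the document contains a DOCTYPE declaration."""
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Stop parsing as soon as the declaration is encountered.
        raise DoctypeFound(name)

    parser.StartDoctypeDeclHandler = on_doctype
    try:
        parser.Parse(xml_text, True)
    except DoctypeFound:
        return True
    return False


legacy = '<!DOCTYPE note [<!ENTITY who "world">]><note>Hello &who;</note>'
simple = "<note>Hello world</note>"

print(uses_dtd(legacy))
print(uses_dtd(simple))
```

A check like this could run in a build or a test suite. Documents that are not well-formed still raise the parser's usual error, which this sketch deliberately leaves unhandled.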
NOTE: There is currently no such thing as "XML 2.0". The term is just a placeholder for a hypothetical future version of XML that would improve and simplify XML 1.0 and XML 1.1.