Why you should be using HTML 4.01 instead of XHTML

April 16th, 2019

Original Post: Beware of XHTML

If you’re a web developer, you’ve probably heard about XHTML, the markup language developed in 1999 to implement HTML as an XML format. Most people who use and promote XHTML do so because they think it’s the newest and hottest thing, and they may have heard of some (usually false) benefits here and there. But there is a lot more to it than you may realize, and if you’re using it on your website, even if it validates, you are probably using it incorrectly.

I should make it clear that I hope XHTML has a bright future on the Web. That is precisely why I have written this article. The state of XHTML on the Web today is more broken than the state of HTML, and most people don’t realize because the major browsers aren’t even treating those pages like real XHTML. If you hope for XHTML to succeed on the Web, you should read this article carefully.

Some of the issues discussed in this article are complicated and technical. If you find it difficult to follow, I suggest at least taking a look at the myths of XHTML, examples of latent compatibility issues, and the list of standards-related XHTML sites that break when treated properly.

Some quotes from prominent people/vendors:

Microsoft (Internet Explorer): “If we tried to support real XHTML in IE 7 we would have ended up using our existing HTML parser (which is focused on compatibility) and hacking in XML constructs. It is highly unlikely we could support XHTML well in this way”
Mozilla (Firefox): “If you are using the usual HTML features […] serving valid HTML 4.01 as text/html ensures the widest browser and search engine support.”
Apple (Safari): “On today’s web, the best thing to do is to make your document HTML4 all the way. Full XHTML processing is not an option, so the best choice is to stick consistently with HTML4.”
Håkon Wium Lie (from Opera, W3C): “I don’t think XHTML is a realistic option for the masses. HTML5 is it.”
Anne van Kesteren (from Opera): “I’m an advocate of using XHTML only in the correct way, which basically means you have to use HTML. Period.”
Ian Hickson (from Opera, Google, W3C): “Authors intending their work for public consumption should stick to HTML 4.01”

What is XHTML?
Myths of XHTML
Benefits of XML
Content type is everything
HTML compatibility guidelines
Internet Explorer incompatibility
Content negotiation
Null End Tags (NET)
Firefox and other problems
Conclusion
List of standards-related sites that break as XHTML
List of standards-related sites that stick with HTML
Related sites
See also

What is XHTML?

XHTML is a markup language that its proponents hope will eventually (in the distant future) replace HTML on the Web. For the most part, an XHTML 1.0 document differs from an HTML 4.01 document only in the lexical and syntactic rules: HTML is written in its own unique subset of SGML, while XHTML is written in a different subset of SGML called XML. SGML subsets are differentiated by the sets of characters that delimit tags and other constructs, whether or not certain types of shorthand markup may be used (such as minimized attributes, omitted start/end tags, etc.), whether or not tag names or character entities are case sensitive, and so on.

The Document Type Definition (DTD, which is referenced by the doctype) then defines which elements, attributes, and character entities exist in the language and where the elements may be in the document. The DTDs of XHTML 1.0 and HTML 4.01 are nearly identical, meaning that, as far as things like elements and attributes go, XHTML 1.0 and HTML 4.01 are basically the same language. The only added benefit of XHTML is that it uses XML’s subset of SGML and shares the benefits XML has over HTML’s subset.
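One of the lexical differences mentioned above, case sensitivity of tag names, can be seen with Python's standard-library parsers. This is only a sketch: html.parser and xml.etree.ElementTree stand in for a browser's HTML and XML parsers, and the helper names (TagCollector, html_tags, is_well_formed) are made up for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCollector(HTMLParser):
    """Records the tag names that an HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(("start", tag))

    def handle_endtag(self, tag):
        self.tags.append(("end", tag))


def html_tags(markup):
    """Tokenize markup as HTML and return the reported tag names."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


MIXED_CASE = "<P>Hello</p>"

# The HTML tokenizer folds both tag names to lower case, so they match.
print(html_tags(MIXED_CASE))
# The XML parser treats P and p as different names and rejects the document.
print(is_well_formed(MIXED_CASE))
```

The same markup is acceptable under one set of lexical rules and an error under the other, even though the element involved is the same.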

Myths of XHTML

There are many false benefits of XHTML promoted on the Web. Let’s clear up some of them at a glance (with details and other pitfalls provided later):

XHTML does not promote separation of content and presentation any more than HTML does. XHTML has all of the same elements and attributes (including presentational ones) that HTML has, and it doesn’t offer any additional CSS features. Semantic markup and separation of content and presentation is absolutely possible in HTML and is equally easy.
Most XHTML pages on the Web are not parsed as XML by today’s web browsers. The vast majority of XHTML pages on the Web cannot be parsed as XML. Even many valid XHTML pages cannot be parsed as XML. See the Validity and Well-Formedness article for details and examples.
HTML is not deprecated and is not being phased out at this time. In fact, the World Wide Web Consortium recently renewed the HTML working group which is working to develop HTML 5.
XHTML does not have good browser support. Most browsers simply treat XHTML pages as regular HTML (which presents a number of problems). Some major browsers like Firefox, Opera, and Safari may attempt to handle the page as proper XHTML, but usually only if you include a certain special HTTP header. However, when you do so, Internet Explorer and a number of other user agents will choke on it and won’t display a page at all. Even when handled as XHTML, the supporting browsers have a number of additional bugs.
Browsers do not parse valid XHTML dramatically faster than valid HTML, even when they’re parsing XHTML correctly. Although the browser can lose certain shorthand logic, it now has to use extra logic to confirm that the document is well-formed. Although XHTML, when parsed with an XML parser, may be somewhat faster to parse than typical HTML, the difference usually isn’t very significant. And either way, download speed is usually the bottleneck when it comes to document parsing, so users won’t notice any speed improvement.
XHTML is not extensible if you hope to support Internet Explorer or the number of other user agents that can’t parse XHTML as XML. They will handle the document as HTML and you will have no extensibility benefit.
XHTML source does not necessarily look much different from HTML source. If you prefer making sure all of your non-empty elements have close tags, you may use close tags in HTML, too. The only real markup differences between an HTML document and an XHTML document following the legacy compatibility guidelines are the doctype, html element, and the /> tag ends (which are themselves just an XML shorthand construct, the very kind of thing so many people claim to dislike about HTML).

Benefits of XML

XML has a number of improvements over HTML’s subset of SGML:

Although HTML’s subset allowed for a lot of shorthand markup and other flexibility, it proved too difficult to write a correct and fully-featured parser for it. As a result, most user agents, including all of today’s major web browsers, make many technically unsound assumptions about the lexical format of HTML documents and don’t support a number of shorthand features like Null End Tags (<tag/Content/), unclosed start/end tags (<tag<tag>), and empty tags (<>). XML was designed to eliminate these extra features and restrict documents to a tight set of rules that are more straightforward for user agents to implement. In effect, XML defines the assumptions that user agents are allowed to make, while still resulting in a file that a theoretical fully-featured SGML user agent could parse once pointed to XML’s SGML declaration. It should be noted that an XML parser for the most part is not dramatically easier to write than the level of HTML support offered by most HTML parsers. Most of the features that would make HTML more difficult to write a parser for, such as custom SGML declarations, additional marked sections, and most of the shorthand constructs, have negligible use on the Web anyway and generally have poor or absent support in major web browsers. The most significant difference is XML’s lack of support for omitted start and end tags, which in theory could amount to complicated logic in HTML for elements not defined as empty. Even still, most browsers have those rules hard-coded rather than derived from the DTD, so this isn’t a major difference in difficulty either.
To minimize the occurrence of nasty surprises when parsing the document, XML user agents are told to not be flexible with error handling: if a user agent comes upon a problem in the XML document, it will simply give up trying to read it. The user will be presented with a simple parse error message instead of the webpage. This eliminates the compatibility issues with incorrectly-written markup and browser-specific error handling methods by requiring documents to be “well-formed”, while giving webpage authors immediate indication of the problem. This does, however, mean that a single minor issue like an unescaped ampersand (&) in a URL would cause the entire page to fail, and so most of today’s public web applications can’t safely be incorporated in a true XHTML page. While user agents are supposed to fail on any page that isn’t well-formed (in other words, one that doesn’t follow the generic XML grammar rules), they do not have to fail on a page that is well-formed but invalid. For example, although it is invalid to have a span element as an immediate child of the body element, most XML-supporting web browsers won’t provide indication of the error because the page is still well-formed; that is, the DTD is violated, but not the fundamental rules of XML itself. Some user agents may choose to be “validating” agents and will also fail on validity errors, but they aren’t common. Despite popular assumption, even if an XML page is perfectly valid, it still might not be well-formed.
Unlike HTML’s subset, which was specifically made for HTML, XML is a common subset used in many different languages. This means that a single simple parser can easily be written to support a number of different languages. It also paved the way for the Namespaces in XML standard which allows multiple documents in different XML formats to be combined in a single XML document, so that you can have, for example, an XHTML page that contains one or more SVG images that use MathML inside them.
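The difference between a well-formedness error and a validity error described above can be tried out with any non-validating XML parser. The following is a minimal sketch using Python's xml.etree.ElementTree as a stand-in for a browser's XML parser; the function name and the sample fragments are illustrative assumptions, not taken from any real site.

```python
import xml.etree.ElementTree as ET


def check_well_formed(markup):
    """Return None if the markup is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as error:
        return str(error)
    return None


# An unescaped ampersand in a URL: a fatal well-formedness error.
UNESCAPED = '<p><a href="search?q=xhtml&page=2">next</a></p>'

# The same link with the ampersand escaped parses without complaint.
ESCAPED = '<p><a href="search?q=xhtml&amp;page=2">next</a></p>'

# A span directly inside body violates the XHTML DTD, but the document
# still follows the generic XML grammar, so a non-validating parser accepts it.
INVALID_BUT_WELL_FORMED = (
    "<html><head><title>t</title></head>"
    "<body><span>text</span></body></html>"
)

for sample in (UNESCAPED, ESCAPED, INVALID_BUT_WELL_FORMED):
    print(check_well_formed(sample))
```

Only the first sample produces an error message. The third is accepted because the parser never consults the DTD.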

Content type is everything

When your website sends a document to the visitor’s browser, it adds on a special content type header that lets the browser know what kind of document it’s dealing with. For example, a PNG image has the content type image/png and a CSS file has the content type text/css. HTML documents have the content type text/html. Web servers typically send this content type whenever the file extension is .html, and server-side scripting languages like PHP also typically send documents as text/html by default.

XHTML does not have the same content type as HTML. The proper content type for XHTML is application/xhtml+xml. Currently, many web servers don’t have this content type reserved for any file extension, so you would need to modify the server configuration files or use a server-side scripting language to send the header manually. Simply specifying the content type in a meta element will not work over HTTP.
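As an illustration of sending the header manually from a server-side script, here is a minimal sketch of a Python WSGI application. The page content and the name app are placeholders chosen for this example; a real site would set the same header through whatever server or framework it already uses.

```python
# A tiny XHTML document, encoded once so its byte length is known.
XHTML_PAGE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Example</title></head>"
    "<body><p>Hello</p></body></html>"
).encode("utf-8")


def app(environ, start_response):
    """WSGI application that labels its response as XHTML, not text/html."""
    headers = [
        ("Content-Type", "application/xhtml+xml; charset=utf-8"),
        ("Content-Length", str(len(XHTML_PAGE))),
    ]
    start_response("200 OK", headers)
    return [XHTML_PAGE]
```

To try it locally, the app callable can be passed to wsgiref.simple_server.make_server from the standard library. The point is that the content type travels in the HTTP response headers, outside the document itself.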

When a web browser sees the text/html content type, regardless of what the doctype says, it automatically assumes that it’s dealing with plain old HTML. Therefore, rather than using the XML parsing engine, it treats the document like tag soup, expecting HTML content. Because HTML 4.01 and simple XHTML 1.0 are often very similar, the browser can still understand the page fairly well. Most major browsers consider things like the self-closing portion of a tag (as in <br />) as a simple HTML error and strip it out, usually ending up with the HTML equivalent of what the author intended.
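The decision described above can be sketched as a small function. This is a simplified model of the behavior, not any browser's actual code, and the function name parsing_mode is invented for this example. Note that the document itself, doctype included, is never consulted.

```python
# Content types that select the XML parser in this simplified model.
XML_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def parsing_mode(content_type_header):
    """Pick a parser from the Content-Type header alone."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type_header.split(";")[0].strip().lower()
    if media_type == "text/html":
        return "html"
    if media_type in XML_TYPES:
        return "xml"
    return "other"


print(parsing_mode("text/html; charset=utf-8"))
print(parsing_mode("application/xhtml+xml"))
```

A page with an XHTML doctype served with the first header goes down the HTML path, exactly like any other text/html page.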

However, when the document is treated like HTML, you get none of the benefits XHTML offers. The browser won’t understand other XML formats like MathML and SVG that are included in the document, and it won’t do the automatic validation that XML parsers do. In order for the document to be treated properly, the server would need to send the application/xhtml+xml content type.

The problems go deeper. Comment markers are sometimes handled differently depending on the content type, and when you enclose the contents of a script or style element with basic SGML-style comments, it will cause your script and style information to be completely ignored when the document is treated like XML. Also, any special markup characters used in the inline contents of a style or script element will be parsed as markup instead of being treated as character data like in HTML. To solve these problems, you must use an elaborate escape sequence described in the article Escaping Style and Script Data, and even then there are situations in which it won’t work.
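The comment problem can be reproduced outside a browser. In the sketch below, Python's html.parser and xml.etree.ElementTree stand in for the two parsing modes; they are not browser engines, and the class and function names are assumptions made for this example. The same style element is read once under HTML rules and once under XML rules.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class StyleText(HTMLParser):
    """Collects the character data found inside style elements."""

    def __init__(self):
        super().__init__()
        self.in_style = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self.in_style = True

    def handle_endtag(self, tag):
        if tag == "style":
            self.in_style = False

    def handle_data(self, data):
        if self.in_style:
            self.chunks.append(data)


def style_as_html(markup):
    """Return the style contents as seen under HTML parsing rules."""
    parser = StyleText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


def style_as_xml(markup):
    """Return the style contents as seen under XML parsing rules."""
    root = ET.fromstring(markup)
    return "".join(root.find("style").itertext())


DOC = "<head><style><!-- body { color: red } --></style></head>"

print(repr(style_as_html(DOC)))  # the rule survives as character data
print(repr(style_as_xml(DOC)))   # the comment, and the rule in it, is gone
```

Under HTML rules the comment markers are just part of the style element's text, so the rule is still there. Under XML rules they delimit a real comment, and the element ends up empty.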

Furthermore, the CSS and DOM specifications have special provisions for HTML that don’t apply to XHTML when it’s treated as XML, so your page may look and behave in unexpected ways. The most common problem is a white gap around your page if you have a background on the body, no background on the html element, and any kind of spacing between the elements, such as a margin, padding, or a body height under 100% (browsers typically have some combination of these by default). In scripting, tag names are returned differently and document.write() doesn’t work in XHTML treated as XML. Table structure in the DOM is different between the two parsing modes. These are only a select few of the many differences.

The following are some examples of differing behavior between XHTML treated as HTML and XHTML treated as XML. The anticipated results are based on the way Internet Explorer, Firefox, and Opera treat XHTML served as HTML. Some other browsers are known to behave differently. Also note that Internet Explorer doesn’t recognize the application/xhtml+xml content type (see below for an explanation), so it will not be able to view the examples in the second column.

XHTML served as text/html	XHTML served as application/xhtml+xml
Example 1	Example 1
Example 2	Example 2
Example 3	Example 3
Example 4	Example 4
Example 5	Example 5
Example 6	Example 6
Example 7	Example 7
Example 8	Example 8
Example 9	Example 9
Example 10	Example 10

HTML compatibility guidelines

When the XHTML 1.0 specification was first written, there were provisions that allowed an XHTML document to be sent as text/html as long as certain compatibility guidelines were followed. The idea was to ease migration to the new format without breaking old user agents. However, these provisions are now viewed by many as a mistake. The whole point of XHTML is to be an XML alternative to HTML, yet due to the allowance of XHTML documents to be sent as text/html, most so-called XHTML documents on the Web now would break if they were treated like XML (see the real-world examples below). Aware of the problem, the W3C had these provisions removed in the first revision of the XHTML specification. In XHTML 1.1 and onward, the W3C now clearly says that an XHTML document should not be sent as text/html. XHTML should be sent as application/xhtml+xml or one of the more elaborate XHTML content types.

Internet Explorer incompatibility

Internet Explorer does not support XHTML. Like other web browsers, when a document is sent as text/html, it treats the document as if it was a poorly constructed HTML document. However, when the document is sent as application/xhtml+xml, Internet Explorer won’t recognize it as a webpage; instead, it will simply present the user with a download dialog. This issue still exists in Internet Explorer 7.

Although all other major web browsers, including Firefox, Opera, Safari, and Konqueror, support XHTML, the lack of support in Internet Explorer as well as in major search engines and web applications makes its use hard to recommend.

Content negotiation

Content negotiation is the idea of sending different content depending on what the user agent supports. Many sites attempt to send XHTML as application/xhtml+xml to those who support it, and either XHTML as text/html or real HTML to those who don’t.

There are two methods generally used to determine what the user agent supports, both based on the Accept HTTP header. Most often, sites use the incorrect method, in which they simply look for the string “application/xhtml+xml” in the header value. Some sites use the correct method, in which they actually parse the header value, supporting wildcards and ordering by q value.
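The two approaches can be sketched as follows. This is a simplified illustration: the function names are made up, the parser ignores media-type parameters other than q, and the sample header values are invented for the example rather than copied from any real browser.

```python
def parse_accept(header):
    """Split an Accept header value into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        pieces = [piece.strip() for piece in part.split(";")]
        media_range = pieces[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1 when it is not given
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q))
    return ranges


def quality(media_type, ranges):
    """Return the q value of the most specific range matching media_type."""
    main_type = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in ranges:
        if media_range == media_type:
            specificity = 2
        elif media_range == main_type + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def naive_wants_xhtml(header):
    """The substring check many sites use."""
    return "application/xhtml+xml" in header


def negotiated_wants_xhtml(header):
    """Prefer XHTML only if its q value is strictly higher than text/html's."""
    ranges = parse_accept(header)
    return quality("application/xhtml+xml", ranges) > quality("text/html", ranges)


# Hypothetical header values, for illustration only.
LOW_PRIORITY_XHTML = "text/html, application/xhtml+xml;q=0.5"
WILDCARD_ONLY = "*/*"

print(naive_wants_xhtml(LOW_PRIORITY_XHTML), negotiated_wants_xhtml(LOW_PRIORITY_XHTML))
print(naive_wants_xhtml(WILDCARD_ONLY), negotiated_wants_xhtml(WILDCARD_ONLY))
```

With the first header the substring check says yes although the agent ranks text/html higher. With the wildcard-only header both types tie, so the parsed result alone cannot tell an agent that supports XHTML from one that does not.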

Unfortunately, neither of these methods works reliably.

The first method doesn’t work because not all XHTML-supporting user agents actually have the text “application/xhtml+xml” in the Accept header. Safari and Konqueror are two such browsers. The application/xhtml+xml content type is implied by a wildcard value instead. Meanwhile, not all HTML-supporting user agents have “text/html” in the header. Internet Explorer, for example, doesn’t mention this content type. Like Safari and Konqueror, it implies this support by using a wildcard. Even among those user agents that support XHTML and mention application/xhtml+xml in the header, it may have a lower q value than text/html (or a matching wildcard), which implies that the user agent actually prefers text/html (in other words, its XHTML support may be experimental or broken).

The second method (the correct, 100% standards-compliant one) doesn’t work because most major browsers have inaccurate Accept headers:

Firefox 2 and below have application/xhtml+xml listed with a higher q value than text/html, even though Mozilla has posted an official recommendation on its site saying that websites should use text/html for these versions if they can, for reasons described below.
Internet Explorer doesn’t list either text/html or application/xhtml+xml in its Accept header. Instead, both content types are covered by a single wildcard value (which implies that every content type in existence is supported equally well, which is obviously untrue). So Internet Explorer is saying that it supports both text/html and application/xhtml+xml equally, even though it actually doesn’t support application/xhtml+xml at all. In the case that a user agent claims to support both equally, the site is supposed to use its own preference. A possible workaround is for the site to “prefer” sending text/html or, in a toss-up situation, only send application/xhtml+xml if it’s actually mentioned explicitly in the header. However…
Safari and Konqueror, which support XHTML, also give text/html and application/xhtml+xml the same q value (in fact, like Internet Explorer, they also claim to support everything in existence equally well). But they don’t mention application/xhtml+xml explicitly; it’s implied by a wildcard. So if you use the above workaround, Safari and Konqueror will receive text/html even though they really do support application/xhtml+xml.

As disappointing as it may be, content negotiation simply isn’t a reliable approach to this problem.

Null End Tags (NET)

In XHTML, all elements are required to be closed, either by an end tag or by adding a slash to the start tag to make it self-closing. Since giving empty elements like img or br an end tag would confuse browsers treating the page like HTML, self-closing tags tend to be promoted. However, XML self-closing tags directly conflict with a little-known and poorly supported HTML/SGML feature: Null End Tags.

A Null End Tag is a special shorthand form of a tag that allows you to save a few characters in the document. Instead of writing <title>My page</title>, you could simply write <title/My page/ to accomplish the same thing. Due to the rules of Null End Tags, a single slash in an empty element’s start tag would close the tag right then and there, meaning <br/ is a complete and valid tag in HTML. As a result, if you have <br/> or <br />, a browser supporting Null End Tags would see that as a br element immediately followed by a simple > character. Therefore, an XHTML page treated as HTML could be littered with unwanted > characters.
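To make the Null End Tag reading concrete, here is a deliberately tiny toy tokenizer. It is not an SGML parser: it only recognizes the two NET forms discussed above, passes everything else (including ordinary tags) through as text, and uses a hard-coded, assumed list of empty elements instead of a DTD. All names in it are invented for this sketch.

```python
import re

# Assumed list of empty elements; a real SGML parser would read this from the DTD.
EMPTY_ELEMENTS = {"br", "img", "hr", "meta", "link", "input"}

# A start tag that is closed by a slash: "<name/" (optional space before the slash).
NET_START = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\s*/")


def net_tokens(markup):
    """Tokenize markup the way a NET-aware reader would, in toy form."""
    tokens = []
    pos = 0
    while pos < len(markup):
        match = NET_START.match(markup, pos)
        if match is None:
            # Anything else is passed through as text up to the next "<".
            end = markup.find("<", pos + 1)
            if end == -1:
                end = len(markup)
            tokens.append(("text", markup[pos:end]))
            pos = end
            continue
        name = match.group(1).lower()
        pos = match.end()
        if name in EMPTY_ELEMENTS:
            # An empty element is complete as soon as the slash is seen.
            tokens.append(("element", name, ""))
            continue
        # Otherwise the content runs until the next slash, the null end tag.
        end = markup.find("/", pos)
        if end == -1:
            end = len(markup)
        tokens.append(("element", name, markup[pos:end]))
        pos = end + 1
    return tokens


print(net_tokens("<title/My page/"))
print(net_tokens("<br/>"))
print(net_tokens("<br />"))
```

In both br cases the element is finished at the slash, and the trailing > is left over as a one-character text token, which is the stray character a NET-aware user agent would display.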

This problem is often overlooked because most popular browsers today are lacking support for Null End Tags, as well as some other SGML shorthand features. However, there are still some smaller user agents that properly support Null End Tags. One of the more well-known user agents that support it is the W3C validator. If you send it a page that uses XHTML self-closing tags, but force it to parse the page as HTML/SGML like most user agents do for text/html pages, you can see the results in the outline: immediately after each of the self-closing elements, there is an unwanted > character that will be displayed on the page itself.

(It should be noted that the W3C Validator is unusual in that it generally determines the parsing mode from the doctype, rather than from the content type as most other user agents do. Therefore, an HTML doctype was used in the above example just so the validator would attempt to parse the page using the HTML subset of SGML as all major browsers will for text/html pages regardless of the doctype. The Null End Tag rules are actually set in the SGML subset definition, not the DTD, so this example is accurate to what you should expect in a fully compliant SGML user agent even with an XHTML doctype.)

Technically, a restricted and altered form of Null End Tags exists in XML and is frequently used: the self-closing portion of the start tag. While Null End Tags are defined as / … / in HTML’s subset of SGML, they are specially defined as / … > in XML with the added restriction that it must close immediately after it is opened, meaning the element must have no content. This was designed to look similar to a regular start tag for web developers who are unfamiliar with typical Null End Tags. However, in the process it creates inherent incompatibility with HTML’s subset of SGML for all empty elements.

In summary, although this issue doesn’t show in most popular web browsers, a user agent that more fully supports SGML would see unwanted > characters all over XHTML pages that are sent with the text/html content type. If the goal of using XHTML is to help promote standards, then it’s quite counterproductive to cause unnecessary problems for user agents that more correctly comply with the SGML standard.

Firefox and other problems

Although Firefox supports the parsing of XHTML documents as XML when sent with the application/xhtml+xml content type, its performance in versions 2.0 and below is actually worse than with HTML. When parsing a page as HTML, Firefox will begin displaying the page while the content is being downloaded. This is called incremental rendering. However, when it’s parsing XML content, Firefox 2.0 and below will wait until the entire page is downloaded and checked for well-formedness before any of the content is displayed. This means that, although in theory XML is supposed to be faster to parse than HTML, in reality these versions of Firefox usually display HTML content to the user much faster than XHTML/XML content. Thankfully, this issue is expected to be resolved in Firefox 3.0.

However, there are also issues in other browsers, such as certain HTML-specific provisions in the CSS and DOM standards being mistakenly applied to XHTML content parsed as XML. For example, if there is a background set on the body element and none on the html element, Opera will apply the background to the html element as it would in HTML. So even when dealing exclusively with XHTML parsed as XML, you still run into a number of the same problems that you do when trying to serve XHTML either way.

All in all, true XHTML support in major user agents is still very weak. Because a key user agent — namely, Internet Explorer — has made no visible effort to support XHTML, other major user agents have continued to see it as a relatively low priority and so these bugs have lingered. HTML is recommended over XHTML by both Mozilla and Safari and is generally better supported than XHTML by all major browsers.

Conclusion

XHTML is a very good thing, and I certainly hope to see it gain widespread acceptance in the future. However, it simply isn’t widely supported in its proper form. XHTML is an XML format, and to force a web browser to treat it like HTML is going against the whole purpose of XHTML and also inevitably causes other complications. Assuming you don’t want to dramatically limit access to your information, XHTML can only be used incorrectly, be interpreted as invalid markup by most user agents, cause unwanted results in others, and offer no added benefit over HTML. HTML 4.01 Strict is still what most user agents and search engines are most accustomed to, and there’s absolutely nothing wrong with using it if you don’t need the added benefits of XML. HTML 4.01 is still a W3C Recommendation, and the W3C has even announced plans to further develop HTML alongside XHTML in the future.

List of standards-related sites that break as XHTML

The following are just a few of the countless sites that use an XHTML doctype but, as of this moment of writing, completely fail to load or otherwise work improperly when parsed as XML, thus missing the whole point of XHTML. The authors of most of these sites are quite prominent in the web standards community — many are involved in the Web Standards Project (WaSP) — yet they have still fallen victim to the pitfalls of current use of XHTML. In fact, I have found that nearly all XHTML websites owned by WaSP members have failures when parsed as XML.

You could consider this a “shame list” of sorts. These are the same people who are supposed to be teaching others how to use web standards properly, yet they have written markup that basically depends on browsers treating it incorrectly. But the main point of this list isn’t to pick on individuals; it’s to reinforce the fact that even so-called experts at web standards have trouble juggling the different ways XHTML will inevitably be handled on the Web. And what benefit does it bring? None of the following sites make use of anything XHTML offers over HTML.

You can test a page’s actual XHTML rendering in Firefox using the Force Content-type extension and setting the new content-type to application/xhtml+xml.

Accessify - WaSP Steering Committee, Accessibility Task Force: Displayed as generic XML, not interpreted as XHTML. The XML namespace was omitted.
all in the <head> - WaSP Steering Committee: Page doesn’t load. Not well-formed. (Note: this page is valid according to the XHTML DTD and XML’s subset of SGML, but XML has additional rules to define well-formed pages which this page breaks, observed in the Textpattern and the Technorati Link Count Widget post. A similar test case is available.)
And all that Malarkey - WaSP Accessibility Task Force: Page doesn’t load. Not well-formed.
CSS Zen Garden - WaSP: Top background doesn’t display. The page relies on HTML-specific background behavior. Numerous designs have errors with a similar cause.
dean.edwards.name/weblog/ - WaSP DOM Scripting Task Force, Microsoft Task Force: For browsers that support behavior binding (including Firefox) for the dynamic syntax highlighting of the code snippets, most of the code boxes fail to load the contents, resulting in many empty boxes where code snippets should be.
dog or higher: Page doesn’t load. Not well-formed.
Elly Thompson’s Weblog: Page doesn’t load. Not well-formed.
g9g.org - WaSP Steering Committee: There is a thick white gap around the page. The page relies on HTML-specific background behavior.
holly marie - WaSP Steering Committee: Page doesn’t load. Not well-formed.
Jeffrey Veen - WaSP emeritus: Page doesn’t load. Not well-formed.
KuraFire - WaSP: Page doesn’t load. Not well-formed.
Meriblog: Background appears white instead of purple. The page relies on HTML-specific background behavior.
mezzoblue - WaSP: Displayed as generic XML, not interpreted as XHTML. The XML namespace was omitted. Also, individual post pages don’t load. Not well-formed.
microformats: Page doesn’t load. Not well-formed.
molly.com - WaSP Group Lead: Flickr script fails to initialize because the script contents are commented out.
Off the Top - WaSP Steering Committee: Page doesn’t load. Not well-formed.
unadorned.org - WaSP Steering Committee: Stylesheet doesn’t load because the import rule is commented out.
WordPress - WaSP: Page doesn’t load. Not well-formed.
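The “XML namespace was omitted” failures in the list above come down to how an XML parser names elements. The sketch below uses Python's xml.etree.ElementTree as a stand-in for a browser's XML parser; the function name and the sample documents are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def root_is_xhtml(markup):
    """Return True if the root element is html in the XHTML namespace."""
    root = ET.fromstring(markup)
    # ElementTree spells namespaced names as "{namespace}localname".
    return root.tag == "{%s}html" % XHTML_NS


WITH_NS = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head><body/></html>"
)
WITHOUT_NS = "<html><head><title>t</title></head><body/></html>"

print(root_is_xhtml(WITH_NS))     # the element is XHTML's html
print(root_is_xhtml(WITHOUT_NS))  # just some element that happens to be called html
```

Without the xmlns attribute, an XML parser has no reason to treat the html element as XHTML's, which is consistent with such pages being displayed as generic XML.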

List of standards-related sites that stick with HTML

The following are some significant sites relevant to web standards that continue to use HTML rather than XHTML.

456 Berea Street
Anne van Kesteren
Bite Size Standards
David Baron’s Homepage
Hixie’s Natural Log
Jonathan Snook’s Blog
meyerweb.com
Mozilla
Web Devout
WebKit

This work is copyright © 2007 David Hammond and is licensed under a Creative Commons Attribution Share-Alike License. It may be copied, modified, and distributed freely as long as it attributes the original author and maintains the original license. See the license for details.

23 Responses to “Why you should be using HTML 4.01 instead of XHTML”

zcorpan Says:
April 16th, 2007 at 7:41 pm
While I agree with the article by and large, I’d like to comment on some things.

Some quotes from prominent people/vendors:

FWIW, all the people you quoted here are members of (at least) the W3C HTML WG.

HTML is written in its own unique subset of SGML

Not quite. Formally it is not a subset of SGML. When HTML was first invented, it was not an application of SGML, it was a separate language influenced by SGML. HTML 2.0 through HTML 4.01 have claimed that HTML is an application of SGML, but the only UAs that have been using SGML for parsing HTML have been validators. In practise, HTML is a separate language from SGML, and HTML5 is the first spec to admit it and to define the parsing rules for it.

The DTDs of XHTML 1.0 and HTML 4.01 are nearly identical, meaning that, as far as things like elements and attributes go, XHTML 1.0 and HTML 4.01 are basically the same language.

XHTML1 and HTML4 are basically the same language, but not because of the DTDs, but because of the spec prose. XHTML1 defines XHTML to be a reformulation of HTML4 in XML.

Myths of XHTML

You forgot to say what the myths actually were. What you currently have is a list of truths, not myths.

Although the browser can lose certain shorthand logic, it now has to use extra logic to confirm that the document is well-formed.

The well-formedness checking is not extra logic at all. It’s just draconian error handling. In particular, well-formedness checking is not a separate process after you’ve parsed the document; it’s part of the parsing. When parsing fails, you have stumbled upon a well-formedness error, and that’s all there is to it.

The only real markup differences between an HTML document and an XHTML document following the legacy compatibility guidelines are the doctype, html element, and the /> tag ends

As of HTML5, “xmlns="http://www.w3.org/1999/xhtml"” is allowed on the html element, and /> on void elements is also allowed. You could also use “<!DOCTYPE html>” as doctype in XHTML5 if you want to.

In effect, XML defines the assumptions that user agents are allowed to make, while still resulting in a file that a theoretical fully-featured SGML user agent could parse once pointed to XML’s SGML declaration.

…and you follow the SGML compatibility guidelines in the XML spec (e.g., only use /> on elements that are declared as EMPTY in the DTD, or even using a DTD in the first place, which is something many XML experts advocate against). Being compatible with SGML is not relevant in practise.

It should be noted that an XML parser for the most part is not dramatically easier to write than the level of HTML support offered by most HTML parsers.

Not automatically true anymore, since there is a parsing spec for HTML5. The XML spec only defines what is well-formed and what is not — the process of converting a stream of bytes into a tree is implied. Most people also don’t think about the internal subset in XML here.

Some user agents may choose to be “validating” agents and will also fail on validity errors, but they aren’t common.

Mostly validators. In any case, even for validating UAs, validation errors are non-fatal in XML.

Despite popular assumption, even if an XML page is perfectly valid, it still might not be well-formed.

No. An XML document must be well-formed in order to be valid. All the document you pointed to did was find some bugs in the W3C Validator, which result from it not using a real XML parser. See spec.

Aware of the problem, the W3C had these provisions removed in the first revision of the XHTML specification.

No, they didn’t. Firstly, XHTML 1.1 is not a revision of XHTML 1.0. It’s a separate spec. Secondly, it’s RFC2854 that says that XHTML 1.0 is allowed to be served as text/html — not the XHTML 1.0 spec.

Null End Tags (NET)

SGML is effectively irrelevant to HTML today.

(It should be noted that the W3C Validator is unusual in that it generally determines the parsing mode from the doctype, rather than from the content type as most other user agents do.)

This is another bug in the W3C Validator (bug #1500).

However, when it’s parsing XML content, Firefox 2.0 and below will wait until the entire page is downloaded and checked for well-formedness before any of the content is displayed.

The lack of incremental rendering in Firefox 2.0 has nothing to do with well-formedness checking. It’s simply a bug.
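
To illustrate that draconian error handling and incremental processing can coexist, here is a sketch assuming Python's xml.etree.ElementTree.XMLPullParser and a made-up document split into three chunks. It says nothing about how Firefox is implemented; it only shows that a parser can hand over completed elements as data arrives and still stop at the first well-formedness error.

```python
import xml.etree.ElementTree as ET

# Hypothetical document split into network-sized chunks. The last chunk
# closes <doc> while <c> is still open, which is a well-formedness error.
chunks = ["<doc><a>first</a>", "<b>second</b>", "<c>broken</doc>"]

parser = ET.XMLPullParser(events=("end",))
seen = []     # tags of elements that were completed before any error
error = None  # the first well-formedness error, if one occurs

for chunk in chunks:
    try:
        parser.feed(chunk)
        # read_events() yields what has been parsed so far and re-raises
        # a parse error once the bad input has been reached.
        for _event, elem in parser.read_events():
            seen.append(elem.tag)
    except ET.ParseError as err:
        error = err
        break

print(seen, error)
```

The elements completed in the earlier chunks are available to the caller by the time the error in the last chunk is reported, so nothing in the error handling itself forces a user agent to wait for the whole document.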

For example, if there is a background set on the body element and none on the html element, Opera will apply the background to the html element as it would in HTML.

This was fixed in Opera 8 IIRC.

Conclusion

I would like to add that switching from working text/html-XHTML to HTML4 does not add any practical benefit (just like switching from HTML4 to text/html-XHTML didn’t). It’s still allowed to serve tag soup compatible XHTML as text/html, and it is just as easy to migrate to HTML5 from text/html-XHTML as it is from HTML4. So long as you don’t consider it to be HTML and don’t try to serve it as XML, you’re fine. (Trying to make it work as XML when you don’t need it to work as XML will waste your time, just as switching between HTML4 and text/html-XHTML wastes your time. No practical benefit.)
zcorpan Says:
April 16th, 2007 at 7:52 pm
(Er, that last paragraph should say “… So long as you consider it to be HTML …”.)
boa Says:
April 17th, 2007 at 12:31 am
Dear blog author,

Why do you use XHTML for this very page then?

(Also, the font you use is incredibly tiny.)
nabeel Says:
April 17th, 2007 at 4:15 am
testing
lcs Says:
April 17th, 2007 at 6:04 am
Mozilla does use XHTML…
JD Says:
April 17th, 2007 at 8:58 am
This article is a copy of the article found here: http://www.webdevout.net/articles/beware-of-xhtml
Jack Sleight Says:
April 17th, 2007 at 9:00 am
@zcorpan: This article was actually written by David Hammond, and the original can be found here: http://www.webdevout.net/articles/beware-of-xhtml. This blog’s authors didn’t steal anything, as the article is licensed under the Creative Commons Attribution Share-Alike Licence, but you might want to direct your (perfectly valid) comments to him.

@boa: Same as above, this blog’s authors didn’t write this article.
Manny Says:
April 17th, 2007 at 9:56 am
I agree. If you have to hack something then it’s busted - pure and simple!

- Tired of the hacks I have to use.
- Waste so much time looking for solutions.
- Emails are still HTML 4

Really worried about when Billy and his crew from MS finally get their stuff together - I’ll be major OT ripping crap out.
Ryan Says:
April 17th, 2007 at 12:18 pm
Are you David Hammond? This is the same article I just read earlier today.

http://www.webdevout.net/articles/beware-of-xhtml
Anonymous Says:
April 17th, 2007 at 12:44 pm
Dear blog “author”,

Please publish original content, rather than cutting and pasting articles from the New York Times, TechCrunch and in this case WebDevOut: http://www.webdevout.net/articles/beware-of-xhtml .

Anyone linking to this article (in tools such as del.icio.us) should remove the link and link to the original article instead.
links for 2007-04-18 « Talkabout Says:
April 18th, 2007 at 1:50 am
[…] Web Development » Blog Archive » Why you should be using HTML 4.01 instead of XHTML “The state of XHTML on the Web today is more broken than the state of HTML, and most people don’t realize because the major browsers aren’t even treating those pages like real XHTML.” (tags: xhtml xml html critique comparison) […]
Realazy Says:
April 18th, 2007 at 2:40 am
So the XHTML 1.0 Transitional is the best choice.
Live from Yokohama » Blog Archive » links for 2007-04-17 Says:
April 19th, 2007 at 8:42 am
[…] Why you should be using HTML 4.01 instead of XHTML Interesting article about HTML usage. I’d like to see what the xml folks say about this. (tags: web xhtml html) […]
Web Hosting Says:
May 2nd, 2007 at 9:14 am
I have 2 sites on xhtml
Johan Says:
May 12th, 2007 at 9:43 pm
This is the same kind of article as the many others that explain why you should be using Windows instead of Linux, and with the same kind of conclusion: “Linux is really nice, blah blah blah”.

The “HTML war” did not start yesterday: HTML 5.0 vs XHTML 2.0 (commercial companies vs the W3C) has been going on for a long time now! Search for “WHATWG”…

By the way, by staying with HTML 4 you don’t have to rework your site templates and scripts! It is a very good way to save money while still selling old designs to clients. But website prices are no cheaper for the age of the template design.

The same goes for companies/vendors and their HTML rendering engines. If you want browsers to stay free, you can’t spend too much time implementing new technologies. In the end, if you do, who will pay the developers?

So, I have no more interest in that article. In the end, I am staying with XHTML (with a server-side script which can switch the content from XHTML 1.1 to HTML 4.01; script source here >> http://keystonewebsites.com/articles/mime_type.php), even if many people don’t like it for various reasons, mostly financial ones as far as I can understand.
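
For readers wondering what such a switch involves, here is a minimal sketch in Python. It is not the linked script; the function name choose_content_type and its policy are assumptions made for illustration. The idea is to serve application/xhtml+xml only to user agents that explicitly list it in their Accept header with a non-zero q-value, and fall back to text/html for everything else.

```python
def choose_content_type(accept_header: str) -> str:
    """Pick a MIME type from an HTTP Accept header (illustrative policy only)."""
    xhtml = "application/xhtml+xml"
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != xhtml:
            continue
        q = 1.0  # a media range without an explicit q parameter defaults to 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if q > 0:
            return xhtml
    # Wildcards such as */* are deliberately ignored: a client that does not
    # name the XHTML type explicitly gets text/html.
    return "text/html"


print(choose_content_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(choose_content_type("text/html, */*"))
```

A real deployment would also need to send a Vary: Accept response header so that caches keep the two variants apart, and the markup itself has to work under both parsers.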
stefan Says:
September 26th, 2007 at 7:11 am
I find this site really good; it is real fun to look around here. I will definitely come again. Keep it up; good info like this is unfortunately not always easy to find.

thx
sohbet Says:
October 29th, 2007 at 1:30 pm
Firstly, thanks for this article. I like HTML and XHTML; I have lots of pages that use them, and Google also loves HTML and XHTML.
I bookmarked your site; I can learn lots of things from it. Thanks.
küresel ısınma Says:
October 29th, 2007 at 2:16 pm
Mozilla does use XHTML