First of all: this bug relates to the XML parsing of XHTML documents (not text/html parsing!). However, it is also related to text/html issues, which I explain along the way.
How to reproduce the bug:
(1) Add this DOCTYPE to an XHTML document. The internal DTD subset inside the DOCTYPE applies a hack, in the form of an XHTML processing instruction, to keep text/html parsers from displaying a "]>" inside the body. The whole hack is explained in an e-mail message to the W3C validator's mailing list: http://lists.w3.org/Archives/Public/www-validator/2010Mar/0026.html
This is the code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
<!ATTLIST html class CDATA #IMPLIED>
(2) If you wish, try to load the page as text/html. However, the point of this bug is XML, so load the page as "application/xhtml+xml".
(3) Results in Firefox, Konqueror and Opera: works 100%
(4) Result in WebKit: "yellow screen of death" in the form of the following message:
"This page contains the following errors: error on line 3 at column 1: Extra content at the end of the document"
In short: Nothing is displayed.
(5) Remove the "<?parser-hack ><!--?>" and reload the page - voila, it works in Webkit as well.
(6) Place the "<?parser-hack ><!--?>" inside the body of the XHTML page. Reload. No problems.
CONCLUSION ABOUT THE PROBLEM:
Apparently, when a PI is placed inside the internal subset of an XHTML DOCTYPE, WebKit parses the XHTML PI as if it were an HTML4 PI, meaning that it thinks the PI ends at the first ">". As a result, WebKit also sees the HTML comment "start tag", the "<!--".
In text/html mode, the point of this hack is exactly that the browser thinks the PI ends with the ">" and that it also sees the "<!--".
However, this is XHTML/XML mode, and thus WebKit should parse the DOCTYPE, including PIs, according to XHTML/XML rules. Hence a ">" is permitted inside the PI, and a "<!--" should not affect the parsing.
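For comparison, here is a minimal sketch of how a different XML parser treats the same PI. It uses Python's standard xml.parsers.expat module, not libxml2 or WebKit, so it only illustrates the XML rule that a PI runs until "?>". The document is a simplified, hypothetical reduction of the test page (its DOCTYPE has no public or system identifier), and collect_pis is a helper name made up for this example.

```python
import xml.parsers.expat

# Hypothetical reduction of the test page: the DOCTYPE is simplified and is
# not the exact markup from the report. Only the PI is taken from the report.
DOC = """<!DOCTYPE html [
<!ATTLIST html class CDATA #IMPLIED>
<?parser-hack ><!--?>
]>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head><body><p>ok</p></body></html>"""


def collect_pis(document):
    """Parse `document` and return (target, data) for every PI reported."""
    pis = []
    parser = xml.parsers.expat.ParserCreate()
    parser.ProcessingInstructionHandler = lambda target, data: pis.append((target, data))
    # Raises xml.parsers.expat.ExpatError if the document is not well-formed.
    parser.Parse(document, True)
    return pis


print(collect_pis(DOC))
```

If the parser follows XML rules, the call returns a single PI whose data contains both the ">" and the "<!--", and no error is raised.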
I tested in the latest WebKit nightly, version 4.0.4 (5531.21.10, r55610), and also in iCab and in Safari for Mac (Intel and PPC) and for Windows.
I will once again stress that this bug is about application/xhtml+xml parsing.
Created attachment 74246 [details]
Same test as an attachment
Created attachment 74247 [details]
Modified to pass in Firefox.
This is weird - the only callbacks we get from libxml2 are startDocumentHandler, internalSubsetHandler and then normalErrorHandler, so this looks almost like a libxml2 bug. Note that internalSubsetHandler only carries name, externalID, systemID - we certainly aren't handling DTD itself in WebKit.
But command line xmllint doesn't seem to have a problem with this file.
Created attachment 109901 [details]
Shows that Webkit *does* follow XML PI-syntax
My diagnosis was wrong: the attached XHTML file includes an HTML PI inside the DTD, and WebKit then correctly reports that the PI never ends (because there is no "?>" to end it).
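The same behaviour can be sketched outside WebKit. The snippet below uses Python's xml.parsers.expat (an assumption made for illustration; it is not the parser WebKit uses) on a hypothetical reduced document in which the PI ends with ">" only. The helper name is_well_formed is made up for this example.

```python
import xml.parsers.expat

# Hypothetical input: an HTML-style PI that ends with ">" and never with "?>".
BROKEN = """<!DOCTYPE html [
<?parser-hack >
]>
<html xmlns="http://www.w3.org/1999/xhtml"><body/></html>"""

# Same document with the PI closed the XML way.
FIXED = BROKEN.replace("<?parser-hack >", "<?parser-hack ?>")


def is_well_formed(document):
    """Return True if expat accepts `document`, False on a parse error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


print(is_well_formed(BROKEN), is_well_formed(FIXED))
```

In the first document the PI swallows the rest of the file while the parser looks for "?>", so the document is rejected. The second one is accepted.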
Created attachment 109902 [details]
Shows that Webkit accepts a closed comment inside the PI
An XML comment inside an XML processing instruction is not an XML comment, but WebKit apparently sees it as one. And as long as WebKit perceives it as a well-formed comment, it accepts it, as the demo shows.
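A small sketch of what "not a comment" means in practice, again using Python's xml.parsers.expat as a stand-in XML parser (an assumption; this is not WebKit's code path). The document is a hypothetical reduction, and parse_events is a helper name made up for this example. It records which handlers fire.

```python
import xml.parsers.expat

# Hypothetical input: a complete "<!-- ... -->" written inside a PI in the
# internal subset.
DOC = """<!DOCTYPE html [
<?parser-hack <!-- not a comment -->?>
]>
<html xmlns="http://www.w3.org/1999/xhtml"><body/></html>"""


def parse_events(document):
    """Return the PI and comment events reported while parsing `document`."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.ProcessingInstructionHandler = (
        lambda target, data: events.append(("pi", target, data))
    )
    parser.CommentHandler = lambda data: events.append(("comment", data))
    parser.Parse(document, True)
    return events


for event in parse_events(DOC):
    print(event)
```

Only a PI event is expected here: the "<!-- ... -->" text is delivered as part of the PI data, and the comment handler never fires.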
Created attachment 109903 [details]
Reduction of the problem: Webkit doesn't accept a "unclosed" comment inside the PI
Add minimal demo to show what Webkit doesn't accept.
Created attachment 109906 [details]
Workaround: Shows how to circumvent the problem - perhaps point at a solution?
This test file shows how to work around the problem. Please read the comments in the test file.
Created attachment 109928 [details]
Workaround 2: Here the ]> appears right after the processing instruction has started
In this new attachment, the ]> comes right after the processing instruction has begun:
<!DOCTYPE html SYSTEM "about:legacy">
So, seemingly, as long as
a) WebKit is able to find 2 occurrences of the string ']>', and
b) the string occurs either immediately after the PI has begun or
inside (!) a comment right after the DTD has ended,
then WebKit allows any content inside the processing instruction.
(For more comments and speculation, see the attachment.)
Created attachment 109929 [details]
Workaround 3: Add a comment inside the DTD, after the PI
Created attachment 109930 [details]
Workaround: Shows that a "HTML5 comment" - a "short comment" (<!-->) can be used as workaround
Mass moving XML DOM bugs to the "DOM" Component.