|Summary:||Processing instructions inside DOCTYPE internal subset are parsed incorrectly (by libxml2?)|
|Product:||WebKit||Reporter:||Leif Halvard Silli <firstname.lastname@example.org>|
|Component:||XML DOM||Assignee:||Nobody <email@example.com>|
|Version:||528+ (Nightly build)|
First of all: this bug relates to the XML parsing of XHTML documents (not text/html parsing!). However, it is also related to text/html issues, which I explain along the way.

How to reproduce the bug:

(1) Add this DOCTYPE to an XHTML document. The internal DTD subset inside the DOCTYPE applies a hack, in the form of an XHTML processing instruction, to keep text/html parsers from displaying a "]>" inside the body. The whole hack is explained in an e-mail message to the W3C validator mailing list: http://lists.w3.org/Archives/Public/www-validator/2010Mar/0026.html

This is the code:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" [ <!ATTLIST html class CDATA #IMPLIED> <?parser-hack ><!--?> ]> <!--><?!-->

(2) If you wish, try to load the page as text/html. However, the point of this bug is XML, so load the page as "application/xhtml+xml".

(3) Results in Firefox, Konqueror and Opera: works 100%.

(4) Result in Webkit: "yellow screen of death" in the form of the following message: "This page contains the following errors: error on line 3 at column 1: Extra content at the end of the document". In short: nothing is displayed.

(5) Remove the "<?parser-hack ><!--?>" and reload the page. Voila, it works in Webkit as well.

(6) Place the "<?parser-hack ><!--?>" inside the body of the XHTML page. Reload. No problems.

CONCLUSION ABOUT THE PROBLEM:
======================

Apparently, when a PI is placed inside the internal subset of an XHTML DOCTYPE, Webkit parses the XHTML PI as if it were an HTML4 PI, meaning that it thinks the PI ends at the first ">". And thus Webkit also sees the HTML comment "start tag", the "<!--". In text/html mode, the point of this hack is exactly that the browser thinks the PI ends with the ">" and that it also sees the "<!--". However, this is XHTML/XML mode, and thus Webkit should parse the DOCTYPE, including PIs, according to XHTML/XML rules. Hence a ">" is permitted inside the PI, and a "<!--" should not affect the parsing.

I tested in the latest Webkit nightly, version 4.0.4 (5531.21.10, r55610), and also in iCab and in Safari for Mac (Intel and PPC) and for Windows.
I will once again stress that this bug is about application/xhtml+xml parsing.
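For reference, the XML rule this report relies on (a PI ends only at "?>", so a ">" or "<!--" inside it is just PI data) can be checked against another XML parser. The following is a minimal sketch using Python's expat binding, not libxml2 or Webkit, so it says nothing about where the Webkit failure comes from. The line breaks in the DOCTYPE and the <html> root element are assumptions added to make a complete document; collect_events is a hypothetical helper name.

```python
import xml.parsers.expat

# DOCTYPE from the report. The line breaks and the <html> element below
# are assumptions added only to make the document complete.
DOC = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" [\n'
    '<!ATTLIST html class CDATA #IMPLIED>\n'
    '<?parser-hack ><!--?>\n'
    ']>\n'
    '<!--><?!-->\n'
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>'
    '<body><p>ok</p></body></html>'
)


def collect_events(document):
    """Parse `document`; return the PIs and comments that expat reports."""
    pis, comments = [], []
    parser = xml.parsers.expat.ParserCreate()
    parser.ProcessingInstructionHandler = (
        lambda target, data: pis.append((target, data))
    )
    parser.CommentHandler = comments.append
    # Raises xml.parsers.expat.ExpatError if the input is not well-formed.
    # The external DTD is not fetched; expat skips it by default.
    parser.Parse(document, True)
    return pis, comments


pis, comments = collect_events(DOC)
print("processing instructions:", pis)
print("comments:", comments)
```

If the parse succeeds, the ">" and "<!--" show up as data of the parser-hack PI, and the "<!--><?!-->" after the DOCTYPE is reported as one ordinary comment.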
Created an attachment (id=74246) [details] test
Same test as an attachment.
Created an attachment (id=74247) [details] test
Modified to pass in Firefox.
This is weird - the only callbacks we get from libxml2 are startDocumentHandler, internalSubsetHandler and then normalErrorHandler, so this looks almost like a libxml2 bug. Note that internalSubsetHandler only carries name, externalID and systemID - we certainly aren't handling the DTD itself in WebKit. But command-line xmllint doesn't seem to have a problem with this file.
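To compare callback sequences outside the browser, a rough sketch along these lines may help. It traces which handlers fire, and in what order, for a document with a PI in the internal subset. Note the swap: this uses Python's expat binding rather than libxml2's SAX interface, so the handler names differ from the ones listed above and the result only shows what another parser does. trace_callbacks and SAMPLE are hypothetical names; the SYSTEM identifier is borrowed from the later workaround attachment.

```python
import xml.parsers.expat


def trace_callbacks(document):
    """Return the ordered list of events expat reports for `document`."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # Like libxml2's internalSubsetHandler, this callback carries only the
    # doctype name and identifiers, not the contents of the subset.
    parser.StartDoctypeDeclHandler = (
        lambda name, system_id, public_id, has_internal_subset:
            events.append(("doctype", name, system_id, public_id))
    )
    parser.ProcessingInstructionHandler = (
        lambda target, data: events.append(("pi", target))
    )
    parser.EndDoctypeDeclHandler = lambda: events.append(("end-doctype",))
    parser.StartElementHandler = (
        lambda name, attrs: events.append(("start", name))
    )
    try:
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError as err:
        # Record where parsing stopped instead of raising.
        events.append(("error", err.lineno, err.offset))
    return events


# Hypothetical reduced input: a PI containing ">" and "<!--" in the subset.
SAMPLE = (
    '<!DOCTYPE html SYSTEM "about:legacy" [\n'
    '<?parser-hack ><!--?>\n'
    ']>\n'
    '<html/>'
)

for event in trace_callbacks(SAMPLE):
    print(event)
```

A trace that ends in an "error" event right after the doctype event would match the sequence described above; a trace that continues to the root element would not.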
Created an attachment (id=109901) [details] Shows that Webkit *does* follow XML PI-syntax
My diagnosis was wrong: the attached XHTML file includes an HTML PI inside the DTD, and Webkit then correctly reports that the PI never ends (because there is no "?>" to end it).
Created an attachment (id=109902) [details] Shows that Webkit accepts a closed comment inside the PI
An XML comment inside an XML processing instruction is not an XML comment. But Webkit apparently sees it as one. And as long as it perceives it as a well-formed comment, it accepts it, as the demo shows.
Created an attachment (id=109903) [details] Reduction of the problem: Webkit doesn't accept an "unclosed" comment inside the PI
Added a minimal demo to show what Webkit doesn't accept.
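The three cases covered by the last three attachments can be summarized as a small well-formedness check. This is only a sketch under assumptions: the test documents below are hypothetical reductions written for illustration, not the contents of the attachments, and the parser is Python's expat binding rather than libxml2. is_well_formed, wrap and CASES are made-up names.

```python
import xml.parsers.expat


def is_well_formed(document):
    """Return True if expat parses `document` without a syntax error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


def wrap(pi):
    """Place `pi` inside the internal subset of a minimal document."""
    return '<!DOCTYPE html [\n' + pi + '\n]>\n<html/>'


# Hypothetical reductions, NOT the actual attachment contents.
CASES = {
    "HTML-style PI without '?>'": wrap('<?pi >'),
    "closed comment inside PI": wrap('<?pi <!-- x --> ?>'),
    "unclosed comment inside PI": wrap('<?pi <!-- ?>'),
}

for label, document in CASES.items():
    print(label, "->", is_well_formed(document))
```

Under XML rules, only the first case should be rejected, because its PI never reaches a "?>". The second and third differ only in text that is plain PI data, so a parser that accepts one and rejects the other is treating that text as markup.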
Created an attachment (id=109906) [details] Workaround: Shows how to circumvent the problem - perhaps point at a solution?
This test file shows how to work around the problem. Please read the comments in the test file.
Created an attachment (id=109928) [details] Workaround 2: Here the ]> appears right after the processing instruction has started
In this new attachment, the ]> comes right after the processing instruction has begun:

<!DOCTYPE html SYSTEM "about:legacy"> <?pi ]> <whatever><!--goes here ?> ]>

So, seemingly, Webkit allows any content inside the processing instruction as long as it is able to find two occurrences of the string ']>', where the first occurs either a) immediately after the PI has begun, or b) inside (!) a comment, and the second comes right after the DTD has ended. (For more comments and speculation, see the attachment.)
Created an attachment (id=109929) [details] Workaround 3: Add a comment inside the DTD, after the PI