NEW 35954
Processing instructions inside DOCTYPE internal subset are parsed incorrectly (by libxml2?)
https://bugs.webkit.org/show_bug.cgi?id=35954
Leif Halvard Silli
Reported 2010-03-09 20:17:42 PST
First of all: this bug relates to the XML parsing of XHTML documents (not text/html parsing!). However, it is also related to text/html issues, which I explain along the way.

How to reproduce the bug:

(1) Add the DOCTYPE below to an XHTML document. Its internal DTD subset applies a hack in the form of an XML processing instruction (PI). The hack stops text/html parsers from displaying a "]>" inside the body. The whole hack is explained in an e-mail message to the W3C validator's mailing list: http://lists.w3.org/Archives/Public/www-validator/2010Mar/0026.html This is the code:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" [
    <!ATTLIST html class CDATA #IMPLIED>
    <?parser-hack ><!--?>
    ]>
    <!--><?!-->

(2) If you wish, try loading the page as text/html. However, the point of this bug is XML, so serve the page as "application/xhtml+xml".
(3) Result in Firefox, Konqueror and Opera: works 100%.
(4) Result in WebKit: a "yellow screen of death" with the following message: "This page contains the following errors: error on line 3 at column 1: Extra content at the end of the document". In short: nothing is displayed.
(5) Remove the "<?parser-hack ><!--?>" and reload the page. Voilà, it works in WebKit as well.
(6) Place the "<?parser-hack ><!--?>" inside the body of the XHTML page and reload. No problems.

CONCLUSION ABOUT THE PROBLEM:
=============================
Apparently, when a PI is placed inside the internal subset of an XHTML DOCTYPE, WebKit parses the XML PI as if it were an HTML4 PI. That is, it thinks the PI ends at the first ">", and thus WebKit also sees the HTML comment "start tag", the "<!--". In text/html mode, the whole point of this hack is exactly that the browser thinks the PI ends at the ">" and that it also sees the "<!--". However, this is XHTML/XML mode, and thus WebKit should parse the DOCTYPE, including PIs, according to XML rules, under which a PI ends only at "?>". Hence, a ">" is permitted inside the PI, and a "<!--" inside it should not affect the parsing.

I tested in the latest WebKit nightly, version 4.0.4 (5531.21.10, r55610), and also in iCab and in Safari for Mac (Intel and PPC) and for Windows.
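The diagnosis above hinges on where a PI ends. Here is a minimal Python sketch with hypothetical helper names; it is not WebKit or libxml2 code. It contrasts the XML rule, where a PI ends at the first "?>", with the HTML4 rule, where it ends at the first ">". Both rules are applied to the internal subset from step (1).

```python
# Hypothetical helpers for illustration; not taken from WebKit or libxml2.

def xml_pi_end(text: str, start: int) -> int:
    """Return the index just past a PI starting at `start`, using the
    XML rule: a processing instruction ends at the first '?>'."""
    end = text.find('?>', start + 2)
    if end == -1:
        raise ValueError('unterminated processing instruction')
    return end + 2


def html4_pi_end(text: str, start: int) -> int:
    """Return the index just past a PI starting at `start`, using the
    HTML4 (SGML) rule: a processing instruction ends at the first '>'."""
    end = text.find('>', start + 2)
    if end == -1:
        raise ValueError('unterminated processing instruction')
    return end + 1


def leftover_after_pi(text: str, end_rule) -> str:
    """Return whatever follows the first PI in `text` under `end_rule`."""
    start = text.index('<?')
    return text[end_rule(text, start):]


# The internal subset from the bug report, one declaration per line.
SUBSET = ('<!ATTLIST html class CDATA #IMPLIED>\n'
          '<?parser-hack ><!--?>\n')

if __name__ == '__main__':
    print(repr(leftover_after_pi(SUBSET, xml_pi_end)))    # '\n'
    print(repr(leftover_after_pi(SUBSET, html4_pi_end)))  # '<!--?>\n'
```

Under the XML rule only the newline is left over. Under the HTML4 rule the scanner is left looking at "<!--", which is the comment opener the hack relies on in text/html mode.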
Attachments
test (192 bytes, application/xhtml+xml)
2010-11-18 09:53 PST, Alexey Proskuryakov
no flags
test (239 bytes, application/xhtml+xml)
2010-11-18 09:57 PST, Alexey Proskuryakov
no flags
Shows that Webkit *does* follow XML PI-syntax (294 bytes, application/xhtml+xml)
2011-10-05 18:51 PDT, Leif Halvard Silli
no flags
Shows that Webkit accepts a closed comment inside the PI (408 bytes, application/xhtml+xml)
2011-10-05 18:58 PDT, Leif Halvard Silli
no flags
Reduction of the problem: Webkit doesn't accept an "unclosed" comment inside the PI (366 bytes, application/xhtml+xml)
2011-10-05 19:02 PDT, Leif Halvard Silli
no flags
Workaround: Shows how to circumvent the problem - perhaps point at a solution? (1.76 KB, application/xhtml+xml)
2011-10-05 19:28 PDT, Leif Halvard Silli
no flags
Workaround 2: Here the ]> appears right after the processing instruction has started (1.92 KB, application/xhtml+xml)
2011-10-06 00:14 PDT, Leif Halvard Silli
no flags
Workaround 3: Add a comment inside the DTD, after the PI (1.54 KB, application/xhtml+xml)
2011-10-06 00:59 PDT, Leif Halvard Silli
no flags
Workaround: Shows that a "HTML5 comment" - a "short comment" (<!-->) can be used as a workaround (2.04 KB, application/xhtml+xml)
2011-10-06 01:01 PDT, Leif Halvard Silli
no flags
Leif Halvard Silli
Comment 1 2010-03-09 20:20:34 PST
I will once again stress that this bug is about application/xhtml+xml parsing.
Alexey Proskuryakov
Comment 2 2010-11-18 09:53:06 PST
Created attachment 74246: test. Same test as an attachment.
Alexey Proskuryakov
Comment 3 2010-11-18 09:57:21 PST
Created attachment 74247: test. Modified to pass in Firefox.
Alexey Proskuryakov
Comment 4 2010-11-18 11:35:26 PST
This is weird: the only callbacks we get from libxml2 are startDocumentHandler, internalSubsetHandler and then normalErrorHandler, so this looks almost like a libxml2 bug. Note that internalSubsetHandler only carries name, externalID and systemID; we certainly aren't handling the DTD itself in WebKit. But command-line xmllint doesn't seem to have a problem with this file.
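The sketch below uses Python's standard-library expat binding. Expat is a different parser from libxml2, so this says nothing about libxml2's own behaviour. It records, in order, the callbacks a conforming XML parser delivers for the test document, for comparison with the libxml2 callback sequence described above. The XML declaration and the `<html>` body are assumptions added to make the file complete.

```python
import xml.parsers.expat

# Test document modelled on the bug report; the XML declaration and the
# <html> body are assumptions added to make it a complete document.
DOC = b'''<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" [
<!ATTLIST html class CDATA #IMPLIED>
<?parser-hack ><!--?>
]>
<!--><?!-->
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>test</title></head><body><p>ok</p></body></html>
'''


def trace(data: bytes) -> list:
    """Parse `data` with expat and record prolog, DTD and element events in order."""
    events = []
    parser = xml.parsers.expat.ParserCreate()

    def start_doctype(name, system_id, public_id, has_internal_subset):
        events.append(('start_doctype', name, bool(has_internal_subset)))

    parser.StartDoctypeDeclHandler = start_doctype
    parser.EndDoctypeDeclHandler = lambda: events.append(('end_doctype',))
    parser.ProcessingInstructionHandler = (
        lambda target, pi_data: events.append(('pi', target, pi_data)))
    parser.CommentHandler = lambda text: events.append(('comment', text))
    parser.StartElementHandler = lambda name, attrs: events.append(('start', name))
    # Expat does not fetch the external DTD by default, so no network access.
    parser.Parse(data, True)
    return events


if __name__ == '__main__':
    for event in trace(DOC):
        print(event)
```

With expat, the PI arrives intact, with data "><!--", before the end of the DOCTYPE. The `<!--><?!-->` after the DOCTYPE is reported as a single comment whose text is "><?!".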
Leif Halvard Silli
Comment 5 2011-10-05 18:51:48 PDT
Created attachment 109901: Shows that Webkit *does* follow XML PI-syntax. My diagnosis was wrong: the attached XHTML file includes an HTML-style PI inside the DTD, and WebKit then correctly reports that the PI never ends (because there is no "?>" to end it).
Leif Halvard Silli
Comment 6 2011-10-05 18:58:16 PDT
Created attachment 109902: Shows that Webkit accepts a closed comment inside the PI. Something that looks like an XML comment inside an XML processing instruction is not an XML comment, but WebKit apparently sees it as one. As long as it perceives it as a well-formed comment, it accepts it, as the demo shows.
Leif Halvard Silli
Comment 7 2011-10-05 19:02:01 PDT
Created attachment 109903: Reduction of the problem: Webkit doesn't accept an "unclosed" comment inside the PI. Added a minimal demo to show what WebKit doesn't accept.
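Comments 6 and 7 turn on the XML rule that nothing inside a PI is markup. The sketch below uses Python's expat again, with a hypothetical helper `parse_subset_pi`. It shows that a conforming parser treats both a closed and an unclosed "<!--" inside a PI in the internal subset as plain PI data, and reports no comment for either.

```python
import xml.parsers.expat


def parse_subset_pi(pi_data: str):
    """Hypothetical helper: put `pi_data` inside a PI in the internal
    subset, parse the document, and return (PI data seen, comments seen)."""
    doc = ('<!DOCTYPE html [\n'
           f'<?test {pi_data}?>\n'
           ']>\n'
           '<html xmlns="http://www.w3.org/1999/xhtml"/>')
    pis, comments = [], []
    parser = xml.parsers.expat.ParserCreate()
    parser.ProcessingInstructionHandler = lambda target, data: pis.append(data)
    parser.CommentHandler = comments.append
    parser.Parse(doc, True)  # raises xml.parsers.expat.ExpatError if not well-formed
    return pis, comments


if __name__ == '__main__':
    # A closed comment (as in Comment 6) and an unclosed one (as in Comment 7).
    for body in ('<!-- closed -->', '<!-- never closed'):
        print(parse_subset_pi(body))
```

Both calls should succeed with an empty comment list. By contrast, Comment 7 reports that WebKit rejects the unclosed variant.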
Leif Halvard Silli
Comment 8 2011-10-05 19:28:03 PDT
Created attachment 109906: Workaround: Shows how to circumvent the problem - perhaps point at a solution? This test file shows how to work around the problem. Please read the comments in the test file.
Leif Halvard Silli
Comment 9 2011-10-06 00:14:59 PDT
Created attachment 109928: Workaround 2: Here the ]> appears right after the processing instruction has started. In this new attachment, the ]> comes right after the processing instruction has begun:
<!DOCTYPE html SYSTEM "about:legacy"> <?pi ]> <whatever><!--goes here ?> ]>
So, seemingly, as long as WebKit is able to (a) find two occurrences of the string ']>', and (b) the string occurs immediately after the PI has begun, or inside (!) a comment right after the DTD has ended, then WebKit allows any content inside the processing instruction. (For more comments and speculation, see the attachment.)
Leif Halvard Silli
Comment 10 2011-10-06 00:59:35 PDT
Created attachment 109929: Workaround 3: Add a comment inside the DTD, after the PI
Leif Halvard Silli
Comment 11 2011-10-06 01:01:29 PDT
Created attachment 109930: Workaround: Shows that a "HTML5 comment" - a "short comment" (<!-->) can be used as a workaround
Lucas Forschler
Comment 12 2019-02-06 09:03:33 PST
Mass moving XML DOM bugs to the "DOM" Component.