RESOLVED INVALID 6314
Unclosed <style> element in <head> makes page completely blank
https://bugs.webkit.org/show_bug.cgi?id=6314
camillo.lists
Reported 2005-12-31 11:13:34 PST
If an html file's header contains an empty style tag, the page will not render in WebKit. The render tree window shows an empty RenderBody. 100% reproducible, and verified in the latest nightly build (416.13). Here is a valid XHTML 1.0 document with which you can reproduce the bug:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
<style type="text/css" />
<title>Test</title>
</head>
<body>
<h1>TEST PAGE</h1>
<p>This page is invisible in WebKit. The "culprit" is the style tag in the header, but it is valid XHTML 1.0 Transitional according to the W3C's validator.</p>
</body>
</html>
Attachments
Test case (481 bytes, text/html)
2005-12-31 11:15 PST, camillo.lists
no flags
Same test case sent as xhtml (481 bytes, application/xhtml+xml)
2005-12-31 11:23 PST, Eric Seidel (no email)
no flags
Patch v1 (16.34 KB, patch)
2006-03-12 19:50 PST, David Kilzer (:ddkilzer)
darin: review-
camillo.lists
Comment 1 2005-12-31 11:15:22 PST
Created attachment 5400 [details] Test case Test case added as attachment for your convenience.
Eric Seidel (no email)
Comment 2 2005-12-31 11:23:23 PST
Created attachment 5401 [details] Same test case sent as xhtml
Eric Seidel (no email)
Comment 3 2005-12-31 11:26:38 PST
The problem here is that you're sending this "valid xhtml 1.0" file as text/html from your server. <style /> is interpreted as an unclosed <style> by the html parser (the / is ignored as an invalid attribute). WebKit has trouble with unclosed tags in the <head>, it seems (there is another similar bug on an unclosed <title> causing a blank page). If you were to actually vend this file as xhtml from your server, it would display fine (as you can see in the second copy of your test case, now attached).

So this is a bug in two places:
1. Your server is misconfigured (as are most on the web), or rather, you've named the file .html when it really should be .xhtml on your server.
2. There seems to be a bug in WebKit where we don't recover from an unclosed <style> tag in html.

We'll keep this bug to track the unclosed <style> tag behavior.
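To make the difference between the two readings concrete, here is a minimal Python sketch. It is not WebKit code: `parse_as_xml` uses the standard library XML parser to stand in for the application/xhtml+xml path, and `swallowed_by_style` is a hypothetical toy model of the text/html reading described above, in which the trailing "/" is ignored and everything after the <style> start tag counts as style text until a </style> shows up. The test case is trimmed to the relevant elements.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copy of the reporter's test case (assumption: <meta> and <p> omitted).
TEST_CASE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<style type="text/css" />
<title>Test</title>
</head>
<body>
<h1>TEST PAGE</h1>
</body>
</html>"""


def parse_as_xml(src):
    """XML reading: <style ... /> is a complete, empty element."""
    root = ET.fromstring(src)
    return root.find("head/style"), root.find("body/h1")


def swallowed_by_style(src):
    """Toy text/html reading: return the text consumed as style content.

    The '/' before '>' is treated as junk inside the start tag, so the
    element stays open until a literal </style> is found (or input ends).
    """
    opener = re.search(r"<style\b[^>]*>", src, re.I)
    if opener is None:
        return ""
    rest = src[opener.end():]
    closer = re.search(r"</style\s*>", rest, re.I)
    return rest[:closer.start()] if closer else rest


style, heading = parse_as_xml(TEST_CASE)
print("XML reading: style has", len(style), "children; heading is", heading.text)
print("HTML reading swallows the body:", "<h1>" in swallowed_by_style(TEST_CASE))
```

Under the XML reading the heading survives as a normal element; under the toy HTML reading it ends up inside the style text, which matches the blank page in the report.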
Eric Seidel (no email)
Comment 4 2005-12-31 11:29:18 PST
One of the most common mistakes on today's web is to write "valid xhtml 1.0" and then vend it from a server as text/html. When you do that, you end up vending something which is actually (by design) invalid html. The proper solution is to just vend the file as application/xhtml+xml, like it was designed to be... the only problem with that is that WinIE doesn't handle xhtml very well :( Firefox, Safari and Opera handle xhtml fine though.
Eric Seidel (no email)
Comment 5 2005-12-31 11:37:17 PST
http://bugzilla.opendarwin.org/show_bug.cgi?id=3905 is the related bug about an unclosed title element causing the page to be blank. Marking it as "depends on" even though one of these could probably be fixed w/o the other. Likely they could both be pretty easily fixed at once.
camillo.lists
Comment 6 2005-12-31 12:23:47 PST
You're assuming too much. ;-) It's not my server: I encountered this bug on a website, and stripped it down to a smaller test case for your convenience. The bug reappears when loading the file from the HD, because of the html extension. I have no way of changing the way that server (or most other servers on the internet) serve XHTML content.

Regardless:

1) is it wise to ignore the doctype specified inside the file and trust the HTTP content-type instead? The doctype declaration is put in by the author, while the content-type is chosen by a webserver which is often misconfigured, as you all know: it seems to me that the former should be considered more reliable.

2) The W3C HTML Validator has no complaints about an XML file uploaded with an .html extension, or served as text/html: http://validator.w3.org/check?uri=http%3A%2F%2Fbugzilla.opendarwin.org%2Fattachment.cgi%3Fid%3D5400
I understand that the validator is not the source of all wisdom, but most webmasters will be satisfied once it passes their pages. What am I supposed to tell them? "Go fiddle with your web server's settings because Safari thinks your XHTML files are HTML 4.0, even though you put an explicit doctype declaration inside them?"

3) I'm going to go out on a limb here, and I haven't checked the standards recently, but wasn't the XHTML syntax meant to be mostly compatible with HTML? Is it really so crazy to serve XHTML as text/html, expecting XHTML-compliant browsers to handle it correctly (recognizing it through the doctype), and older browsers to make a best effort at rendering it as HTML, instead of deciding to download it to disk because it has an unrecognized MIME type?
Eric Seidel (no email)
Comment 7 2005-12-31 13:50:03 PST
(In reply to comment #6)
> You're assuming too much. ;-)

My apologies. :) I did assume too much. And you are correct, there is a real Safari bug here. We should handle an unclosed <style> element better than we do.

> 1) is it wise to ignore the doctype specified inside the file and trust the HTTP content-type instead? The doctype declaration is put in by the author, while the content-type is chosen by a webserver which is often misconfigured, as you all know: it seems to me that the former should be considered more reliable.

Unfortunately I'm not familiar enough with all the reasons for this decision. If you could show some other browser respecting a <meta> tag over the server's sent Content-Type, I'd be very interested to know. I think you'll find that our behavior here is in line with other browsers.

> 2) The W3C HTML Validator has no complaints about an XML file uploaded with an .html extension, or served as text/html: http://validator.w3.org/check?uri=http%3A%2F%2Fbugzilla.opendarwin.org%2Fattachment.cgi%3Fid%3D5400
> I understand that the validator is not the source of all wisdom, but most webmasters will be satisfied once it passes their pages. What am I supposed to tell them? "Go fiddle with your web server's settings because Safari thinks your XHTML files are HTML 4.0, even though you put an explicit doctype declaration inside them?"

I think that you'll find that Firefox/IE treats this file as html as well. If you don't believe me, try putting invalid xhtml in it, and watch them not complain (as the xhtml spec would require, were they really parsing as xhtml). Instead I think you'll find that this page is going through their html parsers and parsed with all the forgivingness that the text/html content type requires. The validator you referenced does correctly identify this as xhtml 1.0. I don't think it looks at the Content-Type: header when doing so. Safari also handles the page perfectly fine when it's sent with an xhtml Content-Type.

> 3) I'm going to go out on a limb here, and I haven't checked the standards recently, but wasn't the XHTML syntax meant to be mostly compatible with HTML? Is it really so crazy to serve XHTML as text/html, expecting XHTML-compliant browsers to handle it correctly (recognizing it through the doctype), and older browsers to make a best effort at rendering it as HTML, instead of deciding to download it to disk because it has an unrecognized MIME type?

Yes, it is meant to be compatible. But not *all* xhtml 1.0 is compatible with HTML. For example, if you check the html 4.01 spec, you'll see that <style> *requires* an end tag. So this <style> tag w/ a bogus "/" attribute, and no end, is invalid html. Again, Safari should handle it better than we do, but the html is invalid. http://www.w3.org/TR/html4/present/styles.html#h-14.2.3

Perhaps other WebKit hackers who know more about the decision to respect Content-Type over a <meta> tag will be able to more fully answer your above questions. Thanks again for reporting the bug. We'll definitely look into the issue of the unterminated <style> tag not being handled properly by our html parser.
camillo.lists
Comment 8 2006-01-01 11:12:18 PST
(In reply to comment #7)
> Unfortunately I'm not familiar enough all the reasons for this decision. If you could show some other browser respecting a <meta> tag over the server's sent Content-Type, I'd be very interested to know. I think you'll find that our behavior here is in line with other browsers.

I tested Firefox 1.5 with an XHTML+SVG document, and indeed it seems to give precedence to the Content-Type (the SVG is not visible when the content-type is text/html, which suggests that it's using the HTML parser, which has no support for namespaces). However, I can give you something better than the example of another browser: the XHTML 1.0 standard _itself_, which is an XHTML 1.0 Strict document, is served as text/html:

$ curl -I 'http://www.w3.org/TR/xhtml1/'
HTTP/1.1 200 OK
Date: Sun, 01 Jan 2006 18:30:23 GMT
Server: Apache/1.3.33 (Unix) PHP/4.3.10
P3P: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
Cache-Control: max-age=21600
Expires: Mon, 02 Jan 2006 00:30:23 GMT
Last-Modified: Thu, 01 Aug 2002 13:56:02 GMT
ETag: "3d493df2"
Accept-Ranges: bytes
Content-Length: 71514
Content-Type: text/html; charset=utf-8

Surely the people who wrote the XHTML standard would know how to serve it properly. :-) This example strongly suggests that both Firefox and Safari (and any other browsers that do the same thing) are in error, and the doctype declaration should take precedence over the Content-Type.

> I think that you'll find that FireFox/IE treats this file as html as well. If you don't believe me, try putting invalid xhtml in it, and watch them not complain (as the xhtml spec would require, were they really parsing as xhtml). Instead I think you'll find that this page is going through thier html parsers and parsed with all the forgivingness that the text/html content type requires.

Honestly, I don't think anyone is writing an XHTML doctype declaration and relying on it being ignored. :-) But if you want to be forgiving, I have no problem with that. The problem is that your forgiveness is currently misdirected: you are accepting things that are invalid, and rejecting things which are perfectly valid.

> The validator you referenced does correctly identify this as xhtml 1.0. I don't think it looks at the Content-Type: header when doing so. Safari also handles the page perfectly fine when it's sent with an xhtml Content-Type.

And that's exactly the problem: you shouldn't rely on the Content-Type when it's explicitly contradicted by the doctype. Now, if the Content-Type was image/jpeg, I'd understand assuming that it is correct and not looking for a doctype. But when parsing text/html, you have to read and parse the doctype anyway, so why not listen to it?

> Yes, it is meant to be compatble. But not *all* xhtml 1.0 is compatible with HTML. For example, if you check the html 4.01 spec, you'll see that <style>*requires* an end tag. So this <style> tag w/ a bogus "/" attribute, and no end, is invalid html. Again, Safari should handle it better than we do, but the html in invalid.

But the idea was not to have all browsers treat it as HTML. The idea was to have modern browsers treat it as what it is, i.e. XHTML, and let old browsers attempt to parse it as HTML (which often gives acceptable results). And the W3C itself seems to think that there is nothing wrong with serving XHTML as text/html, as shown above. After all, isn't XHTML 1.0 the legitimate successor to HTML 4?

> Perhaps other WebKit hackers who know more about the decision to respect Content-Type over a <meta> tag will be able to more fully answer your above questions. Thanks again for reporting the bug. We'll definitely look into the issue of the unterminated <style> tag not being handled properly by our html parser.

Thank you for that, and also for taking the time to discuss this issue with me. I look forward to hearing any clarifications that other WebKit developers will want to give.
Alexey Proskuryakov
Comment 9 2006-01-07 04:57:25 PST
It is a known issue (sorry, couldn't find it in Bugzilla) that WebKit sends an "Accept: */*" header in its requests, which causes most servers to provide text/html, not application/xhtml+xml. There's a lot of material about Content-Type handling available; see, for example, <http://ppewww.ph.gla.ac.uk/~flavell/www/content-type.html> or <http://www.hixie.ch/advocacy/xhtml> (the latter is displayed incorrectly in Safari, which is also known). In short, it's not possible or desirable to rely on DOCTYPE when there is a contradicting Content-Type header.
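The rule argued for in this thread (the Content-Type header picks the parser, and the DOCTYPE inside the file does not override it) can be sketched in a few lines of Python. This is an illustration only, not browser code: `pick_parser` and the `XML_TYPES` set are hypothetical names, and the list of XML media types is an assumption kept deliberately short.

```python
# Hypothetical, deliberately short list of media types routed to an XML parser.
XML_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def pick_parser(content_type, body=""):
    """Choose a parser from the Content-Type header value alone.

    `body` is accepted only to make the point that its DOCTYPE is never
    consulted when deciding between the HTML and XML parsers.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in XML_TYPES:
        return "xml"
    if media_type == "text/html":
        return "html"
    return "other"


xhtml_doctype = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" ...>'
print(pick_parser("text/html; charset=utf-8", xhtml_doctype))
print(pick_parser("application/xhtml+xml", xhtml_doctype))
```

With the header from the curl output earlier in this bug (text/html; charset=utf-8), the sketch selects the HTML parser even though the body carries an XHTML doctype.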
camillo.lists
Comment 10 2006-01-07 06:57:14 PST
Thanks, I understand the issue now.
David Kilzer (:ddkilzer)
Comment 11 2006-02-25 10:03:18 PST
The fix for this bug is very similar to Bug 3905, so I'm taking this bug. Patch in a few.
David Kilzer (:ddkilzer)
Comment 12 2006-02-25 16:59:28 PST
In Firefox 1.5.0.1, a document with a <style> tag with no closing </style> causes what would be the contents of the <style></style> tags to be put into the body of the document, and the single <style> tag is ignored. In MSIE 6, the browser consumes everything after the single <style> tag to the end of the document, then adds </style></body></html> to the end of the document if the unmatched <style> tag is in the body, or adds </style></head><body></body></html> if the unmatched <style> tag is in the head.
David Kilzer (:ddkilzer)
Comment 13 2006-02-26 05:43:17 PST
(In reply to comment #11)
> The fix for this bug is very similar to Bug 3905, so I'm taking this bug.
> Patch in a few.

I'm waiting on Bug 3905 to be fixed since the patch to fix this bug will have to modify code from that bug.
David Kilzer (:ddkilzer)
Comment 14 2006-03-07 20:34:48 PST
Hixie: In HTML 5, will missing </style> tags be treated the same as missing </title> tags, e.g., if the whole document is parsed and no </style> tag is found, should the document be reparsed starting after the <style> tag with no special handling such that the <style> tag is implicitly closed when the next tag is found?
David Kilzer (:ddkilzer)
Comment 15 2006-03-12 19:07:23 PST
I asked Ian Hixie about how HTML 5 would handle missing </style> tags, and he brought up a security issue:

[3:41pm] Hixie: ddk: that's what my testing found too
[3:42pm] ddk: Hixie: How can a missing </style> tag cause a security issue? I'm having a hard time figuring out how that would be exploited (other than a buffer overflow or something from too much input).
[3:44pm] Hixie: ddk: make a victim site display the content "<style> <script> evil script </script> </style>" (which would pass the site's security test because <style> is safe) then cause the network connection to abort just before the </style> tag and the UA will execute the script.
[3:44pm] Hixie: ddk: same reason comments can't safely be reparsed
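A toy Python model of the attack Hixie describes may help. It is a sketch under stated assumptions, not any browser's tokenizer: `scripts_executed` is a hypothetical function that treats <style> content as opaque text, and the `reparse_on_eof` flag models the proposed recovery of reparsing from just after <style> when no </style> is ever found.

```python
import re

STYLE_OPEN = re.compile(r"<style\b[^>]*>", re.I)
STYLE_CLOSE = re.compile(r"</style\s*>", re.I)
SCRIPT = re.compile(r"<script\b[^>]*>(.*?)</script\s*>", re.I | re.S)


def scripts_executed(received, reparse_on_eof):
    """Return the script bodies a toy user agent would run for `received`."""
    executed = []
    pos = 0
    while pos < len(received):
        style = STYLE_OPEN.search(received, pos)
        markup_end = style.start() if style else len(received)
        # Scripts in ordinary markup run; style content is never scanned.
        executed += SCRIPT.findall(received[pos:markup_end])
        if style is None:
            break
        close = STYLE_CLOSE.search(received, style.end())
        if close:
            pos = close.end()      # well-formed: skip the style text
        elif reparse_on_eof:
            pos = style.end()      # recovery: re-read the style text as markup
        else:
            break                  # no recovery: the rest stays style text
    return executed


page = "<style> <script>evil()</script> </style>"
truncated = page[: page.index("</style>")]   # connection aborted early

print(scripts_executed(page, reparse_on_eof=True))
print(scripts_executed(truncated, reparse_on_eof=True))
print(scripts_executed(truncated, reparse_on_eof=False))
```

In this model the complete page runs no script under either policy, while the truncated page runs the embedded script only when reparsing is enabled.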
David Kilzer (:ddkilzer)
Comment 16 2006-03-12 19:50:13 PST
Created attachment 7041 [details] Patch v1 This patch is very similar to the final patch for Bug 3905 (less the SegmentedString changes). Please note the security concerns from Comment #15, although I don't see how you get any more protection from an attacker since the attacker could simply add the missing </style> tag (unless you're relying on another piece of software to scan the HTML before it gets to the browser).
Darin Adler
Comment 17 2006-03-12 20:39:51 PST
Comment on attachment 7041 [details] Patch v1 Looks nice, r=me.
David Kilzer (:ddkilzer)
Comment 18 2006-03-19 06:10:15 PST
Verified in r13385.
Darin Adler
Comment 19 2006-03-19 09:50:24 PST
Looks like there's a major problem with this fix and the previous one for <title>. The problem is that at the time the tokenizer is called we don't necessarily have the entire source. In layout tests we parse things as one big chunk so it's not an issue. So when the code sees that there's no "src" left that doesn't mean we are at the end of the entire document -- it just means that we're at the end of what we currently have. Additional write() calls can happen later to give us more data. So this code is kicking in, in cases where it should not, causing major regressions. I'm going to roll this fix out and we'll have to come up with one that works even when subsequent writes occur.
Darin Adler
Comment 20 2006-03-19 09:52:12 PST
We can probably write some tests for the broken case using document.write -- not sure.
Darin Adler
Comment 21 2006-03-19 09:53:46 PST
The saved state will have to go in a member variable of the tokenizer, and the whole thing will need to be tracked by the state machine instead of done all in one place.
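As a rough illustration of the design described in comments 19 to 21, here is a small Python sketch of a scanner whose state lives in member variables, so that running out of input in one write() call is not mistaken for the end of the document. It is not the WebKit tokenizer: the class name `ChunkedStyleScanner` and its attributes are hypothetical, and it only recognizes a bare <style> start tag to keep the example short.

```python
import re

OPEN = re.compile(r"<style>", re.I)      # simplification: no attributes
CLOSE = re.compile(r"</style>", re.I)


class ChunkedStyleScanner:
    """Collects <style> contents from input delivered in arbitrary chunks."""

    def __init__(self):
        self.in_style = False    # state that survives between write() calls
        self.pending = ""        # tail that may hold the start of a tag
        self.current = []        # pieces of the style block being read
        self.style_blocks = []   # completed style blocks

    def write(self, chunk):
        data = self.pending + chunk
        self.pending = ""
        while data:
            if self.in_style:
                m = CLOSE.search(data)
                if m is None:
                    # Out of data, NOT end of document: keep a tail long
                    # enough to hold a partial "</style>" and wait.
                    keep = len("</style>") - 1
                    self.current.append(data[:-keep])
                    self.pending = data[-keep:]
                    return
                self.current.append(data[:m.start()])
                self.style_blocks.append("".join(self.current))
                self.current = []
                self.in_style = False
                data = data[m.end():]
            else:
                m = OPEN.search(data)
                if m is None:
                    self.pending = data[-(len("<style>") - 1):]
                    return
                self.in_style = True
                data = data[m.end():]

    def finish(self):
        """Call once at true end of input; True means <style> never closed."""
        return self.in_style


scanner = ChunkedStyleScanner()
scanner.write("<style>p{color:red}</st")   # the end tag is split across writes
print(scanner.in_style, scanner.style_blocks)
scanner.write("yle><p>hi</p>")
print(scanner.in_style, scanner.style_blocks, scanner.finish())
```

Any recovery for an unclosed <style> would belong in finish(), which runs only when the loader reports that no more data is coming.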
Jon
Comment 22 2006-03-19 10:00:47 PST
The patch for this bug (committed as r13381) has broken pages which use inline styles. These include forums.macnn.com and phpbb.com, among others. Both of these pages are broken in the same way: their content is rendered without the styles that are declared within the document rather than in an external file.
David Kilzer (:ddkilzer)
Comment 23 2006-03-19 10:48:09 PST
(In reply to comment #19)
> Looks like there's a major problem with this fix and the previous one for
> <title>.

D'OH! That's Bug 3905.
David Kilzer (:ddkilzer)
Comment 24 2006-06-12 08:32:47 PDT
Reassigning this bug back to webkit-unassigned in case anyone else is interested in fixing it. I plan to look at it in a couple weeks. Before this is fixed, an http test needs to be created that reliably reproduces the latest problem found (see Comment #19).
Joost de Valk (AlthA)
Comment 25 2006-07-06 05:49:17 PDT
ddkilzer, any plans of looking at this one again? :)
Joost de Valk (AlthA)
Comment 26 2006-07-06 05:54:47 PDT
*** Bug 9443 has been marked as a duplicate of this bug. ***
Joost de Valk (AlthA)
Comment 27 2006-07-06 05:55:12 PDT
*** Bug 8772 has been marked as a duplicate of this bug. ***
Robert Burns
Comment 28 2006-09-14 19:27:05 PDT
This was due to loading XHTML as text/html instead of application/xhtml+xml. Once this was fixed the elements were no longer moved to the body. It still may be a bug to move HTML elements about in a document (as opposed to leaving them in place and not rendering or processing them), but it's not a bug with the XML implementation as I had reported.
David Kilzer (:ddkilzer)
Comment 29 2006-09-14 20:19:22 PDT
(In reply to comment #28)
> this was due to loading XHTML as text/html instead of application/xhtml+xml.
> Once this was fixed the elements were no longer moved to the body. It still
> may be a bug to move HTML elements about in a document (as opposed to leaving
> them in place and not rendering or processing them), but it's not a bug with
> the XML implementation as I had reported.

I believe the above comment was meant for Bug 10507.
Robert Burns
Comment 30 2006-09-19 12:56:00 PDT
(In reply to comment #29)
> (In reply to comment #28)
> > this was due to loading XHTML as text/html instead of application/xhtml+xml.
> > Once this was fixed the elements were no longer moved to the body. It still
> > may be a bug to move HTML elements about in a document (as opposed to leaving
> > them in place and not rendering or processing them), but it's not a bug with
> > the XML implementation as I had reported.
>
> I believe the above comment was meant for Bug 10507.

Yes, that's correct. Sorry for the mixup.
Ian 'Hixie' Hickson
Comment 31 2007-12-27 23:22:30 PST
I believe this is INVALID. Reparsing is a security risk and HTML5 says not to. Other browsers are converging on that behaviour too.