Bug 6314 - Unclosed <style> element in <head> makes page completely blank
Summary: Unclosed <style> element in <head> makes page completely blank
Status: RESOLVED INVALID
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM
Version: 420+
Hardware: Mac OS X 10.4
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Duplicates: 8772 9443
Depends on: 3905
Blocks:
Reported: 2005-12-31 11:13 PST by camillo.lists
Modified: 2007-12-27 23:22 PST

See Also:


Attachments
Test case (481 bytes, text/html)
2005-12-31 11:15 PST, camillo.lists
Same test case sent as xhtml (481 bytes, application/xhtml+xml)
2005-12-31 11:23 PST, Eric Seidel (no email)
Patch v1 (16.34 KB, patch)
2006-03-12 19:50 PST, David Kilzer (:ddkilzer)
darin: review-

Description camillo.lists 2005-12-31 11:13:34 PST
If an html file's header contains an empty style tag, the page will not render in WebKit. The render tree 
window shows an empty RenderBody. 100% reproducible, and verified in the latest nightly build 
(416.13). Here is a valid XHTML 1.0 document with which you can reproduce the bug:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
    <style type="text/css" />
    <title>Test</title>
</head>
<body>

<h1>TEST PAGE</h1>
<p>This page is invisible in WebKit. The "culprit" is the style tag in the header, but it is valid XHTML 1.0 
Transitional according to the W3C's validator.</p>

</body>
</html>
Comment 1 camillo.lists 2005-12-31 11:15:22 PST
Created attachment 5400
Test case

Test case added as attachment for your convenience.
Comment 2 Eric Seidel (no email) 2005-12-31 11:23:23 PST
Created attachment 5401
Same test case sent as xhtml
Comment 3 Eric Seidel (no email) 2005-12-31 11:26:38 PST
The problem here is that you're sending this "valid xhtml 1.0" file as text/html from your server.
<style /> is interpreted as an unclosed <style> by the html parser (the / is ignored as an invalid
attribute).  WebKit seems to have trouble with unclosed tags in the <head> (there is a similar bug about an
unclosed <title> causing a blank page).  If you were to actually vend this file as xhtml from your server, it
would display fine (as you can see in the second copy of your test case, now attached).

So this is a bug in two places.  1.  your server is misconfigured (as are most on the web), or rather, you've
named the file .html when it really should be .xhtml on your server.  2.  there seems to be a bug in WebKit
where we don't recover from an unclosed <style> tag in html.

We'll keep this bug to track the unclosed <style> tag behavior.
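
The difference between the two parsing modes can be sketched outside a browser. This is only an illustration: Python's xml.etree.ElementTree stands in for an XML parser, and split_at_unclosed_style is a hypothetical toy scanner that mimics an HTML tokenizer treating <style> content as raw text. Neither is WebKit code, and the document is a reduced version of the test case.

```python
import re
import xml.etree.ElementTree as ET

# Reduced version of the test case (doctype and meta omitted for brevity).
DOC = (
    "<html><head>"
    '<style type="text/css" />'
    "<title>Test</title>"
    "</head><body><h1>TEST PAGE</h1></body></html>"
)

# Parsed as XML, <style ... /> is a complete, empty element.
root = ET.fromstring(DOC)
xml_heading = root.find("body/h1").text


def split_at_unclosed_style(src):
    """Toy model (hypothetical): return (markup, raw_style_text).

    In text/html the trailing "/" does not close the element, so the
    scanner looks for a literal </style>. If none exists, everything
    after the start tag is treated as style sheet text.
    """
    start = re.search(r"<style\b[^>]*>", src, re.IGNORECASE)
    if start is None:
        return src, ""
    end = re.search(r"</style\s*>", src[start.end():], re.IGNORECASE)
    if end is None:
        return src[:start.end()], src[start.end():]
    return src, src[start.end():start.end() + end.start()]


markup, swallowed = split_at_unclosed_style(DOC)
print(xml_heading)           # heading as seen by the XML parser
print("<h1>" in markup)      # is the heading still markup in the toy HTML model?
print("<h1>" in swallowed)   # or was it swallowed as style text?
```

In the toy model the heading ends up inside the style text, which is consistent with the empty RenderBody described in the report.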
Comment 4 Eric Seidel (no email) 2005-12-31 11:29:18 PST
One of the most common mistakes on today's web is to write "valid xhtml 1.0" and then vend it from a
server as text/html.  What you end up vending is something which is actually (by design) invalid
html.  The proper solution is to just vend the file as application/xhtml+xml, like it was designed to be...
the only problem with that is that WinIE doesn't handle xhtml very well :(  FireFox, Safari and Opera handle
xhtml fine though.
Comment 5 Eric Seidel (no email) 2005-12-31 11:37:17 PST
http://bugzilla.opendarwin.org/show_bug.cgi?id=3905
is the related bug about an unclosed title element causing the page to be blank.  Marking it as "depends 
on" even though one of these could probably be fixed w/o the other.  Likely they could both be pretty 
easily fixed at once.
Comment 6 camillo.lists 2005-12-31 12:23:47 PST
You're assuming too much. ;-) It's not my server: I encountered this bug on a website, and stripped it 
down to a smaller test case for your convenience. The bug reappears when loading the file from the 
HD, because of the html extension. I have no way of changing the way that server (or most other 
servers on the internet) serve XHTML content. Regardless:

1) is it wise to ignore the doctype specified inside the file and trust the HTTP content-type instead? The 
doctype declaration is put in by the author, while the content-type is chosen by a webserver which is 
often misconfigured, as you all know: it seems to me that the former should be considered more 
reliable.

2) The W3C HTML Validator has no complaints about an XML file uploaded with an .html extension, or
served as text/html:
http://validator.w3.org/check?uri=http%3A%2F%2Fbugzilla.opendarwin.org%2Fattachment.cgi%3Fid%3D5400
I understand that the validator is not the source of all wisdom, but most webmasters will be satisfied 
once it passes their pages. What am I supposed to tell them? "Go fiddle with your web server's settings 
because Safari thinks your XHTML files are HTML 4.0, even though you put an explicit doctype 
declaration inside them?"

3) I'm going to go out on a limb here, and I haven't checked the standards recently, but wasn't the
XHTML syntax meant to be mostly compatible with HTML? Is it really so crazy to serve XHTML as
text/html, expecting XHTML-compliant browsers to handle it correctly (recognizing it through the doctype),
and older browsers to make a best effort at rendering it as HTML, instead of deciding to download it to
disk because it has an unrecognized MIME type?
Comment 7 Eric Seidel (no email) 2005-12-31 13:50:03 PST
(In reply to comment #6)
> You're assuming too much. ;-)

My apologies. :)  I did assume too much.  And you are correct, there is a real Safari bug here.  We
should handle an unclosed <style> element better than we do.

> 1) is it wise to ignore the doctype specified inside the file and trust the HTTP content-type instead?
> The doctype declaration is put in by the author, while the content-type is chosen by a webserver which is
> often misconfigured, as you all know: it seems to me that the former should be considered more
> reliable.

Unfortunately I'm not familiar enough with all the reasons for this decision.  If you could show some other
browser respecting a <meta> tag over the server's sent Content-Type, I'd be very interested to know.  I
think you'll find that our behavior here is in line with other browsers.

> 2) The W3C HTML Validator has no complaints about an XML file uploaded with an .html extension,
> or served as text/html:
> http://validator.w3.org/check?uri=http%3A%2F%2Fbugzilla.opendarwin.org%2Fattachment.cgi%3Fid%3D5400
> I understand that the validator is not the source of all wisdom, but most webmasters will be satisfied
> once it passes their pages. What am I supposed to tell them? "Go fiddle with your web server's
> settings because Safari thinks your XHTML files are HTML 4.0, even though you put an explicit doctype
> declaration inside them?"

I think that you'll find that FireFox/IE treat this file as html as well.  If you don't believe me, try putting
invalid xhtml in it, and watch them not complain (as the xhtml spec would require, were they really
parsing as xhtml).  Instead I think you'll find that this page is going through their html parsers and
parsed with all the forgivingness that the text/html content type requires.
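
The "watch them not complain" experiment can be approximated outside a browser. As a sketch, Python's xml.etree.ElementTree plays the role of a strict XML parser here; parses_as_xml is a hypothetical helper name, and the two documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

WELL_FORMED = "<html><body><p>ok</p></body></html>"
MALFORMED = "<html><body><p>missing end tag</body></html>"


def parses_as_xml(src):
    """Return True if src is well-formed XML, False on a fatal parse error."""
    try:
        ET.fromstring(src)
        return True
    except ET.ParseError:
        return False


print(parses_as_xml(WELL_FORMED))
print(parses_as_xml(MALFORMED))
```

An XML parser has to stop at the first well-formedness error, so a browser that still renders the malformed document is evidently not using one for that page.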

The validator you referenced does correctly identify this as xhtml 1.0.  I don't think it looks at the 
Content-Type: header when doing so.  Safari also handles the page perfectly fine when it's sent with an 
xhtml Content-Type.

> 3) I'm going to go out on a limb here, and I haven't checked the standards recently, but wasn't the
> XHTML syntax meant to be mostly compatible with HTML? Is it really so crazy to serve XHTML as
> text/html, expecting XHTML-compliant browsers to handle it correctly (recognizing it through the
> doctype), and older browsers to make a best effort at rendering it as HTML, instead of deciding to
> download it to disk because it has an unrecognized MIME type?

Yes, it is meant to be compatible.  But not *all* xhtml 1.0 is compatible with HTML.  For example, if you
check the html 4.01 spec, you'll see that <style> *requires* an end tag.  So this <style> tag w/ a bogus
"/" attribute, and no end, is invalid html.  Again, Safari should handle it better than we do, but the html
is invalid.

http://www.w3.org/TR/html4/present/styles.html#h-14.2.3

Perhaps other WebKit hackers who know more about the decision to respect Content-Type over a 
<meta> tag will be able to more fully answer your above questions.  Thanks again for reporting the 
bug.  We'll definitely look into the issue of the unterminated <style> tag not being handled properly by 
our html parser.
Comment 8 camillo.lists 2006-01-01 11:12:18 PST
(In reply to comment #7)
> Unfortunately I'm not familiar enough with all the reasons for this decision.  If you could show some other
> browser respecting a <meta> tag over the server's sent Content-Type, I'd be very interested to know.
> I think you'll find that our behavior here is in line with other browsers.

I tested Firefox 1.5 with an XHTML+SVG document, and indeed it seems to give precedence to the 
Content-Type (the SVG is not visible when the content-type is text/html, which suggests that it's using 
the HTML parser, that has no support for namespaces).

However, I can give you something better than the example of another browser: the XHTML 1.0 
standard _itself_, which is an XHTML 1.0 Strict document, is served as text/html:

$ curl -I 'http://www.w3.org/TR/xhtml1/'
HTTP/1.1 200 OK
Date: Sun, 01 Jan 2006 18:30:23 GMT
Server: Apache/1.3.33 (Unix) PHP/4.3.10
P3P: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
Cache-Control: max-age=21600
Expires: Mon, 02 Jan 2006 00:30:23 GMT
Last-Modified: Thu, 01 Aug 2002 13:56:02 GMT
ETag: "3d493df2"
Accept-Ranges: bytes
Content-Length: 71514
Content-Type: text/html; charset=utf-8

Surely the people who wrote the XHTML standard would know how to serve it properly. :-)
This example strongly suggests that both Firefox and Safari (and any other browsers that do the same 
thing) are in error, and the doctype declaration should take precedence over the Content-Type.

> I think that you'll find that FireFox/IE treat this file as html as well.  If you don't believe me, try
> putting invalid xhtml in it, and watch them not complain (as the xhtml spec would require, were they really
> parsing as xhtml).  Instead I think you'll find that this page is going through their html parsers and
> parsed with all the forgivingness that the text/html content type requires.

Honestly, I don't think anyone is writing an XHTML doctype declaration and relying on it being ignored. :-)
But if you want to be forgiving, I have no problem with that. The problem is that your forgiveness is 
currently misdirected: you are accepting things that are invalid, and rejecting things which are perfectly 
valid.

> The validator you referenced does correctly identify this as xhtml 1.0.  I don't think it looks at the
> Content-Type: header when doing so.  Safari also handles the page perfectly fine when it's sent with
> an xhtml Content-Type.

And that's exactly the problem: you shouldn't rely on the Content-Type when it's explicitly contradicted 
by the doctype. Now, if the Content-Type was image/jpeg, I'd understand assuming that it is correct 
and not looking for a doctype. But when parsing text/html, you have to read and parse the doctype 
anyway, so why not listen to it?

> Yes, it is meant to be compatible.  But not *all* xhtml 1.0 is compatible with HTML.  For example, if
> you check the html 4.01 spec, you'll see that <style> *requires* an end tag.  So this <style> tag w/ a
> bogus "/" attribute, and no end, is invalid html.  Again, Safari should handle it better than we do, but the
> html is invalid.

But the idea was not to have all browsers treat it as HTML. The idea was to have modern browsers treat
it as what it is, i.e. XHTML, and let old browsers attempt to parse it as HTML (which often gives
acceptable results). And the W3C itself seems to think that there is nothing wrong with serving XHTML
as text/html, as shown above. After all, isn't XHTML 1.0 the legitimate successor to HTML 4?

> Perhaps other WebKit hackers who know more about the decision to respect Content-Type over a
> <meta> tag will be able to more fully answer your above questions.  Thanks again for reporting the
> bug.  We'll definitely look into the issue of the unterminated <style> tag not being handled properly
> by our html parser.

Thank you for that, and also for taking the time to discuss this issue with me. I look forward to hearing 
any clarifications that other WebKit developers will want to give.
Comment 9 Alexey Proskuryakov 2006-01-07 04:57:25 PST
It is a known issue (sorry, couldn't find it in Bugzilla) that WebKit sends an "Accept: */*" header in its 
requests, which causes most servers to provide text/html, not application/xhtml+xml.

There's a lot of material about Content-Type handling available; see, for example,
<http://ppewww.ph.gla.ac.uk/~flavell/www/content-type.html> or <http://www.hixie.ch/advocacy/xhtml> (the
latter is displayed incorrectly in Safari, which is also known). In short, it's not possible or desirable to rely
on DOCTYPE when there is a contradicting Content-Type header.
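
The rule described here, where the Content-Type header decides and the DOCTYPE is not consulted, can be sketched as follows. parser_for is a hypothetical helper, and the exact set of MIME types mapped to the XML parser is an assumption for illustration, not a list taken from WebKit.

```python
from email.message import Message


def parser_for(content_type_header):
    """Hypothetical helper: pick a parser from the Content-Type header alone.

    The doctype inside the document is deliberately not consulted.
    """
    msg = Message()
    msg["Content-Type"] = content_type_header
    mime = msg.get_content_type()  # lower-cased type/subtype, parameters dropped
    if mime in ("application/xhtml+xml", "application/xml", "text/xml"):
        return "xml"
    if mime == "text/html":
        return "html"
    return "other"


# The header from the curl output in comment 8 selects the HTML parser,
# even though that document carries an XHTML 1.0 Strict doctype.
print(parser_for("text/html; charset=utf-8"))
print(parser_for("application/xhtml+xml"))
```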
Comment 10 camillo.lists 2006-01-07 06:57:14 PST
Thanks, I understand the issue now.
Comment 11 David Kilzer (:ddkilzer) 2006-02-25 10:03:18 PST
The fix for this bug is very similar to Bug 3905, so I'm taking this bug.  Patch in a few.
Comment 12 David Kilzer (:ddkilzer) 2006-02-25 16:59:28 PST
In Firefox 1.5.0.1, a document with a <style> tag with no closing </style> causes what would be the contents of the <style></style> tags to be put into the body of the document, and the single <style> tag is ignored.

In MSIE 6, the browser consumes everything after the single <style> tag to the end of the document, then adds </style></body></html> to the end of the document if the unmatched <style> tag is in the body, or adds </style></head><body></body></html> if the unmatched <style> tag is in the head.
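
The two recovery strategies described above can be modelled loosely as string transformations. This is a sketch of the descriptions in this comment only, not of either browser's actual parser; firefox_like and msie_like are hypothetical names, and real start tags and tree construction are more involved than these regular expressions.

```python
import re

STYLE_START = re.compile(r"<style\b[^>]*>", re.IGNORECASE)
STYLE_END = re.compile(r"</style\s*>", re.IGNORECASE)


def firefox_like(src):
    """Drop the unmatched <style> tag; what follows stays ordinary markup."""
    m = STYLE_START.search(src)
    if m is None or STYLE_END.search(src, m.end()):
        return src  # no <style>, or it is properly closed
    return src[:m.start()] + src[m.end():]


def msie_like(src):
    """Keep everything after <style> as style text, then close the document."""
    m = STYLE_START.search(src)
    if m is None or STYLE_END.search(src, m.end()):
        return src
    in_body = re.search(r"<body\b", src[:m.start()], re.IGNORECASE) is not None
    if in_body:
        return src + "</style></body></html>"
    return src + "</style></head><body></body></html>"


HEAD_DOC = "<html><head><style><title>T</title></head><body><p>x</p></body></html>"
print(firefox_like(HEAD_DOC))
print(msie_like(HEAD_DOC))
```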
Comment 13 David Kilzer (:ddkilzer) 2006-02-26 05:43:17 PST
(In reply to comment #11)
> The fix for this bug is very similar to Bug 3905, so I'm taking this bug. 
> Patch in a few.

I'm waiting on Bug 3905 to be fixed since the patch to fix this bug will have to modify code from that bug.
Comment 14 David Kilzer (:ddkilzer) 2006-03-07 20:34:48 PST
Hixie: In HTML 5, will missing </style> tags be treated the same as missing </title> tags, e.g., if the whole document is parsed and no </style> tag is found, should the document be reparsed starting after the <style> tag with no special handling such that the <style> tag is implicitly closed when the next tag is found?
Comment 15 David Kilzer (:ddkilzer) 2006-03-12 19:07:23 PST
I asked Ian Hixie about how HTML 5 would handle missing </style> tags, and he brought up a security issue:

[3:41pm] Hixie: ddk: that's what my testing found too
[3:42pm] ddk: Hixie: How can a missing </style> tag cause a security issue?  I'm having a hard time figuring out how that would be exploited (other than a buffer overflow or something from too much input).
[3:44pm] Hixie: ddk: make a victim site display the content "<style> <script> evil script </script> </style>" (which would pass the site's security test because <style> is safe) then cause the network connection to abort just before the </style> tag and the UA will execute the script.
[3:44pm] Hixie: ddk: same reason comments can't safely be reparsed
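
The attack outlined in the log can be illustrated with a toy model. Both functions below are hypothetical: naive_filter_allows stands for a server-side check that treats anything inside <style>...</style> as safe, and executes_script stands for a user agent that either reparses an unclosed <style> as markup or keeps it as raw text.

```python
import re

STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style\s*>", re.I | re.S)


def naive_filter_allows(src):
    """Hypothetical site filter: reject <script> unless it sits inside <style>."""
    return "<script" not in STYLE_BLOCK.sub("", src).lower()


def executes_script(received, reparse_unclosed_style):
    """Toy UA model: does a <script> end up being treated as markup?"""
    outside = STYLE_BLOCK.sub("", received)  # closed style blocks are raw text
    start = re.search(r"<style\b[^>]*>", outside, re.I)
    if start is not None:
        # A <style> with no end tag is left over: reparse or swallow its tail.
        tail = outside[start.end():] if reparse_unclosed_style else ""
        outside = outside[:start.start()] + tail
    return "<script" in outside.lower()


PAGE = "<p>hi</p><style> <script>evil()</script> </style>"
TRUNCATED = PAGE[: PAGE.index("</style>")]  # connection aborted before </style>

print(naive_filter_allows(PAGE))
print(executes_script(TRUNCATED, reparse_unclosed_style=True))
print(executes_script(TRUNCATED, reparse_unclosed_style=False))
```

Under these assumptions the page passes the filter, yet the truncated copy yields a script only in the reparsing model.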
Comment 16 David Kilzer (:ddkilzer) 2006-03-12 19:50:13 PST
Created attachment 7041
Patch v1

This patch is very similar to the final patch for Bug 3905 (less the SegmentedString changes).

Please note the security concerns from Comment #15, although I don't see how you get any more protection from an attacker since the attacker could simply add the missing </style> tag (unless you're relying on another piece of software to scan the HTML before it gets to the browser).
Comment 17 Darin Adler 2006-03-12 20:39:51 PST
Comment on attachment 7041
Patch v1

Looks nice, r=me.
Comment 18 David Kilzer (:ddkilzer) 2006-03-19 06:10:15 PST
Verified in r13385.
Comment 19 Darin Adler 2006-03-19 09:50:24 PST
Looks like there's a major problem with this fix and the previous one for <title>.

The problem is that at the time the tokenizer is called we don't necessarily have the entire source. In layout tests we parse things as one big chunk so it's not an issue. So when the code sees that there's no "src" left that doesn't mean we are at the end of the entire document -- it just means that we're at the end of what we currently have. Additional write() calls can happen later to give us more data.

So this code is kicking in, in cases where it should not, causing major regressions. I'm going to roll this fix out and we'll have to come up with one that works even when subsequent writes occur.
Comment 20 Darin Adler 2006-03-19 09:52:12 PST
We can probably write some tests for the broken case using document.write -- not sure.
Comment 21 Darin Adler 2006-03-19 09:53:46 PST
The saved state will have to go in a member variable of the tokenizer, and the whole thing will need to be tracked by the state machine instead of done all in one place.
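
A minimal sketch of that idea, assuming a toy scanner rather than WebKit's tokenizer: the in-style state and any unprocessed input live in member variables, write() never concludes that the document has ended, and only an explicit finish() may report an unclosed <style>. The class and attribute names are hypothetical, and for brevity the scanner only recognizes the literal, lower-case tags "<style>" and "</style>".

```python
class StyleScanner:
    """Toy incremental scanner that survives arbitrary chunk boundaries."""

    START, END = "<style>", "</style>"

    def __init__(self):
        self.in_style = False   # scanner state, kept across write() calls
        self.pending = ""       # input not yet classified (may end mid-tag)
        self.markup = ""
        self.style_text = ""
        self.unclosed_at_eof = False

    def _emit(self, text):
        if self.in_style:
            self.style_text += text
        else:
            self.markup += text

    def write(self, chunk):
        """Consume one chunk; running out of input is not end of document."""
        self.pending += chunk
        while True:
            delim = self.END if self.in_style else self.START
            i = self.pending.find(delim)
            if i < 0:
                # Hold back a tail that could be the beginning of the delimiter.
                cut = max(len(self.pending) - (len(delim) - 1), 0)
                self._emit(self.pending[:cut])
                self.pending = self.pending[cut:]
                return
            self._emit(self.pending[:i])
            self.pending = self.pending[i + len(delim):]
            self.in_style = not self.in_style

    def finish(self):
        """Only here is it known that the document has really ended."""
        self._emit(self.pending)
        self.pending = ""
        self.unclosed_at_eof = self.in_style


scanner = StyleScanner()
for chunk in ["<p>a</p><sty", "le>p{color:red}</st", "yle><b>x</b>"]:
    scanner.write(chunk)
scanner.finish()
print(scanner.markup, scanner.style_text, scanner.unclosed_at_eof)
```

Any recovery for a missing </style> would hang off finish() in this model, which is why a chunk boundary in the middle of a style sheet cannot trigger it.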
Comment 22 Jon 2006-03-19 10:00:47 PST
The patch for this bug (committed as r13381) has broken pages which use inline styles. These include forums.macnn.com and phpbb.com, among others. Both of these pages are broken in the same way: their content is rendered without the styles that are declared within the document rather than in an external file.
Comment 23 David Kilzer (:ddkilzer) 2006-03-19 10:48:09 PST
(In reply to comment #19)
> Looks like there's a major problem with this fix and the previous one for
> <title>.

D'OH!  That's Bug 3905.
Comment 24 David Kilzer (:ddkilzer) 2006-06-12 08:32:47 PDT
Reassigning this bug back to webkit-unassigned in case anyone else is interested in fixing it.  I plan to look at it in a couple weeks.

Before this is fixed, an http test needs to be created that reliably reproduces the latest problem found (see Comment #19).
Comment 25 Joost de Valk (AlthA) 2006-07-06 05:49:17 PDT
ddkilzer, any plans of looking at this one again? :)
Comment 26 Joost de Valk (AlthA) 2006-07-06 05:54:47 PDT
*** Bug 9443 has been marked as a duplicate of this bug. ***
Comment 27 Joost de Valk (AlthA) 2006-07-06 05:55:12 PDT
*** Bug 8772 has been marked as a duplicate of this bug. ***
Comment 28 Robert Burns 2006-09-14 19:27:05 PDT
This was due to loading XHTML as text/html instead of application/xhtml+xml. Once this was fixed the elements were no longer moved to the body. It still may be a bug to move HTML elements about in a document (as opposed to leaving them in place and not rendering or processing them), but it's not a bug with the XML implementation as I had reported.
Comment 29 David Kilzer (:ddkilzer) 2006-09-14 20:19:22 PDT
(In reply to comment #28)
> this was due to loading XHTML as text/html instead of application/xhtml+xml.
> Once this was fixed the elements were no longer moved to the body. It still
> may be a bug to move HTML elements about in a document (as opposed to leaving
> them in place and not rendering or processing them), but it's not a bug with
> the XML implementation as I had reported.

I believe the above comment was meant for Bug 10507.
Comment 30 Robert Burns 2006-09-19 12:56:00 PDT
(In reply to comment #29)
> (In reply to comment #28)
> > this was due to loading XHTML as text/html instead of application/xhtml+xml.
> > Once this was fixed the elements were no longer moved to the body. It still
> > may be a bug to move HTML elements about in a document (as opposed to leaving
> > them in place and not rendering or processing them), but it's not a bug with
> > the XML implementation as I had reported.
> 
> I believe the above comment was meant for Bug 10507.
> 

Yes, that's correct. Sorry for the mixup.
Comment 31 Ian 'Hixie' Hickson 2007-12-27 23:22:30 PST
I believe this is INVALID. Reparsing is a security risk and HTML5 says not to. Other browsers are converging on that behaviour too.