Bug 15338

Summary: Error handling when multiple conflicting Content-Type headers are sent differs from Firefox
Product: WebKit Reporter: Henk <henk.kampman>
Component: Page Loading Assignee: Nobody <webkit-unassigned>
Status: NEW ---    
Severity: Major CC: joost, mnot, mrowe
Priority: P2 Keywords: InRadar
Version: 523.x (Safari 3)   
Hardware: Mac   
OS: OS X 10.4   
URL: http://www.kelderbouw.nl/

Description Henk 2007-10-01 23:39:24 PDT
Instead of a rendered page, you'll see the following error message:

"This page contains the following errors:

error on line 3 at column 6: XML declaration allowed only at the start of the document
Below is a rendering of the page up to the first error."

Also reproducible with the latest Leopard beta.

No problems with Safari 2.0.4, Firefox, or IE.

Same problem with the latest
Comment 1 Mark Rowe (bdash) 2007-10-02 00:15:12 PDT
mrowe@daisy:~$ curl -H "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9" -I http://www.kelderbouw.nl/
HTTP/1.1 200 OK
Date: Tue, 02 Oct 2007 07:14:23 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Set-Cookie: RQFW={6F44610A-4A15-4D2E-9754-AAC41B089604}; path=/;
Content-Type: application/xhtml+xml
Content-Length: 10466
Content-Type: text/html
Set-Cookie: ASPSESSIONIDCSDSASBC=NHNDPFFAJDFOLILCMPMAMHLF; path=/
Cache-control: private

mrowe@daisy:~$ 

It's returning two Content-Type headers.  The first one is application/xhtml+xml, so we're parsing the document as XML, which is stricter than HTML parsing.
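To make the symptom concrete, here is a minimal Python sketch that splits a trimmed excerpt of the curl output above into ordered (name, value) pairs and picks the first Content-Type. The "first value wins" policy is a simplifying assumption for illustration only; comment 5 notes that the real selection rule is more involved. The helper names parse_headers and first_value are hypothetical.

```python
# Simplified model (assumption): take the FIRST Content-Type field, which
# matches the symptom in this bug but is not the exact CFNetwork rule.

# Trimmed excerpt of the response headers from comment 1.
RAW_HEADERS = """\
HTTP/1.1 200 OK
Date: Tue, 02 Oct 2007 07:14:23 GMT
Server: Microsoft-IIS/6.0
Content-Type: application/xhtml+xml
Content-Length: 10466
Content-Type: text/html
Cache-control: private
"""


def parse_headers(raw):
    """Return header fields as (lowercased name, value) in received order."""
    fields = []
    for line in raw.splitlines()[1:]:  # skip the status line
        if not line.strip():
            break  # a blank line ends the header block
        name, _, value = line.partition(":")  # split at the first colon only
        fields.append((name.strip().lower(), value.strip()))
    return fields


def first_value(fields, name):
    """Return the first value sent for `name`, or None if it is absent."""
    for field_name, value in fields:
        if field_name == name.lower():
            return value
    return None


fields = parse_headers(RAW_HEADERS)
content_types = [value for name, value in fields if name == "content-type"]
print(content_types)                        # ['application/xhtml+xml', 'text/html']
print(first_value(fields, "Content-Type"))  # application/xhtml+xml
```

Under this model the document is handed to the XML parser, which is why the "XML declaration allowed only at the start of the document" error appears instead of a rendered page.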
Comment 2 Mark Rowe (bdash) 2007-10-02 00:17:30 PDT
This works in Firefox because it's treating it as text/html, not application/xhtml+xml.  This can be seen by looking at the "Type" field in the "Page Info" dialog after the page has loaded.
Comment 3 Mark Rowe (bdash) 2007-10-02 00:28:30 PDT
From http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2:
"Multiple message-header fields with the same field-name MAY be present in a message if and only if the entire field-value for that header field is defined as a comma-separated list."

Content-Type is not defined as taking a comma-separated list, so it's not valid for the server to send it multiple times.  The "correct" fix would be for the web site in question to be fixed.  It should also be possible to paper over the problem by tweaking WebKit's error-handling logic for this case.
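The RFC 2616 section 4.2 rule quoted above can be sketched in Python: repeated fields are joined with ", " (preserving order) only when the field is list-valued, and any other repetition is reported as a conflict. LIST_VALUED below is a small, hypothetical subset picked for illustration (Cache-Control, Vary, and Allow are list-valued in RFC 2616), not the complete set, and ConflictingHeaderError is an illustrative name.

```python
# Sketch of the RFC 2616 sec. 4.2 combining rule. LIST_VALUED is a small
# illustrative subset of list-valued fields, not the RFC's full list.
LIST_VALUED = {"cache-control", "vary", "allow"}


class ConflictingHeaderError(ValueError):
    """Raised when a field that is not list-valued appears more than once."""


def combine_fields(fields):
    """Merge (name, value) pairs into one value per lowercased field name."""
    combined = {}
    for name, value in fields:
        key = name.lower()
        if key not in combined:
            combined[key] = value
        elif key in LIST_VALUED:
            # Append in received order, as section 4.2 requires.
            combined[key] = combined[key] + ", " + value
        else:
            raise ConflictingHeaderError(
                f"{name!r} is not list-valued but was sent more than once")
    return combined


print(combine_fields([("Cache-Control", "private"),
                      ("Cache-Control", "no-store")]))
# {'cache-control': 'private, no-store'}

try:
    combine_fields([("Content-Type", "application/xhtml+xml"),
                    ("Content-Type", "text/html")])
except ConflictingHeaderError as err:
    print("conflict:", err)
```

A strict client following this sketch would have no valid value to choose; any browser that renders the page is applying some error-recovery policy on top of the spec.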
Comment 4 Henk 2007-10-02 01:07:31 PDT
"The "correct" fix would be for the web site in question to be fixed."

I fully agree, however try explaining that to a Safari user :)

A better solution would be to use the same approach as Firefox: use the last defined value of a header.
Comment 5 Mark Rowe (bdash) 2007-10-02 01:14:46 PDT
I'm not sure it's as simple as that.  I've looked further at the rules we use for picking which value of the Content-Type header to use, and it's not as simple as "the first" or "the last".  Fixing this specific issue without introducing other compatibility issues may be tricky.
Comment 6 Mark Rowe (bdash) 2007-10-02 01:19:09 PDT
It may also be worth contacting the site and making sure they are aware of the issue.  It should be a relatively simple issue to resolve from their end.  I would send them an email, but due to the content being in Dutch I'm not sure precisely who I should contact.
Comment 7 Mark Rowe (bdash) 2007-10-02 01:36:45 PDT
http://lxr.mozilla.org/seamonkey/source/netwerk/base/src/nsURLHelper.cpp#834 appears to be the related Mozilla code that parses the Content-Type header.  It looks as though it will simply return the last value found if multiple values are present in a single header.  If multiple headers are present, the code at http://lxr.mozilla.org/seamonkey/source/netwerk/protocol/http/src/nsHttpResponseHead.cpp#221 suggests that the last header seen will be respected.
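For comparison, a minimal Python sketch of "last header wins", approximating the Mozilla behavior described above for multiple Content-Type headers. This is an illustration of the policy under that reading of the linked code, not a port of nsHttpResponseHead.cpp; last_value is a hypothetical helper name.

```python
# Sketch of a "last header wins" policy (approximation of the behavior
# described for Mozilla; not a port of its code).


def last_value(fields, name):
    """Return the last value sent for `name` (case-insensitive), or None."""
    result = None
    for field_name, value in fields:
        if field_name.lower() == name.lower():
            result = value  # later occurrences overwrite earlier ones
    return result


# Fields from the kelderbouw.nl response, in the order received.
fields = [
    ("Content-Type", "application/xhtml+xml"),
    ("Content-Length", "10466"),
    ("Content-Type", "text/html"),
]

print(last_value(fields, "Content-Type"))  # text/html
```

With this policy the page is treated as text/html, matching what Firefox's Page Info dialog reports. As comment 5 points out, swapping WebKit's rule for a simple first/last choice could still break other sites that depend on the current behavior.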
Comment 8 Mark Rowe (bdash) 2007-10-02 01:37:49 PDT
This leaves me wondering why CFNetwork does something more complicated.  I'm going to pull this bug into Radar so that team can take a look at it.  This feels like something that should be fixed at a lower level.
Comment 9 Mark Rowe (bdash) 2007-10-02 01:58:10 PDT
<rdar://problem/5517248>
Comment 10 David Kilzer (:ddkilzer) 2007-10-02 05:06:00 PDT
(In reply to comment #6)
> It may also be worth contacting the site and making sure they are aware of the
> issue.  It should be a relatively simple issue to resolve from their end.  I
> would send them an email, but due to the content being in Dutch I'm not sure
> precisely who I should contact.

Perhaps Joost or Henk could send them a message.