Bug 17360 - REGRESSION: mp4 file downloaded from server is downloaded as html
Summary: REGRESSION: mp4 file downloaded from server is downloaded as html
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading (show other bugs)
Version: 528+ (Nightly build)
Hardware: Mac OS X 10.5
: P2 Normal
Assignee: Nobody
URL: http://www.slicerecords.com/chopin.m4a
Keywords: InRadar, Regression
Depends on:
Blocks:
 
Reported: 2008-02-14 05:33 PST by Michael DiMaria
Modified: 2008-02-20 14:53 PST (History)
5 users (show)

See Also:


Attachments
Screenshot of Safari 2 with original WebKit (226.78 KB, image/png)
2008-02-14 16:48 PST, David Kilzer (:ddkilzer)
no flags Details
Ignore Content-Type: text/plain if the extension is of a non-textual type (12.94 KB, patch)
2008-02-15 01:01 PST, mitz
no flags Details | Formatted Diff | Diff
Ignore Content-Type: text/plain if the extension is of a non-textual type (27.82 KB, patch)
2008-02-15 11:51 PST, mitz
ap: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael DiMaria 2008-02-14 05:33:43 PST
When this file is accessed from Safari (without nightly) the file downloads as an mp4 as should. When accessed with the nightly build it downloads as an html.
Comment 1 Robert Blaut 2008-02-14 06:40:07 PST
Incorrect MIME type.

curl -I "http://www.slicerecords.com/chopin.m4a"
HTTP/1.1 200 OK
Date: Thu, 14 Feb 2008 19:30:09 GMT
Server: Apache/1.3.33 (Unix) DAV/1.0.3 mod_perl/1.29 PHP/4.4.4 mod_ssl/2.8.22 OpenSSL/0.9.7e
Last-Modified: Tue, 12 Feb 2008 22:53:26 GMT
ETag: "3a2c1-4e2ece-47b22366"
Accept-Ranges: bytes
Content-Length: 5123790
Content-Type: text/plain


Comment 2 Mark Rowe (bdash) 2008-02-14 07:10:15 PST
Confirmed.  This displays as text in a recent build of TOT but is treated as a download by Safari 3.0.4.
Comment 3 Mark Rowe (bdash) 2008-02-14 07:11:12 PST
<rdar://problem/5743131>
Comment 4 Alexey Proskuryakov 2008-02-14 14:06:23 PST
Regressed in r26758.

Formally, the new behavior is correct, as the content type is text/plain. However, Firefox detects the file type correctly (as AAC audio), and suggests to download it.
Comment 5 Robert Blaut 2008-02-14 14:35:32 PST
Opera also suggests to download the file.

I found in this doc: http://developer.mozilla.org/en/docs/How_Mozilla_determines_MIME_Types the clause regarding this issue and how Gecko copes with such issues:

"When the Content-Type sent by the server is one of (case-sensitively)

* text/plain
* text/plain; charset=ISO-8859-1
* text/plain; charset=iso-8859-1

and the server did not send a Content-Encoding header, Mozilla will sniff the first block of data it gets and check for non-text bytes. Text bytes are 9-13, 27, and 31-255. When encountering a non-text byte, the helper app dialog will be shown, showing the MIME type corresponding to the extension of the file."
Comment 6 David Kilzer (:ddkilzer) 2008-02-14 16:48:47 PST
Created attachment 19131 [details]
Screenshot of Safari 2 with original WebKit
Comment 7 mitz 2008-02-15 01:01:11 PST
Created attachment 19134 [details]
Ignore Content-Type: text/plain if the extension is of a non-textual type

No change log or test yet (if a test is even possible). Things that require special attention:
1. The set of extensions (I picked extensions out of Leopard's /etc/apache2/mime.types).
2. Should this be kept Leopard-only?
3. Is this the right way to override a method in a category?
Comment 8 Alexey Proskuryakov 2008-02-15 01:28:42 PST
Comment on attachment 19134 [details]
Ignore Content-Type: text/plain if the extension is of a non-textual type

r=me, with comments.

+ * Copyright (C) 2008 Apple Inc. All rights reserved.

I think we need a three-clause license, such as the one in WebTextRenderer.h.

+static NSSet *createBinaryExtensionsSet()

Where does this list come from? Could you add .eps, .ps, .ttf, .otf, .pfa, .pfb, .qxp, .indd, .ai, to make publishing folks a little happier?

+        return [binaryExtensions containsObject:[[self suggestedFilename] pathExtension]] ? MIMEType : @"text/plain";

Is there lowercasing performed in some place that I missed?
Comment 9 Alexey Proskuryakov 2008-02-15 01:31:04 PST
Comment on attachment 19134 [details]
Ignore Content-Type: text/plain if the extension is of a non-textual type

Oops, overlooked the comments that came with the patch. I now see where the list came from (would be nice to extend it though). I think it helps to fix the problem on Tiger, too - but I do not have an answer to #3.
Comment 10 Robert Blaut 2008-02-15 03:43:52 PST
If I correctly read the code the proposed patch only looks at extension of the file. No content sniffing is performed. Unfortunately I don't think it's correct approach. If it would be landed Webkit will fail a couple of tests from this test suite: http://hixie.ch/tests/adhoc/http/content-type/sniffing/

Current situation is incorrect either ;)
Comment 11 Robert Blaut 2008-02-15 03:49:54 PST
HTML5 working draft have proposition how to deal with the issues -> http://www.whatwg.org/specs/web-apps/current-work/#content-type-sniffing
Comment 12 Alexey Proskuryakov 2008-02-15 03:57:21 PST
(In reply to comment #10)
> If I correctly read the code the proposed patch only looks at extension of the
> file. No content sniffing is performed. Unfortunately I don't think it's
> correct approach.

That's true - content sniffing is supposed to be performed by the HTTP stack, not by WebKit. So, this is only a temporary workaround (as a comment in the code states).
Comment 13 mitz 2008-02-15 11:51:25 PST
Created attachment 19141 [details]
Ignore Content-Type: text/plain if the extension is of a non-textual type
Comment 14 Alexey Proskuryakov 2008-02-15 13:24:46 PST
Comment on attachment 19141 [details]
Ignore Content-Type: text/plain if the extension is of a non-textual type

"Binrary data not loaded as text. Maybe PASS."
(1) a typo (2) how will one be sure? ;)

r=me
Comment 15 mitz 2008-02-15 13:31:50 PST
Fixed in <http://trac.webkit.org/projects/webkit/changeset/30323>.
Comment 16 Robert Blaut 2008-02-20 14:53:30 PST
*** Bug 10003 has been marked as a duplicate of this bug. ***