25064 – Implement a content sniffer

RESOLVED WONTFIX 25064

Implement a content sniffer

https://bugs.webkit.org/show_bug.cgi?id=25064

Summary Implement a content sniffer

Adam Barth

Reported 2009-04-06 16:08:16 PDT

Currently, WebKit relies on the networking library to perform content sniffing. This means every port has to roll their own, creating compatibility / security differences between WebKit ports. I spoke to Maceij on #webkit today, and we agreed that we should have our own sniffer in WebKit. We should base our content sniffer on the HTML 5 sniffing algorithm. Ideally, we'd converge with Firefox on a common, standardized algorithm. http://src.chromium.org/viewvc/chrome/trunk/src/net/base/mime_sniffer.cc might be a decent starting place. If no one else gets to this soon, I'll probably be able to work on this in a few months.

Attachments
Step 1: Add sniffer entry points (12.21 KB, patch) 2009-05-06 15:30 PDT, Adam Barth	mjs: review-	Details Formatted Diff Diff
Step 2: Hook sniffer into network stack (23.94 KB, patch) 2009-05-06 16:40 PDT, Adam Barth	no flags	Details Formatted Diff Diff
Step 2: Hook sniffer into network stack (improved) (30.72 KB, patch) 2009-05-06 16:53 PDT, Adam Barth	no flags	Details Formatted Diff Diff
Step 2: Hook sniffer into network stack (ready for review) (33.44 KB, patch) 2009-05-06 18:38 PDT, Adam Barth	mjs: review-	Details Formatted Diff Diff
Step 3: Implement sniffing algorithm (18.49 KB, patch) 2009-05-17 19:45 PDT, Adam Barth	mjs: review-	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

Adam Barth

Comment 1 2009-05-06 15:30:49 PDT

Created attachment 30069 [details] Step 1: Add sniffer entry points Starting to work on this. Adding entry points to the sniffing algorithm.

Adam Barth

Comment 2 2009-05-06 16:40:55 PDT

Created attachment 30073 [details] Step 2: Hook sniffer into network stack Work in progress. I'll nominate for review when its appropriate. Let me know if you have feedback about the design of how I've hooked this in.

Adam Barth

Comment 3 2009-05-06 16:53:10 PDT

Created attachment 30074 [details] Step 2: Hook sniffer into network stack (improved) Better separation of concerns.

Adam Barth

Comment 4 2009-05-06 18:38:32 PDT

Created attachment 30080 [details] Step 2: Hook sniffer into network stack (ready for review) This version is ready for review.

Adam Barth

Comment 5 2009-05-17 19:45:18 PDT

Created attachment 30433 [details] Step 3: Implement sniffing algorithm This is a port of the chromium sniffer. This is probably a reasonable starting point. We can tune it in subsequent steps, if you like. This code is BSD licensed in the Chromium tree, which should fine to incorporate into WebKit.

Maciej Stachowiak

Comment 6 2009-05-22 21:36:23 PDT

Comment on attachment 30069 [details] Step 1: Add sniffer entry points This part looks ok. Though I am not sure it makes sense to send by itself.

Maciej Stachowiak

Comment 7 2009-05-22 21:42:40 PDT

Comment on attachment 30080 [details] Step 2: Hook sniffer into network stack (ready for review) The low-level details of the code look ok, but I'm not entirely happy with the design approach. It seems like overkill to create a generic forwarding resource handle client just to attach some built-in behavior to didReceiveResponse. I think it would make more sense to implement an appropriate method in ResourceHandleBase and require ResourceHandle subclasses to call that method at the appropriate point. Also: This adds a call to shouldSniffMIMEType, but no call to sniffMIMEType. Part 3 doesn't call it either, so how does any actual sniffing ever happen? It might actually be simpler to combine this and part 1 with the patch to add the sniffer logic. Seems like it would be easier to review it all as a unit so that the reviewer can follow the logic. r- for the above.

Adam Barth

Comment 8 2009-05-22 21:52:13 PDT

I'm happy to collapse these steps into one patch if that would be easier to review. I was worried about creating something that was too big to review effectively. No actual sniffing is going on yet. The remaining steps are: 4) Call the sniffer once we receive bytes from the network (i.e., in didReceiveData). 5) Buffer network bytes if we can't guess the type from the first bytes we receive. 6) Disable the existing platform-level sniffer. Would you like all these in one patch?

Maciej Stachowiak

Comment 9 2009-05-22 21:56:47 PDT

Comment on attachment 30433 [details] Step 3: Implement sniffing algorithm This can't land for at least the following reasons: 1) In light of the r- of part 2. 2) Because there are no test cases included. Here are some other comments on the details: A) These macros seem a little clunky. Perhaps it would be better to have a .in-type file to generate a source file that has the static sniffing tables. > #define MAGIC_NUMBER(mimeType, magic) \ > { (mimeType), (magic), sizeof(magic)-1, false }, B) For the sniffing rules that don't come from the HTML5 spec, should they be proposed for HTML5? C) On sniffing feeds - I think the security risk of misinterpreting a non-feed XML document as a feed is low, so I would suggest to err on the side of being more generous. I think the code here will not sniff a feed in all the cases that Safari will, which is probably unwise. D) Nothing in this patch set seems to turn off the built-in sniffing of NSURL, CFNetwork, or other network layers that may do their own sniffing. In addition, I'll try to do some research on CFNetwork's current sniffing rules to see if there are any important ones that should be added here. Nice work so far, please revise per above.

Maciej Stachowiak

Comment 10 2009-05-22 21:57:19 PDT

Comment on attachment 30069 [details] Step 1: Add sniffer entry points flagging as - in retrospect, because it doesn't make sense to land this in isolation, and the patchset should probably be combined anyway.

Adam Barth

Comment 11 2011-05-23 10:14:55 PDT

Sounds like we're not planning to do this anytime soon.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution WONTFIX

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware All

OS All

Product WebKit

Component Platform

Assignee

Adam Barth

Reported

2009-04-06 16:08 PDT

Modified

2011-05-23 13:44 PDT History

CC List

12 users Show

URL

Keywords

Depends on

Blocks

25939

Dependencies

tree graph