Currently, WebKit relies on the networking library to perform content sniffing. This means every port has to roll their own, creating compatibility / security differences between WebKit ports. I spoke to Maceij on #webkit today, and we agreed that we should have our own sniffer in WebKit. We should base our content sniffer on the HTML 5 sniffing algorithm. Ideally, we'd converge with Firefox on a common, standardized algorithm. http://src.chromium.org/viewvc/chrome/trunk/src/net/base/mime_sniffer.cc might be a decent starting place. If no one else gets to this soon, I'll probably be able to work on this in a few months.
Created attachment 30069 [details] Step 1: Add sniffer entry points Starting to work on this. Adding entry points to the sniffing algorithm.
Created attachment 30073 [details] Step 2: Hook sniffer into network stack Work in progress. I'll nominate for review when its appropriate. Let me know if you have feedback about the design of how I've hooked this in.
Created attachment 30074 [details] Step 2: Hook sniffer into network stack (improved) Better separation of concerns.
Created attachment 30080 [details] Step 2: Hook sniffer into network stack (ready for review) This version is ready for review.
Created attachment 30433 [details] Step 3: Implement sniffing algorithm This is a port of the chromium sniffer. This is probably a reasonable starting point. We can tune it in subsequent steps, if you like. This code is BSD licensed in the Chromium tree, which should fine to incorporate into WebKit.
Comment on attachment 30069 [details] Step 1: Add sniffer entry points This part looks ok. Though I am not sure it makes sense to send by itself.
Comment on attachment 30080 [details] Step 2: Hook sniffer into network stack (ready for review) The low-level details of the code look ok, but I'm not entirely happy with the design approach. It seems like overkill to create a generic forwarding resource handle client just to attach some built-in behavior to didReceiveResponse. I think it would make more sense to implement an appropriate method in ResourceHandleBase and require ResourceHandle subclasses to call that method at the appropriate point. Also: This adds a call to shouldSniffMIMEType, but no call to sniffMIMEType. Part 3 doesn't call it either, so how does any actual sniffing ever happen? It might actually be simpler to combine this and part 1 with the patch to add the sniffer logic. Seems like it would be easier to review it all as a unit so that the reviewer can follow the logic. r- for the above.
I'm happy to collapse these steps into one patch if that would be easier to review. I was worried about creating something that was too big to review effectively. No actual sniffing is going on yet. The remaining steps are: 4) Call the sniffer once we receive bytes from the network (i.e., in didReceiveData). 5) Buffer network bytes if we can't guess the type from the first bytes we receive. 6) Disable the existing platform-level sniffer. Would you like all these in one patch?
Comment on attachment 30433 [details] Step 3: Implement sniffing algorithm This can't land for at least the following reasons: 1) In light of the r- of part 2. 2) Because there are no test cases included. Here are some other comments on the details: A) These macros seem a little clunky. Perhaps it would be better to have a .in-type file to generate a source file that has the static sniffing tables. > #define MAGIC_NUMBER(mimeType, magic) \ > { (mimeType), (magic), sizeof(magic)-1, false }, B) For the sniffing rules that don't come from the HTML5 spec, should they be proposed for HTML5? C) On sniffing feeds - I think the security risk of misinterpreting a non-feed XML document as a feed is low, so I would suggest to err on the side of being more generous. I think the code here will not sniff a feed in all the cases that Safari will, which is probably unwise. D) Nothing in this patch set seems to turn off the built-in sniffing of NSURL, CFNetwork, or other network layers that may do their own sniffing. In addition, I'll try to do some research on CFNetwork's current sniffing rules to see if there are any important ones that should be added here. Nice work so far, please revise per above.
Comment on attachment 30069 [details] Step 1: Add sniffer entry points flagging as - in retrospect, because it doesn't make sense to land this in isolation, and the patchset should probably be combined anyway.
Sounds like we're not planning to do this anytime soon.