Bug 67882 - Remove DocumentWriter::deprecatedFrameEncoding()
Summary: Remove DocumentWriter::deprecatedFrameEncoding()
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Eric Seidel (no email)
URL:
Keywords:
Depends on: 75905
Blocks:
  Show dependency treegraph
 
Reported: 2011-09-09 20:31 PDT by Eric Seidel (no email)
Modified: 2012-01-18 06:12 PST (History)
8 users (show)

See Also:


Attachments
Patch (18.65 KB, patch)
2011-09-09 20:34 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
Mock HTTP response (150 bytes, application/octet-stream)
2011-09-09 20:38 PDT, Adam Barth
no flags Details
Updated to apply to tip of tree (18.56 KB, patch)
2011-09-09 20:51 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
Patch (9.66 KB, patch)
2011-09-10 00:41 PDT, Adam Barth
eric: review+
eric: commit-queue+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Eric Seidel (no email) 2011-09-09 20:31:09 PDT
Remove DocumentWriter::deprecatedFrameEncoding() by implementing RFC 6266
Comment 1 Eric Seidel (no email) 2011-09-09 20:34:34 PDT
Created attachment 106957 [details]
Patch
Comment 2 Adam Barth 2011-09-09 20:38:04 PDT
Created attachment 106958 [details]
Mock HTTP response

To test whether we can handle UTF-8 encoded filename tokens, download this attachment and run:

nc -l 9999 < resp.utf8

Next, navigate Safari to http://localhost:9999/xyzzy

Check whether the downloaded file has a file name consisting of Chinese characters.
Comment 3 Eric Seidel (no email) 2011-09-09 20:51:57 PDT
Created attachment 106959 [details]
Updated to apply to tip of tree
Comment 4 Alexey Proskuryakov 2011-09-09 22:47:55 PDT
Comment on attachment 106959 [details]
Updated to apply to tip of tree

The attached patch doesn't implement any new encoding scheme, it only removes existing support for non-ASCII file names. So, it will break downloading files with non-ASCII names from Web sites. That's of course unacceptable.

There is more processing going on before Foundation puts a suggested file name into NSURLResponse, so new encoding schemes for Content-Disposition cannot be implemented in WebKit.

We'll need to pass suggested encoding to network layer in foreseeable future (perhaps as long as there remains a user visible concept of "file"), so what we need is to give deprecatedFrameEncoding a better name, and to remove comments implying that it's going away.
Comment 5 Adam Barth 2011-09-09 22:58:37 PDT
Did you run the test described in Comment #2 with the patch applied?
Comment 6 Adam Barth 2011-09-09 23:08:29 PDT
Put another way, is your concern that this patch does not correctly implement the consensus among browser vendors or is your concern that you don't agree with that consensus?
Comment 7 Alexey Proskuryakov 2011-09-09 23:34:31 PDT
> Did you run the test described in Comment #2 with the patch applied?

I didn't test it, but I'm sure that it would pass (CFNetwork uses UTF-8 as default fallback). The practical issue is with pages that use other encodings for file names in Content-Disposition.

> Put another way, is your concern that this patch does not correctly implement the consensus among browser vendors or is your concern that you don't agree with that consensus?

I don't know if there has ever been consensus. I certainly voiced concerns against the RFC, and didn't follow it afterwards to have a chance to agree. The blogs.msdn.com post has a fairly heated argument between RFC editor and Eric Lawrence, too.

The same blogs.msdn.com post doesn't say anything about dropping support for non-UTF-8 raw bytes, and I doubt that they did that. Can you test that? Note that to test IE, you need to change the primary language on Windows.

FWIW, old school specs don't acknowledge the fact that browsers have and use default encodings, so you can never just go ahead and implement those to the letter. RFC 6266 in particular fails to mention default encoding even in non-normative section that discusses existing implementations.
Comment 8 Adam Barth 2011-09-09 23:46:36 PDT
Ok.  If you'd prefer Safari to retain this broken behavior, we can put this code behind the appropriate port-specific flags.  As things stand currently, this code only affects CFNetwork anyway.
Comment 9 Adam Barth 2011-09-09 23:48:58 PDT
> I didn't test it, but I'm sure that it would pass (CFNetwork uses UTF-8 as default fallback).

I'd encourage you to test before making such statements.  CFNetwork (at least on Mac OS 10.6) does not appear to use UTF-8 as a fallback unless instructed, as it is in this patch.

> The practical issue is with pages that use other encodings for file names in Content-Disposition.

Correct.  As far as I can tell, everyone except Safari and Opera has agreed not to support those sites.  If you don't wish to converge with other browsers, that's fine, but I'd rather not have this junky code masquerade as not being port-specific.
Comment 10 Alexey Proskuryakov 2011-09-09 23:55:02 PDT
Putting this behind a flag seems fine. We should also stop calling the functions "deprecated", because an engineer seeing it is not supposed to do what they do when seeing other "deprecated" functions in WebKit code base.

> I'd encourage you to test before making such statements.  CFNetwork (at least on Mac OS 10.6) does not > appear to use UTF-8 as a fallback unless instructed, as it is in this patch.

That's surprising, but OK. I appreciate that you made sure not to break this case.

> As far as I can tell, everyone except Safari and Opera has agreed not to support those sites.

So, are you willing to test IE and Firefox to confirm whether they delivered?
Comment 11 Adam Barth 2011-09-09 23:56:51 PDT
> We should also stop calling the functions "deprecated", because an engineer seeing it is not supposed to do what they do when seeing other "deprecated" functions in WebKit code base.

On the contrary, I believe they should.  This function not be called by any more code and it should be removed as soon as possible, even if you're not willing to remove it today.
Comment 12 Alexey Proskuryakov 2011-09-10 00:13:18 PDT
I think that you'd be doing a disservice to WebKit contributors if you keep calling the function "deprecated". Eric has already wasted his time trying to remove it, and others can fall into the same trap.

In order to call existing code "junky", you need stronger evidence that it's incorrect - in the form of actually testing other browsers, or ideally in the form of testing the Internet.
Comment 13 Adam Barth 2011-09-10 00:27:33 PDT
Removing this function and it's associated paraphernalia is not a waste of time.  In fact, I would argue that it's the best thing for the project and for the web.  I wrote a long explanation, but I've deleted it because I don't feel like this is a productive line of discussion.
Comment 14 Adam Barth 2011-09-10 00:41:56 PDT
Created attachment 106969 [details]
Patch
Comment 15 Eric Seidel (no email) 2011-09-10 00:50:34 PDT
Comment on attachment 106969 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=106969&action=review

There is nothing contentious about this change.  This code is only used on the MAC and WIN ports and after investigations this evening is believed to be wrong, but isolating it to those ports for now will at least stop us from crashing here on other ports.

> Source/WebCore/loader/FrameLoader.cpp:2549
> +    // For a newly opened frame with an empty URL, encoding() should not be

You mean document->encoding(), no?
Comment 16 Adam Barth 2011-09-10 00:52:28 PDT
(In reply to comment #15)
> (From update of attachment 106969 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=106969&action=review
> 
> > Source/WebCore/loader/FrameLoader.cpp:2549
> > +    // For a newly opened frame with an empty URL, encoding() should not be
> 
> You mean document->encoding(), no?

This comment has rotted.  I'll update it before landing.
Comment 17 Adam Barth 2011-09-10 00:53:39 PDT
Committed r94902: <http://trac.webkit.org/changeset/94902>
Comment 18 Alexey Proskuryakov 2011-09-10 02:24:33 PDT
Eric, I think that you failed to exercise good judgement about what you should review and what you shouldn't.

This patch has multiple issues:
- investigation of other browsers behavior hasn't been performed, even though it has been explicitly requested;
- the patch has thus added a number of FIXME comments that have no basis in reality;
- it has a FIXME comment about CFNetwork-based ports behavior that you had no authority to make or approve;
- it introduces incorrect formatting of comments, even those that haven't been modified otherwise (we don't break lines at 80 characters);
- it does not do what bug title claims that it does.

This is an excessive amount of problems for a cleanup-only patch.
Comment 19 Adam Barth 2011-09-10 09:10:40 PDT
Look, if you don't want to participate in standards and you want to maintain Apply-only quirks, I can respect your desire for self-determination.  If you want to impune someone'e judgement because you don't like the format of the comments, that seems like you've lost the forest for the trees.
Comment 20 Alexey Proskuryakov 2011-09-10 10:33:53 PDT
So I've tested today's nightly of Firefox 9, and it of course respects referring frame encoding.
Comment 21 Adam Barth 2011-09-10 15:28:19 PDT
Look, we had this whole discussion in the IETF over six months ago.  I actually advocated the position you're espousing now.  There were many emails and lots of discussion about the pros and cons of various courses of action.  In the end, we decide not to use the referring page's encoding to understand the filename.  Standards are about compromise, and after that whole discussion I support the where the working group ended up.

It's a shame that you chose not to participate in the standards process, because that's the proper forum for these discussions.  Frankly, those of us who put in the time and effect to engage with other browser vendors and with the large standards community find your Monday morning quarterbacking distasteful.

Having said all that, no one is questioning your right to decide how the Apple port behaves in this regard.  If you want Apple's port to keep this quirk, I'm not going to stand in your way.

The patch we landed causes zero change in behavior for anyone.  From my point of view, the main benefit of this patch is that non-Apple ports no longer need to link in this code, which has been known to crash in the past.  That code was providing zero benefit to anyone besides Apple and now any stability risk imposed by executing it also resides squarely with Apple.

My hope is that in the long term you'll come around and decide both to participate in standards and to converge behavior with other browsers.  I can only lead you to water.  I can't make you drink.
Comment 22 Julian Reschke 2011-09-12 09:39:38 PDT
(In reply to comment #7)
> I don't know if there has ever been consensus. I certainly voiced concerns against the RFC, and didn't follow it afterwards to have a chance to agree. The blogs.msdn.com post has a fairly heated argument between RFC editor and Eric Lawrence, too.

It was the rough consensus of the IETF HTTPbis working group, based on discussions on that WG's mailing list. Some vendors showed up, some did not. 

And yes, Eric Lawrence was not happy with the direction, but in the end Microsoft couldn't make a proposal that would work better. In the meantime, IE9 added the RFC5987 encoding, so IE9 interoperates (*) with all other desktop browsers except Safari. (* with the restriction that they only support UTF-8)
 
> The same blogs.msdn.com post doesn't say anything about dropping support for non-UTF-8 raw bytes, and I doubt that they did that. Can you test that? Note that to test IE, you need to change the primary language on Windows.

I don't think any browser has dropped support for non-ASCII characters yet.

> FWIW, old school specs don't acknowledge the fact that browsers have and use default encodings, so you can never just go ahead and implement those to the letter. RFC 6266 in particular fails to mention default encoding even in non-normative section that discusses existing implementations.

Indeed it does not. If you believe that this would be helpful then please go ahead and propose additional text. RFC 6266 is likely going to be revised when HTTPbis is done.
Comment 23 Alexey Proskuryakov 2011-09-12 10:48:12 PDT
Thank you for returning this discussion to professional tone and constructive direction, Julian!

> Indeed it does not. If you believe that this would be helpful then please go ahead and
> propose additional text. RFC 6266 is likely going to be revised when HTTPbis is done.

It seems appropriate that RFCs are not going to talk about browser specific concepts like default encoding and frames, but I hope that both HTTPbis spec and RFC 6266 could provide hooks for applications (AKA higher level specs) to provide a default encoding. Not sure how exactly it should be worded, but for document encoding in particular, it's clear that a very substantial amount of Web pages aren't going to gain proper encoding declarations, and browsers will keep using other default encodings in foreseeable future. And for Content-Disposition, dropping such tailoring is quite contentious, too.

> so IE9 interoperates (*) with all other desktop browsers except Safari.

I haven't re-tested IE9 specifically, but I expect that in practice,  it interoperates will all major desktop browsers (except Chrome?) when using non-ASCII bytes in locale standard encoding, like earlier versions did. My opinion is that it would be good for network libraries beneath Safari to additionally support RFC 5987 encoding, but I cannot comment on whether or when that would happen.

FWIW, I'm going to sit on this for a little more to make sure that emotions don't go in the way, and likely roll out this patch. It's at intersection of my primary competencies, and I believe that I'm the right person to make the call on what the "proper" WebKit behavior should be. Ports are of course free to deviate by not taking the fallback list into account, just like they already did by Friday night.

We can re-consider once RFC 6266 actually becomes a standard, and other browsers actually stop using frame and/or default encoding for Content-Dispositon.
Comment 24 Adam Barth 2011-09-12 10:59:02 PDT
Please do not roll this patch out.  This code is actively harmful to non-Apple ports.  If you want to adjust the comments and rename the functions, that's fine, but the Chromium port, at the least, does not wish to execute these lines of code.
Comment 25 Alexey Proskuryakov 2011-09-12 11:07:02 PDT
I disagree that it's OK for any port to not execute these lines of code. If Chromium ifdefs this code out, there is a possibility of causing trouble to regular ports.
Comment 26 Alexey Proskuryakov 2011-09-12 11:09:00 PDT
It's especially undesirable to comment out regular code for a port that's used by commit queue.
Comment 27 Adam Barth 2011-09-12 11:13:47 PDT
> I disagree that it's OK for any port to not execute these lines of code.

I'm not sure I understand.  Why should a port be required to execute lines of code that do nothing and yet can crash (and have crashed in the past)?
Comment 28 Julian Reschke 2011-09-12 11:25:35 PDT
(In reply to comment #23)
> Thank you for returning this discussion to professional tone and constructive direction, Julian!

Just trying to help :-)

> > Indeed it does not. If you believe that this would be helpful then please go ahead and
> > propose additional text. RFC 6266 is likely going to be revised when HTTPbis is done.
> 
> It seems appropriate that RFCs are not going to talk about browser specific concepts like default encoding and frames, but I hope that both HTTPbis spec and RFC 6266 could provide hooks for applications (AKA higher level specs) to provide a default encoding. Not sure how exactly it should be worded, but for document encoding in particular, it's clear that a very substantial amount of Web pages aren't going to gain proper encoding declarations, and browsers will keep using other default encodings in foreseeable future. And for Content-Disposition, dropping such tailoring is quite contentious, too.

There are multiple problems with that. First of all, RFC2616 can be read to say that the default is ISO-8859-1, and there are UAs that respect that; and we do not want to break them. Furthermore, making the semantics of an HTTP message depend on out-of-band data is a no-go.

The encoding defined in RFCs 2231 and 5987 helps; it allows senders to tunnel everything through ASCII. Problem solved; except for the UAs that need to keep old senders happy.

> > so IE9 interoperates (*) with all other desktop browsers except Safari.
> 
> I haven't re-tested IE9 specifically, but I expect that in practice,  it interoperates will all major desktop browsers (except Chrome?) when using non-ASCII bytes in locale standard encoding, like earlier versions did. My opinion is that it would be good for network libraries beneath Safari to additionally support RFC 5987 encoding, but I cannot comment on whether or when that would happen.

If you look at <http://greenbytes.de/tech/tc2231/#attwithutf8fnplain> you will see that UAs differ in what they do when the page encoding is ISO-8859-1, but the bytes smell like UTF-8. There is no interop here.

> FWIW, I'm going to sit on this for a little more to make sure that emotions don't go in the way, and likely roll out this patch. It's at intersection of my primary competencies, and I believe that I'm the right person to make the call on what the "proper" WebKit behavior should be. Ports are of course free to deviate by not taking the fallback list into account, just like they already did by Friday night.
> 
> We can re-consider once RFC 6266 actually becomes a standard, and other browsers actually stop using frame and/or default encoding for Content-Dispositon.

For all practical purposes RFC 6266 is a "standard", unless you're referring to the three IETF standards maturity levels.
Comment 29 Alexey Proskuryakov 2011-09-12 11:42:42 PDT
Adam, I will post more complete rationale when/if rolling out the patch. I noted your disagreement, as well as the fact that two respected contributors were in favor of this change, so I want to take some time to consider what's best without haste. As mentioned above, I think that I have the authority to make a decision due to relevant expertise.

> Why should a port be required to execute lines of code that do nothing and yet can crash (and have crashed in the past)?

This is a tradeoff. We want to have as few #ifdefed out parts as possible, because #ifdefing out code paths has lots of well known downsides that I'm sure I don't need to tell you about. This is sometimes unavoidable when this code has high runtime costs, or is unequivocally of no interest to engineers working on other ports.
Comment 30 Alexey Proskuryakov 2011-09-12 11:54:57 PDT
> There are multiple problems with that. First of all, RFC2616 can be read to say that the
> default is ISO-8859-1, and there are UAs that respect that; and we do not want to break
> them. Furthermore, making the semantics of an HTTP message depend on out-of-band
> data is a no-go.

This disagreement between spec authors and browser vendors is quite unfortunate. This approach is not a no-go for us, as every significant browser already went with it.

> There is no interop here.

Right, we've previously established that we have different concepts of "interoperability" in mind. For me, it's when you can make a Web page that works for your target audience. You could easily achieve that in IE, Firefox and Safari by using pages and HTTP headers encoded with default encoding for your region. Unfortunately, RFC 6266 disallows this method, which used to be the only "interoperable" one historically.

Many authors only care about this kind of interoperability, not declaring page encoding at all.

> For all practical purposes RFC 6266 is a "standard", unless you're referring to the three IETF standards maturity levels.

Indeed, that's what I've been referring to. Someone just educated me that RFC maturity level doesn't matter all that much.
Comment 31 Julian Reschke 2011-09-12 12:05:18 PDT
(In reply to comment #30)
> > There are multiple problems with that. First of all, RFC2616 can be read to say that the
> > default is ISO-8859-1, and there are UAs that respect that; and we do not want to break
> > them. Furthermore, making the semantics of an HTTP message depend on out-of-band
> > data is a no-go.
> 
> This disagreement between spec authors and browser vendors is quite unfortunate. This approach is not a no-go for us, as every significant browser already went with it.

...in different ways.

> > There is no interop here.
> 
> Right, we've previously established that we have different concepts of "interoperability" in mind. For me, it's when you can make a Web page that works for your target audience. You could easily achieve that in IE, Firefox and Safari by using pages and HTTP headers encoded with default encoding for your region. Unfortunately, RFC 6266 disallows this method, which used to be the only "interoperable" one historically.

This method doesn't work for sites that need to work for a global audience. There's no reliable way to sniff the locale of the client.

If this *was* something that works reliably, why do sites like GMail use something else?

Also, RFC6266 isn't in any position to allow or disallow it; it doesn't invent a new header but just clarifies the description of something that has been there for a long time.


> Many authors only care about this kind of interoperability, not declaring page encoding at all.
> 
> > For all practical purposes RFC 6266 is a "standard", unless you're referring to the three IETF standards maturity levels.
> 
> Indeed, that's what I've been referring to. Someone just educated me that RFC maturity level doesn't matter all that much.

It doesn't matter as much as it should, as most specs never progress. But that's a separate story :-).
Comment 32 Julian Reschke 2012-01-18 06:12:46 PST
Here's a quick attempt at a test case:

  <http://greenbytes.de/tech/tc2231/greek.asis#attwithisofnplain>

The page is encoded in ISO-8859-7, and the actual test case linked from it uses non-ASCII characters in the filename parameter.

As far as I can tell, *Firefox* is the only UA that decodes this as ISO-8859-1.

Is there a problem with the test case?