WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
159581
Stop supporting compressed character sets BOCU-1 and SCSU
https://bugs.webkit.org/show_bug.cgi?id=159581
Summary
Stop supporting compressed character sets BOCU-1 and SCSU
John Wilander
Reported
2016-07-08 13:53:50 PDT
WebKit still supports the compressed character sets BOCU-1 and SCSU whereas Chrome and Firefox don't. These old formats may pass server-side character filters while still rendering in WebKit.
Attachments
Patch
(6.94 KB, patch)
2016-07-08 16:36 PDT
,
John Wilander
no flags
Details
Formatted Diff
Diff
Patch
(7.76 KB, patch)
2016-07-08 16:42 PDT
,
John Wilander
no flags
Details
Formatted Diff
Diff
Patch
(8.36 KB, patch)
2016-07-11 14:12 PDT
,
John Wilander
no flags
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Myles C. Maxfield
Comment 1
2016-07-08 14:55:34 PDT
We shouldn't remove support for this without knowing the impact of which/how many sites will break.
John Wilander
Comment 2
2016-07-08 16:36:59 PDT
Created
attachment 283218
[details]
Patch
John Wilander
Comment 3
2016-07-08 16:40:37 PDT
Comment on
attachment 283218
[details]
Patch webkit-patch complained about the characters in one of the test file's names and so I changed it but didn't add it back in. New patch coming up.
John Wilander
Comment 4
2016-07-08 16:42:12 PDT
Created
attachment 283221
[details]
Patch
John Wilander
Comment 5
2016-07-08 16:44:49 PDT
Subversion decided to fiddle with the test file. I might have to pull it down in whatever format it is and see that it tests the right thing.
Alexey Proskuryakov
Comment 6
2016-07-08 17:15:35 PDT
Comment on
attachment 283221
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=283221&action=review
> LayoutTests/ChangeLog:17 > + * http/tests/misc/resources/scsu-moscow.html: Added.
A binary file wouldn't be scanned for a <meta>. Do you need the charset to be defined by HTTP Content-Type for this test?
> LayoutTests/http/tests/misc/resources/bocu-1.html:7 > +%8C%C3%B3%C2%B9%C0%C4%8E%B1%BC%B5%C2%C4x%81y%8C%7F%C3%B3%C2%B9%C0%C4%8E
When I open this file in shipping Safari, I get some garbage text. What is it expected to be?
John Wilander
Comment 7
2016-07-11 08:51:23 PDT
(In reply to
comment #6
)
> Comment on
attachment 283221
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=283221&action=review
> > > LayoutTests/ChangeLog:17 > > + * http/tests/misc/resources/scsu-moscow.html: Added. > > A binary file wouldn't be scanned for a <meta>. Do you need the charset to > be defined by HTTP Content-Type for this test? > > > LayoutTests/http/tests/misc/resources/bocu-1.html:7 > > +%8C%C3%B3%C2%B9%C0%C4%8E%B1%BC%B5%C2%C4x%81y%8C%7F%C3%B3%C2%B9%C0%C4%8E > > When I open this file in shipping Safari, I get some garbage text. What is > it expected to be?
It was an attempt to use a PoC string for BOCU-1 but I decided to not spend more time on it. Without support for BOCU-1 you will see the actual characters (%8C…). But you are right, either both test cases should show something readable if the charsets *are* supported or if they are *not* supported. Not a mix as it is now.
John Wilander
Comment 8
2016-07-11 08:56:32 PDT
(In reply to
comment #6
)
> Comment on
attachment 283221
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=283221&action=review
> > > LayoutTests/ChangeLog:17 > > + * http/tests/misc/resources/scsu-moscow.html: Added. > > A binary file wouldn't be scanned for a <meta>. Do you need the charset to > be defined by HTTP Content-Type for this test?
Sorry, should reply to this too. It might be that I have to set the charset to something to get it to pick up the meta charset. That is the behavior I saw – a HTTP Content-Type "text/html; charset=bogus" plus a <meta charset=SCSU>. I will switch to a PHP test case and do this.
John Wilander
Comment 9
2016-07-11 09:00:50 PDT
I will add a reference to the HTML spec in the new patch. It explicitly says we should not support BOCU-1 nor SCSU: "The above prohibits supporting, for example, CESU-8, UTF-7, BOCU-1, SCSU, EBCDIC, and UTF-32."
https://html.spec.whatwg.org/#character-encodings
We support a bunch of EBCDIC charsets, CESU-8, and UTF-32 too. Either we open a new bug for those or bundle into this patch.
Alexey Proskuryakov
Comment 10
2016-07-11 09:55:11 PDT
Better to do multiple patches - it's easier to find site compatibility regression culprits when changes aren't bundled.
John Wilander
Comment 11
2016-07-11 14:12:41 PDT
Created
attachment 283341
[details]
Patch
John Wilander
Comment 12
2016-07-11 16:00:31 PDT
(In reply to
comment #10
)
> Better to do multiple patches - it's easier to find site compatibility > regression culprits when changes aren't bundled.
Filed
https://bugs.webkit.org/show_bug.cgi?id=159651
.
Brent Fulgham
Comment 13
2016-07-25 16:26:49 PDT
(In reply to
comment #1
)
> We shouldn't remove support for this without knowing the impact of which/how > many sites will break.
Given that: (1) Chrome and Firefox already block these characters sets (2) The HTML spec explicitly prohibits these character sets ... why are we concerned there is a compatibility issue here?
Brent Fulgham
Comment 14
2016-07-25 16:29:08 PDT
Comment on
attachment 283341
[details]
Patch r=me. Do you plan on adding the other "forbidden" character sets in a future patch?
John Wilander
Comment 15
2016-07-25 16:58:42 PDT
(In reply to
comment #14
)
> Comment on
attachment 283341
[details]
> Patch > > r=me. Do you plan on adding the other "forbidden" character sets in a future > patch?
Yes,
https://bugs.webkit.org/show_bug.cgi?id=159651
.
Myles C. Maxfield
Comment 16
2016-07-25 17:54:04 PDT
(In reply to
comment #13
)
> (In reply to
comment #1
) > > We shouldn't remove support for this without knowing the impact of which/how > > many sites will break. > > Given that: > (1) Chrome and Firefox already block these characters sets
Historically, this has been an imperfect signal for these kinds of decisions. I know at least one case where we refuse to break sites which all the other browsers break.
> (2) The HTML spec explicitly prohibits these character sets
The spec is a poor signal for compatibility.
> > ... why are we concerned there is a compatibility issue here?
Brent Fulgham
Comment 17
2016-07-25 19:48:16 PDT
(In reply to
comment #16
)
> (In reply to
comment #13
) > > (In reply to
comment #1
) > > > We shouldn't remove support for this without knowing the impact of which/how > > > many sites will break. > > > > Given that: > > (1) Chrome and Firefox already block these characters sets > > Historically, this has been an imperfect signal for these kinds of > decisions. I know at least one case where we refuse to break sites which all > the other
Can you identify any websites that require these character sets? Who would we break with this change?
Myles C. Maxfield
Comment 18
2016-07-25 21:59:13 PDT
(In reply to
comment #17
)
> (In reply to
comment #16
) > > (In reply to
comment #13
) > > > (In reply to
comment #1
) > > > > We shouldn't remove support for this without knowing the impact of which/how > > > > many sites will break. > > > > > > Given that: > > > (1) Chrome and Firefox already block these characters sets > > > > Historically, this has been an imperfect signal for these kinds of > > decisions. I know at least one case where we refuse to break sites which all > > the other > > Can you identify any websites that require these character sets? Who would > we break with this change?
This is exactly the question I asked way above (in the second comment). The burden of proof should be on the one removing functionality, not the other way around. It seems to me we shouldn't remove functionality if we have no idea what will break.
Myles C. Maxfield
Comment 19
2016-07-25 22:03:39 PDT
(In reply to
comment #18
)
> (In reply to
comment #17
) > > (In reply to
comment #16
) > > > (In reply to
comment #13
) > > > > (In reply to
comment #1
) > > > > > We shouldn't remove support for this without knowing the impact of which/how > > > > > many sites will break. > > > > > > > > Given that: > > > > (1) Chrome and Firefox already block these characters sets > > > > > > Historically, this has been an imperfect signal for these kinds of > > > decisions. I know at least one case where we refuse to break sites which all > > > the other > > > > Can you identify any websites that require these character sets? Who would > > we break with this change? > > This is exactly the question I asked way above (in the second comment). > > The burden of proof should be on the one removing functionality, not the > other way around. > > It seems to me we shouldn't remove functionality if we have no idea what > will break.
I should probably clarify this: Removing functionality might have a very serious impact, or maybe none at all. Improving security will have a valuable benefit. Without knowing more information, it seems difficult to perform the calculus of whether or not this change should be committed. I'm not against the change - I am definitely pro-security! I just hesitate when we are considering willingly breaking an unknown number of websites.
Brent Fulgham
Comment 20
2016-07-25 23:28:37 PDT
(In reply to
comment #19
)
> > I'm not against the change - I am definitely pro-security! I just hesitate > when we are considering willingly breaking an unknown number of websites.
The problem here is that we could imagine any number of changes that could possibly break a website. We need to weigh the security benefits against the risk of causing harm. As I see it, we have a change that: 1. Closes a known security hole. 2. Brings us into compatibility with how the other major browsers work. 3. Brings us into agreement with the specification. Those seem like three significant reasons to make this change. On the other side, we have the hypothetical risk of breaking websites that rely on these disallowed character sets. As far as I know, we are unaware of any such sites existing, nor is there compelling reason that someone would choose to implement a site relying on them. I also do not see a way to know whether websites will break with this change, since there is no catalog of websites using particular character sets. I think the right choice here is to land the change, and let users of Safari Technology Preview or the Gtk-based browsers report any problems they encounter.
Myles C. Maxfield
Comment 21
2016-07-26 00:32:16 PDT
> I also do not see a way to know whether websites will break with this > change
We need a way to know the ramifications of our changes without breaking the product for our users. If such a mechanism does not exist right now, then all we can do is commit and cross our fingers.
WebKit Commit Bot
Comment 22
2016-07-26 09:58:35 PDT
Comment on
attachment 283341
[details]
Patch Clearing flags on attachment: 283341 Committed
r203725
: <
http://trac.webkit.org/changeset/203725
>
WebKit Commit Bot
Comment 23
2016-07-26 09:58:40 PDT
All reviewed patches have been landed. Closing bug.
Darin Adler
Comment 24
2016-07-26 10:38:52 PDT
Comment on
attachment 283341
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=283341&action=review
> Source/WebCore/ChangeLog:13 > + The HTML specification says "The above prohibits supporting, for example, > + CESU-8, UTF-7, BOCU-1, SCSU, EBCDIC, and UTF-32."
Could we please add tests that cover these other examples of prohibited character encodings, even the ones that are not explicitly included in the list? Maybe we could find a way to write a single test that iterates a list of encoding names and checks that each is not supported; then we can add names for all sorts of character sets that we should not support.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug