Created attachment 286264[details]
Archive of layout-test-results from ews113 for mac-yosemite
The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews113 Port: mac-yosemite Platform: Mac OS X 10.10.5
Created attachment 286268[details]
Archive of layout-test-results from ews117 for mac-yosemite
The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews117 Port: mac-yosemite Platform: Mac OS X 10.10.5
Created attachment 286275[details]
Archive of layout-test-results from ews101 for mac-yosemite
The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews101 Port: mac-yosemite Platform: Mac OS X 10.10.5
Created attachment 286277[details]
Archive of layout-test-results from ews116 for mac-yosemite
The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews116 Port: mac-yosemite Platform: Mac OS X 10.10.5
Created attachment 286279[details]
Archive of layout-test-results from ews106 for mac-yosemite-wk2
The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews106 Port: mac-yosemite-wk2 Platform: Mac OS X 10.10.5
Created attachment 286285[details]
Archive of layout-test-results from ews124 for ios-simulator-elcapitan-wk2
The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews124 Port: ios-simulator-elcapitan-wk2 Platform: Mac OS X 10.11.5
Created attachment 286320[details]
Archive of layout-test-results from ews103 for mac-yosemite
The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews103 Port: mac-yosemite Platform: Mac OS X 10.10.5
Created attachment 286322[details]
Archive of layout-test-results from ews104 for mac-yosemite-wk2
The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews104 Port: mac-yosemite-wk2 Platform: Mac OS X 10.10.5
Created attachment 286324[details]
Archive of layout-test-results from ews112 for mac-yosemite
The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews112 Port: mac-yosemite Platform: Mac OS X 10.10.5
Created attachment 286342[details]
Archive of layout-test-results from ews122 for ios-simulator-elcapitan-wk2
The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews122 Port: ios-simulator-elcapitan-wk2 Platform: Mac OS X 10.11.5
(In reply to comment #37)
> Are all the "SUPPORTED (got: windows-1252)" results actually "unsupported"?
I guess it depends what you mean by "unsupported". If we failed to recognize the charset, we would fallback to "UTF-8". If we use "windows-1252" then I believe it means we recognized the label and decided to use windows-1252.
Created attachment 286355[details]
ICU labels support comparison (WK, WK w/ patch, Firefox)
Improved the excel sheet by:
- coloring browsers that agree with each other
- Dropping SUPPORTED / UNSUPPORTED as it was confusing
As you can see from the spreadsheet, when it comes to ICU labels:
- We are mostly aligned with Firefox
- In cases we are not aligned with Firefox, we maintain our previous behavior (the one in Safari 10).
- A lot of ICU labels give different results after my patch. However, in ALL cases, we are now consistent with Firefox.
All in all, this looks pretty good to me but I am no expert:
- We pass all W3C tests
- The new behavior is mostly aligned with Firefox and is aligned with Safari 10 otherwise.
If you see any encoding that is no longer supported with my patch but you think is important, I am happy to re-introduce them.
Created attachment 286358[details]
ICU labels support comparison (WK, WK w/ patch, Firefox, Chrome)
Spreadsheet now includes Chrome results too. Chrome is super close to Firefox.
There are however a few cases (marked in red) where Chrome agrees with our old behavior. Therefore, I'll update my patch to maintain our old behavior in this case.
Comment on attachment 286360[details]
Patch
I think the results are even more convincing now (with the Chrome results added to the spreadsheet and the couple of fixes to align better with Chrome).
Looks pretty convincing to me too.
I feel like windows-1252 is an artifact of testing approach, so maybe testing with data: subframes wasn't sufficiently accurate (sorry about that!). Unknown encoding names are ignored, and I expect that to be the case for all browsers.
Should probably use HTTP to generate Content-Type with a charset. That may also help with some other surprising results, such as ISO-2022, HZ and other complex Asian encodings.
I'm confused by what's going on with aliases that become "replacement" - it seems wrong, but I do not have a theory about what's going on.
I'm not sure if we have test coverage for visual vs. logical RTL encodings - this table alone doesn't tell the full story for encodings in ISO-8859-6 and ISO-8859-8 groups.
Comment on attachment 286360[details]
Patch
View in context: https://bugs.webkit.org/attachment.cgi?id=286360&action=review> Source/WebCore/platform/text/TextCodecICU.cpp:241
> + if (!strcmp(encodingName.name, "x-mac-turkish")) {
Are these Mac encodings also supported by TextCodecMac, using TEC?
> Source/WebCore/platform/text/TextCodecLatin1.cpp:-115
> - // ASCII and Latin-1 both decode as Windows Latin-1 although they retain unique identities.
Is this still accurate?
(In reply to comment #49)
> Looks pretty convincing to me too.
>
> I feel like windows-1252 is an artifact of testing approach, so maybe
> testing with data: subframes wasn't sufficiently accurate (sorry about
> that!). Unknown encoding names are ignored, and I expect that to be the case
> for all browsers.
>
> Should probably use HTTP to generate Content-Type with a charset. That may
> also help with some other surprising results, such as ISO-2022, HZ and other
> complex Asian encodings.
>
> I'm confused by what's going on with aliases that become "replacement" - it
> seems wrong, but I do not have a theory about what's going on.
>
> I'm not sure if we have test coverage for visual vs. logical RTL encodings -
> this table alone doesn't tell the full story for encodings in ISO-8859-6 and
> ISO-8859-8 groups.
I actually used http for the latest results because I could not get the data uri I frame to work in Chrome.
(In reply to comment #50)
> Comment on attachment 286360[details]
> Patch
>
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=286360&action=review
>
> > Source/WebCore/platform/text/TextCodecICU.cpp:241
> > + if (!strcmp(encodingName.name, "x-mac-turkish")) {
>
> Are these Mac encodings also supported by TextCodecMac, using TEC?
>
I will check but I remember I had to add them to make the layout tests pass.
> > Source/WebCore/platform/text/TextCodecLatin1.cpp:-115
> > - // ASCII and Latin-1 both decode as Windows Latin-1 although they retain unique identities.
>
> Is this still accurate?
I don't think so, I will drop the comment.
Comment on attachment 286376[details]
Patch
View in context: https://bugs.webkit.org/attachment.cgi?id=286376&action=review> LayoutTests/imported/w3c/ChangeLog:12
> + 651. The only checks we're failing is due to "Big5-HKSCS" not being an alias
> + to "Big5".
What prevents us from fixing this at this time?
> Source/WebCore/platform/text/TextCodecICU.cpp:79
> +DECLARE_ALIASES(UTF_8, "unicode-1-1-utf-8", "utf8", "unicode11utf8", "unicode20utf8", "x-unicode20utf8");
We don’t use ICU for UTF-8, so it’s peculiar to have this inside TextCodecICU.
> Source/WebCore/platform/text/TextCodecICU.cpp:100
> +DECLARE_ALIASES(windows_1252, "ansi_x3.4-1968", "ascii", "cp1252", "cp819", "csisolatin1", "ibm819", "iso-8859-1", "iso-ir-100", "iso8859-1", "iso88591", "iso_8859-1", "iso_8859-1:1987", "l1", "latin1", "us-ascii", "x-cp1252");
We don’t use ICU for this encoding, so it’s peculiar to have this inside TextCodecICU.
> Source/WebCore/platform/text/TextCodecICU.cpp:134
> + size_t aliasCount;
Seems overkill to use size_t for this. Could be uint8_t, for example. If there’s no actual cost to using 64-bits for these small counts, maybe this is fine.
I don't feel very good about the Big-5 situation. To just keep the current behavior, we should probably also keep windows-950 group aliases. But longer term, I just can't tell which of the variants is a superset of which, which is supported in Apple fonts, and whether the super naive algorithm in the encoding spec is actually the way to go.
(In reply to comment #64)
> I don't feel very good about the Big-5 situation. To just keep the current
> behavior, we should probably also keep windows-950 group aliases. But longer
> term, I just can't tell which of the variants is a superset of which, which
> is supported in Apple fonts, and whether the super naive algorithm in the
> encoding spec is actually the way to go.
Brought back windows-950 in the last iteration and updated the spreadsheet accordingly.
(In reply to comment #62)
> Comment on attachment 286376[details]
> Patch
>
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=286376&action=review
>
> > LayoutTests/imported/w3c/ChangeLog:12
> > + 651. The only checks we're failing is due to "Big5-HKSCS" not being an alias
> > + to "Big5".
>
> What prevents us from fixing this at this time?
>
> > Source/WebCore/platform/text/TextCodecICU.cpp:79
> > +DECLARE_ALIASES(UTF_8, "unicode-1-1-utf-8", "utf8", "unicode11utf8", "unicode20utf8", "x-unicode20utf8");
>
> We don’t use ICU for UTF-8, so it’s peculiar to have this inside
> TextCodecICU.
>
> > Source/WebCore/platform/text/TextCodecICU.cpp:100
> > +DECLARE_ALIASES(windows_1252, "ansi_x3.4-1968", "ascii", "cp1252", "cp819", "csisolatin1", "ibm819", "iso-8859-1", "iso-ir-100", "iso8859-1", "iso88591", "iso_8859-1", "iso_8859-1:1987", "l1", "latin1", "us-ascii", "x-cp1252");
>
> We don’t use ICU for this encoding, so it’s peculiar to have this inside
> TextCodecICU.
>
> > Source/WebCore/platform/text/TextCodecICU.cpp:134
> > + size_t aliasCount;
>
> Seems overkill to use size_t for this. Could be uint8_t, for example. If
> there’s no actual cost to using 64-bits for these small counts, maybe this
> is fine.
I dropped the encodings for which we don't use ICU and used unsigned for aliasCount.
Comment on attachment 286380[details]
Patch for landing
View in context: https://bugs.webkit.org/attachment.cgi?id=286380&action=review> Source/WebCore/platform/text/TextCodecICU.cpp:128
> + const char* const name;
Just noticed that the second const here is redundant and not needed since this is a member of a struct which itself is only used const. The other two members are not marked const, nor need they be.
2016-08-16 20:19 PDT, Chris Dumez
2016-08-16 20:43 PDT, Build Bot
2016-08-16 20:56 PDT, Chris Dumez
2016-08-16 21:34 PDT, Build Bot
2016-08-16 21:36 PDT, Chris Dumez
2016-08-16 22:11 PDT, Chris Dumez
2016-08-16 22:39 PDT, Build Bot
2016-08-16 22:57 PDT, Build Bot
2016-08-16 23:01 PDT, Build Bot
2016-08-17 00:01 PDT, Build Bot
2016-08-17 12:49 PDT, Chris Dumez
2016-08-17 13:18 PDT, Build Bot
2016-08-17 13:40 PDT, Build Bot
2016-08-17 13:49 PDT, Build Bot
2016-08-17 14:06 PDT, Chris Dumez
2016-08-17 16:04 PDT, Chris Dumez
2016-08-17 16:04 PDT, Chris Dumez
2016-08-17 16:05 PDT, Chris Dumez
2016-08-17 16:05 PDT, Chris Dumez
2016-08-17 16:15 PDT, Chris Dumez
2016-08-17 16:22 PDT, Chris Dumez
2016-08-17 16:25 PDT, Build Bot
2016-08-17 17:02 PDT, Chris Dumez
2016-08-17 17:03 PDT, Chris Dumez
2016-08-17 19:02 PDT, Chris Dumez
2016-08-17 19:09 PDT, Chris Dumez
2016-08-17 19:45 PDT, Chris Dumez
2016-08-17 19:58 PDT, Chris Dumez
2016-08-17 19:59 PDT, Chris Dumez
2016-08-18 09:27 PDT, Chris Dumez
2016-08-18 09:28 PDT, Chris Dumez
2016-08-18 10:39 PDT, Chris Dumez
2016-08-18 10:41 PDT, Chris Dumez
2016-08-18 11:39 PDT, Chris Dumez
2016-08-18 11:39 PDT, Chris Dumez