Align our encoding labels with the encoding specification: - https://encoding.spec.whatwg.org/#names-and-labels This also aligns with Firefox and Chrome.
Created attachment 286260 [details] Patch
Comment on attachment 286260 [details] Patch Attachment 286260 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/1884877 Number of test failures exceeded the failure limit.
Created attachment 286264 [details] Archive of layout-test-results from ews113 for mac-yosemite The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews113 Port: mac-yosemite Platform: Mac OS X 10.10.5
Created attachment 286265 [details] Patch
Comment on attachment 286265 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=286265&action=review > Source/WebCore/ChangeLog:9 > + Align our encoding labels with the encoding specification: > + - https://encoding.spec.whatwg.org/#names-and-labels The best approach here would be to have our own encoding name table, and never iterate ICU codecs.
(In reply to comment #5) > Comment on attachment 286265 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=286265&action=review > > > Source/WebCore/ChangeLog:9 > > + Align our encoding labels with the encoding specification: > > + - https://encoding.spec.whatwg.org/#names-and-labels > > The best approach here would be to have our own encoding name table, and > never iterate ICU codecs. Ok, I will try and refactor tomorrow then.
Comment on attachment 286265 [details] Patch Attachment 286265 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/1885077 Number of test failures exceeded the failure limit.
Created attachment 286268 [details] Archive of layout-test-results from ews117 for mac-yosemite The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews117 Port: mac-yosemite Platform: Mac OS X 10.10.5
Created attachment 286269 [details] Patch
Created attachment 286274 [details] Patch
Comment on attachment 286274 [details] Patch Attachment 286274 [details] did not pass mac-ews (mac): Output: http://webkit-queues.webkit.org/results/1885316 New failing tests: fast/encoding/utf-32-big-endian-bom.html http/tests/misc/submit-get-in-utf32le.html http/tests/misc/url-in-utf32le.html http/tests/misc/submit-post-in-utf32be.html fast/encoding/char-decoding.html fast/encoding/char-decoding-mac.html http/tests/misc/submit-get-in-utf32be.html fast/encoding/utf-32-big-endian-nobom.xml fast/encoding/char-encoding.html fast/encoding/GBK/EUC-CN.html http/tests/misc/submit-post-in-utf32le.html fast/encoding/utf-32-little-endian-bom.html fast/encoding/utf-32-little-endian-nobom.xml fast/encoding/char-encoding-mac.html http/tests/misc/url-in-utf32be.html
Created attachment 286275 [details] Archive of layout-test-results from ews101 for mac-yosemite The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews101 Port: mac-yosemite Platform: Mac OS X 10.10.5
Comment on attachment 286274 [details] Patch Attachment 286274 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/1885366 Number of test failures exceeded the failure limit.
Created attachment 286277 [details] Archive of layout-test-results from ews116 for mac-yosemite The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews116 Port: mac-yosemite Platform: Mac OS X 10.10.5
Comment on attachment 286274 [details] Patch Attachment 286274 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.webkit.org/results/1885394 New failing tests: fast/encoding/utf-32-big-endian-bom.html http/tests/misc/submit-get-in-utf32le.html http/tests/misc/url-in-utf32le.html http/tests/misc/submit-post-in-utf32be.html fast/encoding/char-decoding.html fast/encoding/char-decoding-mac.html fast/encoding/char-encoding-mac.html fast/encoding/utf-32-little-endian-nobom.xml fast/encoding/char-encoding.html fast/encoding/GBK/EUC-CN.html http/tests/misc/submit-post-in-utf32le.html fast/encoding/utf-32-little-endian-bom.html fast/encoding/utf-32-big-endian-nobom.xml http/tests/misc/submit-get-in-utf32be.html http/tests/misc/url-in-utf32be.html
Created attachment 286279 [details] Archive of layout-test-results from ews106 for mac-yosemite-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews106 Port: mac-yosemite-wk2 Platform: Mac OS X 10.10.5
Comment on attachment 286274 [details] Patch Attachment 286274 [details] did not pass ios-sim-ews (ios-simulator-wk2): Output: http://webkit-queues.webkit.org/results/1885471 New failing tests: fast/encoding/utf-32-big-endian-bom.html http/tests/misc/submit-get-in-utf32le.html http/tests/misc/url-in-utf32le.html http/tests/misc/submit-post-in-utf32be.html fast/encoding/char-decoding.html fast/encoding/char-decoding-mac.html http/tests/misc/submit-get-in-utf32be.html fast/encoding/utf-32-big-endian-nobom.xml fast/encoding/char-encoding.html fast/encoding/GBK/EUC-CN.html http/tests/misc/submit-post-in-utf32le.html fast/encoding/utf-32-little-endian-bom.html fast/encoding/utf-32-little-endian-nobom.xml fast/encoding/char-encoding-mac.html http/tests/misc/url-in-utf32be.html
Created attachment 286285 [details] Archive of layout-test-results from ews124 for ios-simulator-elcapitan-wk2 The attached test failures were seen while running run-webkit-tests on the ios-sim-ews. Bot: ews124 Port: ios-simulator-elcapitan-wk2 Platform: Mac OS X 10.11.5
Created attachment 286317 [details] Patch
Comment on attachment 286317 [details] Patch Attachment 286317 [details] did not pass mac-ews (mac): Output: http://webkit-queues.webkit.org/results/1888764 New failing tests: fast/encoding/utf-32-big-endian-bom.html http/tests/misc/submit-get-in-utf32le.html http/tests/misc/url-in-utf32le.html http/tests/misc/submit-post-in-utf32be.html fast/encoding/char-decoding.html fast/encoding/char-decoding-mac.html http/tests/misc/submit-get-in-utf32be.html fast/encoding/utf-32-big-endian-nobom.xml fast/encoding/char-encoding.html fast/encoding/GBK/EUC-CN.html http/tests/misc/submit-post-in-utf32le.html fast/encoding/utf-32-little-endian-bom.html fast/encoding/utf-32-little-endian-nobom.xml fast/encoding/char-encoding-mac.html http/tests/misc/url-in-utf32be.html
Created attachment 286320 [details] Archive of layout-test-results from ews103 for mac-yosemite The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews103 Port: mac-yosemite Platform: Mac OS X 10.10.5
Comment on attachment 286317 [details] Patch Attachment 286317 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.webkit.org/results/1888848 New failing tests: fast/encoding/utf-32-big-endian-bom.html http/tests/misc/submit-get-in-utf32le.html http/tests/misc/url-in-utf32le.html http/tests/misc/submit-post-in-utf32be.html fast/encoding/char-decoding.html fast/encoding/char-decoding-mac.html http/tests/misc/submit-get-in-utf32be.html fast/encoding/utf-32-big-endian-nobom.xml fast/encoding/char-encoding.html fast/encoding/GBK/EUC-CN.html http/tests/misc/submit-post-in-utf32le.html fast/encoding/utf-32-little-endian-bom.html fast/encoding/utf-32-little-endian-nobom.xml fast/encoding/char-encoding-mac.html http/tests/misc/url-in-utf32be.html
Created attachment 286322 [details] Archive of layout-test-results from ews104 for mac-yosemite-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews104 Port: mac-yosemite-wk2 Platform: Mac OS X 10.10.5
Comment on attachment 286317 [details] Patch Attachment 286317 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/1888859 New failing tests: fast/encoding/utf-32-big-endian-bom.html http/tests/misc/submit-get-in-utf32le.html http/tests/misc/url-in-utf32le.html http/tests/misc/submit-post-in-utf32be.html fast/encoding/char-decoding.html fast/encoding/char-decoding-mac.html http/tests/misc/submit-get-in-utf32be.html fast/encoding/utf-32-big-endian-nobom.xml fast/encoding/char-encoding.html fast/encoding/GBK/EUC-CN.html http/tests/misc/submit-post-in-utf32le.html fast/encoding/utf-32-little-endian-bom.html fast/encoding/utf-32-little-endian-nobom.xml fast/encoding/char-encoding-mac.html http/tests/misc/url-in-utf32be.html
Created attachment 286324 [details] Archive of layout-test-results from ews112 for mac-yosemite The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews112 Port: mac-yosemite Platform: Mac OS X 10.10.5
Created attachment 286327 [details] Patch
Created attachment 286335 [details] Test page to check ICU encodings support
Created attachment 286336 [details] Safari 10 ICU encodings support
Created attachment 286337 [details] Firefox 48 ICU encodings support
Created attachment 286338 [details] WebKit w/ patch ICU encodings support
Created attachment 286340 [details] ICU changes due to patch (side by side diff)
Created attachment 286341 [details] ICU changes due to patch (side by side diff)
Comment on attachment 286327 [details] Patch Attachment 286327 [details] did not pass ios-sim-ews (ios-simulator-wk2): Output: http://webkit-queues.webkit.org/results/1889304 New failing tests: fast/scrolling/ios/scrollTo-at-page-load.html
Created attachment 286342 [details] Archive of layout-test-results from ews122 for ios-simulator-elcapitan-wk2 The attached test failures were seen while running run-webkit-tests on the ios-sim-ews. Bot: ews122 Port: ios-simulator-elcapitan-wk2 Platform: Mac OS X 10.11.5
Created attachment 286345 [details] ICU labels support comparison (WK, WK w/ patch, Firefox)
Created attachment 286346 [details] Test page to check ICU encodings support
Are all the "SUPPORTED (got: windows-1252)" results actually "unsupported"?
I think that non-ASCII compatible encodings like ISO_2022 need to be tested separately, they won't work with a data: URL.
I'm actually not sure what "ISO_2022,locale=ja,version=0" is and whether it is ASCII compatible.
(In reply to comment #37) > Are all the "SUPPORTED (got: windows-1252)" results actually "unsupported"? I guess it depends what you mean by "unsupported". If we failed to recognize the charset, we would fallback to "UTF-8". If we use "windows-1252" then I believe it means we recognized the label and decided to use windows-1252.
Created attachment 286355 [details] ICU labels support comparison (WK, WK w/ patch, Firefox) Improved the excel sheet by: - coloring browsers that agree with each other - Dropping SUPPORTED / UNSUPPORTED as it was confusing
As you can see from the spreadsheet, when it comes to ICU labels: - We are mostly aligned with Firefox - In cases we are not aligned with Firefox, we maintain our previous behavior (the one in Safari 10). - A lot of ICU labels give different results after my patch. However, in ALL cases, we are now consistent with Firefox. All in all, this looks pretty good to me but I am no expert: - We pass all W3C tests - The new behavior is mostly aligned with Firefox and is aligned with Safari 10 otherwise. If you see any encoding that is no longer supported with my patch but you think is important, I am happy to re-introduce them.
Created attachment 286357 [details] Test page to check ICU encodings support
I am updating the spreadsheet with Chrome results too.
Created attachment 286358 [details] ICU labels support comparison (WK, WK w/ patch, Firefox, Chrome) Spreadsheet now includes Chrome results too. Chrome is super close to Firefox. There are however a few cases (marked in red) where Chrome agrees with our old behavior. Therefore, I'll update my patch to maintain our old behavior in this case.
Created attachment 286360 [details] Patch
Created attachment 286361 [details] ICU labels support comparison (WK, WK w/ patch, Firefox, Chrome)
Comment on attachment 286360 [details] Patch I think the results are even more convincing now (with the Chrome results added to the spreadsheet and the couple of fixes to align better with Chrome).
Looks pretty convincing to me too. I feel like windows-1252 is an artifact of testing approach, so maybe testing with data: subframes wasn't sufficiently accurate (sorry about that!). Unknown encoding names are ignored, and I expect that to be the case for all browsers. Should probably use HTTP to generate Content-Type with a charset. That may also help with some other surprising results, such as ISO-2022, HZ and other complex Asian encodings. I'm confused by what's going on with aliases that become "replacement" - it seems wrong, but I do not have a theory about what's going on. I'm not sure if we have test coverage for visual vs. logical RTL encodings - this table alone doesn't tell the full story for encodings in ISO-8859-6 and ISO-8859-8 groups.
Comment on attachment 286360 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=286360&action=review > Source/WebCore/platform/text/TextCodecICU.cpp:241 > + if (!strcmp(encodingName.name, "x-mac-turkish")) { Are these Mac encodings also supported by TextCodecMac, using TEC? > Source/WebCore/platform/text/TextCodecLatin1.cpp:-115 > - // ASCII and Latin-1 both decode as Windows Latin-1 although they retain unique identities. Is this still accurate?
(In reply to comment #49) > Looks pretty convincing to me too. > > I feel like windows-1252 is an artifact of testing approach, so maybe > testing with data: subframes wasn't sufficiently accurate (sorry about > that!). Unknown encoding names are ignored, and I expect that to be the case > for all browsers. > > Should probably use HTTP to generate Content-Type with a charset. That may > also help with some other surprising results, such as ISO-2022, HZ and other > complex Asian encodings. > > I'm confused by what's going on with aliases that become "replacement" - it > seems wrong, but I do not have a theory about what's going on. > > I'm not sure if we have test coverage for visual vs. logical RTL encodings - > this table alone doesn't tell the full story for encodings in ISO-8859-6 and > ISO-8859-8 groups. I actually used http for the latest results because I could not get the data uri I frame to work in Chrome.
(In reply to comment #50) > Comment on attachment 286360 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=286360&action=review > > > Source/WebCore/platform/text/TextCodecICU.cpp:241 > > + if (!strcmp(encodingName.name, "x-mac-turkish")) { > > Are these Mac encodings also supported by TextCodecMac, using TEC? > I will check but I remember I had to add them to make the layout tests pass. > > Source/WebCore/platform/text/TextCodecLatin1.cpp:-115 > > - // ASCII and Latin-1 both decode as Windows Latin-1 although they retain unique identities. > > Is this still accurate? I don't think so, I will drop the comment.
Created attachment 286371 [details] Patch Added back support for "x-macroman" -> "macintosh" alias.
Created attachment 286372 [details] ICU labels support comparison (WK, WK w/ patch, Firefox, Chrome)
Relevant mozilla bug for Big5-HKSCS -> Big5 merge: https://bugzilla.mozilla.org/show_bug.cgi?id=912470
(In reply to comment #55) > Relevant mozilla bug for Big5-HKSCS -> Big5 merge: > https://bugzilla.mozilla.org/show_bug.cgi?id=912470 http://app.midland.com.hk/residential_ebook/%A4%F1%B5%D8%A7Q%A4s-beverly-hill is an example of page using Big5-HKSCS charset.
(In reply to comment #56) > (In reply to comment #55) > > Relevant mozilla bug for Big5-HKSCS -> Big5 merge: > > https://bugzilla.mozilla.org/show_bug.cgi?id=912470 > > http://app.midland.com.hk/residential_ebook/%A4%F1%B5%D8%A7Q%A4s-beverly- > hill is an example of page using Big5-HKSCS charset. I personally cannot spot any difference with our without my change on this Big5-HKSCS page.
(In reply to comment #57) > (In reply to comment #56) > > (In reply to comment #55) > > > Relevant mozilla bug for Big5-HKSCS -> Big5 merge: > > > https://bugzilla.mozilla.org/show_bug.cgi?id=912470 > > > > http://app.midland.com.hk/residential_ebook/%A4%F1%B5%D8%A7Q%A4s-beverly- > > hill is an example of page using Big5-HKSCS charset. > > I personally cannot spot any difference with our without my change on this > Big5-HKSCS page. I do see some differences on http://www.nwstbus.com.hk/fareConcession/concession_detail.aspx?id=49&intLangID=2 though. The character coming after "厚德" seems to be missing with my change.
(In reply to comment #58) > (In reply to comment #57) > > (In reply to comment #56) > > > (In reply to comment #55) > > > > Relevant mozilla bug for Big5-HKSCS -> Big5 merge: > > > > https://bugzilla.mozilla.org/show_bug.cgi?id=912470 > > > > > > http://app.midland.com.hk/residential_ebook/%A4%F1%B5%D8%A7Q%A4s-beverly- > > > hill is an example of page using Big5-HKSCS charset. > > > > I personally cannot spot any difference with our without my change on this > > Big5-HKSCS page. > > I do see some differences on > http://www.nwstbus.com.hk/fareConcession/concession_detail. > aspx?id=49&intLangID=2 though. The character coming after "厚德" seems to be > missing with my change. I'll stop marking Big5-HKSCS be an alias for Big5 as it seems to break rendering for some sites. I'll go back to the old behavior of using Big5-HKSCS codec.
Created attachment 286376 [details] Patch Big5-HKSCS is no longer an alias to Big5 to maintain our previous behavior and avoid breakage.
Created attachment 286377 [details] ICU labels support comparison (WK, WK w/ patch, Firefox, Chrome)
Comment on attachment 286376 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=286376&action=review > LayoutTests/imported/w3c/ChangeLog:12 > + 651. The only checks we're failing is due to "Big5-HKSCS" not being an alias > + to "Big5". What prevents us from fixing this at this time? > Source/WebCore/platform/text/TextCodecICU.cpp:79 > +DECLARE_ALIASES(UTF_8, "unicode-1-1-utf-8", "utf8", "unicode11utf8", "unicode20utf8", "x-unicode20utf8"); We don’t use ICU for UTF-8, so it’s peculiar to have this inside TextCodecICU. > Source/WebCore/platform/text/TextCodecICU.cpp:100 > +DECLARE_ALIASES(windows_1252, "ansi_x3.4-1968", "ascii", "cp1252", "cp819", "csisolatin1", "ibm819", "iso-8859-1", "iso-ir-100", "iso8859-1", "iso88591", "iso_8859-1", "iso_8859-1:1987", "l1", "latin1", "us-ascii", "x-cp1252"); We don’t use ICU for this encoding, so it’s peculiar to have this inside TextCodecICU. > Source/WebCore/platform/text/TextCodecICU.cpp:134 > + size_t aliasCount; Seems overkill to use size_t for this. Could be uint8_t, for example. If there’s no actual cost to using 64-bits for these small counts, maybe this is fine.
Comment on attachment 286376 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=286376&action=review >> LayoutTests/imported/w3c/ChangeLog:12 >> + to "Big5". > > What prevents us from fixing this at this time? Oh, I see your earlier comments about that now.
I don't feel very good about the Big-5 situation. To just keep the current behavior, we should probably also keep windows-950 group aliases. But longer term, I just can't tell which of the variants is a superset of which, which is supported in Apple fonts, and whether the super naive algorithm in the encoding spec is actually the way to go.
Created attachment 286380 [details] Patch for landing
Created attachment 286381 [details] ICU labels support comparison (WK, WK w/ patch, Firefox, Chrome)
(In reply to comment #64) > I don't feel very good about the Big-5 situation. To just keep the current > behavior, we should probably also keep windows-950 group aliases. But longer > term, I just can't tell which of the variants is a superset of which, which > is supported in Apple fonts, and whether the super naive algorithm in the > encoding spec is actually the way to go. Brought back windows-950 in the last iteration and updated the spreadsheet accordingly.
(In reply to comment #62) > Comment on attachment 286376 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=286376&action=review > > > LayoutTests/imported/w3c/ChangeLog:12 > > + 651. The only checks we're failing is due to "Big5-HKSCS" not being an alias > > + to "Big5". > > What prevents us from fixing this at this time? > > > Source/WebCore/platform/text/TextCodecICU.cpp:79 > > +DECLARE_ALIASES(UTF_8, "unicode-1-1-utf-8", "utf8", "unicode11utf8", "unicode20utf8", "x-unicode20utf8"); > > We don’t use ICU for UTF-8, so it’s peculiar to have this inside > TextCodecICU. > > > Source/WebCore/platform/text/TextCodecICU.cpp:100 > > +DECLARE_ALIASES(windows_1252, "ansi_x3.4-1968", "ascii", "cp1252", "cp819", "csisolatin1", "ibm819", "iso-8859-1", "iso-ir-100", "iso8859-1", "iso88591", "iso_8859-1", "iso_8859-1:1987", "l1", "latin1", "us-ascii", "x-cp1252"); > > We don’t use ICU for this encoding, so it’s peculiar to have this inside > TextCodecICU. > > > Source/WebCore/platform/text/TextCodecICU.cpp:134 > > + size_t aliasCount; > > Seems overkill to use size_t for this. Could be uint8_t, for example. If > there’s no actual cost to using 64-bits for these small counts, maybe this > is fine. I dropped the encodings for which we don't use ICU and used unsigned for aliasCount.
Comment on attachment 286380 [details] Patch for landing Clearing flags on attachment: 286380 Committed r204605: <http://trac.webkit.org/changeset/204605>
All reviewed patches have been landed. Closing bug.
Comment on attachment 286380 [details] Patch for landing View in context: https://bugs.webkit.org/attachment.cgi?id=286380&action=review > Source/WebCore/platform/text/TextCodecICU.cpp:128 > + const char* const name; Just noticed that the second const here is redundant and not needed since this is a member of a struct which itself is only used const. The other two members are not marked const, nor need they be.
(In reply to comment #71) > Comment on attachment 286380 [details] > Patch for landing > > View in context: > https://bugs.webkit.org/attachment.cgi?id=286380&action=review > > > Source/WebCore/platform/text/TextCodecICU.cpp:128 > > + const char* const name; > > Just noticed that the second const here is redundant and not needed since > this is a member of a struct which itself is only used const. The other two > members are not marked const, nor need they be. Fixed in <http://trac.webkit.org/changeset/204613>.