RESOLVED FIXED Bug 160931
Align our encoding labels with the encoding specification
https://bugs.webkit.org/show_bug.cgi?id=160931
Summary Align our encoding labels with the encoding specification
Chris Dumez
Reported 2016-08-16 20:11:42 PDT
Align our encoding labels with the encoding specification: - https://encoding.spec.whatwg.org/#names-and-labels This also aligns with Firefox and Chrome.
Attachments
Patch (64.29 KB, patch)
2016-08-16 20:19 PDT, Chris Dumez
no flags
Archive of layout-test-results from ews113 for mac-yosemite (1.17 MB, application/zip)
2016-08-16 20:43 PDT, Build Bot
no flags
Patch (64.29 KB, patch)
2016-08-16 20:56 PDT, Chris Dumez
no flags
Archive of layout-test-results from ews117 for mac-yosemite (428.08 KB, application/zip)
2016-08-16 21:34 PDT, Build Bot
no flags
Patch (65.01 KB, patch)
2016-08-16 21:36 PDT, Chris Dumez
no flags
Patch (68.78 KB, patch)
2016-08-16 22:11 PDT, Chris Dumez
no flags
Archive of layout-test-results from ews101 for mac-yosemite (1.46 MB, application/zip)
2016-08-16 22:39 PDT, Build Bot
no flags
Archive of layout-test-results from ews116 for mac-yosemite (890.46 KB, application/zip)
2016-08-16 22:57 PDT, Build Bot
no flags
Archive of layout-test-results from ews106 for mac-yosemite-wk2 (1.33 MB, application/zip)
2016-08-16 23:01 PDT, Build Bot
no flags
Archive of layout-test-results from ews124 for ios-simulator-elcapitan-wk2 (1.75 MB, application/zip)
2016-08-17 00:01 PDT, Build Bot
no flags
Patch (69.02 KB, patch)
2016-08-17 12:49 PDT, Chris Dumez
no flags
Archive of layout-test-results from ews103 for mac-yosemite (1.40 MB, application/zip)
2016-08-17 13:18 PDT, Build Bot
no flags
Archive of layout-test-results from ews104 for mac-yosemite-wk2 (1.36 MB, application/zip)
2016-08-17 13:40 PDT, Build Bot
no flags
Archive of layout-test-results from ews112 for mac-yosemite (1.96 MB, application/zip)
2016-08-17 13:49 PDT, Build Bot
no flags
Patch (70.25 KB, patch)
2016-08-17 14:06 PDT, Chris Dumez
no flags
Test page to check ICU encodings support (11.67 KB, text/html)
2016-08-17 16:04 PDT, Chris Dumez
no flags
Safari 10 ICU encodings support (28.98 KB, text/plain)
2016-08-17 16:04 PDT, Chris Dumez
no flags
Firefox 48 ICU encodings support (32.31 KB, text/plain)
2016-08-17 16:05 PDT, Chris Dumez
no flags
WebKit w/ patch ICU encodings support (32.11 KB, text/plain)
2016-08-17 16:05 PDT, Chris Dumez
no flags
ICU changes due to patch (side by side diff) (67.59 KB, text/plain)
2016-08-17 16:15 PDT, Chris Dumez
no flags
ICU changes due to patch (side by side diff) (53.33 KB, text/plain)
2016-08-17 16:22 PDT, Chris Dumez
no flags
Archive of layout-test-results from ews122 for ios-simulator-elcapitan-wk2 (693.46 KB, application/zip)
2016-08-17 16:25 PDT, Build Bot
no flags
ICU labels support comparison (WK, WK w/ patch, Firefox) (330.29 KB, application/x-iwork-keynote-sffnumbers)
2016-08-17 17:02 PDT, Chris Dumez
no flags
Test page to check ICU encodings support (11.77 KB, text/html)
2016-08-17 17:03 PDT, Chris Dumez
no flags
ICU labels support comparison (WK, WK w/ patch, Firefox) (322.23 KB, text/plain)
2016-08-17 19:02 PDT, Chris Dumez
no flags
Test page to check ICU encodings support (11.53 KB, text/html)
2016-08-17 19:09 PDT, Chris Dumez
no flags
ICU labels support comparison (WK, WK w/ patch, Firefox, Chrome) (349.21 KB, patch)
2016-08-17 19:45 PDT, Chris Dumez
no flags
Patch (70.37 KB, patch)
2016-08-17 19:58 PDT, Chris Dumez
no flags
ICU labels support comparison (WK, WK w/ patch, Firefox, Chrome) (349.24 KB, application/x-iwork-keynote-sffnumbers)
2016-08-17 19:59 PDT, Chris Dumez
no flags
Patch (69.20 KB, patch)
2016-08-18 09:27 PDT, Chris Dumez
no flags
ICU labels support comparison (WK, WK w/ patch, Firefox, Chrome) (349.15 KB, application/x-iwork-keynote-sffnumbers)
2016-08-18 09:28 PDT, Chris Dumez
no flags
Patch (68.81 KB, patch)
2016-08-18 10:39 PDT, Chris Dumez
darin: review+
ICU labels support comparison (WK, WK w/ patch, Firefox, Chrome) (349.35 KB, application/x-iwork-keynote-sffnumbers)
2016-08-18 10:41 PDT, Chris Dumez
no flags
Patch for landing (68.04 KB, patch)
2016-08-18 11:39 PDT, Chris Dumez
no flags
ICU labels support comparison (WK, WK w/ patch, Firefox, Chrome) (349.46 KB, application/x-iwork-keynote-sffnumbers)
2016-08-18 11:39 PDT, Chris Dumez
no flags
Chris Dumez
Comment 1 2016-08-16 20:19:42 PDT
Build Bot
Comment 2 2016-08-16 20:43:30 PDT
Comment on attachment 286260 [details] Patch Attachment 286260 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/1884877 Number of test failures exceeded the failure limit.
Build Bot
Comment 3 2016-08-16 20:43:34 PDT
Created attachment 286264 [details] Archive of layout-test-results from ews113 for mac-yosemite The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews113 Port: mac-yosemite Platform: Mac OS X 10.10.5
Chris Dumez
Comment 4 2016-08-16 20:56:15 PDT
Alexey Proskuryakov
Comment 5 2016-08-16 21:08:50 PDT
Comment on attachment 286265 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=286265&action=review > Source/WebCore/ChangeLog:9 > + Align our encoding labels with the encoding specification: > + - https://encoding.spec.whatwg.org/#names-and-labels The best approach here would be to have our own encoding name table, and never iterate ICU codecs.
Chris Dumez
Comment 6 2016-08-16 21:20:52 PDT
(In reply to comment #5) > Comment on attachment 286265 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=286265&action=review > > > Source/WebCore/ChangeLog:9 > > + Align our encoding labels with the encoding specification: > > + - https://encoding.spec.whatwg.org/#names-and-labels > > The best approach here would be to have our own encoding name table, and > never iterate ICU codecs. Ok, I will try and refactor tomorrow then.
Build Bot
Comment 7 2016-08-16 21:34:53 PDT
Comment on attachment 286265 [details] Patch Attachment 286265 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/1885077 Number of test failures exceeded the failure limit.
Build Bot
Comment 8 2016-08-16 21:34:57 PDT
Created attachment 286268 [details] Archive of layout-test-results from ews117 for mac-yosemite The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews117 Port: mac-yosemite Platform: Mac OS X 10.10.5
Chris Dumez
Comment 9 2016-08-16 21:36:49 PDT
Chris Dumez
Comment 10 2016-08-16 22:11:13 PDT
Build Bot
Comment 11 2016-08-16 22:39:34 PDT
Comment on attachment 286274 [details] Patch Attachment 286274 [details] did not pass mac-ews (mac): Output: http://webkit-queues.webkit.org/results/1885316 New failing tests: fast/encoding/utf-32-big-endian-bom.html http/tests/misc/submit-get-in-utf32le.html http/tests/misc/url-in-utf32le.html http/tests/misc/submit-post-in-utf32be.html fast/encoding/char-decoding.html fast/encoding/char-decoding-mac.html http/tests/misc/submit-get-in-utf32be.html fast/encoding/utf-32-big-endian-nobom.xml fast/encoding/char-encoding.html fast/encoding/GBK/EUC-CN.html http/tests/misc/submit-post-in-utf32le.html fast/encoding/utf-32-little-endian-bom.html fast/encoding/utf-32-little-endian-nobom.xml fast/encoding/char-encoding-mac.html http/tests/misc/url-in-utf32be.html
Build Bot
Comment 12 2016-08-16 22:39:39 PDT
Created attachment 286275 [details] Archive of layout-test-results from ews101 for mac-yosemite The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews101 Port: mac-yosemite Platform: Mac OS X 10.10.5
Build Bot
Comment 13 2016-08-16 22:56:58 PDT
Comment on attachment 286274 [details] Patch Attachment 286274 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/1885366 Number of test failures exceeded the failure limit.
Build Bot
Comment 14 2016-08-16 22:57:03 PDT
Created attachment 286277 [details] Archive of layout-test-results from ews116 for mac-yosemite The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews116 Port: mac-yosemite Platform: Mac OS X 10.10.5
Build Bot
Comment 15 2016-08-16 23:01:25 PDT
Comment on attachment 286274 [details] Patch Attachment 286274 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.webkit.org/results/1885394 New failing tests: fast/encoding/utf-32-big-endian-bom.html http/tests/misc/submit-get-in-utf32le.html http/tests/misc/url-in-utf32le.html http/tests/misc/submit-post-in-utf32be.html fast/encoding/char-decoding.html fast/encoding/char-decoding-mac.html fast/encoding/char-encoding-mac.html fast/encoding/utf-32-little-endian-nobom.xml fast/encoding/char-encoding.html fast/encoding/GBK/EUC-CN.html http/tests/misc/submit-post-in-utf32le.html fast/encoding/utf-32-little-endian-bom.html fast/encoding/utf-32-big-endian-nobom.xml http/tests/misc/submit-get-in-utf32be.html http/tests/misc/url-in-utf32be.html
Build Bot
Comment 16 2016-08-16 23:01:31 PDT
Created attachment 286279 [details] Archive of layout-test-results from ews106 for mac-yosemite-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews106 Port: mac-yosemite-wk2 Platform: Mac OS X 10.10.5
Build Bot
Comment 17 2016-08-17 00:01:13 PDT
Comment on attachment 286274 [details] Patch Attachment 286274 [details] did not pass ios-sim-ews (ios-simulator-wk2): Output: http://webkit-queues.webkit.org/results/1885471 New failing tests: fast/encoding/utf-32-big-endian-bom.html http/tests/misc/submit-get-in-utf32le.html http/tests/misc/url-in-utf32le.html http/tests/misc/submit-post-in-utf32be.html fast/encoding/char-decoding.html fast/encoding/char-decoding-mac.html http/tests/misc/submit-get-in-utf32be.html fast/encoding/utf-32-big-endian-nobom.xml fast/encoding/char-encoding.html fast/encoding/GBK/EUC-CN.html http/tests/misc/submit-post-in-utf32le.html fast/encoding/utf-32-little-endian-bom.html fast/encoding/utf-32-little-endian-nobom.xml fast/encoding/char-encoding-mac.html http/tests/misc/url-in-utf32be.html
Build Bot
Comment 18 2016-08-17 00:01:19 PDT
Created attachment 286285 [details] Archive of layout-test-results from ews124 for ios-simulator-elcapitan-wk2 The attached test failures were seen while running run-webkit-tests on the ios-sim-ews. Bot: ews124 Port: ios-simulator-elcapitan-wk2 Platform: Mac OS X 10.11.5
Chris Dumez
Comment 19 2016-08-17 12:49:51 PDT
Build Bot
Comment 20 2016-08-17 13:18:02 PDT
Comment on attachment 286317 [details] Patch Attachment 286317 [details] did not pass mac-ews (mac): Output: http://webkit-queues.webkit.org/results/1888764 New failing tests: fast/encoding/utf-32-big-endian-bom.html http/tests/misc/submit-get-in-utf32le.html http/tests/misc/url-in-utf32le.html http/tests/misc/submit-post-in-utf32be.html fast/encoding/char-decoding.html fast/encoding/char-decoding-mac.html http/tests/misc/submit-get-in-utf32be.html fast/encoding/utf-32-big-endian-nobom.xml fast/encoding/char-encoding.html fast/encoding/GBK/EUC-CN.html http/tests/misc/submit-post-in-utf32le.html fast/encoding/utf-32-little-endian-bom.html fast/encoding/utf-32-little-endian-nobom.xml fast/encoding/char-encoding-mac.html http/tests/misc/url-in-utf32be.html
Build Bot
Comment 21 2016-08-17 13:18:07 PDT
Created attachment 286320 [details] Archive of layout-test-results from ews103 for mac-yosemite The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews103 Port: mac-yosemite Platform: Mac OS X 10.10.5
Build Bot
Comment 22 2016-08-17 13:40:55 PDT
Comment on attachment 286317 [details] Patch Attachment 286317 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.webkit.org/results/1888848 New failing tests: fast/encoding/utf-32-big-endian-bom.html http/tests/misc/submit-get-in-utf32le.html http/tests/misc/url-in-utf32le.html http/tests/misc/submit-post-in-utf32be.html fast/encoding/char-decoding.html fast/encoding/char-decoding-mac.html http/tests/misc/submit-get-in-utf32be.html fast/encoding/utf-32-big-endian-nobom.xml fast/encoding/char-encoding.html fast/encoding/GBK/EUC-CN.html http/tests/misc/submit-post-in-utf32le.html fast/encoding/utf-32-little-endian-bom.html fast/encoding/utf-32-little-endian-nobom.xml fast/encoding/char-encoding-mac.html http/tests/misc/url-in-utf32be.html
Build Bot
Comment 23 2016-08-17 13:40:59 PDT
Created attachment 286322 [details] Archive of layout-test-results from ews104 for mac-yosemite-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews104 Port: mac-yosemite-wk2 Platform: Mac OS X 10.10.5
Build Bot
Comment 24 2016-08-17 13:49:46 PDT
Comment on attachment 286317 [details] Patch Attachment 286317 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/1888859 New failing tests: fast/encoding/utf-32-big-endian-bom.html http/tests/misc/submit-get-in-utf32le.html http/tests/misc/url-in-utf32le.html http/tests/misc/submit-post-in-utf32be.html fast/encoding/char-decoding.html fast/encoding/char-decoding-mac.html http/tests/misc/submit-get-in-utf32be.html fast/encoding/utf-32-big-endian-nobom.xml fast/encoding/char-encoding.html fast/encoding/GBK/EUC-CN.html http/tests/misc/submit-post-in-utf32le.html fast/encoding/utf-32-little-endian-bom.html fast/encoding/utf-32-little-endian-nobom.xml fast/encoding/char-encoding-mac.html http/tests/misc/url-in-utf32be.html
Build Bot
Comment 25 2016-08-17 13:49:50 PDT
Created attachment 286324 [details] Archive of layout-test-results from ews112 for mac-yosemite The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews112 Port: mac-yosemite Platform: Mac OS X 10.10.5
Chris Dumez
Comment 26 2016-08-17 14:06:09 PDT
Chris Dumez
Comment 27 2016-08-17 16:04:17 PDT
Created attachment 286335 [details] Test page to check ICU encodings support
Chris Dumez
Comment 28 2016-08-17 16:04:50 PDT
Created attachment 286336 [details] Safari 10 ICU encodings support
Chris Dumez
Comment 29 2016-08-17 16:05:13 PDT
Created attachment 286337 [details] Firefox 48 ICU encodings support
Chris Dumez
Comment 30 2016-08-17 16:05:36 PDT
Created attachment 286338 [details] WebKit w/ patch ICU encodings support
Chris Dumez
Comment 31 2016-08-17 16:15:50 PDT
Created attachment 286340 [details] ICU changes due to patch (side by side diff)
Chris Dumez
Comment 32 2016-08-17 16:22:46 PDT
Created attachment 286341 [details] ICU changes due to patch (side by side diff)
Build Bot
Comment 33 2016-08-17 16:25:08 PDT
Comment on attachment 286327 [details] Patch Attachment 286327 [details] did not pass ios-sim-ews (ios-simulator-wk2): Output: http://webkit-queues.webkit.org/results/1889304 New failing tests: fast/scrolling/ios/scrollTo-at-page-load.html
Build Bot
Comment 34 2016-08-17 16:25:12 PDT
Created attachment 286342 [details] Archive of layout-test-results from ews122 for ios-simulator-elcapitan-wk2 The attached test failures were seen while running run-webkit-tests on the ios-sim-ews. Bot: ews122 Port: ios-simulator-elcapitan-wk2 Platform: Mac OS X 10.11.5
Chris Dumez
Comment 35 2016-08-17 17:02:35 PDT
Created attachment 286345 [details] ICU labels support comparison (WK, WK w/ patch, Firefox)
Chris Dumez
Comment 36 2016-08-17 17:03:04 PDT
Created attachment 286346 [details] Test page to check ICU encodings support
Alexey Proskuryakov
Comment 37 2016-08-17 17:17:17 PDT
Are all the "SUPPORTED (got: windows-1252)" results actually "unsupported"?
Alexey Proskuryakov
Comment 38 2016-08-17 17:18:58 PDT
I think that non-ASCII compatible encodings like ISO_2022 need to be tested separately, they won't work with a data: URL.
Alexey Proskuryakov
Comment 39 2016-08-17 17:20:26 PDT
I'm actually not sure what "ISO_2022,locale=ja,version=0" is and whether it is ASCII compatible.
Chris Dumez
Comment 40 2016-08-17 18:24:25 PDT
(In reply to comment #37) > Are all the "SUPPORTED (got: windows-1252)" results actually "unsupported"? I guess it depends what you mean by "unsupported". If we failed to recognize the charset, we would fallback to "UTF-8". If we use "windows-1252" then I believe it means we recognized the label and decided to use windows-1252.
Chris Dumez
Comment 41 2016-08-17 19:02:51 PDT
Created attachment 286355 [details] ICU labels support comparison (WK, WK w/ patch, Firefox) Improved the excel sheet by: - coloring browsers that agree with each other - Dropping SUPPORTED / UNSUPPORTED as it was confusing
Chris Dumez
Comment 42 2016-08-17 19:08:02 PDT
As you can see from the spreadsheet, when it comes to ICU labels: - We are mostly aligned with Firefox - In cases we are not aligned with Firefox, we maintain our previous behavior (the one in Safari 10). - A lot of ICU labels give different results after my patch. However, in ALL cases, we are now consistent with Firefox. All in all, this looks pretty good to me but I am no expert: - We pass all W3C tests - The new behavior is mostly aligned with Firefox and is aligned with Safari 10 otherwise. If you see any encoding that is no longer supported with my patch but you think is important, I am happy to re-introduce them.
Chris Dumez
Comment 43 2016-08-17 19:09:36 PDT
Created attachment 286357 [details] Test page to check ICU encodings support
Chris Dumez
Comment 44 2016-08-17 19:33:42 PDT
I am updating the spreadsheet with Chrome results too.
Chris Dumez
Comment 45 2016-08-17 19:45:58 PDT
Created attachment 286358 [details] ICU labels support comparison (WK, WK w/ patch, Firefox, Chrome) Spreadsheet now includes Chrome results too. Chrome is super close to Firefox. There are however a few cases (marked in red) where Chrome agrees with our old behavior. Therefore, I'll update my patch to maintain our old behavior in this case.
Chris Dumez
Comment 46 2016-08-17 19:58:35 PDT
Chris Dumez
Comment 47 2016-08-17 19:59:20 PDT
Created attachment 286361 [details] ICU labels support comparison (WK, WK w/ patch, Firefox, Chrome)
Chris Dumez
Comment 48 2016-08-17 20:03:50 PDT
Comment on attachment 286360 [details] Patch I think the results are even more convincing now (with the Chrome results added to the spreadsheet and the couple of fixes to align better with Chrome).
Alexey Proskuryakov
Comment 49 2016-08-17 22:19:38 PDT
Looks pretty convincing to me too. I feel like windows-1252 is an artifact of testing approach, so maybe testing with data: subframes wasn't sufficiently accurate (sorry about that!). Unknown encoding names are ignored, and I expect that to be the case for all browsers. Should probably use HTTP to generate Content-Type with a charset. That may also help with some other surprising results, such as ISO-2022, HZ and other complex Asian encodings. I'm confused by what's going on with aliases that become "replacement" - it seems wrong, but I do not have a theory about what's going on. I'm not sure if we have test coverage for visual vs. logical RTL encodings - this table alone doesn't tell the full story for encodings in ISO-8859-6 and ISO-8859-8 groups.
Alexey Proskuryakov
Comment 50 2016-08-17 22:24:36 PDT
Comment on attachment 286360 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=286360&action=review > Source/WebCore/platform/text/TextCodecICU.cpp:241 > + if (!strcmp(encodingName.name, "x-mac-turkish")) { Are these Mac encodings also supported by TextCodecMac, using TEC? > Source/WebCore/platform/text/TextCodecLatin1.cpp:-115 > - // ASCII and Latin-1 both decode as Windows Latin-1 although they retain unique identities. Is this still accurate?
Chris Dumez
Comment 51 2016-08-18 07:39:39 PDT
(In reply to comment #49) > Looks pretty convincing to me too. > > I feel like windows-1252 is an artifact of testing approach, so maybe > testing with data: subframes wasn't sufficiently accurate (sorry about > that!). Unknown encoding names are ignored, and I expect that to be the case > for all browsers. > > Should probably use HTTP to generate Content-Type with a charset. That may > also help with some other surprising results, such as ISO-2022, HZ and other > complex Asian encodings. > > I'm confused by what's going on with aliases that become "replacement" - it > seems wrong, but I do not have a theory about what's going on. > > I'm not sure if we have test coverage for visual vs. logical RTL encodings - > this table alone doesn't tell the full story for encodings in ISO-8859-6 and > ISO-8859-8 groups. I actually used http for the latest results because I could not get the data uri I frame to work in Chrome.
Chris Dumez
Comment 52 2016-08-18 07:41:19 PDT
(In reply to comment #50) > Comment on attachment 286360 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=286360&action=review > > > Source/WebCore/platform/text/TextCodecICU.cpp:241 > > + if (!strcmp(encodingName.name, "x-mac-turkish")) { > > Are these Mac encodings also supported by TextCodecMac, using TEC? > I will check but I remember I had to add them to make the layout tests pass. > > Source/WebCore/platform/text/TextCodecLatin1.cpp:-115 > > - // ASCII and Latin-1 both decode as Windows Latin-1 although they retain unique identities. > > Is this still accurate? I don't think so, I will drop the comment.
Chris Dumez
Comment 53 2016-08-18 09:27:58 PDT
Created attachment 286371 [details] Patch Added back support for "x-macroman" -> "macintosh" alias.
Chris Dumez
Comment 54 2016-08-18 09:28:23 PDT
Created attachment 286372 [details] ICU labels support comparison (WK, WK w/ patch, Firefox, Chrome)
Chris Dumez
Comment 55 2016-08-18 09:43:21 PDT
Relevant mozilla bug for Big5-HKSCS -> Big5 merge: https://bugzilla.mozilla.org/show_bug.cgi?id=912470
Chris Dumez
Comment 56 2016-08-18 09:48:59 PDT
(In reply to comment #55) > Relevant mozilla bug for Big5-HKSCS -> Big5 merge: > https://bugzilla.mozilla.org/show_bug.cgi?id=912470 http://app.midland.com.hk/residential_ebook/%A4%F1%B5%D8%A7Q%A4s-beverly-hill is an example of page using Big5-HKSCS charset.
Chris Dumez
Comment 57 2016-08-18 09:51:58 PDT
(In reply to comment #56) > (In reply to comment #55) > > Relevant mozilla bug for Big5-HKSCS -> Big5 merge: > > https://bugzilla.mozilla.org/show_bug.cgi?id=912470 > > http://app.midland.com.hk/residential_ebook/%A4%F1%B5%D8%A7Q%A4s-beverly- > hill is an example of page using Big5-HKSCS charset. I personally cannot spot any difference with our without my change on this Big5-HKSCS page.
Chris Dumez
Comment 58 2016-08-18 10:13:32 PDT
(In reply to comment #57) > (In reply to comment #56) > > (In reply to comment #55) > > > Relevant mozilla bug for Big5-HKSCS -> Big5 merge: > > > https://bugzilla.mozilla.org/show_bug.cgi?id=912470 > > > > http://app.midland.com.hk/residential_ebook/%A4%F1%B5%D8%A7Q%A4s-beverly- > > hill is an example of page using Big5-HKSCS charset. > > I personally cannot spot any difference with our without my change on this > Big5-HKSCS page. I do see some differences on http://www.nwstbus.com.hk/fareConcession/concession_detail.aspx?id=49&intLangID=2 though. The character coming after "厚德" seems to be missing with my change.
Chris Dumez
Comment 59 2016-08-18 10:34:08 PDT
(In reply to comment #58) > (In reply to comment #57) > > (In reply to comment #56) > > > (In reply to comment #55) > > > > Relevant mozilla bug for Big5-HKSCS -> Big5 merge: > > > > https://bugzilla.mozilla.org/show_bug.cgi?id=912470 > > > > > > http://app.midland.com.hk/residential_ebook/%A4%F1%B5%D8%A7Q%A4s-beverly- > > > hill is an example of page using Big5-HKSCS charset. > > > > I personally cannot spot any difference with our without my change on this > > Big5-HKSCS page. > > I do see some differences on > http://www.nwstbus.com.hk/fareConcession/concession_detail. > aspx?id=49&intLangID=2 though. The character coming after "厚德" seems to be > missing with my change. I'll stop marking Big5-HKSCS be an alias for Big5 as it seems to break rendering for some sites. I'll go back to the old behavior of using Big5-HKSCS codec.
Chris Dumez
Comment 60 2016-08-18 10:39:32 PDT
Created attachment 286376 [details] Patch Big5-HKSCS is no longer an alias to Big5 to maintain our previous behavior and avoid breakage.
Chris Dumez
Comment 61 2016-08-18 10:41:51 PDT
Created attachment 286377 [details] ICU labels support comparison (WK, WK w/ patch, Firefox, Chrome)
Darin Adler
Comment 62 2016-08-18 10:51:24 PDT
Comment on attachment 286376 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=286376&action=review > LayoutTests/imported/w3c/ChangeLog:12 > + 651. The only checks we're failing is due to "Big5-HKSCS" not being an alias > + to "Big5". What prevents us from fixing this at this time? > Source/WebCore/platform/text/TextCodecICU.cpp:79 > +DECLARE_ALIASES(UTF_8, "unicode-1-1-utf-8", "utf8", "unicode11utf8", "unicode20utf8", "x-unicode20utf8"); We don’t use ICU for UTF-8, so it’s peculiar to have this inside TextCodecICU. > Source/WebCore/platform/text/TextCodecICU.cpp:100 > +DECLARE_ALIASES(windows_1252, "ansi_x3.4-1968", "ascii", "cp1252", "cp819", "csisolatin1", "ibm819", "iso-8859-1", "iso-ir-100", "iso8859-1", "iso88591", "iso_8859-1", "iso_8859-1:1987", "l1", "latin1", "us-ascii", "x-cp1252"); We don’t use ICU for this encoding, so it’s peculiar to have this inside TextCodecICU. > Source/WebCore/platform/text/TextCodecICU.cpp:134 > + size_t aliasCount; Seems overkill to use size_t for this. Could be uint8_t, for example. If there’s no actual cost to using 64-bits for these small counts, maybe this is fine.
Darin Adler
Comment 63 2016-08-18 10:52:43 PDT
Comment on attachment 286376 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=286376&action=review >> LayoutTests/imported/w3c/ChangeLog:12 >> + to "Big5". > > What prevents us from fixing this at this time? Oh, I see your earlier comments about that now.
Alexey Proskuryakov
Comment 64 2016-08-18 11:02:24 PDT
I don't feel very good about the Big-5 situation. To just keep the current behavior, we should probably also keep windows-950 group aliases. But longer term, I just can't tell which of the variants is a superset of which, which is supported in Apple fonts, and whether the super naive algorithm in the encoding spec is actually the way to go.
Chris Dumez
Comment 65 2016-08-18 11:39:18 PDT
Created attachment 286380 [details] Patch for landing
Chris Dumez
Comment 66 2016-08-18 11:39:42 PDT
Created attachment 286381 [details] ICU labels support comparison (WK, WK w/ patch, Firefox, Chrome)
Chris Dumez
Comment 67 2016-08-18 11:40:26 PDT
(In reply to comment #64) > I don't feel very good about the Big-5 situation. To just keep the current > behavior, we should probably also keep windows-950 group aliases. But longer > term, I just can't tell which of the variants is a superset of which, which > is supported in Apple fonts, and whether the super naive algorithm in the > encoding spec is actually the way to go. Brought back windows-950 in the last iteration and updated the spreadsheet accordingly.
Chris Dumez
Comment 68 2016-08-18 11:41:43 PDT
(In reply to comment #62) > Comment on attachment 286376 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=286376&action=review > > > LayoutTests/imported/w3c/ChangeLog:12 > > + 651. The only checks we're failing is due to "Big5-HKSCS" not being an alias > > + to "Big5". > > What prevents us from fixing this at this time? > > > Source/WebCore/platform/text/TextCodecICU.cpp:79 > > +DECLARE_ALIASES(UTF_8, "unicode-1-1-utf-8", "utf8", "unicode11utf8", "unicode20utf8", "x-unicode20utf8"); > > We don’t use ICU for UTF-8, so it’s peculiar to have this inside > TextCodecICU. > > > Source/WebCore/platform/text/TextCodecICU.cpp:100 > > +DECLARE_ALIASES(windows_1252, "ansi_x3.4-1968", "ascii", "cp1252", "cp819", "csisolatin1", "ibm819", "iso-8859-1", "iso-ir-100", "iso8859-1", "iso88591", "iso_8859-1", "iso_8859-1:1987", "l1", "latin1", "us-ascii", "x-cp1252"); > > We don’t use ICU for this encoding, so it’s peculiar to have this inside > TextCodecICU. > > > Source/WebCore/platform/text/TextCodecICU.cpp:134 > > + size_t aliasCount; > > Seems overkill to use size_t for this. Could be uint8_t, for example. If > there’s no actual cost to using 64-bits for these small counts, maybe this > is fine. I dropped the encodings for which we don't use ICU and used unsigned for aliasCount.
WebKit Commit Bot
Comment 69 2016-08-18 12:55:13 PDT
Comment on attachment 286380 [details] Patch for landing Clearing flags on attachment: 286380 Committed r204605: <http://trac.webkit.org/changeset/204605>
WebKit Commit Bot
Comment 70 2016-08-18 12:55:22 PDT
All reviewed patches have been landed. Closing bug.
Darin Adler
Comment 71 2016-08-18 14:13:52 PDT
Comment on attachment 286380 [details] Patch for landing View in context: https://bugs.webkit.org/attachment.cgi?id=286380&action=review > Source/WebCore/platform/text/TextCodecICU.cpp:128 > + const char* const name; Just noticed that the second const here is redundant and not needed since this is a member of a struct which itself is only used const. The other two members are not marked const, nor need they be.
Chris Dumez
Comment 72 2016-08-18 16:05:41 PDT
(In reply to comment #71) > Comment on attachment 286380 [details] > Patch for landing > > View in context: > https://bugs.webkit.org/attachment.cgi?id=286380&action=review > > > Source/WebCore/platform/text/TextCodecICU.cpp:128 > > + const char* const name; > > Just noticed that the second const here is redundant and not needed since > this is a member of a struct which itself is only used const. The other two > members are not marked const, nor need they be. Fixed in <http://trac.webkit.org/changeset/204613>.
Note You need to log in before you can comment on or make changes to this bug.