Bug 185836 - [INTL] Language tags are not canonicalized
Summary: [INTL] Language tags are not canonicalized
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Andy VanWagoner
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2018-05-21 13:44 PDT by Andy VanWagoner
Modified: 2018-07-24 10:10 PDT (History)
10 users (show)

See Also:


Attachments
Naive (52.02 KB, patch)
2018-05-21 14:01 PDT, Andy VanWagoner
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews101 for mac-sierra (2.27 MB, application/zip)
2018-05-21 15:07 PDT, EWS Watchlist
no flags Details
Archive of layout-test-results from ews106 for mac-sierra-wk2 (3.11 MB, application/zip)
2018-05-21 15:11 PDT, EWS Watchlist
no flags Details
Archive of layout-test-results from ews125 for ios-simulator-wk2 (2.24 MB, application/zip)
2018-05-21 15:43 PDT, EWS Watchlist
no flags Details
Archive of layout-test-results from ews115 for mac-sierra (3.00 MB, application/zip)
2018-05-21 15:46 PDT, EWS Watchlist
no flags Details
Patch (815.54 KB, patch)
2018-05-23 10:47 PDT, Andy VanWagoner
no flags Details | Formatted Diff | Diff
Script extracting info from registry (813.66 KB, patch)
2018-05-23 12:15 PDT, Andy VanWagoner
no flags Details | Formatted Diff | Diff
Patch (815.15 KB, patch)
2018-05-23 14:35 PDT, Andy VanWagoner
no flags Details | Formatted Diff | Diff
Rebased (814.30 KB, patch)
2018-06-27 12:38 PDT, Andy VanWagoner
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews205 for win-future (12.88 MB, application/zip)
2018-06-28 01:38 PDT, EWS Watchlist
no flags Details
Patch (814.17 KB, patch)
2018-07-04 07:20 PDT, Andy VanWagoner
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews206 for win-future (12.79 MB, application/zip)
2018-07-04 09:20 PDT, EWS Watchlist
no flags Details
Patch (819.51 KB, patch)
2018-07-23 08:05 PDT, Andy VanWagoner
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews200 for win-future (13.03 MB, application/zip)
2018-07-23 12:49 PDT, EWS Watchlist
no flags Details
Patch (821.20 KB, patch)
2018-07-23 12:55 PDT, Andy VanWagoner
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Andy VanWagoner 2018-05-21 13:44:12 PDT
[INTL] Language tags are not canonicalized
Comment 1 Andy VanWagoner 2018-05-21 14:01:29 PDT
Created attachment 340884 [details]
Naive
Comment 2 Andy VanWagoner 2018-05-21 14:17:48 PDT
Comment on attachment 340884 [details]
Naive

There are multiple static constant maps. These are being naively created and destroyed in every pass. What is the best way to have these created once instead of each time? In debug mode the test262/intl402 tests take roughly 4x as long as they do without this patch.

This does not cover every possible case for canonicalization, and the mappings could also change over time, and probably should be generated from the iana subtag registry[1]. What's the preferred way to script that sort of thing?

This removes support for a few algorithmic numbering systems (like "roman" and "jpanfin") that were causing problems. Percent style was simply ignored, and currency would throw an error. Even with decimal style, roman numerals would get messed up when formatting large numbers. `(123456).toLocaleString('en-u-nu-roman') -> "ↈↂↂMMMCDLVI"`

[1] https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
Comment 3 EWS Watchlist 2018-05-21 15:07:31 PDT
Comment on attachment 340884 [details]
Naive

Attachment 340884 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/7757320

New failing tests:
js/intl-datetimeformat.html
Comment 4 EWS Watchlist 2018-05-21 15:07:32 PDT
Created attachment 340902 [details]
Archive of layout-test-results from ews101 for mac-sierra

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews101  Port: mac-sierra  Platform: Mac OS X 10.12.6
Comment 5 EWS Watchlist 2018-05-21 15:11:09 PDT
Comment on attachment 340884 [details]
Naive

Attachment 340884 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/7757299

New failing tests:
js/intl-datetimeformat.html
Comment 6 EWS Watchlist 2018-05-21 15:11:10 PDT
Created attachment 340904 [details]
Archive of layout-test-results from ews106 for mac-sierra-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews106  Port: mac-sierra-wk2  Platform: Mac OS X 10.12.6
Comment 7 EWS Watchlist 2018-05-21 15:13:58 PDT
Comment on attachment 340884 [details]
Naive

Attachment 340884 [details] did not pass jsc-ews (mac):
Output: http://webkit-queues.webkit.org/results/7757121

New failing tests:
jsc-layout-tests.yaml/js/script-tests/intl-datetimeformat.js.layout
jsc-layout-tests.yaml/js/script-tests/intl-datetimeformat.js.layout-no-ftl
jsc-layout-tests.yaml/js/script-tests/intl-datetimeformat.js.layout-ftl-eager-no-cjit
jsc-layout-tests.yaml/js/script-tests/intl-datetimeformat.js.layout-ftl-no-cjit
jsc-layout-tests.yaml/js/script-tests/intl-datetimeformat.js.layout-no-cjit
jsc-layout-tests.yaml/js/script-tests/intl-datetimeformat.js.layout-no-llint
jsc-layout-tests.yaml/js/script-tests/intl-datetimeformat.js.layout-dfg-eager-no-cjit
Comment 8 EWS Watchlist 2018-05-21 15:43:56 PDT
Comment on attachment 340884 [details]
Naive

Attachment 340884 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/7757468

New failing tests:
js/intl-datetimeformat.html
Comment 9 EWS Watchlist 2018-05-21 15:43:57 PDT
Created attachment 340911 [details]
Archive of layout-test-results from ews125 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews125  Port: ios-simulator-wk2  Platform: Mac OS X 10.13.4
Comment 10 EWS Watchlist 2018-05-21 15:46:30 PDT
Comment on attachment 340884 [details]
Naive

Attachment 340884 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/7757455

New failing tests:
js/intl-datetimeformat.html
Comment 11 EWS Watchlist 2018-05-21 15:46:31 PDT
Created attachment 340912 [details]
Archive of layout-test-results from ews115 for mac-sierra

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews115  Port: mac-sierra  Platform: Mac OS X 10.12.6
Comment 12 Andy VanWagoner 2018-05-23 10:47:26 PDT
Created attachment 341106 [details]
Patch
Comment 13 Andy VanWagoner 2018-05-23 12:15:59 PDT
Created attachment 341113 [details]
Script extracting info from registry
Comment 14 Andy VanWagoner 2018-05-23 14:35:25 PDT
Created attachment 341131 [details]
Patch
Comment 15 Andy VanWagoner 2018-05-24 08:07:45 PDT
Comment on attachment 341131 [details]
Patch

There are probably a few edge cases for canonicalization missing, but generates code from the language subtag registry, covers cases tested by test262/intl402, and generally does a way better job of canonicalizing than before.

If we find a more efficient way to do the lookups, we can modify the script to generate it, instead of manually updating several c++ functions.
Comment 16 Andy VanWagoner 2018-06-11 14:05:01 PDT
Is there anything else this patch needs before it can be considered?
Comment 17 Andy VanWagoner 2018-06-27 12:38:29 PDT
Created attachment 343739 [details]
Rebased
Comment 18 EWS Watchlist 2018-06-28 01:38:33 PDT
Comment on attachment 343739 [details]
Rebased

Attachment 343739 [details] did not pass win-ews (win):
Output: https://webkit-queues.webkit.org/results/8367720

New failing tests:
http/tests/security/local-video-source-from-remote.html
Comment 19 EWS Watchlist 2018-06-28 01:38:45 PDT
Created attachment 343802 [details]
Archive of layout-test-results from ews205 for win-future

The attached test failures were seen while running run-webkit-tests on the win-ews.
Bot: ews205  Port: win-future  Platform: CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit
Comment 20 Andy VanWagoner 2018-07-04 07:20:38 PDT
Created attachment 344285 [details]
Patch
Comment 21 EWS Watchlist 2018-07-04 09:20:13 PDT
Comment on attachment 344285 [details]
Patch

Attachment 344285 [details] did not pass win-ews (win):
Output: https://webkit-queues.webkit.org/results/8436832

New failing tests:
http/tests/security/canvas-remote-read-remote-video-redirect.html
Comment 22 EWS Watchlist 2018-07-04 09:20:25 PDT
Created attachment 344292 [details]
Archive of layout-test-results from ews206 for win-future

The attached test failures were seen while running run-webkit-tests on the win-ews.
Bot: ews206  Port: win-future  Platform: CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit
Comment 23 Andy VanWagoner 2018-07-23 08:05:06 PDT
Created attachment 345573 [details]
Patch
Comment 24 Keith Miller 2018-07-23 11:02:28 PDT
Comment on attachment 345573 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=345573&action=review

r=me with some comments.

> Source/JavaScriptCore/Scripts/generateIntlCanonicalizeLanguage.py:113
> +        # file.write("#include <wtf/HashMap.h>\n")
> +        # file.write("#include <wtf/NeverDestroyed.h>\n")

Please remove.

> Source/JavaScriptCore/runtime/IntlDateTimeFormat.cpp:679
> +    ucal_setGregorianChange(cal, -8.64E15, &status);

What's -8.64E15? Maybe it'd be better as a named const variable?

> Source/JavaScriptCore/runtime/IntlObject.cpp:327
> +                if (!times) {
> +                    if (intlPreferredExtlangTag(extlangLower) == langtag) {

Nit: Can you make this if (!times && intlPreferredExtlangTag(extlangLower) == langtag)

> Source/JavaScriptCore/ucd/language-subtag-registry.txt:1
> +File-Date: 2018-04-23

Does this file have a Copyright? Do we need to do anything there.
Comment 25 Keith Miller 2018-07-23 11:03:21 PDT
Thanks again for taking the time to keep the patch up to date! Sorry about the delay in the review... :\
Comment 26 Andy VanWagoner 2018-07-23 11:23:20 PDT
(In reply to Keith Miller from comment #24)
> Comment on attachment 345573 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=345573&action=review
> 
> r=me with some comments.
> 
> > Source/JavaScriptCore/Scripts/generateIntlCanonicalizeLanguage.py:113
> > +        # file.write("#include <wtf/HashMap.h>\n")
> > +        # file.write("#include <wtf/NeverDestroyed.h>\n")
> 
> Please remove.

Will Do.

> 
> > Source/JavaScriptCore/runtime/IntlDateTimeFormat.cpp:679
> > +    ucal_setGregorianChange(cal, -8.64E15, &status);
> 
> What's -8.64E15? Maybe it'd be better as a named const variable?

It's the min ECMAScript time. A const for it sounds good.

> 
> > Source/JavaScriptCore/runtime/IntlObject.cpp:327
> > +                if (!times) {
> > +                    if (intlPreferredExtlangTag(extlangLower) == langtag) {
> 
> Nit: Can you make this if (!times && intlPreferredExtlangTag(extlangLower)
> == langtag)

Sure thing.

> 
> > Source/JavaScriptCore/ucd/language-subtag-registry.txt:1
> > +File-Date: 2018-04-23
> 
> Does this file have a Copyright? Do we need to do anything there.

This file is downloaded from https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry I left it verbatim, but I can modify it if we need to. I figured downloading it in the script would be bad, relying on network connectivity for the build.
Comment 27 Keith Miller 2018-07-23 11:57:57 PDT
Comment on attachment 345573 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=345573&action=review

>>> Source/JavaScriptCore/ucd/language-subtag-registry.txt:1
>>> +File-Date: 2018-04-23
>> 
>> Does this file have a Copyright? Do we need to do anything there.
> 
> This file is downloaded from https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry I left it verbatim, but I can modify it if we need to. I figured downloading it in the script would be bad, relying on network connectivity for the build.

Yeah, we definitely shouldn't download it each time. Based on http://trustee.ietf.org/license-info/IETF-TLP-5.pdf we need to include the Simplified BSD License at the top of the file. Can you do that before landing?
Comment 28 EWS Watchlist 2018-07-23 12:49:36 PDT
Comment on attachment 345573 [details]
Patch

Attachment 345573 [details] did not pass win-ews (win):
Output: https://webkit-queues.webkit.org/results/8629263

New failing tests:
http/tests/security/canvas-remote-read-remote-video-localhost.html
Comment 29 EWS Watchlist 2018-07-23 12:49:51 PDT
Created attachment 345599 [details]
Archive of layout-test-results from ews200 for win-future

The attached test failures were seen while running run-webkit-tests on the win-ews.
Bot: ews200  Port: win-future  Platform: CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit
Comment 30 Andy VanWagoner 2018-07-23 12:55:12 PDT
Created attachment 345600 [details]
Patch
Comment 31 WebKit Commit Bot 2018-07-23 18:05:47 PDT
Comment on attachment 345600 [details]
Patch

Clearing flags on attachment: 345600

Committed r234127: <https://trac.webkit.org/changeset/234127>
Comment 32 WebKit Commit Bot 2018-07-23 18:05:49 PDT
All reviewed patches have been landed.  Closing bug.
Comment 33 Radar WebKit Bug Importer 2018-07-23 18:09:34 PDT
<rdar://problem/42524024>
Comment 34 Dawei Fenton (:realdawei) 2018-07-24 09:06:22 PDT
(In reply to WebKit Commit Bot from comment #31)
> Comment on attachment 345600 [details]
> Patch
> 
> Clearing flags on attachment: 345600
> 
> Committed r234127: <https://trac.webkit.org/changeset/234127>

seeing 262 errors on the bots after this revision

sample output:
https://build.webkit.org/builders/Apple%20Sierra%20Release%20Test262%20%28Tests%29/builds/3611/steps/test262-test/logs/stdio

FAIL test/intl402/Locale/constructor-non-iana-canon.js (default)
Full Output:
Exception: TypeError: undefined is not a constructor (evaluating 'new Intl.Locale(tag)')
global code@/Volumes/Data/slave/sierra-release-tests-test262/build/JSTests/test262/test/intl402/Locale/constructor-non-iana-canon.js:86:32

FAIL test/intl402/Locale/constructor-non-iana-canon.js (strict mode)
Full Output:
Exception: TypeError: undefined is not a constructor (evaluating 'new Intl.Locale(tag)')
global code@/Volumes/Data/slave/sierra-release-tests-test262/build/JSTests/test262/test/intl402/Locale/constructor-non-iana-canon.js:87:32

FAIL test/intl402/Locale/likely-subtags-grandfathered.js (default)
Full Output:
Exception: TypeError: undefined is not a constructor (evaluating 'new Intl.Locale(tag)')
global code@/Volumes/Data/slave/sierra-release-tests-test262/build/JSTests/test262/test/intl402/Locale/likely-subtags-grandfathered.js:98:32

FAIL test/intl402/Locale/likely-subtags-grandfathered.js (strict mode)
Full Output:
Exception: TypeError: undefined is not a constructor (evaluating 'new Intl.Locale(tag)')
global code@/Volumes/Data/slave/sierra-release-tests-test262/build/JSTests/test262/test/intl402/Locale/likely-subtags-grandfathered.js:99:32
Comment 35 Andy VanWagoner 2018-07-24 09:39:09 PDT
(In reply to David Fenton (:realdawei) from comment #34)
> (In reply to WebKit Commit Bot from comment #31)
> > Comment on attachment 345600 [details]
> > Patch
> > 
> > Clearing flags on attachment: 345600
> > 
> > Committed r234127: <https://trac.webkit.org/changeset/234127>
> 
> seeing 262 errors on the bots after this revision
> 
> sample output:
> https://build.webkit.org/builders/
> Apple%20Sierra%20Release%20Test262%20%28Tests%29/builds/3611/steps/test262-
> test/logs/stdio
> 
> FAIL test/intl402/Locale/constructor-non-iana-canon.js (default)
> Full Output:
> Exception: TypeError: undefined is not a constructor (evaluating 'new
> Intl.Locale(tag)')
> global
> code@/Volumes/Data/slave/sierra-release-tests-test262/build/JSTests/test262/
> test/intl402/Locale/constructor-non-iana-canon.js:86:32
> 
> FAIL test/intl402/Locale/constructor-non-iana-canon.js (strict mode)
> Full Output:
> Exception: TypeError: undefined is not a constructor (evaluating 'new
> Intl.Locale(tag)')
> global
> code@/Volumes/Data/slave/sierra-release-tests-test262/build/JSTests/test262/
> test/intl402/Locale/constructor-non-iana-canon.js:87:32
> 
> FAIL test/intl402/Locale/likely-subtags-grandfathered.js (default)
> Full Output:
> Exception: TypeError: undefined is not a constructor (evaluating 'new
> Intl.Locale(tag)')
> global
> code@/Volumes/Data/slave/sierra-release-tests-test262/build/JSTests/test262/
> test/intl402/Locale/likely-subtags-grandfathered.js:98:32
> 
> FAIL test/intl402/Locale/likely-subtags-grandfathered.js (strict mode)
> Full Output:
> Exception: TypeError: undefined is not a constructor (evaluating 'new
> Intl.Locale(tag)')
> global
> code@/Volumes/Data/slave/sierra-release-tests-test262/build/JSTests/test262/
> test/intl402/Locale/likely-subtags-grandfathered.js:99:32

These appear to be newer tests related to `Intl.Locale` which we haven't implemented yet. They weren't accounted for in the expectations.yml file yet, and I didn't add them.

How does expectations.yml & the test262 tests usually get updated? If needed I can push up a patch that adds those as expected failures.
Comment 36 Andy VanWagoner 2018-07-24 10:09:22 PDT
The patch for https://bugs.webkit.org/show_bug.cgi?id=187960 includes fixing the expectations for Intl.Locale tests that were changed by this.
Comment 37 Dawei Fenton (:realdawei) 2018-07-24 10:10:48 PDT
(In reply to Andy VanWagoner from comment #36)
> The patch for https://bugs.webkit.org/show_bug.cgi?id=187960 includes fixing
> the expectations for Intl.Locale tests that were changed by this.

great. thanks!