WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
210853
[Intl] Locale validation/canonicalization should defer to ICU
https://bugs.webkit.org/show_bug.cgi?id=210853
Summary
[Intl] Locale validation/canonicalization should defer to ICU
Ross Kirsling
Reported
2020-04-22 02:18:39 PDT
[Intl] Canonicalize locales according to UTS 35
Attachments
Patch
(916.00 KB, patch)
2020-04-22 02:38 PDT
,
Ross Kirsling
no flags
Details
Formatted Diff
Diff
Patch
(915.52 KB, patch)
2020-04-22 11:43 PDT
,
Ross Kirsling
no flags
Details
Formatted Diff
Diff
Patch
(785.18 KB, patch)
2020-04-23 23:53 PDT
,
Ross Kirsling
no flags
Details
Formatted Diff
Diff
Patch
(785.32 KB, patch)
2020-04-24 14:04 PDT
,
Ross Kirsling
no flags
Details
Formatted Diff
Diff
Patch for landing
(787.78 KB, patch)
2020-04-24 17:16 PDT
,
Ross Kirsling
no flags
Details
Formatted Diff
Diff
Patch for landing
(787.74 KB, patch)
2020-04-24 21:29 PDT
,
Ross Kirsling
no flags
Details
Formatted Diff
Diff
Patch for landing
(789.78 KB, patch)
2020-04-25 01:04 PDT
,
Ross Kirsling
no flags
Details
Formatted Diff
Diff
Show Obsolete
(6)
View All
Add attachment
proposed patch, testcase, etc.
Ross Kirsling
Comment 1
2020-04-22 02:38:27 PDT
Created
attachment 397180
[details]
Patch
Ross Kirsling
Comment 2
2020-04-22 02:57:35 PDT
Additional notes: 1. For ease of review, here's what the two functions in the generated header end up looking like: static String intlPreferredLanguageTag(const String& tag) { // 354 possible replacements static const auto map = makeNeverDestroyed<HashMap<String, String>>({ { "aam"_s, "aas"_s }, ... }); return map.get().get(tag); } 2. A ~5000-line file is much nicer than a ~50000-line file, but you might ask why ICU can't do this for us, since it has the data. The answer is that uloc_[for|to]LanguageTag unfortunately don't produce results that conform to ECMA-402 (and are also supposedly quite slow?). Still, there are certain transformations that we still need to apply.
Ross Kirsling
Comment 3
2020-04-22 11:43:49 PDT
Created
attachment 397227
[details]
Patch
Ross Kirsling
Comment 4
2020-04-23 13:24:53 PDT
(In reply to Ross Kirsling from
comment #2
)
> 2. A ~5000-line file is much nicer than a ~50000-line file, but you might > ask why ICU can't do this for us, since it has the data. The answer is that > uloc_[for|to]LanguageTag unfortunately don't produce results that conform to > ECMA-402 (and are also supposedly quite slow?). Still, there are certain > transformations that we still need to apply.
Hmm, it turns out there's an XML version of this that will be much easier (and future-proof) to parse. I'll try using that instead.
Ross Kirsling
Comment 5
2020-04-23 15:44:22 PDT
Apparently just as we were going to catch up with the existing tests for locale canonicalization, many more landed that raise the bar significantly higher (
https://github.com/tc39/test262/pull/2554
). This seems utterly unreasonable to ask each engine to implement. We should probably just rely on ICU and accept whatever failures occur in a given version.
Ross Kirsling
Comment 6
2020-04-23 23:53:42 PDT
Created
attachment 397431
[details]
Patch
Ross Kirsling
Comment 7
2020-04-23 23:56:25 PDT
By Yusuke's suggestion, let's just roll with what ICU gives us. What's cool is that there's almost zero loss to doing so in ICU 64 -- the only thing missing is the "-true" elimination of
r260151
. (And we could even add it back temporarily if we felt it were worth the perf hit.)
Darin Adler
Comment 8
2020-04-24 11:07:46 PDT
Comment on
attachment 397431
[details]
Patch I’ll review once the JSC stuff compiles.
Ross Kirsling
Comment 9
2020-04-24 14:04:18 PDT
Created
attachment 397505
[details]
Patch
Darin Adler
Comment 10
2020-04-24 16:23:13 PDT
Comment on
attachment 397505
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=397505&action=review
> Source/JavaScriptCore/runtime/IntlObject.cpp:356 > + return String(result.data(), resultLength);
Are these language tags ASCII? Latin-1? UTF-8? This code is fine if they are ASCII or Latin-1, not if they are UTF-8.
Ross Kirsling
Comment 11
2020-04-24 16:53:26 PDT
(In reply to Darin Adler from
comment #10
)
> Comment on
attachment 397505
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=397505&action=review
> > > Source/JavaScriptCore/runtime/IntlObject.cpp:356 > > + return String(result.data(), resultLength); > > Are these language tags ASCII? Latin-1? UTF-8? > > This code is fine if they are ASCII or Latin-1, not if they are UTF-8.
Yeah, the output is all ASCII (just alphanumeric and separator characters).
Ross Kirsling
Comment 12
2020-04-24 17:16:23 PDT
Created
attachment 397528
[details]
Patch for landing
Ross Kirsling
Comment 13
2020-04-24 17:21:47 PDT
(I'll make sure I see JSC-ARMv7 pass tests before landing.)
Darin Adler
Comment 14
2020-04-24 17:28:19 PDT
I like this new strategy. But: I think it’s OK to add extra code to perform better with older versions of ICU or with things ICU didn’t get to yet, depending on the importance.
Ross Kirsling
Comment 15
2020-04-24 21:29:37 PDT
Created
attachment 397538
[details]
Patch for landing
Ross Kirsling
Comment 16
2020-04-25 01:04:46 PDT
Created
attachment 397549
[details]
Patch for landing
EWS
Comment 17
2020-04-25 02:14:16 PDT
Committed
r260697
: <
https://trac.webkit.org/changeset/260697
> All reviewed patches have been landed. Closing bug and clearing flags on
attachment 397549
[details]
.
Radar WebKit Bug Importer
Comment 18
2020-04-25 02:15:20 PDT
<
rdar://problem/62365727
>
Ross Kirsling
Comment 19
2020-04-25 14:01:06 PDT
For some reason this hits an unchecked exception assert only via run-javascript-core and not via run-jsc. Will fix...
Ross Kirsling
Comment 20
2020-04-25 16:48:06 PDT
(In reply to Ross Kirsling from
comment #19
)
> For some reason this hits an unchecked exception assert only via > run-javascript-core and not via run-jsc. Will fix...
Committed
r260710
: <
https://trac.webkit.org/changeset/260710
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug