[Intl] Canonicalize locales according to UTS 35
Created attachment 397180 [details] Patch
Additional notes: 1. For ease of review, here's what the two functions in the generated header end up looking like: static String intlPreferredLanguageTag(const String& tag) { // 354 possible replacements static const auto map = makeNeverDestroyed<HashMap<String, String>>({ { "aam"_s, "aas"_s }, ... }); return map.get().get(tag); } 2. A ~5000-line file is much nicer than a ~50000-line file, but you might ask why ICU can't do this for us, since it has the data. The answer is that uloc_[for|to]LanguageTag unfortunately don't produce results that conform to ECMA-402 (and are also supposedly quite slow?). Still, there are certain transformations that we still need to apply.
Created attachment 397227 [details] Patch
(In reply to Ross Kirsling from comment #2) > 2. A ~5000-line file is much nicer than a ~50000-line file, but you might > ask why ICU can't do this for us, since it has the data. The answer is that > uloc_[for|to]LanguageTag unfortunately don't produce results that conform to > ECMA-402 (and are also supposedly quite slow?). Still, there are certain > transformations that we still need to apply. Hmm, it turns out there's an XML version of this that will be much easier (and future-proof) to parse. I'll try using that instead.
Apparently just as we were going to catch up with the existing tests for locale canonicalization, many more landed that raise the bar significantly higher (https://github.com/tc39/test262/pull/2554). This seems utterly unreasonable to ask each engine to implement. We should probably just rely on ICU and accept whatever failures occur in a given version.
Created attachment 397431 [details] Patch
By Yusuke's suggestion, let's just roll with what ICU gives us. What's cool is that there's almost zero loss to doing so in ICU 64 -- the only thing missing is the "-true" elimination of r260151. (And we could even add it back temporarily if we felt it were worth the perf hit.)
Comment on attachment 397431 [details] Patch I’ll review once the JSC stuff compiles.
Created attachment 397505 [details] Patch
Comment on attachment 397505 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=397505&action=review > Source/JavaScriptCore/runtime/IntlObject.cpp:356 > + return String(result.data(), resultLength); Are these language tags ASCII? Latin-1? UTF-8? This code is fine if they are ASCII or Latin-1, not if they are UTF-8.
(In reply to Darin Adler from comment #10) > Comment on attachment 397505 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=397505&action=review > > > Source/JavaScriptCore/runtime/IntlObject.cpp:356 > > + return String(result.data(), resultLength); > > Are these language tags ASCII? Latin-1? UTF-8? > > This code is fine if they are ASCII or Latin-1, not if they are UTF-8. Yeah, the output is all ASCII (just alphanumeric and separator characters).
Created attachment 397528 [details] Patch for landing
(I'll make sure I see JSC-ARMv7 pass tests before landing.)
I like this new strategy. But: I think it’s OK to add extra code to perform better with older versions of ICU or with things ICU didn’t get to yet, depending on the importance.
Created attachment 397538 [details] Patch for landing
Created attachment 397549 [details] Patch for landing
Committed r260697: <https://trac.webkit.org/changeset/260697> All reviewed patches have been landed. Closing bug and clearing flags on attachment 397549 [details].
<rdar://problem/62365727>
For some reason this hits an unchecked exception assert only via run-javascript-core and not via run-jsc. Will fix...
(In reply to Ross Kirsling from comment #19) > For some reason this hits an unchecked exception assert only via > run-javascript-core and not via run-jsc. Will fix... Committed r260710: <https://trac.webkit.org/changeset/260710>