Bug 213007 - REGRESSION(r260697): [Intl] "missing script" locales like zh-TW are no longer mapped
Summary: REGRESSION(r260697): [Intl] "missing script" locales like zh-TW are no longer...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Ross Kirsling
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2020-06-09 20:36 PDT by Ross Kirsling
Modified: 2020-06-12 13:56 PDT (History)
8 users (show)

See Also:


Attachments
Patch (5.36 KB, patch)
2020-06-09 20:57 PDT, Ross Kirsling
no flags Details | Formatted Diff | Diff
Patch (5.70 KB, patch)
2020-06-10 16:59 PDT, Ross Kirsling
no flags Details | Formatted Diff | Diff
Patch (5.74 KB, patch)
2020-06-10 17:39 PDT, Ross Kirsling
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ross Kirsling 2020-06-09 20:36:04 PDT
REGRESSION(r260697): [Intl] ""missing script" locales like zh-TW are no longer mapped
Comment 1 Ross Kirsling 2020-06-09 20:57:23 PDT
Created attachment 401510 [details]
Patch
Comment 2 Darin Adler 2020-06-10 11:17:29 PDT
Comment on attachment 401510 [details]
Patch

Not thrilled that the test coverage exactly matches our hard-coded exception list. Is there a way to figure out what is expected and desired and make a broader test, even if some of the less important cases fail at this time?

Generally not thrilled with tests that only test the positive cases.
Comment 3 Darin Adler 2020-06-10 11:17:47 PDT
Is there a reason this isn’t covered in Web Platform Tests?
Comment 4 Ross Kirsling 2020-06-10 12:49:47 PDT
(In reply to Darin Adler from comment #2)
> Comment on attachment 401510 [details]
> Patch
> 
> Not thrilled that the test coverage exactly matches our hard-coded exception
> list. Is there a way to figure out what is expected and desired and make a
> broader test, even if some of the less important cases fail at this time?
> 
> Generally not thrilled with tests that only test the positive cases.

Like the ChangeLog indicates, there's an ongoing discussion about what the right set of locales here should be (since SpiderMonkey has the same hardcoded list):
https://github.com/tc39/ecma402/issues/159

I got worried about sitting on this regression for too long (because not a few users will be upset if a mainline release goes out this way) so this is just meant as a partial revert, but I also figured that not adding a test would be a bad look.

That said, I think my comment that "we can't just add xx-ZZ for every available xx-Yyyy-ZZ locale" may be wrong. Just because ICU4C doesn't have an explicit xx-ZZ available doesn't mean that it won't fall back properly. I should try this general approach after all.

(In reply to Darin Adler from comment #3)
> Is there a reason this isn’t covered in Web Platform Tests?

It should be covered in test262 pending the resolution of the above discussion. (There are some tests indirectly assuming that zh-TW is a Hant locale, but...)
Comment 5 Ross Kirsling 2020-06-10 16:59:08 PDT
Created attachment 401605 [details]
Patch
Comment 6 Ross Kirsling 2020-06-10 17:04:54 PDT
Uploaded a new version which keeps the regression test but doesn't use a hardcoded locale list.

I think this is better but I hope I'm doing the createStaticStringImpl part correctly; we're still filling a NeverDestroyed<HashSet<String>> but now some of these are of our own creation (and some duplicate adds will occur).
Comment 7 Darin Adler 2020-06-10 17:15:30 PDT
Comment on attachment 401605 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=401605&action=review

> Source/JavaScriptCore/runtime/IntlObject.cpp:197
> +    Vector<StringView> subtags;

Make this Vector<StringView, 3> to avoid one memory allocation.

> Source/JavaScriptCore/runtime/IntlObject.cpp:198
> +    for (auto subtag : locale.splitAllowingEmptyEntries('-')) 

Do you really want the "allowing empty entries" version? Seems wrong to me.

> Source/JavaScriptCore/runtime/IntlObject.cpp:199
> +        subtags.append(subtag);

To cut down memory allocation a bit more, write:

    if (subtags.size() >= 3)
        return;

Before the call to append.

> Source/JavaScriptCore/runtime/IntlObject.cpp:211
> +    Vector<char, 12> buffer;
> +    ASSERT(subtags[0].is8Bit() && subtags[0].isAllASCII());
> +    buffer.append(reinterpret_cast<const char*>(subtags[0].characters8()), subtags[0].length());
> +    buffer.append('-');
> +    ASSERT(subtags[2].is8Bit() && subtags[2].isAllASCII());
> +    buffer.append(reinterpret_cast<const char*>(subtags[2].characters8()), subtags[2].length());
> +
> +    availableLocales.add(StringImpl::createStaticStringImpl(buffer.data(), buffer.size()));

If it wasn’t for this StaticString thing, we could get rid of all those asserts and just write:

    availableLocales.add(makeString(subtags[0], '-', subtags[2]);

But, as last time, I don’t really understand why StaticString is important here.
Comment 8 Ross Kirsling 2020-06-10 17:39:48 PDT
Created attachment 401609 [details]
Patch
Comment 9 EWS 2020-06-10 19:20:55 PDT
Committed r262890: <https://trac.webkit.org/changeset/262890>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 401609 [details].
Comment 10 Radar WebKit Bug Importer 2020-06-10 19:21:16 PDT
<rdar://problem/64235612>
Comment 11 Darin Adler 2020-06-11 16:06:08 PDT
Comment on attachment 401609 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=401609&action=review

> Source/JavaScriptCore/runtime/IntlObject.cpp:204
> +    if (subtags.size() < 3 || subtags[1].length() != 4 || subtags[2].length() > 3)

Realized that this should be "!= 3" rather than "< 3" for clarity.
Comment 12 Ross Kirsling 2020-06-12 12:37:32 PDT
(In reply to Darin Adler from comment #11)
> Comment on attachment 401609 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=401609&action=review
> 
> > Source/JavaScriptCore/runtime/IntlObject.cpp:204
> > +    if (subtags.size() < 3 || subtags[1].length() != 4 || subtags[2].length() > 3)
> 
> Realized that this should be "!= 3" rather than "< 3" for clarity.

Hmm, I changed this from `!=` to `<` when I added the loop-internal early out you suggested, aiming to underscore the fact that we'd be gone already if we tried to grow beyond 3. I can switch it if it's inadvertently confusing though?
Comment 13 Darin Adler 2020-06-12 13:42:00 PDT
I think the narrower condition is simply easier to understand and more logical.

From a "policy" point of view, we only want to run the code below if the size is exactly 3, and thus that’s what the code should say.
Comment 14 Ross Kirsling 2020-06-12 13:56:47 PDT
Works for me.

Committed r262974: <https://trac.webkit.org/changeset/262974>