Bug 209772

Summary: [ECMA-402] Implement Intl.Locale
Product: WebKit Reporter: Shane Carr <sffc>
Component: JavaScriptCoreAssignee: Ross Kirsling <ross.kirsling>
Status: RESOLVED FIXED    
Severity: Normal CC: andy, annulen, ashvayka, darin, ews-watchlist, gyuyoung.kim, keith_miller, mark.lam, mmaxfield, msaboff, ross.kirsling, ryuan.choi, saam, sergio, sffc, ticaiolima, tzagallo, webkit-bug-importer, ysuzuki
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=214223
Attachments:
Description Flags
Patch
none
Patch
none
Patch for landing none

Description Shane Carr 2020-03-30 14:36:22 PDT
Intl.Locale has reached Stage 4 in TC39 as of February 2020, and the other major browser engines (V8 and SpiderMonkey) have been shipping it for some time.

As usage of Intl.Locale increases throughout the ecosystem, WebKit users will be left behind with legacy polyfills, leading to poorer performance relative to other browsers.  Intl.Locale is a low-level type whose usage we expect to grow rapidly as more and more libraries begin to leverage it.

ICU4C exposes C and C++ APIs that can be used to implement Intl.Locale.  Implementing Intl.Locale is largely a matter of adding the glue between JavaScript and ICU4C, as WebKit already does for other Intl types.
Comment 1 Shane Carr 2020-03-30 14:40:08 PDT
Original proposal repo: https://github.com/tc39/proposal-intl-locale
Comment 2 Radar WebKit Bug Importer 2020-03-30 16:39:02 PDT
<rdar://problem/61079762>
Comment 3 Ross Kirsling 2020-05-02 18:19:22 PDT
Created attachment 398301 [details]
Patch
Comment 4 Ross Kirsling 2020-05-02 19:06:45 PDT Comment hidden (obsolete)
Comment 5 Ross Kirsling 2020-05-02 19:32:15 PDT Comment hidden (obsolete)
Comment 6 Ross Kirsling 2020-05-02 20:09:39 PDT
Created attachment 398303 [details]
Patch
Comment 7 Alexey Shvayka 2020-05-05 10:35:14 PDT
Comment on attachment 398303 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=398303&action=review

> Source/JavaScriptCore/runtime/IntlLocalePrototype.cpp:56
> +const ClassInfo IntlLocalePrototype::s_info = { "Object", &Base::s_info, &localePrototypeTable, nullptr, CREATE_METHOD_TABLE(IntlLocalePrototype) };

We might also set `className` to "Intl.Locale" and use JSC_TO_STRING_TAG_WITHOUT_TRANSITION in IntlLocalePrototype::finishCreation().
Comment 8 Ross Kirsling 2020-05-05 11:08:28 PDT
(In reply to Alexey Shvayka from comment #7)
> Comment on attachment 398303 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=398303&action=review
> 
> > Source/JavaScriptCore/runtime/IntlLocalePrototype.cpp:56
> > +const ClassInfo IntlLocalePrototype::s_info = { "Object", &Base::s_info, &localePrototypeTable, nullptr, CREATE_METHOD_TABLE(IntlLocalePrototype) };
> 
> We might also set `className` to "Intl.Locale" and use
> JSC_TO_STRING_TAG_WITHOUT_TRANSITION in
> IntlLocalePrototype::finishCreation().

Oh, I didn't know about this approach. Perhaps I should do so for all of the Intl prototypes?
Comment 9 Alexey Shvayka 2020-05-05 11:15:43 PDT
(In reply to Ross Kirsling from comment #8)
> Oh, I didn't know about this approach. Perhaps I should do so for all of the
> Intl prototypes?

It's very new (introduced in r260992); other Intl prototypes are already covered.
Also, I hope that https://github.com/tc39/ecma402/pull/430 will align @@toStringTag values of all Intl prototypes.
Comment 10 Darin Adler 2020-05-05 11:17:02 PDT
Comment on attachment 398303 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=398303&action=review

Looks generally great. I have some class design thoughts.

> Source/JavaScriptCore/runtime/IntlLocale.cpp:85
> +    UErrorCode status = U_ZERO_ERROR;
> +    Vector<char, 32> buffer(32);
> +    int32_t parsedLength;
> +    auto bufferLength = uloc_forLanguageTag(input.data(), buffer.data(), buffer.size(), &parsedLength, &status);
> +    if (needsToGrowToProduceBuffer(status)) {
> +        buffer.resize(bufferLength);
> +        status = U_ZERO_ERROR;
> +        uloc_forLanguageTag(input.data(), buffer.data(), bufferLength, &parsedLength, &status);
> +    }
> +    if (U_FAILURE(status) || parsedLength != static_cast<int32_t>(input.length()))
> +        return false;

I am not OK with us continuing to write these out individually. I am going to come up with a template/pattern for these so we have a better chance of doing it right.

> Source/JavaScriptCore/runtime/IntlLocale.cpp:475
> +        m_numeric = triState(keyword("colnumeric"_s).startsWith("true"_s));

I looked at the specification and did not know how to find the string "colnumeric" or the rule about "true" as a prefix. Surprised that a "starts with" check is correct here.

Checking that something starts with an ASCII literal (case sensitive) is not well optimized, perhaps because it’s not very common. Surprised that it is the right thing here.

From a micro-optimization point of view, annoying that this creates/destroys a WTF::String containing the characters "true"; a malloc/free pair. Neither == nor startsWithLettersIgnoringASCIICase does that. We just don’t have a startsWith overload that takes a literal. If we don’t want to add new overloads for WTF::String::startsWith, to avoid allocating memory we could use StringView:

    StringView { keyword("colnumeric"_s) }.startsWith("true")

That would be more efficient. Also, ironically, the free function startsWith from StringCommon.h is better optimized, although it doesn’t work with literals:

    startsWith(keyword("colnumeric"_s), StringView { "true "})

Both of the above would be more efficient than String::startsWith. Further, the StringCommon.h functions could be fixed to work with ASCII literals by adding member functions named length(), is8Bit(), characters8(), and characters16() to ASCIILiteral or by some other kinds of refactoring.

> Source/JavaScriptCore/runtime/IntlLocale.h:76
> +    void setKeyword(ASCIILiteral, const String&);

I think this should be called setKeywordValue. I think this should take a StringView. I know that currently String has a function to produce UTF-8 characters and StringView does not, but we should fix that.

> Source/JavaScriptCore/runtime/IntlLocale.h:77
> +    String keyword(ASCIILiteral) const;

I think this should be called keywordValue.

> Source/JavaScriptCore/runtime/IntlLocale.h:80
> +    CString m_localeID;

CString seems fine/great for the locale ID after it’s computed. While it’s being computed I don’t think CString is optimal. Vector<char> would probably be better, probably even with inline capacity.

I think it’s a mistake that IntlLocale object is both the "locale ID builder" and the thing to use after it’s built. I suggest a separate object for use as the locale ID builder, that gets created and destroyed within the initializeLocaleFunction. Many of the private member functions would be part of that, not part of IntlLocale.
Comment 11 Saam Barati 2020-05-05 11:39:16 PDT
Comment on attachment 398303 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=398303&action=review

> Source/JavaScriptCore/runtime/IntlLocale.cpp:79
> +    if (needsToGrowToProduceBuffer(status)) {

This is wrong. It needs to be:
needsToGrowToProduceCString
Comment 12 Saam Barati 2020-05-05 11:41:56 PDT
Comment on attachment 398303 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=398303&action=review

>> Source/JavaScriptCore/runtime/IntlLocale.cpp:79
>> +    if (needsToGrowToProduceBuffer(status)) {
> 
> This is wrong. It needs to be:
> needsToGrowToProduceCString

Sorry I think I'm wrong here based on your usage. Reinstating Darin's r+

> Source/JavaScriptCore/runtime/IntlLocale.cpp:95
> +    auto bufferLength = uloc_canonicalize(m_localeID.data(), buffer.data(), buffer.size(), &status);

Looks like the same issue.
Comment 13 Saam Barati 2020-05-05 11:45:36 PDT
Comment on attachment 398303 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=398303&action=review

>> Source/JavaScriptCore/runtime/IntlLocale.cpp:95
>> +    auto bufferLength = uloc_canonicalize(m_localeID.data(), buffer.data(), buffer.size(), &status);
> 
> Looks like the same issue.

Also wrong here. I did a quick vetting of all uses of "needsToGrowXYZ" and they seem correct.
Comment 14 Ross Kirsling 2020-05-05 11:47:36 PDT
(In reply to Saam Barati from comment #13)
> Comment on attachment 398303 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=398303&action=review
> 
> >> Source/JavaScriptCore/runtime/IntlLocale.cpp:95
> >> +    auto bufferLength = uloc_canonicalize(m_localeID.data(), buffer.data(), buffer.size(), &status);
> > 
> > Looks like the same issue.
> 
> Also wrong here. I did a quick vetting of all uses of "needsToGrowXYZ" and
> they seem correct.

Thanks. I guess this may imply that the naming is a little confusing? Since the relevance of the warning is when we need the buffer itself to be null-terminated so that we can use its data *without* making any sort of string class.
Comment 15 Ross Kirsling 2020-05-05 11:58:12 PDT
(In reply to Darin Adler from comment #10)
> > Source/JavaScriptCore/runtime/IntlLocale.cpp:475
> > +        m_numeric = triState(keyword("colnumeric"_s).startsWith("true"_s));
> 
> I looked at the specification and did not know how to find the string
> "colnumeric" or the rule about "true" as a prefix. Surprised that a "starts
> with" check is correct here.
> 
> Checking that something starts with an ASCII literal (case sensitive) is not
> well optimized, perhaps because it’s not very common. Surprised that it is
> the right thing here.

This situation here is kind of goofy because we're operating on ICU's legacy locale identifiers. In particular, they represent boolean true as "yes", but when we modernize these legacy "yes" and "no" values they become "truefalse" and "falsefalse" (which would seem like a very weird bug, but then a missing value turns into just "false" so maybe it's somehow intended??). I didn't feel great about using startsWith as a solution to begin with, so I think this means we should just avoid the modernization step for boolean values and check `== "yes"` instead.
Comment 16 Saam Barati 2020-05-05 12:01:37 PDT
(In reply to Ross Kirsling from comment #14)
> (In reply to Saam Barati from comment #13)
> > Comment on attachment 398303 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=398303&action=review
> > 
> > >> Source/JavaScriptCore/runtime/IntlLocale.cpp:95
> > >> +    auto bufferLength = uloc_canonicalize(m_localeID.data(), buffer.data(), buffer.size(), &status);
> > > 
> > > Looks like the same issue.
> > 
> > Also wrong here. I did a quick vetting of all uses of "needsToGrowXYZ" and
> > they seem correct.
> 
> Thanks. I guess this may imply that the naming is a little confusing? Since
> the relevance of the warning is when we need the buffer itself to be
> null-terminated so that we can use its data *without* making any sort of
> string class.

The confusion on my end is just not trusting my immediate reaction that a char* buffer needs to be null terminated. It doesn’t depending on how the result is used.

I think Darin’s planned refactoring will make this a lot nicer by having this code all in one place
Comment 17 Ross Kirsling 2020-05-05 12:30:18 PDT
(In reply to Ross Kirsling from comment #8)
> (In reply to Alexey Shvayka from comment #7)
> > Comment on attachment 398303 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=398303&action=review
> > 
> > > Source/JavaScriptCore/runtime/IntlLocalePrototype.cpp:56
> > > +const ClassInfo IntlLocalePrototype::s_info = { "Object", &Base::s_info, &localePrototypeTable, nullptr, CREATE_METHOD_TABLE(IntlLocalePrototype) };
> > 
> > We might also set `className` to "Intl.Locale" and use
> > JSC_TO_STRING_TAG_WITHOUT_TRANSITION in
> > IntlLocalePrototype::finishCreation().
> 
> Oh, I didn't know about this approach. Perhaps I should do so for all of the
> Intl prototypes?

Oh oops, I now see this was done in bug 211020. Will follow suit here.
Comment 18 Ross Kirsling 2020-05-05 21:14:44 PDT
(In reply to Darin Adler from comment #10)
> > Source/JavaScriptCore/runtime/IntlLocale.h:76
> > +    void setKeyword(ASCIILiteral, const String&);
> 
> I think this should be called setKeywordValue. I think this should take a
> StringView. I know that currently String has a function to produce UTF-8
> characters and StringView does not, but we should fix that.

We can do this, but since characters8() lacks a null terminator, I think we still have to make a CString before passing to uloc_setKeywordValue.
Comment 19 Darin Adler 2020-05-05 21:20:16 PDT
(In reply to Ross Kirsling from comment #18)
> We can do this, but since characters8() lacks a null terminator, I think we
> still have to make a CString before passing to uloc_setKeywordValue.

Of course: characters8() also won’t produce UTF-8. I’m just saying that we can create a function like String::utf8 that works on a StringView.
Comment 20 Ross Kirsling 2020-05-05 21:25:49 PDT
Created attachment 398585 [details]
Patch for landing
Comment 21 EWS 2020-05-05 23:01:11 PDT
Committed r261215: <https://trac.webkit.org/changeset/261215>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 398585 [details].