Summary: | [ECMA-402] Implement Intl.Locale | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Shane Carr <sffc> | ||||||||
Component: | JavaScriptCore | Assignee: | Ross Kirsling <ross.kirsling> | ||||||||
Status: | RESOLVED FIXED | ||||||||||
Severity: | Normal | CC: | andy, annulen, ashvayka, darin, ews-watchlist, gyuyoung.kim, keith_miller, mark.lam, mmaxfield, msaboff, ross.kirsling, ryuan.choi, saam, sergio, sffc, ticaiolima, tzagallo, webkit-bug-importer, ysuzuki | ||||||||
Priority: | P2 | Keywords: | InRadar | ||||||||
Version: | WebKit Nightly Build | ||||||||||
Hardware: | Unspecified | ||||||||||
OS: | Unspecified | ||||||||||
See Also: | https://bugs.webkit.org/show_bug.cgi?id=214223 | ||||||||||
Attachments: |
|
Description
Shane Carr
2020-03-30 14:36:22 PDT
Original proposal repo: https://github.com/tc39/proposal-intl-locale Created attachment 398301 [details]
Patch
Hmm, since this is basically a static object with a slew of lazy getters, I suppose it would be good to have them all do call_once. (In reply to Ross Kirsling from comment #4) > Hmm, since this is basically a static object with a slew of lazy getters, I > suppose it would be good to have them all do call_once. Err no, that's not for caching instance data, what am I talking about... Created attachment 398303 [details]
Patch
Comment on attachment 398303 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=398303&action=review > Source/JavaScriptCore/runtime/IntlLocalePrototype.cpp:56 > +const ClassInfo IntlLocalePrototype::s_info = { "Object", &Base::s_info, &localePrototypeTable, nullptr, CREATE_METHOD_TABLE(IntlLocalePrototype) }; We might also set `className` to "Intl.Locale" and use JSC_TO_STRING_TAG_WITHOUT_TRANSITION in IntlLocalePrototype::finishCreation(). (In reply to Alexey Shvayka from comment #7) > Comment on attachment 398303 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=398303&action=review > > > Source/JavaScriptCore/runtime/IntlLocalePrototype.cpp:56 > > +const ClassInfo IntlLocalePrototype::s_info = { "Object", &Base::s_info, &localePrototypeTable, nullptr, CREATE_METHOD_TABLE(IntlLocalePrototype) }; > > We might also set `className` to "Intl.Locale" and use > JSC_TO_STRING_TAG_WITHOUT_TRANSITION in > IntlLocalePrototype::finishCreation(). Oh, I didn't know about this approach. Perhaps I should do so for all of the Intl prototypes? (In reply to Ross Kirsling from comment #8) > Oh, I didn't know about this approach. Perhaps I should do so for all of the > Intl prototypes? It's very new (introduced in r260992); other Intl prototypes are already covered. Also, I hope that https://github.com/tc39/ecma402/pull/430 will align @@toStringTag values of all Intl prototypes. Comment on attachment 398303 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=398303&action=review Looks generally great. I have some class design thoughts. > Source/JavaScriptCore/runtime/IntlLocale.cpp:85 > + UErrorCode status = U_ZERO_ERROR; > + Vector<char, 32> buffer(32); > + int32_t parsedLength; > + auto bufferLength = uloc_forLanguageTag(input.data(), buffer.data(), buffer.size(), &parsedLength, &status); > + if (needsToGrowToProduceBuffer(status)) { > + buffer.resize(bufferLength); > + status = U_ZERO_ERROR; > + uloc_forLanguageTag(input.data(), buffer.data(), bufferLength, &parsedLength, &status); > + } > + if (U_FAILURE(status) || parsedLength != static_cast<int32_t>(input.length())) > + return false; I am not OK with us continuing to write these out individually. I am going to come up with a template/pattern for these so we have a better chance of doing it right. > Source/JavaScriptCore/runtime/IntlLocale.cpp:475 > + m_numeric = triState(keyword("colnumeric"_s).startsWith("true"_s)); I looked at the specification and did not know how to find the string "colnumeric" or the rule about "true" as a prefix. Surprised that a "starts with" check is correct here. Checking that something starts with an ASCII literal (case sensitive) is not well optimized, perhaps because it’s not very common. Surprised that it is the right thing here. From a micro-optimization point of view, annoying that this creates/destroys a WTF::String containing the characters "true"; a malloc/free pair. Neither == nor startsWithLettersIgnoringASCIICase does that. We just don’t have a startsWith overload that takes a literal. If we don’t want to add new overloads for WTF::String::startsWith, to avoid allocating memory we could use StringView: StringView { keyword("colnumeric"_s) }.startsWith("true") That would be more efficient. Also, ironically, the free function startsWith from StringCommon.h is better optimized, although it doesn’t work with literals: startsWith(keyword("colnumeric"_s), StringView { "true "}) Both of the above would be more efficient than String::startsWith. Further, the StringCommon.h functions could be fixed to work with ASCII literals by adding member functions named length(), is8Bit(), characters8(), and characters16() to ASCIILiteral or by some other kinds of refactoring. > Source/JavaScriptCore/runtime/IntlLocale.h:76 > + void setKeyword(ASCIILiteral, const String&); I think this should be called setKeywordValue. I think this should take a StringView. I know that currently String has a function to produce UTF-8 characters and StringView does not, but we should fix that. > Source/JavaScriptCore/runtime/IntlLocale.h:77 > + String keyword(ASCIILiteral) const; I think this should be called keywordValue. > Source/JavaScriptCore/runtime/IntlLocale.h:80 > + CString m_localeID; CString seems fine/great for the locale ID after it’s computed. While it’s being computed I don’t think CString is optimal. Vector<char> would probably be better, probably even with inline capacity. I think it’s a mistake that IntlLocale object is both the "locale ID builder" and the thing to use after it’s built. I suggest a separate object for use as the locale ID builder, that gets created and destroyed within the initializeLocaleFunction. Many of the private member functions would be part of that, not part of IntlLocale. Comment on attachment 398303 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=398303&action=review > Source/JavaScriptCore/runtime/IntlLocale.cpp:79 > + if (needsToGrowToProduceBuffer(status)) { This is wrong. It needs to be: needsToGrowToProduceCString Comment on attachment 398303 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=398303&action=review >> Source/JavaScriptCore/runtime/IntlLocale.cpp:79 >> + if (needsToGrowToProduceBuffer(status)) { > > This is wrong. It needs to be: > needsToGrowToProduceCString Sorry I think I'm wrong here based on your usage. Reinstating Darin's r+ > Source/JavaScriptCore/runtime/IntlLocale.cpp:95 > + auto bufferLength = uloc_canonicalize(m_localeID.data(), buffer.data(), buffer.size(), &status); Looks like the same issue. Comment on attachment 398303 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=398303&action=review >> Source/JavaScriptCore/runtime/IntlLocale.cpp:95 >> + auto bufferLength = uloc_canonicalize(m_localeID.data(), buffer.data(), buffer.size(), &status); > > Looks like the same issue. Also wrong here. I did a quick vetting of all uses of "needsToGrowXYZ" and they seem correct. (In reply to Saam Barati from comment #13) > Comment on attachment 398303 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=398303&action=review > > >> Source/JavaScriptCore/runtime/IntlLocale.cpp:95 > >> + auto bufferLength = uloc_canonicalize(m_localeID.data(), buffer.data(), buffer.size(), &status); > > > > Looks like the same issue. > > Also wrong here. I did a quick vetting of all uses of "needsToGrowXYZ" and > they seem correct. Thanks. I guess this may imply that the naming is a little confusing? Since the relevance of the warning is when we need the buffer itself to be null-terminated so that we can use its data *without* making any sort of string class. (In reply to Darin Adler from comment #10) > > Source/JavaScriptCore/runtime/IntlLocale.cpp:475 > > + m_numeric = triState(keyword("colnumeric"_s).startsWith("true"_s)); > > I looked at the specification and did not know how to find the string > "colnumeric" or the rule about "true" as a prefix. Surprised that a "starts > with" check is correct here. > > Checking that something starts with an ASCII literal (case sensitive) is not > well optimized, perhaps because it’s not very common. Surprised that it is > the right thing here. This situation here is kind of goofy because we're operating on ICU's legacy locale identifiers. In particular, they represent boolean true as "yes", but when we modernize these legacy "yes" and "no" values they become "truefalse" and "falsefalse" (which would seem like a very weird bug, but then a missing value turns into just "false" so maybe it's somehow intended??). I didn't feel great about using startsWith as a solution to begin with, so I think this means we should just avoid the modernization step for boolean values and check `== "yes"` instead. (In reply to Ross Kirsling from comment #14) > (In reply to Saam Barati from comment #13) > > Comment on attachment 398303 [details] > > Patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=398303&action=review > > > > >> Source/JavaScriptCore/runtime/IntlLocale.cpp:95 > > >> + auto bufferLength = uloc_canonicalize(m_localeID.data(), buffer.data(), buffer.size(), &status); > > > > > > Looks like the same issue. > > > > Also wrong here. I did a quick vetting of all uses of "needsToGrowXYZ" and > > they seem correct. > > Thanks. I guess this may imply that the naming is a little confusing? Since > the relevance of the warning is when we need the buffer itself to be > null-terminated so that we can use its data *without* making any sort of > string class. The confusion on my end is just not trusting my immediate reaction that a char* buffer needs to be null terminated. It doesn’t depending on how the result is used. I think Darin’s planned refactoring will make this a lot nicer by having this code all in one place (In reply to Ross Kirsling from comment #8) > (In reply to Alexey Shvayka from comment #7) > > Comment on attachment 398303 [details] > > Patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=398303&action=review > > > > > Source/JavaScriptCore/runtime/IntlLocalePrototype.cpp:56 > > > +const ClassInfo IntlLocalePrototype::s_info = { "Object", &Base::s_info, &localePrototypeTable, nullptr, CREATE_METHOD_TABLE(IntlLocalePrototype) }; > > > > We might also set `className` to "Intl.Locale" and use > > JSC_TO_STRING_TAG_WITHOUT_TRANSITION in > > IntlLocalePrototype::finishCreation(). > > Oh, I didn't know about this approach. Perhaps I should do so for all of the > Intl prototypes? Oh oops, I now see this was done in bug 211020. Will follow suit here. (In reply to Darin Adler from comment #10) > > Source/JavaScriptCore/runtime/IntlLocale.h:76 > > + void setKeyword(ASCIILiteral, const String&); > > I think this should be called setKeywordValue. I think this should take a > StringView. I know that currently String has a function to produce UTF-8 > characters and StringView does not, but we should fix that. We can do this, but since characters8() lacks a null terminator, I think we still have to make a CString before passing to uloc_setKeywordValue. (In reply to Ross Kirsling from comment #18) > We can do this, but since characters8() lacks a null terminator, I think we > still have to make a CString before passing to uloc_setKeywordValue. Of course: characters8() also won’t produce UTF-8. I’m just saying that we can create a function like String::utf8 that works on a StringView. Created attachment 398585 [details]
Patch for landing
Committed r261215: <https://trac.webkit.org/changeset/261215> All reviewed patches have been landed. Closing bug and clearing flags on attachment 398585 [details]. |