[JSC] Canonicalize "true" unicode extension type value to ""
Created attachment 408119 [details] Patch
Created attachment 408120 [details] Patch
Comment on attachment 408120 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=408120&action=review > Source/JavaScriptCore/runtime/IntlLocalePrototype.cpp:188 > + if (caseFirst.isEmpty()) > + return JSValue::encode(jsUndefined()); > + // If caseFirst is "true", then we should return "" since language tag canonicalization makes it "". > + if (caseFirst == "true"_s) > + return JSValue::encode(jsEmptyString(vm)); > + return JSValue::encode(jsString(vm, caseFirst)); This shouldn't be correct -- kf isn't a boolean extension; it should only ever be "upper"/"lower"/"false"/undefined (though Intl.Locale will evidently let it be an arbitrary string). This does highlight an interesting separate issue in Intl.Locale though -- if the locale string has a *non-boolean* Unicode extension with the value omitted, we're returning "yes" instead of "true". For example: new Intl.Locale('en-us-u-co').collation > yes > Source/JavaScriptCore/runtime/IntlObject.cpp:306 > + // In the following unicode extension keys, "yes" is alias of "true". So we should not include value. > + // See CLDR key name="kb" entry for example. > + switch (computeTwoCharacters16Code(key)) { > + case computeTwoCharacters16Code("kb"_s): > + case computeTwoCharacters16Code("kc"_s): > + case computeTwoCharacters16Code("kh"_s): > + case computeTwoCharacters16Code("kk"_s): > + case computeTwoCharacters16Code("kn"_s): I don't think we should be maintaining an allowlist here? Most of these are unknown to ECMA-402 and V8 evidently accepts "yes" for arbitrary extensions. Also, unless I'm missing something, "yes" should never appear in the output of uloc_toLanguageTag, right?
Comment on attachment 408120 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=408120&action=review >> Source/JavaScriptCore/runtime/IntlLocalePrototype.cpp:188 >> + return JSValue::encode(jsString(vm, caseFirst)); > > This shouldn't be correct -- kf isn't a boolean extension; it should only ever be "upper"/"lower"/"false"/undefined (though Intl.Locale will evidently let it be an arbitrary string). > > This does highlight an interesting separate issue in Intl.Locale though -- if the locale string has a *non-boolean* Unicode extension with the value omitted, we're returning "yes" instead of "true". For example: > > new Intl.Locale('en-us-u-co').collation > > yes However, if "true" is explicitly specified (`en-u-kf-true`), then, I think we should return "" since > Any type or tfield value "true" is removed. rule can be applied regardless of whether this is a boolean extension. So, when "true" is specified, "" is better than returning "true". >> Source/JavaScriptCore/runtime/IntlObject.cpp:306 >> + case computeTwoCharacters16Code("kn"_s): > > I don't think we should be maintaining an allowlist here? Most of these are unknown to ECMA-402 and V8 evidently accepts "yes" for arbitrary extensions. > Also, unless I'm missing something, "yes" should never appear in the output of uloc_toLanguageTag, right? Right, "yes" will not appear from ICU except for the user explicitly set "yes". And currently it is tested in test262/test/intl402/Intl/getCanonicalLocales/unicode-ext-canonicalize-yes-to-true.js Currently, ICU does not handle this correctly, so we need to do that here. I think this list is small enough, so I'm fine to keep it here. And we can remove it once ICU supports this correctly.
Created attachment 408425 [details] Patch
(In reply to Yusuke Suzuki from comment #4) > Comment on attachment 408120 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=408120&action=review > > >> Source/JavaScriptCore/runtime/IntlLocalePrototype.cpp:188 > >> + return JSValue::encode(jsString(vm, caseFirst)); > > > > This shouldn't be correct -- kf isn't a boolean extension; it should only ever be "upper"/"lower"/"false"/undefined (though Intl.Locale will evidently let it be an arbitrary string). > > > > This does highlight an interesting separate issue in Intl.Locale though -- if the locale string has a *non-boolean* Unicode extension with the value omitted, we're returning "yes" instead of "true". For example: > > > > new Intl.Locale('en-us-u-co').collation > > > yes > > However, if "true" is explicitly specified (`en-u-kf-true`), then, I think > we should return "" since > > > Any type or tfield value "true" is removed. > > rule can be applied regardless of whether this is a boolean extension. So, > when "true" is specified, "" is better than returning "true". Hmm, but this is purely a statement about locale string canonicalization and not about how Intl.Locale's getters should behave. You may still be right (since canonicalization happens in step 12 of https://tc39.es/ecma402/#sec-Intl.Locale), but there's absolutely nothing unique about caseFirst, so we'd need to do it for the other non-boolean getters too. I guess this and the "yes gets returned" issue I mentioned should probably be handled together within IntlLocale.cpp. > >> Source/JavaScriptCore/runtime/IntlObject.cpp:306 > >> + case computeTwoCharacters16Code("kn"_s): > > > > I don't think we should be maintaining an allowlist here? Most of these are unknown to ECMA-402 and V8 evidently accepts "yes" for arbitrary extensions. > > Also, unless I'm missing something, "yes" should never appear in the output of uloc_toLanguageTag, right? > > Right, "yes" will not appear from ICU except for the user explicitly set > "yes". And currently it is tested in > test262/test/intl402/Intl/getCanonicalLocales/unicode-ext-canonicalize-yes- > to-true.js > Currently, ICU does not handle this correctly, so we need to do that here. I > think this list is small enough, so I'm fine to keep it here. And we can > remove it once ICU supports this correctly. That test makes sense, but my reason for confusion is because this is the existing behavior without your new canonicalization: new Intl.Locale('en-us-u-kb-yes') > en-US-u-kb-true new Intl.Locale('en-us-u-co-yes') > en-US-u-co-yes ...so it seems like we only see "yes" returned from uloc_toLanguageTag if the user provided "yes" *and* the extension is non-boolean, which would mean that we'd never need to check for "yes" in particular.
Comment on attachment 408120 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=408120&action=review >>>> Source/JavaScriptCore/runtime/IntlLocalePrototype.cpp:188 >>>> + return JSValue::encode(jsString(vm, caseFirst)); >>> >>> This shouldn't be correct -- kf isn't a boolean extension; it should only ever be "upper"/"lower"/"false"/undefined (though Intl.Locale will evidently let it be an arbitrary string). >>> >>> This does highlight an interesting separate issue in Intl.Locale though -- if the locale string has a *non-boolean* Unicode extension with the value omitted, we're returning "yes" instead of "true". For example: >>> >>> new Intl.Locale('en-us-u-co').collation >>> > yes >> >> However, if "true" is explicitly specified (`en-u-kf-true`), then, I think we should return "" since > > Hmm, but this is purely a statement about locale string canonicalization and not about how Intl.Locale's getters should behave. You may still be right (since canonicalization happens in step 12 of https://tc39.es/ecma402/#sec-Intl.Locale), but there's absolutely nothing unique about caseFirst, so we'd need to do it for the other non-boolean getters too. I guess this and the "yes gets returned" issue I mentioned should probably be handled together within IntlLocale.cpp. OK, so we should move this logic to IntlLocale.cpp's getters. Changed. >>>> Source/JavaScriptCore/runtime/IntlObject.cpp:306 >>>> + case computeTwoCharacters16Code("kn"_s): >>> >>> I don't think we should be maintaining an allowlist here? Most of these are unknown to ECMA-402 and V8 evidently accepts "yes" for arbitrary extensions. >>> Also, unless I'm missing something, "yes" should never appear in the output of uloc_toLanguageTag, right? >> >> Right, "yes" will not appear from ICU except for the user explicitly set "yes". And currently it is tested in test262/test/intl402/Intl/getCanonicalLocales/unicode-ext-canonicalize-yes-to-true.js >> Currently, ICU does not handle this correctly, so we need to do that here. I think this list is small enough, so I'm fine to keep it here. And we can remove it once ICU supports this correctly. > > That test makes sense, but my reason for confusion is because this is the existing behavior without your new canonicalization: > > new Intl.Locale('en-us-u-kb-yes') > > en-US-u-kb-true > new Intl.Locale('en-us-u-co-yes') > > en-US-u-co-yes > > ...so it seems like we only see "yes" returned from uloc_toLanguageTag if the user provided "yes" *and* the extension is non-boolean, which would mean that we'd never need to check for "yes" in particular. Ah! Right, so this is not necessary. Removed.
Created attachment 408582 [details] Patch
Created attachment 408583 [details] Patch
Comment on attachment 408582 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=408582&action=review r=me > Source/JavaScriptCore/runtime/IntlObject.cpp:193 > +static Vector<StringView> unicodeExtensionSubTags(StringView extension) nit: Since we're moving this anyway, could we rename it unicodeExtensionSubtags with a lowercase t? ...or rather, maybe unicodeExtensionComponents, since that's what it's been renamed to: https://tc39.es/ecma402/#sec-unicode-extension-components > Source/JavaScriptCore/runtime/IntlObject.cpp:293 > + if (subtag.length() != 2) { > + ++index; > + continue; > + } Is this just for the leading attribute case?
Comment on attachment 408582 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=408582&action=review >> Source/JavaScriptCore/runtime/IntlObject.cpp:193 >> +static Vector<StringView> unicodeExtensionSubTags(StringView extension) > > nit: Since we're moving this anyway, could we rename it unicodeExtensionSubtags with a lowercase t? > > ...or rather, maybe unicodeExtensionComponents, since that's what it's been renamed to: > https://tc39.es/ecma402/#sec-unicode-extension-components unicodeExtensionComponents sounds good. Changed. >> Source/JavaScriptCore/runtime/IntlObject.cpp:293 >> + } > > Is this just for the leading attribute case? Yes.
Committed r266973: <https://trac.webkit.org/changeset/266973>
<rdar://problem/68751852>