WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
216224
[JSC] Canonicalize "true" unicode extension type value to ""
https://bugs.webkit.org/show_bug.cgi?id=216224
Summary
[JSC] Canonicalize "true" unicode extension type value to ""
Yusuke Suzuki
Reported
2020-09-06 01:58:40 PDT
[JSC] Canonicalize "true" unicode extension type value to ""
Attachments
Patch
(20.82 KB, patch)
2020-09-06 02:00 PDT
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(20.93 KB, patch)
2020-09-06 02:04 PDT
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(33.01 KB, patch)
2020-09-10 02:22 PDT
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(38.41 KB, patch)
2020-09-11 20:18 PDT
,
Yusuke Suzuki
ross.kirsling
: review+
Details
Formatted Diff
Diff
Patch
(38.51 KB, patch)
2020-09-11 20:24 PDT
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Show Obsolete
(3)
View All
Add attachment
proposed patch, testcase, etc.
Yusuke Suzuki
Comment 1
2020-09-06 02:00:43 PDT
Created
attachment 408119
[details]
Patch
Yusuke Suzuki
Comment 2
2020-09-06 02:04:31 PDT
Created
attachment 408120
[details]
Patch
Ross Kirsling
Comment 3
2020-09-09 16:04:25 PDT
Comment on
attachment 408120
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=408120&action=review
> Source/JavaScriptCore/runtime/IntlLocalePrototype.cpp:188 > + if (caseFirst.isEmpty()) > + return JSValue::encode(jsUndefined()); > + // If caseFirst is "true", then we should return "" since language tag canonicalization makes it "". > + if (caseFirst == "true"_s) > + return JSValue::encode(jsEmptyString(vm)); > + return JSValue::encode(jsString(vm, caseFirst));
This shouldn't be correct -- kf isn't a boolean extension; it should only ever be "upper"/"lower"/"false"/undefined (though Intl.Locale will evidently let it be an arbitrary string). This does highlight an interesting separate issue in Intl.Locale though -- if the locale string has a *non-boolean* Unicode extension with the value omitted, we're returning "yes" instead of "true". For example: new Intl.Locale('en-us-u-co').collation > yes
> Source/JavaScriptCore/runtime/IntlObject.cpp:306 > + // In the following unicode extension keys, "yes" is alias of "true". So we should not include value. > + // See CLDR key name="kb" entry for example. > + switch (computeTwoCharacters16Code(key)) { > + case computeTwoCharacters16Code("kb"_s): > + case computeTwoCharacters16Code("kc"_s): > + case computeTwoCharacters16Code("kh"_s): > + case computeTwoCharacters16Code("kk"_s): > + case computeTwoCharacters16Code("kn"_s):
I don't think we should be maintaining an allowlist here? Most of these are unknown to ECMA-402 and V8 evidently accepts "yes" for arbitrary extensions. Also, unless I'm missing something, "yes" should never appear in the output of uloc_toLanguageTag, right?
Yusuke Suzuki
Comment 4
2020-09-10 02:15:47 PDT
Comment on
attachment 408120
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=408120&action=review
>> Source/JavaScriptCore/runtime/IntlLocalePrototype.cpp:188 >> + return JSValue::encode(jsString(vm, caseFirst)); > > This shouldn't be correct -- kf isn't a boolean extension; it should only ever be "upper"/"lower"/"false"/undefined (though Intl.Locale will evidently let it be an arbitrary string). > > This does highlight an interesting separate issue in Intl.Locale though -- if the locale string has a *non-boolean* Unicode extension with the value omitted, we're returning "yes" instead of "true". For example: > > new Intl.Locale('en-us-u-co').collation > > yes
However, if "true" is explicitly specified (`en-u-kf-true`), then, I think we should return "" since
> Any type or tfield value "true" is removed.
rule can be applied regardless of whether this is a boolean extension. So, when "true" is specified, "" is better than returning "true".
>> Source/JavaScriptCore/runtime/IntlObject.cpp:306 >> + case computeTwoCharacters16Code("kn"_s): > > I don't think we should be maintaining an allowlist here? Most of these are unknown to ECMA-402 and V8 evidently accepts "yes" for arbitrary extensions. > Also, unless I'm missing something, "yes" should never appear in the output of uloc_toLanguageTag, right?
Right, "yes" will not appear from ICU except for the user explicitly set "yes". And currently it is tested in test262/test/intl402/Intl/getCanonicalLocales/unicode-ext-canonicalize-yes-to-true.js Currently, ICU does not handle this correctly, so we need to do that here. I think this list is small enough, so I'm fine to keep it here. And we can remove it once ICU supports this correctly.
Yusuke Suzuki
Comment 5
2020-09-10 02:22:12 PDT
Created
attachment 408425
[details]
Patch
Ross Kirsling
Comment 6
2020-09-10 12:47:26 PDT
(In reply to Yusuke Suzuki from
comment #4
)
> Comment on
attachment 408120
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=408120&action=review
> > >> Source/JavaScriptCore/runtime/IntlLocalePrototype.cpp:188 > >> + return JSValue::encode(jsString(vm, caseFirst)); > > > > This shouldn't be correct -- kf isn't a boolean extension; it should only ever be "upper"/"lower"/"false"/undefined (though Intl.Locale will evidently let it be an arbitrary string). > > > > This does highlight an interesting separate issue in Intl.Locale though -- if the locale string has a *non-boolean* Unicode extension with the value omitted, we're returning "yes" instead of "true". For example: > > > > new Intl.Locale('en-us-u-co').collation > > > yes > > However, if "true" is explicitly specified (`en-u-kf-true`), then, I think > we should return "" since > > > Any type or tfield value "true" is removed. > > rule can be applied regardless of whether this is a boolean extension. So, > when "true" is specified, "" is better than returning "true".
Hmm, but this is purely a statement about locale string canonicalization and not about how Intl.Locale's getters should behave. You may still be right (since canonicalization happens in step 12 of
https://tc39.es/ecma402/#sec-Intl.Locale
), but there's absolutely nothing unique about caseFirst, so we'd need to do it for the other non-boolean getters too. I guess this and the "yes gets returned" issue I mentioned should probably be handled together within IntlLocale.cpp.
> >> Source/JavaScriptCore/runtime/IntlObject.cpp:306 > >> + case computeTwoCharacters16Code("kn"_s): > > > > I don't think we should be maintaining an allowlist here? Most of these are unknown to ECMA-402 and V8 evidently accepts "yes" for arbitrary extensions. > > Also, unless I'm missing something, "yes" should never appear in the output of uloc_toLanguageTag, right? > > Right, "yes" will not appear from ICU except for the user explicitly set > "yes". And currently it is tested in > test262/test/intl402/Intl/getCanonicalLocales/unicode-ext-canonicalize-yes- > to-true.js > Currently, ICU does not handle this correctly, so we need to do that here. I > think this list is small enough, so I'm fine to keep it here. And we can > remove it once ICU supports this correctly.
That test makes sense, but my reason for confusion is because this is the existing behavior without your new canonicalization: new Intl.Locale('en-us-u-kb-yes') > en-US-u-kb-true new Intl.Locale('en-us-u-co-yes') > en-US-u-co-yes ...so it seems like we only see "yes" returned from uloc_toLanguageTag if the user provided "yes" *and* the extension is non-boolean, which would mean that we'd never need to check for "yes" in particular.
Yusuke Suzuki
Comment 7
2020-09-11 19:49:59 PDT
Comment on
attachment 408120
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=408120&action=review
>>>> Source/JavaScriptCore/runtime/IntlLocalePrototype.cpp:188 >>>> + return JSValue::encode(jsString(vm, caseFirst)); >>> >>> This shouldn't be correct -- kf isn't a boolean extension; it should only ever be "upper"/"lower"/"false"/undefined (though Intl.Locale will evidently let it be an arbitrary string). >>> >>> This does highlight an interesting separate issue in Intl.Locale though -- if the locale string has a *non-boolean* Unicode extension with the value omitted, we're returning "yes" instead of "true". For example: >>> >>> new Intl.Locale('en-us-u-co').collation >>> > yes >> >> However, if "true" is explicitly specified (`en-u-kf-true`), then, I think we should return "" since > > Hmm, but this is purely a statement about locale string canonicalization and not about how Intl.Locale's getters should behave. You may still be right (since canonicalization happens in step 12 of
https://tc39.es/ecma402/#sec-Intl.Locale
), but there's absolutely nothing unique about caseFirst, so we'd need to do it for the other non-boolean getters too. I guess this and the "yes gets returned" issue I mentioned should probably be handled together within IntlLocale.cpp.
OK, so we should move this logic to IntlLocale.cpp's getters. Changed.
>>>> Source/JavaScriptCore/runtime/IntlObject.cpp:306 >>>> + case computeTwoCharacters16Code("kn"_s): >>> >>> I don't think we should be maintaining an allowlist here? Most of these are unknown to ECMA-402 and V8 evidently accepts "yes" for arbitrary extensions. >>> Also, unless I'm missing something, "yes" should never appear in the output of uloc_toLanguageTag, right? >> >> Right, "yes" will not appear from ICU except for the user explicitly set "yes". And currently it is tested in test262/test/intl402/Intl/getCanonicalLocales/unicode-ext-canonicalize-yes-to-true.js >> Currently, ICU does not handle this correctly, so we need to do that here. I think this list is small enough, so I'm fine to keep it here. And we can remove it once ICU supports this correctly. > > That test makes sense, but my reason for confusion is because this is the existing behavior without your new canonicalization: > > new Intl.Locale('en-us-u-kb-yes') > > en-US-u-kb-true > new Intl.Locale('en-us-u-co-yes') > > en-US-u-co-yes > > ...so it seems like we only see "yes" returned from uloc_toLanguageTag if the user provided "yes" *and* the extension is non-boolean, which would mean that we'd never need to check for "yes" in particular.
Ah! Right, so this is not necessary. Removed.
Yusuke Suzuki
Comment 8
2020-09-11 20:18:53 PDT
Created
attachment 408582
[details]
Patch
Yusuke Suzuki
Comment 9
2020-09-11 20:24:17 PDT
Created
attachment 408583
[details]
Patch
Ross Kirsling
Comment 10
2020-09-11 21:03:26 PDT
Comment on
attachment 408582
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=408582&action=review
r=me
> Source/JavaScriptCore/runtime/IntlObject.cpp:193 > +static Vector<StringView> unicodeExtensionSubTags(StringView extension)
nit: Since we're moving this anyway, could we rename it unicodeExtensionSubtags with a lowercase t? ...or rather, maybe unicodeExtensionComponents, since that's what it's been renamed to:
https://tc39.es/ecma402/#sec-unicode-extension-components
> Source/JavaScriptCore/runtime/IntlObject.cpp:293 > + if (subtag.length() != 2) { > + ++index; > + continue; > + }
Is this just for the leading attribute case?
Yusuke Suzuki
Comment 11
2020-09-11 21:15:13 PDT
Comment on
attachment 408582
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=408582&action=review
>> Source/JavaScriptCore/runtime/IntlObject.cpp:193 >> +static Vector<StringView> unicodeExtensionSubTags(StringView extension) > > nit: Since we're moving this anyway, could we rename it unicodeExtensionSubtags with a lowercase t? > > ...or rather, maybe unicodeExtensionComponents, since that's what it's been renamed to: >
https://tc39.es/ecma402/#sec-unicode-extension-components
unicodeExtensionComponents sounds good. Changed.
>> Source/JavaScriptCore/runtime/IntlObject.cpp:293 >> + } > > Is this just for the leading attribute case?
Yes.
Yusuke Suzuki
Comment 12
2020-09-11 22:57:05 PDT
Committed
r266973
: <
https://trac.webkit.org/changeset/266973
>
Radar WebKit Bug Importer
Comment 13
2020-09-11 22:58:18 PDT
<
rdar://problem/68751852
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug