WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
215058
[JSC] Implement Intl Language Tag Parser
https://bugs.webkit.org/show_bug.cgi?id=215058
Summary
[JSC] Implement Intl Language Tag Parser
Yusuke Suzuki
Reported
2020-08-01 02:15:53 PDT
Add precise language tag parser implementing BCP47.
Attachments
Patch
(20.63 KB, patch)
2020-08-01 04:08 PDT
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(47.25 KB, patch)
2020-08-01 05:49 PDT
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Patch
(59.95 KB, patch)
2020-08-02 18:33 PDT
,
Yusuke Suzuki
ross.kirsling
: review+
Details
Formatted Diff
Diff
Patch
(71.04 KB, patch)
2020-08-22 15:49 PDT
,
Yusuke Suzuki
no flags
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Yusuke Suzuki
Comment 1
2020-08-01 04:08:53 PDT
Created
attachment 405783
[details]
Patch WIP
Yusuke Suzuki
Comment 2
2020-08-01 05:49:24 PDT
Created
attachment 405784
[details]
Patch WIP
Yusuke Suzuki
Comment 3
2020-08-02 18:33:53 PDT
Created
attachment 405815
[details]
Patch
Ross Kirsling
Comment 4
2020-08-04 16:22:33 PDT
Comment on
attachment 405815
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=405815&action=review
Looks good overall. It sucks that ICU canonicalization isn't sufficient, but I think you're right that this is unavoidable if we want cross-engine consistency, so thanks for doing what I was too stubborn to do. ;) I think Darin will likely have comments regarding the string manipulation parts.
> Source/JavaScriptCore/runtime/IntlObject.cpp:851 > + struct Code { > + LChar characters[8] { }; > + };
Is it necessary that this be wrapped in a struct?
> Source/JavaScriptCore/runtime/IntlObject.cpp:875 > +static bool isUnicodeExtensionAttribute(StringView string)
We could probably merge this with isUnicodeExtensionTypeComponent...
> Source/JavaScriptCore/runtime/IntlObject.cpp:888 > +static bool isUnicodeExtensionTypeComponent(StringView string)
...but note that isUnicodeLocaleIdentifierType also exists, so this already feels a bit duplicative?
> Source/JavaScriptCore/runtime/IntlObject.cpp:917 > +static bool isUnicodeTValue(StringView string)
This appears to be just a TValue "component"? (And also is matching the two functions above...perhaps we should have an alphanum{i,j} helper function?)
> Source/JavaScriptCore/runtime/IntlObject.cpp:1078 > + bool isTLangOrTField = false;
Seems like this should be renamed to something like `found`...
> Source/JavaScriptCore/runtime/IntlObject.cpp:1109 > + if (!isTLangOrTField) > return false; > - ++cursor; > + return true;
...and then you can just return it directly.
> Source/JavaScriptCore/runtime/IntlObject.cpp:1125 > + if (!isUnicodeOtherExtensionValue(m_current)) > + return true; > + if (!next()) > + return true;
Seems like in a case like this you could conjoin the conditions, but I understand if you prefer to keep parallelism with the rest of the logic.
> Source/JavaScriptCore/runtime/IntlObject.cpp:1231 > + if (!parser.parseUnicodeLocaleId()) > + return false; > + if (!parser.isEOS()) > + return false; > + return true;
(There's a part of me that really wants to write this as a single return statement, but I do see what you're aiming to convey with this code style. ;D)
Darin Adler
Comment 5
2020-08-04 16:57:45 PDT
Comment on
attachment 405815
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=405815&action=review
I wonder about the idioms here compared with the StringParsingBuffer idioms that Sam has been using. Have you looked at that? Parsers need extensive test suites, with cases that tickle every single branch. Do we have enough?
> Source/JavaScriptCore/runtime/IntlObject.cpp:871 > + ASSERT(isASCIIAlphanumeric(singleton)); > + singleton = toASCIILower(singleton); > + // 0 - 9 => numeric > + // 10 - 35 => alpha > + if (isASCIIDigit(singleton)) > + return singleton - '0'; > + return (singleton - 'a') + 10;
This could be a bit more like toASCIIHexDigit from ASCIICType.h.
> Source/JavaScriptCore/runtime/IntlObject.cpp:880 > + if (length >= 3 && length <= 8) > + return string.isAllSpecialCharacters<isASCIIAlphanumeric>(); > + return false;
Should do it all with &&: return length >= 3 && length<= 8 && string.isAllSpecialCharacters<isASCIIAlphanumeric>();
Darin Adler
Comment 6
2020-08-04 16:58:10 PDT
Generally looks fine. I have no objections.
Darin Adler
Comment 7
2020-08-04 16:58:32 PDT
Comment on
attachment 405815
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=405815&action=review
>> Source/JavaScriptCore/runtime/IntlObject.cpp:1231 >> + return true; > > (There's a part of me that really wants to write this as a single return statement, but I do see what you're aiming to convey with this code style. ;D)
I feel the same way.
Radar WebKit Bug Importer
Comment 8
2020-08-08 02:16:20 PDT
<
rdar://problem/66720470
>
Yusuke Suzuki
Comment 9
2020-08-22 14:46:51 PDT
Comment on
attachment 405815
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=405815&action=review
>> Source/JavaScriptCore/runtime/IntlObject.cpp:851 >> + }; > > Is it necessary that this be wrapped in a struct?
This makes `bitwise_cast<VariantCode>()` easy by suppressing array to pointer decay.
>> Source/JavaScriptCore/runtime/IntlObject.cpp:871 >> + return (singleton - 'a') + 10; > > This could be a bit more like toASCIIHexDigit from ASCIICType.h.
Yeah, the difference is it is using G-Z too (so, it is not hex).
>> Source/JavaScriptCore/runtime/IntlObject.cpp:875 >> +static bool isUnicodeExtensionAttribute(StringView string) > > We could probably merge this with isUnicodeExtensionTypeComponent...
I think we should have different ones because they are different part while the patterns are the same by chance.
>> Source/JavaScriptCore/runtime/IntlObject.cpp:880 >> + return false; > > Should do it all with &&: > > return length >= 3 && length<= 8 && string.isAllSpecialCharacters<isASCIIAlphanumeric>();
Changed
>> Source/JavaScriptCore/runtime/IntlObject.cpp:888 >> +static bool isUnicodeExtensionTypeComponent(StringView string) > > ...but note that isUnicodeLocaleIdentifierType also exists, so this already feels a bit duplicative?
Ditto.
>> Source/JavaScriptCore/runtime/IntlObject.cpp:917 >> +static bool isUnicodeTValue(StringView string) > > This appears to be just a TValue "component"? > > (And also is matching the two functions above...perhaps we should have an alphanum{i,j} helper function?)
Right, changed.
>> Source/JavaScriptCore/runtime/IntlObject.cpp:1109 >> + return true; > > ...and then you can just return it directly.
Nice, fixed.
>> Source/JavaScriptCore/runtime/IntlObject.cpp:1125 >> + return true; > > Seems like in a case like this you could conjoin the conditions, but I understand if you prefer to keep parallelism with the rest of the logic.
Yeah, these codes are aligned to the style used in parse function in this parser.
>>> Source/JavaScriptCore/runtime/IntlObject.cpp:1231 >>> + return true; >> >> (There's a part of me that really wants to write this as a single return statement, but I do see what you're aiming to convey with this code style. ;D) > > I feel the same way.
Because the other functions are using `break` / `continue` etc., I'm aligning these parse & isEOS() patterns with the other places not to introduce a bug. I think this makes code consistent and avoid introducing a bug.
Yusuke Suzuki
Comment 10
2020-08-22 15:49:47 PDT
Created
attachment 407058
[details]
Patch Updating
Yusuke Suzuki
Comment 11
2020-08-22 17:25:11 PDT
> I wonder about the idioms here compared with the StringParsingBuffer idioms that Sam has been using. Have you looked at that?
Currently, this parser relies on split feature, so we cannot use StringParsingBuffer. We can revise whether we can use it later.
> Parsers need extensive test suites, with cases that tickle every single branch. Do we have enough?
I added more tests and ensured code coverage of the parser is 100% at least.
Yusuke Suzuki
Comment 12
2020-08-22 17:26:47 PDT
Committed
r266039
: <
https://trac.webkit.org/changeset/266039
>
WebKit Commit Bot
Comment 13
2020-08-22 23:01:33 PDT
Re-opened since this is blocked by
bug 215756
Yusuke Suzuki
Comment 14
2020-08-22 23:06:58 PDT
Committed
r266043
: <
https://trac.webkit.org/changeset/266043
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug