Bug 215058

Summary: [JSC] Implement Intl Language Tag Parser
Product: WebKit Reporter: Yusuke Suzuki <ysuzuki>
Component: JavaScriptCoreAssignee: Yusuke Suzuki <ysuzuki>
Status: RESOLVED FIXED    
Severity: Normal CC: commit-queue, darin, ews-watchlist, keith_miller, mark.lam, msaboff, ross.kirsling, saam, tzagallo, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 215756    
Bug Blocks: 213425    
Attachments:
Description Flags
Patch
none
Patch
none
Patch
ross.kirsling: review+
Patch none

Yusuke Suzuki
Reported 2020-08-01 02:15:53 PDT
Add precise language tag parser implementing BCP47.
Attachments
Patch (20.63 KB, patch)
2020-08-01 04:08 PDT, Yusuke Suzuki
no flags
Patch (47.25 KB, patch)
2020-08-01 05:49 PDT, Yusuke Suzuki
no flags
Patch (59.95 KB, patch)
2020-08-02 18:33 PDT, Yusuke Suzuki
ross.kirsling: review+
Patch (71.04 KB, patch)
2020-08-22 15:49 PDT, Yusuke Suzuki
no flags
Yusuke Suzuki
Comment 1 2020-08-01 04:08:53 PDT
Created attachment 405783 [details] Patch WIP
Yusuke Suzuki
Comment 2 2020-08-01 05:49:24 PDT
Created attachment 405784 [details] Patch WIP
Yusuke Suzuki
Comment 3 2020-08-02 18:33:53 PDT
Ross Kirsling
Comment 4 2020-08-04 16:22:33 PDT
Comment on attachment 405815 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=405815&action=review Looks good overall. It sucks that ICU canonicalization isn't sufficient, but I think you're right that this is unavoidable if we want cross-engine consistency, so thanks for doing what I was too stubborn to do. ;) I think Darin will likely have comments regarding the string manipulation parts. > Source/JavaScriptCore/runtime/IntlObject.cpp:851 > + struct Code { > + LChar characters[8] { }; > + }; Is it necessary that this be wrapped in a struct? > Source/JavaScriptCore/runtime/IntlObject.cpp:875 > +static bool isUnicodeExtensionAttribute(StringView string) We could probably merge this with isUnicodeExtensionTypeComponent... > Source/JavaScriptCore/runtime/IntlObject.cpp:888 > +static bool isUnicodeExtensionTypeComponent(StringView string) ...but note that isUnicodeLocaleIdentifierType also exists, so this already feels a bit duplicative? > Source/JavaScriptCore/runtime/IntlObject.cpp:917 > +static bool isUnicodeTValue(StringView string) This appears to be just a TValue "component"? (And also is matching the two functions above...perhaps we should have an alphanum{i,j} helper function?) > Source/JavaScriptCore/runtime/IntlObject.cpp:1078 > + bool isTLangOrTField = false; Seems like this should be renamed to something like `found`... > Source/JavaScriptCore/runtime/IntlObject.cpp:1109 > + if (!isTLangOrTField) > return false; > - ++cursor; > + return true; ...and then you can just return it directly. > Source/JavaScriptCore/runtime/IntlObject.cpp:1125 > + if (!isUnicodeOtherExtensionValue(m_current)) > + return true; > + if (!next()) > + return true; Seems like in a case like this you could conjoin the conditions, but I understand if you prefer to keep parallelism with the rest of the logic. > Source/JavaScriptCore/runtime/IntlObject.cpp:1231 > + if (!parser.parseUnicodeLocaleId()) > + return false; > + if (!parser.isEOS()) > + return false; > + return true; (There's a part of me that really wants to write this as a single return statement, but I do see what you're aiming to convey with this code style. ;D)
Darin Adler
Comment 5 2020-08-04 16:57:45 PDT
Comment on attachment 405815 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=405815&action=review I wonder about the idioms here compared with the StringParsingBuffer idioms that Sam has been using. Have you looked at that? Parsers need extensive test suites, with cases that tickle every single branch. Do we have enough? > Source/JavaScriptCore/runtime/IntlObject.cpp:871 > + ASSERT(isASCIIAlphanumeric(singleton)); > + singleton = toASCIILower(singleton); > + // 0 - 9 => numeric > + // 10 - 35 => alpha > + if (isASCIIDigit(singleton)) > + return singleton - '0'; > + return (singleton - 'a') + 10; This could be a bit more like toASCIIHexDigit from ASCIICType.h. > Source/JavaScriptCore/runtime/IntlObject.cpp:880 > + if (length >= 3 && length <= 8) > + return string.isAllSpecialCharacters<isASCIIAlphanumeric>(); > + return false; Should do it all with &&: return length >= 3 && length<= 8 && string.isAllSpecialCharacters<isASCIIAlphanumeric>();
Darin Adler
Comment 6 2020-08-04 16:58:10 PDT
Generally looks fine. I have no objections.
Darin Adler
Comment 7 2020-08-04 16:58:32 PDT
Comment on attachment 405815 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=405815&action=review >> Source/JavaScriptCore/runtime/IntlObject.cpp:1231 >> + return true; > > (There's a part of me that really wants to write this as a single return statement, but I do see what you're aiming to convey with this code style. ;D) I feel the same way.
Radar WebKit Bug Importer
Comment 8 2020-08-08 02:16:20 PDT
Yusuke Suzuki
Comment 9 2020-08-22 14:46:51 PDT
Comment on attachment 405815 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=405815&action=review >> Source/JavaScriptCore/runtime/IntlObject.cpp:851 >> + }; > > Is it necessary that this be wrapped in a struct? This makes `bitwise_cast<VariantCode>()` easy by suppressing array to pointer decay. >> Source/JavaScriptCore/runtime/IntlObject.cpp:871 >> + return (singleton - 'a') + 10; > > This could be a bit more like toASCIIHexDigit from ASCIICType.h. Yeah, the difference is it is using G-Z too (so, it is not hex). >> Source/JavaScriptCore/runtime/IntlObject.cpp:875 >> +static bool isUnicodeExtensionAttribute(StringView string) > > We could probably merge this with isUnicodeExtensionTypeComponent... I think we should have different ones because they are different part while the patterns are the same by chance. >> Source/JavaScriptCore/runtime/IntlObject.cpp:880 >> + return false; > > Should do it all with &&: > > return length >= 3 && length<= 8 && string.isAllSpecialCharacters<isASCIIAlphanumeric>(); Changed >> Source/JavaScriptCore/runtime/IntlObject.cpp:888 >> +static bool isUnicodeExtensionTypeComponent(StringView string) > > ...but note that isUnicodeLocaleIdentifierType also exists, so this already feels a bit duplicative? Ditto. >> Source/JavaScriptCore/runtime/IntlObject.cpp:917 >> +static bool isUnicodeTValue(StringView string) > > This appears to be just a TValue "component"? > > (And also is matching the two functions above...perhaps we should have an alphanum{i,j} helper function?) Right, changed. >> Source/JavaScriptCore/runtime/IntlObject.cpp:1109 >> + return true; > > ...and then you can just return it directly. Nice, fixed. >> Source/JavaScriptCore/runtime/IntlObject.cpp:1125 >> + return true; > > Seems like in a case like this you could conjoin the conditions, but I understand if you prefer to keep parallelism with the rest of the logic. Yeah, these codes are aligned to the style used in parse function in this parser. >>> Source/JavaScriptCore/runtime/IntlObject.cpp:1231 >>> + return true; >> >> (There's a part of me that really wants to write this as a single return statement, but I do see what you're aiming to convey with this code style. ;D) > > I feel the same way. Because the other functions are using `break` / `continue` etc., I'm aligning these parse & isEOS() patterns with the other places not to introduce a bug. I think this makes code consistent and avoid introducing a bug.
Yusuke Suzuki
Comment 10 2020-08-22 15:49:47 PDT
Created attachment 407058 [details] Patch Updating
Yusuke Suzuki
Comment 11 2020-08-22 17:25:11 PDT
> I wonder about the idioms here compared with the StringParsingBuffer idioms that Sam has been using. Have you looked at that? Currently, this parser relies on split feature, so we cannot use StringParsingBuffer. We can revise whether we can use it later. > Parsers need extensive test suites, with cases that tickle every single branch. Do we have enough? I added more tests and ensured code coverage of the parser is 100% at least.
Yusuke Suzuki
Comment 12 2020-08-22 17:26:47 PDT
WebKit Commit Bot
Comment 13 2020-08-22 23:01:33 PDT
Re-opened since this is blocked by bug 215756
Yusuke Suzuki
Comment 14 2020-08-22 23:06:58 PDT
Note You need to log in before you can comment on or make changes to this bug.