Summary: | JSC: Intl API should ignore encoding when parsing BCP 47 language tag from ISO 15897 locale string (passed via LANG) | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | mod-wkbz | ||||||||||
Component: | JavaScriptCore | Assignee: | Andy VanWagoner <andy> | ||||||||||
Status: | RESOLVED FIXED | ||||||||||||
Severity: | Normal | CC: | andy, bburg, cgarcia, commit-queue, dirk.fizzlebeef, ews-watchlist, inspector-bugzilla-changes, joepeck, keith_miller, laurb3st, mark.lam, mcatanzaro, Meanjeanaco, msaboff, saam, sergey, webkit-bug-importer, ysuzuki | ||||||||||
Priority: | P2 | Keywords: | InRadar | ||||||||||
Version: | Other | ||||||||||||
Hardware: | All | ||||||||||||
OS: | Linux | ||||||||||||
Attachments: |
|
Description
mod-wkbz
2017-02-08 08:16:07 PST
1. WebKitGTK should not be showing the uncaught exception screen for production / distribution builds. 2. I don't know whether DateTimeFormat is spec'd to support locale.encoding. 3. We can always add a try-catch around the Inspector code that constructs this object, though this seems like a weird workaround for an error case that should be handled by JSC. This should probably happen by improving IntlObject::convertICULocaleToBCP47LanguageTag(). es_ES.utf8 is also my locale and the inspector works for me. https://bugs.webkit.org/show_bug.cgi?id=182467 is likely a duplicate of this. https://bugs.webkit.org/show_bug.cgi?id=171207 is another likely dupe. (In reply to Brian Burg from comment #1) > 1. WebKitGTK should not be showing the uncaught exception screen for > production / distribution builds. We use ENABLE(DEVELOPER_MODE) guards to disable functionality in production builds. Does one need to be added somewhere in the web inspector code? *** Bug 182467 has been marked as a duplicate of this bug. *** *** Bug 171207 has been marked as a duplicate of this bug. *** Two interesting comments from bug #171207: (In reply to Sergey Mende from comment #3) > The problem lies in the discrepancy between LANG environment variable format > and https://www.rfc-editor.org/rfc/bcp/bcp47.txt spec. > > The Web Inspector does not cause an internal error if LANG is set to a > simple two characters language code, i.e.: > > env LANG="en" midori > > works just fine. > > As soon as LANG is set to a full spec, e.g. "en_US.UTF-8", the Web Inspector > fails to open with the following message in its pane: > > TypeError: failed to initialize NumberFormat due to invalid locale (at > Utilities.js:801:57) > NumberFormat @ [native code] > toLocaleString @ [native code] > d @ Utilities.js:801:57 > value @ Utilities.js:881:64 > value @ Utilities.js:899:29 > _loadNewRecording @ TimelineManager.js:655:108 > reset @ TimelineManager.js:114:31 > TimelineManager @ TimelineManager.js:62:19 > loaded @ Main.js:130:50 > global code @ Main.html:808:18 > > > I am using stock FC26's package (webkitgtk4-2.18.3-1.fc26.x86_64) And: (In reply to Michael Catanzaro from comment #6) > I did some grepping and found > Source/JavaScriptCore/runtime/IntlNumberFormat.cpp. The failure is occurring > in IntlNumberFormat::initializeNumberFormat. I would look closely at > resolveLocale in IntlObject.cpp. If you're able to rebuild WebKit, adding > printfs is by far the easiest way to debug a problem like this. Good luck! What's strange to me is that I've never seen the reported bug, ever, using the en_US.UTF-8 locale. This error only appears to be possible when no available locales are passed into an Intl constructor like Intl.DateTimeFormat. The default language currently is assumed to be valid, but may end up being empty. There are multiple paths in how the default language is figured out, we just need JSC to do a better job ensuring the best valid non-empty default locale is used. Created attachment 345787 [details]
Patch
Comment on attachment 345787 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=345787&action=review > Source/JavaScriptCore/runtime/IntlObject.cpp:138 > + auto length = uloc_toLanguageTag(localeID, buffer.data(), buffer.size(), false, &status); > + if (status == U_BUFFER_OVERFLOW_ERROR) { Gotta call out the ICU developers for this terrible API design, making you call the function twice rather than just returning the proper amount of allocated memory.... > Source/JavaScriptCore/runtime/IntlObject.cpp:144 > + return String(buffer.data(), length); I believe the langtag buffer is filled with ASCII, not UTF-16, guessing from the API reference [1] and [2], which says: "In some APIs, ICU uses char * strings. This is either for file system paths or for strings that contain invariant characters only (such as locale identifiers). These strings are in the platform-specific encoding of either ASCII or EBCDIC." It's only appropriate to pass the data directly to the String constructor if it's already UTF-16. So assuming we don't care about EBCDIC, and considering that all ASCII is valid UTF-*, we should use String::fromUTF8() to create the return value, rather than directly using the String constructor. [1] https://ssl.icu-project.org/apiref/icu4c/uloc_8h.html#a1d50c91925ca3853fce6f28cf7390c3c [2] http://userguide.icu-project.org/strings > Source/JavaScriptCore/runtime/IntlObject.cpp:621 > + return "en"_s; Huh, I didn't know we had this user-defined literal. How about that. > LayoutTests/js/intl-default-locale.html:15 > +shouldBe("new Intl.DateTimeFormat().resolvedOptions().locale", "'tlh'"); > +shouldBe("new Intl.NumberFormat().resolvedOptions().locale", "'tlh'"); > +shouldBe("new Intl.Collator().resolvedOptions().locale", "'tlh'"); Uh, perhaps I don't understand something here, but why is 'tlh' a good default locale? Comment on attachment 345787 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=345787&action=review >> Source/JavaScriptCore/runtime/IntlObject.cpp:144 >> + return String(buffer.data(), length); > > I believe the langtag buffer is filled with ASCII, not UTF-16, guessing from the API reference [1] and [2], which says: "In some APIs, ICU uses char * strings. This is either for file system paths or for strings that contain invariant characters only (such as locale identifiers). These strings are in the platform-specific encoding of either ASCII or EBCDIC." It's only appropriate to pass the data directly to the String constructor if it's already UTF-16. So assuming we don't care about EBCDIC, and considering that all ASCII is valid UTF-*, we should use String::fromUTF8() to create the return value, rather than directly using the String constructor. > > [1] https://ssl.icu-project.org/apiref/icu4c/uloc_8h.html#a1d50c91925ca3853fce6f28cf7390c3c > [2] http://userguide.icu-project.org/strings Yeah, these should definitely be all ASCII and return a `const char*`. It looks like the char* constructor and fromUTF8 do exactly the same thing, but I can switch if that's the preferred way to handle this. >> Source/JavaScriptCore/runtime/IntlObject.cpp:621 >> + return "en"_s; > > Huh, I didn't know we had this user-defined literal. How about that. Yeah, that the new ASCIILiteral("en"). >> LayoutTests/js/intl-default-locale.html:15 >> +shouldBe("new Intl.Collator().resolvedOptions().locale", "'tlh'"); > > Uh, perhaps I don't understand something here, but why is 'tlh' a good default locale? Maybe I should have picked an easier to grok example. :) 'tlh' is the canonical form of i-klingon. It's also a language I'm sure is not supported/available on any of our platforms. This test ensures it used the default locale, canonicalized it, and it couldn't have pulled it from anywhere else. Created attachment 345813 [details]
Patch
Comment on attachment 345813 [details] Patch Attachment 345813 [details] did not pass win-ews (win): Output: https://webkit-queues.webkit.org/results/8661021 New failing tests: http/tests/security/contentSecurityPolicy/userAgentShadowDOM/allow-video.html Created attachment 345840 [details]
Archive of layout-test-results from ews205 for win-future
The attached test failures were seen while running run-webkit-tests on the win-ews.
Bot: ews205 Port: win-future Platform: CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit
Win port still has Intl disabled, so I would surprised if any of this code was exercised. Also the failures don't immediately appear related. Is win-future a different bot than win-ews? (In reply to Andy VanWagoner from comment #18) > Win port still has Intl disabled, so I would surprised if any of this code > was exercised. Also the failures don't immediately appear related. Is > win-future a different bot than win-ews? They're different bots, but don't worry about them, there are so many flaky tests that that the bots are best ignored right now unless the failures look relevant. Comment on attachment 345813 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=345813&action=review > Source/JavaScriptCore/runtime/IntlObject.cpp:144 > + return String::fromUTF8(buffer.data(), length); You're right: it's safe to pass directly to the String constructor if you know the data is all ASCII, so my comment was totally wrong and the way you had it before was better. > Source/JavaScriptCore/runtime/IntlObject.cpp:609 > + for (const auto &language : languages) { The & gets attached to the type, not the variable name: const auto& language Created attachment 345847 [details]
Patch
Comment on attachment 345847 [details] Patch Clearing flags on attachment: 345847 Committed r234260: <https://trac.webkit.org/changeset/234260> All reviewed patches have been landed. Closing bug. |