RESOLVED FIXED 167991
JSC: Intl API should ignore encoding when parsing BCP 47 language tag from ISO 15897 locale string (passed via LANG)
https://bugs.webkit.org/show_bug.cgi?id=167991
Summary JSC: Intl API should ignore encoding when parsing BCP 47 language tag from IS...
mod-wkbz
Reported 2017-02-08 08:16:07 PST
I'm using WebKit2GTK+ 2.14.3 on Arch Linux, but this bug has been around for some time. The only locale-related {LC_*, LOCALE, LANG} environment variable set on my system is "LANG=es_ES.utf8". In this configuration, opening the WebKit2 inspector results in a "Web Inspector encountered an internal error." page instead of a working inspector. If I set LANG to "es_ES" or "en_US" or "C", the inspector works. If I set LANG to the (invalid) "es_ES.", the C library complains about an invalid locale as it should: (process:15144): Gtk-WARNING **: Locale not supported by C library. Using the fallback 'C' locale. The browser then runs in English, but the inspector still crashes in the same way. Given that no other software complains about "LANG=es_ES.utf8", the inspector should be fixed to also support such "locale.encoding" values. The JS exception stack is attached below; it looks like IntlDateTimeFormat::initializeDateTimeFormat is really the problem here but I couldn't figure out how specifically. ------- Inspected URL: (unknown) Loading completed: false Frontend User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/603.1 (KHTML, like Gecko) Version/10.0 Safari/603.1 Uncaught Exceptions: - TypeError: failed to initialize DateTimeFormat due to invalid locale (at LogTreeElement.js:32:107) DateTimeFormat @ [native code] toLocaleTimeString @ [native code] LogTreeElement @ LogTreeElement.js:32:107 contentLoaded @ Main.js:247:63 ? @ [native code] - TypeError: undefined is not an object (evaluating 'this.tabBrowser.updateLayout') (at Main.js:1512:20) _tabBrowserSizeDidChange @ Main.js:1512:20 _windowResized @ Main.js:1401:34 ? @ [native code]
Attachments
Patch (13.46 KB, patch)
2018-07-25 14:55 PDT, Andy VanWagoner
no flags
Patch (13.67 KB, patch)
2018-07-25 19:40 PDT, Andy VanWagoner
no flags
Archive of layout-test-results from ews205 for win-future (12.81 MB, application/zip)
2018-07-26 05:36 PDT, EWS Watchlist
no flags
Patch (13.66 KB, patch)
2018-07-26 08:32 PDT, Andy VanWagoner
no flags
Blaze Burg
Comment 1 2017-02-08 10:13:10 PST
1. WebKitGTK should not be showing the uncaught exception screen for production / distribution builds. 2. I don't know whether DateTimeFormat is spec'd to support locale.encoding. 3. We can always add a try-catch around the Inspector code that constructs this object, though this seems like a weird workaround for an error case that should be handled by JSC.
Radar WebKit Bug Importer
Comment 2 2017-02-08 10:13:28 PST
Blaze Burg
Comment 3 2017-02-08 10:27:39 PST
This should probably happen by improving IntlObject::convertICULocaleToBCP47LanguageTag().
Carlos Garcia Campos
Comment 4 2017-02-08 22:22:43 PST
es_ES.utf8 is also my locale and the inspector works for me.
Andy VanWagoner
Comment 5 2018-07-25 13:04:16 PDT
https://bugs.webkit.org/show_bug.cgi?id=182467 is likely a duplicate of this.
Andy VanWagoner
Comment 6 2018-07-25 13:06:23 PDT
Michael Catanzaro
Comment 7 2018-07-25 13:46:34 PDT
(In reply to Brian Burg from comment #1) > 1. WebKitGTK should not be showing the uncaught exception screen for > production / distribution builds. We use ENABLE(DEVELOPER_MODE) guards to disable functionality in production builds. Does one need to be added somewhere in the web inspector code?
Michael Catanzaro
Comment 8 2018-07-25 13:46:49 PDT
*** Bug 182467 has been marked as a duplicate of this bug. ***
Michael Catanzaro
Comment 9 2018-07-25 13:47:01 PDT
*** Bug 171207 has been marked as a duplicate of this bug. ***
Michael Catanzaro
Comment 10 2018-07-25 13:51:40 PDT
Two interesting comments from bug #171207: (In reply to Sergey Mende from comment #3) > The problem lies in the discrepancy between LANG environment variable format > and https://www.rfc-editor.org/rfc/bcp/bcp47.txt spec. > > The Web Inspector does not cause an internal error if LANG is set to a > simple two characters language code, i.e.: > > env LANG="en" midori > > works just fine. > > As soon as LANG is set to a full spec, e.g. "en_US.UTF-8", the Web Inspector > fails to open with the following message in its pane: > > TypeError:​ failed to initialize NumberFormat due to invalid locale (at > Utilities.js:​801:​57)​ > NumberFormat @ [native code]​ > toLocaleString @ [native code]​ > d @ Utilities.js:​801:​57 > value @ Utilities.js:​881:​64 > value @ Utilities.js:​899:​29 > _loadNewRecording @ TimelineManager.js:​655:​108 > reset @ TimelineManager.js:​114:​31 > TimelineManager @ TimelineManager.js:​62:​19 > loaded @ Main.js:​130:​50 > global code @ Main.html:​808:​18 > > > I am using stock FC26's package (webkitgtk4-2.18.3-1.fc26.x86_64) And: (In reply to Michael Catanzaro from comment #6) > I did some grepping and found > Source/JavaScriptCore/runtime/IntlNumberFormat.cpp. The failure is occurring > in IntlNumberFormat::initializeNumberFormat. I would look closely at > resolveLocale in IntlObject.cpp. If you're able to rebuild WebKit, adding > printfs is by far the easiest way to debug a problem like this. Good luck! What's strange to me is that I've never seen the reported bug, ever, using the en_US.UTF-8 locale.
Andy VanWagoner
Comment 11 2018-07-25 14:04:28 PDT
This error only appears to be possible when no available locales are passed into an Intl constructor like Intl.DateTimeFormat. The default language currently is assumed to be valid, but may end up being empty. There are multiple paths in how the default language is figured out, we just need JSC to do a better job ensuring the best valid non-empty default locale is used.
Andy VanWagoner
Comment 12 2018-07-25 14:55:13 PDT
Michael Catanzaro
Comment 13 2018-07-25 16:35:51 PDT
Comment on attachment 345787 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=345787&action=review > Source/JavaScriptCore/runtime/IntlObject.cpp:138 > + auto length = uloc_toLanguageTag(localeID, buffer.data(), buffer.size(), false, &status); > + if (status == U_BUFFER_OVERFLOW_ERROR) { Gotta call out the ICU developers for this terrible API design, making you call the function twice rather than just returning the proper amount of allocated memory.... > Source/JavaScriptCore/runtime/IntlObject.cpp:144 > + return String(buffer.data(), length); I believe the langtag buffer is filled with ASCII, not UTF-16, guessing from the API reference [1] and [2], which says: "In some APIs, ICU uses char * strings. This is either for file system paths or for strings that contain invariant characters only (such as locale identifiers). These strings are in the platform-specific encoding of either ASCII or EBCDIC." It's only appropriate to pass the data directly to the String constructor if it's already UTF-16. So assuming we don't care about EBCDIC, and considering that all ASCII is valid UTF-*, we should use String::fromUTF8() to create the return value, rather than directly using the String constructor. [1] https://ssl.icu-project.org/apiref/icu4c/uloc_8h.html#a1d50c91925ca3853fce6f28cf7390c3c [2] http://userguide.icu-project.org/strings > Source/JavaScriptCore/runtime/IntlObject.cpp:621 > + return "en"_s; Huh, I didn't know we had this user-defined literal. How about that. > LayoutTests/js/intl-default-locale.html:15 > +shouldBe("new Intl.DateTimeFormat().resolvedOptions().locale", "'tlh'"); > +shouldBe("new Intl.NumberFormat().resolvedOptions().locale", "'tlh'"); > +shouldBe("new Intl.Collator().resolvedOptions().locale", "'tlh'"); Uh, perhaps I don't understand something here, but why is 'tlh' a good default locale?
Andy VanWagoner
Comment 14 2018-07-25 16:58:32 PDT
Comment on attachment 345787 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=345787&action=review >> Source/JavaScriptCore/runtime/IntlObject.cpp:144 >> + return String(buffer.data(), length); > > I believe the langtag buffer is filled with ASCII, not UTF-16, guessing from the API reference [1] and [2], which says: "In some APIs, ICU uses char * strings. This is either for file system paths or for strings that contain invariant characters only (such as locale identifiers). These strings are in the platform-specific encoding of either ASCII or EBCDIC." It's only appropriate to pass the data directly to the String constructor if it's already UTF-16. So assuming we don't care about EBCDIC, and considering that all ASCII is valid UTF-*, we should use String::fromUTF8() to create the return value, rather than directly using the String constructor. > > [1] https://ssl.icu-project.org/apiref/icu4c/uloc_8h.html#a1d50c91925ca3853fce6f28cf7390c3c > [2] http://userguide.icu-project.org/strings Yeah, these should definitely be all ASCII and return a `const char*`. It looks like the char* constructor and fromUTF8 do exactly the same thing, but I can switch if that's the preferred way to handle this. >> Source/JavaScriptCore/runtime/IntlObject.cpp:621 >> + return "en"_s; > > Huh, I didn't know we had this user-defined literal. How about that. Yeah, that the new ASCIILiteral("en"). >> LayoutTests/js/intl-default-locale.html:15 >> +shouldBe("new Intl.Collator().resolvedOptions().locale", "'tlh'"); > > Uh, perhaps I don't understand something here, but why is 'tlh' a good default locale? Maybe I should have picked an easier to grok example. :) 'tlh' is the canonical form of i-klingon. It's also a language I'm sure is not supported/available on any of our platforms. This test ensures it used the default locale, canonicalized it, and it couldn't have pulled it from anywhere else.
Andy VanWagoner
Comment 15 2018-07-25 19:40:49 PDT
EWS Watchlist
Comment 16 2018-07-26 05:35:46 PDT
Comment on attachment 345813 [details] Patch Attachment 345813 [details] did not pass win-ews (win): Output: https://webkit-queues.webkit.org/results/8661021 New failing tests: http/tests/security/contentSecurityPolicy/userAgentShadowDOM/allow-video.html
EWS Watchlist
Comment 17 2018-07-26 05:36:00 PDT
Created attachment 345840 [details] Archive of layout-test-results from ews205 for win-future The attached test failures were seen while running run-webkit-tests on the win-ews. Bot: ews205 Port: win-future Platform: CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit
Andy VanWagoner
Comment 18 2018-07-26 08:14:18 PDT
Win port still has Intl disabled, so I would surprised if any of this code was exercised. Also the failures don't immediately appear related. Is win-future a different bot than win-ews?
Michael Catanzaro
Comment 19 2018-07-26 08:21:18 PDT
(In reply to Andy VanWagoner from comment #18) > Win port still has Intl disabled, so I would surprised if any of this code > was exercised. Also the failures don't immediately appear related. Is > win-future a different bot than win-ews? They're different bots, but don't worry about them, there are so many flaky tests that that the bots are best ignored right now unless the failures look relevant.
Michael Catanzaro
Comment 20 2018-07-26 08:29:37 PDT
Comment on attachment 345813 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=345813&action=review > Source/JavaScriptCore/runtime/IntlObject.cpp:144 > + return String::fromUTF8(buffer.data(), length); You're right: it's safe to pass directly to the String constructor if you know the data is all ASCII, so my comment was totally wrong and the way you had it before was better. > Source/JavaScriptCore/runtime/IntlObject.cpp:609 > + for (const auto &language : languages) { The & gets attached to the type, not the variable name: const auto& language
Andy VanWagoner
Comment 21 2018-07-26 08:32:47 PDT
WebKit Commit Bot
Comment 22 2018-07-26 09:01:12 PDT
Comment on attachment 345847 [details] Patch Clearing flags on attachment: 345847 Committed r234260: <https://trac.webkit.org/changeset/234260>
WebKit Commit Bot
Comment 23 2018-07-26 09:01:14 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.