Bug 167991

Summary: JSC: Intl API should ignore encoding when parsing BCP 47 language tag from ISO 15897 locale string (passed via LANG)
Product: WebKit Reporter: mod-wkbz
Component: JavaScriptCoreAssignee: Andy VanWagoner <andy>
Status: RESOLVED FIXED    
Severity: Normal CC: andy, bburg, cgarcia, commit-queue, dirk.fizzlebeef, ews-watchlist, inspector-bugzilla-changes, joepeck, keith_miller, laurb3st, mark.lam, mcatanzaro, Meanjeanaco, msaboff, saam, sergey, webkit-bug-importer, ysuzuki
Priority: P2 Keywords: InRadar
Version: Other   
Hardware: All   
OS: Linux   
Attachments:
Description Flags
Patch
none
Patch
none
Archive of layout-test-results from ews205 for win-future
none
Patch none

Description mod-wkbz 2017-02-08 08:16:07 PST
I'm using WebKit2GTK+ 2.14.3 on Arch Linux, but this bug has been around for some time.

The only locale-related {LC_*, LOCALE, LANG} environment variable set on my system is "LANG=es_ES.utf8". In this configuration, opening the WebKit2 inspector results in a "Web Inspector encountered an internal error." page instead of a working inspector.

If I set LANG to "es_ES" or "en_US" or "C", the inspector works. If I set LANG to the (invalid) "es_ES.", the C library complains about an invalid locale as it should:

(process:15144): Gtk-WARNING **: Locale not supported by C library.
	Using the fallback 'C' locale.

The browser then runs in English, but the inspector still crashes in the same way.

Given that no other software complains about "LANG=es_ES.utf8", the inspector should be fixed to also support such "locale.encoding" values.

The JS exception stack is attached below; it looks like IntlDateTimeFormat::initializeDateTimeFormat is really the problem here but I couldn't figure out how specifically.

-------
Inspected URL:        (unknown)
Loading completed:    false
Frontend User Agent:  Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/603.1 (KHTML, like Gecko) Version/10.0 Safari/603.1

Uncaught Exceptions:
 - TypeError: failed to initialize DateTimeFormat due to invalid locale (at LogTreeElement.js:32:107)
    DateTimeFormat @ [native code]
    toLocaleTimeString @ [native code]
    LogTreeElement @ LogTreeElement.js:32:107
    contentLoaded @ Main.js:247:63
    ? @ [native code]
 - TypeError: undefined is not an object (evaluating 'this.tabBrowser.updateLayout') (at Main.js:1512:20)
    _tabBrowserSizeDidChange @ Main.js:1512:20
    _windowResized @ Main.js:1401:34
    ? @ [native code]
Comment 1 BJ Burg 2017-02-08 10:13:10 PST
1. WebKitGTK should not be showing the uncaught exception screen for production / distribution builds.

2. I don't know whether DateTimeFormat is spec'd to support locale.encoding.

3. We can always add a try-catch around the Inspector code that constructs this object, though this seems like a weird workaround for an error case that should be handled by JSC.
Comment 2 Radar WebKit Bug Importer 2017-02-08 10:13:28 PST
<rdar://problem/30422870>
Comment 3 BJ Burg 2017-02-08 10:27:39 PST
This should probably happen by improving IntlObject::convertICULocaleToBCP47LanguageTag().
Comment 4 Carlos Garcia Campos 2017-02-08 22:22:43 PST
es_ES.utf8 is also my locale and the inspector works for me.
Comment 5 Andy VanWagoner 2018-07-25 13:04:16 PDT
https://bugs.webkit.org/show_bug.cgi?id=182467 is likely a duplicate of this.
Comment 6 Andy VanWagoner 2018-07-25 13:06:23 PDT
https://bugs.webkit.org/show_bug.cgi?id=171207 is another likely dupe.
Comment 7 Michael Catanzaro 2018-07-25 13:46:34 PDT
(In reply to Brian Burg from comment #1)
> 1. WebKitGTK should not be showing the uncaught exception screen for
> production / distribution builds.

We use ENABLE(DEVELOPER_MODE) guards to disable functionality in production builds. Does one need to be added somewhere in the web inspector code?
Comment 8 Michael Catanzaro 2018-07-25 13:46:49 PDT
*** Bug 182467 has been marked as a duplicate of this bug. ***
Comment 9 Michael Catanzaro 2018-07-25 13:47:01 PDT
*** Bug 171207 has been marked as a duplicate of this bug. ***
Comment 10 Michael Catanzaro 2018-07-25 13:51:40 PDT
Two interesting comments from bug #171207:

(In reply to Sergey Mende from comment #3)
> The problem lies in the discrepancy between LANG environment variable format
> and  https://www.rfc-editor.org/rfc/bcp/bcp47.txt spec.
> 
> The Web Inspector does not cause an internal error if LANG is set to a
> simple two characters language code, i.e.:
> 
>     env LANG="en" midori
> 
> works just fine. 
> 
> As soon as LANG is set to a full spec, e.g. "en_US.UTF-8", the Web Inspector
> fails to open with the following message in its pane:
> 
> TypeError:​ failed to initialize NumberFormat due to invalid locale (at
> Utilities.js:​801:​57)​
>     NumberFormat @ [native code]​
>     toLocaleString @ [native code]​
>     d @ Utilities.js:​801:​57
>     value @ Utilities.js:​881:​64
>     value @ Utilities.js:​899:​29
>     _loadNewRecording @ TimelineManager.js:​655:​108
>     reset @ TimelineManager.js:​114:​31
>     TimelineManager @ TimelineManager.js:​62:​19
>     loaded @ Main.js:​130:​50
>     global code @ Main.html:​808:​18
> 
> 
> I am using stock FC26's package (webkitgtk4-2.18.3-1.fc26.x86_64)

And:

(In reply to Michael Catanzaro from comment #6)
> I did some grepping and found
> Source/JavaScriptCore/runtime/IntlNumberFormat.cpp. The failure is occurring
> in IntlNumberFormat::initializeNumberFormat. I would look closely at
> resolveLocale in IntlObject.cpp. If you're able to rebuild WebKit, adding
> printfs is by far the easiest way to debug a problem like this. Good luck!

What's strange to me is that I've never seen the reported bug, ever, using the en_US.UTF-8 locale.
Comment 11 Andy VanWagoner 2018-07-25 14:04:28 PDT
This error only appears to be possible when no available locales are passed into an Intl constructor like Intl.DateTimeFormat. The default language currently is assumed to be valid, but may end up being empty.

There are multiple paths in how the default language is figured out, we just need JSC to do a better job ensuring the best valid non-empty default locale is used.
Comment 12 Andy VanWagoner 2018-07-25 14:55:13 PDT
Created attachment 345787 [details]
Patch
Comment 13 Michael Catanzaro 2018-07-25 16:35:51 PDT
Comment on attachment 345787 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=345787&action=review

> Source/JavaScriptCore/runtime/IntlObject.cpp:138
> +    auto length = uloc_toLanguageTag(localeID, buffer.data(), buffer.size(), false, &status);
> +    if (status == U_BUFFER_OVERFLOW_ERROR) {

Gotta call out the ICU developers for this terrible API design, making you call the function twice rather than just returning the proper amount of allocated memory....

> Source/JavaScriptCore/runtime/IntlObject.cpp:144
> +        return String(buffer.data(), length);

I believe the langtag buffer is filled with ASCII, not UTF-16, guessing from the API reference [1] and [2], which says: "In some APIs, ICU uses char * strings. This is either for file system paths or for strings that contain invariant characters only (such as locale identifiers). These strings are in the platform-specific encoding of either ASCII or EBCDIC." It's only appropriate to pass the data directly to the String constructor if it's already UTF-16. So assuming we don't care about EBCDIC, and considering that all ASCII is valid UTF-*, we should use String::fromUTF8() to create the return value, rather than directly using the String constructor.

[1] https://ssl.icu-project.org/apiref/icu4c/uloc_8h.html#a1d50c91925ca3853fce6f28cf7390c3c
[2] http://userguide.icu-project.org/strings

> Source/JavaScriptCore/runtime/IntlObject.cpp:621
> +    return "en"_s;

Huh, I didn't know we had this user-defined literal. How about that.

> LayoutTests/js/intl-default-locale.html:15
> +shouldBe("new Intl.DateTimeFormat().resolvedOptions().locale", "'tlh'");
> +shouldBe("new Intl.NumberFormat().resolvedOptions().locale", "'tlh'");
> +shouldBe("new Intl.Collator().resolvedOptions().locale", "'tlh'");

Uh, perhaps I don't understand something here, but why is 'tlh' a good default locale?
Comment 14 Andy VanWagoner 2018-07-25 16:58:32 PDT
Comment on attachment 345787 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=345787&action=review

>> Source/JavaScriptCore/runtime/IntlObject.cpp:144
>> +        return String(buffer.data(), length);
> 
> I believe the langtag buffer is filled with ASCII, not UTF-16, guessing from the API reference [1] and [2], which says: "In some APIs, ICU uses char * strings. This is either for file system paths or for strings that contain invariant characters only (such as locale identifiers). These strings are in the platform-specific encoding of either ASCII or EBCDIC." It's only appropriate to pass the data directly to the String constructor if it's already UTF-16. So assuming we don't care about EBCDIC, and considering that all ASCII is valid UTF-*, we should use String::fromUTF8() to create the return value, rather than directly using the String constructor.
> 
> [1] https://ssl.icu-project.org/apiref/icu4c/uloc_8h.html#a1d50c91925ca3853fce6f28cf7390c3c
> [2] http://userguide.icu-project.org/strings

Yeah, these should definitely be all ASCII and return a `const char*`. It looks like the char* constructor and fromUTF8 do exactly the same thing, but I can switch if that's the preferred way to handle this.

>> Source/JavaScriptCore/runtime/IntlObject.cpp:621
>> +    return "en"_s;
> 
> Huh, I didn't know we had this user-defined literal. How about that.

Yeah, that the new ASCIILiteral("en").

>> LayoutTests/js/intl-default-locale.html:15
>> +shouldBe("new Intl.Collator().resolvedOptions().locale", "'tlh'");
> 
> Uh, perhaps I don't understand something here, but why is 'tlh' a good default locale?

Maybe I should have picked an easier to grok example. :) 'tlh' is the canonical form of i-klingon. It's also a language I'm sure is not supported/available on any of our platforms.
This test ensures it used the default locale, canonicalized it, and it couldn't have pulled it from anywhere else.
Comment 15 Andy VanWagoner 2018-07-25 19:40:49 PDT
Created attachment 345813 [details]
Patch
Comment 16 EWS Watchlist 2018-07-26 05:35:46 PDT
Comment on attachment 345813 [details]
Patch

Attachment 345813 [details] did not pass win-ews (win):
Output: https://webkit-queues.webkit.org/results/8661021

New failing tests:
http/tests/security/contentSecurityPolicy/userAgentShadowDOM/allow-video.html
Comment 17 EWS Watchlist 2018-07-26 05:36:00 PDT
Created attachment 345840 [details]
Archive of layout-test-results from ews205 for win-future

The attached test failures were seen while running run-webkit-tests on the win-ews.
Bot: ews205  Port: win-future  Platform: CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit
Comment 18 Andy VanWagoner 2018-07-26 08:14:18 PDT
Win port still has Intl disabled, so I would surprised if any of this code was exercised. Also the failures don't immediately appear related. Is win-future a different bot than win-ews?
Comment 19 Michael Catanzaro 2018-07-26 08:21:18 PDT
(In reply to Andy VanWagoner from comment #18)
> Win port still has Intl disabled, so I would surprised if any of this code
> was exercised. Also the failures don't immediately appear related. Is
> win-future a different bot than win-ews?

They're different bots, but don't worry about them, there are so many flaky tests that that the bots are best ignored right now unless the failures look relevant.
Comment 20 Michael Catanzaro 2018-07-26 08:29:37 PDT
Comment on attachment 345813 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=345813&action=review

> Source/JavaScriptCore/runtime/IntlObject.cpp:144
> +        return String::fromUTF8(buffer.data(), length);

You're right: it's safe to pass directly to the String constructor if you know the data is all ASCII, so my comment was totally wrong and the way you had it before was better.

> Source/JavaScriptCore/runtime/IntlObject.cpp:609
> +    for (const auto &language : languages) {

The & gets attached to the type, not the variable name:

const auto& language
Comment 21 Andy VanWagoner 2018-07-26 08:32:47 PDT
Created attachment 345847 [details]
Patch
Comment 22 WebKit Commit Bot 2018-07-26 09:01:12 PDT
Comment on attachment 345847 [details]
Patch

Clearing flags on attachment: 345847

Committed r234260: <https://trac.webkit.org/changeset/234260>
Comment 23 WebKit Commit Bot 2018-07-26 09:01:14 PDT
All reviewed patches have been landed.  Closing bug.