Bug 152448 - Intl.Collator uses POSIX locale (detected by js/intl-collator.html on iOS Simulator)
Summary: Intl.Collator uses POSIX locale (detected by js/intl-collator.html on iOS Sim...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: WebKit Nightly Build
Hardware: All All
: P2 Normal
Assignee: Andy VanWagoner
URL:
Keywords:
Depends on:
Blocks: 90906 152130
  Show dependency treegraph
 
Reported: 2015-12-18 15:50 PST by Ryan Haddad
Modified: 2016-11-17 12:55 PST (History)
11 users (show)

See Also:


Attachments
Patch (2.05 KB, patch)
2015-12-18 18:12 PST, Sukolsak Sakshuwong
no flags Details | Formatted Diff | Diff
Patch (3.29 KB, patch)
2015-12-18 18:18 PST, Sukolsak Sakshuwong
ap: review-
Details | Formatted Diff | Diff
Add defaultLanguage to globalObjectMethodTable (26.44 KB, patch)
2016-02-17 22:04 PST, Andy VanWagoner
no flags Details | Formatted Diff | Diff
Patch (25.74 KB, patch)
2016-02-27 10:28 PST, Andy VanWagoner
no flags Details | Formatted Diff | Diff
/Applications/Safari\ Technology\ Preview.app (25.74 KB, text/plain)
2016-11-17 12:55 PST, rurumi663@gmail.com
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Ryan Haddad 2015-12-18 15:50:21 PST
[iOS Simulator] js/intl-collator.html failing

<https://build.webkit.org/results/Apple%20iOS%209%20Simulator%20Release%20WK2%20(Tests)/r194295%20(1860)/results.html>

--- /Volumes/Data/slave/ios-simulator-9-release-tests-wk2/build/layout-test-results/js/intl-collator-expected.txt
+++ /Volumes/Data/slave/ios-simulator-9-release-tests-wk2/build/layout-test-results/js/intl-collator-actual.txt
@@ -119,10 +119,10 @@
 PASS badCalls is 0
 PASS Intl.Collator.prototype.compare('a', { toString() { throw Error('8') } }) threw exception Error: 8.
 PASS Intl.Collator.prototype.compare.call(null, 'a', 'b') is -1
-PASS Intl.Collator.prototype.compare.call(Intl.Collator('en', { sensitivity:'base' }), 'A', 'a') is 1
+FAIL Intl.Collator.prototype.compare.call(Intl.Collator('en', { sensitivity:'base' }), 'A', 'a') should be 1. Was -1.
 PASS Intl.Collator.prototype.compare.call(5, 'a', 'b') is -1
 PASS new Intl.Collator().compare.call(null, 'a', 'b') is -1
-PASS new Intl.Collator().compare.call(Intl.Collator('en', { sensitivity:'base' }), 'A', 'a') is 1
+FAIL new Intl.Collator().compare.call(Intl.Collator('en', { sensitivity:'base' }), 'A', 'a') should be 1. Was -1.
 PASS new Intl.Collator().compare.call(5, 'a', 'b') is -1
 PASS Intl.Collator.prototype.compare() is 0
 PASS Intl.Collator.prototype.compare('undefinec') is -1
Comment 1 Sukolsak Sakshuwong 2015-12-18 18:12:58 PST
Created attachment 267667 [details]
Patch
Comment 2 Sukolsak Sakshuwong 2015-12-18 18:18:51 PST
Created attachment 267671 [details]
Patch
Comment 3 Alexey Proskuryakov 2015-12-18 22:13:50 PST
Comment on attachment 267671 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=267671&action=review

> LayoutTests/ChangeLog:8
> +        On iOS simulator with en-US as the default locale, uloc_getDefault()

Is this a WebKitTestRunner bug? There is no way the actual system default on iOS would be Posix - and WKTR does some work for setting the locale, and it also has an Info.plist that could specify something relevant.

> LayoutTests/ChangeLog:11
> +        Make the test not rely on the default locale of the system.

Yes, WKTR is supposed to set the locale.
Comment 4 Sukolsak Sakshuwong 2015-12-18 22:46:20 PST
(In reply to comment #3)
> Comment on attachment 267671 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=267671&action=review
> 
> > LayoutTests/ChangeLog:8
> > +        On iOS simulator with en-US as the default locale, uloc_getDefault()
> 
> Is this a WebKitTestRunner bug? There is no way the actual system default on
> iOS would be Posix - and WKTR does some work for setting the locale, and it
> also has an Info.plist that could specify something relevant.

I don't know. I didn't use WebKitTestRunner. I created an iOS project in Xcode, linked it with ICU (libicucore.tbd), and printed out uloc_getDefault() in the simulator. It printed "en_US_POSIX". But if I print out uloc_getDefault() in WebKit on OS X, I get "en_US".
Comment 5 Alexey Proskuryakov 2015-12-19 00:06:50 PST
Comment on attachment 267671 [details]
Patch

I meant to r- the patch. Having these different between OS X and iOS would have a long-term maintenance cost, so we should avoid that unless there is a really good reason why the platforms need to be different in this respect.
Comment 6 Alexey Proskuryakov 2015-12-19 00:19:40 PST
Looking through ICU source code, it's not really clear where the difference comes from. en_US_POSIX may be the last resort fallback in uprv_getPOSIXID, so perhaps setlocale(LC_ALL, "") works differently?

Seems like this needs some experimenting.
Comment 7 Sukolsak Sakshuwong 2015-12-19 01:57:30 PST
Looks like it has nothing to do with the difference between iOS and OS X. I tried creating a command line tool project in Xcode and printing out uloc_getDefault(). When I ran it via Xcode, it printed "en_US_POSIX". But when I ran it via command line, it printed "en_US".

It seems to have something to do with the environment variable LANG. Here's a script I created to test it:

$ cat test.c
#include <stdio.h>
const char* uloc_getDefault();
int main() {
    printf("%s\n", uloc_getDefault());
    return 0;
}

$ echo $LANG
en_US.UTF-8
$ gcc test.c -licucore -o test
$ ./test
en_US
$ unset LANG; ./test
en_US_POSIX
$ LANG=en_US.UTF-8 ./test
en_US
$ LANG=abcdef.UTF-8 ./test
abcdef

So, I think this patch is the right thing to do. We should never assume that the default locale is English anyway. Let me know what you think. If you agree, I will fix the change log.
Comment 8 Alexey Proskuryakov 2015-12-19 14:05:50 PST
> We should never assume that the default locale is English anyway.

Why are you saying that? Tests should run in a predictable environment - we don't them to fail for people whose computers are configured to use a different language or locale.

> It seems to have something to do with the environment variable LANG.

The ICU function that I mentioned does check LANG as part of fallback, so this is definitely one of the ways to make it use what we want. Other things that it checks are LC_ALL and LC_CTYPE environment variables, and the current POSIX locale.

The question is how to most cleanly make WebKitTestRunner use a consistent locale in all of the processes (including WebProcess and NetworkProcess). Given that it's an app, I suggest looking if Info.plist and/or localization changes can achieve that.
Comment 9 Sukolsak Sakshuwong 2015-12-19 14:57:33 PST
(In reply to comment #8)
> > We should never assume that the default locale is English anyway.
> 
> Why are you saying that? Tests should run in a predictable environment - we
> don't them to fail for people whose computers are configured to use a
> different language or locale.

That is what this patch is trying to do. The currently existing test will fail if the user uses a different locale that has a different order for 'a' and 'A'.

"new Intl.Collator()" returns a collator with the default locale. We cannot predict the value of "new Intl.Collator().compare('a', 'A')".

"new Intl.Collator('en')" returns a collator with the English locale. We can predict the value of "new Intl.Collator('en').compare('a', 'A')".
Comment 10 Alexey Proskuryakov 2015-12-19 19:13:15 PST
> "new Intl.Collator()" returns a collator with the default locale. We cannot predict the value of "new Intl.Collator().compare('a', 'A')".

We need to test that "new Intl.Collator()" works, can't just remove the test for this functionality.
Comment 11 Sukolsak Sakshuwong 2015-12-19 22:34:57 PST
(In reply to comment #10)
> > "new Intl.Collator()" returns a collator with the default locale. We cannot predict the value of "new Intl.Collator().compare('a', 'A')".
> 
> We need to test that "new Intl.Collator()" works, can't just remove the test
> for this functionality.

We have tests for "new Intl.Collator()", including "shouldBe("new Intl.Collator().compare.call(null, 'a', 'b')", "-1")". We can test that because any sensible language will likely have 'a' before 'b'. It can still be wrong, but it's extremely unlikely.

Testing compare('a', 'A') with an unknown locale is harder to predict. If we do really want to test it, we could do

shouldBeTrue("var collator = new Intl.Collator(); collator.resolvedOptions().locale != 'en-US' || collator.compare.call(Intl.Collator('en', { sensitivity:'base' }), 'A', 'a') == 1");

But that's not the point of this test case, if you look at the comment above it. The point of this test case is to test that calling compare() with any "this" has no effect on it.
Comment 12 Alexey Proskuryakov 2015-12-19 22:54:01 PST
I'm not really sure what you are arguing for.

It seems pretty obvious that the test harness should not depend on user's locale. It also seems pretty obvious that tests should not need to account for technical differences in how the harness is implemented on various platforms (command line vs. app, for example).

Do you disagree with the above? Or if you agree with that, do you think that the patch is the right way to fix the problem nonetheless?
Comment 13 Sukolsak Sakshuwong 2015-12-19 23:11:47 PST
(In reply to comment #12)
> I'm not really sure what you are arguing for.
> 
> It seems pretty obvious that the test harness should not depend on user's
> locale. It also seems pretty obvious that tests should not need to account
> for technical differences in how the harness is implemented on various
> platforms (command line vs. app, for example).
> 
> Do you disagree with the above? Or if you agree with that, do you think that
> the patch is the right way to fix the problem nonetheless?

I agree with that. My point is that the currently existing test depends on user's locale, because it uses "new Intl.Collator()". My patch is trying to make the test not depend on user's locale.

"new Intl.Collator()" will always depend on user's locale. That's what the spec says.
Comment 14 Alexey Proskuryakov 2015-12-19 23:30:08 PST
The right way to fix this is to make the test harness isolate tests from user's locale, and from details of the harness implementation. Will you be willing to do that?
Comment 15 Sukolsak Sakshuwong 2015-12-19 23:56:09 PST
(In reply to comment #14)
> The right way to fix this is to make the test harness isolate tests from
> user's locale, and from details of the harness implementation. Will you be
> willing to do that?

Ok, I think I misunderstood what "test harness" meant. Does that mean there should be no way for tests to read the default locale of the system that is running the harness? If so, I don't want to work on it as I am not so familiar with it.
Comment 16 Alexey Proskuryakov 2015-12-21 09:18:28 PST
> Does that mean there should be no way for tests to read the default locale of the system that is running the harness?

Obviously, tests should work the same on all machines. We don't really have the setup to run the tests with different system locales anyway, so a dependence on system settings would be only confusing, not helpful.

When tests need non-default system language and locale, they should use testRunner or internals APIs to do that.
Comment 17 Alexey Proskuryakov 2015-12-25 18:28:02 PST
This is actually a bug in Intl.Collator implementation, not in tools. WebKit should never fall back to uloc_getDefault, because it's not expected to be correct on OS X or iOS. We should get the locale using Cocoa.

Having said this, the difference between WebKitTestRunner and WebKitTestRunnerApp is caused by code in Tools/Scripts/webkitpy/common/host.py (see _engage_awesome_locale_hacks), which sets some language variables to override host tools' output. These happen to be inherited by WebKitTestRunner too.
Comment 18 Alexey Proskuryakov 2015-12-25 18:37:05 PST
This is somewhat tricky problem, because OS X has separate notions of locale and language, and it's not really clear which one we should use in the browser.

For the first cut, we should probably call out to WebCore's defaultLanguage(). That's not really the locale, but exposing the true system locale is likely both unnecessary and harmful for privacy.
Comment 19 Andy VanWagoner 2016-01-20 15:17:56 PST
(In reply to comment #18)
> For the first cut, we should probably call out to WebCore's
> defaultLanguage().

Can JSC depend on WebCore in that way?

I agree that the user's preferred language should be the default locale.
Comment 20 Sukolsak Sakshuwong 2016-01-20 15:25:27 PST
Can we do something similar to copyDefaultLocale() in WTF/wtf/unicode/icu/CollatorICU.cpp?
Comment 21 Alexey Proskuryakov 2016-01-20 17:07:53 PST
> Can JSC depend on WebCore in that way?

It can have a generic client callback (there are a few already), that WebCore would implement.

> Can we do something similar to copyDefaultLocale() in WTF/wtf/unicode/icu/CollatorICU.cpp?

This doesn't seem to respect testing overrides, and thus may need to change itself. There are several code paths for overriding the language though, so I'm not 100% certain.
Comment 22 Andy VanWagoner 2016-01-25 20:00:56 PST
(In reply to comment #21)
> > Can JSC depend on WebCore in that way?
> 
> It can have a generic client callback (there are a few already), that
> WebCore would implement.

If you can point out an example, I can take a shot at this.
Comment 23 Alexey Proskuryakov 2016-01-26 09:17:44 PST
> If you can point out an example, I can take a shot at this.

I could find a few:

- ConsoleClient, implemented in WebCore as PageConsoleClient;
- everything in JSVMClientData;
- everything in GlobalObjectMethodTable.

CC'ing JS folks who may have better guidance.
Comment 24 Andy VanWagoner 2016-02-17 22:04:25 PST
Created attachment 271630 [details]
Add defaultLanguage to globalObjectMethodTable
Comment 25 WebKit Commit Bot 2016-02-17 22:06:22 PST
Attachment 271630 [details] did not pass style-queue:


ERROR: Source/WebCore/bindings/js/JSWorkerGlobalScopeBase.cpp:33:  Alphabetical sorting problem.  [build/include_order] [4]
Total errors found: 1 in 20 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 26 Alexey Proskuryakov 2016-02-18 09:08:57 PST
Comment on attachment 271630 [details]
Add defaultLanguage to globalObjectMethodTable

Thank you! This direction makes sense to me; a JSC person should probably do an actual review.
Comment 27 Darin Adler 2016-02-26 08:34:59 PST
Comment on attachment 271630 [details]
Add defaultLanguage to globalObjectMethodTable

View in context: https://bugs.webkit.org/attachment.cgi?id=271630&action=review

> Source/JavaScriptCore/runtime/IntlObject.cpp:656
> +String defaultLocale(GlobalObjectMethodTable::DefaultLanguageFunctionPtr defaultLanguage)
> +{
> +    // 6.2.4 DefaultLocale ()
> +    if (defaultLanguage) {
> +        String locale = defaultLanguage();
> +        if (!locale.isEmpty())
> +            return canonicalizeLanguageTag(locale);
> +    }
> +    String locale = uloc_getDefault();
> +    convertICULocaleToBCP47LanguageTag(locale);
> +    return locale;
> +}

I think this is the wrong idiom. This long type name in multiple functions is a bad code smell that halves make it clear we are doing this wrong. The idiomatic way to do it is to add an ExecState& argument as the first argument to the various functions and then do the work to get to the global object method in the lowest level function where we need the function pointer, not up at the caller.

The code would then be something like this:

    if (auto defaultLanguageMethod = state.callee()->globalObject()->globalObjectMethodTable()->defaultLanguage) {
        String defaultLanguage = defaultLanguageMethod();
        if (!defaultLanguage.isEmpty())
            return canonicalizeLanguageTag(defaultLanguage);
    }

I also think it’s a bit peculiar to call a language a locale since they are not the same thing, but that’s not something new to this patch.

> Source/JavaScriptCore/runtime/JSGlobalObject.h:177
> +    typedef String (*DefaultLanguageFunctionPtr)(void);

The "(void)" here is unnecessary in C++ code that is not also used in C, so please just use "()".
Comment 28 Darin Adler 2016-02-26 08:35:37 PST
"helps make it clear", not "halves make it clear"
Comment 29 Andy VanWagoner 2016-02-27 10:28:07 PST
Created attachment 272419 [details]
Patch
Comment 30 Andy VanWagoner 2016-02-27 10:43:58 PST
(In reply to comment #27)
> Comment on attachment 271630 [details]
> Add defaultLanguage to globalObjectMethodTable
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=271630&action=review
> 
> > Source/JavaScriptCore/runtime/IntlObject.cpp:656
> > +String defaultLocale(GlobalObjectMethodTable::DefaultLanguageFunctionPtr defaultLanguage)
> > +{
> > +    // 6.2.4 DefaultLocale ()
> > +    if (defaultLanguage) {
> > +        String locale = defaultLanguage();
> > +        if (!locale.isEmpty())
> > +            return canonicalizeLanguageTag(locale);
> > +    }
> > +    String locale = uloc_getDefault();
> > +    convertICULocaleToBCP47LanguageTag(locale);
> > +    return locale;
> > +}
> 
> I think this is the wrong idiom. This long type name in multiple functions
> is a bad code smell that halves make it clear we are doing this wrong. The
> idiomatic way to do it is to add an ExecState& argument as the first
> argument to the various functions and then do the work to get to the global
> object method in the lowest level function where we need the function
> pointer, not up at the caller.
> 
> The code would then be something like this:
> 
>     if (auto defaultLanguageMethod =
> state.callee()->globalObject()->globalObjectMethodTable()->defaultLanguage) {
>         String defaultLanguage = defaultLanguageMethod();
>         if (!defaultLanguage.isEmpty())
>             return canonicalizeLanguageTag(defaultLanguage);
>     }
> 

Done.

> I also think it’s a bit peculiar to call a language a locale since they are
> not the same thing, but that’s not something new to this patch.
> 

I agree it's confusing. I made the global method match the name of the only implementation currently being passed to it. The Intl spec uses locale frequently, which is why it is the term in the Intl code.

As is stands, the language identifier in WebCore is BCP 47 Language Tag. The Intl spec also uses the same language tags, but refers to variables containing such a tag as locale. So while the terms are not really synonyms, in this context they are the same thing.

> > Source/JavaScriptCore/runtime/JSGlobalObject.h:177
> > +    typedef String (*DefaultLanguageFunctionPtr)(void);
> 
> The "(void)" here is unnecessary in C++ code that is not also used in C, so
> please just use "()".

Done.
Comment 31 WebKit Commit Bot 2016-02-27 16:35:55 PST
Comment on attachment 272419 [details]
Patch

Clearing flags on attachment: 272419

Committed r197261: <http://trac.webkit.org/changeset/197261>
Comment 32 WebKit Commit Bot 2016-02-27 16:36:02 PST
All reviewed patches have been landed.  Closing bug.
Comment 33 rurumi663@gmail.com 2016-11-17 12:55:30 PST
Created attachment 295073 [details]
/Applications/Safari\ Technology\ Preview.app