Bug 156993 - WebCore on Mac ignores the user's preferred region (country) while getting the language
Summary: WebCore on Mac ignores the user's preferred region (country) while getting th...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore JavaScript (show other bugs)
Version: WebKit Nightly Build
Hardware: All All
: P2 Normal
Assignee: Filip Pizlo
URL:
Keywords:
Depends on:
Blocks: 90906 159693
  Show dependency treegraph
 
Reported: 2016-04-25 13:09 PDT by Filip Pizlo
Modified: 2016-07-12 15:37 PDT (History)
6 users (show)

See Also:


Attachments
the patch (3.59 KB, patch)
2016-04-25 13:13 PDT, Filip Pizlo
ggaren: review+
Details | Formatted Diff | Diff
the patch (5.20 KB, patch)
2016-04-25 18:52 PDT, Filip Pizlo
ggaren: review-
Details | Formatted Diff | Diff
the patch (5.17 KB, patch)
2016-04-25 21:23 PDT, Filip Pizlo
ggaren: review+
Details | Formatted Diff | Diff
the patch (8.40 KB, patch)
2016-04-26 12:27 PDT, Filip Pizlo
ggaren: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Filip Pizlo 2016-04-25 13:09:16 PDT
It doesn't allow for the possibility of en-pl, for example.
Comment 1 Filip Pizlo 2016-04-25 13:13:40 PDT
Created attachment 277272 [details]
the patch

I'm still testing this.
Comment 2 Geoffrey Garen 2016-04-25 15:54:27 PDT
Comment on attachment 277272 [details]
the patch

r=me, assuming that testing pans out.

Since this change is only to the part of the code that parses a preference from the system, I can't think of any way to test that without changing the preference on the system.

On the other hand, perhaps you could write a test that used window.internals just to call httpStyleLanguageCode with "en-jp" and verify that you get back "en-jp" -- along with other edge cases to do with "_" and so on.
Comment 3 Geoffrey Garen 2016-04-25 15:55:03 PDT
Comment on attachment 277272 [details]
the patch

"en-pl" I mean!
Comment 4 Darin Adler 2016-04-25 15:55:19 PDT
I’m concerned that this could construct language codes that ICU does not know how to handle.
Comment 5 Geoffrey Garen 2016-04-25 16:02:16 PDT
Maybe we can verify our language-country pair, and fall back to the old behavior if verification fails:

(a) search for language and country in uloc_getISOLanguages and uloc_getISOCountries;

OR

(b) try some simple ICU API like uloc_getVariant and see if it returns an error code.
Comment 6 Filip Pizlo 2016-04-25 16:52:11 PDT
(In reply to comment #5)
> Maybe we can verify our language-country pair, and fall back to the old
> behavior if verification fails:
> 
> (a) search for language and country in uloc_getISOLanguages and
> uloc_getISOCountries;
> 
> OR
> 
> (b) try some simple ICU API like uloc_getVariant and see if it returns an
> error code.

Good idea, I'll try that.
Comment 7 Filip Pizlo 2016-04-25 18:40:52 PDT
(In reply to comment #6)
> (In reply to comment #5)
> > Maybe we can verify our language-country pair, and fall back to the old
> > behavior if verification fails:
> > 
> > (a) search for language and country in uloc_getISOLanguages and
> > uloc_getISOCountries;
> > 
> > OR
> > 
> > (b) try some simple ICU API like uloc_getVariant and see if it returns an
> > error code.
> 
> Good idea, I'll try that.

Using uloc_getVariant() doesn't help.  It will gladly accept en-blahus for example, and it will return BLAHUS as the variant.
Comment 8 Filip Pizlo 2016-04-25 18:49:51 PDT
(In reply to comment #7)
> (In reply to comment #6)
> > (In reply to comment #5)
> > > Maybe we can verify our language-country pair, and fall back to the old
> > > behavior if verification fails:
> > > 
> > > (a) search for language and country in uloc_getISOLanguages and
> > > uloc_getISOCountries;
> > > 
> > > OR
> > > 
> > > (b) try some simple ICU API like uloc_getVariant and see if it returns an
> > > error code.
> > 
> > Good idea, I'll try that.
> 
> Using uloc_getVariant() doesn't help.  It will gladly accept en-blahus for
> example, and it will return BLAHUS as the variant.

Using uloc_getISOCountries works.
Comment 9 Filip Pizlo 2016-04-25 18:52:24 PDT
Created attachment 277302 [details]
the patch

This version of the patch defends against the country not being recognized by ICU.
Comment 10 Geoffrey Garen 2016-04-25 19:34:40 PDT
> Using uloc_getVariant() doesn't help.  It will gladly accept en-blahus for
> example, and it will return BLAHUS as the variant.

en-BLAHUS is my favorite en.
Comment 11 Geoffrey Garen 2016-04-25 19:38:39 PDT
/Volumes/Data/EWS/WebKit/Source/WebCore/platform/mac/Language.mm:34:9: fatal error: 'unicode/unumsys.h' file not found
#import <unicode/unumsys.h>
Comment 12 Geoffrey Garen 2016-04-25 19:41:28 PDT
It looks like each project (WTF, JSC, WebCore, WebKit) includes its own copy of the icu headers in an icu folder. I'm not sure why we do this -- but it seems like the canonical way to fix this bug is to copy unumsys.h to WebCore/icu/unicode/.
Comment 13 Filip Pizlo 2016-04-25 21:23:09 PDT
Created attachment 277329 [details]
the patch

I don't need unumsys.h.  Checking if removing that import is enough to make all of the bots happy.
Comment 14 Darin Adler 2016-04-26 08:27:45 PDT
(In reply to comment #12)
> It looks like each project (WTF, JSC, WebCore, WebKit) includes its own copy
> of the icu headers in an icu folder. I'm not sure why we do this -- but it
> seems like the canonical way to fix this bug is to copy unumsys.h to
> WebCore/icu/unicode/.

We have a set of ICU headers in WebKit only because OS X and iOS include the ICU library, but do not include the headers.

However, we should be able to arrange things so we don’t need a separate set in each project. Might involve some build system changes that might be non-obvious, though.
Comment 15 Geoffrey Garen 2016-04-26 08:47:53 PDT
Comment on attachment 277329 [details]
the patch

r=me
Comment 16 Darin Adler 2016-04-26 08:53:35 PDT
Comment on attachment 277329 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=277329&action=review

> Source/WebCore/platform/mac/Language.mm:141
> +        // Detect if the country that was requested is one that ICU can handle.
> +        const char* const* countries = uloc_getISOCountries();
> +        const char* countryUTF8 = [countryCode UTF8String];
> +        bool foundCountry = false;
> +        for (unsigned i = 0; countries[i]; ++i) {
> +            const char* possibleCountry = countries[i];
> +            if (!strcmp(countryUTF8, possibleCountry)) {
> +                foundCountry = true;
> +                break;
> +            }
> +        }

I’d like to see this factored out into a helper function rather than written inline. The name of that little helper function could help make things clear enough that we wouldn’t even need the comment!
Comment 17 Filip Pizlo 2016-04-26 09:17:41 PDT
(In reply to comment #16)
> Comment on attachment 277329 [details]
> the patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=277329&action=review
> 
> > Source/WebCore/platform/mac/Language.mm:141
> > +        // Detect if the country that was requested is one that ICU can handle.
> > +        const char* const* countries = uloc_getISOCountries();
> > +        const char* countryUTF8 = [countryCode UTF8String];
> > +        bool foundCountry = false;
> > +        for (unsigned i = 0; countries[i]; ++i) {
> > +            const char* possibleCountry = countries[i];
> > +            if (!strcmp(countryUTF8, possibleCountry)) {
> > +                foundCountry = true;
> > +                break;
> > +            }
> > +        }
> 
> I’d like to see this factored out into a helper function rather than written
> inline. The name of that little helper function could help make things clear
> enough that we wouldn’t even need the comment!

Done.
Comment 18 Filip Pizlo 2016-04-26 09:22:25 PDT
Landed in http://trac.webkit.org/changeset/200089
Comment 19 Alexey Proskuryakov 2016-04-26 09:52:23 PDT
Comment on attachment 277329 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=277329&action=review

> Source/WebCore/ChangeLog:13
> +        the default region. That's wrong, since for example it doesn't respect the user's choice (in
> +        System Preferences) to display dates/calenders/etc according to a different region (like how
> +        I have my machine set to en-pl right now).

This doesn't seem right. "en-pl" is "Polish variant of English", not "English with Polish regional settings for things like date and time".
Comment 20 Ryan Haddad 2016-04-26 10:05:40 PDT
This change causes the following API tests to fail:
  WKWebView.NavigatorLanguage
  WebKit1.NavigatorLanguage

It looks like the expected results may need to be updated.

<https://build.webkit.org/builders/Apple%20El%20Capitan%20Release%20WK2%20%28Tests%29/builds/5452/steps/run-api-tests/logs/stdio>
Comment 21 Filip Pizlo 2016-04-26 10:10:48 PDT
(In reply to comment #19)
> Comment on attachment 277329 [details]
> the patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=277329&action=review
> 
> > Source/WebCore/ChangeLog:13
> > +        the default region. That's wrong, since for example it doesn't respect the user's choice (in
> > +        System Preferences) to display dates/calenders/etc according to a different region (like how
> > +        I have my machine set to en-pl right now).
> 
> This doesn't seem right. "en-pl" is "Polish variant of English", not
> "English with Polish regional settings for things like date and time".

That's not how our ICU interprets it.
Comment 22 Ryan Haddad 2016-04-26 10:17:21 PDT
Reverted r200089 for reason:

This change causes API test failures

Committed r200093: <http://trac.webkit.org/changeset/200093>
Comment 23 Alexey Proskuryakov 2016-04-26 10:20:39 PDT
We may be confusing locale and language codes when calling into ICU. Not super easy to untangle, given that Apple's locale is more of a dictionary than a simple string code, but ICU can also encode multiple preferences in a string.
Comment 24 Filip Pizlo 2016-04-26 12:27:06 PDT
Created attachment 277405 [details]
the patch

Fixes the test expectations but also fixes what was probably a real bug: if AppleLanguage specifies a variant, then we might as well fall back to old behavior.
Comment 25 Geoffrey Garen 2016-04-26 12:31:28 PDT
Comment on attachment 277405 [details]
the patch

r=me
Comment 26 Filip Pizlo 2016-04-26 13:01:23 PDT
Landed in http://trac.webkit.org/changeset/200105
Comment 27 Alexey Proskuryakov 2016-04-27 15:39:34 PDT
Comment on attachment 277405 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=277405&action=review

> Tools/TestWebKitAPI/Tests/mac/NavigatorLanguage.mm:77
> -    @[@"ru", @"ru-ru"], // This does not match other browsers or CFNetwork's Accept-Language, which all use "ru".
> +    @[@"ru", @"ru-us"], // This does not match other browsers or CFNetwork's Accept-Language, which all use "ru".

I still don't understand this patch. This test change looks completely wrong, navigator.language should never return "ru-us".

On my machine right here, the primary language is Russian, and the region is U.S. With these settings, navigator.language is ru-ru in shipping Safari. We shouldn't be changing that.
Comment 28 Alexey Proskuryakov 2016-04-27 15:43:52 PDT
navigator.language - "Must return a valid BCP 47 language tag representing either a plausible language or the user’s most preferred language."

This is why even the existing "ru-ru" is a bug, it should be just "ru". Regional settings don't go into navigator.language, that's why the new Intl API was necessary in the first place.