Bug 156993

Summary: WebCore on Mac ignores the user's preferred region (country) while getting the language
Product: WebKit Reporter: Filip Pizlo <fpizlo>
Component: WebCore JavaScriptAssignee: Filip Pizlo <fpizlo>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, darin, dbates, ggaren, mitz, ryanhaddad
Priority: P2    
Version: WebKit Nightly Build   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 90906, 159693    
Attachments:
Description Flags
the patch
ggaren: review+
the patch
ggaren: review-
the patch
ggaren: review+
the patch ggaren: review+

Filip Pizlo
Reported 2016-04-25 13:09:16 PDT
It doesn't allow for the possibility of en-pl, for example.
Attachments
the patch (3.59 KB, patch)
2016-04-25 13:13 PDT, Filip Pizlo
ggaren: review+
the patch (5.20 KB, patch)
2016-04-25 18:52 PDT, Filip Pizlo
ggaren: review-
the patch (5.17 KB, patch)
2016-04-25 21:23 PDT, Filip Pizlo
ggaren: review+
the patch (8.40 KB, patch)
2016-04-26 12:27 PDT, Filip Pizlo
ggaren: review+
Filip Pizlo
Comment 1 2016-04-25 13:13:40 PDT
Created attachment 277272 [details] the patch I'm still testing this.
Geoffrey Garen
Comment 2 2016-04-25 15:54:27 PDT
Comment on attachment 277272 [details] the patch r=me, assuming that testing pans out. Since this change is only to the part of the code that parses a preference from the system, I can't think of any way to test that without changing the preference on the system. On the other hand, perhaps you could write a test that used window.internals just to call httpStyleLanguageCode with "en-jp" and verify that you get back "en-jp" -- along with other edge cases to do with "_" and so on.
Geoffrey Garen
Comment 3 2016-04-25 15:55:03 PDT
Comment on attachment 277272 [details] the patch "en-pl" I mean!
Darin Adler
Comment 4 2016-04-25 15:55:19 PDT
I’m concerned that this could construct language codes that ICU does not know how to handle.
Geoffrey Garen
Comment 5 2016-04-25 16:02:16 PDT
Maybe we can verify our language-country pair, and fall back to the old behavior if verification fails: (a) search for language and country in uloc_getISOLanguages and uloc_getISOCountries; OR (b) try some simple ICU API like uloc_getVariant and see if it returns an error code.
Filip Pizlo
Comment 6 2016-04-25 16:52:11 PDT
(In reply to comment #5) > Maybe we can verify our language-country pair, and fall back to the old > behavior if verification fails: > > (a) search for language and country in uloc_getISOLanguages and > uloc_getISOCountries; > > OR > > (b) try some simple ICU API like uloc_getVariant and see if it returns an > error code. Good idea, I'll try that.
Filip Pizlo
Comment 7 2016-04-25 18:40:52 PDT
(In reply to comment #6) > (In reply to comment #5) > > Maybe we can verify our language-country pair, and fall back to the old > > behavior if verification fails: > > > > (a) search for language and country in uloc_getISOLanguages and > > uloc_getISOCountries; > > > > OR > > > > (b) try some simple ICU API like uloc_getVariant and see if it returns an > > error code. > > Good idea, I'll try that. Using uloc_getVariant() doesn't help. It will gladly accept en-blahus for example, and it will return BLAHUS as the variant.
Filip Pizlo
Comment 8 2016-04-25 18:49:51 PDT
(In reply to comment #7) > (In reply to comment #6) > > (In reply to comment #5) > > > Maybe we can verify our language-country pair, and fall back to the old > > > behavior if verification fails: > > > > > > (a) search for language and country in uloc_getISOLanguages and > > > uloc_getISOCountries; > > > > > > OR > > > > > > (b) try some simple ICU API like uloc_getVariant and see if it returns an > > > error code. > > > > Good idea, I'll try that. > > Using uloc_getVariant() doesn't help. It will gladly accept en-blahus for > example, and it will return BLAHUS as the variant. Using uloc_getISOCountries works.
Filip Pizlo
Comment 9 2016-04-25 18:52:24 PDT
Created attachment 277302 [details] the patch This version of the patch defends against the country not being recognized by ICU.
Geoffrey Garen
Comment 10 2016-04-25 19:34:40 PDT
> Using uloc_getVariant() doesn't help. It will gladly accept en-blahus for > example, and it will return BLAHUS as the variant. en-BLAHUS is my favorite en.
Geoffrey Garen
Comment 11 2016-04-25 19:38:39 PDT
/Volumes/Data/EWS/WebKit/Source/WebCore/platform/mac/Language.mm:34:9: fatal error: 'unicode/unumsys.h' file not found #import <unicode/unumsys.h>
Geoffrey Garen
Comment 12 2016-04-25 19:41:28 PDT
It looks like each project (WTF, JSC, WebCore, WebKit) includes its own copy of the icu headers in an icu folder. I'm not sure why we do this -- but it seems like the canonical way to fix this bug is to copy unumsys.h to WebCore/icu/unicode/.
Filip Pizlo
Comment 13 2016-04-25 21:23:09 PDT
Created attachment 277329 [details] the patch I don't need unumsys.h. Checking if removing that import is enough to make all of the bots happy.
Darin Adler
Comment 14 2016-04-26 08:27:45 PDT
(In reply to comment #12) > It looks like each project (WTF, JSC, WebCore, WebKit) includes its own copy > of the icu headers in an icu folder. I'm not sure why we do this -- but it > seems like the canonical way to fix this bug is to copy unumsys.h to > WebCore/icu/unicode/. We have a set of ICU headers in WebKit only because OS X and iOS include the ICU library, but do not include the headers. However, we should be able to arrange things so we don’t need a separate set in each project. Might involve some build system changes that might be non-obvious, though.
Geoffrey Garen
Comment 15 2016-04-26 08:47:53 PDT
Comment on attachment 277329 [details] the patch r=me
Darin Adler
Comment 16 2016-04-26 08:53:35 PDT
Comment on attachment 277329 [details] the patch View in context: https://bugs.webkit.org/attachment.cgi?id=277329&action=review > Source/WebCore/platform/mac/Language.mm:141 > + // Detect if the country that was requested is one that ICU can handle. > + const char* const* countries = uloc_getISOCountries(); > + const char* countryUTF8 = [countryCode UTF8String]; > + bool foundCountry = false; > + for (unsigned i = 0; countries[i]; ++i) { > + const char* possibleCountry = countries[i]; > + if (!strcmp(countryUTF8, possibleCountry)) { > + foundCountry = true; > + break; > + } > + } I’d like to see this factored out into a helper function rather than written inline. The name of that little helper function could help make things clear enough that we wouldn’t even need the comment!
Filip Pizlo
Comment 17 2016-04-26 09:17:41 PDT
(In reply to comment #16) > Comment on attachment 277329 [details] > the patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=277329&action=review > > > Source/WebCore/platform/mac/Language.mm:141 > > + // Detect if the country that was requested is one that ICU can handle. > > + const char* const* countries = uloc_getISOCountries(); > > + const char* countryUTF8 = [countryCode UTF8String]; > > + bool foundCountry = false; > > + for (unsigned i = 0; countries[i]; ++i) { > > + const char* possibleCountry = countries[i]; > > + if (!strcmp(countryUTF8, possibleCountry)) { > > + foundCountry = true; > > + break; > > + } > > + } > > I’d like to see this factored out into a helper function rather than written > inline. The name of that little helper function could help make things clear > enough that we wouldn’t even need the comment! Done.
Filip Pizlo
Comment 18 2016-04-26 09:22:25 PDT
Alexey Proskuryakov
Comment 19 2016-04-26 09:52:23 PDT
Comment on attachment 277329 [details] the patch View in context: https://bugs.webkit.org/attachment.cgi?id=277329&action=review > Source/WebCore/ChangeLog:13 > + the default region. That's wrong, since for example it doesn't respect the user's choice (in > + System Preferences) to display dates/calenders/etc according to a different region (like how > + I have my machine set to en-pl right now). This doesn't seem right. "en-pl" is "Polish variant of English", not "English with Polish regional settings for things like date and time".
Ryan Haddad
Comment 20 2016-04-26 10:05:40 PDT
This change causes the following API tests to fail: WKWebView.NavigatorLanguage WebKit1.NavigatorLanguage It looks like the expected results may need to be updated. <https://build.webkit.org/builders/Apple%20El%20Capitan%20Release%20WK2%20%28Tests%29/builds/5452/steps/run-api-tests/logs/stdio>
Filip Pizlo
Comment 21 2016-04-26 10:10:48 PDT
(In reply to comment #19) > Comment on attachment 277329 [details] > the patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=277329&action=review > > > Source/WebCore/ChangeLog:13 > > + the default region. That's wrong, since for example it doesn't respect the user's choice (in > > + System Preferences) to display dates/calenders/etc according to a different region (like how > > + I have my machine set to en-pl right now). > > This doesn't seem right. "en-pl" is "Polish variant of English", not > "English with Polish regional settings for things like date and time". That's not how our ICU interprets it.
Ryan Haddad
Comment 22 2016-04-26 10:17:21 PDT
Reverted r200089 for reason: This change causes API test failures Committed r200093: <http://trac.webkit.org/changeset/200093>
Alexey Proskuryakov
Comment 23 2016-04-26 10:20:39 PDT
We may be confusing locale and language codes when calling into ICU. Not super easy to untangle, given that Apple's locale is more of a dictionary than a simple string code, but ICU can also encode multiple preferences in a string.
Filip Pizlo
Comment 24 2016-04-26 12:27:06 PDT
Created attachment 277405 [details] the patch Fixes the test expectations but also fixes what was probably a real bug: if AppleLanguage specifies a variant, then we might as well fall back to old behavior.
Geoffrey Garen
Comment 25 2016-04-26 12:31:28 PDT
Comment on attachment 277405 [details] the patch r=me
Filip Pizlo
Comment 26 2016-04-26 13:01:23 PDT
Alexey Proskuryakov
Comment 27 2016-04-27 15:39:34 PDT
Comment on attachment 277405 [details] the patch View in context: https://bugs.webkit.org/attachment.cgi?id=277405&action=review > Tools/TestWebKitAPI/Tests/mac/NavigatorLanguage.mm:77 > - @[@"ru", @"ru-ru"], // This does not match other browsers or CFNetwork's Accept-Language, which all use "ru". > + @[@"ru", @"ru-us"], // This does not match other browsers or CFNetwork's Accept-Language, which all use "ru". I still don't understand this patch. This test change looks completely wrong, navigator.language should never return "ru-us". On my machine right here, the primary language is Russian, and the region is U.S. With these settings, navigator.language is ru-ru in shipping Safari. We shouldn't be changing that.
Alexey Proskuryakov
Comment 28 2016-04-27 15:43:52 PDT
navigator.language - "Must return a valid BCP 47 language tag representing either a plausible language or the user’s most preferred language." This is why even the existing "ru-ru" is a bug, it should be just "ru". Regional settings don't go into navigator.language, that's why the new Intl API was necessary in the first place.
Note You need to log in before you can comment on or make changes to this bug.