It doesn't allow for the possibility of en-pl, for example.
Created attachment 277272 [details] the patch I'm still testing this.
Comment on attachment 277272 [details] the patch r=me, assuming that testing pans out. Since this change is only to the part of the code that parses a preference from the system, I can't think of any way to test that without changing the preference on the system. On the other hand, perhaps you could write a test that used window.internals just to call httpStyleLanguageCode with "en-jp" and verify that you get back "en-jp" -- along with other edge cases to do with "_" and so on.
Comment on attachment 277272 [details] the patch "en-pl" I mean!
I’m concerned that this could construct language codes that ICU does not know how to handle.
Maybe we can verify our language-country pair, and fall back to the old behavior if verification fails: (a) search for language and country in uloc_getISOLanguages and uloc_getISOCountries; OR (b) try some simple ICU API like uloc_getVariant and see if it returns an error code.
(In reply to comment #5) > Maybe we can verify our language-country pair, and fall back to the old > behavior if verification fails: > > (a) search for language and country in uloc_getISOLanguages and > uloc_getISOCountries; > > OR > > (b) try some simple ICU API like uloc_getVariant and see if it returns an > error code. Good idea, I'll try that.
(In reply to comment #6) > (In reply to comment #5) > > Maybe we can verify our language-country pair, and fall back to the old > > behavior if verification fails: > > > > (a) search for language and country in uloc_getISOLanguages and > > uloc_getISOCountries; > > > > OR > > > > (b) try some simple ICU API like uloc_getVariant and see if it returns an > > error code. > > Good idea, I'll try that. Using uloc_getVariant() doesn't help. It will gladly accept en-blahus for example, and it will return BLAHUS as the variant.
(In reply to comment #7) > (In reply to comment #6) > > (In reply to comment #5) > > > Maybe we can verify our language-country pair, and fall back to the old > > > behavior if verification fails: > > > > > > (a) search for language and country in uloc_getISOLanguages and > > > uloc_getISOCountries; > > > > > > OR > > > > > > (b) try some simple ICU API like uloc_getVariant and see if it returns an > > > error code. > > > > Good idea, I'll try that. > > Using uloc_getVariant() doesn't help. It will gladly accept en-blahus for > example, and it will return BLAHUS as the variant. Using uloc_getISOCountries works.
Created attachment 277302 [details] the patch This version of the patch defends against the country not being recognized by ICU.
> Using uloc_getVariant() doesn't help. It will gladly accept en-blahus for > example, and it will return BLAHUS as the variant. en-BLAHUS is my favorite en.
/Volumes/Data/EWS/WebKit/Source/WebCore/platform/mac/Language.mm:34:9: fatal error: 'unicode/unumsys.h' file not found #import <unicode/unumsys.h>
It looks like each project (WTF, JSC, WebCore, WebKit) includes its own copy of the icu headers in an icu folder. I'm not sure why we do this -- but it seems like the canonical way to fix this bug is to copy unumsys.h to WebCore/icu/unicode/.
Created attachment 277329 [details] the patch I don't need unumsys.h. Checking if removing that import is enough to make all of the bots happy.
(In reply to comment #12) > It looks like each project (WTF, JSC, WebCore, WebKit) includes its own copy > of the icu headers in an icu folder. I'm not sure why we do this -- but it > seems like the canonical way to fix this bug is to copy unumsys.h to > WebCore/icu/unicode/. We have a set of ICU headers in WebKit only because OS X and iOS include the ICU library, but do not include the headers. However, we should be able to arrange things so we don’t need a separate set in each project. Might involve some build system changes that might be non-obvious, though.
Comment on attachment 277329 [details] the patch r=me
Comment on attachment 277329 [details] the patch View in context: https://bugs.webkit.org/attachment.cgi?id=277329&action=review > Source/WebCore/platform/mac/Language.mm:141 > + // Detect if the country that was requested is one that ICU can handle. > + const char* const* countries = uloc_getISOCountries(); > + const char* countryUTF8 = [countryCode UTF8String]; > + bool foundCountry = false; > + for (unsigned i = 0; countries[i]; ++i) { > + const char* possibleCountry = countries[i]; > + if (!strcmp(countryUTF8, possibleCountry)) { > + foundCountry = true; > + break; > + } > + } I’d like to see this factored out into a helper function rather than written inline. The name of that little helper function could help make things clear enough that we wouldn’t even need the comment!
(In reply to comment #16) > Comment on attachment 277329 [details] > the patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=277329&action=review > > > Source/WebCore/platform/mac/Language.mm:141 > > + // Detect if the country that was requested is one that ICU can handle. > > + const char* const* countries = uloc_getISOCountries(); > > + const char* countryUTF8 = [countryCode UTF8String]; > > + bool foundCountry = false; > > + for (unsigned i = 0; countries[i]; ++i) { > > + const char* possibleCountry = countries[i]; > > + if (!strcmp(countryUTF8, possibleCountry)) { > > + foundCountry = true; > > + break; > > + } > > + } > > I’d like to see this factored out into a helper function rather than written > inline. The name of that little helper function could help make things clear > enough that we wouldn’t even need the comment! Done.
Landed in http://trac.webkit.org/changeset/200089
Comment on attachment 277329 [details] the patch View in context: https://bugs.webkit.org/attachment.cgi?id=277329&action=review > Source/WebCore/ChangeLog:13 > + the default region. That's wrong, since for example it doesn't respect the user's choice (in > + System Preferences) to display dates/calenders/etc according to a different region (like how > + I have my machine set to en-pl right now). This doesn't seem right. "en-pl" is "Polish variant of English", not "English with Polish regional settings for things like date and time".
This change causes the following API tests to fail: WKWebView.NavigatorLanguage WebKit1.NavigatorLanguage It looks like the expected results may need to be updated. <https://build.webkit.org/builders/Apple%20El%20Capitan%20Release%20WK2%20%28Tests%29/builds/5452/steps/run-api-tests/logs/stdio>
(In reply to comment #19) > Comment on attachment 277329 [details] > the patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=277329&action=review > > > Source/WebCore/ChangeLog:13 > > + the default region. That's wrong, since for example it doesn't respect the user's choice (in > > + System Preferences) to display dates/calenders/etc according to a different region (like how > > + I have my machine set to en-pl right now). > > This doesn't seem right. "en-pl" is "Polish variant of English", not > "English with Polish regional settings for things like date and time". That's not how our ICU interprets it.
Reverted r200089 for reason: This change causes API test failures Committed r200093: <http://trac.webkit.org/changeset/200093>
We may be confusing locale and language codes when calling into ICU. Not super easy to untangle, given that Apple's locale is more of a dictionary than a simple string code, but ICU can also encode multiple preferences in a string.
Created attachment 277405 [details] the patch Fixes the test expectations but also fixes what was probably a real bug: if AppleLanguage specifies a variant, then we might as well fall back to old behavior.
Comment on attachment 277405 [details] the patch r=me
Landed in http://trac.webkit.org/changeset/200105
Comment on attachment 277405 [details] the patch View in context: https://bugs.webkit.org/attachment.cgi?id=277405&action=review > Tools/TestWebKitAPI/Tests/mac/NavigatorLanguage.mm:77 > - @[@"ru", @"ru-ru"], // This does not match other browsers or CFNetwork's Accept-Language, which all use "ru". > + @[@"ru", @"ru-us"], // This does not match other browsers or CFNetwork's Accept-Language, which all use "ru". I still don't understand this patch. This test change looks completely wrong, navigator.language should never return "ru-us". On my machine right here, the primary language is Russian, and the region is U.S. With these settings, navigator.language is ru-ru in shipping Safari. We shouldn't be changing that.
navigator.language - "Must return a valid BCP 47 language tag representing either a plausible language or the user’s most preferred language." This is why even the existing "ru-ru" is a bug, it should be just "ru". Regional settings don't go into navigator.language, that's why the new Intl API was necessary in the first place.