Bug 3510

Summary: Multiple issues with Accept-Language
Product: WebKit Reporter: Alexey Proskuryakov <ap>
Component: New Bugs Assignee: Dave Hyatt <hyatt>
Status: RESOLVED WONTFIX    
Severity: Normal CC: dwood, grahamperrin, ian, nickshanks, xn--mlform-iua
Priority: P2 Keywords: InRadar
Version: 312.x   
Hardware: Mac   
OS: OS X 10.3   
URL: http://astro.nickshanks.com/library/extrasolar
Attachments:
Description Flags
Proposed patch darin: review-

Alexey Proskuryakov
Reported 2005-06-13 13:05:21 PDT
If the system primary language (as set in the International control panel) is Russian, then only "ru-ru" is sent in Accept-Language.
Steps to reproduce:
1. Set the system primary language to Russian (on my machine, the exact order is Russian, English, Japanese, Chinese Traditional, Chinese Simplified).
2. In Safari, go to http://astro.nickshanks.com/library/extrasolar.en
Results:
----------------------------
Not Acceptable
An appropriate representation of the requested resource /library/extrasolar.en could not be found on this server.
Available variants: extrasolar.en.iso8859-1.html , type text/html, language en, charset iso-8859-1
----------------------------
Expected results: an English version should be presented.
Regression: worked OK in Safari 1.2 (I'm not saying the Accept-Language header was 100% OK, but it at least allowed English).
Discussion: I think that all system languages should be sent in the Accept-Language header.
See also: rdar://4076004 - "ru" should be sent instead of "ru-ru"
Attachments
Proposed patch (15.07 KB, patch)
2005-07-13 10:27 PDT, Alexey Proskuryakov
darin: review-
Nicholas Shanks
Comment 1 2005-06-16 08:19:34 PDT
The cited example is actually a different bug, that no Accept-Content header gets sent; it's not to do with the Accept-Language header. For a better description, go to http://web.nickshanks.com/safari/accept-language/
That page tests what Safari sends for both the Accept-Language and Accept-Charset headers, and reports what is wrong. Play around with your system languages and different browsers and experiment to see what you get. Safari 2.0 (412) for me now always sends en-us despite not having American English in my language preferences (defaults read NSGlobalDomain AppleLanguages = "en-GB", fr, de, ru, hu, cy, gd, kw)
Regarding rdar://4076004 - "ru" should be sent instead of "ru-ru", I don't see why this should be so. An HTTP server should respond to a request header of "Accept-Language: ru-RU" with a .ru document if there's no .ru-RU one available. Specifying "Accept-Language: ru-RU, ru;q=0.8" for example would be redundant; however, specifying "Accept-Language: ru-UK, ru-RU;q=0.9, uk;q=0.8, ru;q=0.7" for example would be a valid and useful example of specifying 'ru' on its own, although there's no harm at all in doing so anyway.
I hope this helps :-)
Alexey Proskuryakov
Comment 2 2005-06-16 12:15:09 PDT
(In reply to comment #1)
Nick, I presume that you are talking about Accept-Charset, not Accept-Content (which I haven't heard of before)? From your page, it appears that you have been tracking this issue for a long time, just like myself :). At least, I used to also see the behavior with unconditionally appended Japanese, but no longer see it under 1.3 or 2.0.
However, I still think that my original interpretation is correct. Hopefully, here is a proof:
1) In Firefox, modify the languages list to "ru" alone (removing "en" and "en-us")
2) The same error as for Safari is displayed when accessing the page.
3) The Accept-Charset sent is "windows-1251,utf-8;q=0.7,*;q=0.7"; Accept-Language sent is "ru".
4) So, Accept-Charset doesn't help if "en" is not sent in Accept-Language
To double-check, I have sent a manually crafted HTTP request:
GET /library/extrasolar.en HTTP/1.0
Host: astro.nickshanks.com
Accept-Language: ru,en
I got the correct document (Content-Location: extrasolar.en.iso8859-1.html), even though I didn't send any Accept-Charset header.
As for rdar://4076004 ("ru-ru" vs "ru"), I have two reasons to ask for this. First, all other browsers I have tested with (MSIE, Mozilla, Firefox) send "ru", so even though your description is of course correct, real life testing of the fallback usually isn't performed. Second, Outlook Web Access (completely incorrectly) reencodes content based on Accept-Language, and only allows Cyrillic if the first language is "ru", not "ru-ru" (I do not have complete information about versions, maybe that's already fixed, but my company's server still has this problem).
I do not know if this Radar issue has been already discussed within Apple, so I am not sure if we need to move the discussion here.
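For illustration, the manually crafted request from comment 2 can be assembled like this. It is only a sketch: `build_request` is a hypothetical helper, and nothing is actually sent over the network.

```python
def build_request(path, host, accept_language, accept_charset=None):
    """Return a raw HTTP/1.0 request as a string (nothing is sent)."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        f"Accept-Language: {accept_language}",
    ]
    # Accept-Charset is optional; comment 2 omitted it on purpose.
    if accept_charset is not None:
        lines.append(f"Accept-Charset: {accept_charset}")
    # A blank line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_request("/library/extrasolar.en", "astro.nickshanks.com", "ru,en")
print(request)
```

Pasting the printed text into a telnet session on port 80 would reproduce the experiment by hand.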
Alexey Proskuryakov
Comment 3 2005-06-28 12:07:23 PDT
This turned out to be an issue in NSURLConnection (WebKit doesn't set an Accept-Language header, so a default is used). I have started working on a fix for this (also rdar://4076004). Once a fix is available, it will be decided if it needs to go to WebKit, or only NSURLConnection should be fixed.
Alexey Proskuryakov
Comment 4 2005-07-13 10:27:31 PDT
Created attachment 2944 [details] Proposed patch
With this patch, WebKit will specify an Accept-Language header, instead of relying on NSURLConnection's default one. The goals were to send a complete list of the user's preferred languages and to send correct language codes (the same that other browsers do, e.g. "ru", not "ru-ru").
Some compatibility notes:
1. The maximum Accept-Language length is limited (<http://www.ireland.travel.ie> didn't work if this header was longer than 255).
2. No weights are assigned to languages (Netscape Directory Gateway breaks on these; notably, it breaks with Mozilla's default "en-us,en;q=0.50"). Verified at <http://gun.teipir.gr/ds/csearch>. Looks like a correct order is enough to prioritize the languages.
3. eBay does not allow access to certain items if German, Italian, French or Austrian is present in the Accept-Language header (I couldn't verify this myself, but got a confirmation from eBay support). The only solution I found was to special-case eBay - only the first language is sent to them.
4. To maximize compatibility, together with correct language codes, legacy ones are sent for some languages (e.g., zh-Hans is always accompanied by zh-cn).
5. As a side effect of this patch, the DOM 0 navigator.language property will return a different code for some languages than it did in previous versions of Safari; and user-agent will also be different for such languages (again, ru instead of ru-ru). I think it's good for consistency. Also, this seems to have minimal impact: navigator.language is currently broken in Firefox, and no one seems to care enough to fix it (https://bugzilla.mozilla.org/show_bug.cgi?id=285267).
6. With this patch, WebNSUserDefaultsExtras.m processes the AppleLanguages list according to "best practices" described by Apple (see the source for a reference), which should fix a few minor issues.
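The rules in notes 1 to 4 can be sketched roughly as follows. This is not the code from the patch: the function names, the host test, and the truncation strategy (dropping trailing languages until the value fits) are assumptions made for illustration. Only the 255 limit, the zh-Hans/zh-cn pairing, the "first language only" rule for eBay, and the "en" default for an empty list (mentioned in comment 6) come from the thread.

```python
MAX_HEADER_LENGTH = 255  # note 1: longer values broke at least one site

# Note 4: legacy codes sent alongside the correct ones. Only the
# zh-Hans -> zh-cn pair comes from the thread; the table itself is hypothetical.
LEGACY_CODES = {"zh-Hans": "zh-cn"}


def sends_first_language_only(host):
    """Hypothetical host test for the eBay special case (note 3)."""
    return ".ebay." in "." + host.lower()


def accept_language(preferred, host):
    """Build an unweighted Accept-Language value from an ordered list."""
    codes = []
    for lang in preferred:
        for candidate in (lang, LEGACY_CODES.get(lang)):
            if candidate and candidate not in codes:
                codes.append(candidate)
    if not codes:
        codes = ["en"]  # default for an empty list, per comment 6
    if sends_first_language_only(host):
        codes = codes[:1]
    # Note 2: no q-values, order alone expresses priority.
    value = ",".join(codes)
    # Assumed truncation strategy: drop trailing languages until it fits.
    while len(value) > MAX_HEADER_LENGTH and len(codes) > 1:
        codes.pop()
        value = ",".join(codes)
    return value


print(accept_language(["ru", "en", "ja", "zh-Hant", "zh-Hans"], "astro.nickshanks.com"))
print(accept_language(["de", "en"], "www.ebay.de"))
```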
Maciej Stachowiak
Comment 5 2005-07-24 17:17:32 PDT
Someone should look at this. If we changed this we'd probably want to do it at the Foundation level, not just for WebKit.
Nicholas Shanks
Comment 6 2005-08-08 07:11:18 PDT
There are several things I don't like about the proposed patch:
1) Not appending quality values because of a bug in Netscape Directory Server is wrong. It's their bug, it's their problem, we should follow the rules. Since all prior versions of Safari, Firefox (and probably IE from 7.0 onwards) send q-values, this bug is highly likely to be fixed in a short time. When specifying all languages with the same q-value (1.0 in this case), Apache falls back to the server's preferred language order, which may be completely different from the user's preferred order (e.g. if the user has "AppleLanguages = {ru, de, en, fr}" and the server has the Apache default "LanguagePriority en da nl et fr de el it ja kr no pl pt pt-br ru ltz ca es sv tw" set, then English will get served if available, followed by French, German and Russian last of all, not what the user wanted!). Furthermore, appending an asterisk to the end of the list (with a q-value of 1.0) is the same as supplying an Accept-Language header just consisting of the asterisk alone. The order of languages in the list is irrelevant: "en;q=0.2,fr;q=0.1,de;q=0.5,ru" is the same as "ru;q=1.0,de;q=0.9,en;q=0.5,fr;q=0.1". I recommend retaining use of q-values, which will also avoid potential regressions with some sites, and appending an asterisk at the end with a q-value of 0.01, allowing any language to match and be returned (and thus avoiding ever getting a 406 "Not Acceptable" error).
2) If AppleLanguages has a length of zero, this patch sets the default to @"en" - I suggest making this an asterisk instead (i.e. just @"*"), avoiding 406 errors.
3) Your eBay matching fails for URLs such as http://ebay.de/ and http://ebay.fr/ due to the leading full stop in the match string. I suggest first quickly checking for the string "ebay" in the host, and if present, evaluating against the regex "^(.*\.)?ebay\.[a-z]{2,3}(\.[a-z]{2})?$". Checking for "ebay" first will avoid a performance hit for all other sites. You should probably specify in the comment too that the reason they do this is because France, Germany and Austria have laws against the sale of Nazi memorabilia, so that future readers will know why the code is there.
And in direct response to comment #2, if Outlook Web Access can't display Cyrillic without having to jump through hoops, then don't use it! In fact, I would recommend avoiding Microsoft and AOL products altogether, and you wouldn't have any of these problems.
Thinking about the "ru" versus "ru-RU" problem though, the International pref pane does not specify a locale for "Русский", so perhaps sending "ru" is more correct after all.
And yes, I meant Accept-Charset in comment #1, "Accept-Content" was just the result of my brain melting :-) *ponders an "Accept-Content: No" header*
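The claim in point 1 that order is irrelevant once q-values are present can be checked with a small parser. This is a minimal sketch: `parse_accept_language` is a hypothetical helper, it does no validation of malformed q-values, and it treats a missing q parameter as q=1.

```python
def parse_accept_language(header):
    """Return (language-range, q) pairs sorted by descending q."""
    entries = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # a missing q parameter means q=1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        entries.append((parts[0], q))
    # sorted() is stable, so equal q-values keep their header order here;
    # that tie-break is a choice of this sketch, not a rule from the thread.
    return sorted(entries, key=lambda entry: -entry[1])


first = [lang for lang, _ in parse_accept_language("en;q=0.2,fr;q=0.1,de;q=0.5,ru")]
second = [lang for lang, _ in parse_accept_language("ru;q=1.0,de;q=0.9,en;q=0.5,fr;q=0.1")]
print(first, second)
```

Both headers rank the languages in the same order, although the individual weights differ. With the suggested trailing "*;q=0.01", the wildcard sorts last.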
Alexey Proskuryakov
Comment 7 2005-08-08 11:14:51 PDT
Nicholas, thank you for your insightful reply.
1) Are you sure about the default LanguagePriority? I couldn't find it documented anywhere. Also, Apache as shipped with OS X Tiger doesn't have the behavior you describe: a request to http://127.0.0.1 gives me a Russian page:
GET / HTTP/1.0
Accept-Language: ru,en,fr
HTTP/1.1 200 OK
Date: Mon, 08 Aug 2005 17:52:39 GMT
Server: Apache/1.3.33 (Darwin)
Content-Location: index.html.ru.cp866
<...>
Without an explicit LanguagePriority, Apache is documented to honor the order of the languages in Accept-Language, and this is the behavior that I observe in reality. So far, I haven't seen any problems occurring because of missing weights. As for appending an asterisk - I have nothing against it; I just felt reluctant to make ad hoc changes, not supported by real life problems (especially because Firefox doesn't append that asterisk).
2) Here, I have just preserved the existing behavior (since I haven't heard of any problems with it).
3) Doesn't <http://ebay.de/> just redirect to <http://www.ebay.de/>? Improving the comment in acceptLanguageForURL is a good idea, thank you.
As for Outlook Web Access - from my point of view, this one is the most substantial problem with Safari's Accept-Language (perhaps with Safari in general), worth being fixed ASAP in a software update.
Nicholas Shanks
Comment 8 2005-08-08 16:57:20 PDT
Replying to comment #7:
> Apache as shipped with OS X Tiger doesn't have the behavior you describe
Hmm, well my /etc/httpd/httpd.conf file has been dragged along with me since DP4 days and modified quite a bit, though I believe the one I have now I re-modified after a clean Jaguar install, but nonetheless I am pretty sure these lines have remained untouched from what was installed:
<IfModule mod_mime.c>
...
# in case of a tie during content negotiation.
#
# Just list the languages in decreasing order of preference. We have
# more or less alphabetized them here. You probably want to change this.
#
<IfModule mod_negotiation.c>
LanguagePriority en da nl et fr de el it ja kr no pl pt pt-br ru ltz ca es sv tw
</IfModule>
</IfModule>
This is where I copied the above from. It may have changed in more recent versions of the OS. But whatever the default actually is, or whatever it's been changed to, it doesn't really matter because 99.9% of the time it will differ from the user's language preference order. If Apache receives "Accept-Language: ru, en" and has "LanguagePriority en ru" set, then, following the rules, it should serve the English page. If you're seeing differently then this is a bug in Apache (I presume you are using 1.3.x as ships with OS X, and not Apache 2.x?). I could apply your patch, change my languages, set up another virtual host on my machine and test all this, but not tonight as I'm too tired :-)
Regarding point 2, isn't this whole bug about you receiving a 406 error from a page on my website because en didn't get sent? Sending an asterisk would solve that (as would removing the ".en" from the filename of the file you mention, but that only masks the bug in Safari).
And on point 3: Yes, ebay.de does currently redirect, but I can't guarantee that's going to be the case for every eBay-owned domain now and in the future; I was just using de and fr as simple examples. The regex method is a bit more complicated but a bit more robust. I would like to hear what opinions others have on this simple vs. comprehensive view. Also, although the regex would still catch false positives like "ebay.sux.fr" and "i.watch.ebay.on.tv", I don't think sending just the first language to these few sites would matter all that much :-)
A question for you Alexey: How likely is it that Outlook Web Access and Netscape Directory Gateway will have their bugs fixed, and you or whoever controls them on your behalf will get around to installing the updates? It seems to me that basically you're requesting and implementing a 'degraded' behaviour in order to work around bugs in these two pieces of software. Is this a fair summary? I tend to be of the opinion that we should do what's right, and let other people worry about fixing their own bugs. The more people complain to THEM that their software is broken, the higher priority they'll give to fixing it, and I look forward to WinIE7's release later this summer as a much larger catalyst for this than Firefox and Safari have been so far. Of course, with your own patched version of Safari, you don't encounter these bugs anymore anyway :-)
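The two-step eBay check proposed in comment 6 (a cheap substring test, then the regex) can be tried out directly. The regex is the one quoted in the thread; the function name `is_ebay_host` is hypothetical.

```python
import re

# Regex quoted in comment 6.
EBAY_HOST = re.compile(r"^(.*\.)?ebay\.[a-z]{2,3}(\.[a-z]{2})?$")


def is_ebay_host(host):
    """Substring check first, so most hosts never reach the regex."""
    host = host.lower()
    return "ebay" in host and EBAY_HOST.match(host) is not None


for host in ("ebay.de", "www.ebay.co.uk", "ebay.sux.fr",
             "i.watch.ebay.on.tv", "notebay.com", "www.apple.com"):
    print(host, is_ebay_host(host))
```

The loop shows both the intended matches and the two false positives acknowledged above ("ebay.sux.fr" and "i.watch.ebay.on.tv"), while "notebay.com" passes the substring test but is rejected by the regex.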
Alexey Proskuryakov
Comment 9 2005-08-08 21:49:10 PDT
> LanguagePriority en da nl et fr de el it ja kr no pl pt pt-br ru ltz ca es sv tw
Interesting - in Tiger, it's the same, but the experiment I quoted above shows that Apache (1.3) still respects the order of languages in Accept-Language. Does yours respect it (you can just use telnet instead of applying my patch to test this)?
Regarding point 2, empty AppleLanguages is a very rare case (and a sign of a horribly broken installation). Since this patch is unlikely to be landed as is, I'd leave it to the person modifying it to decide on this - both options look fine.
As for fixing bugs in OWA and NDS - please note that for OWA, I'm not proposing degraded performance at all (sending 'ru' is logically more correct than sending 'ru-ru'). As for NDS, that was an example given by Darin Adler, and I tried to come up with the most compatible format. So far, the lack of weights hasn't resulted in any problems (and it shouldn't, because other clients are also known to send languages without weights).
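The disagreement in comments 6 to 9 comes down to how a server breaks a tie when every language carries the same weight. The sketch below contrasts the two candidate behaviours. `pick_variant` is a hypothetical function written for this comparison and is not claimed to be Apache's actual algorithm.

```python
def pick_variant(accepted, available, language_priority=None):
    """Choose a language from `available`.

    accepted: list of (language, q) pairs in header order.
    language_priority: optional server-side order used only to break ties.
    """
    candidates = [(lang, q) for lang, q in accepted if lang in available and q > 0]
    if not candidates:
        return None  # a real server would answer 406 or fall back
    best_q = max(q for _, q in candidates)
    tied = [lang for lang, q in candidates if q == best_q]
    if language_priority:
        # Tie-break as described in comment 6: the server's order wins.
        for lang in language_priority:
            if lang in tied:
                return lang
    # Tie-break as observed in comment 7: the header order wins.
    return tied[0]


accepted = [("ru", 1.0), ("en", 1.0), ("fr", 1.0)]  # "ru,en,fr" without weights
available = {"en", "ru"}
priority = ["en", "da", "nl", "et", "fr", "de", "el", "it", "ja", "kr",
            "no", "pl", "pt", "pt-br", "ru"]

print(pick_variant(accepted, available))            # header order
print(pick_variant(accepted, available, priority))  # server priority
```

With header order the Russian variant is chosen; with the server's priority list the English one is. Which of the two a given Apache version applies is exactly the open question in this thread.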
Nicholas Shanks
Comment 10 2005-08-08 23:57:04 PDT
Perhaps there is a directive I've not seen, or consensus among http server implementors, that to support legacy behaviour where no quality values are given at all, then the list is to be treated as a descending priority list. Thus behaviour such as "en-gb, en-ca, en-au, en;q=0.5" would work (i.e. the first three would all be equal weight, and en-za, en-in or en-us would only be selected rarely) whereas "en, de, fr, ru, *", which is passed with no values at all, would be treated as if it were "en;q=1.0, de;q=0.8, fr;q=0.6, ru;q=0.4, *;q=0.2". If this latter case is indeed what seems to be happening, then I would certainly recommend adding the asterisk to all submissions. I believe this should be fixed upstream from WebKit, in the CF APIs. Whoever is to implement that should also add an "Accept-Charset: *" header too, since MacOS X can handle every charset.
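The hypothesis in comment 10, that a list with no q-values at all is read as a descending priority list, can be written down as a small function. This is only a sketch of that guess: `implied_weights` and the even spacing of the weights are assumptions chosen to reproduce the numbers in the comment.

```python
def implied_weights(header):
    """Assign evenly spaced descending weights to an unweighted list.

    Returns None when any explicit parameter is present, since the
    hypothesis only covers headers without q-values.
    """
    ranges = [r.strip() for r in header.split(",") if r.strip()]
    if any(";" in r for r in ranges):
        return None
    n = len(ranges)
    # First entry gets 1.0, last gets 1/n.
    return [(lang, round((n - i) / n, 3)) for i, lang in enumerate(ranges)]


print(implied_weights("en, de, fr, ru, *"))
```

For the five-entry example this yields the weights 1.0, 0.8, 0.6, 0.4 and 0.2 quoted above.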
Eric Seidel (no email)
Comment 11 2005-09-17 14:12:34 PDT
Comment on attachment 2944 [details] Proposed patch I have assigned this to adele, as according to the radar assignment guidelines she is the closest to dealing with "loading". I've looked over the patch and given my comments to Alexey (via IRC). Even if this should go into NSURL long term, I don't think landing it in WebKit for now is a bad thing. The fact that this has been sitting in review for OVER 2 MONTHS is unacceptable.
Maciej Stachowiak
Comment 12 2005-09-20 02:52:36 PDT
However, given Nicholas's comments below I think this patch may be just plain wrong. This patch needs testing, review, and probably revision, not just immediate landing. And even then I am not sure we should do this just in WebKit instead of submitting a bug report and fix to Foundation. It is not usually our policy to work around NSURLConnection bugs in WebKit, and the NSURLConnection developers are pretty responsive.
Nicholas Shanks
Comment 13 2005-09-20 04:54:23 PDT
(In reply to comment #11)
> The fact that this has been sitting in review for OVER 2 MONTHS is unacceptable.
Ahh, that's nothing :) I have CSS patches on here awaiting review since mid-June. I have open bugs in radar logged against Mac OS 8.1 (and still present in Carbon).
Alexey Proskuryakov
Comment 14 2005-09-20 07:18:15 PDT
(In reply to comment #12)
> However, given Nicholas's comments below I think this patch may be just plain wrong.
What particular comment are you referring to? I think I have answered them all, and there are only a few stylistic comments, and a question of whether an asterisk should be added (other browsers do not do that, so I do not see why WebKit should).
> And even then I am not sure we should do this just in WebKit instead of submitting a bug report and fix to Foundation.
No problem with that - rdar://4076004 talks about the one problem that causes most damage. For obvious reasons, I couldn't submit a fix there.
Alexey Proskuryakov
Comment 15 2005-09-20 13:15:02 PDT
BTW, some of this code is only needed on 10.3 and can be replaced with a single call to CFLocaleCreateCanonicalLanguageIdentifierFromString() now (see <http://developer.apple.com/documentation/MacOSX/Conceptual/BPInternational/Articles/ChoosingLocalizations.html>). When I was making this patch, I still had hope that at least parts of it may go in a Safari 1.3 update...
Alexey Proskuryakov
Comment 16 2005-09-27 13:40:00 PDT
*** Bug 5152 has been marked as a duplicate of this bug. ***
Eric Seidel (no email)
Comment 17 2005-09-28 16:28:43 PDT
Comment on attachment 2944 [details] Proposed patch We decided this is better w/ Darin.
Alexey Proskuryakov
Comment 18 2005-11-03 13:52:19 PST
A "real life" site that is affected by Safari only sending the first preferred language: <http://www.w3.org/TR/xhtml-media-types/> (one needs to prefer something other than English to be affected :) ).
Darin Adler
Comment 19 2005-12-03 11:23:58 PST
Comment on attachment 2944 [details] Proposed patch I'm conflicted about this patch. I'd really like to put changes like this into NSURL, rather than into WebKit. But some of these fixes are needed.
Alexey Proskuryakov
Comment 20 2005-12-04 03:07:30 PST
Just for future reference: a rather detailed discussion of real-life language negotiation is available here: <http://ppewww.ph.gla.ac.uk/~flavell/www/lang-neg.html>.
Alexey Proskuryakov
Comment 21 2005-12-20 05:54:56 PST
Another "real life" server that suffers from this issue: http://fluxiom.com/
Darin Adler
Comment 22 2006-01-15 07:57:52 PST
Comment on attachment 2944 [details] Proposed patch Here's my opinion at this point: We should get these fixes done in NSURLConnection first -- please file a bug in http://bugreport.apple.com about it. Once we get a response and find out what's going to happen at the lower level, we could consider working around this in WebKit by adding code like this.
Alexey Proskuryakov
Comment 23 2006-01-16 12:07:08 PST
Nicholas Shanks
Comment 24 2006-05-20 05:33:25 PDT
Whoever is to fix this should read bug #5152 as that concerns a related but slightly different issue (I don't understand why it is marked as a duplicate of this one, it's not the same problem). I created rdar://4556363 too, since this hasn't improved as of 10.4.6 and someone in the Foundation team needs goading with a hot poker. :-D
Alexey Proskuryakov
Comment 25 2006-06-27 21:57:57 PDT
*** Bug 9626 has been marked as a duplicate of this bug. ***
Leif Halvard Silli
Comment 26 2011-08-21 09:50:16 PDT
(In reply to comment #22)
> (From update of attachment 2944 [details])
> Here's my opinion at this point: We should get these fixes done in
> NSURLConnection first -- please file a bug in http://bugreport.apple.com about
> it.
>
> Once we get a response and find out what's going to happen at the lower level,
> we could consider working around this in WebKit by adding code like this.
>
6 years on, and the bug has still not been fixed! Has an NSURLConnection bug been filed? I stumbled on real life effects of this bug when clicking on the link "Tools" on this page: http://www.w3.org/standards/webdesign/i18n
I see this bug in Safari for Windows and Safari for Mac. However, the bug is not present in Chrome - maybe they fixed it "for their own money" ..
Alexey Proskuryakov
Comment 27 2011-08-22 11:33:03 PDT
I no longer think that we should send an Accept-Language string with all the languages configured in Mac OS X preferences. The privacy implications (fingerprinting) and the generally failed state of HTTP based content negotiation make it not worth changing. Please file separate bugs for specific issues via <http://bugreport.apple.com>, as Accept-Language is sent by a lower level network library, not by WebKit, and there is no pressing reason to add workarounds in WebKit. In particular, <http://www.w3.org/standards/webdesign/i18n> is certainly an evangelism issue - the page shouldn't be preventing access for people who didn't send "en" in Accept-Language.
Leif Halvard Silli
Comment 28 2011-08-23 16:33:49 PDT
(In reply to comment #27)
FIRSTLY - the Surfin' Safari blog seems to assume that WebKit implements Accept-Language 100%:
]] The locale has been removed. Web authors who want to know what languages a browser supports should use the HTTP Accept-Language header instead, which can supply multiple locales. [[
<http://www.webkit.org/blog/1580/user-agent-string-changes-on-webkit-trunk/>
SECONDLY: I read the gist of your decision to be that this subject is for Apple to decide - as it is they who must take responsibility for the potential for fingerprinting, since it is their OS which eventually is responsible for those Accept-Language headers. Additionally, since content negotiation is in a kind of failed state, as you see it, it is also not possible to justify a rushed fix that circumvents Apple.