Bug 3510 - Multiple issues with Accept-Language
Status: RESOLVED WONTFIX
Product: WebKit
Component: New Bugs
Version: 312.x
Hardware: Macintosh Mac OS X 10.3
Importance: P2 Normal
Assigned To:
URL: http://astro.nickshanks.com/library/e...
Keywords: InRadar, ReviewedForRadar
Reported: 2005-06-13 13:05 PST by
Modified: 2011-08-23 16:33 PST (History)


Attachments
Proposed patch (15.07 KB, patch)
2005-07-13 10:27 PST, Alexey Proskuryakov
darin: review-




Description From 2005-06-13 13:05:21 PST
If the system primary language (as set in the International control panel) is Russian, then only "ru-ru" is
sent in Accept-Language.

Steps to reproduce:
1. Set the system primary language to Russian (on my machine, the exact order is Russian, English, 
Japanese, Chinese Traditional, Chinese Simplified).
2. In Safari, go to http://astro.nickshanks.com/library/extrasolar.en

Results:
----------------------------
Not Acceptable

An appropriate representation of the requested resource /library/extrasolar.en could not be found on 
this server.
Available variants:

extrasolar.en.iso8859-1.html , type text/html, language en, charset iso-8859-1
----------------------------

Expected results: an English version should be presented.

Regression: this worked in Safari 1.2 (I'm not saying its Accept-Language header was 100% correct, but it
at least allowed English).

Discussion: I think that all system languages should be sent in the Accept-Language header.

See also: rdar://4076004 - "ru" should be sent instead of "ru-ru"
------- Comment #1 From 2005-06-16 08:19:34 PST -------
The cited example is actually a different bug: no Accept-Content header gets sent. It has nothing to do
with the Accept-Language header.

For a better description, go to http://web.nickshanks.com/safari/accept-language/

That page tests what Safari sends in both the Accept-Language and Accept-Charset headers, and
reports what is wrong. Play around with your system languages and different browsers to see what
you get. For me, Safari 2.0 (412) now always sends en-us, even though American English is not in my
language preferences (defaults read NSGlobalDomain AppleLanguages = "en-GB", fr, de, ru, hu, cy, gd, kw).

Regarding rdar://4076004 ("ru" should be sent instead of "ru-ru"), I don't see why this should be so. An
HTTP server should respond to a request header of "Accept-Language: ru-RU" with a .ru document if
there's no .ru-RU one available, so specifying "Accept-Language: ru-RU, ru;q=0.8", for example, would be
redundant. On the other hand, "Accept-Language: ru-UK, ru-RU;q=0.9, uk;q=0.8, ru;q=0.7" would be a
valid and useful case of specifying 'ru' on its own. That said, there's no harm at all in specifying it anyway.
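The fallback described above can be made concrete with a small Python sketch. It contrasts two matching rules: the basic prefix rule of RFC 2616 (section 14.4), under which the range "ru" matches the tag "ru-RU" but the range "ru-RU" does not match "ru", and a simplified version of RFC 4647 "lookup", which truncates "ru-RU" to "ru" and so would pick a .ru document. The function names and the sample variant list are illustrative only; real servers, Apache included, apply their own variations of these rules.

```python
from typing import List, Optional


def prefix_matches(language_range: str, tag: str) -> bool:
    """Basic RFC 2616 (sec. 14.4) rule: the range matches the tag if it equals
    the tag, or is a prefix of it followed by '-'. "*" matches anything."""
    r, t = language_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def lookup(language_range: str, available: List[str]) -> Optional[str]:
    """Simplified RFC 4647 lookup: drop subtags from the end of the range
    ("ru-ru" -> "ru") until it equals an available tag. This sketch ignores
    RFC 4647's extra rule about dangling single-character subtags."""
    by_lower = {tag.lower(): tag for tag in available}
    subtags = language_range.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_lower:
            return by_lower[candidate]
        subtags.pop()
    return None


available = ["ru", "en"]
print(prefix_matches("ru", "ru-RU"))  # True: "ru" is a prefix of "ru-RU"
print(prefix_matches("ru-RU", "ru"))  # False: the basic rule never falls back
print(lookup("ru-RU", available))     # ru: lookup truncates "ru-RU" to "ru"
```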

I hope this helps :-)
------- Comment #2 From 2005-06-16 12:15:09 PST -------
(In reply to comment #1)

  Nick, I presume that you mean Accept-Charset, not Accept-Content (which I haven't heard
of before)? From your page, it appears that you have been tracking this issue for a long time, just like
me :). I also used to see the behavior where Japanese was unconditionally appended, but I no
longer see it under 1.3 or 2.0.

  However, I still think that my original interpretation is correct. Here is what I hope is proof:
1) In Firefox, modify the languages list to "ru" alone (removing "en" and "en-us")
2) The same error as for Safari is displayed when accessing the page.
3) The Accept-Charset sent is "windows-1251,utf-8;q=0.7,*;q=0.7"; Accept-Language sent is "ru".
4) So Accept-Charset doesn't help if "en" is not sent in Accept-Language.

  To double-check, I sent a manually crafted HTTP request:
GET /library/extrasolar.en HTTP/1.0
Host: astro.nickshanks.com
Accept-Language: ru,en

  I got the correct document (Content-Location: extrasolar.en.iso8859-1.html), even though I didn't 
send any Accept-Charset header.
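The manual test above can be scripted. This Python sketch builds the same HTTP/1.0 request (CRLF line endings, a blank line to end the headers, and no Accept-Charset) and returns the status line and headers of the response. The function names are hypothetical, and the server may no longer answer that URL the way it did in 2005.

```python
import socket


def build_request(host: str, path: str, accept_language: str) -> bytes:
    """Raw HTTP/1.0 GET with an explicit Accept-Language and no Accept-Charset."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        f"Accept-Language: {accept_language}",
        "",  # the two empty entries produce the CRLF CRLF that ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_response_headers(host: str, path: str, accept_language: str,
                           port: int = 80, timeout: float = 10.0) -> str:
    """Send the request and return only the status line and response headers."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path, accept_language))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # with HTTP/1.0 the server closes the connection when done
                break
            chunks.append(data)
    response = b"".join(chunks).decode("iso-8859-1")
    return response.split("\r\n\r\n", 1)[0]
```

For example, `fetch_response_headers("astro.nickshanks.com", "/library/extrasolar.en", "ru,en")` repeats the request above; passing "ru" alone repeats the failing case.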

  As for rdar://4076004 ("ru-ru" vs "ru"), I have two reasons to ask for this. First, all other browsers I
have tested (MSIE, Mozilla, Firefox) send "ru", so even though your description is of course correct,
sites rarely test the fallback in real life. Second, Outlook Web Access (completely
incorrectly) re-encodes content based on Accept-Language, and only allows Cyrillic if the first language
is "ru", not "ru-ru" (I do not have complete information about versions; maybe that's already fixed, but
my company's server still has this problem). I do not know whether this Radar issue has already been
discussed within Apple, so I am not sure if we need to move the discussion here.
------- Comment #3 From 2005-06-28 12:07:23 PST -------
This turned out to be an issue in NSURLConnection (WebKit doesn't set an Accept-Language header, so a
default is used).

I have started working on a fix for this (also rdar://4076004). Once a fix is available, we will decide whether it
needs to go into WebKit, or whether only NSURLConnection should be fixed.
------- Comment #4 From 2005-07-13 10:27:31 PST -------
Created an attachment (id=2944)
Proposed patch

With this patch, WebKit will specify an Accept-Language header instead of
relying on NSURLConnection's default one. The goals were to send the complete
list of the user's preferred languages and to send correct language codes (the same
ones that other browsers send, e.g. "ru", not "ru-ru").

Some compatibility notes:

1. The maximum Accept-Language length is limited to 255 characters
(<http://www.ireland.travel.ie> didn't work if this header was longer than that).

2. No weights are assigned to languages (Netscape Directory Gateway breaks on
them; notably, it breaks with Mozilla's default "en-us,en;q=0.50"). Verified
at <http://gun.teipir.gr/ds/csearch>. It looks like the order alone is enough to
prioritize the languages.
3. eBay does not allow access to certain items if German, Italian, French or
Austrian is present in the Accept-Language header (I couldn't verify this
myself, but got a confirmation from eBay support). The only solution I found
was to special-case eBay: only the first language is sent to it.
4. To maximize compatibility, together with correct language codes, legacy ones
are sent for some languages (e.g., zh-Hans is always accompanied by zh-cn).
5. As a side effect of this patch, the DOM 0 navigator.language property will
return a different code for some languages than it did in previous versions of
Safari, and the user-agent string will also be different for such languages (again, ru
instead of ru-ru). I think this is good for consistency. It also seems to have
minimal impact: navigator.language is currently broken in Firefox, and no one
seems to care enough to fix it
(https://bugzilla.mozilla.org/show_bug.cgi?id=285267).
6. With this patch, WebNSUserDefaultsExtras.m processes the AppleLanguages list
according to the "best practices" described by Apple (see the source for a
reference), which should fix a few minor issues.
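Items 1 to 4 can be summarized in a short Python sketch. It is not the patch itself (which is Objective-C and is not reproduced in this thread); the alias table, the eBay host test, the lowercasing of tags, and the function names are hypothetical stand-ins. Only the rules come from the notes above: the user's order with no q-values, legacy codes next to modern ones, only the first language for eBay, and a 255-character cap that drops whole tags rather than cutting one in half. The "en" default for an empty list is the patch behavior described in a later comment.

```python
from typing import List

# Hypothetical alias table: the notes only say that zh-Hans is always
# accompanied by zh-cn; the zh-Hant/zh-tw pair is an illustrative assumption.
LEGACY_ALIASES = {"zh-hans": "zh-cn", "zh-hant": "zh-tw"}

# Item 1: www.ireland.travel.ie failed on headers longer than 255 characters.
MAX_HEADER_LENGTH = 255


def is_ebay(host: str) -> bool:
    """Hypothetical host test: true if any dot-separated label is "ebay"."""
    return "ebay" in host.lower().split(".")


def unweighted_accept_language(host: str, languages: List[str]) -> str:
    if not languages:
        languages = ["en"]  # the patch's default when AppleLanguages is empty
    tags: List[str] = []
    for language in languages:
        tag = language.lower()
        # Item 4: the modern code first, then its legacy alias, without duplicates.
        for candidate in (tag, LEGACY_ALIASES.get(tag)):
            if candidate and candidate not in tags:
                tags.append(candidate)
    if is_ebay(host):
        tags = tags[:1]  # item 3: eBay only receives the first language
    header = ""
    for tag in tags:  # item 2: order alone expresses priority, no q-values
        extended = tag if not header else header + "," + tag
        if len(extended) > MAX_HEADER_LENGTH:
            break  # item 1: drop whole tags instead of cutting one in half
        header = extended
    return header


print(unweighted_accept_language("www.w3.org", ["ru", "en-GB", "zh-Hans"]))
print(unweighted_accept_language("www.ebay.de", ["de", "en"]))
```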
------- Comment #5 From 2005-07-24 17:17:32 PST -------
Someone should look at this. If we changed this we'd probably want to do it at the Foundation level, not 
just for WebKit.
------- Comment #6 From 2005-08-08 07:11:18 PST -------
There are several things I don't like about the proposed patch:

1) Not appending quality values because of a bug in Netscape Directory Server is wrong. It's their bug,
their problem; we should follow the rules. Since all prior versions of Safari, Firefox (and probably IE
from 7.0 onwards) send q-values, this bug is highly likely to be fixed soon.
When all languages are given the same q-value (1.0 in this case), Apache falls back to the server's
preferred language order, which may be completely different from the user's preferred order. For
example, if the user has "AppleLanguages = {ru, de, en, fr}" and the server has the Apache default
"LanguagePriority en da nl et fr de el it ja kr no pl pt pt-br ru ltz ca es sv tw" set, then English will be
served if available, followed by French, then German, and Russian last of all: not what the user wanted!
Furthermore, appending an asterisk to the end of the list (with a q-value of 1.0) is the same as supplying
an Accept-Language header consisting of the asterisk alone. The order of languages in the list is irrelevant:
"en;q=0.2,fr;q=0.1,de;q=0.5,ru" ranks the languages exactly as "ru;q=1.0,de;q=0.9,en;q=0.5,fr;q=0.1" does.
I recommend retaining q-values, which will also avoid potential regressions with some sites, and
appending an asterisk at the end with a q-value of 0.01, allowing any language to match and be
returned (and thus avoiding ever getting a 406 "Not Acceptable" error).

2) If AppleLanguages is empty, this patch sets the default to @"en". I suggest making this
an asterisk instead (i.e. just @"*"), avoiding 406 errors.

3) Your eBay matching fails for URLs such as http://ebay.de/ and http://ebay.fr/ due to the leading full
stop in the match string. I suggest first quickly checking for the string "ebay" in the host and, if it is
present, evaluating the host against the regex "^(.*\.)?ebay\.[a-z]{2,3}(\.[a-z]{2})?$". Checking for "ebay"
first will avoid a performance hit for all other sites. You should probably also state in the comment that
the reason eBay does this is that France, Germany and Austria have laws against the sale of Nazi
memorabilia, so that future readers will know why the code is there.
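The three points above can be combined into one Python sketch: descending q-values in the user's order, a trailing "*;q=0.01" so that some variant always matches, "*" when AppleLanguages is empty, and the two-step eBay test (a cheap substring check, then the regex quoted in point 3). The 0.1 step between weights, the 0.1 floor, sending eBay a bare first language, and the function names are assumptions for illustration; the thread does not specify them.

```python
import re
from typing import List

# The regex from point 3; only evaluated after a cheap substring check.
EBAY_HOST = re.compile(r"^(.*\.)?ebay\.[a-z]{2,3}(\.[a-z]{2})?$")


def is_ebay_host(host: str) -> bool:
    host = host.lower()
    return "ebay" in host and EBAY_HOST.match(host) is not None


def weighted_accept_language(host: str, languages: List[str]) -> str:
    if not languages:
        return "*"  # point 2: an empty AppleLanguages list accepts any language
    if is_ebay_host(host):
        return languages[0]  # point 3: eBay still receives only the first language
    parts = []
    for index, language in enumerate(languages):
        # Point 1: weights descend in the user's order (1.0, 0.9, ...). The 0.1
        # step and the 0.1 floor are assumptions, not taken from the thread.
        tenths = max(10 - index, 1)
        parts.append(language if tenths == 10 else f"{language};q=0.{tenths}")
    parts.append("*;q=0.01")  # last resort: any variant is better than a 406
    return ",".join(parts)


print(weighted_accept_language("www.w3.org", ["ru", "de", "en", "fr"]))
# ru,de;q=0.9,en;q=0.8,fr;q=0.7,*;q=0.01
```

The regex still accepts hosts such as "ebay.sux.fr"; the sketch keeps that false positive rather than adding more rules.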

And in direct response to comment #2: if Outlook Web Access can't display Cyrillic without making you
jump through hoops, then don't use it! In fact, I would recommend avoiding Microsoft and AOL
products altogether, and you wouldn't have any of these problems.

Thinking about the "ru" versus "ru-RU" problem though, the International pref pane does not specify a
locale for "Русский", so perhaps sending "ru" is more correct after all.

And yes, I meant Accept-Charset in comment #1, "Accept-Content" was just the result of my brain 
melting :-)

*ponders an "Accept-Content: No" header*
------- Comment #7 From 2005-08-08 11:14:51 PST -------
Nicholas, thank you for your insightful reply.

1) Are you sure about the default LanguagePriority? I couldn't find it documented anywhere. Also,
Apache as shipped with OS X Tiger doesn't have the behavior you describe: a request to
http://127.0.0.1 gives me a Russian page:

GET / HTTP/1.0
Accept-Language: ru,en,fr

HTTP/1.1 200 OK
Date: Mon, 08 Aug 2005 17:52:39 GMT
Server: Apache/1.3.33 (Darwin)
Content-Location: index.html.ru.cp866
<...>

  Without an explicit LanguagePriority, Apache is documented to honor the order of the languages in 
Accept-Language, and this is the behavior that I observe in reality. So far, I haven't seen any problems 
occurring because of missing weights.

  As for appending an asterisk, I have nothing against it; I was just reluctant to make ad hoc changes
that aren't motivated by real-life problems (especially because Firefox doesn't append that asterisk).

2) Here, I have just preserved the existing behavior (since I haven't heard of any problems with it).

3) Doesn't <http://ebay.de/> just redirect to <http://www.ebay.de/>? Improving the comment in
acceptLanguageForURL is a good idea, thank you.

  As for Outlook Web Access: from my point of view, this is the most substantial problem with
Safari's Accept-Language (perhaps with Safari in general), and it is worth fixing ASAP in a software update.
------- Comment #8 From 2005-08-08 16:57:20 PST -------
Replying to comment #7:
> Apache as shipped with OS X Tiger doesn't have the behavior you describe

Hmm, my /etc/httpd/httpd.conf file has been dragged along with me since the DP4 days and modified
quite a bit (though I believe I re-modified the current one after a clean Jaguar install).
Nonetheless, I am pretty sure these lines have remained untouched from what was installed:

 <IfModule mod_mime.c>
     
     ...
     
     # in case of a tie during content negotiation.
     #
     # Just list the languages in decreasing order of preference. We have
     # more or less alphabetized them here. You probably want to change this.
     #
     <IfModule mod_negotiation.c>
           LanguagePriority en da nl et fr de el it ja kr no pl pt pt-br ru ltz ca es sv tw
     </IfModule> 
 </IfModule>

This is where I copied the above from. It may have changed in more recent versions of the OS. But
whatever the default actually is, or whatever it has been changed to, it doesn't really matter, because 99.9%
of the time it will differ from the user's language preference order. If Apache receives "Accept-Language:
ru, en" and has "LanguagePriority en ru" set, then, following the rules, it should serve the
English page. If you're seeing otherwise, then this is a bug in Apache (I presume you are using 1.3.x as
shipped with OS X, and not Apache 2.x?). I could apply your patch, change my languages, set up another
virtual host on my machine and test all this, but not tonight, as I'm too tired :-)

Regarding point 2, isn't this whole bug about you receiving a 406 error from a page on my website 
because en didn't get sent? Sending an asterisk would solve that (as would removing the ".en" from the 
filename of the file you mention, but that only masks the bug in Safari).
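Both questions in this exchange (how ties are broken, and when a 406 occurs) can be reproduced with a toy model of server-side negotiation in Python. It is not Apache's mod_negotiation, which also weighs media type, charset and other dimensions; here each variant is reduced to its language, scored with the RFC 2616 rules (the longest matching range wins, and "*" only covers languages no other range matches), and the tie-break for equal q-values is a parameter, because that order is exactly what is in dispute. LANGUAGE_PRIORITY is the list from the httpd.conf excerpt above; the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# LanguagePriority from the httpd.conf excerpt quoted above.
LANGUAGE_PRIORITY = tuple("en da nl et fr de el it ja kr no pl pt pt-br ru ltz ca es sv tw".split())


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Split "ru,en;q=0.5" into [("ru", 1.0), ("en", 0.5)]; a missing q means 1.0."""
    ranges = []
    for item in header.split(","):
        name, _, params = item.partition(";")
        name, params = name.strip().lower(), params.strip()
        if name:
            ranges.append((name, float(params[2:]) if params.startswith("q=") else 1.0))
    return ranges


def language_quality(tag: str, ranges: List[Tuple[str, float]]) -> Tuple[float, int]:
    """(q, header position) of the longest range matching tag. "*" only applies
    when no explicit range matches; (0.0, len(ranges)) means not acceptable."""
    best_length, result = -1, (0.0, len(ranges))
    for index, (name, q) in enumerate(ranges):
        matches = tag == name or tag.startswith(name + "-")
        if name != "*" and matches and len(name) > best_length:
            best_length, result = len(name), (q, index)
    if best_length < 0:
        for index, (name, q) in enumerate(ranges):
            if name == "*":
                return (q, index)
    return result


def negotiate(header: str, variants: List[str], tie_break: str = "header",
              priority: Sequence[str] = LANGUAGE_PRIORITY) -> Optional[str]:
    """Best variant language, or None where a server would answer 406.
    tie_break="header": equal q-values are ordered by Accept-Language position;
    tie_break="server": they are ordered by LanguagePriority."""
    ranges = parse_accept_language(header)
    candidates = []
    for variant in variants:
        tag = variant.lower()
        q, position = language_quality(tag, ranges)
        if q <= 0:
            continue  # q=0 or no matching range: this variant is not acceptable
        rank = priority.index(tag) if tag in priority else len(priority)
        order = (position, rank) if tie_break == "header" else (rank, position)
        candidates.append(((-q,) + order, variant))
    return min(candidates)[1] if candidates else None


print(negotiate("ru,en", ["en", "ru"], tie_break="header"))  # ru
print(negotiate("ru,en", ["en", "ru"], tie_break="server"))  # en
print(negotiate("ru", ["en"]))                               # None (406)
print(negotiate("ru,*;q=0.01", ["en"]))                      # en
```

The third call is the situation from the original report (only an English variant, header "ru"); a trailing "*;q=0.01" turns the 406 into the English page.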

And on point 3: yes, ebay.de does currently redirect, but I can't guarantee that will be the case
for every eBay-owned domain now and in the future; I was just using de and fr as simple examples. The
regex method is a bit more complicated but also a bit more robust. I would like to hear others' opinions on
this simple vs. comprehensive approach. Also, although the regex would still catch false
positives like "ebay.sux.fr" and "i.watch.ebay.on.tv", I don't think sending just the first language to
these few sites would matter all that much :-)

A question for you, Alexey: how likely is it that Outlook Web Access and Netscape Directory Gateway will
have their bugs fixed, and that you, or whoever administers them on your behalf, will get around to
installing the updates? It seems to me that you're basically requesting and implementing a 'degraded'
behaviour in order to work around bugs in these two pieces of software. Is this a fair summary?
I tend to be of the opinion that we should do what's right, and let other people worry about fixing their
own bugs. The more people complain to THEM that their software is broken, the higher priority they'll
give to fixing it, and I look forward to WinIE7's release later this summer as a much larger catalyst for
this than Firefox and Safari have been so far. Of course, with your own patched version of Safari, you
don't encounter these bugs anymore anyway :-)
------- Comment #9 From 2005-08-08 21:49:10 PST -------
> LanguagePriority en da nl et fr de el it ja kr no pl pt pt-br ru ltz ca es sv tw

Interesting: in Tiger it's the same, but the experiment I quoted above shows that Apache (1.3) still
respects the order of languages in Accept-Language. Does yours respect it (you can just use telnet
instead of applying my patch to test this)?

Regarding point 2, an empty AppleLanguages list is a very rare case (and a sign of a horribly broken
installation). Since this patch is unlikely to land as is, I'd leave it to whoever modifies it to decide;
both options look fine.

As for fixing bugs in OWA and NDS, please note that for OWA, I'm not proposing degraded
behavior at all (sending 'ru' is logically more correct than sending 'ru-ru'). As for NDS, that was an
example given by Darin Adler, and I tried to come up with the most compatible format. So far, the lack
of weights hasn't resulted in any problems (and it shouldn't, because other clients are also known to
send languages without weights).
------- Comment #10 From 2005-08-08 23:57:04 PST -------
Perhaps there is a directive I've not seen, or a consensus among HTTP server implementors, that to
support legacy behaviour, a list with no quality values at all is to be treated as a descending
priority list. Thus behaviour such as "en-gb, en-ca, en-au, en;q=0.5" would work (i.e. the first three would
all have equal weight, and en-za, en-in or en-us would only be selected rarely), whereas "en, de, fr, ru, *",
which is passed with no q-values at all, would be treated as if it were "en;q=1.0, de;q=0.8, fr;q=0.6,
ru;q=0.4, *;q=0.2". If the latter is indeed what is happening, then I would certainly
recommend adding the asterisk to all requests.
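The hypothesis above can be written down as a Python sketch. If at least one q-value is present, the usual rule applies and a missing q-value means 1.0; if none are present, positions are turned into descending weights. The (n - i)/n spacing is an assumption chosen because it reproduces the five-item example exactly; no general formula is given in the comment, and RFC 2616 itself assigns q=1.0 to every unweighted entry.

```python
from typing import List, Optional, Tuple


def effective_weights(header: str) -> List[Tuple[str, float]]:
    """Weights under the hypothesised legacy reading (not a standard rule)."""
    items: List[Tuple[str, Optional[float]]] = []
    for item in header.split(","):
        name, _, params = item.partition(";")
        name, params = name.strip(), params.strip()
        if name:
            items.append((name, float(params[2:]) if params.startswith("q=") else None))
    if not items:
        return []
    if any(q is not None for _, q in items):
        # Some q-values present: the usual rule, a missing q-value means 1.0.
        return [(name, 1.0 if q is None else q) for name, q in items]
    # No q-values at all: treat the list as a descending priority list.
    n = len(items)
    return [(name, (n - i) / n) for i, (name, _) in enumerate(items)]


print(effective_weights("en, de, fr, ru, *"))
# [('en', 1.0), ('de', 0.8), ('fr', 0.6), ('ru', 0.4), ('*', 0.2)]
print(effective_weights("en-gb, en-ca, en-au, en;q=0.5"))
# [('en-gb', 1.0), ('en-ca', 1.0), ('en-au', 1.0), ('en', 0.5)]
```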

I believe this should be fixed upstream of WebKit, in the CF APIs. Whoever implements that should
also add an "Accept-Charset: *" header, since Mac OS X can handle every charset.
------- Comment #11 From 2005-09-17 14:12:34 PST -------
(From update of attachment 2944)
I have assigned this to adele, as according to the radar assignment guidelines
she is the closest to dealing with "loading". I've looked over the patch and
given my comments to Alexey (via IRC). Even if this should go into NSURL long
term, I don't think landing it in WebKit for now is a bad thing. The fact that
this has been sitting in review for OVER 2 MONTHS is unacceptable.
------- Comment #12 From 2005-09-20 02:52:36 PST -------
However, given Nicholas's comments above, I think this patch may be just plain wrong. This patch needs
testing, review, and probably revision, not just immediate landing. And even then I am not sure we should
do this just in WebKit instead of submitting a bug report and fix to Foundation. It is not usually our policy
to work around NSURLConnection bugs in WebKit, and the NSURLConnection developers are pretty responsive.
------- Comment #13 From 2005-09-20 04:54:23 PST -------
(In reply to comment #11)
> The fact that this has been sitting in review for OVER 2 MONTHS is unacceptable.

Ahh, that's nothing :) I have CSS patches here that have been awaiting review since mid-June.
I have open bugs in radar logged against Mac OS 8.1 (and still present in Carbon).
------- Comment #14 From 2005-09-20 07:18:15 PST -------
(In reply to comment #12)
> However, given Nicholas's comments above, I think this patch may be just plain wrong.

  Which comment in particular are you referring to? I think I have answered them all; what remains is a
few stylistic comments and the question of whether an asterisk should be added (other browsers do not
add one, so I do not see why WebKit should).

> And even then I am not sure we should do this just in WebKit instead of submitting a bug report and 
fix to Foundation. 

No problem with that: rdar://4076004 covers the one problem that causes the most damage. For
obvious reasons, I couldn't submit a fix there.
------- Comment #15 From 2005-09-20 13:15:02 PST -------
BTW, some of this code is only needed on 10.3 and can now be replaced with a single call to
CFLocaleCreateCanonicalLanguageIdentifierFromString() (see
<http://developer.apple.com/documentation/MacOSX/Conceptual/BPInternational/Articles/ChoosingLocalizations.html>).
When I was making this patch, I still hoped that at least parts of it might go into a Safari 1.3 update...
------- Comment #16 From 2005-09-27 13:40:00 PST -------
*** Bug 5152 has been marked as a duplicate of this bug. ***
------- Comment #17 From 2005-09-28 16:28:43 PST -------
(From update of attachment 2944)
We decided this is better w/ Darin.
------- Comment #18 From 2005-11-03 13:52:19 PST -------
A "real life" site that is affected by Safari sending only the first preferred language:
<http://www.w3.org/TR/xhtml-media-types/> (one needs to prefer a language other than English to be affected :) ).
------- Comment #19 From 2005-12-03 11:23:58 PST -------
(From update of attachment 2944)
I'm conflicted about this patch. I'd really like to put changes like this into
NSURL, rather than into WebKit. But some of these fixes are needed.
------- Comment #20 From 2005-12-04 03:07:30 PST -------
Just for future reference: a rather detailed discussion of real-life language negotiation is available here: 
<http://ppewww.ph.gla.ac.uk/~flavell/www/lang-neg.html>.
------- Comment #21 From 2005-12-20 05:54:56 PST -------
Another "real life" server that suffers from this issue: http://fluxiom.com/
------- Comment #22 From 2006-01-15 07:57:52 PST -------
(From update of attachment 2944)
Here's my opinion at this point: We should get these fixes done in
NSURLConnection first -- please file a bug in http://bugreport.apple.com about
it.

Once we get a response and find out what's going to happen at the lower level,
we could consider working around this in WebKit by adding code like this.
------- Comment #23 From 2006-01-16 12:07:08 PST -------
rdar://problem/4076004
rdar://problem/4410031
------- Comment #24 From 2006-05-20 05:33:25 PST -------
Whoever is to fix this should read bug #5152, as that concerns a related but slightly different issue (I don't understand why it is marked as a duplicate of this one; it's not the same problem).

I created rdar://4556363 too, since this hasn't improved as of 10.4.6 and someone in the Foundation team needs goading with a hot poker. :-D
------- Comment #25 From 2006-06-27 21:57:57 PST -------
*** Bug 9626 has been marked as a duplicate of this bug. ***
------- Comment #26 From 2011-08-21 09:50:16 PST -------
(In reply to comment #22)
> (From update of attachment 2944)
> Here's my opinion at this point: We should get these fixes done in
> NSURLConnection first -- please file a bug in http://bugreport.apple.com about
> it.
> 
> Once we get a response and find out what's going to happen at the lower level,
> we could consider working around this in WebKit by adding code like this.
> 

Six years on, and the bug has still not been fixed! Has an NSURLConnection bug been filed?

I stumbled on real life effects of this bug when clicking on the link "Tools" on this page: http://www.w3.org/standards/webdesign/i18n

I see this bug in Safari for Windows and Safari for Mac. However, the bug does not occur in Chrome; maybe they fixed it "for their own money".
------- Comment #27 From 2011-08-22 11:33:03 PST -------
I no longer think that we should send an Accept-Language string with all the languages configured in Mac OS X preferences. The privacy implications (fingerprinting) and the generally failed state of HTTP based content negotiation make it not worth changing.

Please file separate bugs for specific issues via <http://bugreport.apple.com>, as Accept-Language is sent by the lower-level network library, not by WebKit, and there is no pressing reason to add workarounds in WebKit.

In particular, <http://www.w3.org/standards/webdesign/i18n> is certainly an evangelism issue - the page shouldn't be preventing access for people who didn't send "en" in Accept-Language.
------- Comment #28 From 2011-08-23 16:33:49 PST -------
(In reply to comment #27)

FIRSTLY: the Surfin' Safari blog seems to assume that WebKit implements Accept-Language 100%:

]] The locale has been removed. Web authors who want to know what languages a browser supports should use the HTTP Accept-Language header instead, which can supply multiple locales. [[

<http://www.webkit.org/blog/1580/user-agent-string-changes-on-webkit-trunk/>

SECONDLY: I read the gist of your decision as being that this subject is for Apple to decide, since it is they who must take responsibility for the fingerprinting potential, as it is their OS that is ultimately responsible for those Accept headers. Additionally, since content negotiation is, as you see it, in a kind of failed state, a rushed fix that circumvents Apple cannot be justified.