WebKit Bugzilla
RESOLVED FIXED
24473
RTL: haaretz.co.il - wrong charset in HTTP headers (csISOLatinHebrew instead of Windows-1255) leading to a mirrored rendering
https://bugs.webkit.org/show_bug.cgi?id=24473
Jungshik Shin
Reported
2009-03-09 16:22:12 PDT
The page at URL is encoded in windows-1255 (and the in-file meta charset says so), but the server emits

Content-Type: text/html; charset=csISOLatinHebrew

Because csISOLatinHebrew is aliased to ISO-8859-8 (visual) and 'usesVisualOrdering' returns true for ISO-8859-8 (as it should), all the text nodes seem to be reversed before being rendered.

Somehow, Firefox does much less than a complete reversal (here and there, words in Latin letters are reversed but, curiously, Hebrew sentences are not). IE also recognizes csISOLatinHebrew as 'visual' (View | Encoding has 'Hebrew (ISO Visual)' checked when displaying the page) and reverses the text content for a simple test case, but NOT for real web pages at haaretz.co.il (a popular newspaper site in Israel).

This is obviously an evangelism issue, and multiple contacts have been made to ask them to correct it, but we haven't heard back. One drastic measure would be to alias csISOLatinHebrew to windows-1255, but without knowing how many sites use 'csISOLatinHebrew' with its correct meaning (ISO-8859-8), it's hard to make a decision.

Chromium bug:
http://crbug.com/3352
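To make the mechanism in the report concrete, here is a minimal Python sketch, not WebKit's actual code. The alias table, the function names, and the use of plain string reversal as a stand-in for visual-order rendering are all assumptions for illustration. It relies on the fact that Hebrew letters occupy the same byte values in windows-1255 and ISO-8859-8, so the bytes decode identically and only the ordering treatment differs.

```python
# Hypothetical, heavily trimmed alias table (assumption, not WebKit's real table).
ALIASES = {
    "csisolatinhebrew": "ISO-8859-8",
    "iso-8859-8": "ISO-8859-8",
    "windows-1255": "windows-1255",
}

# Python codec names for the two canonical encodings used here.
PYTHON_CODECS = {"ISO-8859-8": "iso8859_8", "windows-1255": "cp1255"}


def canonical_name(label):
    """Map a charset label to its canonical name, ignoring case and padding."""
    return ALIASES.get(label.strip().lower())


def uses_visual_ordering(name):
    """ISO-8859-8 is treated as visually ordered; windows-1255 is logical."""
    return name == "ISO-8859-8"


def text_for_layout(raw, label):
    """Decode raw bytes, then crudely model the visual-order treatment.

    Reversing the string is only a stand-in for what a renderer does
    when it believes the text is already in display order.
    """
    name = canonical_name(label)
    text = raw.decode(PYTHON_CODECS[name])
    return text[::-1] if uses_visual_ordering(name) else text


# A Hebrew word stored in logical order, as a windows-1255 page would hold it.
word = "\u05e9\u05dc\u05d5\u05dd"
raw = word.encode("cp1255")
as_declared_by_meta = text_for_layout(raw, "windows-1255")
as_declared_by_header = text_for_layout(raw, "csISOLatinHebrew")
```

With the meta tag's label the word comes back unchanged; with the header's label the same bytes come back mirrored, which matches the symptom described above.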
Jeremy Moskovich
Comment 1
2009-03-11 02:03:50 PDT
This is a well-known issue with a considerable legacy: numerous people have contacted haaretz and presented the bug to them over the last few years. The Chrome bug has additional information, and Mozilla also has documentation on this issue:
https://bugzilla.mozilla.org/show_bug.cgi?id=308187
Jeremy Moskovich
Comment 2
2009-04-23 15:13:11 PDT
Does anyone have anything against adding a site-specific hack in WebKit to ignore the Content-Type HTTP header for haaretz.co.il? Considering the significant history of this issue, I think it's safe to say there have been more than enough attempts at outreach. The site remains broken for WebKit users, which is what ultimately matters.
Darin Adler
Comment 3
2009-04-23 15:37:20 PDT
We could also consider making this work by getting closer to IE behavior. The claim is that IE works because it chokes on the quote marks in the header field and thus ignores the encoding.
Jeremy Moskovich
Comment 4
2009-04-23 15:49:59 PDT
Darin: Please forgive me if I'm misunderstanding, but what makes you think that? The linked Mozilla & Chrome bugs have some more background on this issue.

As I understand this bug, the issue is that the content type is specified twice:
* Once in the HTTP Content-Type header, with a bogus value (see Jungshik's analysis).
* Once in the meta tag, which specifies the content type correctly.

Firefox & WebKit both use the HTTP header rather than the meta tag, and thus the page appears garbled. IE uses the value in the meta tag and thus displays the page OK (my understanding is that this is a long-standing IE bug which they can't fix because of sites such as this one).

The Chrome bug has an example of a page that is served from 2 servers, one of which doesn't output the HTTP header and thus looks fine in all browsers.
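The precedence described in this comment for Firefox and WebKit (HTTP header first, meta tag second) can be sketched as follows. This is an illustrative Python sketch only; the function name and the fallback default are assumptions, and real browsers consult further sources such as a BOM or user overrides.

```python
def choose_charset(http_charset, meta_charset, default="windows-1252"):
    """Pick a charset label: HTTP header wins, then meta, then a default.

    Inputs are labels already extracted from the response (or None).
    The default value is a placeholder assumption for this sketch.
    """
    for candidate in (http_charset, meta_charset):
        if candidate:
            return candidate
    return default


# The conflicting declarations discussed in this bug:
both_present = choose_charset("csISOLatinHebrew", "windows-1255")
# The second server from the Chrome bug, which sends no charset in the header:
header_missing = choose_charset(None, "windows-1255")
```

Under this ordering the bogus header label is chosen whenever it is present, and the correct meta label is only used when the header carries no charset.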
Darin Adler
Comment 5
2009-04-23 15:53:26 PDT
(In reply to comment #4)
> Darin: Please forgive me if I'm misunderstanding, but what makes you think
> that?

This comment <https://bugzilla.mozilla.org/show_bug.cgi?id=308187#c17>:

"MSIE displays Haaretz pages in the correct encoding for the same reason Firefox 1.0 did: it does not support quotes around encoding names in HTTP headers. So it does violate a standard, but not this one."

We could consider making ourselves work with all the same sites IE does by being bug-compatible with them in this respect. And yes, a site-specific quirk for haaretz.co.il is also worth considering.
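The difference between accepting and choking on a quoted charset can be sketched in Python. Both functions below are hypothetical illustrations, not any browser's parser, and the header value in the example is an assumed shape based on the quoted claim rather than a captured response.

```python
import re

# Matches a charset parameter whose value is either a quoted string or a bare token.
_CHARSET_RE = re.compile(r';\s*charset\s*=\s*("[^"]*"|[^;\s]+)', re.IGNORECASE)


def charset_lenient(content_type):
    """Return the charset label, accepting optional double quotes."""
    match = _CHARSET_RE.search(content_type)
    if not match:
        return None
    return match.group(1).strip('"') or None


def charset_rejecting_quotes(content_type):
    """Model a parser that gives up on a quoted value, as claimed for IE."""
    match = _CHARSET_RE.search(content_type)
    if not match or match.group(1).startswith('"'):
        return None
    return match.group(1)


header = 'text/html; charset="csISOLatinHebrew"'  # assumed example value
lenient_result = charset_lenient(header)
strict_result = charset_rejecting_quotes(header)
```

The lenient parser yields the header's label, so the header overrides the meta tag. The quote-rejecting parser yields nothing, which leaves the meta declaration (or the browser default) to decide the encoding.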
Jeremy Moskovich
Comment 6
2009-04-23 17:07:28 PDT
Thanks Darin. FYI, I used Fiddler to spoof the page request and pass back a version of the Content-Type header without the quotes.

In my test it appears IE7 still picks the meta tag, so while it may be a good idea to match IE's rejection of Content-Type headers containing quotes, it appears that won't fix this issue.
Jeremy Moskovich
Comment 7
2009-04-23 17:10:54 PDT
Please ignore the second paragraph of my previous comment, I hit commit too soon :(
Jungshik Shin
Comment 8
2009-05-07 11:58:47 PDT
First of all, my comment in mozilla bugzilla at
https://bugzilla.mozilla.org/show_bug.cgi?id=308187#c20
is wrong. When I wrote that, I thought IE gave meta a higher precedence than HTTP (I had seen so many pages with conflicting HTTP and meta charsets, with meta being correct, that only worked in IE; I now don't know how they worked). It turned out that it does not. So, Uri's comment at
https://bugzilla.mozilla.org/show_bug.cgi?id=308187#c17
is right on.

As for being bug-compatible with IE by choking on quotation marks in the charset parameter of Content-Type header fields, I don't feel very comfortable with that. There's a possibility that we may break some web sites that we don't currently break. Sure, they're also broken in IE, but do we want that? A hypothetical (perhaps rare) case is as follows:

1. The default charset of a browser is set to, say, GBK.
2. A user visits a page which emits the following HTTP header field: Content-Type: text/html; charset="EUC-KR". There's no meta charset declaration, but the page is encoded in EUC-KR.
3. WebKit and Firefox interpret the page correctly as EUC-KR.
4. IE interprets it as GBK (the default charset).

IE can get away with this because the majority of visitors to the site are Koreans and their default encoding is set to EUC-KR.

So, although I'm kind of annoyed by haaretz.co.il's failure (or refusal?) to fix their bug for so many years, I'm inclined toward special-casing it (and hopefully only a few others).
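The hypothetical case in this comment can be played through with a short Python sketch. The function and its parameters are assumptions made for illustration, and the sample text is arbitrary; only the codec names are real Python codecs.

```python
def decode_body(body, header_charset, browser_default, honors_quoted_charset):
    """Decode a page body under one of two header-handling policies.

    A browser that does not honor quoted charset values falls back to its
    default encoding; one that does strips the quotes and uses the label.
    """
    quoted = header_charset.startswith('"')
    if quoted and not honors_quoted_charset:
        codec = browser_default
    else:
        codec = header_charset.strip('"')
    # errors="replace" keeps the sketch from raising on unmappable bytes.
    return body.decode(codec, errors="replace")


text = "한국어"                  # sample page content (arbitrary)
body = text.encode("euc_kr")     # the page is really encoded in EUC-KR
header_charset = '"EUC-KR"'      # quoted value from the hypothetical header

lenient_view = decode_body(body, header_charset, "GBK", True)
quote_choking_view = decode_body(body, header_charset, "GBK", False)
korean_default_view = decode_body(body, header_charset, "EUC-KR", False)
```

The lenient policy recovers the original text for every user. The quote-choking policy only does so when the user's default happens to be EUC-KR, which is why such a site could go unnoticed as broken among its mostly Korean visitors.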
Alexey Proskuryakov
Comment 9
2009-05-08 00:51:42 PDT
Special casing seems fine to me in this case, although it's not something we do lightly in general. Being a site-specific hack, it will need to be implemented in a way sensitive to Safari's "Disable Site-specific Hacks" setting.
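A site-specific quirk that respects a user-facing switch might look roughly like the Python sketch below. The host list, the function name, and the boolean flag are hypothetical; this is not WebKit's implementation or the name of its actual setting.

```python
from urllib.parse import urlsplit

# Hypothetical list of hosts whose HTTP charset should be ignored.
QUIRK_HOSTS = ("haaretz.co.il",)


def should_ignore_header_charset(url, site_specific_quirks_enabled):
    """Return True only if quirks are enabled and the host is on the list.

    Subdomains match, but unrelated hosts that merely end with the same
    characters (for example "nothaaretz.co.il") do not.
    """
    if not site_specific_quirks_enabled:
        return False
    host = (urlsplit(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in QUIRK_HOSTS)


quirk_applies = should_ignore_header_charset("http://www.haaretz.co.il/", True)
quirk_disabled = should_ignore_header_charset("http://www.haaretz.co.il/", False)
```

Checking the flag first means that turning the setting off restores standard behavior everywhere, which makes it easy to tell whether the site has been fixed.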
Jeremy Moskovich
Comment 10
2009-06-01 06:58:15 PDT
Haaretz fixed the issue on their end, closing.
Jeremy Moskovich
Comment 11
2009-06-01 08:46:21 PDT
Reopening; not all pages were fixed, e.g.:
http://themarker.captain.co.il/captain/objects/ResponseDetails.jhtml?resNo=4877847&itemno=1089232&cont=2
Jeremy Moskovich
Comment 12
2009-06-04 11:31:34 PDT
captain.co.il is now fixed as well.