Bug 24473
Summary: RTL: haaretz.co.il - wrong charset in HTTP headers (csISOLatinHebrew instead of Windows-1255) leading to a mirrored rendering
Product: WebKit
Component: Evangelism
Status: RESOLVED FIXED
Severity: Normal
Priority: P2
Version: 528+ (Nightly build)
Hardware: All
OS: All
URL: http://www.haaretz.co.il/captain/pages/indexCaptain.jhtml
Reporter: Jungshik Shin <jshin>
Assignee: Nobody <webkit-unassigned>
CC: ap, darin, ian, mitz, playmobil, progame+wk
Jungshik Shin
The page at URL is encoded in windows-1255 (and in-file meta charset says so), but the server emits
Content-Type: text/html; charset=csISOLatinHebrew
Because csISOLatinHebrew is aliased to ISO-8859-8 (visual) and 'usesVisualOrdering' returns true for ISO-8859-8 (as it should), all the text nodes seem to be reversed before being rendered.
Somehow, Firefox does much less than the complete reversal (here and there, words in Latin letters are reversed, but curiously, Hebrew sentences are not).
IE also recognizes csISOLatinHebrew as 'visual' (View | Encoding has 'Hebrew (ISO Visual)' checked when displaying the page) and reverses the text content for a simple test case, but NOT for real web pages at haaretz.co.il (a popular newspaper site in Israel).
This is obviously an evangelism issue and multiple contacts have been made to ask them to correct the issue, but we haven't heard back.
One drastic measure would be to alias csISOLatinHebrew to windows-1255, but without knowing how many sites use 'csISOLatinHebrew' as its correct meaning (ISO-8859-8), it's hard to make a decision.
Chromium bug : http://crbug.com/3352
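The alias chain described in this report can be sketched as follows. This is only an illustration, not WebKit's code: the table is a hypothetical, heavily trimmed stand-in for a browser's encoding registry, and `uses_visual_ordering` merely mirrors the name of the check mentioned above.

```python
# Hypothetical, heavily trimmed alias table; real browsers carry many more labels.
ALIASES = {
    "csisolatinhebrew": "ISO-8859-8",
    "iso-8859-8": "ISO-8859-8",
    "windows-1255": "windows-1255",
}

# Encodings whose text is assumed to be stored in visual (display) order.
VISUAL_ENCODINGS = {"ISO-8859-8"}


def canonical_name(label):
    """Map a charset label to its canonical name, or None if unknown."""
    return ALIASES.get(label.strip().lower())


def uses_visual_ordering(label):
    """True when the label resolves to a visually ordered encoding."""
    return canonical_name(label) in VISUAL_ENCODINGS


print(uses_visual_ordering("csISOLatinHebrew"))
print(uses_visual_ordering("windows-1255"))
```

In this sketch, the "drastic measure" would amount to pointing the `csisolatinhebrew` entry at `windows-1255`, which would also change the result for any site that uses the label with its correct meaning.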
Jeremy Moskovich
This is a well-known issue with a long history: numerous people have contacted haaretz and presented the bug to them over the last few years.
The Chrome bug has additional information, and Mozilla also has documentation on this issue:
https://bugzilla.mozilla.org/show_bug.cgi?id=308187
Jeremy Moskovich
Does anyone have anything against adding a site-specific hack in WebKit to ignore the content-type http header for haaretz.co.il?
Considering the significant history of this issue, I think it's safe to say there have been more than enough attempts at outreach. The site remains broken for WebKit users, which is what ultimately matters.
Darin Adler
We could also consider making this work by getting closer to IE behavior. The claim is that IE works because it chokes on the quote marks in the header field and thus ignores the encoding.
Jeremy Moskovich
Darin: Please forgive me if I'm misunderstanding, but what makes you think that?
The linked Mozilla & Chrome bugs have some more background on this issue. As I understand this bug, the issue is that the Content-Type is specified twice:
* Once in the HTTP Content-Type header - bogus value (see Jungshik's analysis).
* Once in the Meta tag - which specifies the content type correctly
Firefox & WebKit both use the HTTP header rather than the Meta tag, and thus the page appears garbled.
IE uses the value in the Meta tag and thus displays the page OK (my understanding is that this is a long-standing IE bug which they can't fix because of sites such as this one).
The Chrome bug has an example of a page that is served from 2 servers, one of which doesn't output the http header and thus looks fine in all browsers.
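The header-first precedence described here for Firefox and WebKit can be sketched roughly as below. The function name and the fallback default are assumptions made for illustration; this is not taken from either browser's source.

```python
def pick_charset(http_charset, meta_charset, default_charset="windows-1252"):
    """Return the first charset that is present: HTTP header, then meta, then default.

    default_charset is a hypothetical fallback chosen for this sketch.
    """
    for candidate in (http_charset, meta_charset):
        if candidate:
            return candidate
    return default_charset


# Header and meta disagree: the header value wins.
print(pick_charset("csISOLatinHebrew", "windows-1255"))
# No header charset, as on the second server: the meta value is used.
print(pick_charset(None, "windows-1255"))
```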
Darin Adler
(In reply to comment #4)
> Darin: Please forgive me if I'm misunderstanding, but what makes you think
> that?
This comment <https://bugzilla.mozilla.org/show_bug.cgi?id=308187#c17>.
"MSIE displays Haaretz pages in the correct encoding for the same reason Firefox 1.0 did: it does not support quotes around encoding names in HTTP headers. So it does violate a standard, but not this one."
We could consider making ourselves work with all the same sites IE does by being bug-compatible with them in this respect. And yes, a site-specific quirk for haaretz.co.il is also worth considering.
Jeremy Moskovich
Thanks Darin, FYI I used Fiddler to spoof the page request and pass back a version of the Content-Type header without the quotes.
In my test it appears IE7 still picks the meta tag, so while it may be a good idea to match IE's rejection of content-type headers containing quotes, it appears that won't fix this issue.
Jeremy Moskovich
Please ignore the second paragraph of my previous comment, I hit commit too soon :(
Jungshik Shin
First of all, my comment in mozilla bugzilla at https://bugzilla.mozilla.org/show_bug.cgi?id=308187#c20 is wrong. When I wrote that, I thought IE puts a higher precedence on meta than on http (I had seen so many pages with conflicting http charset and meta charset, with meta being correct, that only worked in IE; I no longer know how they did). It turned out that it does not. So, Uri's comment at https://bugzilla.mozilla.org/show_bug.cgi?id=308187#c17 is right on.
As for being bug-compatible with IE by choking on quotation marks in the charset param of C-T header fields, I don't feel very comfortable with that. There's a possibility that we may break some web sites that we don't currently break. Sure, they're also broken in IE, but do we want that? A hypothetical case (perhaps a rare one) is as follows:
1. The default charset of a browser is set to, say, GBK
2. A user visits a page which emits the following HTTP header field:
Content-Type: text/html; charset="EUC-KR"
And, there's no meta charset declaration but the page is encoded in EUC-KR.
3. WebKit and Firefox interpret the page correctly as in EUC-KR
4. IE interprets it as GBK (the default charset).
IE can get away with this because the majority of visitors to the site are Koreans and their default encoding is set to EUC-KR.
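The hypothetical case above can be sketched as follows. The parser is a deliberately simplified assumption, not any browser's real header parsing; `accept_quotes=False` models a parser that gives up on a quoted charset value.

```python
import re

# Matches charset="quoted" (group 1) or charset=token (group 2).
_CHARSET_RE = re.compile(r'charset\s*=\s*(?:"([^"]*)"|([^;\s"]+))', re.IGNORECASE)


def charset_param(content_type, accept_quotes=True):
    """Return the charset parameter of a Content-Type value, or None."""
    match = _CHARSET_RE.search(content_type)
    if not match:
        return None
    quoted, bare = match.group(1), match.group(2)
    if quoted is not None:
        # A parser that does not support quotes ignores the value entirely.
        return (quoted or None) if accept_quotes else None
    return bare


def effective_charset(content_type, meta_charset, default_charset, accept_quotes=True):
    """Header charset first, then meta, then the browser default."""
    return charset_param(content_type, accept_quotes) or meta_charset or default_charset


header = 'text/html; charset="EUC-KR"'
# Quote-aware parsing, no meta declaration, browser default GBK.
print(effective_charset(header, None, "GBK", accept_quotes=True))
# Quote-rejecting parsing falls through to the browser default.
print(effective_charset(header, None, "GBK", accept_quotes=False))
```

Under these assumptions the first call yields EUC-KR and the second yields GBK, which is the breakage that bug-compatibility would introduce for such a page.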
So, although I'm kinda annoyed by haaretz.co.il's failure (or refusal?) to fix their bug for so many years, I'm inclined toward special-casing it (and hopefully only a few others).
Alexey Proskuryakov
Special casing seems fine to me in this case, although it's not something we do lightly in general. Being a site-specific hack, it will need to be implemented in a way sensitive to Safari's "Disable Site-specific Hacks" setting.
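A gated quirk of this kind might look roughly like the sketch below. The host list, function name, and `site_specific_quirks_enabled` flag are all hypothetical stand-ins for illustration; they are not WebKit's actual settings API.

```python
from urllib.parse import urlsplit

# Hypothetical list of hosts whose HTTP charset would be ignored.
QUIRK_HOSTS = ("haaretz.co.il",)


def should_ignore_header_charset(url, site_specific_quirks_enabled):
    """True when the HTTP charset should be ignored for this URL.

    The quirk applies only while site-specific hacks are enabled, and only
    to the listed hosts and their subdomains.
    """
    if not site_specific_quirks_enabled:
        return False
    host = (urlsplit(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in QUIRK_HOSTS)


url = "http://www.haaretz.co.il/captain/pages/indexCaptain.jhtml"
print(should_ignore_header_charset(url, True))
print(should_ignore_header_charset(url, False))
```

Matching on the host suffix keeps the quirk narrow, at the cost of missing affected pages served from other domains.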
Jeremy Moskovich
Haaretz fixed the issue on their end, closing.
Jeremy Moskovich
Reopening, not all pages were fixed, e.g.:
http://themarker.captain.co.il/captain/objects/ResponseDetails.jhtml?resNo=4877847&itemno=1089232&cont=2
Jeremy Moskovich
captain.co.il is now fixed as well.