Bug 23244 - Hebrew (Windows) Encoding is not auto-detected
Summary: Hebrew (Windows) Encoding is not auto-detected
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading
Version: 528+ (Nightly build)
Hardware: Mac OS X 10.5
Importance: P2 Enhancement
Assignee: Nobody
URL: http://www.rest.co.il/oren-giron/cour...
Keywords:
Depends on:
Blocks:
 
Reported: 2009-01-11 13:18 PST by Jeremy Moskovich
Modified: 2010-10-28 07:49 PDT
CC List: 5 users

See Also:


Attachments
HTML for Mishmar88 (5.73 KB, text/html)
2009-01-13 09:09 PST, Jeremy Moskovich

Description Jeremy Moskovich 2009-01-11 13:18:30 PST
Chrome bug: http://crbug.com/5950

The site's text is encoded in "Hebrew (Windows-1255)"; the page does not specify the content encoding.

WebKit doesn't detect the encoding correctly and the site's text is subsequently unreadable.

Firefox & IE correctly detect the content encoding.
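A quick illustration of the symptom, assuming (the report does not say which fallback WebKit used) that the undeclared page was decoded with a Western encoding such as windows-1252: the same windows-1255 bytes decode cleanly with the matching codec and turn into Latin mojibake with the wrong one. The sample word is hypothetical; `cp1255` and `cp1252` are Python's codec names for windows-1255 and windows-1252.

```python
# Hypothetical sample word; the real page text is not reproduced here.
text = "שלום"

# Bytes as a windows-1255 site would serve them ("cp1255" in Python).
raw = text.encode("cp1255")

# Decoding with the matching codec recovers the Hebrew text.
correct = raw.decode("cp1255")

# Assumed fallback: decoding the same bytes as windows-1252 yields mojibake.
wrong = raw.decode("cp1252")

print(raw.hex())  # f9ece5ed
print(correct)    # שלום
print(wrong)      # ùìåí
```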
Comment 1 Jeremy Moskovich 2009-01-11 13:39:40 PST
Another possibly related case:
http://mishmar88.info/

This site looks fine in IE & FF, but is displayed wrong in WebKit.

The page's meta tag is:
<meta content="text/html; charset=iso-8859-8" http-equiv="Content-Type"/>
Comment 2 Alexey Proskuryakov 2009-01-11 14:49:48 PST
See also: bug 16482.

> http://mishmar88.info/

I don't see any problems with this site in shipping Safari 3.1.2 or ToT. Also, I'm not sure why problems with a site that has a charset meta are related to auto-detection. Is the specified charset incorrect?
Comment 3 Jeremy Moskovich 2009-01-11 14:57:09 PST
It's very possible these are separate issues.

If you look closely you'll see that Safari displays the Hebrew text logically rather than visually and so the individual words are flipped.  So the page is *not* displayed correctly in Safari.
Comment 4 Alexey Proskuryakov 2009-01-11 14:59:37 PST
OK, then it's related to auto-detection indeed.
Comment 5 Jeremy Moskovich 2009-01-13 09:09:31 PST
Created attachment 26669
HTML for Mishmar88

http://mishmar88.info has since fixed its content-type. Here's the HTML from before the fix, which might be useful as a test case.
Comment 6 Hironori Bono 2009-01-14 00:46:03 PST
Jeremy,

Sorry for my stupid question. On my Windows PCs (whose system locales are "ja-JP"), both Firefox 3 and IE 7 treat the encoding of your attached HTML file as ISO-8859-8 (not windows-1255).
Is that the result you expected?
Comment 7 Jeremy Moskovich 2009-01-14 08:13:15 PST
There are 2 test cases:
* rest.co.il - Encoded in Windows-1255; no encoding specified.
* mishmar88.info (attached) - Encoded as ISO-8859-8-I (logical) but advertises itself as ISO-8859-8; WebKit renders it visually.
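To make the logical/visual mix-up in the second case concrete, here is a simplified sketch, assuming a line of pure Hebrew (the sample text and the helper `read_as_visual` are hypothetical). ISO-8859-8 and windows-1255 assign the same bytes to the Hebrew letters, so the bytes alone cannot distinguish logical ISO-8859-8-I from visual ISO-8859-8; only the layout convention differs. If logically ordered text is laid out as visual, a right-to-left reader sees the stored order reversed.

```python
# Hypothetical sample line of pure Hebrew, stored in logical (reading) order.
logical = "שלום עולם"

# ISO-8859-8 and windows-1255 map the Hebrew letters (and space) to the same
# bytes, so the byte stream does not reveal logical vs. visual ordering.
same_bytes = logical.encode("iso8859_8") == logical.encode("cp1255")


def read_as_visual(stored: str) -> str:
    """Simplified model: a visual-order renderer places the stored characters
    left to right with no bidi reordering, and a Hebrew reader scans the line
    right to left. For a line of pure Hebrew, the reader therefore sees the
    stored order reversed. Digits, Latin runs, and mirrored punctuation are
    deliberately ignored here."""
    return stored[::-1]


print(same_bytes)               # True
print(read_as_visual(logical))  # letters and word order both come out reversed
```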

In both cases, IE/FF correctly detect the encoding while WebKit does not.

These may very well represent separate issues (and warrant separate bugs).
Comment 8 Jeremy Moskovich 2009-06-04 11:44:51 PDT
Yet another example:
http://www.advocatesfirm.co.il/office1.html
Comment 9 Jungshik Shin 2009-06-04 12:11:00 PDT
(In reply to comment #7)
> There are 2 test cases:
> * rest.co.il - Encoded in Windows-1255 no encoding specified.

So, Firefox and WebKit-based browsers will use the default encoding (configurable by the user), which is expected. IE has no UI for that, but I guess there's a registry setting.
If the default encoding is set to windows-1255, it'll be rendered correctly in Safari/Chrome. 
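The fallback order described here can be sketched as below. This is a hypothetical, simplified model of the behavior under discussion, not WebKit's, Chrome's, or Firefox's actual code (real browsers weigh more inputs, such as a byte-order mark); `resolve_charset` and `toy_hebrew_detector` are illustrative names, and the detector is a toy heuristic rather than a real charset detector.

```python
from typing import Callable, Optional


def resolve_charset(
    body: bytes,
    declared: Optional[str],
    autodetect_enabled: bool,
    detector: Callable[[bytes], Optional[str]],
    user_default: str,
) -> str:
    """Hypothetical model of the fallback order described in this comment."""
    # 1. An explicit declaration (HTTP Content-Type or <meta>) wins.
    if declared:
        return declared
    # 2. With no declaration, consult the detector only if the user enabled it.
    if autodetect_enabled:
        guess = detector(body)
        if guess:
            return guess
    # 3. Otherwise use the user-configured default encoding.
    return user_default


def toy_hebrew_detector(body: bytes) -> Optional[str]:
    """Toy heuristic, not a real detector: if most non-ASCII bytes fall in
    0xE0-0xFA (the Hebrew letters in windows-1255), guess windows-1255."""
    high = [b for b in body if b >= 0x80]
    if high and sum(0xE0 <= b <= 0xFA for b in high) / len(high) > 0.8:
        return "windows-1255"
    return None


page = "שלום עולם".encode("cp1255")  # undeclared windows-1255 body

# Auto-detection off, Western default: the page is misread.
print(resolve_charset(page, None, False, toy_hebrew_detector, "windows-1252"))  # windows-1252
# Auto-detection on: the toy detector picks windows-1255.
print(resolve_charset(page, None, True, toy_hebrew_detector, "windows-1252"))   # windows-1255
# Auto-detection off, but the default is set to windows-1255.
print(resolve_charset(page, None, False, toy_hebrew_detector, "windows-1255"))  # windows-1255
```

The last two calls correspond to the two ways of getting the rest.co.il page right: enabling auto-detection, or setting the default encoding to windows-1255.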

> * mishmar88.info (attached) - Encoded as ISO-8859-8-I (logical) but advertises
> itself as ISO-8859-8 - WebKit renders this visually.

This is certainly an evangelism issue, isn't it? 
 
> In both cases, IE/FF correctly detect the encoding while WebKit does not.

Did you turn ON auto-detection in Firefox? If it's on AND the encoding is NOT specified, Firefox is likely to get it right. Chrome does too when auto-detection is ON (I just tested http://www.advocatesfirm.co.il/office1.html).

So, the first issue is invalid while the second is an evangelism bug for that specific web site.

> These may very well represent separate issues (and warrant separate bugs).