Bug 7172
Summary: | Japanese block text translation on Babelfish does not work with Default encoding | ||
---|---|---|---|
Product: | WebKit | Reporter: | Suraj Rai <surajrai> |
Component: | Evangelism | Assignee: | Nobody <webkit-unassigned> |
Status: | RESOLVED FIXED | ||
Severity: | Major | CC: | ap, ian, markmalone, webkit |
Priority: | P2 | ||
Version: | 417.x | ||
Hardware: | Mac | ||
OS: | OS X 10.4 | ||
URL: | http://babelfish.altavista.com/tr |
Suraj Rai
The "Block Text" translation feature in BabelFish Altavista does not work on the latest Safari using "Default" encoding. It works fine if I change the encoding to "Japanese (Shift JIS)".
To replicate:
1) Set your character encoding to Default
2) Go to http://babelfish.altavista.com/tr
3) In the "Translate a block of text" form, type in ?? (not sure if this will work with bugzilla) however any Japanese character should do the trick
4) Select "Japanese To English" and click translate and notice the results come back as ????
This appears to work fine in Firefox with default character encodings, so I am not sure how Safari should behave, but it is a bit annoying having to change the encoding every time you want to do a translation.
Suraj Rai
The ?? in Step 3 should have been the Japanese Character for Tokyo however it looks like bugzilla chomped it out. You can replace that with any Japanese Kanji character (go to yahoo.co.jp and copy paste one of the characters).
Alexey Proskuryakov
Confirmed with both stock Safari 2.0.3 and a current WebKit build (although the behavior is different between these versions).
The problem is quite unusual. HTTP headers specify ISO-8859-1 encoding, a meta specifies UTF-8. HTTP headers are supposed to take precedence, but Firefox somehow uses UTF-8. Still, translation doesn't work in Safari with forced UTF-8 - it needs Shift-JIS!
Alexey Proskuryakov
Here is what's happening here:
1) When the translation page is first loaded, Babelfish uses a charset from the client's Accept-Charset header to decide which charset to send in its Content-Type header. The HTTP META isn't changed accordingly, but it doesn't affect anything, either. Safari doesn't send Accept-Charset at all, and gets ISO-8859-1 in the response. When Firefox sends "windows-1251,utf-8;q=0.7,*;q=0.7", it gets UTF-8.
2) The browser encodes the text to be translated according to the page charset. E.g., ToT WebKit sends %26%2326908%3B for U+691C (which is correct for ISO-8859-1, but Babelfish apparently doesn't understand entities). Firefox sends %E6%A4%9C (UTF-8).
3) To determine the request encoding, Babelfish looks at Accept-Charset again. It also tries some default encoding for the "from" language. Since Safari doesn't send Accept-Charset, and the default encoding for Japanese is Shift-JIS, Babelfish thinks it should be Shift-JIS.
This looks like an abuse of Accept-Charset to me, but might mean that we have to send one.
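The encoding mismatch in steps 2 and 3 can be reproduced offline. The following is a minimal Python sketch, not Babelfish's or WebKit's actual code: `form_encode` and `server_decode` are hypothetical helper names, and the numeric character reference fallback is assumed to match what the browser does for characters the page charset cannot represent.

```python
from urllib.parse import quote, unquote_to_bytes


def form_encode(text, page_charset):
    """Percent-encode text as a browser might for a form on a page in page_charset.

    Assumption: characters the charset cannot represent are replaced by
    numeric character references (&#NNNNN;) before percent-encoding.
    """
    raw = text.encode(page_charset, errors="xmlcharrefreplace")
    return quote(raw, safe="")


def server_decode(encoded, assumed_charset):
    """Decode a percent-encoded value using the charset the server guesses."""
    return unquote_to_bytes(encoded).decode(assumed_charset, errors="replace")


char = "\u691c"  # U+691C, the character used in step 2

latin1_form = form_encode(char, "iso-8859-1")
utf8_form = form_encode(char, "utf-8")
sjis_form = form_encode(char, "shift_jis")

print(latin1_form)  # %26%2326908%3B
print(utf8_form)    # %E6%A4%9C

# A server that assumes Shift-JIS only recovers the character
# when the browser actually sent Shift-JIS bytes.
print(server_decode(sjis_form, "shift_jis") == char)    # True
print(server_decode(utf8_form, "shift_jis") == char)    # False
print(server_decode(latin1_form, "shift_jis"))          # &#26908;
```

This matches the reported behavior: forcing UTF-8 in Safari changes the bytes sent but not the server's Shift-JIS guess, so only forcing Shift-JIS makes the two sides agree.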
Alexey Proskuryakov
MacIE doesn't send Accept-Charset, yet Babelfish uses User-Agent to give it UTF-8!
$ curl -I http://babelfish.altavista.com/tr --header "User-Agent: Mozilla/4.0 (compatible; MSIE 5.23; Mac_PowerPC)"
<...>
Content-Type: text/html; charset=UTF-8
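To compare responses for different User-Agent strings, the charset can be pulled out of a Content-Type value like the one above. A small sketch using only the Python standard library; `charset_of` is a hypothetical helper name, and the header values are passed in as plain strings rather than fetched from the live site.

```python
from email.message import Message


def charset_of(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


print(charset_of("text/html; charset=UTF-8"))       # utf-8
print(charset_of("text/html; charset=ISO-8859-1"))  # iso-8859-1
print(charset_of("text/html"))                      # None
```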
Alexey Proskuryakov
Same behavior with WinIE. I'm afraid that even though sending Accept-Charset would fix this, it may break other servers that have only been tested with Internet Explorer, and don't expect to receive this header.
Could we first try evangelizing Babelfish to make them treat Safari just like they treat WinIE and MacIE?
Alexey Proskuryakov
Moving to Evangelism component.
Robert Blaut
As I tested today, the site seems to work fine now, so we can close the bug report. Any confirmations?
Alexey Proskuryakov
Yay! Looks like Safari was finally added to the list of clients Babelfish sends UTF-8 to.
Their check appears to be horribly specific - e.g., even the iPhone user agent gets the old broken behavior. But the bug as reported is fixed, so closing now.