Bug 42044 - Windows-1253 (Greek) encoding error
Summary: Windows-1253 (Greek) encoding error
Alias: None
Product: WebKit
Classification: Unclassified
Component: Text
Version: 528+ (Nightly build)
Hardware: All, OS X 10.6
Importance: P2 Normal
Assignee: Nobody
Keywords: InRadar
Depends on:
Reported: 2010-07-11 09:55 PDT by O. Andersen
Modified: 2012-08-03 15:32 PDT

test case (190 bytes, text/html;charset=windows-1253)
2010-07-12 12:16 PDT, Alexey Proskuryakov

Description O. Andersen 2010-07-11 09:55:54 PDT
Safari maps the byte 0xAA in the Windows-1253 (Greek) encoding to U+00AA. This is incorrect according to Microsoft's reference for this character encoding [1], in which 0xAA is undefined, and it differs from Firefox, Opera and Internet Explorer, which map 0xAA to U+FFFD.

As far as I can tell, 0xAA really should map to U+FFFD.

[1] <http://msdn.microsoft.com/en-gb/goglobal/cc305146.aspx>
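For illustration only, the two behaviours described above can be reproduced outside a browser. This is a minimal sketch that uses Python's built-in `cp1253` codec as a stand-in for a browser decoder; it is not how WebKit or ICU is implemented. The handler name `passthrough_hypothetical` and the sample bytes are made up for this example.

```python
import codecs

# Greek Alpha (0xC1), the disputed byte (0xAA), Greek Beta (0xC2).
data = bytes([0xC1, 0xAA, 0xC2])

# Strict decoding fails because the codec has no mapping for 0xAA.
try:
    data.decode("cp1253")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Replacement policy (what the report expects): 0xAA becomes U+FFFD.
replaced = data.decode("cp1253", errors="replace")

# Pass-through policy (what the report describes for Safari):
# the undefined byte becomes the code point of the same value, U+00AA.
def passthrough(err):
    # Hypothetical handler, written only for this sketch.
    chunk = err.object[err.start:err.end]
    # latin-1 maps every byte to the code point with the same value.
    return chunk.decode("latin-1"), err.end

codecs.register_error("passthrough_hypothetical", passthrough)
passed = data.decode("cp1253", errors="passthrough_hypothetical")

print([f"U+{ord(c):04X}" for c in replaced])
print([f"U+{ord(c):04X}" for c in passed])
```

The two printed lists differ only in the middle entry, which is the single code point this report is about.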
Comment 1 Alexey Proskuryakov 2010-07-12 12:16:15 PDT
My testing gives somewhat different results:

1. IE converts 0xAA to U+F8F9, which is a PUA character;
2. Firefox seems mysteriously broken: it "eats" the byte after 0xAA.

We just use ICU, <http://icu-project.org/icu-bin/convexp?conv=ibm-5349_P100-1998&s=ALL>, so it would be more straightforward to discuss this in the ICU bug tracker. To me, the problem (if it's even a problem) doesn't seem serious enough to warrant a workaround in WebKit.
Comment 2 Alexey Proskuryakov 2010-07-12 12:16:38 PDT
Created attachment 61255
test case
Comment 3 O. Andersen 2010-07-12 13:21:22 PDT
1. You are of course right about IE mapping undefined characters to the PUA instead of U+FFFD. Sorry for being imprecise.
2. This is the result of a known bug in current versions of Firefox. The bug has been fixed, but the fix does not seem to have reached non-beta versions yet. Firefox did map 0xFF to U+FFFD before this bug was introduced, and future versions can be expected to do the same.

I agree that this should be fixed in ICU and that a temporary work-around is probably not needed.
Comment 4 O. Andersen 2010-07-12 13:23:09 PDT
(In reply to comment #3)
> 0xFF
That should of course be 0xAA.
Comment 5 Alexey Proskuryakov 2010-07-12 13:37:53 PDT
OK, let's treat this as any bug in underlying libraries that we don't plan to work around, and mark it as INVALID then.

If you file an ICU bug, please post its URL here.
Comment 6 O. Andersen 2010-07-14 02:06:06 PDT
Filed an ICU bug:
Comment 7 Alexey Proskuryakov 2012-08-03 15:32:50 PDT