Bug 42044

Summary: Windows-1253 (Greek) encoding error
Product: WebKit Reporter: O. Andersen <pub-webkit>
Component: Text Assignee: Nobody <webkit-unassigned>
Status: RESOLVED INVALID    
Severity: Normal CC: ap
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: All   
OS: OS X 10.6   
Attachments:
Description Flags
test case none

Description O. Andersen 2010-07-11 09:55:54 PDT
Safari maps the byte 0xAA in the Windows-1253 (Greek) encoding to U+00AA. This is incorrect according to Microsoft's reference for this character encoding [1], where 0xAA is undefined, and it differs from Firefox, Opera and Internet Explorer, which map 0xAA to U+FFFD.

As far as I can tell, 0xAA really should map to U+FFFD.

[1] <http://msdn.microsoft.com/en-gb/goglobal/cc305146.aspx>
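For comparison, here is a minimal sketch of the expected behaviour using Python rather than WebKit or ICU. It assumes that Python's built-in "cp1253" codec follows the published Windows-1253 table and leaves 0xAA unmapped; the variable names are purely illustrative.

```python
# Byte 0xAA sits between 0xA9 (copyright sign) and 0xAB (left guillemet).
data = bytes([0xA9, 0xAA, 0xAB])

# errors="replace" substitutes U+FFFD for any byte the codec cannot map.
# Assumption: Python's cp1253 table treats 0xAA as undefined.
decoded = data.decode("cp1253", errors="replace")

# Print one line per input byte with the code point it decoded to.
for byte, char in zip(data, decoded):
    print(f"0x{byte:02X} -> U+{ord(char):04X}")
```

Under that assumption the middle byte comes out as U+FFFD, while the neighbouring bytes decode normally and none of them is swallowed.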
Comment 1 Alexey Proskuryakov 2010-07-12 12:16:15 PDT
My testing gives somewhat different results:

1. IE converts AA to U+F8F9, which is a PUA character;
2. Firefox seems mysteriously broken, it "eats" the byte after AA.

We just use ICU, <http://icu-project.org/icu-bin/convexp?conv=ibm-5349_P100-1998&s=ALL>, so it would be more straightforward to discuss this in the ICU bug tracker. To me, the problem (if it's even a problem) doesn't seem serious enough to warrant a workaround in WebKit.
Comment 2 Alexey Proskuryakov 2010-07-12 12:16:38 PDT
Created attachment 61255 [details]
test case
Comment 3 O. Andersen 2010-07-12 13:21:22 PDT
1. You are of course right about IE mapping undefined characters to PUA instead of U+FFFD.  Sorry for being imprecise.
2. This is the result of a known bug in current versions of Firefox. The bug has been fixed, but the fix does not seem to have reached non-beta versions yet. Firefox did map 0xFF to U+FFFD before this bug was introduced, and future versions can be expected to do the same.

I agree that this should be fixed in ICU and that a temporary work-around is probably not needed.
Comment 4 O. Andersen 2010-07-12 13:23:09 PDT
(In reply to comment #3)
> 0xFF
That should of course be 0xAA.
Comment 5 Alexey Proskuryakov 2010-07-12 13:37:53 PDT
OK, let's treat this like any other bug in an underlying library that we don't plan to work around, and mark it as INVALID then.

If you file an ICU bug, please post its URL here.
Comment 6 O. Andersen 2010-07-14 02:06:06 PDT
Filed an ICU bug:
<http://bugs.icu-project.org/trac/ticket/7818>
Comment 7 Alexey Proskuryakov 2012-08-03 15:32:50 PDT
<rdar://problem/8178871>