Bug 10094

Summary: REGRESSION: Japanese characters improperly rendering in TOT
Product: WebKit Reporter: Dan Wood <dwood>
Component: Layout and Rendering Assignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: ap
Priority: P1 Keywords: HasReduction, Regression
Version: 420+   
Hardware: Mac   
OS: OS X 10.4   
Attachments (Description: Flags):
HTML file showing some Japanese characters: none
decomposed vs. precomposed characters: none
patch: none
patch (fixed): darin: review+

Dan Wood
Reported 2006-07-24 17:16:13 PDT
To reproduce, open the attached reduced file, which contains some Japanese characters. In Safari 419.3, you see different characters, corresponding to the HTML source. In TOT (nightly build), you see the first character repeated several times.

Notes: This seems to be a problem whether the file is UTF-8 or UTF-16 encoded. These Japanese words came from files, so, if I understand the issue, the characters may be "decomposed" rather than "precomposed" Unicode. <http://developer.apple.com/qa/qa2001/qa1235.html>
Attachments
HTML file showing some Japanese characters (642 bytes, text/html)
2006-07-24 17:17 PDT, Dan Wood
no flags
decomposed vs. precomposed characters (701 bytes, text/html)
2006-07-24 17:34 PDT, Dan Wood
no flags
patch (42.94 KB, patch)
2006-07-24 23:50 PDT, Graham Dennis
no flags
patch (fixed) (42.58 KB, patch)
2006-07-25 00:01 PDT, Graham Dennis
darin: review+
Dan Wood
Comment 1 2006-07-24 17:17:44 PDT
Created attachment 9664
HTML file showing some Japanese characters
Dan Wood
Comment 2 2006-07-24 17:34:41 PDT
Created attachment 9665
decomposed vs. precomposed characters

The rendering problem seems to be in decomposed Japanese characters, not precomposed ones. The attachment shows the same string in both precomposed and decomposed forms. In the released Safari, these render IDENTICALLY. In TOT, they definitely do not.
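To make the precomposed/decomposed distinction concrete, here is a minimal sketch using Python's standard unicodedata module. The katakana syllable chosen is only an illustration and is not taken from the attachment.

```python
import unicodedata

# One syllable, two encodings (example characters chosen for illustration).
precomposed = "\u30ac"        # KATAKANA LETTER GA as a single code point
decomposed = "\u30ab\u3099"   # KATAKANA LETTER KA + combining voiced sound mark

# The code point sequences differ...
print(precomposed == decomposed)  # False

# ...but they are canonically equivalent: NFC composes, NFD decomposes.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```

A renderer is expected to draw both forms the same way, which is the behaviour reported for the released Safari.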
Graham Dennis
Comment 3 2006-07-24 23:50:46 PDT
Created attachment 9667
patch

This bug only occurs when the first character of a text run is a decomposed Japanese hiragana or katakana character with voicing marks.

The bug occurs because WidthIterator::advance does not update the m_currentCharacter member while iterating, so the call to WidthIterator::normalizeVoicingMarks normalises the character at m_currentCharacter instead of the one at the local currentCharacter. As a result, if the first character of a text run requires normalising, all subsequent characters in the run are displayed as identical to the first character (as m_currentCharacter will be 0).

This patch fixes the bug by passing currentCharacter as an argument to normalizeVoicingMarks.

A testcase has been included in the patch; however, it must be run as a pixel test to actually check that the bug has been fixed.
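The mechanism described in this comment can be illustrated with a small Python model. This is a sketch under stated assumptions, not the WebKit code: the function names, the `buggy` flag, and the kana range check are hypothetical stand-ins for WidthIterator::advance and normalizeVoicingMarks, and Python's unicodedata module stands in for whatever normalisation the real code performs.

```python
import unicodedata

# Canonical combining class of U+3099 and U+309A (the kana voicing marks).
KANA_VOICING_CLASS = 8


def normalize_voicing_marks(text, index):
    """Compose text[index:index + 2] if it is a kana plus a voicing mark.

    Returns the single composed character, or None if nothing composes.
    """
    if index + 1 < len(text) and unicodedata.combining(text[index + 1]) == KANA_VOICING_CLASS:
        composed = unicodedata.normalize("NFC", text[index:index + 2])
        if len(composed) == 1:
            return composed
    return None


def advance(text, buggy):
    """Walk a text run and return the characters that would be drawn."""
    member_index = 0        # models m_currentCharacter: never updated in the loop
    current = member_index  # models the local currentCharacter
    drawn = []
    while current < len(text):
        ch = text[current]
        cluster_length = 1
        if "\u3041" <= ch <= "\u30fe":  # hypothetical hiragana/katakana range check
            # Buggy variant reads from the stale member; fixed variant gets
            # the loop's own position passed in as an argument.
            start = member_index if buggy else current
            composed = normalize_voicing_marks(text, start)
            if composed is not None:
                ch = composed
                cluster_length = 2
        drawn.append(ch)
        current += cluster_length
    return "".join(drawn)


# Decomposed "ga", decomposed "gi", then plain "ku".
run = "\u304b\u3099\u304d\u3099\u304f"
print(advance(run, buggy=True) == "\u304c\u304c\u304c")   # True: first character repeated
print(advance(run, buggy=False) == "\u304c\u304e\u304f")  # True: ga, gi, ku
```

In the buggy variant every lookup starts at index 0, so each kana in the run is replaced by the composed first character, which matches the repeated-first-character symptom in the report.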
Graham Dennis
Comment 4 2006-07-25 00:01:18 PDT
Created attachment 9668
patch (fixed)

I accidentally left some remnants of an earlier patch in the previous patch file. I've fixed that in this version.
Alexey Proskuryakov
Comment 5 2006-07-25 01:39:44 PDT
Regression->P1. However, please note that any process generating HTML/XML/other Web content SHOULD normalize the text to NFC <http://www.w3.org/TR/charmod-norm/#C300>.
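As a sketch of what normalising generated content to NFC might look like, the helper below composes text before escaping it into markup. The name `html_paragraph` is hypothetical and not part of any tool mentioned in this bug; only the standard-library calls are real.

```python
import html
import unicodedata


def html_paragraph(text):
    """Hypothetical helper: NFC-normalise text, escape it, wrap it in <p>."""
    composed = unicodedata.normalize("NFC", text)
    return "<p>" + html.escape(composed) + "</p>"


# A decomposed katakana "ga" is emitted as the single precomposed code point.
print(html_paragraph("\u30ab\u3099") == "<p>\u30ac</p>")  # True
```

Normalising at the point where content is generated means consumers never see the decomposed form, independent of how a given renderer handles combining marks.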
Alexey Proskuryakov
Comment 6 2006-07-25 02:38:38 PDT
Why is this special case for Hiragana&Katakana needed at all? The code appeared in r8701 without a test case:

------------------------------------------------------------------------
r8701 | rjw | 2005-02-25 23:54:19 +0300 (Fri, 25 Feb 2005) | 9 lines

Fixed <rdar://problem/4000962> 8A375: Help Viewer displays voiced sound and semi-voiced characters strangely (characters don't seem to be composed)

Added special case for voiced marks.

Reviewed by John.

* WebCoreSupport.subproj/WebTextRenderer.m:
(widthForNextCharacter):
------------------------------------------------------------------------
Darin Adler
Comment 7 2006-07-25 08:30:00 PDT
Comment on attachment 9668
patch (fixed)

Looks good. r=me
Alexey Proskuryakov
Comment 8 2006-07-25 12:24:27 PDT
(In reply to comment #6)
> Why is this special case for Hiragana&Katakana needed at all?

Answering my own question: yes it is, because without it the voicing marks are drawn incorrectly. I still don't understand why voicing marks need to be handled differently from Latin accents or any other combining characters, but that's of course not related to this bug.
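For reference, the snippet below shows that at the Unicode data level the kana voicing marks are ordinary combining characters, just like a Latin accent; they only carry a different canonical combining class. It is an illustration using Python's unicodedata module and does not explain why the renderer needed a special case.

```python
import unicodedata

marks = ["\u0301", "\u3099", "\u309a"]  # acute accent, voiced mark, semi-voiced mark

for mark in marks:
    # A non-zero combining class marks a character as combining.
    print(unicodedata.name(mark), unicodedata.combining(mark))

# Both kinds of mark compose with their base character under NFC.
print(unicodedata.normalize("NFC", "e\u0301") == "\u00e9")       # True
print(unicodedata.normalize("NFC", "\u304b\u3099") == "\u304c")  # True
```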
Alexey Proskuryakov
Comment 9 2006-07-27 11:52:12 PDT
Committed revision 15651.