Bug 10094

Summary:

REGRESSION: Japanese characters improperly rendering in TOT

Product:

WebKit

Reporter:

Dan Wood <dwood>

Component:

Layout and Rendering

Assignee:

Nobody <webkit-unassigned>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

Priority:

Keywords:

HasReduction, Regression

Version:

420+

Hardware:

Mac

OS:

OS X 10.4

Attachments:

Description	Flags
HTML file showing some Japanese characters	none
decomposed vs. precomposed characters	none
patch	none
patch (fixed)	darin: review+

Dan Wood

Reported 2006-07-24 17:16:13 PDT

To reproduce, open the attached reduced file, which contains some Japanese characters. In Safari 419.3, you see different characters that correspond to the HTML source. In TOT (nightly build), you see the first character repeated several times.

Notes: The problem occurs whether the file is UTF-8 or UTF-16 encoded. These Japanese words came from files, so, if I understand the issue, the characters may be "decomposed" rather than "precomposed" Unicode. <http://developer.apple.com/qa/qa2001/qa1235.html>

Attachments
HTML file showing some Japanese characters (642 bytes, text/html) 2006-07-24 17:17 PDT, Dan Wood	no flags
decomposed vs. precomposed characters (701 bytes, text/html) 2006-07-24 17:34 PDT, Dan Wood	no flags
patch (42.94 KB, patch) 2006-07-24 23:50 PDT, Graham Dennis	no flags
patch (fixed) (42.58 KB, patch) 2006-07-25 00:01 PDT, Graham Dennis	darin: review+

Dan Wood

Comment 1 2006-07-24 17:17:44 PDT

Created attachment 9664 [details] HTML file showing some Japanese characters

Dan Wood

Comment 2 2006-07-24 17:34:41 PDT

Created attachment 9665 [details] decomposed vs. precomposed characters

The rendering problem seems to affect decomposed Japanese characters, not precomposed ones. The attachment shows the same string in both precomposed and decomposed forms. In the released Safari, these render IDENTICALLY. In TOT, they definitely do not.
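To make the decomposed/precomposed distinction concrete, here is a minimal C++ sketch. It is not WebKit code and not a real NFC implementation: composeKana and composeRun are hypothetical helpers, and the lookup covers only three kana pairs chosen for illustration. A decomposed character is stored as a base kana followed by a combining voicing mark (U+3099 or U+309A), while the precomposed form is a single code point.

```cpp
#include <cstddef>
#include <string>

// Hypothetical toy table: returns the precomposed kana for a base character
// followed by a combining voicing mark, or 0 if the pair is not in the table.
// A real implementation would use a full Unicode normalization library.
char32_t composeKana(char32_t base, char32_t mark) {
    if (mark == 0x3099) {                    // combining voiced sound mark
        if (base == 0x304B) return 0x304C;   // か + mark -> が
        if (base == 0x306F) return 0x3070;   // は + mark -> ば
    } else if (mark == 0x309A) {             // combining semi-voiced sound mark
        if (base == 0x306F) return 0x3071;   // は + mark -> ぱ
    }
    return 0;
}

// Replaces every known (base, mark) pair in the text with its precomposed form
// and copies everything else through unchanged.
std::u32string composeRun(const std::u32string& text) {
    std::u32string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (i + 1 < text.size()) {
            if (char32_t composed = composeKana(text[i], text[i + 1])) {
                out.push_back(composed);
                ++i;  // the combining mark has been consumed
                continue;
            }
        }
        out.push_back(text[i]);
    }
    return out;
}
```

With this sketch, the two-code-point sequence U+304B U+3099 and the single code point U+304C describe the same character, which is why both forms are expected to render identically.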

Graham Dennis

Comment 3 2006-07-24 23:50:46 PDT

Created attachment 9667 [details] patch

This bug only occurs when the first character of a text run is a decomposed Japanese hiragana or katakana character with a voicing mark.

The cause is that WidthIterator::advance does not update the m_currentCharacter member while iterating, yet the call to WidthIterator::normalizeVoicingMarks normalises the character at m_currentCharacter instead of the one at the local currentCharacter. As a result, if the first character of a text run requires normalising, all subsequent characters in the run are displayed as identical to the first character (because m_currentCharacter is still 0).

This patch fixes the bug by passing currentCharacter as an argument to normalizeVoicingMarks. A test case is included in the patch, but it must be run as a pixel test to actually check that the bug has been fixed.
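The stale-index problem described above can be sketched in a few lines of C++. This is a simplified model under stated assumptions, not the actual WidthIterator code: charactersToDraw and staleMemberIndex are hypothetical names, the free function normalizeVoicingMarks here is a toy stand-in that knows only two kana pairs, and the passLoopIndex flag exists only to compare the two behaviours side by side.

```cpp
#include <cstddef>
#include <string>

// Toy stand-in for normalizeVoicingMarks: returns the precomposed character
// for the pair starting at `index`, or 0 if no known pair starts there.
char32_t normalizeVoicingMarks(const std::u32string& run, std::size_t index) {
    if (index + 1 >= run.size() || run[index + 1] != 0x3099)
        return 0;
    switch (run[index]) {
    case 0x304B: return 0x304C;  // か + voiced mark -> が
    case 0x305F: return 0x3060;  // た + voiced mark -> だ
    default: return 0;
    }
}

// Walks a run and returns the characters that would be drawn.
// passLoopIndex == false models the regression: normalization always looks at
// a member index that stays 0 for the whole loop.
// passLoopIndex == true models the fix: the loop-local index is passed in.
std::u32string charactersToDraw(const std::u32string& run, bool passLoopIndex) {
    const std::size_t staleMemberIndex = 0;  // plays the role of m_currentCharacter
    std::size_t currentCharacter = 0;        // the loop-local index
    std::u32string drawn;
    while (currentCharacter < run.size()) {
        const std::size_t index = passLoopIndex ? currentCharacter : staleMemberIndex;
        const char32_t composed = normalizeVoicingMarks(run, index);
        drawn.push_back(composed ? composed : run[currentCharacter]);
        ++currentCharacter;
        // If a composition happened and a combining mark follows, skip the mark.
        if (composed && currentCharacter < run.size() && run[currentCharacter] == 0x3099)
            ++currentCharacter;
    }
    return drawn;
}
```

For the run か + U+3099, き, く, the stale-index variant yields が three times, matching the "first character repeated" symptom in the report, while the variant that passes the loop index yields が, き, く.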

Graham Dennis

Comment 4 2006-07-25 00:01:18 PDT

Created attachment 9668 [details] patch (fixed)

I accidentally left some remnants of an earlier patch in the previous patch file. I've fixed that in this version.

Alexey Proskuryakov

Comment 5 2006-07-25 01:39:44 PDT

Regression->P1. However, please note that any process generating HTML/XML/other Web content SHOULD normalize the text to NFC <http://www.w3.org/TR/charmod-norm/#C300>.

Alexey Proskuryakov

Comment 6 2006-07-25 02:38:38 PDT

Why is this special case for Hiragana&Katakana needed at all? The code appeared in r8701 without a test case:

------------------------------------------------------------------------
r8701 | rjw | 2005-02-25 23:54:19 +0300 (Fri, 25 Feb 2005) | 9 lines

Fixed <rdar://problem/4000962> 8A375: Help Viewer displays voiced sound and semi-voiced characters strangely (characters don't seem to be composed)

Added special case for voiced marks.

Reviewed by John.

* WebCoreSupport.subproj/WebTextRenderer.m:
(widthForNextCharacter):
------------------------------------------------------------------------

Darin Adler

Comment 7 2006-07-25 08:30:00 PDT

Comment on attachment 9668 [details] patch (fixed)

Looks good. r=me

Alexey Proskuryakov

Comment 8 2006-07-25 12:24:27 PDT

(In reply to comment #6)
> Why is this special case for Hiragana&Katakana needed at all?

Answering my own question: it is needed, because without it the voicing marks are drawn incorrectly. I still don't understand why voicing marks need to be handled differently from Latin accents or any other combining characters, but that is of course not related to this bug.

Alexey Proskuryakov

Comment 9 2006-07-27 11:52:12 PDT

Committed revision 15651.
