Bug 28742 - [Chromium] Combining Diacritical Marks (U+0300..) are not handled correctly
Summary: [Chromium] Combining Diacritical Marks (U+0300..) are not handled correctly
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc.
Version: 528+ (Nightly build)
Hardware: PC Linux
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-08-26 09:00 PDT by Yusuke Sato
Modified: 2009-09-03 06:45 PDT
CC List: 4 users

See Also:


Attachments
screenshot (97.65 KB, image/png)
2009-08-26 09:00 PDT, Yusuke Sato
no flags
proposed patch v1 (7.33 KB, patch)
2009-08-26 10:29 PDT, Yusuke Sato
no flags
screenshot after applying the v1 patch (98.06 KB, image/png)
2009-08-26 10:33 PDT, Yusuke Sato
no flags
patch v2 (6.99 KB, patch)
2009-08-26 10:57 PDT, Yusuke Sato
eric: review-
patch v3 (5.49 KB, patch)
2009-08-27 21:10 PDT, Yusuke Sato
no flags

Description Yusuke Sato 2009-08-26 09:00:28 PDT
Created attachment 38615
screenshot

Text sequences that contain combining diacritical marks (e.g., "e" followed by U+0300 COMBINING GRAVE ACCENT, which displays as "è") are not rendered correctly on Chromium Linux, while precomposed forms (e.g., "&#x00e8;") are rendered correctly. For example, the grave accent of the decomposed "è" is positioned too high (see the screenshot). We can't enable the LayoutTests/fast/text/stroking*.html tests on Linux until this issue is fixed.
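To illustrate the difference between the two spellings: NFC normalization maps "e" + U+0300 to the single precomposed code point U+00E8, so a renderer that handles precomposed characters correctly then sees the same input in both cases. The sketch below is a toy, not the patch: composeKnownPairs(), composePair(), and the five-entry table are hypothetical, and real code should call ICU's NFC normalizer instead.

```cpp
#include <cstddef>
#include <string>

// Hypothetical, hand-written subset of canonical composition pairs
// (base letter + U+0300 COMBINING GRAVE ACCENT). Not a real normalizer.
struct CompositionPair {
    char16_t base;
    char16_t mark;
    char16_t composed;
};

const CompositionPair kPairs[] = {
    { u'a', 0x0300, 0x00E0 }, // a + grave -> U+00E0
    { u'e', 0x0300, 0x00E8 }, // e + grave -> U+00E8
    { u'i', 0x0300, 0x00EC }, // i + grave -> U+00EC
    { u'o', 0x0300, 0x00F2 }, // o + grave -> U+00F2
    { u'u', 0x0300, 0x00F9 }, // u + grave -> U+00F9
};

// Returns the precomposed character for (base, mark), or 0 if the table has none.
char16_t composePair(char16_t base, char16_t mark)
{
    for (const CompositionPair& pair : kPairs) {
        if (pair.base == base && pair.mark == mark)
            return pair.composed;
    }
    return 0;
}

// Toy "NFC": replaces each known base + mark pair with its precomposed form
// and copies every other code unit unchanged.
std::u16string composeKnownPairs(const std::u16string& input)
{
    std::u16string output;
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (i + 1 < input.size()) {
            if (char16_t composed = composePair(input[i], input[i + 1])) {
                output.push_back(composed);
                ++i; // skip the combining mark we just consumed
                continue;
            }
        }
        output.push_back(input[i]);
    }
    return output;
}
```

A real NFC implementation also reorders marks by canonical combining class and covers all of Unicode; the toy only shows why the decomposed and precomposed inputs end up as the same code point.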

Chromium bug: http://code.google.com/p/chromium/issues/detail?id=20130
Comment 1 Dimitri Glazkov (Google) 2009-08-26 09:37:41 PDT
Adding agl to the bug, since he added complex text support to Chromium Linux.
Comment 2 Yusuke Sato 2009-08-26 10:29:22 PDT
Created attachment 38619
proposed patch v1
Comment 3 Yusuke Sato 2009-08-26 10:33:08 PDT
Created attachment 38620
screenshot after applying the v1 patch
Comment 4 Adam Langley 2009-08-26 10:37:56 PDT
Comment on attachment 38619
proposed patch v1

(I am not a WebKit reviewer): LGTM


> -    const unsigned m_startingX; // Offset in pixels of the first script run.
> +    unsigned m_startingX; // Offset in pixels of the first script run.

Why make this non-const? Maybe I missed something, but I don't see that you need to mutate it anywhere.
Comment 5 Yusuke Sato 2009-08-26 10:41:06 PDT
Oops, that's simply a typo. I'll attach v2 patch shortly. Thanks!
Comment 6 Yusuke Sato 2009-08-26 10:57:28 PDT
Created attachment 38623
patch v2
Comment 7 Adam Langley 2009-08-26 10:59:50 PDT
Comment on attachment 38623
patch v2

(I am not a WebKit reviewer): LGTM
Comment 8 Eric Seidel (no email) 2009-08-27 13:43:55 PDT
Comment on attachment 38623
patch v2

Two nits:

We can get rid of the delete() code by using:
OwnPtr<TextRun> m_normalizedRun;
OwnArrayPtr<UChar> m_normalizedBuffer;

Then you just set m_run = m_normalizedRun.get();

In fact, then it still could be a const TextRun&, it doesn't need to change to be a pointer (although it can be made one if you feel that's cleaner).

This should use early return:
    if (U_SUCCESS(error)) {

if (U_FAILURE(error))
     return;
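Both review nits can be combined into one sketch. It is hypothetical: TextRun, RunWalker, and normalizeText() are illustrative names rather than the patch's code, and std::unique_ptr stands in for WebKit's OwnPtr/OwnArrayPtr so the example compiles on its own. It shows owning pointers replacing manual delete calls, m_run staying a const TextRun&, and the early return on failure.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical stand-in for WebCore's TextRun: a non-owning view of UTF-16 text.
struct TextRun {
    const char16_t* characters;
    std::size_t length;
};

// Hypothetical normalizer standing in for the ICU call. It only copies the
// text; a null buffer is reported as an error, the way U_FAILURE(error) would.
bool normalizeText(const TextRun& in, std::u16string& out)
{
    if (!in.characters)
        return false;
    out.assign(in.characters, in.length);
    return true;
}

// Hypothetical walker: m_run refers either to the caller's run (which must
// outlive the walker) or to a copy owned by the smart pointers below.
class RunWalker {
public:
    explicit RunWalker(const TextRun& originalRun)
        : m_run(getTextRun(originalRun))
    {
    }

    const TextRun& run() const { return m_run; }
    bool usesNormalizedCopy() const { return m_normalizedRun != nullptr; }

private:
    const TextRun& getTextRun(const TextRun& originalRun)
    {
        std::u16string normalized;
        if (!normalizeText(originalRun, normalized))
            return originalRun; // early return on failure

        // Owning pointers free the buffer and run automatically; no delete needed.
        m_normalizedBuffer.reset(new char16_t[normalized.size()]);
        normalized.copy(m_normalizedBuffer.get(), normalized.size());
        m_normalizedRun.reset(new TextRun{ m_normalizedBuffer.get(), normalized.size() });
        return *m_normalizedRun;
    }

    // Declared before m_run so both are constructed (as null) before m_run's
    // initializer calls getTextRun(), which assigns to them.
    std::unique_ptr<char16_t[]> m_normalizedBuffer;
    std::unique_ptr<TextRun> m_normalizedRun;
    const TextRun& m_run;
};
```

The declaration order is the subtle part of keeping m_run a reference: because it is bound in the initializer list, the owning members it may point into have to be constructed first.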
Comment 9 Yusuke Sato 2009-08-27 21:10:17 PDT
Created attachment 38713
patch v3
Comment 10 Yusuke Sato 2009-08-27 21:13:40 PDT
Thanks for the review!

> We can get rid of the delete() code by using:
> OwnPtr<TextRun> m_normalizedRun;
> OwnArrayPtr<UChar> m_normalizedBuffer;

Done.

> In fact, then it still could be a const TextRun&, it doesn't need to change to
> be a pointer (although it can be made one if you feel that's cleaner).

Changed it back to const TextRun&.

> if (U_FAILURE(error))
>     return;

Done.
Comment 11 Eric Seidel (no email) 2009-08-31 03:42:57 PDT
Comment on attachment 38713
patch v3

Looks good.

For the future, no { } around single line ifs:
    if (block == UBLOCK_COMBINING_DIACRITICAL_MARKS) {
        return getNormalizedTextRun(originalRun);
    }
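The check quoted above decides whether a run needs normalization at all by comparing each character's Unicode block against UBLOCK_COMBINING_DIACRITICAL_MARKS. A minimal sketch in the requested style (no braces around single-line ifs) follows. It is hypothetical: isCombiningDiacriticalMark() and needsNormalization() are illustrative names, and a hand-written range check for U+0300..U+036F replaces ICU's ublock_getCode() so the example has no ICU dependency.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if ch lies in the Combining Diacritical Marks
// block (U+0300..U+036F). Stands in for
// ublock_getCode(ch) == UBLOCK_COMBINING_DIACRITICAL_MARKS.
bool isCombiningDiacriticalMark(char16_t ch)
{
    return ch >= 0x0300 && ch <= 0x036F;
}

// Returns true if the run contains at least one mark from that block and so
// should be NFC-normalized before shaping; other runs are left untouched.
bool needsNormalization(const std::u16string& run)
{
    for (std::size_t i = 0; i < run.size(); ++i) {
        if (isCombiningDiacriticalMark(run[i]))
            return true;
    }
    return false;
}
```

Scanning UTF-16 code units is enough here because the whole block lies in the BMP and no surrogate value falls inside the range.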
Comment 12 Eric Seidel (no email) 2009-08-31 03:43:57 PDT
Comment on attachment 38713
patch v3

Oh, actually... what about testing? I assume this is already covered by existing layout tests? If not, we need a test here. Removing from the commit queue for now.
Comment 13 Yusuke Sato 2009-08-31 04:16:41 PDT
> For the future, no { } around single line ifs:

I see. Thanks for the review.

>  I assume this is already covered by existing layout tests? 

Yes, fast/text/stroking.html and fast/text/stroking-decorations.html cover it.

Though these tests are currently disabled on Chromium Linux, I'm going to enable them as soon as this change is submitted and merged into the Chromium tree. Here is the Chromium change that does that: http://codereview.chromium.org/173564 (already LGTM'ed).
Comment 14 Eric Seidel (no email) 2009-09-01 03:03:49 PDT
Comment on attachment 38713
patch v3

Best to mention what tests this affects, even if they are already existing.  Anyway, good enough for now, will land.  cq+
Comment 15 Eric Seidel (no email) 2009-09-01 03:15:05 PDT
Comment on attachment 38713
patch v3

Clearing flags on attachment: 38713

Committed r47922: <http://trac.webkit.org/changeset/47922>
Comment 16 Eric Seidel (no email) 2009-09-01 03:15:09 PDT
All reviewed patches have been landed.  Closing bug.
Comment 17 Jungshik Shin 2009-09-02 16:47:17 PDT
+        // Note that we don't use the icu::Normalizer::isNormalized(UNORM_NFC) API here since
+        // the API returns FALSE (= not normalized) for complex runs that don't require NFC
+        // normalization (e.g., Arabic text). 

Yusuke, can you give me (off-line) the strings you have this problem with? If the input strings are indeed normalized and isNormalized returns false, that's a bug in ICU to fix.

BTW, the icu:: qualifier was missing for UnicodeString and I added it in r47998.
Comment 18 Yusuke Sato 2009-09-03 06:45:45 PDT
Ah, the comment I wrote was not clear enough... I meant to say "Harfbuzz can handle unnormalized Arabic strings." ICU is working correctly. Sorry for the confusion.

> BTW, icu:: qualifier was missing for UnicodeString and I added it in r47998.

Thanks!