Bug 89843 - REGRESSION (r95391): Arabic shaping is incorrect if ZWNJ exist
Summary: REGRESSION (r95391): Arabic shaping is incorrect if ZWNJ exist
Alias: None
Product: WebKit
Classification: Unclassified
Component: Text (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P1 Normal
Assignee: Kenichi Ishibashi
URL: http://crbug.com/133720
Keywords: InRadar, Regression
Depends on:
Reported: 2012-06-24 18:38 PDT by Kenichi Ishibashi
Modified: 2012-07-01 16:08 PDT (History)
4 users (show)

See Also:

test case (333 bytes, text/html)
2012-06-24 18:39 PDT, Kenichi Ishibashi
no flags Details
Patch (5.98 KB, patch)
2012-06-24 18:44 PDT, Kenichi Ishibashi
no flags Details | Formatted Diff | Diff
screenshot of WebKit nightly and Firefox13 (227.07 KB, image/png)
2012-06-27 01:32 PDT, Kenichi Ishibashi
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Kenichi Ishibashi 2012-06-24 18:38:37 PDT
Details at http://crbug.com/133720.

WebKit ToT mac port doesn't render Arabic script correctly when there is ZWNJ in the text. Test case is attached.

The stable version of Safari can render it correctly. The difference comes from http://trac.webkit.org/changeset/95391. The change fixed a bug related with combining character sequence and handled ZWJ and ZWNJ as a part of combining character sequence. The cause is that fontDataForCombiningCharacterSequence() is failing when the sequence contains ZWNJ and the font doesn't contain glyph for ZWNJ. As the result, the word is divided into two complex text runs and they are rendered in isolated form.

To fix the problem, I suggest:
- Treat ZWJ and ZWNJ as base characters. I don't think ZWJ and ZWNJ compose a combining character sequence because they just tell whether to form cursive connection or not (they aren't combined with other character).
- If we see a ZWJ, don't separate a complex text run at that point so that CoreText can handle it.

I'll post a patch soon.
Comment 1 Kenichi Ishibashi 2012-06-24 18:39:07 PDT
Created attachment 149220 [details]
test case
Comment 2 Kenichi Ishibashi 2012-06-24 18:44:52 PDT
Created attachment 149221 [details]
Comment 3 Kenichi Ishibashi 2012-06-24 18:47:47 PDT
Hi mitz,

Could you take a look?
Comment 4 Kenichi Ishibashi 2012-06-26 21:38:32 PDT
Ping mitz?
Comment 5 Alexey Proskuryakov 2012-06-27 01:18:35 PDT
Does this affect any real life web sites?
Comment 6 Kenichi Ishibashi 2012-06-27 01:32:08 PDT
(In reply to comment #5)
> Does this affect any real life web sites?

I think so. http://www.google.com/intl/fa/goodtoknow/ is an example. I'll attach a screenshot of WebKit nightly and Firefox 13.
Comment 7 Kenichi Ishibashi 2012-06-27 01:32:38 PDT
Created attachment 149704 [details]
screenshot of WebKit nightly and Firefox13
Comment 8 Alexey Proskuryakov 2012-06-27 01:37:57 PDT
Comment 9 Behdad Esfahbod 2012-06-27 05:06:06 PDT
Given that Geeza Pro doesn't have a ZWNJ glyph, and it's default Arabic font on OS X, pretty much any Persian website that doesn't request Arial or Tahoma is broken, as well as any using Google's Droid Arabic Naskh as a webfont.   I'm sure there are more fonts without those glyphs out there...
Comment 10 WebKit Review Bot 2012-07-01 16:08:22 PDT
Comment on attachment 149221 [details]

Clearing flags on attachment: 149221

Committed r121643: <http://trac.webkit.org/changeset/121643>
Comment 11 WebKit Review Bot 2012-07-01 16:08:27 PDT
All reviewed patches have been landed.  Closing bug.