WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
Bug 13572
Arabic characters are not correctly joined when using Arial
https://bugs.webkit.org/show_bug.cgi?id=13572
Summary
Arabic characters are not correctly joined when using Arial
Oliver Hunt
Reported
2007-05-03 01:25:26 PDT
The above url shows two divs, the first using Arial, the second in Helvetica. They should be more or less identical, however in the first case (Arial) four separate glyphs are drawn instead of the two combined glyphs produced by helvetica.
Attachments
Simple test case to show series of glyphs that aren't being merged correctly :(
(423 bytes, text/html)
2007-05-03 03:11 PDT
,
Oliver Hunt
no flags
Details
ushape.c patch
(4.44 KB, patch)
2007-05-03 05:43 PDT
,
mitz
no flags
Details
Formatted Diff
Diff
Test case rendering with the ushape.c patch
(3.48 KB, image/png)
2007-05-03 05:44 PDT
,
mitz
no flags
Details
Site rendering with the ushape.c patch
(144.52 KB, image/png)
2007-05-03 05:45 PDT
,
mitz
no flags
Details
Add a private stripped-down version of u_shapeArabic that handles the non-Arabic languages
(28.29 KB, patch)
2007-05-10 13:14 PDT
,
mitz
no flags
Details
Formatted Diff
Diff
Add a private stripped-down version of u_shapeArabic that handles the non-Arabic languages
(28.19 KB, patch)
2007-05-11 06:10 PDT
,
mitz
darin
: review+
Details
Formatted Diff
Diff
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
mitz
Comment 1
2007-05-03 02:38:44 PDT
I looked at this in the debugger. shapeArabic() is called, and the text is handed over to ICU for shaping. u_shapeArabic takes U+0686 U+0627 U+06A9 U+0648 and returns U+0686 U+FE8E U+06A9 U+FEEE Somehow at least the first and third characters don't seem to be shaped correctly. Could be a bug in ICU or in the way it's called (maybe it needs different options?). The string is small enough that I might try to run it through <
http://www.opensource.apple.com/darwinsource/10.4.9.ppc/ICU-6.2.13/icuSources/common/ushape.c
> manually and see what happens.
Oliver Hunt
Comment 2
2007-05-03 03:11:08 PDT
Created
attachment 14314
[details]
Simple test case to show series of glyphs that aren't being merged correctly :(
Oliver Hunt
Comment 3
2007-05-03 03:12:35 PDT
Here is a page that demonstrates these issues. Arial was clearly sent to damn us all :(
Oliver Hunt
Comment 4
2007-05-03 03:13:18 PDT
Oops, actually include url
http://www.voanews.com/pashto/2007-04-23-voa2.cfm
replaced original data url with this link as well
mitz
Comment 5
2007-05-03 04:21:00 PDT
OK, so that page and the test cases are in the Arabic script, but not in the Arabic language. The page is quite evidently in Pashto. The non-Arabic characters' contextual forms reside in the Arabic Presentation Forms-A block (U+FB50..U+FDFF), whereas Arabic characters' contextual forms are in Arabic Presentation Forms-B (U+FE70..U+FEFF). ICU's u_shapeArabic doesn't shape the non-Arabic characters, so they get no shaping in fonts where WebKit relies on ICU for that. However, ICU does have the linking type of those characters (left, right, both, none), so I think it might be possible to patch ICU and add the pointers to the contextual forms.
mitz
Comment 6
2007-05-03 05:43:16 PDT
Created
attachment 14315
[details]
ushape.c patch Done using <
http://www.unicode.org/Public/4.1.0/ucd/ArabicShaping.txt
> (which is identical to Unicode 5.0 for the characters in question) and Mac OS X's Character Palette.
mitz
Comment 7
2007-05-03 05:44:03 PDT
Created
attachment 14316
[details]
Test case rendering with the ushape.c patch
mitz
Comment 8
2007-05-03 05:45:43 PDT
Created
attachment 14317
[details]
Site rendering with the ushape.c patch
mitz
Comment 9
2007-05-03 07:39:16 PDT
It should be possible to build the patched u_shapeArabic into WebKit. That would also lock in (or allow fixing of) the Lam-Aleph quirk mentioned in FontMac.mm:298. I'm concerned about testing, though. I don't have a source for test data and if I make a test I'm not sure I'll know what to expect. Automatic testing poses another problem, since Tiger doesn't ship with any fonts that trigger the ICU code path. It is now possible to bundle extra fonts with DumpRenderTree, but I don't know if there's a suitable font that's compatible with WebKit's licensing requirements. In the long term, I think it would be nice to create a font for testing this and a few other code paths in WebKit that are only selected for fonts with pathologies that don't exist in standard fonts (e.g. non-zero-width zero-width-space, non-mirrored mirrored characters).
mitz
Comment 10
2007-05-03 09:27:00 PDT
Only now it occurred to me to check out ICU TOT. It has a few of those fixed <
http://source.icu-project.org/repos/icu/icu/trunk/source/common/ushape.c
> and it agrees with what I did in
attachment 14315
[details]
. I wonder if the rest are needed. A compile-time version check on ICU might be a good idea, if there's a release that includes what's currently TOT and if that's sufficient.
Darin Adler
Comment 11
2007-05-04 22:20:11 PDT
<
rdar://problem/5183696
>
mitz
Comment 12
2007-05-05 01:36:19 PDT
This is the ICU patch: <
http://bugs.icu-project.org/trac/changeset/20705
>. As far as I can tell it is not included in any ICU release.
mitz
Comment 13
2007-05-10 13:14:45 PDT
Created
attachment 14476
[details]
Add a private stripped-down version of u_shapeArabic that handles the non-Arabic languages What a mess. I started out just copying ushape.c and patching it, but I ended up deleting stuff that's unused in WebKit and removing code that reversed the string twice.
Darin Adler
Comment 14
2007-05-10 13:40:51 PDT
Comment on
attachment 14476
[details]
Add a private stripped-down version of u_shapeArabic that handles the non-Arabic languages I'm surprised by the function name. In what way is this Macintosh-specific? It's hard to review the differences in this between the ICU version and the new version because the diff just shows the new code. I don't think we need to use U_CAPI and U_EXPORT2 in an internal WebCore header. Typically we capitalize filenames, so I would call this ShapeArabicMac.h/cpp. I'm comfortable with this patch despite those concerns. So I'm going to mark this review+ -- feel free to clear the review flag if you want to do a new version.
mitz
Comment 15
2007-05-10 13:43:59 PDT
Comment on
attachment 14476
[details]
Add a private stripped-down version of u_shapeArabic that handles the non-Arabic languages I'm going to address at least some of Darin's comments.
mitz
Comment 16
2007-05-11 06:10:17 PDT
Created
attachment 14488
[details]
Add a private stripped-down version of u_shapeArabic that handles the non-Arabic languages (In reply to
comment #14
)
> (From update of
attachment 14476
[details]
[edit]) > I'm surprised by the function name. In what way is this Macintosh-specific?
I removed "Mac" from the function and file names, but I left them in the platform/mac subdirectory, since this code is not cross-platform (and in that sense it is Macintosh-specific). I really hope this hack can be removed in the future.
> I don't think we need to use U_CAPI and U_EXPORT2 in an internal WebCore > header.
Fixed.
> Typically we capitalize filenames, so I would call this ShapeArabicMac.h/cpp.
Fixed.
Darin Adler
Comment 17
2007-05-11 09:02:13 PDT
Comment on
attachment 14488
[details]
Add a private stripped-down version of u_shapeArabic that handles the non-Arabic languages r=me
Mark Rowe (bdash)
Comment 18
2007-05-11 09:47:31 PDT
Landed in
r21408
.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug