Bug 6626 - Arabic & Farsi rendered with no shaping (all glyphs separate, unreadable!)
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Layout and Rendering
Version: 420+
Hardware: Mac OS X 10.4
Importance: P2 Critical
Assignee: Nobody
URL: http://news.bbc.co.uk/hi/arabic/news/
Keywords: InRadar
Duplicates: 9770
Depends on:
Blocks:
 
Reported: 2006-01-17 15:13 PST by Neema Aghamohammadi
Modified: 2007-06-03 09:18 PDT
CC List: 7 users

See Also:


Attachments
Snapshot of website in Safari (48.18 KB, image/jpeg)
2006-01-17 15:14 PST, Neema Aghamohammadi
no flags
ICU shaping applied (78.57 KB, image/png)
2006-03-09 04:33 PST, mitz
no flags
Use ICU to shape Arabic when the font does not contain shaping information (18.46 KB, patch)
2007-02-04 09:56 PST, mitz
no flags
Use ICU to shape Arabic when the font does not contain shaping information (20.75 KB, patch)
2007-02-05 00:27 PST, mitz
darin: review+

Description Neema Aghamohammadi 2006-01-17 15:13:23 PST
Arabic and Farsi are not displayed properly. Characters are not connected to each other as they should be, and as a
result the text is impossible to read. This bug has persisted since Panther (10.3.0). It is 'critical' because I am
unable to test for further bugs in Arabic/Farsi when I cannot read the pages at all. For a
comparison, please view the following:

http://home.earthlink.net/~nagha/safari-bbc.jpg
http://home.earthlink.net/~nagha/firefox-bbc.jpg
Comment 1 Neema Aghamohammadi 2006-01-17 15:14:57 PST
Created attachment 5745
Snapshot of website in Safari

Arabic/Farsi characters are connected to each other depending on nature of
previous and next character (from right to left). Safari does not make any
attempt to connect the characters.
Comment 2 Darin Adler 2006-01-17 15:22:20 PST
I think this has something to do with what fonts you have installed. The site looks fine for me, not broken 
the way the attached picture shows it.
Comment 3 Darin Adler 2006-01-17 16:47:44 PST
Doing some experiments with Mitz Pettel and a couple others, we seem to see a pattern: This problem 
happens if you have the MS Arial font installed and doesn't happen without it.

We're still not sure how to fix it, but we'll figure it out soon.

Bugs like bug 6148 will show you that some others see shaping working much better on most Arabic 
pages.
Comment 4 Ahmad 2006-01-21 02:10:14 PST
(In reply to comment #3)
Neema;

This is a known problem among Persian/Arabic OS X users (and a source of great irritation too). Had you checked IRMUG (www.irmug.org), you could have gotten rid of it long ago. Safari has some problems dealing with bidi, but this problem is rather a bug in the Cocoa text engine/AAT technology on which Safari depends to display text.
When you install MS Office, it installs some fonts with support for Arabic/Persian (the same fonts that are installed with its Windows counterpart): Arial, Times New Roman, and Tahoma. Arabic BBC has a tag in its HTML code that tells web browsers which font to use to display its content in Persian, Arabic, Urdu, etc., and it names one of the fonts mentioned above.
But the text engine/AAT on OS X does not support fonts that depend on cursive connectivity (Arabic script) if they do not contain an AAT table. You see, OS X has a problem with all TTF and OpenType fonts ONLY and ONLY when it comes to Arabic script. Other scripts like Hebrew, CJK, etc. are not affected. Since version 10.4, OS X translates all TTF/OT fonts into AAT for better compatibility. There is an exception to this and, as usual, the exception is all Arabic-script TTF/OT fonts. We at IRMUG have been reporting this to the Fonts and International group at Apple again and again for at least two years, with no result so far. Try writing some text in TextEdit and applying one of the above-mentioned fonts to it: you'll see the same artifact, with all letters displayed in initial form and connectivity gone.
The solution (until Apple fixes this at the text engine/AAT level) is to trash these fonts, which are located in /User/Library/Fonts. There are other versions of these fonts located in the System library; you do not need to bother about those.

When it comes to Safari, the problem is that Safari does not let users define their own font for viewing web page content. Although there is such an option in its preferences, you cannot force it to use a given font for a certain script the way you can in Gecko-based browsers (Firefox, for example).

In the end, I hope that Darin, David, and other Apple engineers who read this will use their influence and contacts at Apple to solve this nasty bug.

Regards
Ahmad


Comment 5 mitz 2006-02-03 03:02:07 PST
(In reply to comment #4)
>
> But the problem is that text engine/AAT on OS X does not support the fonts which
> are dependent on cursive connectivity (Arabic script) if they do not contain an AAT
> table. You see, OS X has a problem with all TTF and OpenType fonts ONLY and ONLY
> when it comes to Arabic script. All other scripts like Hebrew, CJK etc. are not
> affected.

Actually, Hebrew Niqqud placement is also broken with these fonts. I have looked at one of them (Arial), and as far as I could see, the font file itself doesn't contain the information needed for shaping (or positioning) in any format, neither TrueType mort/morx nor OpenType GSUB/GPOS. In fact, I don't see how these fonts render properly in Windows unless the system applies to them some private knowledge (perhaps a default set of tables, based on hard-coded glyph IDs).

I'm not sure if and how ATS can identify these fonts as "broken" and "fix" them. On the WebKit level, it may be possible to have a heuristic that checks if a font doesn't have any (or sufficiently rich) shaping information, and in that case avoid it entirely for characters in the Arabic range. This will not work for Hebrew, though.

As a workaround, maybe a user style sheet can be used to override Arial even when it is specified by the page and replace it with another font.
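
A minimal sketch of the table check behind such a heuristic, assuming a single-font sfnt file (TrueType/OpenType) already loaded into memory. The names `fontHasTable` and `fontLacksShapingTables` are hypothetical and are not WebKit or ATS API. The code only looks for the `mort`, `morx` and `GSUB` tags in the table directory; it does not check whether a table that is present actually contains Arabic shaping rules, so it covers the "doesn't have any" case and not the "sufficiently rich" one.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Reads a big-endian 16-bit value from the font data.
static uint16_t readU16(const std::vector<uint8_t>& data, size_t offset)
{
    return static_cast<uint16_t>((data[offset] << 8) | data[offset + 1]);
}

// Returns true if the sfnt table directory lists a table with the given 4-byte tag.
// Layout: a 12-byte header (numTables at byte offset 4) followed by 16-byte
// table records, each of which starts with the 4-byte tag.
bool fontHasTable(const std::vector<uint8_t>& fontData, const char* tag)
{
    const size_t headerSize = 12;
    const size_t recordSize = 16;
    if (fontData.size() < headerSize)
        return false;
    size_t numTables = readU16(fontData, 4);
    for (size_t i = 0; i < numTables; ++i) {
        size_t record = headerSize + i * recordSize;
        if (record + recordSize > fontData.size())
            return false; // Truncated directory.
        if (!std::memcmp(&fontData[record], tag, 4))
            return true;
    }
    return false;
}

// Hypothetical heuristic: the font carries no glyph substitution table in
// either the AAT (mort/morx) or the OpenType (GSUB) format.
bool fontLacksShapingTables(const std::vector<uint8_t>& fontData)
{
    return !fontHasTable(fontData, "mort")
        && !fontHasTable(fontData, "morx")
        && !fontHasTable(fontData, "GSUB");
}
```

A caller would combine `fontLacksShapingTables` with a check that the run contains characters in the Arabic range before deciding to avoid the font.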
Comment 6 Ahmad 2006-02-06 10:23:42 PST
(In reply to comment #5)

Dear Mitz;

The problem of broken cursive connectivity is not limited to these "bad" fonts (Tahoma, Arial, Times New Roman). It is rather a problem in the ATS/AAT technology and the OS X text engine. For example, all of these fonts work properly in Mellel and InDesign ME, because these two programs use their own text engines rather than the one built into OS X. As I mentioned in my previous post, OS X has a problem dealing with ALL non-Apple Arabic/Persian/Urdu fonts (those without an AAT table), whether they are TTF or OT. To prove this, I can privately send you some TTF and OT fonts so that you can test and see for yourself. Although the Arabic OT and TTF fonts I have were designed by professionals and have all the necessary resources, they are displayed broken by TextEdit and any other application that depends on OS X ATS/AAT to handle text. As for "mort/morx, OpenType GSUB/GPOS", since I do not know about these, I will ask a friend who is an expert to comment on these issues.


As for the workaround you suggested using a style sheet, I have tried it, but it does not work properly in either Safari or WebKit. Besides, having a user load a style sheet just to be able to set fonts other than the defaults is not a good solution. It should be possible for users to define fonts for different script systems, as is done (and works) in Mozilla. WebKit should respect the user's choice of fonts and let it override the default fonts, something that is not possible in WebKit's current version.


best regards, Ahmad




Comment 7 Behnam 2006-02-06 13:52:06 PST
(In reply to comment #6)

Hi,

Here, in a nutshell, is Apple's problem with Arabic OT-enhanced TrueType fonts:
OS 10.4 can now 'translate' the OT instructions of a font into its own AAT environment. This, by the way, also includes the GPOS and GSUB tables of OT. Most OT tables can now be interpreted in Apple's AAT environment, except the most crucial part of Uniscribe for Arabic script, which deals with the contextualization of Arabic letters. Naturally, when the initial, medial and final forms of letters are not produced, the other OT features that ARE interpreted in the AAT environment are completely meaningless. Therefore, it is absolutely necessary for a TTF font to contain AAT tables to be able to contextualize Arabic script.
So for a Mac user browsing Arabic-script web pages, the browser's ability to give the user control over the font choice (to override the MS font calls of most web pages) is crucially important.

Cheers,
Behnam

Comment 8 Behnam 2006-02-11 17:41:09 PST
(In reply to comment #7)
I discussed the matter with Ahmad and other members of IRMUG and we believe the following suggestion can be used as the basis for a solution:

	1- Safari should have a separate default font selection for rtl.
	2- This default should have the ability to override the font selection of web-pages but override is inactive by default.
	3- 'RTL Override' can be activated via an icon on Safari menu bar (something like 'bug' of bug report)

With these features in Safari, an Arabic script user can do the following:
	1- Select an AAT-compatible Arabic font of his or her choice as the rtl default font in Safari's Preferences.
	2- Add the 'RTL Override' icon to Safari's menu bar.
	3- Browse, and on encountering a page with broken Arabic text, click the icon to redraw the page's text with the rtl default font.

This solution gives priority to the font selected by the page designer, and the override only takes place when the text is not contextualized properly. This is much better than a 'blanket' override by some style sheet.

The current workaround of removing MS fonts that contain Arabic support is not sustainable. Times New Roman, Arial, Courier, etc. are important fonts, and a Mac user may need them for work and projects unrelated to Arabic script, especially in work environments where a Mac has to exchange a lot with Windows.
Also, the same fonts, on the same Mac, can be used for Arabic script in some OT-based applications without any problem. Removing them from the computer is simply not a sustainable option.

Looking forward to your comments,

Regards,
Behnam
Comment 9 Neema Aghamohammadi 2006-02-11 18:21:17 PST
(In reply to comment #8)

While Behnam's suggestion might work, it is far from elegant and makes the user do work that should not be necessary. Essentially, it's a very un-"Maclike" solution. An enhanced preference panel for RTL languages is probably a better solution. Basically, one shouldn't have to futz to make RTL work. It should "just work."
Comment 10 Maciej Stachowiak 2006-02-11 18:38:53 PST
I think the real solution is to make the OS X text system properly understand the information in these fonts for shaping Arabic and Farsi scripts. This would best be done by filing an appropriate Radar bug against the OS X text system.
Comment 11 Behnam 2006-02-11 20:56:11 PST
(In reply to comment #10)
> I think the real solution is to make the OS X text system properly understand
> the information in these fonts for shaping Arabic and Farsi scripts. This would
> best be done by filing an appropriate Radar bug against the OS X text system.
> 

This issue is not unknown to Apple. Based on the indications that I get from Apple (which are none!) and on the few things that I know about the complexity of this task, I don't think interpreting Uniscribe contextualization in the AAT environment will happen anytime soon. Even if the next OS (and not the next update) implements this interpretation, it will be full of bugs for quite some time and we will regret not having that icon on the menu bar!
This is a practical solution based on reality. I also fully disagree that it is inelegant. It's just a button that helps us out until Uniscribe is fully interpreted... bug free.
Comment 12 mitz 2006-03-09 04:33:12 PST
Created attachment 6955
ICU shaping applied

What Windows does when the font does not contain shaping instructions (and maybe even when it does) is perform shaping according to the Unicode standard (Chapter 8 and http://www.unicode.org/Public/UNIDATA/ArabicShaping.txt), relying on the fact that initial/medial/final/isolated forms are distinct characters. ATSUI does not shape unless the font contains instructions.

There was a similar problem with character mirroring (bug 3435), where the fix was to determine heuristically if ATSUI is going to mirror, and if not, invoke an ICU routine to do the mirroring.

This screenshot demonstrates that a similar approach may work here. It was produced by invoking the ICU function u_shapeArabic() on all (ATSUI-rendered) text on the page. The real fix would have to:

1) Use some heuristic (or hard-coded list of those MS Office-originating fonts) to decide that a given font contains Arabic, does not contain shaping instructions, and does contain 'cmap' entries for the contextual forms.

2) Decide when a run needs to be shaped by u_shapeArabic() (which is similar to deciding if it contains Arabic characters). This may belong in shouldUseATSU.
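
To illustrate what shaping "according to the Unicode standard" means, here is a minimal, self-contained sketch that replaces base letters with their Arabic Presentation Forms-B code points based on their neighbours. It does not call ICU and is not what u_shapeArabic() does internally. The table covers only four letters as an assumption for brevity, and the names `LetterForms`, `letterTable` and `shapeWithPresentationForms` are hypothetical. Lam-Alef ligatures and transparent combining marks are deliberately ignored.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Joining behaviour and presentation forms for a single base letter.
struct LetterForms {
    bool dualJoining;      // true: joins on both sides; false: joins only to the preceding letter
    char16_t isolatedForm;
    char16_t finalForm;
    char16_t initialForm;  // 0 for right-joining letters, which have no such form
    char16_t medialForm;   // 0 for right-joining letters, which have no such form
};

// A tiny excerpt of the data; a real table would cover all of ArabicShaping.txt.
static const std::map<char16_t, LetterForms>& letterTable()
{
    static const std::map<char16_t, LetterForms> table = {
        { 0x0627, { false, 0xFE8D, 0xFE8E, 0,      0      } }, // ALEF
        { 0x0628, { true,  0xFE8F, 0xFE90, 0xFE91, 0xFE92 } }, // BEH
        { 0x0644, { true,  0xFEDD, 0xFEDE, 0xFEDF, 0xFEE0 } }, // LAM
        { 0x0645, { true,  0xFEE1, 0xFEE2, 0xFEE3, 0xFEE4 } }, // MEEM
    };
    return table;
}

// Picks the isolated, initial, medial or final form of each known letter,
// working on text in logical (typing) order.
std::u16string shapeWithPresentationForms(const std::u16string& text)
{
    const std::map<char16_t, LetterForms>& table = letterTable();
    std::u16string shaped = text;
    for (size_t i = 0; i < text.size(); ++i) {
        auto current = table.find(text[i]);
        if (current == table.end())
            continue; // Unknown character: leave it unchanged.

        // The preceding letter connects forward only if it is dual-joining.
        bool joinsPrevious = false;
        if (i > 0) {
            auto previous = table.find(text[i - 1]);
            joinsPrevious = previous != table.end() && previous->second.dualJoining;
        }
        // This letter connects forward only if it is dual-joining itself
        // and the following character is a letter that can join.
        bool joinsNext = current->second.dualJoining
            && i + 1 < text.size()
            && table.count(text[i + 1]) > 0;

        const LetterForms& forms = current->second;
        if (joinsPrevious && joinsNext)
            shaped[i] = forms.medialForm;
        else if (joinsPrevious)
            shaped[i] = forms.finalForm;
        else if (joinsNext)
            shaped[i] = forms.initialForm;
        else
            shaped[i] = forms.isolatedForm;
    }
    return shaped;
}
```

The output only renders correctly if the font maps the presentation-form code points in its 'cmap', which is the condition named in step 1 above.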
Comment 13 Darin Adler 2006-03-09 08:53:40 PST
(In reply to comment #12)

Mitz's approach sounds good. We should work out the details.
Comment 14 Behnam 2006-03-09 13:18:54 PST
(In reply to comment #13)
As far as I know, Camino and Firefox currently do the contextualization based on the Unicode code points of the individual shaped forms of each letter. It's a much better solution than the current situation, but bear in mind that it doesn't support the font's fine typographic features (if present), and that it is limited to fonts that encode Unicode code points for the glyphs of Arabic Presentation Forms A and B. It will solve 99% of the current problem, but it shouldn't be considered the definitive solution to Arabic rendering, because many fine fonts do not encode presentation forms with Unicode code points and rely only on OT features to produce them.
Some other languages written in Arabic script, like Kurdish, have letters with no Unicode code points for their presentation forms at all. They rely solely on font technology to provide them.
The definitive solution would be to fully support OT, either directly or by translating to AAT.
Comment 15 Alexey Proskuryakov 2006-07-06 21:29:27 PDT
*** Bug 9770 has been marked as a duplicate of this bug. ***
Comment 16 mitz 2007-02-04 09:56:27 PST
Created attachment 12920
Use ICU to shape Arabic when the font does not contain shaping information
Comment 17 mitz 2007-02-04 10:05:18 PST
Bug 9770 says this is <rdar://problem/4216018>.
Comment 18 Darin Adler 2007-02-04 18:08:05 PST
Comment on attachment 12920
Use ICU to shape Arabic when the font does not contain shaping information

+            if (isArabicLamWithAlefLigature(source[shapingEnd]) && source[shapingEnd + 1] == ' ')
+                foundLigatureSpace = true;

It's more efficient and probably clear enough to just make this an assignment rather than an if statement.
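
A sketch of the suggested form, under two assumptions that are not confirmed by this thread: that `isArabicLamWithAlefLigature` tests for the Lam-with-Alef ligatures at U+FEF5..U+FEFC, and that the flag is false before the statement runs (otherwise the `if` and the assignment are not equivalent). The wrapper `isLigatureFollowedBySpace` is hypothetical and exists only to make the fragment compilable; `char16_t` stands in for ICU's `UChar`.

```cpp
#include <cstddef>

// Assumed definition: the Lam-with-Alef ligatures occupy U+FEF5..U+FEFC
// in Arabic Presentation Forms-B.
static bool isArabicLamWithAlefLigature(char16_t c)
{
    return c >= 0xFEF5 && c <= 0xFEFC;
}

// Reports whether source[index] is a ligature immediately followed by a space.
// The caller must guarantee that index + 1 is inside the buffer.
bool isLigatureFollowedBySpace(const char16_t* source, size_t index)
{
    // Direct assignment of the condition, instead of "if (...) flag = true;".
    bool foundLigatureSpace = isArabicLamWithAlefLigature(source[index]) && source[index + 1] == u' ';
    return foundLigatureSpace;
}
```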

+    m_charBuffer = (UChar*)(font->isSmallCaps() ? new UChar[m_run.length()] : 0);

Why do we need a type cast here?
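
For reference, a small sketch showing that the conditional expression compiles without the cast in standard C++. `allocateCharBuffer` and its `needsBuffer` parameter are hypothetical stand-ins for the constructor code and `font->isSmallCaps()`, and `char16_t` stands in for `UChar`.

```cpp
#include <cstddef>

// Allocates a scratch buffer only when one is needed; otherwise returns a null pointer.
// The caller owns the returned buffer and must release it with delete[].
char16_t* allocateCharBuffer(bool needsBuffer, size_t length)
{
    // One operand is char16_t* and the other is the null pointer constant 0,
    // so the whole expression already has type char16_t*; no cast is required.
    return needsBuffer ? new char16_t[length] : 0;
}
```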

r=me
Comment 19 mitz 2007-02-04 22:34:52 PST
Comment on attachment 12920
Use ICU to shape Arabic when the font does not contain shaping information

Thanks for the review! I am going to address the comments and post an updated patch.
Comment 20 mitz 2007-02-05 00:27:05 PST
Created attachment 12929
Use ICU to shape Arabic when the font does not contain shaping information

Besides addressing Darin's comments, I fixed two problems with the previous patch:
1) Zero-width spaces are now forced to be zero width.
2) Work around a quirk in u_shapeArabic when the Lam and Alef are the last characters in the source string. In this case, despite U_SHAPE_LENGTH_FIXED_SPACES_NEAR, the space is added at the beginning of the shaped string.
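
The following is a rough sketch of how a post-processing pass over already-shaped text could handle both points. It is not taken from the patch. It assumes that fixed-length shaping turns each Lam + Alef pair into one ligature plus one padding space, that the padding normally follows the ligature, and that the source text had no real space in those positions. The name `hidePaddingSpaces` is hypothetical.

```cpp
#include <cstddef>
#include <string>

static const char16_t zeroWidthSpace = 0x200B;

// Assumed definition: Lam-with-Alef ligatures occupy U+FEF5..U+FEFC.
static bool isArabicLamWithAlefLigature(char16_t c)
{
    return c >= 0xFEF5 && c <= 0xFEFC;
}

// Replaces the padding space that accompanies each Lam-Alef ligature with a
// zero-width space, so the padding does not take up room when rendered.
void hidePaddingSpaces(std::u16string& shaped)
{
    if (shaped.empty())
        return;

    // Usual case: the padding space directly follows the ligature.
    for (size_t i = 0; i + 1 < shaped.size(); ++i) {
        if (isArabicLamWithAlefLigature(shaped[i]) && shaped[i + 1] == u' ')
            shaped[++i] = zeroWidthSpace;
    }

    // Quirk case described above: the ligature ends the string and the
    // padding space was placed at the very beginning instead.
    if (isArabicLamWithAlefLigature(shaped.back()) && shaped.front() == u' ')
        shaped.front() = zeroWidthSpace;
}
```

Because a genuine space next to a ligature is indistinguishable from padding after shaping, a real implementation has to track which spaces existed in the source before calling the shaper.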
Comment 21 Darin Adler 2007-02-05 10:04:34 PST
Comment on attachment 12929
Use ICU to shape Arabic when the font does not contain shaping information

r=me
Comment 22 Alexey Proskuryakov 2007-02-05 10:57:06 PST
Committed revision 19407.
Comment 23 Behnam 2007-06-03 08:25:22 PDT
The ICU version used here does not cover Persian-specific characters. I'm told there will be a new version of ICU in the near future, but the fix is already available here:
http://projects.foss.ir/projects/icu
Comment 24 mitz 2007-06-03 09:18:52 PDT
(In reply to comment #23)
> The ICU implemented does not cover Persian specific characters. I'm told there
> will be a new version of ICU in near future. But it is already presented here:
> http://projects.foss.ir/projects/icu
> 

This was addressed in WebKit bug 13572 and fixed in <http://trac.webkit.org/projects/webkit/changeset/21408>. As noted in that bug, the ICU patch <http://bugs.icu-project.org/trac/changeset/20705> has not been integrated yet.