Bug 6148

Summary: WebKit doesn't shape characters (like Arabic) across style changes
Product: WebKit
Reporter: Rosyna <webkit-bugs@gentlyusedunderwear.com>
Component: Layout and Rendering
Assignee: Myles C. Maxfield <mmaxfield@apple.com>
Status: NEW
Severity: Normal
CC: abuzakham@gmail.com, ahmad.moshref@gmail.com, amir.aharoni@mail.huji.ac.il, ap@webkit.org, bashar.harfoush@gmail.com, behdad@google.com, brettw@chromium.org, daniel@wagner-home.com, dbates@webkit.org, dermot_rourke@yahoo.com, ehsan@mozilla.com, eric@webkit.org, glenn@skynav.com, hassan_deldar@yahoo.com, ian@hixie.ch, mitz@webkit.org, mmaxfield@apple.com, mostafa.h@gmail.com, munzirtaha@gmail.com, nickshanks@nickshanks.com, noonon@gmail.com, playmobil@google.com, rik@webkit.org, seikwon.kim@samsung.com, simon.fraser@apple.com
Priority: P2
Keywords: InRadar
Version: 420+   
Hardware: All   
OS: All   
URL: http://blogs.msdn.com/michkap/archive/2005/12/19/505309.aspx
Bug Depends on:    
Bug Blocks: 47213    
Attachments:
Description                  Flags
Arabic Style Shaping test.   none
test from bug 17116          none
Testcase from bug 91975      none

Description From 2005-12-19 04:47:00 PST
See the attached HTML document. The runs should look the same apart from their colors, as they do in IE for Windows.
------- Comment #1 From 2005-12-19 04:47:44 PST -------
Created an attachment (id=5149) [details]
Arabic Style Shaping test.
------- Comment #2 From 2008-01-31 13:11:48 PST -------
*** Bug 17116 has been marked as a duplicate of this bug. ***
------- Comment #3 From 2008-01-31 17:20:30 PST -------
<rdar://problem/5718885>
------- Comment #4 From 2008-01-31 23:21:37 PST -------
Created an attachment (id=18841) [details]
test from bug 17116

The original tests pass in Firefox 3 beta, but some of these do not.
------- Comment #5 From 2009-01-07 20:13:11 PST -------
Filed in Chrome as http://code.google.com/p/chromium/issues/detail?id=6122
------- Comment #6 From 2011-03-25 03:19:37 PST -------
*** Bug 47213 has been marked as a duplicate of this bug. ***
------- Comment #7 From 2011-05-30 22:45:12 PST -------
If nobody is actively working on this, I'm willing to take a pass at a patch.
------- Comment #8 From 2011-05-30 23:46:02 PST -------
Go for it!
------- Comment #9 From 2011-06-03 20:57:45 PST -------
Initial investigation shows that BidiResolver::createBidiRunsForLine breaks runs at a display:inline element boundary (e.g., a span), even if the eventual embedding levels on both sides of the boundary are the same. This causes RenderBlock::createLineBoxesFromBidiRuns to create distinct inline boxes across the boundary, which prevents complex text shaping from applying shaping context across it.
------- Comment #10 From 2011-06-06 17:35:13 PST -------
Yes.  LineBoxes have a pointer back to their renderer and do not span renderers.
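
For anyone following along, the effect described in comments #9 and #10 can be illustrated with a toy model. This is only a sketch in Python, not WebKit code: the joining table covers four letters copied by hand, and the names shape_run and shape_per_run are hypothetical. It shows that a letter's positional form changes when a word is shaped as two separate runs instead of one.

```python
# Joining types for four letters only (a hand-copied subset, assumed for
# illustration; real shapers use the full Unicode joining data).
DUAL = {"\u0639", "\u0628", "\u064a"}   # AIN, BEH, YEH: join on both sides
RIGHT = {"\u0631"}                      # REH: joins only to the preceding letter


def shape_run(run):
    """Return the positional form of each letter, seeing only this run."""
    forms = []
    for i, ch in enumerate(run):
        # The previous letter can connect forward only if it is dual-joining.
        joins_prev = i > 0 and run[i - 1] in DUAL and ch in (DUAL | RIGHT)
        # This letter can connect forward only if it is dual-joining itself.
        joins_next = (ch in DUAL and i + 1 < len(run)
                      and run[i + 1] in (DUAL | RIGHT))
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms


def shape_per_run(runs):
    """Shape each run separately, as when runs break at element boundaries."""
    return [form for run in runs for form in shape_run(run)]


word = "\u0639\u0631\u0628\u064a"                    # AIN REH BEH YEH
whole = shape_run(word)                              # one run, full context
split = shape_per_run([word[:3], word[3:]])          # last letter in its own span
```

In this model the last two letters come out as initial and final when the word is one run, but both fall back to isolated once the last letter sits in its own run, which is the symptom in the attached test cases.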
------- Comment #11 From 2011-06-21 10:44:12 PST -------
*** Bug 63038 has been marked as a duplicate of this bug. ***
------- Comment #12 From 2011-11-26 12:43:52 PST -------
Hi, I just encountered this bug while loading a page using
  QWebView *view = new QWebView();
  view->load(QUrl("test.html"));

where test.html contains

<!DOCTYPE HTML>
<html>
<head>
<meta charset=utf8>
<style type="text/css">
p:first-letter {
  color: red;
}
</style>
</head>
<body>
The two Arabic letters should appear like عر but they really show as
<p>عر</p>
and
<div><span>ع</span>ر</div>

</body>
</html>

Any update on this?
------- Comment #13 From 2011-11-26 13:55:20 PST -------
(In reply to comment #12)
> Any update on this?

Not yet.
------- Comment #14 From 2012-04-06 08:12:19 PST -------
I was wondering if any progress has been made on this issue? I've come across a number of Arabic language learning websites that rely on the ability to highlight specific letters within a word for teaching purposes. For example:

http://transliteration.org/quran/WebSite_CD/HighlightSample/Fram3.htm

http://arabiccomplete.com/modules_colloquial_msa/possessive_suffix_1.htm

http://www.dalilusa.com/arabic_course/exercise02.asp

Thanks.
------- Comment #15 From 2012-04-06 09:20:06 PST -------
(In reply to comment #14)
> I was wondering if there's been any progress made on this issue?

I'm trying to get this back on my priority list, and there is a good chance I will do so in the next four weeks.
------- Comment #16 From 2012-05-09 08:16:44 PST -------
Just checking in to see if any progress has been made?

A colleague of mine found a temporary workaround that may be useful to some developers in some scenarios: using the zero-width joiner (&zwj; / &#8205;) forces the letters to join (or at least appear joined). Of course, it's not ideal, as you'll need to test for the browser and insert the joiners on page load (or something along those lines). It also doesn't work in every situation: in the example in comment #12 above, the CSS selector fails. The code would look something like:

<!DOCTYPE HTML>
<html>
<head>
<meta charset=utf8>
<style type="text/css">
p:first-letter { color: red; }
p, div { font-family: times new roman; font-size: x-large; }
</style>
</head>
<body>
The two Arabic letters should appear like عر but they really show as
<p>&zwj;عر</p>
and
<div><span>ع&zwj;</span>ر</div>
</body>
</html>
------- Comment #17 From 2012-07-23 02:44:57 PST -------
*** Bug 91975 has been marked as a duplicate of this bug. ***
------- Comment #18 From 2012-07-23 02:50:40 PST -------
Created an attachment (id=153763) [details]
Testcase from bug 91975

Amir Aharoni's simple testcase from bug 91975
------- Comment #19 From 2012-07-23 02:52:52 PST -------
Also tracked as http://crbug.com/138434 (http://crbug.com/6122 tracks the color issue)
------- Comment #20 From 2012-09-13 23:09:07 PST -------
This is (finally) at the top of my queue, so I'm assigning it to myself.
------- Comment #21 From 2012-10-27 00:27:53 PST -------
*** Bug 77790 has been marked as a duplicate of this bug. ***
------- Comment #22 From 2012-10-27 00:29:06 PST -------
If you're still working on this, Glenn, we should chat.
------- Comment #23 From 2012-12-01 02:50:34 PST -------
@Dermot Rourke,

To fix this entirely, use TWO zero-width joiners, one on each side of the element boundary,
e.g.
<p>عرب&#x200d;<span style="color: Red;">&#x200d;ي</span></p>

e.g.
<!DOCTYPE HTML>
<html>
<head>
<meta charset=utf8>
<style type="text/css">
body{font-size:40px;}

.test{
color:red;
font-weight:bolder;
}
</style>
</head>
<body>
The Arabic letters should appear like عربي but they really show as the following in WebKit (Chrome, Safari)
    <p>عرب<span>ي</span></p>
solution:

<p>عرب&#x200d;<span style="color: Red;">&#x200d;ي</span></p>
</body>
</html>

demo: http://jsfiddle.net/noonon/esz4S/2/
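
If the markup is generated on the server, the two-joiner trick above could be applied automatically. The following is only a rough sketch in Python: the function name add_joiners is made up, the letter range U+0621 to U+064A is an assumption about what counts as an Arabic letter, and the code does not check whether the letters on either side actually join, so it may insert joiners that have no effect.

```python
import re

ZWJ = "\u200d"                      # ZERO WIDTH JOINER, same as &#x200d;
ARABIC = "[\u0621-\u064a]"          # assumed letter range, not exhaustive

# An Arabic letter, then one or more tags, then (lookahead) another letter.
BOUNDARY = re.compile("(" + ARABIC + ")((?:<[^>]+>)+)(?=" + ARABIC + ")")


def add_joiners(html):
    """Put a ZWJ on both sides of any tags that split an Arabic word."""
    return BOUNDARY.sub(lambda m: m.group(1) + ZWJ + m.group(2) + ZWJ, html)


fixed = add_joiners("<p>\u0639\u0631\u0628<span>\u064a</span></p>")
```

For the paragraph in the example above, this yields the same joiner placement as the hand-written solution line: one joiner before the opening span tag and one right after it.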
------- Comment #24 From 2012-12-07 10:05:32 PST -------
Hi, is there any update on this issue?

I'm running version 23.0.1271.95. I tried to work around the issue with the zero-width joiner and with the double zero-width joiner, but the Arabic letter shapes still appear broken.

Coloring part of an Arabic word is a common practice on Arabic learning sites. We currently recommend that our users switch to other browsers (Firefox, IE) in order to render pages correctly.
------- Comment #25 From 2012-12-07 22:34:28 PST -------
@Hamzeh 
Can you paste some code samples?
I might be of some help.
------- Comment #26 From 2012-12-07 23:17:23 PST -------
It is not necessary to provide any more examples; the problem is well understood. However, the solution requires working around certain design limitations, which isn't straightforward.
------- Comment #27 From 2012-12-08 03:16:57 PST -------
(In reply to comment #25)
> @Hamzeh 
> if you can paste some code samples?
> I might be of some help

Nasser,

I followed the demo link you provided, and the same Arabic shaping problem exists with my Chrome version. Coloring part of an Arabic word fails in Chrome regardless of whether the zero-width joiner is used.

Firefox and IE, by contrast, render correctly with or without the zero-width joiner. This defect is critical for learning sites, since coloring part of a word is widely used to identify prefixes, suffixes, and certain language characteristics.

I hope that someone from WebKit will give it priority; it's very important for languages such as Arabic, Persian, and Urdu.
------- Comment #28 From 2012-12-10 09:53:00 PST -------
*** Bug 104530 has been marked as a duplicate of this bug. ***
------- Comment #29 From 2014-01-31 18:28:18 PST -------
Are there any updates on this bug?
------- Comment #30 From 2014-02-03 14:27:24 PST -------
Not as of yet.
------- Comment #31 From 2014-02-10 00:51:31 PST -------
Is there a way we can give this bug a higher priority (possibly Major)? The ability to style individual characters is very important for educational and word-game apps, but it's currently broken for all sites that use complex scripts.
------- Comment #32 From 2014-02-10 07:06:08 PST -------
(In reply to comment #31)
> Is there a way we can give this bug a higher priority (possibly Major)? The ability to style individual characters is very important for educational and word-game apps, but it's currently broken for all sites that use complex scripts.

Raising the priority on the bug won't get it fixed any faster if nobody is willing to take on the work, which is not going to be trivial. The fundamental problem is that the character-to-glyph shaping process in WK doesn't make use of any context that crosses an element boundary. Fixing this will most likely introduce a performance regression in the slow text path, which is already slow.

There are at least two temporary workarounds that authors may use. One is documented in comment #16. The other is to code specifically for presentation forms (U+FB50-FDFF, FE70-FEFC). This isn't ideal, but it works.
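
The presentation-forms workaround could be sketched roughly as follows. This is only an illustration in Python, assuming the standard unicodedata module; the names build_presentation_map and to_presentation_forms are made up for this comment. It covers single letters in U+FE70-FEFC only (not U+FB50-FDFF or ligatures), and the author still has to supply the positional form of each letter.

```python
import unicodedata

FORMS = ("isolated", "initial", "medial", "final")


def build_presentation_map():
    """Map (base letter, form) to its code point in U+FE70..U+FEFC."""
    table = {}
    for cp in range(0xFE70, 0xFEFD):          # U+FEFC inclusive
        parts = unicodedata.decomposition(chr(cp)).split()
        # Keep only single-letter entries such as "<initial> 0639";
        # ligatures and letter+mark pairs decompose to several code points.
        if len(parts) == 2 and parts[0].strip("<>") in FORMS:
            base = chr(int(parts[1], 16))
            table[(base, parts[0].strip("<>"))] = chr(cp)
    return table


PRESENTATION = build_presentation_map()


def to_presentation_forms(letters_with_forms):
    """Replace each (letter, form) pair with its presentation form, if any."""
    return "".join(PRESENTATION.get((ch, form), ch)
                   for ch, form in letters_with_forms)


# AIN in initial form followed by REH in final form.
ain_reh = to_presentation_forms([("\u0639", "initial"), ("\u0631", "final")])
```

Because the resulting code points are already positional forms, each one can be wrapped in its own element without depending on shaping context across the boundary.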
------- Comment #33 From 2014-02-10 11:46:37 PST -------
(In reply to comment #32)
> Raising the priority on the bug won't get it fixed any faster if nobody is willing to take on the work, which is not going to be trivial. The fundamental problem is that the character-to-glyph shaping process in WK doesn't make use of any context that crosses an element boundary. Fixing this will most likely introduce a performance regression in the slow text path, which is already slow.
> 
> There are at least two temporary workarounds that authors may use. One is documented in comment #16. The other is to code specifically for presentation forms (U+FB50-FDFF, FE70-FEFC). This isn't ideal, but it works.

The suggested fix in comment #16 does not produce the correct rendering. Observe the difference in rendering the last character in http://jsfiddle.net/noonon/esz4S/2/ between Chrome and Firefox. 

For your other suggestion, I think shifting the responsibility of producing the correct glyph to user scripts will add an unnecessary complication. IMO, this should be transparent to the web developer.

Any web developer who currently wants to style individual complex characters in Webkit is stuck. I was hoping giving the bug a higher priority would make it more visible and more likely to be picked up and fixed.
------- Comment #34 From 2014-02-10 11:49:21 PST -------
(In reply to comment #33)
> (In reply to comment #32)
> > Raising the priority on the bug won't get it fixed any faster if nobody is willing to take on the work, which is not going to be trivial. The fundamental problem is that the character-to-glyph shaping process in WK doesn't make use of any context that crosses an element boundary. Fixing this will most likely introduce a performance regression in the slow text path, which is already slow.
> > 
> > There are at least two temporary workarounds that authors may use. One is documented in comment #16. The other is to code specifically for presentation forms (U+FB50-FDFF, FE70-FEFC). This isn't ideal, but it works.
> 
> The suggested fix in comment #16 does not produce the correct rendering. Observe the difference in rendering the last character in http://jsfiddle.net/noonon/esz4S/2/ between Chrome and Firefox. 

That was fixed very recently:

  https://code.google.com/p/chromium/issues/detail?id=311372


> For your other suggestion, I think shifting the responsibility of producing the correct glyph to user scripts will add an unnecessary complication. IMO, this should be transparent to the web developer.

True.  We understand that.

> Any web developer who currently wants to style individual complex characters in Webkit is stuck. I was hoping giving the bug a higher priority would make it more visible and more likely to be picked up and fixed.
------- Comment #35 From 2014-02-10 12:18:43 PST -------
(In reply to comment #34)
> > The suggested fix in comment #16 does not produce the correct rendering. Observe the difference in rendering the last character in http://jsfiddle.net/noonon/esz4S/2/ between Chrome and Firefox. 
> 
> That was fixed very recently:

Thank you! I'm looking forward to trying it. I hope you guys can still make a comprehensive fix for this bug so that we wouldn't even need to use zero-width joiners to display individually styled characters.