Bug 110230 - [harfbuzz] Always pass correct text direction to HarfBuzz
Summary: [harfbuzz] Always pass correct text direction to HarfBuzz
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: Layout and Rendering
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-02-19 09:01 PST by Behdad Esfahbod
Modified: 2017-03-11 11:04 PST
Description Behdad Esfahbod 2013-02-19 09:01:22 PST
The code in ./Source/WebCore/platform/graphics/harfbuzz/HarfBuzzShaper.cpp currently doesn't pass the text direction to HarfBuzz when WebKit is measuring text.  I'm not sure whether this is a WebKit limitation or just the HarfBuzz layer, but we should fix this.  FWIW, we should *always* pass the correct direction, script, and language (if known) to HarfBuzz.

This is a follow-up to bug 110145.
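
One way to read the request above is that direction and script are required inputs to every shaping call, whether it is for measuring or for painting, while language is optional. The sketch below is illustrative only: `RunProperties` and `shape_request` are hypothetical names, not WebKit or HarfBuzz API, and only horizontal directions are modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunProperties:
    """Hypothetical per-run properties that must accompany every shaping call."""
    direction: str                  # "ltr" or "rtl", as resolved by the bidi algorithm
    script: str                     # e.g. an ISO 15924 tag such as "Arab"
    language: Optional[str] = None  # optional, e.g. "ar"; passed only if known

    def __post_init__(self):
        # Refuse to build a run whose direction or script was left unset.
        if self.direction not in ("ltr", "rtl"):
            raise ValueError("direction must be 'ltr' or 'rtl'")
        if not self.script:
            raise ValueError("script must be set")


def shape_request(text, props):
    """Bundle text with its properties; measuring and painting would share this."""
    return {
        "text": text,
        "direction": props.direction,
        "script": props.script,
        "language": props.language,
    }
```

With a structure like this, a measuring path cannot silently drop the direction, because constructing the properties without one fails.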
Comment 1 Glenn Adams 2013-02-19 09:48:58 PST
(In reply to comment #0)
> FWIW, we should *always* pass the correct direction, script, and language (if known) to harfbuzz.

How are you defining "correct"? Do you have a counter example showing the passing of an "incorrect" direction, script, or language?

Since script is not specified by the author and the HTML5 et al specs do not formally define an algorithm for mapping some sequence of text to script, then how are you defining "correct" in this regard?

How are you dealing with cases where some font formats define multiple values for "script" tags based on different versions of the font technology? For example, see [1] (script tag 'dev2' with post-2005 specifications) versus [2][3] (script tag 'deva' with pre-2005 implementations):

[1] http://www.microsoft.com/typography/OpenTypeDev/devanagari/intro.htm
[2] http://lb1.www.ms.akadns.net/typography/otfntdev/devanot/
[3] http://lb1.www.ms.akadns.net/typography/otfntdev/devanot/appen.htm
Comment 2 Behdad Esfahbod 2013-02-19 14:19:30 PST
(In reply to comment #1)
> (In reply to comment #0)
> > FWIW, we should *always* pass the correct direction, script, and language (if known) to harfbuzz.
> 
> How are you defining "correct"?

Correct is whatever the Unicode Bidirectional Algorithm says the piece of text should take.  UBA is run before shaping happens.


> Do you have a counter example showing the passing of an "incorrect" direction, script, or language?

Yes.  Normally, Arabic runs right-to-left.  But you can force it to go left-to-right using special Unicode characters (aka LRO) or the <bdo> tag.  When Arabic runs left-to-right, it "shapes" to different glyphs than when it goes right-to-left, because the shaping is dependent on what actually comes to the left and right of each character.  If you measure the text without telling HarfBuzz it's left-to-right, it will assume that it's right-to-left, because that's the default direction for Arabic.  And you get wrong results.
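
A minimal sketch of the mismatch described above, using only Python's standard `unicodedata` module. `guessed_direction` stands in for what a shaper might assume when no direction is passed (here, a scan for the first strong character), and `resolved_direction` stands in for the direction the bidi pass would hand over once an override applies. Both function names are made up for illustration, and this is not an implementation of the Unicode Bidirectional Algorithm.

```python
import unicodedata

LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE


def guessed_direction(text):
    """Guess a direction from the first strong character (toy fallback)."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            return "rtl"
        if bidi_class == "L":
            return "ltr"
    return "ltr"


def resolved_direction(text, override=None):
    """Direction after an explicit override, else the guess.

    `override` models markup such as <bdo dir=ltr>; a leading LRO/RLO
    character is honoured as well. Nested embeddings are not handled.
    """
    if override in ("ltr", "rtl"):
        return override
    if text:
        first = unicodedata.bidirectional(text[0])
        if first == "LRO":
            return "ltr"
        if first == "RLO":
            return "rtl"
    return guessed_direction(text)


word = "\u0633\u0644\u0645"  # seen, lam, meem

# The guess says RTL, but the override says LTR: measuring with the
# guess and painting with the override would disagree.
mismatch = guessed_direction(LRO + word) != resolved_direction(LRO + word)
```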

Try selecting this piece of text:

data:text/html;charset=utf-8,<html><body style="font-size: 700px"><bdo dir=ltr>%D8%B3%D9%84%D9%85</body>

The desired behavior is that it should behave the same as this:

data:text/html;charset=utf-8,<html><body style="font-size: 700px">%D9%85%D9%84%D8%B3</body>

The second test has the Arabic characters reversed, and running right-to-left.  The first one has them forced left-to-right.
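
The equivalence between the two tests can be sketched with a toy joining model. It assumes every letter is dual-joining (true of seen, lam, and meem, but not of Arabic letters in general) and reports only which positional form each letter takes. `joining_forms` is a hypothetical helper, not HarfBuzz code.

```python
def joining_forms(letters, direction):
    """Return the positional form of each letter, in logical order.

    Toy model: all letters are assumed dual-joining. In an RTL run the
    first logical letter takes the initial form. In a forced LTR run the
    roles flip, which mirrors the equivalence between the two tests.
    """
    n = len(letters)
    if direction == "rtl":
        order = list(range(n))
    else:
        order = list(reversed(range(n)))
    forms = [None] * n
    for pos, idx in enumerate(order):
        if n == 1:
            forms[idx] = "isol"
        elif pos == 0:
            forms[idx] = "init"
        elif pos == n - 1:
            forms[idx] = "fina"
        else:
            forms[idx] = "medi"
    return forms


first_test = ["seen", "lam", "meem"]   # logical order, forced LTR
second_test = ["meem", "lam", "seen"]  # reversed, natural RTL

forced_ltr = dict(zip(first_test, joining_forms(first_test, "ltr")))
natural_rtl = dict(zip(second_test, joining_forms(second_test, "rtl")))
wrong_guess = dict(zip(first_test, joining_forms(first_test, "rtl")))
```

In this model `forced_ltr` and `natural_rtl` assign the same form to each letter, while `wrong_guess` (the first test shaped as if it were RTL) swaps the forms of seen and meem. The glyphs then differ, so measured widths can differ from painted ones.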


> Since script is not specified by the author and the HTML5 et al specs do not formally define an algorithm for mapping some sequence of text to script, then how are you defining "correct" in this regard?

Right.  Unicode defines Script per character.  All text rendering implementations have heuristics to assign script to characters of type Script=Common and Script=Inherited.  They take their property from surrounding characters.  For example, a U+002E FULL STOP character assumes the Script=Arabic property when used in Arabic text.
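
A rough sketch of such a heuristic follows. `raw_script` is a crude stand-in for the real Unicode Script property (which comes from the UCD's Scripts.txt and is not in Python's standard library); it only distinguishes Arabic, Latin, Common, and Inherited. `resolve_scripts` then lets Common and Inherited characters take the script of the preceding character, or of the following one when nothing precedes them. Both names are hypothetical, and real implementations use more elaborate rules.

```python
import unicodedata


def raw_script(ch):
    """Very rough approximation of the Unicode Script property."""
    cp = ord(ch)
    if unicodedata.category(ch).startswith("M"):
        return "Inherited"  # combining marks
    if 0x0600 <= cp <= 0x06FF and ch.isalpha():
        return "Arabic"
    if ch.isascii() and ch.isalpha():
        return "Latin"
    return "Common"  # punctuation, digits, spaces, everything else


def resolve_scripts(text):
    """Assign Common/Inherited characters the script of their neighbours."""
    resolved = [raw_script(ch) for ch in text]
    neutral = ("Common", "Inherited")

    # Forward pass: a neutral character takes the script before it.
    last = None
    for i, script in enumerate(resolved):
        if script in neutral:
            if last is not None:
                resolved[i] = last
        else:
            last = script

    # Backward pass: leading neutrals take the script that follows them.
    following = None
    for i in range(len(resolved) - 1, -1, -1):
        if resolved[i] in neutral:
            if following is not None:
                resolved[i] = following
        else:
            following = resolved[i]
    return resolved


arabic_with_stop = resolve_scripts("\u0633\u0644\u0645.")
```

Here the trailing U+002E resolves to Arabic, as in the example above. A string made only of neutral characters stays Common, because there is no neighbour to borrow a script from.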


> How are you dealing with cases where some font formats define multiple values for "script" tags based on different versions of the font technology? For example, see [1] (script tag 'dev2' with post-2005 specifications) versus [2][3] (script tag 'deva' with pre-2005 implementations):

HarfBuzz knows about those.  You can ignore it.  What we're interested in is the Unicode script assigned to the piece of text.  This, again, can be guessed by HarfBuzz, except in the case where the whole piece of text has Script=Common or Script=Inherited.  This can result in inferior shaping, but is not as serious as letting HarfBuzz guess text direction, which has much more severe implications.
Comment 3 Glenn Adams 2013-02-19 14:45:37 PST
(In reply to comment #2)
> (In reply to comment #1)
> This can result in inferior shaping, but is not as serious as letting HarfBuzz guess text direction, which has much more severe implications.

I already understand the processing. I was just trying to get to the bottom of this bug, which I understand now as a failure to pass the UBA determined directionality (and other lang/script info) to HB. Are you working on or planning on working on this bug?
Comment 4 Behdad Esfahbod 2013-02-19 14:48:53 PST
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > This can result in inferior shaping, but is not as serious as letting HarfBuzz guess text direction, which has much more severe implications.
> 
> I already understand the processing. I was just trying to get to the bottom of this bug, which I understand now as a failure to pass the UBA determined directionality (and other lang/script info) to HB.

Yes.  It only happens when measuring text though.  Rendering is fine.  That happened as a result of this issue:

  https://code.google.com/p/chromium/issues/detail?id=158969

> Are you working on or planning on working on this bug?

I'm studying the code to fix that, yes.  Since bashi is away, I don't think anyone else will be looking into it if I don't.
Comment 5 Glenn Adams 2013-02-19 14:58:58 PST
(In reply to comment #4)
> (In reply to comment #3)
> > Are you working on or planning on working on this bug?
> 
> I'm studying the code to fix that, yes.  Since bashi is away, I don't think anyone else will be looking into it if I don't.

If you don't have time, I could fix it. Sounds straightforward.
Comment 6 Behdad Esfahbod 2013-02-25 22:00:19 PST
(In reply to comment #5)
> (In reply to comment #4)
> > (In reply to comment #3)
> > > Are you working on or planning on working on this bug?
> > 
> > I'm studying the code to fix that, yes.  Since bashi is away, I don't think anyone else will be looking into it if I don't.
> 
> If you don't have time, I could fix it. Sounds straightforward.

That's very nice of you to offer.  I don't have a Chrome build, and won't have for a while (in the middle of relocation without a beefy machine).  So if you can help, that's really appreciated.

I did build webkitgtk today though, and the results in webkitgtk are different from Chrome.  It looks like webkitgtk build doesn't respect <bdo> at all.

At any rate, yes, let's try to nail this down.  If you can take a look and see what you can find, that's appreciated.  I'll also take a look at it with my webkitgtk build.
Comment 7 Glenn Adams 2013-02-25 22:04:19 PST
(In reply to comment #6)
> (In reply to comment #5)
> > (In reply to comment #4)
> > > (In reply to comment #3)
> > > > Are you working on or planning on working on this bug?
> > > 
> > > I'm studying the code to fix that, yes.  Since bashi is away, I don't think anyone else will be looking into it if I don't.
> > 
> > If you don't have time, I could fix it. Sounds straightforward.
> 
> That's very nice of you to offer.  I don't have a Chrome build, and won't have for a while (in the middle of relocation without a beefy machine).  So if you can help, that's really appreciated.
> 
> I did build webkitgtk today though, and the results in webkitgtk are different from Chrome.  It looks like webkitgtk build doesn't respect <bdo> at all.
> 
> At any rate, yes, lets try to nail this down.  If you can take a look and see what you can find, that's appreciated.  I'll also take a look at it with my webkitgtk build.

I'll try to look at this sometime in the next few days.
Comment 8 Dominik Röttsches (drott) 2013-03-26 00:58:47 PDT
(In reply to comment #7)

> I'll try to look at this sometime in the next few days.

Glenn, if you don't find the time, pls let me know. I would like to try in that case.
Comment 9 Glenn Adams 2013-03-26 07:44:28 PDT
(In reply to comment #8)
> (In reply to comment #7)
> 
> > I'll try to look at this sometime in the next few days.
> 
> Glenn, if you don't find the time, pls let me know. I would like to try in that case.

Go ahead, Dominik; I'm occupied with some other bugs at present.