Bug 110230

Summary: [harfbuzz] Always pass correct text direction to HarfBuzz
Product: WebKit
Reporter: Behdad Esfahbod <behdad>
Component: Layout and Rendering
Assignee: Nobody <webkit-unassigned>
Status: NEW
Severity: Normal
CC: ahmad.saleem792, aperez, ap, bashi, bugs-noreply, csaavedra, d-r, efidler, glenn, jshin, mcatanzaro, mmaxfield
Priority: P2
Version: 528+ (Nightly build)
Hardware: Unspecified
OS: Unspecified
See Also: https://bugs.webkit.org/show_bug.cgi?id=167956

Description Behdad Esfahbod 2013-02-19 09:01:22 PST
The code in ./Source/WebCore/platform/graphics/harfbuzz/HarfBuzzShaper.cpp currently doesn't pass the text direction to HarfBuzz when WebKit is measuring text.  I'm not sure whether this is a WebKit limitation or just the HarfBuzz layer.  But we should fix this.  FWIW, we should *always* pass the correct direction, script, and language (if known) to harfbuzz.

This is a follow-up to bug 110145.
Comment 1 Glenn Adams 2013-02-19 09:48:58 PST
(In reply to comment #0)
> FWIW, we should *always* pass the correct direction, script, and language (if known) to harfbuzz.

How are you defining "correct"? Do you have a counter example showing the passing of an "incorrect" direction, script, or language?

Since script is not specified by the author and the HTML5 et al specs do not formally define an algorithm for mapping some sequence of text to script, then how are you defining "correct" in this regard?

How are you dealing with cases where some font formats define multiple values for "script" tags based on different versions of the font technology? For example, see [1] (script tag 'dev2' with post-2005 specifications) versus [2][3] (script tag 'deva' with pre-2005 implementations):

[1] http://www.microsoft.com/typography/OpenTypeDev/devanagari/intro.htm
[2] http://lb1.www.ms.akadns.net/typography/otfntdev/devanot/
[3] http://lb1.www.ms.akadns.net/typography/otfntdev/devanot/appen.htm
Comment 2 Behdad Esfahbod 2013-02-19 14:19:30 PST
(In reply to comment #1)
> (In reply to comment #0)
> > FWIW, we should *always* pass the correct direction, script, and language (if known) to harfbuzz.
> 
> How are you defining "correct"?

Correct is whatever the Unicode Bidirectional Algorithm says the piece of text should take.  UBA is run before shaping happens.


> Do you have a counter example showing the passing of an "incorrect" direction, script, or language?

Yes.  Normally, Arabic runs right-to-left.  But you can force it to go left-to-right using special Unicode characters (aka LRO) or the <bdo> tag.  When Arabic runs left-to-right, it "shapes" to different glyphs than when it goes right-to-left, because the shaping is dependent on what actually comes to the left and right of each character.  If you measure the text without telling HarfBuzz it's left-to-right, it will assume that it's right-to-left, because that's the default direction for Arabic.  And you get wrong results.
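
A minimal sketch of why guessing goes wrong here. In HarfBuzz's C API the explicit route is `hb_buffer_set_direction()` and the guessing route is `hb_buffer_guess_segment_properties()`. The Python below is only a toy stand-in for that choice, not HarfBuzz's code: `guess_direction` and `shaping_direction` are hypothetical helpers, and the guess uses Unicode bidi classes from the standard library as an assumed approximation of "default direction of the text".

```python
import unicodedata

LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE, the control character mentioned above


def guess_direction(text):
    """Toy guess: direction of the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # Hebrew-like or Arabic letter
            return "rtl"
        if bidi_class == "L":
            return "ltr"
    return "ltr"  # nothing strong found; arbitrary fallback for this sketch


def shaping_direction(text, resolved_direction=None):
    """Prefer the direction the bidi pass resolved; guess only as a fallback."""
    return resolved_direction or guess_direction(text)


arabic = "\u0633\u0644\u0645"  # the three letters used in the <bdo> test case

print(guess_direction(arabic))           # rtl
print(guess_direction(LRO + arabic))     # rtl: the guess does not see the override
print(shaping_direction(arabic, "ltr"))  # ltr: the caller's resolved direction wins
```

The point of the sketch is the second line of output: a guess based on the characters alone still says right-to-left for overridden text, so the measuring path has to be handed the resolved direction.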

Try selecting this piece of text:

data:text/html;charset=utf-8,<html><body style="font-size: 700px"><bdo dir=ltr>%D8%B3%D9%84%D9%85</body>

The desired behavior is that it should behave the same as this:

data:text/html;charset=utf-8,<html><body style="font-size: 700px">%D9%85%D9%84%D8%B3</body>

The second test has the Arabic characters reversed, and running right-to-left.  The first one has them forced left-to-right.
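
To check the relationship between the two test cases, the percent-encoded payloads can be decoded with the Python standard library. This is only a verification aid for the two URLs above; the variable names are made up for illustration.

```python
from urllib.parse import unquote

forced_ltr = unquote("%D8%B3%D9%84%D9%85")    # payload of the <bdo dir=ltr> case
reversed_rtl = unquote("%D9%85%D9%84%D8%B3")  # payload of the plain case

print([f"U+{ord(c):04X}" for c in forced_ltr])    # ['U+0633', 'U+0644', 'U+0645']
print([f"U+{ord(c):04X}" for c in reversed_rtl])  # ['U+0645', 'U+0644', 'U+0633']
print(forced_ltr[::-1] == reversed_rtl)           # True
```

Both cases carry the same three Arabic letters in opposite logical order, so they should produce the same visual order and the same glyph forms once the first is laid out left-to-right.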


> Since script is not specified by the author and the HTML5 et al specs do not formally define an algorithm for mapping some sequence of text to script, then how are you defining "correct" in this regard?

Right.  Unicode defines Script per character.  All text rendering implementations have heuristics to assign script to characters of type Script=Common and Script=Inherited.  They take their property from surrounding characters.  For example, a U+002E FULL STOP character assumes the Script=Arabic property when used in Arabic text.
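
A rough sketch of that kind of heuristic, under loud assumptions: the Python standard library has no Script property, so `raw_script` below is a hypothetical, hand-written table covering only basic Arabic letters and ASCII letters, and everything else is treated as neutral ("Common"). Real implementations use the full Unicode script data and more refined rules.

```python
def raw_script(ch):
    """Hypothetical mini script table; NOT the real Unicode Script property."""
    cp = ord(ch)
    if 0x0621 <= cp <= 0x063F or 0x0641 <= cp <= 0x064A:  # basic Arabic letters
        return "Arabic"
    if ch.isascii() and ch.isalpha():
        return "Latin"
    return "Common"  # stand-in for Script=Common / Script=Inherited


def resolve_scripts(text):
    """Give neutral characters the script of their neighbours."""
    scripts = [raw_script(ch) for ch in text]

    # Forward pass: a neutral character takes the nearest preceding script.
    last = None
    for i, script in enumerate(scripts):
        if script == "Common":
            if last is not None:
                scripts[i] = last
        else:
            last = script

    # Backward pass: leading neutrals take the nearest following script.
    following = None
    for i in range(len(scripts) - 1, -1, -1):
        if scripts[i] == "Common":
            if following is not None:
                scripts[i] = following
        else:
            following = scripts[i]

    return scripts


print(resolve_scripts("\u0633\u0644\u0645."))  # ['Arabic', 'Arabic', 'Arabic', 'Arabic']
print(resolve_scripts("..."))                  # ['Common', 'Common', 'Common']
```

The second call shows the case where nothing can be inferred: a run made up entirely of neutral characters stays unresolved unless the caller supplies the script.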


> How are you dealing with cases where some font formats define multiple values for "script" tags based on different versions of the font technology? For example, see [1] (script tag 'dev2' with post-2005 specifications) versus [2][3] (script tag 'deva' with pre-2005 implementations):

HarfBuzz knows about those.  You can ignore it.  What we're interested in is the Unicode script assigned to the piece of text.  This, again, can be guessed by HarfBuzz, except in the case where the whole piece of text has Script=Common or Script=Inherited.  This can result in inferior shaping, but is not as serious as letting HarfBuzz guess text direction, which has much more severe implications.
Comment 3 Glenn Adams 2013-02-19 14:45:37 PST
(In reply to comment #2)
> (In reply to comment #1)
> This can result in inferior shaping, but is not as serious as letting HarfBuzz guess text direction, which has much more severe implications.

I already understand the processing. I was just trying to get to the bottom of this bug, which I understand now as a failure to pass the UBA determined directionality (and other lang/script info) to HB. Are you working on or planning on working on this bug?
Comment 4 Behdad Esfahbod 2013-02-19 14:48:53 PST
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > This can result in inferior shaping, but is not as serious as letting HarfBuzz guess text direction, which has much more severe implications.
> 
> I already understand the processing. I was just trying to get to the bottom of this bug, which I understand now as a failure to pass the UBA determined directionality (and other lang/script info) to HB.

Yes.  It only happens when measuring text though.  Rendering is fine.  That happened as a result of this issue:

  https://code.google.com/p/chromium/issues/detail?id=158969

> Are you working on or planning on working on this bug?

I'm studying the code to fix that, yes.  Since bashi is away, I don't think anyone else will be looking into it if I don't.
Comment 5 Glenn Adams 2013-02-19 14:58:58 PST
(In reply to comment #4)
> (In reply to comment #3)
> > Are you working on or planning on working on this bug?
> 
> I'm studying the code to fix that, yes.  Since bashi is away, I don't think anyone else will be looking into it if I don't.

If you don't have time, I could fix it. Sounds straightforward.
Comment 6 Behdad Esfahbod 2013-02-25 22:00:19 PST
(In reply to comment #5)
> (In reply to comment #4)
> > (In reply to comment #3)
> > > Are you working on or planning on working on this bug?
> > 
> > I'm studying the code to fix that, yes.  Since bashi is away, I don't think anyone else will be looking into it if I don't.
> 
> If you don't have time, I could fix it. Sounds straightforward.

That's very nice of you to offer.  I don't have a Chrome build, and won't have for a while (in the middle of relocation without a beefy machine).  So if you can help, that's really appreciated.

I did build webkitgtk today though, and the results in webkitgtk are different from Chrome.  It looks like webkitgtk build doesn't respect <bdo> at all.

At any rate, yes, let's try to nail this down.  If you can take a look and see what you can find, that's appreciated.  I'll also take a look at it with my webkitgtk build.
Comment 7 Glenn Adams 2013-02-25 22:04:19 PST
(In reply to comment #6)
> (In reply to comment #5)
> > (In reply to comment #4)
> > > (In reply to comment #3)
> > > > Are you working on or planning on working on this bug?
> > > 
> > > I'm studying the code to fix that, yes.  Since bashi is away, I don't think anyone else will be looking into it if I don't.
> > 
> > If you don't have time, I could fix it. Sounds straightforward.
> 
> That's very nice of you to offer.  I don't have a Chrome build, and won't have for a while (in the middle of relocation without a beefy machine).  So if you can help, that's really appreciated.
> 
> I did build webkitgtk today though, and the results in webkitgtk are different from Chrome.  It looks like webkitgtk build doesn't respect <bdo> at all.
> 
> At any rate, yes, let's try to nail this down.  If you can take a look and see what you can find, that's appreciated.  I'll also take a look at it with my webkitgtk build.

I'll try to look at this sometime in the next few days.
Comment 8 Dominik Röttsches (drott) 2013-03-26 00:58:47 PDT
(In reply to comment #7)

> I'll try to look at this sometime in the next few days.

Glenn, if you don't find the time, pls let me know. I would like to try in that case.
Comment 9 Glenn Adams 2013-03-26 07:44:28 PDT
(In reply to comment #8)
> (In reply to comment #7)
> 
> > I'll try to look at this sometime in the next few days.
> 
> Glenn, if you don't find the time, pls let me know. I would like to try in that case.

Go ahead, Dominik. I'm occupied with some other bugs at present.
Comment 10 Ahmad Saleem 2023-03-31 19:02:07 PDT
@ap - Do any of the ports use HarfBuzz? I ask because I can find this directory in the WebKit source, but not 'HarfBuzzShaper.cpp'.

https://github.com/WebKit/WebKit/tree/main/Source/WebCore/platform/graphics/harfbuzz
Comment 11 Alexey Proskuryakov 2023-03-31 21:17:32 PDT
I would have said no, but seeing recent changes in this directory from Apple contributors makes me feel uncertain. Myles will know for certain.
Comment 12 Michael Catanzaro 2023-04-01 05:50:42 PDT
I think all ports use Harfbuzz except Apple ports. But this bug report is 10 years old, so it's no surprise that HarfBuzzShaper.cpp does not exist anymore.

I have no clue whether this bug is still a problem or not.
Comment 13 Adrian Perez 2023-04-01 06:04:41 PDT
(In reply to Michael Catanzaro from comment #12)
> I think all ports use Harfbuzz except Apple ports. But this bug report is 10
> years old, so it's no surprise that HarfBuzzShaper.cpp does not exist
> anymore.

The file was removed in bug #167956. The patch moved some code around as
well, but from a quick glance at it I am not able to decide if the issue
is still there or not... I am far from knowing anything about HarfBuzz,
but maybe that serves as a starting point?