Bug 77568

Summary: [Cocoa] Glyph lookup should be language-sensitive (specifically between Yiddish and Hebrew)
Product: WebKit Reporter: Ze'ev <zeevclem>
Component: Layout and RenderingAssignee: Myles C. Maxfield <mmaxfield>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, bashi, benjamin, buildbot, cdumez, cmarcelo, commit-queue, dbates, dino, efidler, ews-watchlist, falken, glenn, Hironori.Fujii, jdaggett, jonlee, jshin, koivisto, mail, mitz, mmaxfield, rniwa, shikisuen, simon.fraser, Stevan_White, thorton, webkit-bug-importer, youennf
Priority: P2 Keywords: InRadar
Version: 525.x (Safari 3.2)   
Hardware: Mac (Intel)   
OS: OS X 10.7   
See Also: https://bugs.webkit.org/show_bug.cgi?id=179102
Bug Depends on: 10874    
Bug Blocks: 179104, 179105    
Attachments:
Description Flags
Zip file containing the bug illustration files mentioned in the Description.
none
Serbian vs. Eastern Cyrillic
none
Serbian file as displayed by Konqueror 4.7.4
none
Serbian file as displayed by FireFox Nightly
none
Patch
none
Patch
none
Archive of layout-test-results from ews102 for mac-elcapitan
none
Archive of layout-test-results from ews107 for mac-elcapitan-wk2
none
Archive of layout-test-results from ews114 for mac-elcapitan
none
Archive of layout-test-results from ews124 for ios-simulator-wk2
none
Patch
none
Patch
none
Patch
none
Patch
none
Archive of layout-test-results from ews105 for mac-elcapitan-wk2
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
simon.fraser: review+
Doesn't affect font hash
none
Doesn't affect font hash
simon.fraser: review-
Doesn't affect font hash
none
Doesn't affect font hash
simon.fraser: review+
Doesn't affect font hash
none
Doesn't affect font hash none

Description Ze'ev 2012-02-01 12:10:27 PST
Created attachment 124985 [details]
Zip file containing the bug illustration files mentioned in the Description.

Both the Hebrew and Yiddish language use the same Hebrew character set. However, in Yiddish, for the "pasekh tsvey yudn" and "khirik yud" character combinations, the convention is to "raise" the vowel. This is done for Yiddish only, not Hebrew. It is possible on an HTML page to differentiate Yiddish text from Hebrew text using the "lang" attribute and to display the Yiddish glyphs when 'lang="yi"' is specified and to display the Hebrew glyphs when 'lang="he"' is specified; however, WebKit/Safari does not do this. This is not a WebKit-only issue, most browsers have this same bug. Firefox only recently fixed the bug in Firefox 10.0beta (see this Firefox bug, in particular the last few comments in the comment thread: https://bugzilla.mozilla.org/show_bug.cgi?id=694205). 

In order to demonstrate the problem and to illustrate the correct behaviour, I have attached a zip file with the following:

1. yiddish-hebrew.html : A basic example to illustrate the problem.
2. Firefox.jpg : A screen shot of the correct Hebrew & Yiddish vowel positioning in Firefox 10.0beta.
3. Safari.jpg : A screen shot showing the correct Hebrew & INCORRECT Yiddish vowel positioning in Safari.
4. FreeSans ttf files (4) : The FreeSans font used in the yiddish-hebrew.html example. FreeSans is one of the fonts in GNU FreeFont (http://www.gnu.org/software/freefont/) and the ttf files were built from the current SVN source files. These files should be used to replicate the issue because not all fonts provide support for the different Hebrew/Yiddish glyphs.
Comment 1 Steve White 2012-02-01 13:17:59 PST
Ze'ev was using a recent build of FreeSerif, from the sources available via SVN at 
   https://savannah.gnu.org/svn/?group=freefont
Comment 2 Steve White 2012-02-01 13:27:17 PST
Created attachment 125002 [details]
Serbian vs. Eastern Cyrillic

Uses 'lang' attributes to select font features specific to a language.
This one uses any recent version of the DejaVu fonts.
Comment 3 Steve White 2012-02-01 13:29:08 PST
Created attachment 125003 [details]
Serbian file as displayed by Konqueror 4.7.4

Note that the Russian and Serbian lines are identical.
That's not what is intended.
Comment 4 Steve White 2012-02-01 13:31:33 PST
Created attachment 125005 [details]
Serbian file as displayed by FireFox Nightly

FireFox nightly 12.0a1 (2012-01-31)
(This is a quite recent improvement in FireFox.)

Note that the Cursive letters are all different,
and the 'de' letter is different in the normal face.
This is desirable.
Comment 5 Steve White 2012-02-01 13:35:08 PST
So here is another example of the same problem as Ze'ev describes, except with a more accessible font, and this time Cyrillic.

Again the HTML 'lang' specifies a language to pick out font features for the given script.

Near as I can tell, all WebKit based browsers show this wrong behavior (although Pango and other font rendering libraries make the distinction quite well.)

Cheers!
Comment 6 Matt Falkenhagen 2012-02-02 20:45:32 PST
Does this mean it's not sufficient to just choose the correct font for the language, which the work on bug 10874 has basically provided, but we must also ensure the correct glyphs within the font are chosen for the language?
Comment 7 Steve White 2012-02-03 04:07:50 PST
Hi Matt!

Not sure just who you mean by "we" here.

And I'm a little bewildered by the suggestion that this would *force* somebody to do something.  This is an *enabling* technology.

* If "we" means "we web page designers"
    We may be providing a web font with these special features.
    We do so by specifying the language with the 'lang' tag.
    But no, we don't *have to*.
    On the other hand, if we can rely upon browsers to do the right thing,
    this *allows* us to use the language-specific features of the font.

* If "we" means "we WebKit developers"
    We simply pass the "lang" tag properly to the font rendering layer.
    It will do the rest.
    And our product will look good compared to the competition.
    But no, we don't have to excel, or even keep up with the competition.
    
As to "correct font": yes, separate fonts can be used to display Hebrew and Yiddish, even though they differ only in the placement of a few letters.  Likewise with various versions of Cyrillic, and even Latin.  
It's a waste, in several ways.

This longstanding OpenType technology *enables* a simple way to make fine typographic distinctions between languages.  If you're not a native reader of those languages, it seems a little delicate.  But to native readers, it can make a big difference.

Let me emphasize that this is easy.  Just properly pass the 'lang' attribute down to the underlying font rendering libraries.
Comment 8 Kenichi Ishibashi 2012-02-03 04:51:29 PST
Hi Steve,

(In reply to comment #7)
> This longstanding OpenType technology *enables* a simple way to make fine typographic distinctions between languages.  If you're not a native reader of those languages, it seems a little delicate.  But to native readers, it can make a big difference.

I agree that it could be a big difference.

> Let me emphasize that this is easy.  Just properly pass the 'lang' attribute down to the underlying font rendering libraries.

What is the exact mean when you say "passing 'lang' attribute to the underling libraries"? WebKit uses different font rendering libraries for each port. For example, Mac Safari/Chrome use CoreText, Chrome Win uses Uniscribe, and Chrome Linux uses (old) HarfBuzz. Which APIs we can use to solve the problem? I'm happy to create patches if it's actually easy as you said. If you can tell us what OpenType feature should we use, it would be also helpful.
Comment 9 Matt Falkenhagen 2012-02-03 05:00:11 PST
(In reply to comment #7)
Steve, thanks for the explanation. In my comment, I had just meant: "to fix the bug, we [people who want to fix the bug] must do ___."
Comment 10 Alexey Proskuryakov 2012-02-03 08:59:05 PST
The feature is locl, see <http://en.wikipedia.org/wiki/OpenType_feature_tag_list#OpenType_Typographic_Features>.

Implementing this is likely non-trivial because of glyph cache, which will need to be made aware of this. The Mozilla bug Ze'ev mentioned did not have code changes, but it has links to actual bugs with code changes.

Also, I don't know whether CoreText even supports this feature.
Comment 11 Steve White 2012-02-03 14:52:53 PST
Hi Kenichi, 

You asked
    What is the exact mean when you say "passing 'lang' attribute 
    to the underling libraries

Of course that depends on the underlying libraries.  But the 'lang' attribute is just a string from a set of standard language codes.  The only complication I can imagine here is that some APIs may be picky about the particular languge code standard they accept (but I don't know that.)

The DOM-level *intent* is to mark the text content of the entity as being of a different language than that of enclosing entities.  Just how that translates into calls to the underlying formatting software -- you would surely know better than I.  In principle it should be easy though (barring the ususual premature optimizations etc).

You also asked 
     "what OpenType feature should we use"

All OpenType features are ultimately activated by a list of 
      script( lang1, lang2, ... )
Where one of the languages may be the 'default' language.
The script is detected from the Unicode range of the input characters.
The language is (in the present case) derived from 'lang' tags.
The feature is activated if the Unicode of the input characters matches the script and the specified language matches one of the languages in the list.

So the answer is "this applies to all OpenType feature lookups".

In the Cyrillic example I provided, DejaVu Serif uses the 'locl' feature (this one feature is specifically intended to substitute individual glyphs based on language).
In the Hebrew example Ze'ev gave, FreeSans uses 'mark' and 'mkmk' positioning that depend upon language.  
I'm currently working on another font in which various Indic lookups ('pres', 'blwf', etc) depend upon language.

As to the specifics of API calls you need to make: you would know better than I.
I can only tell you: they are certainly present on each platform.

Let me know if I can provide further examples.  I have, e.g. some examples of pango-view making different output based upon language specification.

Cheers!
Comment 12 Alexey Proskuryakov 2013-06-01 18:49:19 PDT
*** Bug 117109 has been marked as a duplicate of this bug. ***
Comment 13 Alexey Proskuryakov 2013-08-05 09:45:22 PDT
<rdar://problem/14649193>
Comment 14 Shiki Suen 2016-03-06 11:45:34 PST
I tested WebKit-SVN-r197635 nightly build and have found it supports at least 'palt' feature.
Comment 15 Myles C. Maxfield 2016-03-09 16:56:15 PST
With CoreText, this is probably done with kCTLanguageAttributeName instead of font features.
Comment 16 Myles C. Maxfield 2016-03-09 17:19:46 PST
This is implemented in the text shaping step, so that is the part which should be updated to implement this fix. This needs to happen in both the simple text code path and the complex text code path.
Comment 17 Myles C. Maxfield 2016-03-09 17:21:02 PST
Background info: there are some font features which are enabled by default. These features may (or may not!) consult with a language in order to perform their shaping.
Comment 18 Myles C. Maxfield 2017-10-31 12:59:09 PDT
Created attachment 325477 [details]
Patch
Comment 19 Myles C. Maxfield 2017-10-31 13:09:26 PDT
Created attachment 325480 [details]
Patch
Comment 20 mitz 2017-10-31 13:12:58 PDT
Comment on attachment 325480 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=325480&action=review

> Source/WebCore/ChangeLog:21
> +        (WebCore::FontCascade::characterRangeCodePath): Any language which has language-specific shaping (such as
> +        Hebrew) needs to use the complex path.

Sounds like this patch is making a bad tradeoff: it is slowing down rendering of all Hebrew text just because of something that happens on fewer than one in ten thousand webpages. Is that so?
Comment 21 Myles C. Maxfield 2017-10-31 13:24:53 PDT
(In reply to mitz from comment #20)
> Comment on attachment 325480 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=325480&action=review
> 
> > Source/WebCore/ChangeLog:21
> > +        (WebCore::FontCascade::characterRangeCodePath): Any language which has language-specific shaping (such as
> > +        Hebrew) needs to use the complex path.
> 
> Sounds like this patch is making a bad tradeoff: it is slowing down
> rendering of all Hebrew text just because of something that happens on fewer
> than one in ten thousand webpages. Is that so?

The character scan check must happen before we know which font will be used to render the character, and we need to know which font will be used to render the character before we know whether or not there could be language-specific alternates. It seems that the only way to solve this problem is to either 1) render Hebrew / Yiddish text incorrectly, or 2) completely eliminate one of our two text codepaths.
Comment 22 Myles C. Maxfield 2017-10-31 13:26:03 PDT
(In reply to Myles C. Maxfield from comment #21)
> (In reply to mitz from comment #20)
> > Comment on attachment 325480 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=325480&action=review
> > 
> > > Source/WebCore/ChangeLog:21
> > > +        (WebCore::FontCascade::characterRangeCodePath): Any language which has language-specific shaping (such as
> > > +        Hebrew) needs to use the complex path.
> > 
> > Sounds like this patch is making a bad tradeoff: it is slowing down
> > rendering of all Hebrew text just because of something that happens on fewer
> > than one in ten thousand webpages. Is that so?
> 
> The character scan check must happen before we know which font will be used
> to render the character, and we need to know which font will be used to
> render the character before we know whether or not there could be
> language-specific alternates. It seems that the only way to solve this
> problem is to either 1) render Hebrew / Yiddish text incorrectly, or 2)
> completely eliminate one of our two text codepaths.

I guess we could also fix this for Devanagari but not for Hebrew because Devanagari already uses the complex text codepath.
Comment 23 mitz 2017-10-31 13:30:32 PDT
(In reply to Myles C. Maxfield from comment #21)
> (In reply to mitz from comment #20)
> > Comment on attachment 325480 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=325480&action=review
> > 
> > > Source/WebCore/ChangeLog:21
> > > +        (WebCore::FontCascade::characterRangeCodePath): Any language which has language-specific shaping (such as
> > > +        Hebrew) needs to use the complex path.
> > 
> > Sounds like this patch is making a bad tradeoff: it is slowing down
> > rendering of all Hebrew text just because of something that happens on fewer
> > than one in ten thousand webpages. Is that so?
> 
> The character scan check must happen before we know which font will be used
> to render the character, and we need to know which font will be used to
> render the character before we know whether or not there could be
> language-specific alternates. It seems that the only way to solve this
> problem is to either 1) render Hebrew / Yiddish text incorrectly, or 2)
> completely eliminate one of our two text codepaths.

I don’t know that there are indeed only two possible choices here, but my understanding is that we shouldn’t make code changes that slow things down.
Comment 24 Build Bot 2017-10-31 14:02:29 PDT
Comment on attachment 325480 [details]
Patch

Attachment 325480 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/5054208

New failing tests:
fast/text/international/pop-up-button-text-alignment-and-direction.html
fast/text/international/bidi-LDB-2-formatting-characters.html
svg/text/bidi-reorder-value-lists.svg
svg/W3C-SVG-1.1-SE/text-intro-09-b.svg
fast/ruby/ruby-expansion-cjk-3.html
Comment 25 Build Bot 2017-10-31 14:02:31 PDT
Created attachment 325488 [details]
Archive of layout-test-results from ews102 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews102  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 26 Build Bot 2017-10-31 14:15:32 PDT
Comment on attachment 325480 [details]
Patch

Attachment 325480 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/5054341

New failing tests:
fast/text/international/pop-up-button-text-alignment-and-direction.html
fast/ruby/ruby-expansion-cjk-3.html
fast/text/international/bidi-LDB-2-formatting-characters.html
svg/text/bidi-reorder-value-lists.svg
svg/W3C-SVG-1.1-SE/text-intro-09-b.svg
fast/ruby/ruby-expansion-cjk-2.html
Comment 27 Build Bot 2017-10-31 14:15:34 PDT
Created attachment 325492 [details]
Archive of layout-test-results from ews107 for mac-elcapitan-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews107  Port: mac-elcapitan-wk2  Platform: Mac OS X 10.11.6
Comment 28 Build Bot 2017-10-31 14:22:59 PDT
Comment on attachment 325480 [details]
Patch

Attachment 325480 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/5054281

New failing tests:
fast/text/international/pop-up-button-text-alignment-and-direction.html
fast/text/international/bidi-LDB-2-formatting-characters.html
svg/text/bidi-reorder-value-lists.svg
svg/W3C-SVG-1.1-SE/text-intro-09-b.svg
fast/ruby/ruby-expansion-cjk-3.html
Comment 29 Build Bot 2017-10-31 14:23:00 PDT
Created attachment 325494 [details]
Archive of layout-test-results from ews114 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews114  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 30 Build Bot 2017-10-31 14:30:21 PDT
Comment on attachment 325480 [details]
Patch

Attachment 325480 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/5054363

New failing tests:
fast/text/international/pop-up-button-text-alignment-and-direction.html
svg/W3C-SVG-1.1-SE/text-intro-09-b.svg
Comment 31 Build Bot 2017-10-31 14:30:23 PDT
Created attachment 325497 [details]
Archive of layout-test-results from ews124 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews124  Port: ios-simulator-wk2  Platform: Mac OS X 10.12.6
Comment 32 Myles C. Maxfield 2017-10-31 14:51:02 PDT
Comment on attachment 325480 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=325480&action=review

>>>>> Source/WebCore/ChangeLog:21
>>>>> +        Hebrew) needs to use the complex path.
>>>> 
>>>> Sounds like this patch is making a bad tradeoff: it is slowing down rendering of all Hebrew text just because of something that happens on fewer than one in ten thousand webpages. Is that so?
>>> 
>>> The character scan check must happen before we know which font will be used to render the character, and we need to know which font will be used to render the character before we know whether or not there could be language-specific alternates. It seems that the only way to solve this problem is to either 1) render Hebrew / Yiddish text incorrectly, or 2) completely eliminate one of our two text codepaths.
>> 
>> I guess we could also fix this for Devanagari but not for Hebrew because Devanagari already uses the complex text codepath.
> 
> I don’t know that there are indeed only two possible choices here, but my understanding is that we shouldn’t make code changes that slow things down.

OpenType allows language dependent behavior for most scripts – see, for example, the Writing System Tags appendix for Latin/Greek/Cyrillic: https://www.microsoft.com/typography/OpenTypeDev/standard/intro.htm
Comment 33 Myles C. Maxfield 2017-10-31 20:50:09 PDT
Created attachment 325541 [details]
Patch
Comment 34 Myles C. Maxfield 2017-10-31 20:56:42 PDT
Comment on attachment 325541 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=325541&action=review

> Source/WebCore/platform/graphics/FontCascade.cpp:333
> +        simpleIterator.finalize(nullptr);

All the calls to finalize() are probably not correct, because we should only finalize when we are iterating to the very end of the TextRun.
Comment 35 Myles C. Maxfield 2017-10-31 21:03:53 PDT
Comment on attachment 325541 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=325541&action=review

I need to split this patch up into a few different pieces:
1) Refactoring GlyphBuffer
2) Using CTFontTransformGlyphsWithLanguage(), but with no handler and no initial advance
3) Adding a handler to CTFontTransformGlyphsWithLanguage()
4) Adding initial advance support to WidthIterator

> Source/WebCore/platform/graphics/WidthIterator.cpp:127
> +        widthDifference += m_rtlInitialAdvance.width();

This is not correct. Initial advances are paint advances, not layout advances.
Comment 36 Myles C. Maxfield 2017-10-31 21:04:28 PDT
Comment on attachment 325541 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=325541&action=review

>> Source/WebCore/platform/graphics/FontCascade.cpp:333
>> +        simpleIterator.finalize(nullptr);
> 
> All the calls to finalize() are probably not correct, because we should only finalize when we are iterating to the very end of the TextRun.

Or are they paint advances? I don't think this API has that distinction at all.
Comment 37 Myles C. Maxfield 2017-11-01 00:08:37 PDT
Created attachment 325548 [details]
Patch
Comment 38 Myles C. Maxfield 2017-11-01 00:21:51 PDT
Created attachment 325550 [details]
Patch
Comment 39 Myles C. Maxfield 2017-11-01 00:34:57 PDT
Created attachment 325552 [details]
Patch
Comment 40 Build Bot 2017-11-01 02:14:33 PDT
Comment on attachment 325552 [details]
Patch

Attachment 325552 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/5060732

New failing tests:
imported/w3c/web-platform-tests/service-workers/cache-storage/serviceworker/cache-match.https.html
Comment 41 Build Bot 2017-11-01 02:14:35 PDT
Created attachment 325557 [details]
Archive of layout-test-results from ews105 for mac-elcapitan-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews105  Port: mac-elcapitan-wk2  Platform: Mac OS X 10.11.6
Comment 42 Myles C. Maxfield 2017-11-07 12:08:39 PST
Test failure is unrelated.
Comment 43 mitz 2017-11-07 13:02:45 PST
Comment on attachment 325552 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=325552&action=review

> Source/WebCore/platform/graphics/FontPlatformData.h:250
> +    AtomicString m_locale;

Is it OK that this member variable doesn’t participate in operator== and in hash()?
Comment 44 Myles C. Maxfield 2017-11-07 18:02:02 PST
Comment on attachment 325552 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=325552&action=review

>> Source/WebCore/platform/graphics/FontPlatformData.h:250
>> +    AtomicString m_locale;
> 
> Is it OK that this member variable doesn’t participate in operator== and in hash()?

Nope.
Comment 45 Myles C. Maxfield 2018-01-19 10:21:59 PST
Created attachment 331749 [details]
Patch
Comment 46 Myles C. Maxfield 2018-01-19 11:47:39 PST
Created attachment 331762 [details]
Patch
Comment 47 Myles C. Maxfield 2018-05-03 11:18:01 PDT
It's easy to see the problem with initial advances using the test fonts at https://bugs.webkit.org/show_bug.cgi?id=185062
Comment 48 Myles C. Maxfield 2018-10-20 21:43:30 PDT
Comment on attachment 331762 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=331762&action=review

> Source/WebCore/platform/graphics/mac/SimpleFontDataCoreText.cpp:48
> +    if (!platformData().locale().isEmpty())
> +        CFDictionarySetValue(attributesDictionary.get(), kCTLanguageAttributeName, platformData().locale().string().createCFString().get());

This now needs additional attributes: kCTParagraphStyleAttributeName -> { CTParagraphStyleCreate(...), CTParagraphStyleSetCompositionLanguage(..., kCTCompositionLanguageNone) }
Comment 49 Myles C. Maxfield 2019-12-18 19:06:40 PST
Remaining tasks:

1) Delete CharactersTreatedAsSpace and use string indices instead. This probably necessitates using CTFontShapeGlyphs().
2) Use the initial advance returned
3) Hook up the callback to allow glyphs to be inserted (the glyph backing store to be reallocated)
Comment 50 Myles C. Maxfield 2019-12-18 20:13:42 PST
Created attachment 386055 [details]
Patch
Comment 51 Myles C. Maxfield 2020-01-08 19:37:32 PST
Created attachment 387182 [details]
Patch
Comment 52 Myles C. Maxfield 2020-01-08 19:44:49 PST
One possible test: Rendering U+015E using Avenir-Roman should look different if the language is set to "ro"
Comment 53 Myles C. Maxfield 2020-01-08 19:51:24 PST
Another test: Rendering U+00ED using ComicSansMS should look different if the language is set to "lt"
Comment 54 Myles C. Maxfield 2020-01-08 19:55:58 PST
Created attachment 387183 [details]
Patch
Comment 55 Myles C. Maxfield 2020-01-10 22:07:57 PST
Created attachment 387424 [details]
Patch
Comment 56 Myles C. Maxfield 2020-01-11 22:59:59 PST
Created attachment 387456 [details]
Patch
Comment 57 Myles C. Maxfield 2020-01-11 23:10:17 PST
Created attachment 387457 [details]
Patch
Comment 58 Myles C. Maxfield 2020-01-11 23:38:06 PST
Created attachment 387458 [details]
Patch
Comment 59 Myles C. Maxfield 2020-01-11 23:55:36 PST
Created attachment 387459 [details]
Patch
Comment 60 Myles C. Maxfield 2020-01-12 00:08:39 PST
Created attachment 387460 [details]
Patch
Comment 61 Myles C. Maxfield 2020-01-12 09:57:02 PST
Created attachment 387471 [details]
Patch
Comment 62 Myles C. Maxfield 2020-01-12 10:30:34 PST
Created attachment 387473 [details]
Patch
Comment 63 Myles C. Maxfield 2020-01-12 12:36:11 PST
Created attachment 387487 [details]
Patch
Comment 64 Simon Fraser (smfr) 2020-01-13 13:55:16 PST
Comment on attachment 387487 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=387487&action=review

> Source/WebCore/platform/graphics/GlyphBuffer.h:172
> +        for (size_t i = 0; i < length; ++i)
> +            m_fonts[location + i] = font;

Maybe i = location; I < location + length, and you could precompute location + length as end.

Also originalSize - location is computed multiple times.

> Source/WebCore/platform/graphics/WidthIterator.cpp:126
> +    // This is totally, 100%, furiously, utterly, frustratingly bogus.
> +    // There is absolutely no guarantee that glyph indices before shaping have any relation at all with glyph indices after shaping.

Perhaps turn this anguished comment into an actionable bug and reference it here.

> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:562
> +    auto handler = ^(CFRange range, CGGlyph** newGlyphsPointer, CGSize** newAdvancesPointer) {
> +        range.location = std::min(std::max(range.location, static_cast<CFIndex>(0)), static_cast<CFIndex>(glyphBuffer.size()));
> +        if (range.length < 0) {
> +            range.length = std::min(range.location, -range.length);
> +            range.location = range.location - range.length;
> +            glyphBuffer.remove(beginningIndex + range.location, range.length);
> +        } else
> +            glyphBuffer.makeHole(beginningIndex + range.location, range.length, this);
> +        *newGlyphsPointer = glyphBuffer.glyphs(beginningIndex);
> +        *newAdvancesPointer = glyphBuffer.advances(beginningIndex);
> +    };
> +    CTFontTransformGlyphsWithLanguage(m_platformData.ctFont(), glyphBuffer.glyphs(beginningIndex), reinterpret_cast<CGSize*>(glyphBuffer.advances(beginningIndex)), glyphBuffer.size() - beginningIndex, options, platformData().locale().string().createCFString().get(), handler);

It seems that this could fall into pathological O(n^2) cases where makeHole or glyphBuffer.remove are called over and over and each time move the rest of the buffers. Is there a more efficient way?
Comment 65 Myles C. Maxfield 2020-01-13 13:58:01 PST
I should see if there's a way to do this without affecting the font hash
Comment 66 Myles C. Maxfield 2020-01-13 16:17:41 PST
Comment on attachment 387487 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=387487&action=review

>> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:562
>> +    CTFontTransformGlyphsWithLanguage(m_platformData.ctFont(), glyphBuffer.glyphs(beginningIndex), reinterpret_cast<CGSize*>(glyphBuffer.advances(beginningIndex)), glyphBuffer.size() - beginningIndex, options, platformData().locale().string().createCFString().get(), handler);
> 
> It seems that this could fall into pathological O(n^2) cases where makeHole or glyphBuffer.remove are called over and over and each time move the rest of the buffers. Is there a more efficient way?

Yes! I asked the same question of the CoreText team. They replied that any solution to that would involve some more complicated data structure to hold the glyph sequence (like a linked list or a rope or something) which would involve redesigning significant parts of the internals of CoreText. They assume that the glyph sequence is tightly packed in a single array throughout their framework. This "pathological" case happens rarely enough that it isn't worth redesigning huge parts of the CoreText framework.
Comment 67 Myles C. Maxfield 2020-01-13 17:40:11 PST
Created attachment 387600 [details]
Doesn't affect font hash
Comment 68 Myles C. Maxfield 2020-01-13 18:38:08 PST
Created attachment 387606 [details]
Doesn't affect font hash
Comment 69 Myles C. Maxfield 2020-01-13 18:39:56 PST
Created attachment 387607 [details]
Doesn't affect font hash
Comment 70 Simon Fraser (smfr) 2020-01-13 18:45:16 PST
Comment on attachment 387606 [details]
Doesn't affect font hash

View in context: https://bugs.webkit.org/attachment.cgi?id=387606&action=review

> Source/WebCore/platform/graphics/Font.h:286
> +    struct CFStringAttributesKey {

This struct name makes it sound like attributes of a CFString.

> Source/WebCore/platform/graphics/Font.h:322
> +        bool enableKerning;
> +        FontOrientation orientation;
> +        AtomString locale;

It bugs me seeing small members before big members (packing).

> Source/WebCore/platform/graphics/GlyphBuffer.h:167
> +        auto originalSize = size();

Maybe assert that location < size().

> Source/WebCore/platform/graphics/GlyphBuffer.h:170
> +        m_fonts.grow(length);

This is wrong. grow() takes the new size, so you want size() + length.
Comment 71 Myles C. Maxfield 2020-01-13 19:08:50 PST
Created attachment 387608 [details]
Doesn't affect font hash
Comment 72 Myles C. Maxfield 2020-01-13 22:05:26 PST
Created attachment 387616 [details]
Doesn't affect font hash
Comment 73 Myles C. Maxfield 2020-01-14 12:35:43 PST
Created attachment 387690 [details]
Doesn't affect font hash
Comment 74 WebKit Commit Bot 2020-01-14 14:13:59 PST
Comment on attachment 387690 [details]
Doesn't affect font hash

Clearing flags on attachment: 387690

Committed r254534: <https://trac.webkit.org/changeset/254534>
Comment 75 Antti Koivisto 2020-01-29 23:06:18 PST
Comment on attachment 387487 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=387487&action=review

> Source/WebCore/platform/graphics/FontCascade.cpp:421
> +    GlyphBuffer glyphBuffer;
>      Vector<GlyphBufferGlyph, 16> glyphs;
>      Vector<GlyphBufferAdvance, 16> advances;

GlyphBuffer is a large type (somewhere in ~50KB range). While it is stack-allocated here maybe we should still be worried about memory locality in this super hot function?

Both of the vectors are now unused.
Comment 76 Antti Koivisto 2020-01-30 00:42:39 PST
Comment on attachment 387487 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=387487&action=review

>>> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:562
>>> +    CTFontTransformGlyphsWithLanguage(m_platformData.ctFont(), glyphBuffer.glyphs(beginningIndex), reinterpret_cast<CGSize*>(glyphBuffer.advances(beginningIndex)), glyphBuffer.size() - beginningIndex, options, platformData().locale().string().createCFString().get(), handler);
>> 
>> It seems that this could fall into pathological O(n^2) cases where makeHole or glyphBuffer.remove are called over and over and each time move the rest of the buffers. Is there a more efficient way?
> 
> Yes! I asked the same question of the CoreText team. They replied that any solution to that would involve some more complicated data structure to hold the glyph sequence (like a linked list or a rope or something) which would involve redesigning significant parts of the internals of CoreText. They assume that the glyph sequence is tightly packed in a single array throughout their framework. This "pathological" case happens rarely enough that it isn't worth redesigning huge parts of the CoreText framework.

We only need this stuff for a limited set of known languages right? Maybe we just use the old code path in common cases if this turns out to be slower?
Comment 77 Myles C. Maxfield 2022-01-21 11:13:37 PST
Comment on attachment 387487 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=387487&action=review

>>>> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:562
>>>> +    CTFontTransformGlyphsWithLanguage(m_platformData.ctFont(), glyphBuffer.glyphs(beginningIndex), reinterpret_cast<CGSize*>(glyphBuffer.advances(beginningIndex)), glyphBuffer.size() - beginningIndex, options, platformData().locale().string().createCFString().get(), handler);
>>> 
>>> It seems that this could fall into pathological O(n^2) cases where makeHole or glyphBuffer.remove are called over and over and each time move the rest of the buffers. Is there a more efficient way?
>> 
>> Yes! I asked the same question of the CoreText team. They replied that any solution to that would involve some more complicated data structure to hold the glyph sequence (like a linked list or a rope or something) which would involve redesigning significant parts of the internals of CoreText. They assume that the glyph sequence is tightly packed in a single array throughout their framework. This "pathological" case happens rarely enough that it isn't worth redesigning huge parts of the CoreText framework.
> 
> We only need this stuff for a limited set of known languages right? Maybe we just use the old code path in common cases if this turns out to be slower?

Yes, we could.

Ever since https://trac.webkit.org/changeset/255988/webkit the locale-specific shaping hasn’t been a regression, so at least for now it doesn't seem like it's necessary to redesign this code. If we ever do encounter some content which triggers this pathological behavior, we can redesign this stuff at that point.