Created attachment 124985 [details] Zip file containing the bug illustration files mentioned in the Description. Both the Hebrew and Yiddish language use the same Hebrew character set. However, in Yiddish, for the "pasekh tsvey yudn" and "khirik yud" character combinations, the convention is to "raise" the vowel. This is done for Yiddish only, not Hebrew. It is possible on an HTML page to differentiate Yiddish text from Hebrew text using the "lang" attribute and to display the Yiddish glyphs when 'lang="yi"' is specified and to display the Hebrew glyphs when 'lang="he"' is specified; however, WebKit/Safari does not do this. This is not a WebKit-only issue, most browsers have this same bug. Firefox only recently fixed the bug in Firefox 10.0beta (see this Firefox bug, in particular the last few comments in the comment thread: https://bugzilla.mozilla.org/show_bug.cgi?id=694205). In order to demonstrate the problem and to illustrate the correct behaviour, I have attached a zip file with the following: 1. yiddish-hebrew.html : A basic example to illustrate the problem. 2. Firefox.jpg : A screen shot of the correct Hebrew & Yiddish vowel positioning in Firefox 10.0beta. 3. Safari.jpg : A screen shot showing the correct Hebrew & INCORRECT Yiddish vowel positioning in Safari. 4. FreeSans ttf files (4) : The FreeSans font used in the yiddish-hebrew.html example. FreeSans is one of the fonts in GNU FreeFont (http://www.gnu.org/software/freefont/) and the ttf files were built from the current SVN source files. These files should be used to replicate the issue because not all fonts provide support for the different Hebrew/Yiddish glyphs.
Ze'ev was using a recent build of FreeSerif, from the sources available via SVN at https://savannah.gnu.org/svn/?group=freefont
Created attachment 125002 [details] Serbian vs. Eastern Cyrillic Uses 'lang' attributes to select font features specific to a language. This one uses any recent version of the DejaVu fonts.
Created attachment 125003 [details] Serbian file as displayed by Konqueror 4.7.4 Note that the Russian and Serbian lines are identical. That's not what is intended.
Created attachment 125005 [details] Serbian file as displayed by FireFox Nightly FireFox nightly 12.0a1 (2012-01-31) (This is a quite recent improvement in FireFox.) Note that the Cursive letters are all different, and the 'de' letter is different in the normal face. This is desirable.
So here is another example of the same problem as Ze'ev describes, except with a more accessible font, and this time Cyrillic. Again the HTML 'lang' specifies a language to pick out font features for the given script. Near as I can tell, all WebKit based browsers show this wrong behavior (although Pango and other font rendering libraries make the distinction quite well.) Cheers!
Does this mean it's not sufficient to just choose the correct font for the language, which the work on bug 10874 has basically provided, but we must also ensure the correct glyphs within the font are chosen for the language?
Hi Matt! Not sure just who you mean by "we" here. And I'm a little bewildered by the suggestion that this would *force* somebody to do something. This is an *enabling* technology. * If "we" means "we web page designers" We may be providing a web font with these special features. We do so by specifying the language with the 'lang' tag. But no, we don't *have to*. On the other hand, if we can rely upon browsers to do the right thing, this *allows* us to use the language-specific features of the font. * If "we" means "we WebKit developers" We simply pass the "lang" tag properly to the font rendering layer. It will do the rest. And our product will look good compared to the competition. But no, we don't have to excel, or even keep up with the competition. As to "correct font": yes, separate fonts can be used to display Hebrew and Yiddish, even though they differ only in the placement of a few letters. Likewise with various versions of Cyrillic, and even Latin. It's a waste, in several ways. This longstanding OpenType technology *enables* a simple way to make fine typographic distinctions between languages. If you're not a native reader of those languages, it seems a little delicate. But to native readers, it can make a big difference. Let me emphasize that this is easy. Just properly pass the 'lang' attribute down to the underlying font rendering libraries.
Hi Steve, (In reply to comment #7) > This longstanding OpenType technology *enables* a simple way to make fine typographic distinctions between languages. If you're not a native reader of those languages, it seems a little delicate. But to native readers, it can make a big difference. I agree that it could be a big difference. > Let me emphasize that this is easy. Just properly pass the 'lang' attribute down to the underlying font rendering libraries. What is the exact mean when you say "passing 'lang' attribute to the underling libraries"? WebKit uses different font rendering libraries for each port. For example, Mac Safari/Chrome use CoreText, Chrome Win uses Uniscribe, and Chrome Linux uses (old) HarfBuzz. Which APIs we can use to solve the problem? I'm happy to create patches if it's actually easy as you said. If you can tell us what OpenType feature should we use, it would be also helpful.
(In reply to comment #7) Steve, thanks for the explanation. In my comment, I had just meant: "to fix the bug, we [people who want to fix the bug] must do ___."
The feature is locl, see <http://en.wikipedia.org/wiki/OpenType_feature_tag_list#OpenType_Typographic_Features>. Implementing this is likely non-trivial because of glyph cache, which will need to be made aware of this. The Mozilla bug Ze'ev mentioned did not have code changes, but it has links to actual bugs with code changes. Also, I don't know whether CoreText even supports this feature.
Hi Kenichi, You asked What is the exact mean when you say "passing 'lang' attribute to the underling libraries Of course that depends on the underlying libraries. But the 'lang' attribute is just a string from a set of standard language codes. The only complication I can imagine here is that some APIs may be picky about the particular languge code standard they accept (but I don't know that.) The DOM-level *intent* is to mark the text content of the entity as being of a different language than that of enclosing entities. Just how that translates into calls to the underlying formatting software -- you would surely know better than I. In principle it should be easy though (barring the ususual premature optimizations etc). You also asked "what OpenType feature should we use" All OpenType features are ultimately activated by a list of script( lang1, lang2, ... ) Where one of the languages may be the 'default' language. The script is detected from the Unicode range of the input characters. The language is (in the present case) derived from 'lang' tags. The feature is activated if the Unicode of the input characters matches the script and the specified language matches one of the languages in the list. So the answer is "this applies to all OpenType feature lookups". In the Cyrillic example I provided, DejaVu Serif uses the 'locl' feature (this one feature is specifically intended to substitute individual glyphs based on language). In the Hebrew example Ze'ev gave, FreeSans uses 'mark' and 'mkmk' positioning that depend upon language. I'm currently working on another font in which various Indic lookups ('pres', 'blwf', etc) depend upon language. As to the specifics of API calls you need to make: you would know better than I. I can only tell you: they are certainly present on each platform. Let me know if I can provide further examples. I have, e.g. some examples of pango-view making different output based upon language specification. Cheers!
*** Bug 117109 has been marked as a duplicate of this bug. ***
<rdar://problem/14649193>
I tested WebKit-SVN-r197635 nightly build and have found it supports at least 'palt' feature.
With CoreText, this is probably done with kCTLanguageAttributeName instead of font features.
This is implemented in the text shaping step, so that is the part which should be updated to implement this fix. This needs to happen in both the simple text code path and the complex text code path.
Background info: there are some font features which are enabled by default. These features may (or may not!) consult with a language in order to perform their shaping.
Created attachment 325477 [details] Patch
Created attachment 325480 [details] Patch
Comment on attachment 325480 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=325480&action=review > Source/WebCore/ChangeLog:21 > + (WebCore::FontCascade::characterRangeCodePath): Any language which has language-specific shaping (such as > + Hebrew) needs to use the complex path. Sounds like this patch is making a bad tradeoff: it is slowing down rendering of all Hebrew text just because of something that happens on fewer than one in ten thousand webpages. Is that so?
(In reply to mitz from comment #20) > Comment on attachment 325480 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=325480&action=review > > > Source/WebCore/ChangeLog:21 > > + (WebCore::FontCascade::characterRangeCodePath): Any language which has language-specific shaping (such as > > + Hebrew) needs to use the complex path. > > Sounds like this patch is making a bad tradeoff: it is slowing down > rendering of all Hebrew text just because of something that happens on fewer > than one in ten thousand webpages. Is that so? The character scan check must happen before we know which font will be used to render the character, and we need to know which font will be used to render the character before we know whether or not there could be language-specific alternates. It seems that the only way to solve this problem is to either 1) render Hebrew / Yiddish text incorrectly, or 2) completely eliminate one of our two text codepaths.
(In reply to Myles C. Maxfield from comment #21) > (In reply to mitz from comment #20) > > Comment on attachment 325480 [details] > > Patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=325480&action=review > > > > > Source/WebCore/ChangeLog:21 > > > + (WebCore::FontCascade::characterRangeCodePath): Any language which has language-specific shaping (such as > > > + Hebrew) needs to use the complex path. > > > > Sounds like this patch is making a bad tradeoff: it is slowing down > > rendering of all Hebrew text just because of something that happens on fewer > > than one in ten thousand webpages. Is that so? > > The character scan check must happen before we know which font will be used > to render the character, and we need to know which font will be used to > render the character before we know whether or not there could be > language-specific alternates. It seems that the only way to solve this > problem is to either 1) render Hebrew / Yiddish text incorrectly, or 2) > completely eliminate one of our two text codepaths. I guess we could also fix this for Devanagari but not for Hebrew because Devanagari already uses the complex text codepath.
(In reply to Myles C. Maxfield from comment #21) > (In reply to mitz from comment #20) > > Comment on attachment 325480 [details] > > Patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=325480&action=review > > > > > Source/WebCore/ChangeLog:21 > > > + (WebCore::FontCascade::characterRangeCodePath): Any language which has language-specific shaping (such as > > > + Hebrew) needs to use the complex path. > > > > Sounds like this patch is making a bad tradeoff: it is slowing down > > rendering of all Hebrew text just because of something that happens on fewer > > than one in ten thousand webpages. Is that so? > > The character scan check must happen before we know which font will be used > to render the character, and we need to know which font will be used to > render the character before we know whether or not there could be > language-specific alternates. It seems that the only way to solve this > problem is to either 1) render Hebrew / Yiddish text incorrectly, or 2) > completely eliminate one of our two text codepaths. I don’t know that there are indeed only two possible choices here, but my understanding is that we shouldn’t make code changes that slow things down.
Comment on attachment 325480 [details] Patch Attachment 325480 [details] did not pass mac-ews (mac): Output: http://webkit-queues.webkit.org/results/5054208 New failing tests: fast/text/international/pop-up-button-text-alignment-and-direction.html fast/text/international/bidi-LDB-2-formatting-characters.html svg/text/bidi-reorder-value-lists.svg svg/W3C-SVG-1.1-SE/text-intro-09-b.svg fast/ruby/ruby-expansion-cjk-3.html
Created attachment 325488 [details] Archive of layout-test-results from ews102 for mac-elcapitan The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews102 Port: mac-elcapitan Platform: Mac OS X 10.11.6
Comment on attachment 325480 [details] Patch Attachment 325480 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.webkit.org/results/5054341 New failing tests: fast/text/international/pop-up-button-text-alignment-and-direction.html fast/ruby/ruby-expansion-cjk-3.html fast/text/international/bidi-LDB-2-formatting-characters.html svg/text/bidi-reorder-value-lists.svg svg/W3C-SVG-1.1-SE/text-intro-09-b.svg fast/ruby/ruby-expansion-cjk-2.html
Created attachment 325492 [details] Archive of layout-test-results from ews107 for mac-elcapitan-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews107 Port: mac-elcapitan-wk2 Platform: Mac OS X 10.11.6
Comment on attachment 325480 [details] Patch Attachment 325480 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/5054281 New failing tests: fast/text/international/pop-up-button-text-alignment-and-direction.html fast/text/international/bidi-LDB-2-formatting-characters.html svg/text/bidi-reorder-value-lists.svg svg/W3C-SVG-1.1-SE/text-intro-09-b.svg fast/ruby/ruby-expansion-cjk-3.html
Created attachment 325494 [details] Archive of layout-test-results from ews114 for mac-elcapitan The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews114 Port: mac-elcapitan Platform: Mac OS X 10.11.6
Comment on attachment 325480 [details] Patch Attachment 325480 [details] did not pass ios-sim-ews (ios-simulator-wk2): Output: http://webkit-queues.webkit.org/results/5054363 New failing tests: fast/text/international/pop-up-button-text-alignment-and-direction.html svg/W3C-SVG-1.1-SE/text-intro-09-b.svg
Created attachment 325497 [details] Archive of layout-test-results from ews124 for ios-simulator-wk2 The attached test failures were seen while running run-webkit-tests on the ios-sim-ews. Bot: ews124 Port: ios-simulator-wk2 Platform: Mac OS X 10.12.6
Comment on attachment 325480 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=325480&action=review >>>>> Source/WebCore/ChangeLog:21 >>>>> + Hebrew) needs to use the complex path. >>>> >>>> Sounds like this patch is making a bad tradeoff: it is slowing down rendering of all Hebrew text just because of something that happens on fewer than one in ten thousand webpages. Is that so? >>> >>> The character scan check must happen before we know which font will be used to render the character, and we need to know which font will be used to render the character before we know whether or not there could be language-specific alternates. It seems that the only way to solve this problem is to either 1) render Hebrew / Yiddish text incorrectly, or 2) completely eliminate one of our two text codepaths. >> >> I guess we could also fix this for Devanagari but not for Hebrew because Devanagari already uses the complex text codepath. > > I don’t know that there are indeed only two possible choices here, but my understanding is that we shouldn’t make code changes that slow things down. OpenType allows language dependent behavior for most scripts – see, for example, the Writing System Tags appendix for Latin/Greek/Cyrillic: https://www.microsoft.com/typography/OpenTypeDev/standard/intro.htm
Created attachment 325541 [details] Patch
Comment on attachment 325541 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=325541&action=review > Source/WebCore/platform/graphics/FontCascade.cpp:333 > + simpleIterator.finalize(nullptr); All the calls to finalize() are probably not correct, because we should only finalize when we are iterating to the very end of the TextRun.
Comment on attachment 325541 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=325541&action=review I need to split this patch up into a few different pieces: 1) Refactoring GlyphBuffer 2) Using CTFontTransformGlyphsWithLanguage(), but with no handler and no initial advance 3) Adding a handler to CTFontTransformGlyphsWithLanguage() 4) Adding initial advance support to WidthIterator > Source/WebCore/platform/graphics/WidthIterator.cpp:127 > + widthDifference += m_rtlInitialAdvance.width(); This is not correct. Initial advances are paint advances, not layout advances.
Comment on attachment 325541 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=325541&action=review >> Source/WebCore/platform/graphics/FontCascade.cpp:333 >> + simpleIterator.finalize(nullptr); > > All the calls to finalize() are probably not correct, because we should only finalize when we are iterating to the very end of the TextRun. Or are they paint advances? I don't think this API has that distinction at all.
Created attachment 325548 [details] Patch
Created attachment 325550 [details] Patch
Created attachment 325552 [details] Patch
Comment on attachment 325552 [details] Patch Attachment 325552 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.webkit.org/results/5060732 New failing tests: imported/w3c/web-platform-tests/service-workers/cache-storage/serviceworker/cache-match.https.html
Created attachment 325557 [details] Archive of layout-test-results from ews105 for mac-elcapitan-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews105 Port: mac-elcapitan-wk2 Platform: Mac OS X 10.11.6
Test failure is unrelated.
Comment on attachment 325552 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=325552&action=review > Source/WebCore/platform/graphics/FontPlatformData.h:250 > + AtomicString m_locale; Is it OK that this member variable doesn’t participate in operator== and in hash()?
Comment on attachment 325552 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=325552&action=review >> Source/WebCore/platform/graphics/FontPlatformData.h:250 >> + AtomicString m_locale; > > Is it OK that this member variable doesn’t participate in operator== and in hash()? Nope.
Created attachment 331749 [details] Patch
Created attachment 331762 [details] Patch
It's easy to see the problem with initial advances using the test fonts at https://bugs.webkit.org/show_bug.cgi?id=185062
Comment on attachment 331762 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=331762&action=review > Source/WebCore/platform/graphics/mac/SimpleFontDataCoreText.cpp:48 > + if (!platformData().locale().isEmpty()) > + CFDictionarySetValue(attributesDictionary.get(), kCTLanguageAttributeName, platformData().locale().string().createCFString().get()); This now needs additional attributes: kCTParagraphStyleAttributeName -> { CTParagraphStyleCreate(...), CTParagraphStyleSetCompositionLanguage(..., kCTCompositionLanguageNone) }
Remaining tasks: 1) Delete CharactersTreatedAsSpace and use string indices instead. This probably necessitates using CTFontShapeGlyphs(). 2) Use the initial advance returned 3) Hook up the callback to allow glyphs to be inserted (the glyph backing store to be reallocated)
Created attachment 386055 [details] Patch
Created attachment 387182 [details] Patch
One possible test: Rendering U+015E using Avenir-Roman should look different if the language is set to "ro"
Another test: Rendering U+00ED using ComicSansMS should look different if the language is set to "lt"
Created attachment 387183 [details] Patch
Created attachment 387424 [details] Patch
Created attachment 387456 [details] Patch
Created attachment 387457 [details] Patch
Created attachment 387458 [details] Patch
Created attachment 387459 [details] Patch
Created attachment 387460 [details] Patch
Created attachment 387471 [details] Patch
Created attachment 387473 [details] Patch
Created attachment 387487 [details] Patch
Comment on attachment 387487 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=387487&action=review > Source/WebCore/platform/graphics/GlyphBuffer.h:172 > + for (size_t i = 0; i < length; ++i) > + m_fonts[location + i] = font; Maybe i = location; I < location + length, and you could precompute location + length as end. Also originalSize - location is computed multiple times. > Source/WebCore/platform/graphics/WidthIterator.cpp:126 > + // This is totally, 100%, furiously, utterly, frustratingly bogus. > + // There is absolutely no guarantee that glyph indices before shaping have any relation at all with glyph indices after shaping. Perhaps turn this anguished comment into an actionable bug and reference it here. > Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:562 > + auto handler = ^(CFRange range, CGGlyph** newGlyphsPointer, CGSize** newAdvancesPointer) { > + range.location = std::min(std::max(range.location, static_cast<CFIndex>(0)), static_cast<CFIndex>(glyphBuffer.size())); > + if (range.length < 0) { > + range.length = std::min(range.location, -range.length); > + range.location = range.location - range.length; > + glyphBuffer.remove(beginningIndex + range.location, range.length); > + } else > + glyphBuffer.makeHole(beginningIndex + range.location, range.length, this); > + *newGlyphsPointer = glyphBuffer.glyphs(beginningIndex); > + *newAdvancesPointer = glyphBuffer.advances(beginningIndex); > + }; > + CTFontTransformGlyphsWithLanguage(m_platformData.ctFont(), glyphBuffer.glyphs(beginningIndex), reinterpret_cast<CGSize*>(glyphBuffer.advances(beginningIndex)), glyphBuffer.size() - beginningIndex, options, platformData().locale().string().createCFString().get(), handler); It seems that this could fall into pathological O(n^2) cases where makeHole or glyphBuffer.remove are called over and over and each time move the rest of the buffers. Is there a more efficient way?
I should see if there's a way to do this without affecting the font hash
Comment on attachment 387487 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=387487&action=review >> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:562 >> + CTFontTransformGlyphsWithLanguage(m_platformData.ctFont(), glyphBuffer.glyphs(beginningIndex), reinterpret_cast<CGSize*>(glyphBuffer.advances(beginningIndex)), glyphBuffer.size() - beginningIndex, options, platformData().locale().string().createCFString().get(), handler); > > It seems that this could fall into pathological O(n^2) cases where makeHole or glyphBuffer.remove are called over and over and each time move the rest of the buffers. Is there a more efficient way? Yes! I asked the same question of the CoreText team. They replied that any solution to that would involve some more complicated data structure to hold the glyph sequence (like a linked list or a rope or something) which would involve redesigning significant parts of the internals of CoreText. They assume that the glyph sequence is tightly packed in a single array throughout their framework. This "pathological" case happens rarely enough that it isn't worth redesigning huge parts of the CoreText framework.
Created attachment 387600 [details] Doesn't affect font hash
Created attachment 387606 [details] Doesn't affect font hash
Created attachment 387607 [details] Doesn't affect font hash
Comment on attachment 387606 [details] Doesn't affect font hash View in context: https://bugs.webkit.org/attachment.cgi?id=387606&action=review > Source/WebCore/platform/graphics/Font.h:286 > + struct CFStringAttributesKey { This struct name makes it sound like attributes of a CFString. > Source/WebCore/platform/graphics/Font.h:322 > + bool enableKerning; > + FontOrientation orientation; > + AtomString locale; It bugs me seeing small members before big members (packing). > Source/WebCore/platform/graphics/GlyphBuffer.h:167 > + auto originalSize = size(); Maybe assert that location < size(). > Source/WebCore/platform/graphics/GlyphBuffer.h:170 > + m_fonts.grow(length); This is wrong. grow() takes the new size, so you want size() + length.
Created attachment 387608 [details] Doesn't affect font hash
Created attachment 387616 [details] Doesn't affect font hash
Created attachment 387690 [details] Doesn't affect font hash
Comment on attachment 387690 [details] Doesn't affect font hash Clearing flags on attachment: 387690 Committed r254534: <https://trac.webkit.org/changeset/254534>
Comment on attachment 387487 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=387487&action=review > Source/WebCore/platform/graphics/FontCascade.cpp:421 > + GlyphBuffer glyphBuffer; > Vector<GlyphBufferGlyph, 16> glyphs; > Vector<GlyphBufferAdvance, 16> advances; GlyphBuffer is a large type (somewhere in ~50KB range). While it is stack-allocated here maybe we should still be worried about memory locality in this super hot function? Both of the vectors are now unused.
Comment on attachment 387487 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=387487&action=review >>> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:562 >>> + CTFontTransformGlyphsWithLanguage(m_platformData.ctFont(), glyphBuffer.glyphs(beginningIndex), reinterpret_cast<CGSize*>(glyphBuffer.advances(beginningIndex)), glyphBuffer.size() - beginningIndex, options, platformData().locale().string().createCFString().get(), handler); >> >> It seems that this could fall into pathological O(n^2) cases where makeHole or glyphBuffer.remove are called over and over and each time move the rest of the buffers. Is there a more efficient way? > > Yes! I asked the same question of the CoreText team. They replied that any solution to that would involve some more complicated data structure to hold the glyph sequence (like a linked list or a rope or something) which would involve redesigning significant parts of the internals of CoreText. They assume that the glyph sequence is tightly packed in a single array throughout their framework. This "pathological" case happens rarely enough that it isn't worth redesigning huge parts of the CoreText framework. We only need this stuff for a limited set of known languages right? Maybe we just use the old code path in common cases if this turns out to be slower?
Comment on attachment 387487 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=387487&action=review >>>> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:562 >>>> + CTFontTransformGlyphsWithLanguage(m_platformData.ctFont(), glyphBuffer.glyphs(beginningIndex), reinterpret_cast<CGSize*>(glyphBuffer.advances(beginningIndex)), glyphBuffer.size() - beginningIndex, options, platformData().locale().string().createCFString().get(), handler); >>> >>> It seems that this could fall into pathological O(n^2) cases where makeHole or glyphBuffer.remove are called over and over and each time move the rest of the buffers. Is there a more efficient way? >> >> Yes! I asked the same question of the CoreText team. They replied that any solution to that would involve some more complicated data structure to hold the glyph sequence (like a linked list or a rope or something) which would involve redesigning significant parts of the internals of CoreText. They assume that the glyph sequence is tightly packed in a single array throughout their framework. This "pathological" case happens rarely enough that it isn't worth redesigning huge parts of the CoreText framework. > > We only need this stuff for a limited set of known languages right? Maybe we just use the old code path in common cases if this turns out to be slower? Yes, we could. Ever since https://trac.webkit.org/changeset/255988/webkit the locale-specific shaping hasn’t been a regression, so at least for now it doesn't seem like it's necessary to redesign this code. If we ever do encounter some content which triggers this pathological behavior, we can redesign this stuff at that point.