Bug 235053 - REGRESSION(r281389): using font-variant-ligatures causes Unicode bidi isolation control characters to render
Summary: REGRESSION(r281389): using font-variant-ligatures causes Unicode bidi isolati...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Text (show other bugs)
Version: WebKit Local Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Myles C. Maxfield
URL:
Keywords: InRadar
Depends on: 215643
Blocks:
  Show dependency treegraph
 
Reported: 2022-01-10 16:42 PST by Cameron McCormack (:heycam)
Modified: 2022-01-25 17:36 PST (History)
6 users (show)

See Also:


Attachments
test (103 bytes, text/html)
2022-01-10 16:42 PST, Cameron McCormack (:heycam)
no flags Details
Patch (71.13 KB, patch)
2022-01-16 18:36 PST, Myles C. Maxfield
no flags Details | Formatted Diff | Diff
Patch (72.14 KB, patch)
2022-01-16 20:32 PST, Myles C. Maxfield
no flags Details | Formatted Diff | Diff
WIP (987.79 KB, patch)
2022-01-16 21:29 PST, Myles C. Maxfield
no flags Details | Formatted Diff | Diff
Patch (72.15 KB, patch)
2022-01-16 23:45 PST, Myles C. Maxfield
no flags Details | Formatted Diff | Diff
Patch (72.10 KB, patch)
2022-01-17 01:56 PST, Myles C. Maxfield
darin: review+
Details | Formatted Diff | Diff
Patch for committing (73.95 KB, patch)
2022-01-17 16:26 PST, Myles C. Maxfield
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Cameron McCormack (:heycam) 2022-01-10 16:42:48 PST
Created attachment 448813 [details]
test

See the attached test.
Comment 1 Myles C. Maxfield 2022-01-10 16:50:48 PST
Reproduces on Big Sur. Doesn't reproduce on Monterey.
Comment 2 Myles C. Maxfield 2022-01-10 16:51:28 PST
Doesn't reproduce on 20G405
Comment 3 Myles C. Maxfield 2022-01-10 16:53:41 PST
Doesn't reproduce on 21A559
Comment 4 Radar WebKit Bug Importer 2022-01-11 16:15:53 PST
<rdar://problem/87425066>
Comment 5 Myles C. Maxfield 2022-01-16 17:18:52 PST
I see what's happening. The call to m_font.fontForCombiningCharacterSequence() in the complex text codepath is now returning nullptr when it used to be returning some font.
Comment 6 Myles C. Maxfield 2022-01-16 17:20:48 PST
We're hitting this block in ComplexTextController::collectComplexTextRunsForCharacters():

    if (!font) {
        // Create a run of missing glyphs from the primary font.
        m_complexTextRuns.append(ComplexTextRun::create(m_font.primaryFont(), cp, stringLocation, length, 0, length, m_run.ltr()));
        return;
    }
Comment 7 Myles C. Maxfield 2022-01-16 18:36:43 PST
Created attachment 449299 [details]
Patch
Comment 8 EWS Watchlist 2022-01-16 18:38:10 PST
This patch modifies the imported WPT tests. Please ensure that any changes on the tests (not coming from a WPT import) are exported to WPT. Please see https://trac.webkit.org/wiki/WPTExportProcess
Comment 9 Myles C. Maxfield 2022-01-16 20:32:25 PST
Created attachment 449302 [details]
Patch
Comment 10 Myles C. Maxfield 2022-01-16 21:29:44 PST
Created attachment 449303 [details]
WIP
Comment 11 Myles C. Maxfield 2022-01-16 23:39:50 PST
Looks like soft hyphen (U+00AD) is rendering
Comment 12 Myles C. Maxfield 2022-01-16 23:45:31 PST
Created attachment 449312 [details]
Patch
Comment 13 Myles C. Maxfield 2022-01-17 01:56:14 PST
Created attachment 449318 [details]
Patch
Comment 14 Darin Adler 2022-01-17 10:05:12 PST
Comment on attachment 449318 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=449318&action=review

> Source/WebCore/platform/graphics/ComplexTextController.cpp:834
> +        if (!u_hasBinaryProperty(character, UCHAR_DEFAULT_IGNORABLE_CODE_POINT) && character != objectReplacementCharacter)

Is this efficient enough as-is, or do we need a fast path that skips the call to u_hasBinaryProperty for the most common characters? I suppose we did a memory allocation here so it’s not super-performance-sensitive.

> Source/WebCore/platform/graphics/WidthIterator.cpp:622
> +            || (characterResponsibleForThisGlyph >= deleteCharacter && characterResponsibleForThisGlyph < noBreakSpace)
> +            || characterResponsibleForThisGlyph == objectReplacementCharacter
> +            || u_hasBinaryProperty(characterResponsibleForThisGlyph, UCHAR_DEFAULT_IGNORABLE_CODE_POINT)) {

I don’t understand the [nullCharacter,space) and [delete,noBreakSpace) parts of this expression; is that an optimization or are those not ignorable? Can we make this clearer with a helper function? If it’s just an optimization that could go into the helper function. Since this repeats the UCHAR_DEFAULT_IGNORABLE_CODE_POINT + objectReplacementCharacter logic in ComplexTextController.cpp, a helper would make both call sites less wordy and give us a place to write a comment or use a name that makes it clear why we need that exception. Even if those other checks are not just an optimization.
Comment 15 Myles C. Maxfield 2022-01-17 16:15:11 PST
Comment on attachment 449318 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=449318&action=review

>> Source/WebCore/platform/graphics/ComplexTextController.cpp:834
>> +        if (!u_hasBinaryProperty(character, UCHAR_DEFAULT_IGNORABLE_CODE_POINT) && character != objectReplacementCharacter)
> 
> Is this efficient enough as-is, or do we need a fast path that skips the call to u_hasBinaryProperty for the most common characters? I suppose we did a memory allocation here so it’s not super-performance-sensitive.

The implementation of u_hasBinaryProperty() just does some bit math, and 2 reads from data tables in https://github.com/unicode-org/icu/blob/main/icu4c/source/common/uchar_props_data.h. From my testing, each call takes 4.65 nanoseconds on my development machine. The machine is a 2.3GHz machine, so if I did the math correctly, that means each call took between 10 and 11 cycles. (Of course, my test program is just a loop that calls the function, so we can assume in this test everything is in cache lines, etc.) But even with that disclaimer, I think these results indicate that it's probably fast enough.

>> Source/WebCore/platform/graphics/WidthIterator.cpp:622
>> +            || u_hasBinaryProperty(characterResponsibleForThisGlyph, UCHAR_DEFAULT_IGNORABLE_CODE_POINT)) {
> 
> I don’t understand the [nullCharacter,space) and [delete,noBreakSpace) parts of this expression; is that an optimization or are those not ignorable? Can we make this clearer with a helper function? If it’s just an optimization that could go into the helper function. Since this repeats the UCHAR_DEFAULT_IGNORABLE_CODE_POINT + objectReplacementCharacter logic in ComplexTextController.cpp, a helper would make both call sites less wordy and give us a place to write a comment or use a name that makes it clear why we need that exception. Even if those other checks are not just an optimization.

Yep - [delete,noBreakSpace) are not ignorable. The first ignorable code point is U+00AD SOFT HYPHEN.

The helper function is a good idea. In fact, we already have some similar helper functions: FontCascade::treatAsZeroWidthSpace() and FontCascade::treatAsZeroWidthSpaceInComplexScript(). I'll make a 3rd one for now, and then in a follow-up patch I'll see if I can unify them.
Comment 16 Myles C. Maxfield 2022-01-17 16:26:06 PST
Created attachment 449353 [details]
Patch for committing
Comment 17 Myles C. Maxfield 2022-01-17 19:02:30 PST
Committed r288107 (246121@trunk): <https://commits.webkit.org/246121@trunk>
Comment 18 Myles C. Maxfield 2022-01-17 20:07:55 PST
https://github.com/web-platform-tests/wpt/pull/32421
Comment 19 Darin Adler 2022-01-18 13:23:14 PST
Comment on attachment 449353 [details]
Patch for committing

View in context: https://bugs.webkit.org/attachment.cgi?id=449353&action=review

> Source/WebCore/platform/graphics/FontCascade.h:269
> +        return (character >= nullCharacter && character < space)
> +            || (character >= deleteCharacter && character < noBreakSpace)
> +            || character == objectReplacementCharacter

Still unclear if these special cases are needed for correctness or for performance.

If the checks are needed for correctness, a comment might make it clearer why we are ignoring characters that are not "default ignorable".

If the checks are there as a performance optimization, then either we can delete the checks because the performance of u_hasBinaryProperty is quite fast, or if we do think we need an optimization we didn’t do it well, since characters like space and the letter "a" take the slowest path.
Comment 20 Myles C. Maxfield 2022-01-21 14:19:07 PST
Comment on attachment 449353 [details]
Patch for committing

View in context: https://bugs.webkit.org/attachment.cgi?id=449353&action=review

>> Source/WebCore/platform/graphics/FontCascade.h:269
>> +            || character == objectReplacementCharacter
> 
> Still unclear if these special cases are needed for correctness or for performance.
> 
> If the checks are needed for correctness, a comment might make it clearer why we are ignoring characters that are not "default ignorable".
> 
> If the checks are there as a performance optimization, then either we can delete the checks because the performance of u_hasBinaryProperty is quite fast, or if we do think we need an optimization we didn’t do it well, since characters like space and the letter "a" take the slowest path.

Right, yes - with clear eyes, I looked this today and realized the comment says "characters in set X should have special processing" and then the implementation of the function returns set Y!!

Let me see if I can do some additional research today and clear this up.
Comment 21 Myles C. Maxfield 2022-01-25 17:36:23 PST
Comment on attachment 449353 [details]
Patch for committing

View in context: https://bugs.webkit.org/attachment.cgi?id=449353&action=review

>>> Source/WebCore/platform/graphics/FontCascade.h:269
>>> +            || character == objectReplacementCharacter
>> 
>> Still unclear if these special cases are needed for correctness or for performance.
>> 
>> If the checks are needed for correctness, a comment might make it clearer why we are ignoring characters that are not "default ignorable".
>> 
>> If the checks are there as a performance optimization, then either we can delete the checks because the performance of u_hasBinaryProperty is quite fast, or if we do think we need an optimization we didn’t do it well, since characters like space and the letter "a" take the slowest path.
> 
> Right, yes - with clear eyes, I looked this today and realized the comment says "characters in set X should have special processing" and then the implementation of the function returns set Y!!
> 
> Let me see if I can do some additional research today and clear this up.

I'm working on this at https://bugs.webkit.org/show_bug.cgi?id=235622. It's still far from being ready, though.