Bug 213446 - Text manipulation should exclude text rendered using icon-only fonts
Summary: Text manipulation should exclude text rendered using icon-only fonts
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: HTML Editing (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Wenson Hsieh
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2020-06-21 10:31 PDT by Wenson Hsieh
Modified: 2020-06-25 13:37 PDT (History)
8 users (show)

See Also:


Attachments
For EWS (24.83 KB, patch)
2020-06-21 11:08 PDT, Wenson Hsieh
no flags Details | Formatted Diff | Diff
Patch (24.83 KB, patch)
2020-06-21 11:40 PDT, Wenson Hsieh
no flags Details | Formatted Diff | Diff
Patch (27.29 KB, patch)
2020-06-22 11:04 PDT, Wenson Hsieh
no flags Details | Formatted Diff | Diff
Adjusted for feedback (25.33 KB, patch)
2020-06-25 11:59 PDT, Wenson Hsieh
no flags Details | Formatted Diff | Diff
Adjusted for feedback (25.20 KB, patch)
2020-06-25 12:00 PDT, Wenson Hsieh
no flags Details | Formatted Diff | Diff
Augment ChangeLog (25.51 KB, patch)
2020-06-25 12:14 PDT, Wenson Hsieh
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Wenson Hsieh 2020-06-21 10:31:45 PDT
<rdar://problem/63734298>
Comment 1 Wenson Hsieh 2020-06-21 11:08:41 PDT Comment hidden (obsolete)
Comment 2 Wenson Hsieh 2020-06-21 11:40:55 PDT Comment hidden (obsolete)
Comment 3 Wenson Hsieh 2020-06-22 11:04:27 PDT
Created attachment 402486 [details]
Patch
Comment 4 Myles C. Maxfield 2020-06-24 21:43:25 PDT
Comment on attachment 402486 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=402486&action=review

> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:662
> +bool Font::shouldExcludeFontFromTextManipulation() const

I'm not sure this actually belongs in Font. Font is a platform/ class, which indicates (at least to me) that all its functionality should be generally applicable to the various clients in WebCore/WebKit who want to use it.

Historically, I'm really bad at answering the question "should this function belong in one place or another" so if you feel that it really belongs here, I'm probably wrong.

> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:664
> +    if (origin() != Origin::Remote)

Does translation work in WKWebView? (WKWebView doesn't disallow user-installed fonts by default.) We may want this logic to be insensitive to web fonts vs installed fonts. Or maybe not.

> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:667
> +    auto platformFont = platformData().ctFont();

I think you want .font() instead here

> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:671
> +    auto supportedCharacters = adoptCF(CTFontCopyCharacterSet(platformFont));

Is this call expensive? It might be cheaper to call CTFontGetGlyphsForCharacters("a") for the test on line 676.

> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:672
> +    if (CFCharacterSetHasMemberInPlane(supportedCharacters.get(), 1) || CFCharacterSetHasMemberInPlane(supportedCharacters.get(), 2))

I assume there's no reason for this other than "it's a heuristic"?

> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:678
> +        bool foundGlyph = CTFontGetGlyphsForCharacterRange(platformFont, &lowercaseAGlyph, CFRangeMake('a', 1));

Is it worth making a range instead of CTFontGetGlyphsForCharacters here?

> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:683
> +    constexpr UniChar firstCharacterToTest = ' ';

Might want to add a comment saying "this includes the uppercase and lowercase alphabet" or whatever. I haven't memorized the ascii table.

Do we really need to test almost a hundred characters? Getting bounding boxes isn't free.

> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:687
> +    CGGlyph glyphs[numberOfCharactersToTest];

Is this too much stack space? I don't know

> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:690
> +    CGRect boundingRects[numberOfCharactersToTest];

Ditto

> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:691
> +    CTFontGetBoundingRectsForGlyphs(platformFont, kCTFontOrientationDefault, glyphs, boundingRects, numberOfCharactersToTest);

Might want to consider de-duping duplicate glyphs (where multiple characters map to the same glyph). It might be worth it.

I'd expect such duplication would only be common when multiple characters map to glyphID 0.

> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:694
> +        if (!CFCharacterSetIsCharacterMember(supportedCharacters.get(), character))

You can get this information faster if you just compare glyphs[character - firstCharacterToTest] to 0

> Source/WebCore/platform/graphics/cocoa/FontPlatformDataCocoa.mm:186
> +    if (auto platformFont = font())

Consider:

@font-face {
    font-family: FontA;
    src: url("FontB.ttf");
}
...
font-family: FontA, MyFallbackFont;

How confident are you that this comparison should be done on FontA?

Similarly, what if the user has content blockers, and FontA has not been downloaded yet? Or the site loads fonts themselves in JS and we trigger text manipulation before the JS is done? In this situation, primaryFont() will return MyFallbackFont instead of FontA. But the content still says WORLD or whatever.

I didn't see anything in the ChangeLog about this.
Comment 5 Myles C. Maxfield 2020-06-24 21:44:35 PDT
Comment on attachment 402486 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=402486&action=review

>> Source/WebCore/platform/graphics/cocoa/FontPlatformDataCocoa.mm:186
>> +    if (auto platformFont = font())
> 
> Consider:
> 
> @font-face {
>     font-family: FontA;
>     src: url("FontB.ttf");
> }
> ...
> font-family: FontA, MyFallbackFont;
> 
> How confident are you that this comparison should be done on FontA?
> 
> Similarly, what if the user has content blockers, and FontA has not been downloaded yet? Or the site loads fonts themselves in JS and we trigger text manipulation before the JS is done? In this situation, primaryFont() will return MyFallbackFont instead of FontA. But the content still says WORLD or whatever.
> 
> I didn't see anything in the ChangeLog about this.

*****How confident are you that this comparison should be done on FontB?
Comment 6 Myles C. Maxfield 2020-06-24 21:50:11 PDT
Have you considered fonts which don't support any of the latin characters but also don't have any code points in non-BMP?
Comment 7 Wenson Hsieh 2020-06-24 22:30:27 PDT
Comment on attachment 402486 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=402486&action=review

>> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:662
>> +bool Font::shouldExcludeFontFromTextManipulation() const
> 
> I'm not sure this actually belongs in Font. Font is a platform/ class, which indicates (at least to me) that all its functionality should be generally applicable to the various clients in WebCore/WebKit who want to use it.
> 
> Historically, I'm really bad at answering the question "should this function belong in one place or another" so if you feel that it really belongs here, I'm probably wrong.

I think your reasoning makes sense! The reason I put it here is because the implementation is highly platform-specific, and also uses lots of CoreText API/SPI which are already present in this file. How about renaming this to be something that isn't specific to "text manipulation" (i.e. translation), such as Font::containsNoNonEmptyBasicLatinGlyphs() or Font::mayOnlyRenderGlyphsAsIcons()?

>> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:664
>> +    if (origin() != Origin::Remote)
> 
> Does translation work in WKWebView? (WKWebView doesn't disallow user-installed fonts by default.) We may want this logic to be insensitive to web fonts vs installed fonts. Or maybe not.

It does not (Safari is the only client).

I added this so that installed fonts (which, as I understand, should never be "icon fonts" in Safari) would bail early. But since I have a mechanism for caching the results anyways, it's probably not worth having this extra check. I'll remove this.

>> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:667
>> +    auto platformFont = platformData().ctFont();
> 
> I think you want .font() instead here

👍🏻 Changed to font().

>> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:671
>> +    auto supportedCharacters = adoptCF(CTFontCopyCharacterSet(platformFont));
> 
> Is this call expensive? It might be cheaper to call CTFontGetGlyphsForCharacters("a") for the test on line 676.

Sounds good — I've changed this so that it only copies the character set after checking the fast path (i.e. checking the character 'a').

>> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:672
>> +    if (CFCharacterSetHasMemberInPlane(supportedCharacters.get(), 1) || CFCharacterSetHasMemberInPlane(supportedCharacters.get(), 2))
> 
> I assume there's no reason for this other than "it's a heuristic"?

Correct.

>> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:678
>> +        bool foundGlyph = CTFontGetGlyphsForCharacterRange(platformFont, &lowercaseAGlyph, CFRangeMake('a', 1));
> 
> Is it worth making a range instead of CTFontGetGlyphsForCharacters here?

That's a good point — it's probably more optimal to use `CTFontGetGlyphsForCharacters` here because it would allow us to avoid copying the character set for most fonts. I'll change this to use `CTFontGetGlyphsForCharacters`.

>> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:683
>> +    constexpr UniChar firstCharacterToTest = ' ';
> 
> Might want to add a comment saying "this includes the uppercase and lowercase alphabet" or whatever. I haven't memorized the ascii table.
> 
> Do we really need to test almost a hundred characters? Getting bounding boxes isn't free.

Sounds good — I'll add a comment.

In the email thread I CC'd you on, my initial heuristic tested only 52 characters (uppercase/lowercase Latin alphabet), but was advised to encompass all basic latin ASCII characters. Since this is already the "slow path", I don't think this extra work will do much harm (in my testing, I've only observed icon fonts reach this point; all other fonts I tested bail early when testing the single character 'a').

>> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:687
>> +    CGGlyph glyphs[numberOfCharactersToTest];
> 
> Is this too much stack space? I don't know

Hm...me neither :/ A 200-byte stack-allocated buffer doesn't seem excessive, but then again a quick grep through our codebase for "[128]" and "[256]" show mostly static data structures...

I'll try and move this to the heap instead.

>> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:690
>> +    CGRect boundingRects[numberOfCharactersToTest];
> 
> Ditto

I'll see if I can move this to the heap as well.

>> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:691
>> +    CTFontGetBoundingRectsForGlyphs(platformFont, kCTFontOrientationDefault, glyphs, boundingRects, numberOfCharactersToTest);
> 
> Might want to consider de-duping duplicate glyphs (where multiple characters map to the same glyph). It might be worth it.
> 
> I'd expect such duplication would only be common when multiple characters map to glyphID 0.

That's true — in my testing, for icon glyphs, the only duplicated glyphs I saw were `glyphID=0`s due to unsupported characters, though I saw this a lot! I'll change this to avoid grabbing bounding rects for unsupported characters.

>> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:694
>> +        if (!CFCharacterSetIsCharacterMember(supportedCharacters.get(), character))
> 
> You can get this information faster if you just compare glyphs[character - firstCharacterToTest] to 0

👍🏻

>>> Source/WebCore/platform/graphics/cocoa/FontPlatformDataCocoa.mm:186
>>> +    if (auto platformFont = font())
>> 
>> Consider:
>> 
>> @font-face {
>>     font-family: FontA;
>>     src: url("FontB.ttf");
>> }
>> ...
>> font-family: FontA, MyFallbackFont;
>> 
>> How confident are you that this comparison should be done on FontA?
>> 
>> Similarly, what if the user has content blockers, and FontA has not been downloaded yet? Or the site loads fonts themselves in JS and we trigger text manipulation before the JS is done? In this situation, primaryFont() will return MyFallbackFont instead of FontA. But the content still says WORLD or whatever.
>> 
>> I didn't see anything in the ChangeLog about this.
> 
> *****How confident are you that this comparison should be done on FontB?

Hm...so in my experience, text nodes that render using icon fonts don't have fallbacks to non-icon fonts, since the icon would show up as (undesired) text if it were to take the fallback.

I think we expect translation to only start after the page is mostly loaded (for instance, we don't flash the webpage translation button until after certain page loading milestones). It is true we could be in a situation where an icon font finishes loading later (after translation) and text ends up getting translated when it shouldn't. I don't really have a great answer for this at the moment, unfortunately — I'll leave a FIXME and describe this in the ChangeLog as well.
Comment 8 Wenson Hsieh 2020-06-24 22:36:18 PDT
(In reply to Myles C. Maxfield from comment #6)
> Have you considered fonts which don't support any of the latin characters
> but also don't have any code points in non-BMP?

That's interesting! Would you happen to know any examples off-hand?

Perhaps what I should do in this case is bail early and return false in the case where no Latin characters are supported (right now, we'll end up iterating over all Latin characters, hitting the continue; each time, and then finally `return true;`).

This heuristic is intended to detect fonts that use combinations of Latin characters to render icon-like glyphs, so I think we could disqualify any font that does not support any basic Latin characters.
Comment 9 Wenson Hsieh 2020-06-25 11:59:17 PDT Comment hidden (obsolete)
Comment 10 Wenson Hsieh 2020-06-25 12:00:10 PDT Comment hidden (obsolete)
Comment 11 Wenson Hsieh 2020-06-25 12:14:03 PDT
Created attachment 402784 [details]
Augment ChangeLog
Comment 12 EWS 2020-06-25 13:37:38 PDT
Committed r263527: <https://trac.webkit.org/changeset/263527>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 402784 [details].