Bug 187209 - [Cocoa] LastResort in the font family list causes emoji with joiners to be rendered as multiple .notdef characters
Summary: [Cocoa] LastResort in the font family list causes emoji with joiners to be re...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Myles C. Maxfield
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2018-06-29 17:20 PDT by Myles C. Maxfield
Modified: 2018-07-01 18:36 PDT (History)
8 users (show)

See Also:


Attachments
Patch (12.82 KB, patch)
2018-06-29 17:30 PDT, Myles C. Maxfield
darin: review+
Details | Formatted Diff | Diff
Patch for committing (14.81 KB, patch)
2018-07-01 04:47 PDT, Myles C. Maxfield
commit-queue: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Myles C. Maxfield 2018-06-29 17:20:33 PDT
[Cocoa] LastResort in the font family list causes emoji with joiners to be rendered as multiple .notdef characters
Comment 1 Myles C. Maxfield 2018-06-29 17:30:32 PDT
Created attachment 343984 [details]
Patch
Comment 2 Myles C. Maxfield 2018-06-29 17:31:06 PDT
<rdar://problem/40920785>
Comment 3 Darin Adler 2018-06-29 17:49:54 PDT
Comment on attachment 343984 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=343984&action=review

> Source/WebCore/platform/graphics/Font.cpp:158
> +    // FIXME: Consider reordering these so the most common ones are at the front.
> +    // Doing this could cause the BitVector to fit inside inline storage and therefore
> +    // be both a performance and a memory progression.

What if there’s a typo in the numbers or a logic mistake in this function where we accidentally used a number twice? How would we notice? What test would detect the mistake?

> Source/WebCore/platform/graphics/Font.cpp:585
> +    // This is very similar to static_cast<bool>(glyphForCharacter(character))
> +    // except that glyphForCharacter() maps certain code points to ZWS (because they
> +    // shouldn't show up in the fast text codepath). This function doesn't do that
> +    // mapping, and instead is as honest as possible about what code points the font
> +    // supports.

The glyphForCharacter function maps certain code points to zero-width spaces because we never want those code points to render visibly. We want that behavior for both the fast and slow text code paths. I wish the comment was clearer on this point. And I wish I understood how we will get that behavior.

The comment does not make clear why "as honest as possible" is needed here. I think the idea is that "even if we don’t want the code points to render visibly we want to correctly understood the code points are 'supported' because we makes some other kind of decision based on that". But I don’t fully understand.

The comment also doesn’t make it clear why we decided that this can be the slower function and the other one must be the faster function, rather than the other way around. Seems natural to make the "honest" function the fast one, but I’m sure we have a good reason not to do that.

> Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:612
> +    CFIndex count;
> +    if (U_IS_BMP(character)) {
> +        codeUnits[0] = character;
> +        count = 1;
> +    } else {
> +        codeUnits[0] = U16_LEAD(character);
> +        codeUnits[1] = U16_TRAIL(character);
> +        count = 2;
> +    }

This code can be written like this:

    CFIndex count = 0;
    U16_APPEND_UNSAFE(codeUnits, count, character);

I think it’s better that way.
Comment 4 Myles C. Maxfield 2018-07-01 04:44:36 PDT
Comment on attachment 343984 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=343984&action=review

>> Source/WebCore/platform/graphics/Font.cpp:158
>> +    // be both a performance and a memory progression.
> 
> What if there’s a typo in the numbers or a logic mistake in this function where we accidentally used a number twice? How would we notice? What test would detect the mistake?

There might be a way to do it at compile time with tuples and template parameter packs, but I can't figure out how. I'll just add some ASSERT()s. 😕

>> Source/WebCore/platform/graphics/Font.cpp:585
>> +    // supports.
> 
> The glyphForCharacter function maps certain code points to zero-width spaces because we never want those code points to render visibly. We want that behavior for both the fast and slow text code paths. I wish the comment was clearer on this point. And I wish I understood how we will get that behavior.
> 
> The comment does not make clear why "as honest as possible" is needed here. I think the idea is that "even if we don’t want the code points to render visibly we want to correctly understood the code points are 'supported' because we makes some other kind of decision based on that". But I don’t fully understand.
> 
> The comment also doesn’t make it clear why we decided that this can be the slower function and the other one must be the faster function, rather than the other way around. Seems natural to make the "honest" function the fast one, but I’m sure we have a good reason not to do that.

Yeah, when I first discovered this, I was surprised like you were to see that we didn't have any custom code for the complex path (and instead the custom code only handles the simple path). Upon further investigation, I found that Core Text handles this for us.

Mapping characters to ZWS when we want them to be invisible has two problems:
1) Fonts can totally render visible glyphs for ZWS
2) Performing the substation at the code point level means that the results of mapping the code points to glyphs will be affected, and the shaping rules inside fonts won't expect these new glyphIDs to be used. Shaping won't work correctly.

Instead, the better solution to this is to not do any code-point-level substitution, but instead run shaping just like normal, and then after shaping, look through the transformed glyphs to see if any of them came from these special characters. If so, we can literally remove the glyph from the stream, thereby forcing it not to render.

The better solution is too risky to do now, but https://bugs.webkit.org/show_bug.cgi?id=187166 is for it.

I'm not sure what you mean about fast vs slow functions; both the honest and dishonest ones should be fast. The dishonest one is cached using GlyphPages, and the honest one is cached using a bit vector.
Comment 5 Myles C. Maxfield 2018-07-01 04:47:44 PDT
Created attachment 344048 [details]
Patch for committing
Comment 6 WebKit Commit Bot 2018-07-01 05:30:00 PDT
Comment on attachment 344048 [details]
Patch for committing

Rejecting attachment 344048 [details] from commit-queue.

Failed to run "['/Volumes/Data/EWS/WebKit/Tools/Scripts/webkit-patch', '--status-host=webkit-queues.webkit.org', '--bot-id=webkit-cq-02', 'validate-changelog', '--check-oops', '--non-interactive', 344048, '--port=mac']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit

ChangeLog entry in LayoutTests/ChangeLog contains OOPS!.

Full output: https://webkit-queues.webkit.org/results/8403296
Comment 7 Darin Adler 2018-07-01 13:25:19 PDT
Comment on attachment 343984 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=343984&action=review

>>> Source/WebCore/platform/graphics/Font.cpp:585
>>> +    // supports.
>> 
>> The glyphForCharacter function maps certain code points to zero-width spaces because we never want those code points to render visibly. We want that behavior for both the fast and slow text code paths. I wish the comment was clearer on this point. And I wish I understood how we will get that behavior.
>> 
>> The comment does not make clear why "as honest as possible" is needed here. I think the idea is that "even if we don’t want the code points to render visibly we want to correctly understood the code points are 'supported' because we makes some other kind of decision based on that". But I don’t fully understand.
>> 
>> The comment also doesn’t make it clear why we decided that this can be the slower function and the other one must be the faster function, rather than the other way around. Seems natural to make the "honest" function the fast one, but I’m sure we have a good reason not to do that.
> 
> Yeah, when I first discovered this, I was surprised like you were to see that we didn't have any custom code for the complex path (and instead the custom code only handles the simple path). Upon further investigation, I found that Core Text handles this for us.
> 
> Mapping characters to ZWS when we want them to be invisible has two problems:
> 1) Fonts can totally render visible glyphs for ZWS
> 2) Performing the substation at the code point level means that the results of mapping the code points to glyphs will be affected, and the shaping rules inside fonts won't expect these new glyphIDs to be used. Shaping won't work correctly.
> 
> Instead, the better solution to this is to not do any code-point-level substitution, but instead run shaping just like normal, and then after shaping, look through the transformed glyphs to see if any of them came from these special characters. If so, we can literally remove the glyph from the stream, thereby forcing it not to render.
> 
> The better solution is too risky to do now, but https://bugs.webkit.org/show_bug.cgi?id=187166 is for it.
> 
> I'm not sure what you mean about fast vs slow functions; both the honest and dishonest ones should be fast. The dishonest one is cached using GlyphPages, and the honest one is cached using a bit vector.

We could have decided to cache the honest one using glyph pages and the dishonest one with a bit vector. The one with the bit vector will be slower than the one that uses the glyph pages alone.
Comment 8 Myles C. Maxfield 2018-07-01 18:30:35 PDT
(In reply to Darin Adler from comment #7)
> Comment on attachment 343984 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=343984&action=review
> 
> >>> Source/WebCore/platform/graphics/Font.cpp:585
> >>> +    // supports.
> >> 
> >> The glyphForCharacter function maps certain code points to zero-width spaces because we never want those code points to render visibly. We want that behavior for both the fast and slow text code paths. I wish the comment was clearer on this point. And I wish I understood how we will get that behavior.
> >> 
> >> The comment does not make clear why "as honest as possible" is needed here. I think the idea is that "even if we don’t want the code points to render visibly we want to correctly understood the code points are 'supported' because we makes some other kind of decision based on that". But I don’t fully understand.
> >> 
> >> The comment also doesn’t make it clear why we decided that this can be the slower function and the other one must be the faster function, rather than the other way around. Seems natural to make the "honest" function the fast one, but I’m sure we have a good reason not to do that.
> > 
> > Yeah, when I first discovered this, I was surprised like you were to see that we didn't have any custom code for the complex path (and instead the custom code only handles the simple path). Upon further investigation, I found that Core Text handles this for us.
> > 
> > Mapping characters to ZWS when we want them to be invisible has two problems:
> > 1) Fonts can totally render visible glyphs for ZWS
> > 2) Performing the substation at the code point level means that the results of mapping the code points to glyphs will be affected, and the shaping rules inside fonts won't expect these new glyphIDs to be used. Shaping won't work correctly.
> > 
> > Instead, the better solution to this is to not do any code-point-level substitution, but instead run shaping just like normal, and then after shaping, look through the transformed glyphs to see if any of them came from these special characters. If so, we can literally remove the glyph from the stream, thereby forcing it not to render.
> > 
> > The better solution is too risky to do now, but https://bugs.webkit.org/show_bug.cgi?id=187166 is for it.
> > 
> > I'm not sure what you mean about fast vs slow functions; both the honest and dishonest ones should be fast. The dishonest one is cached using GlyphPages, and the honest one is cached using a bit vector.
> 
> We could have decided to cache the honest one using glyph pages and the
> dishonest one with a bit vector. The one with the bit vector will be slower
> than the one that uses the glyph pages alone.

The dishonest one is for the fast text codepath, which is the most common code path used on the Web today. That is the one that uses glyph pages.
Comment 9 Myles C. Maxfield 2018-07-01 18:36:53 PDT
Committed r233413: <https://trac.webkit.org/changeset/233413>