187209 – [Cocoa] LastResort in the font family list causes emoji with joiners to be rendered as multiple .notdef characters

RESOLVED FIXED 187209

[Cocoa] LastResort in the font family list causes emoji with joiners to be rendered as multiple .notdef characters

https://bugs.webkit.org/show_bug.cgi?id=187209

Summary [Cocoa] LastResort in the font family list causes emoji with joiners to be re...

Myles C. Maxfield

Reported 2018-06-29 17:20:33 PDT

[Cocoa] LastResort in the font family list causes emoji with joiners to be rendered as multiple .notdef characters

Attachments
Patch (12.82 KB, patch) 2018-06-29 17:30 PDT, Myles C. Maxfield	darin: review+	Details Formatted Diff Diff
Patch for committing (14.81 KB, patch) 2018-07-01 04:47 PDT, Myles C. Maxfield	commit-queue: commit-queue-	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.

Myles C. Maxfield

Comment 1 2018-06-29 17:30:32 PDT

Created attachment 343984 [details] Patch

Myles C. Maxfield

Comment 2 2018-06-29 17:31:06 PDT

<rdar://problem/40920785>

Darin Adler

Comment 3 2018-06-29 17:49:54 PDT

Comment on attachment 343984 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=343984&action=review > Source/WebCore/platform/graphics/Font.cpp:158 > + // FIXME: Consider reordering these so the most common ones are at the front. > + // Doing this could cause the BitVector to fit inside inline storage and therefore > + // be both a performance and a memory progression. What if there’s a typo in the numbers or a logic mistake in this function where we accidentally used a number twice? How would we notice? What test would detect the mistake? > Source/WebCore/platform/graphics/Font.cpp:585 > + // This is very similar to static_cast<bool>(glyphForCharacter(character)) > + // except that glyphForCharacter() maps certain code points to ZWS (because they > + // shouldn't show up in the fast text codepath). This function doesn't do that > + // mapping, and instead is as honest as possible about what code points the font > + // supports. The glyphForCharacter function maps certain code points to zero-width spaces because we never want those code points to render visibly. We want that behavior for both the fast and slow text code paths. I wish the comment was clearer on this point. And I wish I understood how we will get that behavior. The comment does not make clear why "as honest as possible" is needed here. I think the idea is that "even if we don’t want the code points to render visibly we want to correctly understood the code points are 'supported' because we makes some other kind of decision based on that". But I don’t fully understand. The comment also doesn’t make it clear why we decided that this can be the slower function and the other one must be the faster function, rather than the other way around. Seems natural to make the "honest" function the fast one, but I’m sure we have a good reason not to do that. > Source/WebCore/platform/graphics/cocoa/FontCocoa.mm:612 > + CFIndex count; > + if (U_IS_BMP(character)) { > + codeUnits[0] = character; > + count = 1; > + } else { > + codeUnits[0] = U16_LEAD(character); > + codeUnits[1] = U16_TRAIL(character); > + count = 2; > + } This code can be written like this: CFIndex count = 0; U16_APPEND_UNSAFE(codeUnits, count, character); I think it’s better that way.

Myles C. Maxfield

Comment 4 2018-07-01 04:44:36 PDT

Comment on attachment 343984 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=343984&action=review >> Source/WebCore/platform/graphics/Font.cpp:158 >> + // be both a performance and a memory progression. > > What if there’s a typo in the numbers or a logic mistake in this function where we accidentally used a number twice? How would we notice? What test would detect the mistake? There might be a way to do it at compile time with tuples and template parameter packs, but I can't figure out how. I'll just add some ASSERT()s. 😕 >> Source/WebCore/platform/graphics/Font.cpp:585 >> + // supports. > > The glyphForCharacter function maps certain code points to zero-width spaces because we never want those code points to render visibly. We want that behavior for both the fast and slow text code paths. I wish the comment was clearer on this point. And I wish I understood how we will get that behavior. > > The comment does not make clear why "as honest as possible" is needed here. I think the idea is that "even if we don’t want the code points to render visibly we want to correctly understood the code points are 'supported' because we makes some other kind of decision based on that". But I don’t fully understand. > > The comment also doesn’t make it clear why we decided that this can be the slower function and the other one must be the faster function, rather than the other way around. Seems natural to make the "honest" function the fast one, but I’m sure we have a good reason not to do that. Yeah, when I first discovered this, I was surprised like you were to see that we didn't have any custom code for the complex path (and instead the custom code only handles the simple path). Upon further investigation, I found that Core Text handles this for us. Mapping characters to ZWS when we want them to be invisible has two problems: 1) Fonts can totally render visible glyphs for ZWS 2) Performing the substation at the code point level means that the results of mapping the code points to glyphs will be affected, and the shaping rules inside fonts won't expect these new glyphIDs to be used. Shaping won't work correctly. Instead, the better solution to this is to not do any code-point-level substitution, but instead run shaping just like normal, and then after shaping, look through the transformed glyphs to see if any of them came from these special characters. If so, we can literally remove the glyph from the stream, thereby forcing it not to render. The better solution is too risky to do now, but https://bugs.webkit.org/show_bug.cgi?id=187166 is for it. I'm not sure what you mean about fast vs slow functions; both the honest and dishonest ones should be fast. The dishonest one is cached using GlyphPages, and the honest one is cached using a bit vector.

Myles C. Maxfield

Comment 5 2018-07-01 04:47:44 PDT

Created attachment 344048 [details] Patch for committing

WebKit Commit Bot

Comment 6 2018-07-01 05:30:00 PDT

Comment on attachment 344048 [details] Patch for committing Rejecting attachment 344048 [details] from commit-queue. Failed to run "['/Volumes/Data/EWS/WebKit/Tools/Scripts/webkit-patch', '--status-host=webkit-queues.webkit.org', '--bot-id=webkit-cq-02', 'validate-changelog', '--check-oops', '--non-interactive', 344048, '--port=mac']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit ChangeLog entry in LayoutTests/ChangeLog contains OOPS!. Full output: https://webkit-queues.webkit.org/results/8403296

Darin Adler

Comment 7 2018-07-01 13:25:19 PDT

Comment on attachment 343984 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=343984&action=review >>> Source/WebCore/platform/graphics/Font.cpp:585 >>> + // supports. >> >> The glyphForCharacter function maps certain code points to zero-width spaces because we never want those code points to render visibly. We want that behavior for both the fast and slow text code paths. I wish the comment was clearer on this point. And I wish I understood how we will get that behavior. >> >> The comment does not make clear why "as honest as possible" is needed here. I think the idea is that "even if we don’t want the code points to render visibly we want to correctly understood the code points are 'supported' because we makes some other kind of decision based on that". But I don’t fully understand. >> >> The comment also doesn’t make it clear why we decided that this can be the slower function and the other one must be the faster function, rather than the other way around. Seems natural to make the "honest" function the fast one, but I’m sure we have a good reason not to do that. > > Yeah, when I first discovered this, I was surprised like you were to see that we didn't have any custom code for the complex path (and instead the custom code only handles the simple path). Upon further investigation, I found that Core Text handles this for us. > > Mapping characters to ZWS when we want them to be invisible has two problems: > 1) Fonts can totally render visible glyphs for ZWS > 2) Performing the substation at the code point level means that the results of mapping the code points to glyphs will be affected, and the shaping rules inside fonts won't expect these new glyphIDs to be used. Shaping won't work correctly. > > Instead, the better solution to this is to not do any code-point-level substitution, but instead run shaping just like normal, and then after shaping, look through the transformed glyphs to see if any of them came from these special characters. If so, we can literally remove the glyph from the stream, thereby forcing it not to render. > > The better solution is too risky to do now, but https://bugs.webkit.org/show_bug.cgi?id=187166 is for it. > > I'm not sure what you mean about fast vs slow functions; both the honest and dishonest ones should be fast. The dishonest one is cached using GlyphPages, and the honest one is cached using a bit vector. We could have decided to cache the honest one using glyph pages and the dishonest one with a bit vector. The one with the bit vector will be slower than the one that uses the glyph pages alone.

Myles C. Maxfield

Comment 8 2018-07-01 18:30:35 PDT

(In reply to Darin Adler from comment #7) > Comment on attachment 343984 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=343984&action=review > > >>> Source/WebCore/platform/graphics/Font.cpp:585 > >>> + // supports. > >> > >> The glyphForCharacter function maps certain code points to zero-width spaces because we never want those code points to render visibly. We want that behavior for both the fast and slow text code paths. I wish the comment was clearer on this point. And I wish I understood how we will get that behavior. > >> > >> The comment does not make clear why "as honest as possible" is needed here. I think the idea is that "even if we don’t want the code points to render visibly we want to correctly understood the code points are 'supported' because we makes some other kind of decision based on that". But I don’t fully understand. > >> > >> The comment also doesn’t make it clear why we decided that this can be the slower function and the other one must be the faster function, rather than the other way around. Seems natural to make the "honest" function the fast one, but I’m sure we have a good reason not to do that. > > > > Yeah, when I first discovered this, I was surprised like you were to see that we didn't have any custom code for the complex path (and instead the custom code only handles the simple path). Upon further investigation, I found that Core Text handles this for us. > > > > Mapping characters to ZWS when we want them to be invisible has two problems: > > 1) Fonts can totally render visible glyphs for ZWS > > 2) Performing the substation at the code point level means that the results of mapping the code points to glyphs will be affected, and the shaping rules inside fonts won't expect these new glyphIDs to be used. Shaping won't work correctly. > > > > Instead, the better solution to this is to not do any code-point-level substitution, but instead run shaping just like normal, and then after shaping, look through the transformed glyphs to see if any of them came from these special characters. If so, we can literally remove the glyph from the stream, thereby forcing it not to render. > > > > The better solution is too risky to do now, but https://bugs.webkit.org/show_bug.cgi?id=187166 is for it. > > > > I'm not sure what you mean about fast vs slow functions; both the honest and dishonest ones should be fast. The dishonest one is cached using GlyphPages, and the honest one is cached using a bit vector. > > We could have decided to cache the honest one using glyph pages and the > dishonest one with a bit vector. The one with the bit vector will be slower > than the one that uses the glyph pages alone. The dishonest one is for the fast text codepath, which is the most common code path used on the Web today. That is the one that uses glyph pages.

Myles C. Maxfield

Comment 9 2018-07-01 18:36:53 PDT

Committed r233413: <https://trac.webkit.org/changeset/233413>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component New Bugs

Assignee

Myles C. Maxfield

Reported

2018-06-29 17:20 PDT

Modified

2018-07-01 18:36 PDT History

CC List

8 users Show

URL

Keywords InRadar

Depends on

Blocks