Bug 177040 - [FreeType] Support emoji modifiers
Summary: [FreeType] Support emoji modifiers
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Text (show other bugs)
Version: Other
Hardware: PC Linux
: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2017-09-16 13:54 PDT by Michael Catanzaro
Modified: 2019-01-21 00:42 PST (History)
6 users (show)

See Also:


Attachments
Patch (9.44 KB, patch)
2019-01-11 04:57 PST, Carlos Garcia Campos
mmaxfield: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Catanzaro 2017-09-16 13:54:32 PDT
We now have support for color emoji, but gender and skin-tone modifiers [1] are not supported. We should change that.

Apple ports already have this support (see e.g. r213499).

[1] http://www.unicode.org/reports/tr51/
Comment 1 Michael Catanzaro 2018-12-13 09:12:18 PST
Test page: https://unicode.org/emoji/charts/full-emoji-list.html

If you scroll down you'll notice that none of the emoji combinations work.
Comment 2 Carlos Garcia Campos 2018-12-14 02:03:49 PST
I see that some of them don't work, no none of them.
Comment 3 Michael Catanzaro 2018-12-14 06:04:06 PST
Is there a combination on that page that does work for you?
Comment 4 Carlos Garcia Campos 2018-12-14 06:34:39 PST
Yes, see all the flags for example, see also the test I attached to original bug:

https://bug-156579-attachments.webkit.org/attachment.cgi?id=284325
Comment 5 Carlos Garcia Campos 2018-12-14 06:59:05 PST
Ok, so the problem only happens with emojis having the zero with joiner (U+200D) in the sequence. The sequence is broken because createAndFillGlyphPage() in Font.cpp overwrites zero with joiner with zero width space (U+200B). Myles, why are we doing that overwrite? should that depend on the font? Not doing the overwrite fixes all the combinations for me in https://unicode.org/emoji/charts/full-emoji-list.html. If we can't remove the overwrite we can handle that as a special case in harfbuzz set_nominal_glyph function.
Comment 6 Myles C. Maxfield 2018-12-19 13:35:42 PST
(In reply to Carlos Garcia Campos from comment #5)
> Ok, so the problem only happens with emojis having the zero with joiner
> (U+200D) in the sequence. The sequence is broken because
> createAndFillGlyphPage() in Font.cpp overwrites zero with joiner with zero
> width space (U+200B). Myles, why are we doing that overwrite? should that
> depend on the font? Not doing the overwrite fixes all the combinations for
> me in https://unicode.org/emoji/charts/full-emoji-list.html. If we can't
> remove the overwrite we can handle that as a special case in harfbuzz
> set_nominal_glyph function.

This code is trying to make control characters invisible. I haven't done the research, but I expect the replacement code in createAndFillGlyphPage() is older than the WidthIterator::applyFontTransforms() code. (Both the simple text path and the complex text path use createAndFillGlyphPage().)

You've hit the reason I filed https://bugs.webkit.org/show_bug.cgi?id=187166. Because shaping operates on the character stream of the input, if we muck around with the character string, we break shaping. This has bitten us multiple times, see https://bugs.webkit.org/show_bug.cgi?id=185976

The right way to do this is to run shaping on the unperturbed character stream, and after shaping is finished, map each glyph in the result back to the character in the stream it represents, and if it represents a control character, remove that glyph from the stream.

Unfortunately, "map each glyph in the result back to the character in the stream it represents" is impossible in the Cocoa fast text codepath. That codepath uses CTFontTransformGlyphs() to do the shaping, but doesn't output a mapping back to the original string indices. <rdar://problem/44466695> is tracking this.

So, if I understand the Harfbuzz API correctly, the Harfbuzz ports should be able to fix this immediately, but the existing code needs to continue to exist for the Cocoa ports until we get a solution from the platform that fits.

Bonus: There used to be interop for many years between the browsers about making control characters invisible. However, the spec changed[1] and now says "Control characters (Unicode category Cc) other than tab (U+0009), line feed (U+000A), and form feed (U+000C), must be rendered as a visible glyph." However, I'm not sure we can actually make that change, because I don't think we've characterized how many sites would break (where "break" means "control character garbage shows up and makes content unreadable where it used to be invisible and readable"). The decision about rendering control characters visibly should be independently from the method of fixing createAndFillGlyphPage().

[1] https://drafts.csswg.org/css-text-3/#white-space-processing
Comment 7 Carlos Garcia Campos 2018-12-31 06:16:59 PST
Harfbuzz already replaces characters with Default_Ignorable Unicode property with space glyph and zeroes the advance width when shaping. So, I think we can probably avoid all the overrides for the GTK port.
Comment 8 Myles C. Maxfield 2019-01-02 14:41:49 PST
(In reply to Carlos Garcia Campos from comment #7)
> Harfbuzz already replaces characters with Default_Ignorable Unicode property
> with space glyph and zeroes the advance width when shaping. So, I think we
> can probably avoid all the overrides for the GTK port.

Awesome!
Comment 9 Carlos Garcia Campos 2019-01-11 02:04:32 PST
(In reply to Myles C. Maxfield from comment #8)
> (In reply to Carlos Garcia Campos from comment #7)
> > Harfbuzz already replaces characters with Default_Ignorable Unicode property
> > with space glyph and zeroes the advance width when shaping. So, I think we
> > can probably avoid all the overrides for the GTK port.
> 
> Awesome!

Unfortunately this is not so easy in the end. We still need to replace \n and \t to space too, and some characters never reach to harfbuzz anyway like isolations, because they are left out when splitting the text runs, and added back to list of glyphs when rendering, causing the missing glyph to be rendered. So, the safest approach that fixes this bug would be to only override the code points when the font doesn't support the code point. That would fix this bug without introducing any regression in tests.
Comment 10 Carlos Garcia Campos 2019-01-11 04:57:17 PST
Created attachment 358890 [details]
Patch
Comment 11 Carlos Garcia Campos 2019-01-13 23:59:06 PST
Committed r239915: <https://trac.webkit.org/changeset/239915>
Comment 12 Radar WebKit Bug Importer 2019-01-14 00:02:04 PST
<rdar://problem/47245470>