Bug 199991

Summary: [FreeType] Shrugging woman emoji 🤷‍♀️ not joined sometimes depending on mysterious unknown conditions
Product: WebKit
Component: WebKitGTK
Status: RESOLVED WORKSFORME
Severity: Normal
Priority: P2
Version: WebKit Nightly Build
Hardware: PC
OS: Linux
Reporter: Michael Catanzaro <mcatanzaro>
Assignee: Nobody <webkit-unassigned>
CC: aperez, bugs-noreply, calvaris, cgarcia, mcatanzaro
Attachments (description, flags):
  Sample rendering with WebKitGTK r247673 (flags: none)
  Ephy Tech preview updated this morning (flags: none)
  Michael's screenshot of Tech Preview (flags: none)

Description Michael Catanzaro 2019-07-21 10:52:37 PDT
The shrugging woman emoji 🤷‍♀️ doesn't display properly in WebKitGTK. Copy/paste it into gedit or, in Epiphany, select and right-click to see proper rendering done by pango. This is U+1F937 U+200D U+2640 U+FE0F. 🤷 (U+1F937 person shrugging) displays fine.

In contrast, 🧟‍♀️ (woman zombie) works fine. This is U+1F9DF U+200D U+2640 U+FE0F, an almost identical sequence. I have no clue what the difference here could possibly be.

Some other nearby characters, e.g. 🤷‍♂️ (man shrugging) and 👨‍⚕️ (man health worker) and 💇‍♂️ (man getting haircut) have the same problem. Other seemingly-random characters like 🧞‍♀️ (woman genie) work fine.
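
For reference, the two sequences named above can be compared code point by code point. The following is a minimal Python sketch, not part of the original report; the helper names `code_points` and `differing_positions` are made up for illustration. It only confirms that the working and broken sequences differ in the base emoji and nothing else.

```python
import unicodedata

# Sequences exactly as listed in the description, written with escapes
# so the source stays readable in any editor.
WOMAN_SHRUGGING = "\U0001F937\u200D\u2640\uFE0F"  # reported as broken
WOMAN_ZOMBIE = "\U0001F9DF\u200D\u2640\uFE0F"     # reported as working


def code_points(text):
    """Return the code points of `text` in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def differing_positions(a, b):
    """Return the indexes at which two sequences differ.

    Only the common prefix length is compared (zip stops at the shorter one).
    """
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    for label, seq in (("shrugging", WOMAN_SHRUGGING), ("zombie", WOMAN_ZOMBIE)):
        print(label, " ".join(code_points(seq)))
        for ch in seq:
            # Names come from Python's bundled Unicode database, which may
            # be older or newer than the ICU version discussed in this bug.
            print("   ", unicodedata.name(ch, "<no name in this database>"))
    print("differ at index:", differing_positions(WOMAN_SHRUGGING, WOMAN_ZOMBIE))
```

Both sequences share the same ZWJ, female sign, and variation selector, so whatever distinguishes them has to be tied to the first code point (for example, which font or data table knows about it).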
Comment 1 Adrian Perez 2019-07-21 12:13:30 PDT
Created attachment 374572 [details]
Sample rendering with WebKitGTK r247673

Here I get the same rendering in GEdit and Epiphany with
WebKitGTK built from trunk, see the attached image.
Comment 2 Michael Catanzaro 2019-07-21 14:19:43 PDT
Hm, my result is with 2.25.2. Let me check trunk.
Comment 3 Michael Catanzaro 2019-07-21 14:22:17 PDT
Indeed, the emoji display properly in my JHBuild environment, but they're broken in Tech Preview (2.25.2).
Comment 4 Michael Catanzaro 2019-07-21 14:43:13 PDT
When I build 2.25.2 in my JHBuild environment, the emoji still work. That means something other than WebKit is probably too old in Tech Preview. There are so many font rendering libraries it's hard to know what, though: FreeType, harfbuzz, Fontconfig, cairo? Probably Carlos Garcia will know.
Comment 5 Xabier Rodríguez Calvar 2019-07-21 23:24:31 PDT
Created attachment 374588 [details]
Ephy Tech preview updated this morning
Comment 6 Carlos Garcia Campos 2019-07-22 01:22:46 PDT
(In reply to Michael Catanzaro from comment #4)
> When I build 2.25.2 in my JHBuild environment, the emoji still work. That
> means something other than WebKit is probably too old in Tech Preview. There
> are so many font rendering libraries it's hard to know what, though:
> FreeType, harfbuzz, Fontconfig, cairo? Probably Carlos Garcia will know.

ICU?
Comment 7 Michael Catanzaro 2019-07-22 07:14:32 PDT
We have ICU 64.2 in Tech Preview, as opposed to 63.2 in Fedora 30 (used in my JHBuild environment, which works).

(In reply to Xabier Rodríguez Calvar from comment #5)
> Created attachment 374588 [details]
> Ephy Tech preview updated this morning

Huh. WTF.

Let's make sure we have identical runtime versions to remove any possibility of different software versions. Please run:

$ flatpak info org.gnome.Platform//master

I have:

      Commit: 51802df53c6efcb2492a1b220f1ab404d9f4de648a52671e52a80d76543fdb73
      Parent: ff230068e406dc0f4079afaa7131dcf36b7ccda23380a2296369d969d38db131
     Subject: Export org.gnome.Platform
        Date: 2019-07-17 07:35:03 +0000

Then my Tech Preview build is from an experimental repo, but it should be the same as the official one. Not that Epiphany version should matter at all: 3.33.4-24-g9f95b6c3e.

I wonder if host fonts could affect this (are host fonts used in the flatpak environment?), but my emoji that do appear look identical to Calvaris's....
Comment 8 Michael Catanzaro 2019-07-22 07:15:12 PDT
Created attachment 374598 [details]
Michael's screenshot of Tech Preview
Comment 9 Xabier Rodríguez Calvar 2019-07-22 07:19:48 PDT
(In reply to Michael Catanzaro from comment #7)
> $ flatpak info org.gnome.Platform//master

$ LANG=C flatpak info org.gnome.Platform//master

...
      Commit: 51802df53c6efcb2492a1b220f1ab404d9f4de648a52671e52a80d76543fdb73
      Parent: ff230068e406dc0f4079afaa7131dcf36b7ccda23380a2296369d969d38db131
...

calvaris@rachado:~$ LANG=C flatpak info org.gnome.Epiphany.Devel

...
      Commit: 09f48c99a3143679701ba4247425c2f8a42c67cd4e0e814af2b8872ebe1ad0cd
      Parent: 67ad56b7f45f3ef1b14de268fb7c8e5401721e98b84556e4dd2b93a6fb9d5f95
...
Comment 10 Michael Catanzaro 2019-07-22 07:56:01 PDT
I switched back to normal Tech Preview and have:

      Commit: 4f6a115800bccc54843ef95f44962a118a014fb36c918592e4bee5f78779292d
      Parent: 09f48c99a3143679701ba4247425c2f8a42c67cd4e0e814af2b8872ebe1ad0cd

Which is probably a couple hours newer than yours. Regardless, Carlos, we know the runtime version is identical, eliminating any possibility of difference in dependency versions or WebKit itself. And the Epiphany version is nearly identical.

I'm told that flatpak exposes host fonts inside the flatpak environment, so almost surely the difference is caused by different host fonts. Perhaps Fontconfig is choosing a non-ideal font for the characters that aren't rendered properly.
Comment 11 Michael Catanzaro 2019-09-01 05:40:42 PDT
So I can reproduce this bug 100% of the time on my development workstation, and 0% of the time on my travel laptop. I guessed it must be caused by a difference in host fonts, because I couldn't think of any other host config that could affect the flatpak runtime.

Using fc-list, I discovered that my development workstation has a few fonts installed that my travel laptop does not:

/usr/share/fonts/smc-raghumalayalamsans/RaghuMalayalamSans-Regular.ttf: RaghuMalayalamSans:style=Regular
/usr/share/fonts/smc-rachana/Rachana-Regular.ttf: Rachana:style=Regular
/usr/share/fonts/smc-suruma/Suruma.ttf: Suruma:style=Medium
/usr/share/fonts/smc-anjalioldlipi/AnjaliOldLipi-Regular.ttf: AnjaliOldLipi:style=Regular
/usr/share/fonts/smc-dyuthi/Dyuthi-Regular.ttf: Dyuthi:style=Regular
/usr/share/fonts/smc-rachana/Rachana-Bold.ttf: Rachana:style=Bold

These fonts are provided by the Fedora packages smc-raghumalayalamsans-fonts, smc-rachana-fonts, smc-suruma-fonts, smc-anjalioldlipi-fonts, and smc-dyuthi-fonts. All other fonts are identical between the two systems. I uninstalled all these fonts from my development workstation, but it did not fix the bug. I conclude a difference in installed fonts is not to blame.
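
The comparison of fc-list output between two machines could be scripted roughly as follows. This is a hedged sketch rather than the procedure actually used in this comment: it assumes each machine's output was saved with something like `fc-list > host.txt`, that each line has the `path: family:style=...` shape shown above, and the function names are hypothetical.

```python
def font_files(fc_list_output):
    """Return the set of font file paths found in saved `fc-list` output.

    Assumes each non-empty line looks like
    '/path/to/Font.ttf: Family:style=Regular'.
    """
    paths = set()
    for line in fc_list_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Everything before the first ': ' is taken as the file path.
        path, _, _ = line.partition(": ")
        paths.add(path)
    return paths


def only_on_first(first_output, second_output):
    """Return font files present on the first host but not the second."""
    return sorted(font_files(first_output) - font_files(second_output))


if __name__ == "__main__":
    workstation = (
        "/usr/share/fonts/smc-suruma/Suruma.ttf: Suruma:style=Medium\n"
        "/usr/share/fonts/example/Shared.ttf: Shared:style=Regular\n"
    )
    laptop = "/usr/share/fonts/example/Shared.ttf: Shared:style=Regular\n"
    for path in only_on_first(workstation, laptop):
        print(path)
```

Comparing by file path only works when both hosts use the same distribution layout, as was the case here with two Fedora systems; the `Shared.ttf` entry in the demo data is invented.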
Comment 12 Michael Catanzaro 2019-10-01 10:30:18 PDT
Another example: https://gankra.github.io/blah/text-hates-you/

"""
In the case of emoji, you've probably seen the failure mode of this process before! Because some emoji are actually ligatures of several simpler emoji, a font may successfully report support for the character while only yielding the components. So 🤦🏿‍♀️ may literally appear as 🤦 🏿‍ ♀ if the font is "too old" to know about the new ligature. This can also happen if your unicode implementation is "too old" to know about a character, causing the styling system to accept a partial match in the font.
"""

Here the first emoji is broken as described: I see a black-and-white facepalm, some weird tofu character, and then the ♀ character. (The second one is intentionally "broken" to demonstrate.)
Comment 13 Michael Catanzaro 2020-02-14 17:05:12 PST
Can't reproduce this one anymore, but that doesn't necessarily mean it's fixed, since we never figured out under which conditions this is reproducible.
Comment 14 Adrian Perez 2020-02-15 01:35:56 PST
(In reply to Michael Catanzaro from comment #13)
> Can't reproduce this one anymore, but that doesn't necessarily mean it's
> fixed, since we never figured out under which conditions this is
> reproducible.

Same here, the emoji shows correctly now… Checking recent package updates
I can see that Arch Linux got ICU updated from 64.2 to version 65.1 on
November 11th 2019; I don't see anything obvious in its release notes,
but they do say the Unicode CLDR data has been updated to v36, and in turn
the release notes for CLDR have this:

  “Emoji:
     Added names and keywords for Emoji 13.0 draft candidates; these are to be fleshed out further in v36.1. 
     Refined names and keywords for Emoji 12.0, including for English.”

I wonder if this could be the reason for things working fine now 🤔
Comment 15 Adrian Perez 2020-02-15 01:38:01 PST
(In reply to Adrian Perez from comment #14)
> (In reply to Michael Catanzaro from comment #13)
> > Can't reproduce this one anymore, but that doesn't necessarily mean it's
> > fixed, since we never figured out under which conditions this is
> > reproducible.
> 
> Same here, the emoji shows correctly now… Checking recent package updates
> I can see that Arch Linux got ICU updated from 64.2 to version 65.1 on
> November 11th 2019; I don't see anything obvious in its release notes,
> but they do say the Unicode CLDR data has been updated to v36, and in turn
> the release notes for CLDR have this:
> 
>   “Emoji:
>      Added names and keywords for Emoji 13.0 draft candidates; these are to
> be fleshed out further in v36.1. 
>      Refined names and keywords for Emoji 12.0, including for English.”
> 
> I wonder if this could be the reason for things working fine now 🤔

Also, which version of ICU did you have before? I see that ICU 64.2 got an
update to Unicode 12.1, so maybe it's not needed to be using ICU 65.x to
have this fixed :]
Comment 16 Adrian Perez 2020-02-15 03:07:47 PST
This page is great for testing:

  https://www.unicode.org/emoji/charts/emoji-zwj-sequences.html

Currently there are only a few ones that are having issues here for me:

  - Hair color / baldness modifiers.
  - Some profession modifiers (but not all).
  - Woman/man with veil.
  - Woman/man feeding baby.
  - Santa Claus.
  - Person with cane (but man/woman with cane works!).
  - Person in manual/motorized wheelchair (but man/woman in wheelchairs work!).
  - Black cat.
  - Polar bear.

The rest all work perfectly.

Currently I have installed the Twemoji font, version 12.1.4, and I think
the cases above would be fixed with the update to 12.1.5, which includes
this in release notes:

  “Update parsing and assets for new gender-neutral emojis introduced in Emoji 12.1”
Comment 17 Adrian Perez 2020-02-15 03:16:55 PST
(In reply to Adrian Perez from comment #16)
> This page is great for testing:
> 
>   https://www.unicode.org/emoji/charts/emoji-zwj-sequences.html
> 
> Currently there are only a few ones that are having issues here for me:
> 
>   - Hair color / baldness modifiers.
>   - Some profession modifiers (but not all).
>   - Woman/man with veil.
>   - Woman/man feeding baby.
>   - Santa Claus.
>   - Person with cane (but man/woman with cane works!).
>   - Person in manual/motorized wheelchair (but man/woman in wheelchairs
> work!).
>   - Black cat.
>   - Polar bear.
> 
> The rest all work perfectly.
> 
> Currently I have installed the Twemoji font, version 12.1.4, and I think
> the cases above would be fixed with the update to 12.1.5, which includes
> this in release notes:
> 
>   “Update parsing and assets for new gender-neutral emojis introduced in
> Emoji 12.1”

I have manually built and installed a Twemoji 12.1.5 package here, and
there are a few more from the list above that render correctly (person
in motorized/manual wheelchair, person with cane, professions, and the
hair color/baldness modifiers); the rest still render as separate glyphs.

So I think that the conclusion here is that as new Unicode versions which
add more emoji variants using ZWJ sequences, now and then there will be
a few of them “broken” until either ICU is updated, the font is updated,
or both are updated (depending on cases); but the problem is outside of
WebKit's scope. WDYT?
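
To illustrate what "render as separate glyphs" means for a ZWJ sequence, here is a small Python sketch that splits a sequence at each ZERO WIDTH JOINER. It is only a model of the visible fallback, under the assumption that an unrecognized sequence degrades to its ZWJ-separated parts; it does not reproduce what ICU, HarfBuzz, or the font actually do, and the function names are made up for this example.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER


def zwj_components(sequence):
    """Split an emoji ZWJ sequence into the parts joined by U+200D."""
    return [part for part in sequence.split(ZWJ) if part]


def fallback_code_points(sequence):
    """Return, per component, the code points in U+XXXX notation."""
    return [
        [f"U+{ord(ch):04X}" for ch in part]
        for part in zwj_components(sequence)
    ]


if __name__ == "__main__":
    # Woman shrugging, as given in the bug description.
    woman_shrugging = "\U0001F937\u200D\u2640\uFE0F"
    for component in fallback_code_points(woman_shrugging):
        print(" ".join(component))
```

Each printed line corresponds to one glyph a user would see side by side when the joined form is not available, which matches the person-plus-♀ rendering reported in this bug.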
Comment 18 Michael Catanzaro 2020-02-15 09:44:26 PST
(In reply to Adrian Perez from comment #15)
> Also, which version of ICU did you have before? I see that ICU 64.2 got an
> update to Unicode 12.1, so maybe it's not needed to be using ICU 65.x to
> have this fixed :]

According to comment #7, I was previously able to reproduce this issue with ICU 64.2, but not with 63.2. Meanwhile Calvaris and I were seeing different results using identical runtime versions, eliminating any possibility of software version being related to the difference except for host configuration (e.g. host fonts).

(In reply to Adrian Perez from comment #17)
> So I think that the conclusion here is that as new Unicode versions which
> add more emoji variants using ZWJ sequences, now and then there will be
> a few of them “broken” until either ICU is updated, the font is updated,
> or both are updated (depending on cases); but the problem is outside of
> WebKit's scope. WDYT?

That much is certain.
Comment 19 Adrian Perez 2020-02-26 03:27:15 PST
(In reply to Michael Catanzaro from comment #18)
> (In reply to Adrian Perez from comment #15)
> > Also, which version of ICU did you have before? I see that ICU 64.2 got an
> > update to Unicode 12.1, so maybe it's not needed to be using ICU 65.x to
> > have this fixed :]
> 
> According to comment #7, I was previously able to reproduce this issue with
> ICU 64.2, but not with 63.2. Meanwhile Calvaris and I were seeing different
> results using identical runtime versions, eliminating any possibility of
> software version being related to the difference except for host
> configuration (e.g. host fonts).

…and today the Arch package for Twemoji had an update, and there are a few
more emojis that show fine now here.
 
> (In reply to Adrian Perez from comment #17)
> > So I think that the conclusion here is that as new Unicode versions which
> > add more emoji variants using ZWJ sequences, now and then there will be
> > a few of them “broken” until either ICU is updated, the font is updated,
> > or both are updated (depending on cases); but the problem is outside of
> > WebKit's scope. WDYT?
> 
> That much is certain.

Let's close this bug, then. Probably the most adequate resolution is
“WORKSFORME” (maybe “CONFIGURATION CHANGED” is fine, too).

If it turns out that there are more related issues where WebKit is to
blame, we can either reopen this or file new bugs for those specific
ones.