Bug 214769

Summary: Spacing of Chinese characters is inconsistent in macOS 11/Safari 14 beta
Product: WebKit Reporter: cipang
Component: TextAssignee: Myles C. Maxfield <mmaxfield>
Status: RESOLVED FIXED    
Severity: Major CC: bfulgham, changseok, darin, dino, esprehn+autocc, ews-watchlist, fmalita, glenn, gyuyoung.kim, kondapallykalyan, mmaxfield, pdr, sabouhallawa, schenney, sergio, simon.fraser, stephen, webkit-bug-importer, zalan
Priority: P2 Keywords: InRadar
Version: Safari Technology Preview   
Hardware: Mac   
OS: Other   
Bug Depends on: 215051, 215057, 215059    
Bug Blocks: 206208    
Attachments:
Description Flags
Comparison between Safari on Catalina and Big Sur
none
Selection rects are wrong
none
WIP
none
WIP
none
WIP
none
WIP
none
Patch
none
Patch
none
Needs tests
none
Patch
darin: review+
Patch for committing
none
Patch for committing
none
Patch for committing
none
Patch for committing
none
Patch for committing none

Description cipang 2020-07-24 16:39:36 PDT
Created attachment 405202 [details]
Comparison between Safari on Catalina and Big Sur

Please refer to the attached screenshot. This is the link to reproduce the problem: https://orientaldaily.on.cc/cnt/finance/20200723/00202_001.html

On macOS Catalina, the spacing between each character is consistent. But on Big Sur, some of the characters do not have spacing before/after (the green circles just highlight some examples).
Comment 1 Radar WebKit Bug Importer 2020-07-24 17:15:22 PDT
<rdar://problem/66081833>
Comment 2 Myles C. Maxfield 2020-07-30 15:52:31 PDT
Created attachment 405631 [details]
Selection rects are wrong
Comment 3 Myles C. Maxfield 2020-07-30 16:07:15 PDT
We're using the simple text codepath
Comment 4 Myles C. Maxfield 2020-07-30 16:12:10 PDT
The whole run is being rendered with PingFangTC-Regular 16.00
Comment 5 Myles C. Maxfield 2020-07-30 16:14:40 PDT
When we run font->widthForGlyph(glyph), it's returning 16.0 every time.
Comment 6 Myles C. Maxfield 2020-07-30 16:17:09 PDT
Because of letter-spacing, before we call WidthIterator::applyFontTransforms(), the GlyphBuffer only contains a bunch of (width = 21, height = 0)
Comment 7 Myles C. Maxfield 2020-07-30 16:19:56 PDT
After calling Font::applyTransforms(), some of the advances are 16 and some are 21. Looks like this is caused by us applying letter-spacing before shaping, rather than after shaping.

If we wanted to do it correctly, we would have to apply letter-spacing after shaping. However, for that, we need to be able to determine which glyphs come from which characters, which CTFontTransformGlyphsWithLanguage() does not do. Other functions may do this, though.
Comment 8 Myles C. Maxfield 2020-07-30 16:22:04 PDT
I'll try to determine what the difference is between Catalina and Big Sur.
Comment 9 Myles C. Maxfield 2020-07-30 16:48:17 PDT
Verified that CTFontTransformGlyphsWithLanguage() yields different results on Catalina and Big Sur when run on the same characters using PingFangTC-Regular and using "zh-hk" as the locale
Comment 10 Myles C. Maxfield 2020-08-07 02:23:12 PDT
Created attachment 406159 [details]
WIP
Comment 11 Myles C. Maxfield 2020-08-07 02:30:55 PDT
Created attachment 406160 [details]
WIP
Comment 12 Myles C. Maxfield 2020-08-07 03:12:16 PDT
Created attachment 406161 [details]
WIP
Comment 13 Myles C. Maxfield 2020-08-07 03:49:50 PDT
This last patch causes all the tests to pass
Comment 14 Myles C. Maxfield 2020-08-07 12:37:29 PDT
This patch needs tests, of course.
Comment 15 Myles C. Maxfield 2020-08-07 18:30:41 PDT
Created attachment 406240 [details]
WIP
Comment 16 Myles C. Maxfield 2020-08-07 22:38:32 PDT
Created attachment 406246 [details]
Patch
Comment 17 Myles C. Maxfield 2020-08-07 23:30:50 PDT
Created attachment 406251 [details]
Patch
Comment 18 Myles C. Maxfield 2020-08-08 01:43:51 PDT
Created attachment 406253 [details]
Needs tests
Comment 19 Myles C. Maxfield 2020-08-09 02:27:23 PDT
Created attachment 406266 [details]
Patch
Comment 20 Darin Adler 2020-08-09 09:46:20 PDT
Comment on attachment 406266 [details]
Patch

The win EWS failure does seem to be related to these changes. The mac-debug-wk1 one does not, so I am pushing the retry button.
Comment 21 Darin Adler 2020-08-09 10:10:01 PDT
Comment on attachment 406266 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=406266&action=review

The win EWS failure does seem to be related to these changes. The mac-debug-wk1 one does not, so I am pushing the retry button.

Looks good, but need to figure out why the Windows failure is happening.

> Source/WebCore/platform/graphics/FontCascade.cpp:1395
> +    glyphBuffer.setInitialAdvance(FloatSize(initialAdvance, 0) + static_cast<FloatSize>(glyphBuffer.initialAdvance()));

Could we use a local variable to avoid the static_cast here?

> Source/WebCore/platform/graphics/GlyphBuffer.h:262
> +        GlyphBufferAdvance& lastAdvance = m_advances[index];

auto&

> Source/WebCore/platform/graphics/WidthIterator.cpp:197
> +        if (character == tabCharacter)
> +            m_containsTabs = true;

Consider |= instead?

> Source/WebCore/platform/graphics/WidthIterator.cpp:201
> +        bool currentIsLastCharacter = currentCharacterIndex + advanceLength == static_cast<size_t>(m_run.length());
> +        if (currentIsLastCharacter)

I think this would read better without the local variable.

Is the static_cast<size_t> really needed?

> Source/WebCore/platform/graphics/WidthIterator.cpp:303
> +    if (hasExtraSpacing()) { // Tab stops apparently don't count as "extra spacing".

The "apparently" here spreads a little fear, uncertainty, and doubt. Could you write this in a more affirmative way? Or if you think this is wrong, say why?

> Source/WebCore/platform/graphics/WidthIterator.cpp:306
> +        { // Letter-spacing

Could we comment in a more traditional way? I don’t think adding a comment on the brace that starts a block is helpful enough to be worth the unconventional formatting. I also also don’t think we need to scope "baseWidth", and so traditional blocks of code would be OK without adding braces. If it’s critical to break these sections out a traditional way would be to use named functions, and I think that *would* be practical here.

> Source/WebCore/platform/graphics/WidthIterator.cpp:307
> +            // This is an approximation to determine if the character is non-visible. Non-visible characters don't get letter-spacing.

Not totally sure about the code below.

I’m not sure what the word "approximation" means in this context. Checking a floating point sum for exactly zero is quite precise, so maybe we mean something more like "heuristic". What would the "real" way to find out if something is non-visible be? Is it OK to work around it by checking for the sum of advances to be zero? Since are are summing the widths it seems that we think it’s likely that there will be a combination of positive and negative widths that exactly cancel each other out, and it’s important to treat that case as non-visible? Is that right? Do we have tests of cases like that?

> Source/WebCore/platform/graphics/WidthIterator.cpp:321
> +            bool currentIsLastCharacter = m_lastCharacterIndex && currentCharacterIndex == static_cast<GlyphBufferStringOffset>(*m_lastCharacterIndex);

Is there a way to avoid the static_cast<GlyphBufferStringOffset> that makes this line so long?

> Source/WebCore/platform/graphics/WidthIterator.cpp:339
> +            bool runForcesLeftExpansion = (m_run.expansionBehavior() & LeftExpansionMask) == ForceLeftExpansion;
> +            bool runForcesRightExpansion = (m_run.expansionBehavior() & RightExpansionMask) == ForceRightExpansion;
> +            bool runForbidsLeftExpansion = (m_run.expansionBehavior() & LeftExpansionMask) == ForbidLeftExpansion;
> +            bool runForbidsRightExpansion = (m_run.expansionBehavior() & RightExpansionMask) == ForbidRightExpansion;
> +
> +            bool forceLeftExpansion = false;
> +            bool forceRightExpansion = false;
> +            bool forbidLeftExpansion = false;
> +            bool forbidRightExpansion = false;
> +            if (runForcesLeftExpansion)
> +                forceLeftExpansion = m_run.ltr() ? !currentCharacterIndex : currentIsLastCharacter;
> +            if (runForcesRightExpansion)
> +                forceRightExpansion = m_run.ltr() ? currentIsLastCharacter : !currentCharacterIndex;
> +            if (runForbidsLeftExpansion)
> +                forbidLeftExpansion = m_run.ltr() ? !currentCharacterIndex : currentIsLastCharacter;
> +            if (runForbidsRightExpansion)
> +                forbidRightExpansion = m_run.ltr() ? currentIsLastCharacter : !currentCharacterIndex;

I think these local variables makes the code harder to read. I suggest writing it more like this:

    bool isLeft = !currentCharacterIndex;
    bool isRight = m_lastCharacterIndex && currentCharacterIndex == static_cast<GlyphBufferStringOffset>(*m_lastCharacterIndex);
    if (m_run.ltr())
        std::swap(isLeft, isRight);

    bool forceLeftExpansion = isLeft && ((m_run.expansionBehavior() & LeftExpansionMask) == ForceLeftExpansion);
    bool forceRightExpansion = isRight && ((m_run.expansionBehavior() & RightExpansionMask) == ForceRightExpansion);
    bool forbidLeftExpansion = isLeft && ((m_run.expansionBehavior() & LeftExpansionMask) == ForbidLeftExpansion);
    bool forbidRightExpansion = isRight && ((m_run.expansionBehavior() & RightExpansionMask) == ForbidRightExpansion);

Easier to see what’s going on, I think. Maybe we can go even further in improving this. Might be helpful to put m_run.expansionBehavior() into a local? Maybe use a structure to pass these four booleans rather than as four separate function arguments?

> Source/WebCore/platform/graphics/WidthIterator.cpp:341
> +            static bool expandAroundIdeographs = FontCascade::canExpandAroundIdeographsInComplexText();

I think here we want to write static const bool to make it clear that we can’t accidentally change during on one call and have it affect a subsequent call.

> Source/WebCore/platform/graphics/WidthIterator.cpp:342
> +            bool ideograph = (expandAroundIdeographs && FontCascade::isCJKIdeographOrSymbol(character));

We’d want to call this "isIdeograph" because the boolean doesn’t contain an ideograph. Also I would omit the parentheses here.

> Source/WebCore/platform/graphics/WidthIterator.cpp:405
> +    Vector<Optional<GlyphIndexRange>> characterIndexToGlyphIndexRange(m_run.length());
> +    Vector<float> advanceWidths(m_run.length(), 0);

Not sure why in the case of Optional we don’t explicitly pass WTF::nullopt but in the case of float we explicitly pass 0. I don’t think either is required, but perhaps I am mistaken.

> Source/WebCore/platform/graphics/WidthIterator.cpp:433
> +    for (GlyphBufferStringOffset currentCharacterIndex = characterStartIndex; currentCharacterIndex < static_cast<GlyphBufferStringOffset>(characterDestinationIndex); ++currentCharacterIndex) {

auto?

That static_cast here and these long variable names make this super-simple loop hard to recognize as such. Compare with the loop above using "I".

> Source/WebCore/platform/graphics/WidthIterator.cpp:438
> +        auto additionalWidth = calculateAdditionalWidth(glyphBuffer, currentCharacterIndex, glyphIndexRange->leadingGlyphIndex, glyphIndexRange->trailingGlyphIndex, position);

In this very narrow and clear context I think we can name this local just "width", making the code easier to read. After all, each structure member *also* has the word additional. Additional additional.

> Source/WebCore/platform/graphics/WidthIterator.cpp:463
> +void WidthIterator::finalize(GlyphBuffer& glyphBuffer)

I recommend naming the argument just "buffer" given the context is narrow and clear. Single words for the win.

> Source/WebCore/platform/graphics/WidthIterator.cpp:495
> +    // In general, we have to apply spacing after shaping, because shaping requires its input to be unperterbed (see https://bugs.webkit.org/show_bug.cgi?id=215052).

Spelling error here: "unperturbed".

> Source/WebCore/platform/graphics/WidthIterator.cpp:497
> +    {

Why the braces here?

> Source/WebCore/platform/graphics/WidthIterator.h:68
> +    inline bool hasExtraSpacing() const;

Not sure this inline is needed or valuable.

> Source/WebCore/platform/graphics/WidthIterator.h:74
> +        float leftAdditionalWidth;
> +        float rightAdditionalWidth;
> +        float leftExpansionAdditionalWidth;
> +        float rightExpansionAdditionalWidth;

Could consider leaving the words "additional width" off the names of these data members since the structure itself is named additional width.
Comment 22 Myles C. Maxfield 2020-08-09 16:55:10 PDT
Comment on attachment 406266 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=406266&action=review

>> Source/WebCore/platform/graphics/WidthIterator.cpp:307
>> +            // This is an approximation to determine if the character is non-visible. Non-visible characters don't get letter-spacing.
> 
> Not totally sure about the code below.
> 
> I’m not sure what the word "approximation" means in this context. Checking a floating point sum for exactly zero is quite precise, so maybe we mean something more like "heuristic". What would the "real" way to find out if something is non-visible be? Is it OK to work around it by checking for the sum of advances to be zero? Since are are summing the widths it seems that we think it’s likely that there will be a combination of positive and negative widths that exactly cancel each other out, and it’s important to treat that case as non-visible? Is that right? Do we have tests of cases like that?

Yeah, it doesn't make perfect sense to me either. This code is attempting to reproduce the deleted code from above, which does the same checking of a floating point value for exactly zero:

249             if (width) {
250                 width += m_font.letterSpacing();

I invented the "summing" behavior because a single character may produce multiple glyphs, so it's not super obvious what to turn this zero-check into.

The "real" way would probably be to get the path for the glyph and see if it contains any contours. Perhaps I should do this in a follow-up.

>> Source/WebCore/platform/graphics/WidthIterator.cpp:405
>> +    Vector<float> advanceWidths(m_run.length(), 0);
> 
> Not sure why in the case of Optional we don’t explicitly pass WTF::nullopt but in the case of float we explicitly pass 0. I don’t think either is required, but perhaps I am mistaken.

I didn't know scalar types get zero-initialized in WTF::Vectors.

>> Source/WebCore/platform/graphics/WidthIterator.h:68
>> +    inline bool hasExtraSpacing() const;
> 
> Not sure this inline is needed or valuable.

It's just for code reuse. It's called in two places.
Comment 23 Myles C. Maxfield 2020-08-09 17:00:13 PDT
Created attachment 406275 [details]
Patch for committing
Comment 24 Darin Adler 2020-08-09 17:59:09 PDT
Comment on attachment 406266 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=406266&action=review

>>> Source/WebCore/platform/graphics/WidthIterator.cpp:405
>>> +    Vector<Optional<GlyphIndexRange>> characterIndexToGlyphIndexRange(m_run.length());
>>> +    Vector<float> advanceWidths(m_run.length(), 0);
>> 
>> Not sure why in the case of Optional we don’t explicitly pass WTF::nullopt but in the case of float we explicitly pass 0. I don’t think either is required, but perhaps I am mistaken.
> 
> I didn't know scalar types get zero-initialized in WTF::Vectors.

Pretty sure they do, but let me check to be sure.

>>> Source/WebCore/platform/graphics/WidthIterator.h:68
>>> +    inline bool hasExtraSpacing() const;
>> 
>> Not sure this inline is needed or valuable.
> 
> It's just for code reuse. It's called in two places.

I meant the keyword "inline" seemed not needed or valuable here; no comment on the value of the function itself.
Comment 25 Myles C. Maxfield 2020-08-10 00:14:00 PDT
Created attachment 406283 [details]
Patch for committing
Comment 26 Myles C. Maxfield 2020-08-10 00:14:56 PDT
Created attachment 406284 [details]
Patch for committing
Comment 27 Myles C. Maxfield 2020-08-10 00:18:05 PDT
Comment on attachment 406266 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=406266&action=review

>>>> Source/WebCore/platform/graphics/WidthIterator.cpp:405
>>>> +    Vector<float> advanceWidths(m_run.length(), 0);
>>> 
>>> Not sure why in the case of Optional we don’t explicitly pass WTF::nullopt but in the case of float we explicitly pass 0. I don’t think either is required, but perhaps I am mistaken.
>> 
>> I didn't know scalar types get zero-initialized in WTF::Vectors.
> 
> Pretty sure they do, but let me check to be sure.

// Unlike in std::vector, this constructor does not initialize POD types.
    explicit Vector(size_t size)
Comment 28 Myles C. Maxfield 2020-08-10 04:07:38 PDT
Created attachment 406292 [details]
Patch for committing
Comment 29 Myles C. Maxfield 2020-08-10 21:44:37 PDT
Created attachment 406360 [details]
Patch for committing
Comment 30 Myles C. Maxfield 2020-08-10 22:17:09 PDT
Committed r265488: <https://trac.webkit.org/changeset/265488>
Comment 31 Myles C. Maxfield 2020-08-10 23:11:58 PDT
Committed r265490: <https://trac.webkit.org/changeset/265490>
Comment 32 Myles C. Maxfield 2020-08-17 02:35:06 PDT
*** Bug 206898 has been marked as a duplicate of this bug. ***