Bug 214769

Summary: Spacing of Chinese characters is inconsistent in macOS 11/Safari 14 beta
Product: WebKit Reporter: cipang
Component: TextAssignee: Myles C. Maxfield <mmaxfield>
Status: RESOLVED FIXED    
Severity: Major CC: bfulgham, changseok, darin, dino, esprehn+autocc, ews-watchlist, fmalita, glenn, gyuyoung.kim, kondapallykalyan, mmaxfield, pdr, sabouhallawa, schenney, sergio, simon.fraser, stephen, webkit-bug-importer, zalan
Priority: P2 Keywords: InRadar
Version: Safari Technology Preview   
Hardware: Mac   
OS: Other   
Bug Depends on: 215051, 215057, 215059    
Bug Blocks: 206208    
Attachments:
Description Flags
Comparison between Safari on Catalina and Big Sur
none
Selection rects are wrong
none
WIP
none
WIP
none
WIP
none
WIP
none
Patch
none
Patch
none
Needs tests
none
Patch
darin: review+
Patch for committing
none
Patch for committing
none
Patch for committing
none
Patch for committing
none
Patch for committing none

cipang
Reported 2020-07-24 16:39:36 PDT
Created attachment 405202 [details] Comparison between Safari on Catalina and Big Sur Please refer to the attached screenshot. This is the link to reproduce the problem: https://orientaldaily.on.cc/cnt/finance/20200723/00202_001.html On macOS Catalina, the spacing between each character is consistent. But on Big Sur, some of the characters do not have spacing before/after (the green circles just highlight some examples).
Attachments
Comparison between Safari on Catalina and Big Sur (286.73 KB, image/png)
2020-07-24 16:39 PDT, cipang
no flags
Selection rects are wrong (43.88 KB, image/png)
2020-07-30 15:52 PDT, Myles C. Maxfield
no flags
WIP (22.98 KB, patch)
2020-08-07 02:23 PDT, Myles C. Maxfield
no flags
WIP (23.08 KB, patch)
2020-08-07 02:30 PDT, Myles C. Maxfield
no flags
WIP (24.88 KB, patch)
2020-08-07 03:12 PDT, Myles C. Maxfield
no flags
WIP (26.06 KB, patch)
2020-08-07 18:30 PDT, Myles C. Maxfield
no flags
Patch (25.59 KB, patch)
2020-08-07 22:38 PDT, Myles C. Maxfield
no flags
Patch (25.63 KB, patch)
2020-08-07 23:30 PDT, Myles C. Maxfield
no flags
Needs tests (29.67 KB, patch)
2020-08-08 01:43 PDT, Myles C. Maxfield
no flags
Patch (38.53 KB, patch)
2020-08-09 02:27 PDT, Myles C. Maxfield
darin: review+
Patch for committing (37.34 KB, patch)
2020-08-09 17:00 PDT, Myles C. Maxfield
no flags
Patch for committing (37.10 KB, patch)
2020-08-10 00:14 PDT, Myles C. Maxfield
no flags
Patch for committing (37.09 KB, patch)
2020-08-10 00:14 PDT, Myles C. Maxfield
no flags
Patch for committing (38.09 KB, patch)
2020-08-10 04:07 PDT, Myles C. Maxfield
no flags
Patch for committing (37.27 KB, patch)
2020-08-10 21:44 PDT, Myles C. Maxfield
no flags
Radar WebKit Bug Importer
Comment 1 2020-07-24 17:15:22 PDT
Myles C. Maxfield
Comment 2 2020-07-30 15:52:31 PDT
Created attachment 405631 [details] Selection rects are wrong
Myles C. Maxfield
Comment 3 2020-07-30 16:07:15 PDT
We're using the simple text codepath
Myles C. Maxfield
Comment 4 2020-07-30 16:12:10 PDT
The whole run is being rendered with PingFangTC-Regular 16.00
Myles C. Maxfield
Comment 5 2020-07-30 16:14:40 PDT
When we run font->widthForGlyph(glyph), it's returning 16.0 every time.
Myles C. Maxfield
Comment 6 2020-07-30 16:17:09 PDT
Because of letter-spacing, before we call WidthIterator::applyFontTransforms(), the GlyphBuffer only contains a bunch of (width = 21, height = 0)
Myles C. Maxfield
Comment 7 2020-07-30 16:19:56 PDT
After calling Font::applyTransforms(), some of the advances are 16 and some are 21. Looks like this is caused by us applying letter-spacing before shaping, rather than after shaping. If we wanted to do it correctly, we would have to apply letter-spacing after shaping. However, for that, we need to be able to determine which glyphs come from which characters, which CTFontTransformGlyphsWithLanguage() does not do. Other functions may do this, though.
Myles C. Maxfield
Comment 8 2020-07-30 16:22:04 PDT
I'll try to determine what the difference is between Catalina and Big Sur.
Myles C. Maxfield
Comment 9 2020-07-30 16:48:17 PDT
Verified that CTFontTransformGlyphsWithLanguage() yields different results on Catalina and Big Sur when run on the same characters using PingFangTC-Regular and using "zh-hk" as the locale
Myles C. Maxfield
Comment 10 2020-08-07 02:23:12 PDT
Myles C. Maxfield
Comment 11 2020-08-07 02:30:55 PDT
Myles C. Maxfield
Comment 12 2020-08-07 03:12:16 PDT
Myles C. Maxfield
Comment 13 2020-08-07 03:49:50 PDT
This last patch causes all the tests to pass
Myles C. Maxfield
Comment 14 2020-08-07 12:37:29 PDT
This patch needs tests, of course.
Myles C. Maxfield
Comment 15 2020-08-07 18:30:41 PDT
Myles C. Maxfield
Comment 16 2020-08-07 22:38:32 PDT
Myles C. Maxfield
Comment 17 2020-08-07 23:30:50 PDT
Myles C. Maxfield
Comment 18 2020-08-08 01:43:51 PDT
Created attachment 406253 [details] Needs tests
Myles C. Maxfield
Comment 19 2020-08-09 02:27:23 PDT
Darin Adler
Comment 20 2020-08-09 09:46:20 PDT
Comment on attachment 406266 [details] Patch The win EWS failure does seem to be related to these changes. The mac-debug-wk1 one does not, so I am pushing the retry button.
Darin Adler
Comment 21 2020-08-09 10:10:01 PDT
Comment on attachment 406266 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=406266&action=review The win EWS failure does seem to be related to these changes. The mac-debug-wk1 one does not, so I am pushing the retry button. Looks good, but need to figure out why the Windows failure is happening. > Source/WebCore/platform/graphics/FontCascade.cpp:1395 > + glyphBuffer.setInitialAdvance(FloatSize(initialAdvance, 0) + static_cast<FloatSize>(glyphBuffer.initialAdvance())); Could we use a local variable to avoid the static_cast here? > Source/WebCore/platform/graphics/GlyphBuffer.h:262 > + GlyphBufferAdvance& lastAdvance = m_advances[index]; auto& > Source/WebCore/platform/graphics/WidthIterator.cpp:197 > + if (character == tabCharacter) > + m_containsTabs = true; Consider |= instead? > Source/WebCore/platform/graphics/WidthIterator.cpp:201 > + bool currentIsLastCharacter = currentCharacterIndex + advanceLength == static_cast<size_t>(m_run.length()); > + if (currentIsLastCharacter) I think this would read better without the local variable. Is the static_cast<size_t> really needed? > Source/WebCore/platform/graphics/WidthIterator.cpp:303 > + if (hasExtraSpacing()) { // Tab stops apparently don't count as "extra spacing". The "apparently" here spreads a little fear, uncertainty, and doubt. Could you write this in a more affirmative way? Or if you think this is wrong, say why? > Source/WebCore/platform/graphics/WidthIterator.cpp:306 > + { // Letter-spacing Could we comment in a more traditional way? I don’t think adding a comment on the brace that starts a block is helpful enough to be worth the unconventional formatting. I also also don’t think we need to scope "baseWidth", and so traditional blocks of code would be OK without adding braces. If it’s critical to break these sections out a traditional way would be to use named functions, and I think that *would* be practical here. > Source/WebCore/platform/graphics/WidthIterator.cpp:307 > + // This is an approximation to determine if the character is non-visible. Non-visible characters don't get letter-spacing. Not totally sure about the code below. I’m not sure what the word "approximation" means in this context. Checking a floating point sum for exactly zero is quite precise, so maybe we mean something more like "heuristic". What would the "real" way to find out if something is non-visible be? Is it OK to work around it by checking for the sum of advances to be zero? Since are are summing the widths it seems that we think it’s likely that there will be a combination of positive and negative widths that exactly cancel each other out, and it’s important to treat that case as non-visible? Is that right? Do we have tests of cases like that? > Source/WebCore/platform/graphics/WidthIterator.cpp:321 > + bool currentIsLastCharacter = m_lastCharacterIndex && currentCharacterIndex == static_cast<GlyphBufferStringOffset>(*m_lastCharacterIndex); Is there a way to avoid the static_cast<GlyphBufferStringOffset> that makes this line so long? > Source/WebCore/platform/graphics/WidthIterator.cpp:339 > + bool runForcesLeftExpansion = (m_run.expansionBehavior() & LeftExpansionMask) == ForceLeftExpansion; > + bool runForcesRightExpansion = (m_run.expansionBehavior() & RightExpansionMask) == ForceRightExpansion; > + bool runForbidsLeftExpansion = (m_run.expansionBehavior() & LeftExpansionMask) == ForbidLeftExpansion; > + bool runForbidsRightExpansion = (m_run.expansionBehavior() & RightExpansionMask) == ForbidRightExpansion; > + > + bool forceLeftExpansion = false; > + bool forceRightExpansion = false; > + bool forbidLeftExpansion = false; > + bool forbidRightExpansion = false; > + if (runForcesLeftExpansion) > + forceLeftExpansion = m_run.ltr() ? !currentCharacterIndex : currentIsLastCharacter; > + if (runForcesRightExpansion) > + forceRightExpansion = m_run.ltr() ? currentIsLastCharacter : !currentCharacterIndex; > + if (runForbidsLeftExpansion) > + forbidLeftExpansion = m_run.ltr() ? !currentCharacterIndex : currentIsLastCharacter; > + if (runForbidsRightExpansion) > + forbidRightExpansion = m_run.ltr() ? currentIsLastCharacter : !currentCharacterIndex; I think these local variables makes the code harder to read. I suggest writing it more like this: bool isLeft = !currentCharacterIndex; bool isRight = m_lastCharacterIndex && currentCharacterIndex == static_cast<GlyphBufferStringOffset>(*m_lastCharacterIndex); if (m_run.ltr()) std::swap(isLeft, isRight); bool forceLeftExpansion = isLeft && ((m_run.expansionBehavior() & LeftExpansionMask) == ForceLeftExpansion); bool forceRightExpansion = isRight && ((m_run.expansionBehavior() & RightExpansionMask) == ForceRightExpansion); bool forbidLeftExpansion = isLeft && ((m_run.expansionBehavior() & LeftExpansionMask) == ForbidLeftExpansion); bool forbidRightExpansion = isRight && ((m_run.expansionBehavior() & RightExpansionMask) == ForbidRightExpansion); Easier to see what’s going on, I think. Maybe we can go even further in improving this. Might be helpful to put m_run.expansionBehavior() into a local? Maybe use a structure to pass these four booleans rather than as four separate function arguments? > Source/WebCore/platform/graphics/WidthIterator.cpp:341 > + static bool expandAroundIdeographs = FontCascade::canExpandAroundIdeographsInComplexText(); I think here we want to write static const bool to make it clear that we can’t accidentally change during on one call and have it affect a subsequent call. > Source/WebCore/platform/graphics/WidthIterator.cpp:342 > + bool ideograph = (expandAroundIdeographs && FontCascade::isCJKIdeographOrSymbol(character)); We’d want to call this "isIdeograph" because the boolean doesn’t contain an ideograph. Also I would omit the parentheses here. > Source/WebCore/platform/graphics/WidthIterator.cpp:405 > + Vector<Optional<GlyphIndexRange>> characterIndexToGlyphIndexRange(m_run.length()); > + Vector<float> advanceWidths(m_run.length(), 0); Not sure why in the case of Optional we don’t explicitly pass WTF::nullopt but in the case of float we explicitly pass 0. I don’t think either is required, but perhaps I am mistaken. > Source/WebCore/platform/graphics/WidthIterator.cpp:433 > + for (GlyphBufferStringOffset currentCharacterIndex = characterStartIndex; currentCharacterIndex < static_cast<GlyphBufferStringOffset>(characterDestinationIndex); ++currentCharacterIndex) { auto? That static_cast here and these long variable names make this super-simple loop hard to recognize as such. Compare with the loop above using "I". > Source/WebCore/platform/graphics/WidthIterator.cpp:438 > + auto additionalWidth = calculateAdditionalWidth(glyphBuffer, currentCharacterIndex, glyphIndexRange->leadingGlyphIndex, glyphIndexRange->trailingGlyphIndex, position); In this very narrow and clear context I think we can name this local just "width", making the code easier to read. After all, each structure member *also* has the word additional. Additional additional. > Source/WebCore/platform/graphics/WidthIterator.cpp:463 > +void WidthIterator::finalize(GlyphBuffer& glyphBuffer) I recommend naming the argument just "buffer" given the context is narrow and clear. Single words for the win. > Source/WebCore/platform/graphics/WidthIterator.cpp:495 > + // In general, we have to apply spacing after shaping, because shaping requires its input to be unperterbed (see https://bugs.webkit.org/show_bug.cgi?id=215052). Spelling error here: "unperturbed". > Source/WebCore/platform/graphics/WidthIterator.cpp:497 > + { Why the braces here? > Source/WebCore/platform/graphics/WidthIterator.h:68 > + inline bool hasExtraSpacing() const; Not sure this inline is needed or valuable. > Source/WebCore/platform/graphics/WidthIterator.h:74 > + float leftAdditionalWidth; > + float rightAdditionalWidth; > + float leftExpansionAdditionalWidth; > + float rightExpansionAdditionalWidth; Could consider leaving the words "additional width" off the names of these data members since the structure itself is named additional width.
Myles C. Maxfield
Comment 22 2020-08-09 16:55:10 PDT
Comment on attachment 406266 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=406266&action=review >> Source/WebCore/platform/graphics/WidthIterator.cpp:307 >> + // This is an approximation to determine if the character is non-visible. Non-visible characters don't get letter-spacing. > > Not totally sure about the code below. > > I’m not sure what the word "approximation" means in this context. Checking a floating point sum for exactly zero is quite precise, so maybe we mean something more like "heuristic". What would the "real" way to find out if something is non-visible be? Is it OK to work around it by checking for the sum of advances to be zero? Since are are summing the widths it seems that we think it’s likely that there will be a combination of positive and negative widths that exactly cancel each other out, and it’s important to treat that case as non-visible? Is that right? Do we have tests of cases like that? Yeah, it doesn't make perfect sense to me either. This code is attempting to reproduce the deleted code from above, which does the same checking of a floating point value for exactly zero: 249 if (width) { 250 width += m_font.letterSpacing(); I invented the "summing" behavior because a single character may produce multiple glyphs, so it's not super obvious what to turn this zero-check into. The "real" way would probably be to get the path for the glyph and see if it contains any contours. Perhaps I should do this in a follow-up. >> Source/WebCore/platform/graphics/WidthIterator.cpp:405 >> + Vector<float> advanceWidths(m_run.length(), 0); > > Not sure why in the case of Optional we don’t explicitly pass WTF::nullopt but in the case of float we explicitly pass 0. I don’t think either is required, but perhaps I am mistaken. I didn't know scalar types get zero-initialized in WTF::Vectors. >> Source/WebCore/platform/graphics/WidthIterator.h:68 >> + inline bool hasExtraSpacing() const; > > Not sure this inline is needed or valuable. It's just for code reuse. It's called in two places.
Myles C. Maxfield
Comment 23 2020-08-09 17:00:13 PDT
Created attachment 406275 [details] Patch for committing
Darin Adler
Comment 24 2020-08-09 17:59:09 PDT
Comment on attachment 406266 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=406266&action=review >>> Source/WebCore/platform/graphics/WidthIterator.cpp:405 >>> + Vector<Optional<GlyphIndexRange>> characterIndexToGlyphIndexRange(m_run.length()); >>> + Vector<float> advanceWidths(m_run.length(), 0); >> >> Not sure why in the case of Optional we don’t explicitly pass WTF::nullopt but in the case of float we explicitly pass 0. I don’t think either is required, but perhaps I am mistaken. > > I didn't know scalar types get zero-initialized in WTF::Vectors. Pretty sure they do, but let me check to be sure. >>> Source/WebCore/platform/graphics/WidthIterator.h:68 >>> + inline bool hasExtraSpacing() const; >> >> Not sure this inline is needed or valuable. > > It's just for code reuse. It's called in two places. I meant the keyword "inline" seemed not needed or valuable here; no comment on the value of the function itself.
Myles C. Maxfield
Comment 25 2020-08-10 00:14:00 PDT
Created attachment 406283 [details] Patch for committing
Myles C. Maxfield
Comment 26 2020-08-10 00:14:56 PDT
Created attachment 406284 [details] Patch for committing
Myles C. Maxfield
Comment 27 2020-08-10 00:18:05 PDT
Comment on attachment 406266 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=406266&action=review >>>> Source/WebCore/platform/graphics/WidthIterator.cpp:405 >>>> + Vector<float> advanceWidths(m_run.length(), 0); >>> >>> Not sure why in the case of Optional we don’t explicitly pass WTF::nullopt but in the case of float we explicitly pass 0. I don’t think either is required, but perhaps I am mistaken. >> >> I didn't know scalar types get zero-initialized in WTF::Vectors. > > Pretty sure they do, but let me check to be sure. // Unlike in std::vector, this constructor does not initialize POD types. explicit Vector(size_t size)
Myles C. Maxfield
Comment 28 2020-08-10 04:07:38 PDT
Created attachment 406292 [details] Patch for committing
Myles C. Maxfield
Comment 29 2020-08-10 21:44:37 PDT
Created attachment 406360 [details] Patch for committing
Myles C. Maxfield
Comment 30 2020-08-10 22:17:09 PDT
Myles C. Maxfield
Comment 31 2020-08-10 23:11:58 PDT
Myles C. Maxfield
Comment 32 2020-08-17 02:35:06 PDT
*** Bug 206898 has been marked as a duplicate of this bug. ***
Note You need to log in before you can comment on or make changes to this bug.