RESOLVED FIXED Bug 181315
::first-letter incorrectly selects grapheme pairs
https://bugs.webkit.org/show_bug.cgi?id=181315
Summary ::first-letter incorrectly selects grapheme pairs
Chris Nardi
Reported 2018-01-04 19:24:24 PST
::first-letter incorrectly selects grapheme pairs
Attachments
Patch (4.65 KB, patch)
2018-01-04 19:29 PST, Chris Nardi
no flags
Patch (4.65 KB, patch)
2018-01-04 19:51 PST, Chris Nardi
no flags
Patch (4.65 KB, patch)
2018-01-04 19:52 PST, Chris Nardi
no flags
Archive of layout-test-results from ews102 for mac-elcapitan (2.24 MB, application/zip)
2018-01-04 20:49 PST, EWS Watchlist
no flags
Archive of layout-test-results from ews115 for mac-elcapitan (2.95 MB, application/zip)
2018-01-04 20:59 PST, EWS Watchlist
no flags
Archive of layout-test-results from ews105 for mac-elcapitan-wk2 (2.62 MB, application/zip)
2018-01-04 21:01 PST, EWS Watchlist
no flags
Archive of layout-test-results from ews125 for ios-simulator-wk2 (2.20 MB, application/zip)
2018-01-04 21:19 PST, EWS Watchlist
no flags
Patch (46.57 KB, patch)
2018-01-05 09:00 PST, Chris Nardi
no flags
Archive of layout-test-results from ews105 for mac-elcapitan-wk2 (2.66 MB, application/zip)
2018-01-05 10:09 PST, EWS Watchlist
no flags
Patch (49.02 KB, patch)
2018-01-05 10:33 PST, Chris Nardi
no flags
Patch (49.05 KB, patch)
2018-01-05 22:23 PST, Chris Nardi
no flags
Patch (49.21 KB, patch)
2018-01-05 22:34 PST, Chris Nardi
no flags
Patch (49.21 KB, patch)
2018-01-06 15:37 PST, Chris Nardi
no flags
Patch (55.68 KB, patch)
2018-01-08 12:26 PST, Chris Nardi
no flags
Archive of layout-test-results from ews101 for mac-elcapitan (2.52 MB, application/zip)
2018-01-08 13:29 PST, EWS Watchlist
no flags
Archive of layout-test-results from ews105 for mac-elcapitan-wk2 (3.05 MB, application/zip)
2018-01-08 13:40 PST, EWS Watchlist
no flags
Archive of layout-test-results from ews113 for mac-elcapitan (3.38 MB, application/zip)
2018-01-08 14:04 PST, EWS Watchlist
no flags
Archive of layout-test-results from ews123 for ios-simulator-wk2 (23.54 MB, application/zip)
2018-01-08 14:27 PST, EWS Watchlist
no flags
Patch (52.85 KB, patch)
2018-01-08 19:09 PST, Chris Nardi
no flags
Chris Nardi
Comment 1 2018-01-04 19:29:29 PST Comment hidden (obsolete)
Chris Nardi
Comment 2 2018-01-04 19:51:02 PST Comment hidden (obsolete)
Chris Nardi
Comment 3 2018-01-04 19:52:53 PST Comment hidden (obsolete)
EWS Watchlist
Comment 4 2018-01-04 20:49:55 PST Comment hidden (obsolete)
EWS Watchlist
Comment 5 2018-01-04 20:49:56 PST Comment hidden (obsolete)
EWS Watchlist
Comment 6 2018-01-04 20:59:19 PST Comment hidden (obsolete)
EWS Watchlist
Comment 7 2018-01-04 20:59:20 PST Comment hidden (obsolete)
EWS Watchlist
Comment 8 2018-01-04 21:01:36 PST Comment hidden (obsolete)
EWS Watchlist
Comment 9 2018-01-04 21:01:37 PST Comment hidden (obsolete)
EWS Watchlist
Comment 10 2018-01-04 21:19:38 PST Comment hidden (obsolete)
EWS Watchlist
Comment 11 2018-01-04 21:19:39 PST Comment hidden (obsolete)
Chris Nardi
Comment 12 2018-01-05 09:00:50 PST Comment hidden (obsolete)
EWS Watchlist
Comment 13 2018-01-05 10:09:34 PST Comment hidden (obsolete)
EWS Watchlist
Comment 14 2018-01-05 10:09:35 PST Comment hidden (obsolete)
Chris Nardi
Comment 15 2018-01-05 10:33:30 PST Comment hidden (obsolete)
Chris Nardi
Comment 16 2018-01-05 13:24:47 PST
I think this is ready for review. All bots seem to be passing except for wincairom, which for some reason cannot apply the patch (even though the others can). I'm sure there's more rebaselineing that needs to occur, but I've just been working off of what the bots provide and GTK/Win don't seem to be providing any new expectation files. This was found from CSS2 tests in WPT (https://github.com/w3c/web-platform-tests/blob/master/css/CSS2/selectors/first-letter-punctuation-335.xht). I used that test as the basis for what I included in fast/css/first-letter-punctuation.html. A similar version of this patch was also submitted to Chromium (https://chromium-review.googlesource.com/c/chromium/src/+/847293).
Myles C. Maxfield
Comment 17 2018-01-05 15:43:39 PST
Comment on attachment 330557 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=330557&action=review > Source/WebCore/rendering/updating/RenderTreeBuilderFirstLetter.cpp:225 > + length += U16_LENGTH(oldText.characterStartingAt(length)); Are we sure this is going to work? What if the first character is a combining sequence like U+0065 LATIN SMALL LETTER E followed by U+0300 COMBINING GRAVE ACCENT?
Chris Nardi
Comment 18 2018-01-05 22:23:11 PST Comment hidden (obsolete)
Chris Nardi
Comment 19 2018-01-05 22:34:07 PST Comment hidden (obsolete)
Chris Nardi
Comment 20 2018-01-05 22:35:35 PST
Comment on attachment 330557 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=330557&action=review >> Source/WebCore/rendering/updating/RenderTreeBuilderFirstLetter.cpp:225 >> + length += U16_LENGTH(oldText.characterStartingAt(length)); > > Are we sure this is going to work? What if the first character is a combining sequence like U+0065 LATIN SMALL LETTER E followed by U+0300 COMBINING GRAVE ACCENT? That's true, that didn't occur to me. I updated the rest of the patch with this consideration.
Chris Nardi
Comment 21 2018-01-06 15:37:56 PST
Darin Adler
Comment 22 2018-01-07 22:36:22 PST
Comment on attachment 330650 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=330650&action=review > Source/WebCore/rendering/updating/RenderTreeBuilderFirstLetter.cpp:240 > + if (numCharacters > 1) > + scanLength += numCharacters - 1; The is a strange way to write the loop. I would instead suggest either: 1) Removing the ++scanLength from the for loop above and instead writing this here: scanLength += numCharacters; or 2) Defining numCharacters before the loop so it can be used in the for loop and then writing: for (unsigned scanLength = length; scanLength < oldText.length(); scanLength += numCharacters) Also seems that numCharacters and numCharacterInGraphemeClusters are both not great names. Since these are "code units", not "characters". Should be numCodeUnitsInGraphemeClusters and numCodeUnits. Someone should come back and rename.
Darin Adler
Comment 23 2018-01-07 22:39:14 PST
(In reply to Chris Nardi from comment #20) > > Are we sure this is going to work? What if the first character is a combining sequence like U+0065 LATIN SMALL LETTER E followed by U+0300 COMBINING GRAVE ACCENT? > > That's true, that didn't occur to me. I updated the rest of the patch with > this consideration. Thanks for doing that. Did you add a test case that includes a grapheme cluster that is more than one code point? I didn’t notice.
Chris Nardi
Comment 24 2018-01-08 08:51:29 PST
(In reply to Darin Adler from comment #23) > (In reply to Chris Nardi from comment #20) > > > Are we sure this is going to work? What if the first character is a combining sequence like U+0065 LATIN SMALL LETTER E followed by U+0300 COMBINING GRAVE ACCENT? > > > > That's true, that didn't occur to me. I updated the rest of the patch with > > this consideration. > > Thanks for doing that. > > Did you add a test case that includes a grapheme cluster that is more than > one code point? I didn’t notice. The punctuation I included in the test case is two code points long, but I didn't include a test case that involved the first letter itself having multiple code points. I proposed a web platform test upstream (https://github.com/w3c/web-platform-tests/pull/8939) covering this; I could try to import that if it gets merged soon.
Chris Nardi
Comment 25 2018-01-08 12:26:48 PST
Chris Nardi
Comment 26 2018-01-08 12:32:06 PST
Comment on attachment 330650 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=330650&action=review >> Source/WebCore/rendering/updating/RenderTreeBuilderFirstLetter.cpp:240 >> + scanLength += numCharacters - 1; > > The is a strange way to write the loop. I would instead suggest either: > > 1) Removing the ++scanLength from the for loop above and instead writing this here: > > scanLength += numCharacters; > > or > > 2) Defining numCharacters before the loop so it can be used in the for loop and then writing: > > for (unsigned scanLength = length; scanLength < oldText.length(); scanLength += numCharacters) > > Also seems that numCharacters and numCharacterInGraphemeClusters are both not great names. Since these are "code units", not "characters". Should be numCodeUnitsInGraphemeClusters and numCodeUnits. Someone should come back and rename. I implemented your second suggestion. I did not change the function name or the variable name; I figured that would be better with another patch. I also added my proposed WPT testcase as I realized that it could just be reviewed here instead of upstream.
Myles C. Maxfield
Comment 27 2018-01-08 12:35:28 PST
Comment on attachment 330721 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=330721&action=review > LayoutTests/platform/mac-wk2/fast/css/first-letter-punctuation-expected.txt:1 > +layer at (0,0) size 800x600 It's surprising that we need a mac-wk2 specific result. Do you know why we need this file?
Myles C. Maxfield
Comment 28 2018-01-08 12:36:40 PST
Comment on attachment 330721 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=330721&action=review > LayoutTests/platform/ios/fast/css/first-letter-punctuation-expected.txt:38 > + RenderBlock {DIV} at (0,208) size 784x50 > + RenderInline (generated) at (0,0) size 49x41 [color=#008000] > + RenderText at (0,6) size 49x41 > + text run at (0,6) width 49: "\x{D800}\x{DD00}T\x{D800}\x{DD00}" > + RenderText {#text} at (48,24) size 19x19 > + text run at (48,24) width 19: "est" We should try to change the test so it's a reftest instead of a DRT test. But this work shouldn't block this patch from landing.
Chris Nardi
Comment 29 2018-01-08 12:46:30 PST
(In reply to Myles C. Maxfield from comment #27) > Comment on attachment 330721 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=330721&action=review > > > LayoutTests/platform/mac-wk2/fast/css/first-letter-punctuation-expected.txt:1 > > +layer at (0,0) size 800x600 > > It's surprising that we need a mac-wk2 specific result. Do you know why we > need this file? I'm not sure. My only guess was that one shows the glyph whereas the other shows an empty box, but the images matched so I don't think that's it. This is my first contribution to WebKit itself; I believe submitting this to the CQ is the proper step but I'm not sure how to. This will probably also result in failures of fast/css/first-letter-punctuation.html on Win since I wasn't able to rebaseline it there.
EWS Watchlist
Comment 30 2018-01-08 13:29:49 PST Comment hidden (obsolete)
EWS Watchlist
Comment 31 2018-01-08 13:29:51 PST Comment hidden (obsolete)
EWS Watchlist
Comment 32 2018-01-08 13:40:04 PST Comment hidden (obsolete)
EWS Watchlist
Comment 33 2018-01-08 13:40:06 PST Comment hidden (obsolete)
Myles C. Maxfield
Comment 34 2018-01-08 14:00:50 PST
(In reply to Chris Nardi from comment #25) > Created attachment 330721 [details] > Patch Yikes, lots of red. Please fix this before you land it.
EWS Watchlist
Comment 35 2018-01-08 14:04:45 PST Comment hidden (obsolete)
EWS Watchlist
Comment 36 2018-01-08 14:04:47 PST Comment hidden (obsolete)
EWS Watchlist
Comment 37 2018-01-08 14:27:43 PST Comment hidden (obsolete)
EWS Watchlist
Comment 38 2018-01-08 14:27:45 PST Comment hidden (obsolete)
Chris Nardi
Comment 39 2018-01-08 19:09:14 PST
Chris Nardi
Comment 40 2018-01-08 19:41:44 PST
(In reply to Myles C. Maxfield from comment #34) > (In reply to Chris Nardi from comment #25) > > Created attachment 330721 [details] > > Patch > > Yikes, lots of red. Please fix this before you land it. Yeah, forgot to change the original line when copying & pasting. I'm surprised that didn't generated a compiler warning. Since I was creating another patch anyway, I changed fast/css/first-letter-punctuation.html into a reftest. That way, there shouldn't need to be any gardening after this patch lands.
WebKit Commit Bot
Comment 41 2018-01-08 22:41:22 PST
Comment on attachment 330778 [details] Patch Clearing flags on attachment: 330778 Committed r226614: <https://trac.webkit.org/changeset/226614>
WebKit Commit Bot
Comment 42 2018-01-08 22:41:24 PST
All reviewed patches have been landed. Closing bug.
Radar WebKit Bug Importer
Comment 43 2018-01-08 22:45:55 PST
Darin Adler
Comment 44 2018-01-08 22:46:44 PST
Comment on attachment 330778 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=330778&action=review > Source/WebCore/rendering/updating/RenderTreeBuilderFirstLetter.cpp:101 > return isSpaceOrNewline(c) || c == noBreakSpace || isPunctuationForFirstLetter(c); Oops, I just spotted a bug here! The isSpaceOrNewline function takes a UChar, not a UChar32, so calling it will chop off the low 16 bits of "c" and possibly return true when the character is not a space or newline. The fix is to change isSpaceOrNewline to take a UChar32.
Darin Adler
Comment 45 2018-01-08 22:50:23 PST
Comment on attachment 330778 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=330778&action=review >> Source/WebCore/rendering/updating/RenderTreeBuilderFirstLetter.cpp:101 >> return isSpaceOrNewline(c) || c == noBreakSpace || isPunctuationForFirstLetter(c); > > Oops, I just spotted a bug here! > > The isSpaceOrNewline function takes a UChar, not a UChar32, so calling it will chop off the low 16 bits of "c" and possibly return true when the character is not a space or newline. The fix is to change isSpaceOrNewline to take a UChar32. That might not work if anyone is passing the isSpaceOrNewline function as a CodeUnitMatchFunction. So I don’t know exactly what the fix is, but there is definitely a bug here.
Note You need to log in before you can comment on or make changes to this bug.