Bug 181315 - ::first-letter incorrectly selects grapheme pairs
Summary: ::first-letter incorrectly selects grapheme pairs
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: CSS (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2018-01-04 19:24 PST by Chris Nardi
Modified: 2018-01-08 22:50 PST (History)
6 users (show)

See Also:


Attachments
Patch (4.65 KB, patch)
2018-01-04 19:29 PST, Chris Nardi
no flags Details | Formatted Diff | Diff
Patch (4.65 KB, patch)
2018-01-04 19:51 PST, Chris Nardi
no flags Details | Formatted Diff | Diff
Patch (4.65 KB, patch)
2018-01-04 19:52 PST, Chris Nardi
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews102 for mac-elcapitan (2.24 MB, application/zip)
2018-01-04 20:49 PST, EWS Watchlist
no flags Details
Archive of layout-test-results from ews115 for mac-elcapitan (2.95 MB, application/zip)
2018-01-04 20:59 PST, EWS Watchlist
no flags Details
Archive of layout-test-results from ews105 for mac-elcapitan-wk2 (2.62 MB, application/zip)
2018-01-04 21:01 PST, EWS Watchlist
no flags Details
Archive of layout-test-results from ews125 for ios-simulator-wk2 (2.20 MB, application/zip)
2018-01-04 21:19 PST, EWS Watchlist
no flags Details
Patch (46.57 KB, patch)
2018-01-05 09:00 PST, Chris Nardi
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews105 for mac-elcapitan-wk2 (2.66 MB, application/zip)
2018-01-05 10:09 PST, EWS Watchlist
no flags Details
Patch (49.02 KB, patch)
2018-01-05 10:33 PST, Chris Nardi
no flags Details | Formatted Diff | Diff
Patch (49.05 KB, patch)
2018-01-05 22:23 PST, Chris Nardi
no flags Details | Formatted Diff | Diff
Patch (49.21 KB, patch)
2018-01-05 22:34 PST, Chris Nardi
no flags Details | Formatted Diff | Diff
Patch (49.21 KB, patch)
2018-01-06 15:37 PST, Chris Nardi
no flags Details | Formatted Diff | Diff
Patch (55.68 KB, patch)
2018-01-08 12:26 PST, Chris Nardi
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews101 for mac-elcapitan (2.52 MB, application/zip)
2018-01-08 13:29 PST, EWS Watchlist
no flags Details
Archive of layout-test-results from ews105 for mac-elcapitan-wk2 (3.05 MB, application/zip)
2018-01-08 13:40 PST, EWS Watchlist
no flags Details
Archive of layout-test-results from ews113 for mac-elcapitan (3.38 MB, application/zip)
2018-01-08 14:04 PST, EWS Watchlist
no flags Details
Archive of layout-test-results from ews123 for ios-simulator-wk2 (23.54 MB, application/zip)
2018-01-08 14:27 PST, EWS Watchlist
no flags Details
Patch (52.85 KB, patch)
2018-01-08 19:09 PST, Chris Nardi
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Chris Nardi 2018-01-04 19:24:24 PST
::first-letter incorrectly selects grapheme pairs
Comment 1 Chris Nardi 2018-01-04 19:29:29 PST Comment hidden (obsolete)
Comment 2 Chris Nardi 2018-01-04 19:51:02 PST Comment hidden (obsolete)
Comment 3 Chris Nardi 2018-01-04 19:52:53 PST Comment hidden (obsolete)
Comment 4 EWS Watchlist 2018-01-04 20:49:55 PST Comment hidden (obsolete)
Comment 5 EWS Watchlist 2018-01-04 20:49:56 PST Comment hidden (obsolete)
Comment 6 EWS Watchlist 2018-01-04 20:59:19 PST Comment hidden (obsolete)
Comment 7 EWS Watchlist 2018-01-04 20:59:20 PST Comment hidden (obsolete)
Comment 8 EWS Watchlist 2018-01-04 21:01:36 PST Comment hidden (obsolete)
Comment 9 EWS Watchlist 2018-01-04 21:01:37 PST Comment hidden (obsolete)
Comment 10 EWS Watchlist 2018-01-04 21:19:38 PST Comment hidden (obsolete)
Comment 11 EWS Watchlist 2018-01-04 21:19:39 PST Comment hidden (obsolete)
Comment 12 Chris Nardi 2018-01-05 09:00:50 PST Comment hidden (obsolete)
Comment 13 EWS Watchlist 2018-01-05 10:09:34 PST Comment hidden (obsolete)
Comment 14 EWS Watchlist 2018-01-05 10:09:35 PST Comment hidden (obsolete)
Comment 15 Chris Nardi 2018-01-05 10:33:30 PST Comment hidden (obsolete)
Comment 16 Chris Nardi 2018-01-05 13:24:47 PST
I think this is ready for review. All bots seem to be passing except for wincairom, which for some reason cannot apply the patch (even though the others can). I'm sure there's more rebaselineing that needs to occur, but I've just been working off of what the bots provide and GTK/Win don't seem to be providing any new expectation files.

This was found from CSS2 tests in WPT (https://github.com/w3c/web-platform-tests/blob/master/css/CSS2/selectors/first-letter-punctuation-335.xht). I used that test as the basis for what I included in fast/css/first-letter-punctuation.html.

A similar version of this patch was also submitted to Chromium (https://chromium-review.googlesource.com/c/chromium/src/+/847293).
Comment 17 Myles C. Maxfield 2018-01-05 15:43:39 PST
Comment on attachment 330557 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=330557&action=review

> Source/WebCore/rendering/updating/RenderTreeBuilderFirstLetter.cpp:225
> +        length += U16_LENGTH(oldText.characterStartingAt(length));

Are we sure this is going to work? What if the first character is a combining sequence like U+0065 LATIN SMALL LETTER E followed by U+0300 COMBINING GRAVE ACCENT?
Comment 18 Chris Nardi 2018-01-05 22:23:11 PST Comment hidden (obsolete)
Comment 19 Chris Nardi 2018-01-05 22:34:07 PST Comment hidden (obsolete)
Comment 20 Chris Nardi 2018-01-05 22:35:35 PST
Comment on attachment 330557 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=330557&action=review

>> Source/WebCore/rendering/updating/RenderTreeBuilderFirstLetter.cpp:225
>> +        length += U16_LENGTH(oldText.characterStartingAt(length));
> 
> Are we sure this is going to work? What if the first character is a combining sequence like U+0065 LATIN SMALL LETTER E followed by U+0300 COMBINING GRAVE ACCENT?

That's true, that didn't occur to me. I updated the rest of the patch with this consideration.
Comment 21 Chris Nardi 2018-01-06 15:37:56 PST
Created attachment 330650 [details]
Patch
Comment 22 Darin Adler 2018-01-07 22:36:22 PST
Comment on attachment 330650 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=330650&action=review

> Source/WebCore/rendering/updating/RenderTreeBuilderFirstLetter.cpp:240
> +            if (numCharacters > 1)
> +                scanLength += numCharacters - 1;

The is a strange way to write the loop. I would instead suggest either:

1) Removing the ++scanLength from the for loop above and instead writing this here:

    scanLength += numCharacters;

or

2) Defining numCharacters before the loop so it can be used in the for loop and then writing:

    for (unsigned scanLength = length; scanLength < oldText.length(); scanLength += numCharacters)

Also seems that numCharacters and numCharacterInGraphemeClusters are both not great names. Since these are "code units", not "characters". Should be numCodeUnitsInGraphemeClusters and numCodeUnits. Someone should come back and rename.
Comment 23 Darin Adler 2018-01-07 22:39:14 PST
(In reply to Chris Nardi from comment #20)
> > Are we sure this is going to work? What if the first character is a combining sequence like U+0065 LATIN SMALL LETTER E followed by U+0300 COMBINING GRAVE ACCENT?
> 
> That's true, that didn't occur to me. I updated the rest of the patch with
> this consideration.

Thanks for doing that.

Did you add a test case that includes a grapheme cluster that is more than one code point? I didn’t notice.
Comment 24 Chris Nardi 2018-01-08 08:51:29 PST
(In reply to Darin Adler from comment #23)
> (In reply to Chris Nardi from comment #20)
> > > Are we sure this is going to work? What if the first character is a combining sequence like U+0065 LATIN SMALL LETTER E followed by U+0300 COMBINING GRAVE ACCENT?
> > 
> > That's true, that didn't occur to me. I updated the rest of the patch with
> > this consideration.
> 
> Thanks for doing that.
> 
> Did you add a test case that includes a grapheme cluster that is more than
> one code point? I didn’t notice.


The punctuation I included in the test case is two code points long, but I didn't include a test case that involved the first letter itself having multiple code points. I proposed a web platform test upstream (https://github.com/w3c/web-platform-tests/pull/8939) covering this; I could try to import that if it gets merged soon.
Comment 25 Chris Nardi 2018-01-08 12:26:48 PST
Created attachment 330721 [details]
Patch
Comment 26 Chris Nardi 2018-01-08 12:32:06 PST
Comment on attachment 330650 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=330650&action=review

>> Source/WebCore/rendering/updating/RenderTreeBuilderFirstLetter.cpp:240
>> +                scanLength += numCharacters - 1;
> 
> The is a strange way to write the loop. I would instead suggest either:
> 
> 1) Removing the ++scanLength from the for loop above and instead writing this here:
> 
>     scanLength += numCharacters;
> 
> or
> 
> 2) Defining numCharacters before the loop so it can be used in the for loop and then writing:
> 
>     for (unsigned scanLength = length; scanLength < oldText.length(); scanLength += numCharacters)
> 
> Also seems that numCharacters and numCharacterInGraphemeClusters are both not great names. Since these are "code units", not "characters". Should be numCodeUnitsInGraphemeClusters and numCodeUnits. Someone should come back and rename.

I implemented your second suggestion. I did not change the function name or the variable name; I figured that would be better with another patch.

I also added my proposed WPT testcase as I realized that it could just be reviewed here instead of upstream.
Comment 27 Myles C. Maxfield 2018-01-08 12:35:28 PST
Comment on attachment 330721 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=330721&action=review

> LayoutTests/platform/mac-wk2/fast/css/first-letter-punctuation-expected.txt:1
> +layer at (0,0) size 800x600

It's surprising that we need a mac-wk2 specific result. Do you know why we need this file?
Comment 28 Myles C. Maxfield 2018-01-08 12:36:40 PST
Comment on attachment 330721 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=330721&action=review

> LayoutTests/platform/ios/fast/css/first-letter-punctuation-expected.txt:38
> +      RenderBlock {DIV} at (0,208) size 784x50
> +        RenderInline (generated) at (0,0) size 49x41 [color=#008000]
> +          RenderText at (0,6) size 49x41
> +            text run at (0,6) width 49: "\x{D800}\x{DD00}T\x{D800}\x{DD00}"
> +        RenderText {#text} at (48,24) size 19x19
> +          text run at (48,24) width 19: "est"

We should try to change the test so it's a reftest instead of a DRT test. But this work shouldn't block this patch from landing.
Comment 29 Chris Nardi 2018-01-08 12:46:30 PST
(In reply to Myles C. Maxfield from comment #27)
> Comment on attachment 330721 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=330721&action=review
> 
> > LayoutTests/platform/mac-wk2/fast/css/first-letter-punctuation-expected.txt:1
> > +layer at (0,0) size 800x600
> 
> It's surprising that we need a mac-wk2 specific result. Do you know why we
> need this file?

I'm not sure. My only guess was that one shows the glyph whereas the other shows an empty box, but the images matched so I don't think that's it.

This is my first contribution to WebKit itself; I believe submitting this to the CQ is the proper step but I'm not sure how to.

This will probably also result in failures of fast/css/first-letter-punctuation.html on Win since I wasn't able to rebaseline it there.
Comment 30 EWS Watchlist 2018-01-08 13:29:49 PST Comment hidden (obsolete)
Comment 31 EWS Watchlist 2018-01-08 13:29:51 PST Comment hidden (obsolete)
Comment 32 EWS Watchlist 2018-01-08 13:40:04 PST Comment hidden (obsolete)
Comment 33 EWS Watchlist 2018-01-08 13:40:06 PST Comment hidden (obsolete)
Comment 34 Myles C. Maxfield 2018-01-08 14:00:50 PST
(In reply to Chris Nardi from comment #25)
> Created attachment 330721 [details]
> Patch

Yikes, lots of red. Please fix this before you land it.
Comment 35 EWS Watchlist 2018-01-08 14:04:45 PST Comment hidden (obsolete)
Comment 36 EWS Watchlist 2018-01-08 14:04:47 PST Comment hidden (obsolete)
Comment 37 EWS Watchlist 2018-01-08 14:27:43 PST Comment hidden (obsolete)
Comment 38 EWS Watchlist 2018-01-08 14:27:45 PST Comment hidden (obsolete)
Comment 39 Chris Nardi 2018-01-08 19:09:14 PST
Created attachment 330778 [details]
Patch
Comment 40 Chris Nardi 2018-01-08 19:41:44 PST
(In reply to Myles C. Maxfield from comment #34)
> (In reply to Chris Nardi from comment #25)
> > Created attachment 330721 [details]
> > Patch
> 
> Yikes, lots of red. Please fix this before you land it.

Yeah, forgot to change the original line when copying & pasting. I'm surprised that didn't generated a compiler warning.

Since I was creating another patch anyway, I changed fast/css/first-letter-punctuation.html into a reftest. That way, there shouldn't need to be any gardening after this patch lands.
Comment 41 WebKit Commit Bot 2018-01-08 22:41:22 PST
Comment on attachment 330778 [details]
Patch

Clearing flags on attachment: 330778

Committed r226614: <https://trac.webkit.org/changeset/226614>
Comment 42 WebKit Commit Bot 2018-01-08 22:41:24 PST
All reviewed patches have been landed.  Closing bug.
Comment 43 Radar WebKit Bug Importer 2018-01-08 22:45:55 PST
<rdar://problem/36369465>
Comment 44 Darin Adler 2018-01-08 22:46:44 PST
Comment on attachment 330778 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=330778&action=review

> Source/WebCore/rendering/updating/RenderTreeBuilderFirstLetter.cpp:101
>      return isSpaceOrNewline(c) || c == noBreakSpace || isPunctuationForFirstLetter(c);

Oops, I just spotted a bug here!

The isSpaceOrNewline function takes a UChar, not a UChar32, so calling it will chop off the low 16 bits of "c" and possibly return true when the character is not a space or newline. The fix is to change isSpaceOrNewline to take a UChar32.
Comment 45 Darin Adler 2018-01-08 22:50:23 PST
Comment on attachment 330778 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=330778&action=review

>> Source/WebCore/rendering/updating/RenderTreeBuilderFirstLetter.cpp:101
>>      return isSpaceOrNewline(c) || c == noBreakSpace || isPunctuationForFirstLetter(c);
> 
> Oops, I just spotted a bug here!
> 
> The isSpaceOrNewline function takes a UChar, not a UChar32, so calling it will chop off the low 16 bits of "c" and possibly return true when the character is not a space or newline. The fix is to change isSpaceOrNewline to take a UChar32.

That might not work if anyone is passing the isSpaceOrNewline function as a CodeUnitMatchFunction. So I don’t know exactly what the fix is, but there is definitely a bug here.