Bug 135756 - Elements whose contents start with an astral Unicode symbol disappear when CSS `::first-letter` is applied to them
Summary: Elements whose contents start with an astral Unicode symbol disappear when CS...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: CSS (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Myles C. Maxfield
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-08-08 06:20 PDT by Mathias Bynens
Modified: 2014-08-12 20:54 PDT (History)
14 users (show)

See Also:


Attachments
Reduced test case (221 bytes, text/html)
2014-08-08 06:20 PDT, Mathias Bynens
no flags Details
Patch (3.76 KB, patch)
2014-08-11 13:14 PDT, Myles C. Maxfield
no flags Details | Formatted Diff | Diff
Patch (10.33 KB, patch)
2014-08-12 16:58 PDT, Myles C. Maxfield
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mathias Bynens 2014-08-08 06:20:55 PDT
Created attachment 236279 [details]
Reduced test case

Elements whose contents start with an astral Unicode symbol (such as U+1D306, which I’ll use in examples) disappear when CSS `::first-letter` is applied to them.

For example, the following displays just fine, with and without `::first-letter { background: red; }`:

    <p>Lorem ipsum</p>

But prepend any astral symbol to the element’s contents and it disappears when the `::first-letter` styles are active:

    <p>&#x1D306;Lorem ipsum (1)</p>

Blink has the same bug: https://code.google.com/p/chromium/issues/detail?id=401945
Comment 1 Myles C. Maxfield 2014-08-11 10:48:03 PDT
DumpRenderTree says that we aren't determining the first letter correctly.

      RenderBlock {P} at (0,34) size 784x18
        RenderInline (generated) at (0,0) size 8x18 [bgcolor=#FF0000]
          RenderText {#text} at (0,0) size 8x18
            text run at (0,0) width 8: "\x{F0}"
        RenderText {#text} at (8,0) size 128x18
          text run at (8,0) width 128: "\x{9D}\x{152}\x{2020}Lorem ipsum (1)"
Comment 2 Myles C. Maxfield 2014-08-11 10:48:55 PDT
Whoops, ignore that last. I didn't put a meta charset tag in the document.
Comment 3 Myles C. Maxfield 2014-08-11 10:50:28 PDT
      RenderBlock {P} at (0,34) size 784x18
        RenderInline (generated) at (0,0) size 0x18 [bgcolor=#FF0000]
          RenderText {#text} at (0,0) size 0x18
            text run at (0,0) width 0: "\x{D834}"
        RenderText {#text} at (0,0) size 64x18
          text run at (0,0) width 64: "\x{DF06}Lorem ipsum (1)"
Comment 4 Myles C. Maxfield 2014-08-11 13:14:41 PDT
Created attachment 236395 [details]
Patch
Comment 5 Darin Adler 2014-08-11 22:27:38 PDT
Comment on attachment 236395 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=236395&action=review

We can do better.

> Source/WebCore/rendering/RenderBlock.cpp:-3589
> -        length++;

You should just write something like this:

    length += numCharactersInGraphemeClusters(StringView(oldText).substring(length), 1);

You’ll have to change the numCharactersInGraphemeClusters to take a StringView rather than a String, but otherwise this should do the job.

Maybe you’ll need to performance test to see that’s fast enough, but it will do the right thing.

> Source/WebCore/rendering/RenderBlock.cpp:3589
> +        // FIXME: The surrogate might come from another <span>

It would be more appropriate to say another text node, not <span>:

    // FIXME: The trail surrogate might be in a subsequent text node.

But it seems strange to put the FIXME here when the same issue applies to all the rest of the string processing in this function. It’s not specific to this one part of the algorithm.

> Source/WebCore/rendering/RenderBlock.cpp:3590
> +        // FIXME: Presumably, combining marks should be a part of this, too

I would word this completely differently, but it’s not needed if you take my advice above.

> Source/WebCore/rendering/RenderBlock.cpp:3593
> +            UChar32 codepoint = 0;
> +            U16_NEXT(oldText.characters16(), length, oldText.length(), codepoint);

No need to initialize this to zero. Also no reason to smash two words together: “codepoint”. I would just call this local variable “character”. But you don’t need this code at all.
Comment 6 Myles C. Maxfield 2014-08-12 16:30:45 PDT
Comment on attachment 236395 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=236395&action=review

>> Source/WebCore/rendering/RenderBlock.cpp:-3589
>> -        length++;
> 
> You should just write something like this:
> 
>     length += numCharactersInGraphemeClusters(StringView(oldText).substring(length), 1);
> 
> You’ll have to change the numCharactersInGraphemeClusters to take a StringView rather than a String, but otherwise this should do the job.
> 
> Maybe you’ll need to performance test to see that’s fast enough, but it will do the right thing.

numCharactersInGraphemeClusters is super useful! Thanks for letting me know about it.

Done.

>> Source/WebCore/rendering/RenderBlock.cpp:3589
>> +        // FIXME: The surrogate might come from another <span>
> 
> It would be more appropriate to say another text node, not <span>:
> 
>     // FIXME: The trail surrogate might be in a subsequent text node.
> 
> But it seems strange to put the FIXME here when the same issue applies to all the rest of the string processing in this function. It’s not specific to this one part of the algorithm.

I moved the FIXME to RenderBlock::updateFirstLetter where the assumption is actually made.

>> Source/WebCore/rendering/RenderBlock.cpp:3590
>> +        // FIXME: Presumably, combining marks should be a part of this, too
> 
> I would word this completely differently, but it’s not needed if you take my advice above.

Yup. Done.

>> Source/WebCore/rendering/RenderBlock.cpp:3593
>> +            U16_NEXT(oldText.characters16(), length, oldText.length(), codepoint);
> 
> No need to initialize this to zero. Also no reason to smash two words together: “codepoint”. I would just call this local variable “character”. But you don’t need this code at all.

Yeah - numCharactersInGraphemeClusters fixes this right up. Done.
Comment 7 Myles C. Maxfield 2014-08-12 16:58:36 PDT
Created attachment 236483 [details]
Patch
Comment 8 Darin Adler 2014-08-12 17:49:03 PDT
Comment on attachment 236483 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=236483&action=review

> Source/WebCore/html/HTMLTextAreaElement.cpp:324
> +    return proposedValue.left(numCharactersInGraphemeClusters(StringView(proposedValue), maxLength));

Is the explicit conversion to StringView needed? I’d expect an implicit conversion.

> Source/WebCore/html/TextFieldInputType.cpp:341
> +    unsigned newLength = numCharactersInGraphemeClusters(StringView(string), maxLength);

Is the explicit conversion to StringView needed? I’d expect an implicit conversion.

> Source/WebCore/platform/LocalizedStrings.cpp:85
> +    unsigned numberOfCharacters = numCharactersInGraphemeClusters(StringView(trimmed), maxNumberOfGraphemeClustersInLookupMenuItem);

Is the explicit conversion to StringView needed? I’d expect an implicit conversion.
Comment 9 Darin Adler 2014-08-12 17:49:52 PDT
Comment on attachment 236483 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=236483&action=review

> Source/WebCore/ChangeLog:8
> +        The previous code assumed that all "characters" are exactly 1 16-bit code unit wide. Instead, use U16_NEXT().

Oops, this comment is obsolete since it mentions U16_NEXT.

> Source/WebCore/ChangeLog:12
> +        This patch also modifies the signature of numCharactersInGraphemeClusters() to take a StringView instead
> +        of a string, which will avoid a copy. As soon as there is an implicit StringView constructor which takes a
> +        String, we can revert the changes to these callers.

Oh, I did not see this comment. I thought there was an implicit StringView constructor.
Comment 10 Myles C. Maxfield 2014-08-12 20:50:35 PDT
Comment on attachment 236483 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=236483&action=review

>> Source/WebCore/ChangeLog:8
>> +        The previous code assumed that all "characters" are exactly 1 16-bit code unit wide. Instead, use U16_NEXT().
> 
> Oops, this comment is obsolete since it mentions U16_NEXT.

Done.

>> Source/WebCore/ChangeLog:12
>> +        String, we can revert the changes to these callers.
> 
> Oh, I did not see this comment. I thought there was an implicit StringView constructor.

There is now!

>> Source/WebCore/html/HTMLTextAreaElement.cpp:324
>> +    return proposedValue.left(numCharactersInGraphemeClusters(StringView(proposedValue), maxLength));
> 
> Is the explicit conversion to StringView needed? I’d expect an implicit conversion.

Nope!

>> Source/WebCore/html/TextFieldInputType.cpp:341
>> +    unsigned newLength = numCharactersInGraphemeClusters(StringView(string), maxLength);
> 
> Is the explicit conversion to StringView needed? I’d expect an implicit conversion.

Nope!

>> Source/WebCore/platform/LocalizedStrings.cpp:85
>> +    unsigned numberOfCharacters = numCharactersInGraphemeClusters(StringView(trimmed), maxNumberOfGraphemeClustersInLookupMenuItem);
> 
> Is the explicit conversion to StringView needed? I’d expect an implicit conversion.

Nope!
Comment 11 Myles C. Maxfield 2014-08-12 20:54:35 PDT
http://trac.webkit.org/changeset/172513