Bug 165368 - [CSS Parser] Eliminate in-place lowercasing in the parser.
Summary: [CSS Parser] Eliminate in-place lowercasing in the parser.
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Dave Hyatt
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-12-04 11:00 PST by Dave Hyatt
Modified: 2016-12-04 16:36 PST (History)
0 users

See Also:


Attachments
Patch (19.27 KB, patch)
2016-12-04 11:11 PST, Dave Hyatt
no flags Details | Formatted Diff | Diff
Patch (19.13 KB, patch)
2016-12-04 11:13 PST, Dave Hyatt
no flags Details | Formatted Diff | Diff
Patch (20.67 KB, patch)
2016-12-04 11:42 PST, Dave Hyatt
no flags Details | Formatted Diff | Diff
Patch (21.78 KB, patch)
2016-12-04 12:26 PST, Dave Hyatt
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dave Hyatt 2016-12-04 11:00:37 PST
[CSS Parser] Eliminate in-place lowercasing in the parser.
Comment 1 Dave Hyatt 2016-12-04 11:11:27 PST
Created attachment 296090 [details]
Patch
Comment 2 Dave Hyatt 2016-12-04 11:13:25 PST
Created attachment 296091 [details]
Patch
Comment 3 Dave Hyatt 2016-12-04 11:42:23 PST
Created attachment 296092 [details]
Patch
Comment 4 Dave Hyatt 2016-12-04 12:26:21 PST
Created attachment 296097 [details]
Patch
Comment 5 Darin Adler 2016-12-04 14:18:36 PST
Comment on attachment 296097 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=296097&action=review

Looks great!

> Source/WebCore/css/CSSSelector.cpp:89
> +    AtomicString value(m_data.m_value);
> +    if (m_data.m_value)
> +        m_data.m_value->deref();

Instead of the three lines above, we should write this:

    AtomicString value { adoptRef(m_data.m_value) };

This avoids some reference count churn.

> Source/WebCore/css/CSSSelector.cpp:90
> +    m_data.m_rareData = &RareData::create(value).leakRef();

Should use WTFMove on the AtomicString value to pass in to avoid a little reference count churn:

    m_data.m_rareData = &RareData::create(WTFMove(value)).leakRef();

> Source/WebCore/css/CSSSelector.cpp:826
> +CSSSelector::RareData::RareData(const AtomicString& value)

Should take AtomicString&& instead of AtomicString. Avoid one round of reference count churn.

> Source/WebCore/css/CSSSelector.h:245
> +        void setValue(const AtomicString&, bool matchLowerCase = false);

I think this needs a different name because "match lower case" sounds like "should I match some lower case thing". I suggest the name shouldConvertToLowercaseForMatching. But maybe you can find a great name that is shorter.

> Source/WebCore/css/CSSSelector.h:353
> +            static Ref<RareData> create(const AtomicString& value) { return adoptRef(*new RareData(value)); }

Should be AtomicString&& instead of const AtomicString& and then use WTFMove inside the call to the constructor.

> Source/WebCore/css/CSSSelector.h:374
> +            RareData(const AtomicString& value);

Should be AtomicString&&. No need for the name "value". Should be marked explicit.

> Source/WebCore/css/CSSSelector.h:468
> +    

Should not add the stray space here.

> Source/WebCore/css/CSSSelector.h:473
> +    if (matchLowerCase && !m_hasRareData && !isASCIILowerCSSIdentifier(value))
> +        createRareData();

Here’s how I would write this:

    AtomicString matchingValue = matchLowerCase ? value.convertToASCIILowercase() : value;
    if (!m_hasRareData && matchingValue != value)
        createRareData();

Here’s why: convertToASCIILowercase is already optimized for the case where there are no uppercase characters, and so it should be fast in that case, possibly even faster than isASCIILowerCSSIdentifier. And in the case where we already have rare data, this was something we were going to do below anyway. Then below it would be this:

    m_data.m_rareData->m_matchingValue = WTFMove(matchingValue);

Doing that avoids adding reference count churn just because we had a temporary AtomicString.

> Source/WebCore/css/MediaQueryExp.cpp:296
>      : m_mediaFeature(feature)
>      , m_isValid(false)
>  {
> +    if (!isASCIILowerCSSIdentifier(m_mediaFeature))
> +        m_mediaFeature = m_mediaFeature.convertToASCIILowercase();

Calling isASCIILowerCSSIdentifier isn’t an effective optimization. AtomicString::convertToASCIILowercase already has a fast path for when nothing is uppercase. It should just be written like this:

    : m_mediaFeature(feature.convertToASCIILowercase())

No need for the if statement.

> Source/WebCore/css/parser/CSSParserIdioms.h:67
> +inline bool isASCIILowerCSSIdentifier(const AtomicString& value)

This function should just be named containsASCIIUppercase. We might also want to make version with a fast case for 8 bit; this version checks 8-bit every time through the loop.

But more importantly, if you take my advice above, then you don’t need this function at all.

> Source/WebCore/css/parser/CSSPropertyParser.cpp:2127
> +    if (context.isHTMLDocument && !isASCIILowerCSSIdentifier(attrName))
> +        attrName = attrName.convertToASCIILowercase();

Again, the isASCIILowerCSSIdentifier check here is not an optimization. Just write this:

    if (context.isHTMLDocument)
        attrName = attrName.convertToASCIILowercase();

But it’s a tiny bit annoying that we did toAtomicString on a string that could have contained uppercase; there is a chance we will be destroying that AtomicString we made because we make the lowercase one instead. If we care to tighten that case we could probably add a CSSParserToken function that produces a possibly-lowercased AtomicString efficiently and never has to allocate two strings. But that’s probably overdoing it.

> Source/WebCore/css/parser/CSSSelectorParser.cpp:513
> +    auto lowerValue = token.value().toAtomicString();
> +    if (!isASCIILowerCSSIdentifier(lowerValue))
> +        lowerValue.convertToASCIILowercase();
> +    auto value = StringView(lowerValue);

No reason to use AtomicString here, and isASCIILowerCSSIdentifier is not an optimization. So this should be just this:

    auto lowercasedValue = token.value().toString().convertToASCIILowercase();
    auto value = StringView { lowercasedValue };

Same thought as above about the extra temporary string here; the token class could be taught to make a string that is already lowercased rather than doing it after allocating a String. This will be slower than the old code that lowercased in place when uppercase characters are involved, because that code would never allocate two strings.

> Source/WebCore/css/parser/CSSSelectorParser.cpp:523
> +        auto tokenString = value.toString();
>          if (!tokenString.startsWithIgnoringASCIICase("host")) {
>              newValue = value.toString() + "(";
>              value = newValue;

This code allocates a lot of unnecessary strings. This version allocates no StringImpl when it doesn't start with "host" and allocates only one StringImpl when it does. The old code allocated 1 and 3 StringImpl in those cases!

    if (!value.startsWithIgnoringASCIICase(StringVIew { "host" })) {
        newValue = makeString(value, '(');
        value = newValue;
    }

And if we have some reason to wait to optimize it even tighter, we could easily write a startsWithLettersIgnoringASCIICase function that takes a StringView and then we could have written:

    if (!startsLettersWithIgnoringASCIICase(value, "host")) {
        newValue = makeString(value, '(');
        value = newValue;
    }

But that seems like maybe premature optimization whereas the other one above just seems like common sense.
Comment 6 Darin Adler 2016-12-04 14:20:23 PST
Comment on attachment 296097 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=296097&action=review

>> Source/WebCore/css/CSSSelector.h:245
>> +        void setValue(const AtomicString&, bool matchLowerCase = false);
> 
> I think this needs a different name because "match lower case" sounds like "should I match some lower case thing". I suggest the name shouldConvertToLowercaseForMatching. But maybe you can find a great name that is shorter.

Maybe matchLowerCase is not so bad. Or maybe matchAsLowerCase.
Comment 7 Dave Hyatt 2016-12-04 14:58:58 PST
Note that the code involving the "host" check and all that is just temporary and can be removed once we have gotten rid of the old parser. I still made your suggested change to it, but it's living on borrowed time anyway.
Comment 8 Dave Hyatt 2016-12-04 16:36:11 PST
Fixed in r209314.