165368 – [CSS Parser] Eliminate in-place lowercasing in the parser.

RESOLVED FIXED 165368

[CSS Parser] Eliminate in-place lowercasing in the parser.

https://bugs.webkit.org/show_bug.cgi?id=165368

Summary [CSS Parser] Eliminate in-place lowercasing in the parser.

Dave Hyatt

Reported 2016-12-04 11:00:37 PST

[CSS Parser] Eliminate in-place lowercasing in the parser.

Attachments
Patch (19.27 KB, patch) 2016-12-04 11:11 PST, Dave Hyatt	no flags	Details Formatted Diff Diff
Patch (19.13 KB, patch) 2016-12-04 11:13 PST, Dave Hyatt	no flags	Details Formatted Diff Diff
Patch (20.67 KB, patch) 2016-12-04 11:42 PST, Dave Hyatt	no flags	Details Formatted Diff Diff
Patch (21.78 KB, patch) 2016-12-04 12:26 PST, Dave Hyatt	darin: review+	Details Formatted Diff Diff
Show Obsolete (3) View All Add attachment proposed patch, testcase, etc.

Dave Hyatt

Comment 1 2016-12-04 11:11:27 PST

Created attachment 296090 [details] Patch

Dave Hyatt

Comment 2 2016-12-04 11:13:25 PST

Created attachment 296091 [details] Patch

Dave Hyatt

Comment 3 2016-12-04 11:42:23 PST

Created attachment 296092 [details] Patch

Dave Hyatt

Comment 4 2016-12-04 12:26:21 PST

Created attachment 296097 [details] Patch

Darin Adler

Comment 5 2016-12-04 14:18:36 PST

Comment on attachment 296097 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=296097&action=review Looks great! > Source/WebCore/css/CSSSelector.cpp:89 > + AtomicString value(m_data.m_value); > + if (m_data.m_value) > + m_data.m_value->deref(); Instead of the three lines above, we should write this: AtomicString value { adoptRef(m_data.m_value) }; This avoids some reference count churn. > Source/WebCore/css/CSSSelector.cpp:90 > + m_data.m_rareData = &RareData::create(value).leakRef(); Should use WTFMove on the AtomicString value to pass in to avoid a little reference count churn: m_data.m_rareData = &RareData::create(WTFMove(value)).leakRef(); > Source/WebCore/css/CSSSelector.cpp:826 > +CSSSelector::RareData::RareData(const AtomicString& value) Should take AtomicString&& instead of AtomicString. Avoid one round of reference count churn. > Source/WebCore/css/CSSSelector.h:245 > + void setValue(const AtomicString&, bool matchLowerCase = false); I think this needs a different name because "match lower case" sounds like "should I match some lower case thing". I suggest the name shouldConvertToLowercaseForMatching. But maybe you can find a great name that is shorter. > Source/WebCore/css/CSSSelector.h:353 > + static Ref<RareData> create(const AtomicString& value) { return adoptRef(*new RareData(value)); } Should be AtomicString&& instead of const AtomicString& and then use WTFMove inside the call to the constructor. > Source/WebCore/css/CSSSelector.h:374 > + RareData(const AtomicString& value); Should be AtomicString&&. No need for the name "value". Should be marked explicit. > Source/WebCore/css/CSSSelector.h:468 > + Should not add the stray space here. > Source/WebCore/css/CSSSelector.h:473 > + if (matchLowerCase && !m_hasRareData && !isASCIILowerCSSIdentifier(value)) > + createRareData(); Here’s how I would write this: AtomicString matchingValue = matchLowerCase ? value.convertToASCIILowercase() : value; if (!m_hasRareData && matchingValue != value) createRareData(); Here’s why: convertToASCIILowercase is already optimized for the case where there are no uppercase characters, and so it should be fast in that case, possibly even faster than isASCIILowerCSSIdentifier. And in the case where we already have rare data, this was something we were going to do below anyway. Then below it would be this: m_data.m_rareData->m_matchingValue = WTFMove(matchingValue); Doing that avoids adding reference count churn just because we had a temporary AtomicString. > Source/WebCore/css/MediaQueryExp.cpp:296 > : m_mediaFeature(feature) > , m_isValid(false) > { > + if (!isASCIILowerCSSIdentifier(m_mediaFeature)) > + m_mediaFeature = m_mediaFeature.convertToASCIILowercase(); Calling isASCIILowerCSSIdentifier isn’t an effective optimization. AtomicString::convertToASCIILowercase already has a fast path for when nothing is uppercase. It should just be written like this: : m_mediaFeature(feature.convertToASCIILowercase()) No need for the if statement. > Source/WebCore/css/parser/CSSParserIdioms.h:67 > +inline bool isASCIILowerCSSIdentifier(const AtomicString& value) This function should just be named containsASCIIUppercase. We might also want to make version with a fast case for 8 bit; this version checks 8-bit every time through the loop. But more importantly, if you take my advice above, then you don’t need this function at all. > Source/WebCore/css/parser/CSSPropertyParser.cpp:2127 > + if (context.isHTMLDocument && !isASCIILowerCSSIdentifier(attrName)) > + attrName = attrName.convertToASCIILowercase(); Again, the isASCIILowerCSSIdentifier check here is not an optimization. Just write this: if (context.isHTMLDocument) attrName = attrName.convertToASCIILowercase(); But it’s a tiny bit annoying that we did toAtomicString on a string that could have contained uppercase; there is a chance we will be destroying that AtomicString we made because we make the lowercase one instead. If we care to tighten that case we could probably add a CSSParserToken function that produces a possibly-lowercased AtomicString efficiently and never has to allocate two strings. But that’s probably overdoing it. > Source/WebCore/css/parser/CSSSelectorParser.cpp:513 > + auto lowerValue = token.value().toAtomicString(); > + if (!isASCIILowerCSSIdentifier(lowerValue)) > + lowerValue.convertToASCIILowercase(); > + auto value = StringView(lowerValue); No reason to use AtomicString here, and isASCIILowerCSSIdentifier is not an optimization. So this should be just this: auto lowercasedValue = token.value().toString().convertToASCIILowercase(); auto value = StringView { lowercasedValue }; Same thought as above about the extra temporary string here; the token class could be taught to make a string that is already lowercased rather than doing it after allocating a String. This will be slower than the old code that lowercased in place when uppercase characters are involved, because that code would never allocate two strings. > Source/WebCore/css/parser/CSSSelectorParser.cpp:523 > + auto tokenString = value.toString(); > if (!tokenString.startsWithIgnoringASCIICase("host")) { > newValue = value.toString() + "("; > value = newValue; This code allocates a lot of unnecessary strings. This version allocates no StringImpl when it doesn't start with "host" and allocates only one StringImpl when it does. The old code allocated 1 and 3 StringImpl in those cases! if (!value.startsWithIgnoringASCIICase(StringVIew { "host" })) { newValue = makeString(value, '('); value = newValue; } And if we have some reason to wait to optimize it even tighter, we could easily write a startsWithLettersIgnoringASCIICase function that takes a StringView and then we could have written: if (!startsLettersWithIgnoringASCIICase(value, "host")) { newValue = makeString(value, '('); value = newValue; } But that seems like maybe premature optimization whereas the other one above just seems like common sense.

Darin Adler

Comment 6 2016-12-04 14:20:23 PST

Comment on attachment 296097 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=296097&action=review >> Source/WebCore/css/CSSSelector.h:245 >> + void setValue(const AtomicString&, bool matchLowerCase = false); > > I think this needs a different name because "match lower case" sounds like "should I match some lower case thing". I suggest the name shouldConvertToLowercaseForMatching. But maybe you can find a great name that is shorter. Maybe matchLowerCase is not so bad. Or maybe matchAsLowerCase.

Dave Hyatt

Comment 7 2016-12-04 14:58:58 PST

Note that the code involving the "host" check and all that is just temporary and can be removed once we have gotten rid of the old parser. I still made your suggested change to it, but it's living on borrowed time anyway.

Dave Hyatt

Comment 8 2016-12-04 16:36:11 PST

Fixed in r209314.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component New Bugs

Assignee

Dave Hyatt

Reported

2016-12-04 11:00 PST

Modified

2016-12-04 16:36 PST History

CC List

0 users

URL

Keywords

Depends on

Blocks