Bug 49876

Summary: Optimize matching of descendant selectors
Product: WebKit Reporter: Simon Fraser (smfr) <simon.fraser@apple.com>
Component: CSSAssignee: Antti Koivisto <koivisto@iki.fi>
Status: RESOLVED FIXED    
Severity: Normal CC: alex@spamcop.net, hyatt@apple.com, jamesr@chromium.org, koivisto@iki.fi, mihaip@chromium.org, peter@chromium.org, psolanki@apple.com, sam@webkit.org, simon.fraser@apple.com, tonikitoo@webkit.org, tonyg@chromium.org, webkit.review.bot@gmail.com
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: PC   
OS: Mac OS X 10.5   
Attachments:
Description Flags
patch hyatt: review+

Description From 2010-11-20 21:20:46 PST
Pages with selectors like:
.foo .bar *
can spend a lot of time in style resolution, because these types of selectors require walking back up the tree when matching every node.

A possible optimization here would be to keep track of which parts of the selectors have been matched while walking down the tree in recalcStyle.
------- Comment #1 From 2010-12-15 10:28:06 PST -------
<rdar://problem/8772822>
------- Comment #2 From 2011-02-03 16:07:02 PST -------
Created an attachment (id=81133) [details]
patch

During style recalculation, maintain a hash of tags, ids and classes seen in ancestor elements. Use the hash to quickly reject descendant and child selectors when doing style matching.

This speeds up style recalculations 3-6x on major web sites.
------- Comment #3 From 2011-02-03 16:32:51 PST -------
Does this have overhead when the page isn't using descendant selectors?  I don't want to penalize pages who write good selectors just to optimize bad ones.  What's the hit/miss rate like on this cache?
------- Comment #4 From 2011-02-03 16:56:10 PST -------
Third rule on google.com stylesheet: .gac_m td{line-height:17px}

Also check out our default stylesheets.
------- Comment #5 From 2011-02-03 16:57:37 PST -------
Do we have good selector test coverage for this this may break? I'm thinking of selectors like:
.foo div .foo .bar
where greediness may trip you up.
------- Comment #6 From 2011-02-03 16:58:29 PST -------
Dibs.  Will review later tonight.
------- Comment #7 From 2011-02-03 17:00:44 PST -------
(In reply to comment #5)
> Do we have good selector test coverage for this this may break? I'm thinking of selectors like:
> .foo div .foo .bar
> where greediness may trip you up.

Judging from the number of failure case it picked up, we have pretty good coverage. Modulo bugs, this shouldn't be tripped by anything.
------- Comment #8 From 2011-02-03 17:01:06 PST -------
(In reply to comment #4)
> Third rule on google.com stylesheet: .gac_m td{line-height:17px}
> 
> Also check out our default stylesheets.

I know descendant selectors are common, but that doesn't really answer my question.  Consider the size of the DOM for google.com - it's tiny.  Pages with significantly larger DOMs also tend to more aggressively avoid bad selectors (assuming the authors are savvy), and we definitely don't want to penalize those pages.  On google.com the overhead from this is probably pretty tiny.
------- Comment #9 From 2011-02-03 17:03:19 PST -------
(From update of attachment 81133 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=81133&action=review

> Source/WebCore/css/CSSStyleSelector.cpp:364
> +    void collectIdentifiersInDescendantSelectors();

collectDescendantSelectorIdentifiers()?

> Source/WebCore/css/CSSStyleSelector.cpp:370
> +    static const unsigned maximumIdentifierCount = 4;
> +    const unsigned* identifiersInDescendantSelectors() const { return m_identifiersInDescendantSelectors; }

How do you know that '4' is a good number? Maybe use a Vector would be clearer, and a typedef for the unsigned (which I guess is really a hash) would be even clearer.

> Source/WebCore/css/CSSStyleSelector.cpp:652
> +static inline void addContextIdentifiersToVector(Element* element, Vector<unsigned, 4>& contextIdentifiers)

Do you really need the , 4 here?

> Source/WebCore/css/CSSStyleSelector.cpp:654
> +    // Mix tags, class names and ids into sort of tasty bouillabaisse.

This comment, though amusing, doesn't really help me to understand the code.

> Source/WebCore/css/CSSStyleSelector.cpp:671
> +    // There are all kinds of vacky special cases where the style recalc may temporarily branch to some random elements.

s/vacky/wacky

> Source/WebCore/css/CSSStyleSelector.cpp:672
> +    // We could fix up the stack but that seems like a hassle.

Maybe use a FIXME: make this work in more unusual cases (e.g. when....)

> Source/WebCore/css/CSSStyleSelector.cpp:680
> +    m_parentStack.append(ParentStackFrame(parent));

I'd like a blank line above this one.

> Source/WebCore/css/CSSStyleSelector.cpp:683
> +    size_t size = contextFrame.identifiers.size();

I'd prefer count to size.

> Source/WebCore/dom/Element.cpp:81
> +    ~StyleSelectorParentPusher() { if (m_didPush) m_styleSelector->popParent(m_parent); }

Please unwrap this.
------- Comment #10 From 2011-02-03 17:07:49 PST -------
(In reply to comment #8)
> I know descendant selectors are common, but that doesn't really answer my question.  Consider the size of the DOM for google.com - it's tiny.  Pages with significantly larger DOMs also tend to more aggressively avoid bad selectors (assuming the authors are savvy), and we definitely don't want to penalize those pages.  On google.com the overhead from this is probably pretty tiny.

Anecdotally, on Google Reader we knew that descendant selectors were bad and we had tooling to calculate their approximate cost and warn if they were out of hand, but we never removed them entirely (i.e. the only thing we outright forbade was *). I believe Gmail and Spreadsheets are the same. So I don't think it's safe to assume that savvy web developers will not have descendant selectors.
------- Comment #11 From 2011-02-03 17:17:29 PST -------
> > Source/WebCore/css/CSSStyleSelector.cpp:370
> > +    static const unsigned maximumIdentifierCount = 4;
> > +    const unsigned* identifiersInDescendantSelectors() const { return m_identifiersInDescendantSelectors; }
> 
> How do you know that '4' is a good number? Maybe use a Vector would be clearer, and a typedef for the unsigned (which I guess is really a hash) would be even clearer.

It was based on looking through a number of stylesheets on popular web sites. 2 or 3 might still give good enough coverage but this can be adjusted later. Much more (or a Vector!) is a bad idea as these data structures are significant portion of our memory usage. 

We consistently use "unsigned" for hash through the codebase.

> > Source/WebCore/css/CSSStyleSelector.cpp:652
> > +static inline void addContextIdentifiersToVector(Element* element, Vector<unsigned, 4>& contextIdentifiers)
> 
> Do you really need the , 4 here?

That avoids most of the unnecessary mallocs. 4 is enough to fit the tag name and some classes. This is not an important structure in terms of for memory usage, it could be more too.

> This comment, though amusing, doesn't really help me to understand the code.

It is my contribution to the recent important webkit-dev discussions.
------- Comment #12 From 2011-02-03 17:22:29 PST -------
(In reply to comment #11)
> > > Source/WebCore/css/CSSStyleSelector.cpp:370
> > > +    static const unsigned maximumIdentifierCount = 4;
> > > +    const unsigned* identifiersInDescendantSelectors() const { return m_identifiersInDescendantSelectors; }
> > 
> > How do you know that '4' is a good number? Maybe use a Vector would be clearer, and a typedef for the unsigned (which I guess is really a hash) would be even clearer.
> 
> It was based on looking through a number of stylesheets on popular web sites. 2 or 3 might still give good enough coverage but this can be adjusted later. Much more (or a Vector!) is a bad idea as these data structures are significant portion of our memory usage. 

A Vector of fixed capacity shouldn't have any additional overhead.

> We consistently use "unsigned" for hash through the codebase.
> 
> > > Source/WebCore/css/CSSStyleSelector.cpp:652
> > > +static inline void addContextIdentifiersToVector(Element* element, Vector<unsigned, 4>& contextIdentifiers)
> > 
> > Do you really need the , 4 here?
> 
> That avoids most of the unnecessary mallocs. 4 is enough to fit the tag name and some classes. This is not an important structure in terms of for memory usage, it could be more too.

But here you're just passing a reference to a Vector.
------- Comment #13 From 2011-02-03 17:25:40 PST -------
> A Vector of fixed capacity shouldn't have any additional overhead.

size_t m_size; <-----
Buffer m_buffer;

> But here you're just passing a reference to a Vector.

I prefer my code to compile.
------- Comment #14 From 2011-02-03 18:03:15 PST -------
(In reply to comment #3)
> What's the hit/miss rate like on this cache?

Using this simple dtrace script with a debug build

sudo dtrace -n 'pid$target:WebCore:WebCore??CSSStyleSelector??fastRejectSelector*:return { @[arg1] = count(); }' -p <pid>

I see

nytimes.com
                0            91425
                1           533836

google.com (not logged in)
                0             8589
                1             2447

reader.google.com (logged into my account)
                0            12810
                1            31293
engadget.com
                0            15017
                1           326230

csszengarden.com
                0              412
                1              279
------- Comment #15 From 2011-02-03 20:34:26 PST -------
Attachment 81133 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1

Source/WebCore/dom/Element.cpp:81:  More than one command on the same line in if  [whitespace/parens] [4]
Source/WebCore/css/CSSStyleSelector.h:194:  The parameter name "ruleData" adds no information, so it should be removed.  [readability/parameter_name] [5]
Total errors found: 2 in 4 files


If any of these errors are false positives, please file a bug against check-webkit-style.
------- Comment #16 From 2011-02-04 09:38:40 PST -------
(From update of attachment 81133 [details])
This looks great to me.  Address Simon's comments/feedback and r=me.
------- Comment #17 From 2011-02-05 03:25:21 PST -------
http://trac.webkit.org/changeset/77740

(with Simon's comments implemented and some additional cleanups)
------- Comment #18 From 2011-02-05 08:54:33 PST -------
(In reply to comment #10)
> Anecdotally, on Google Reader we knew that descendant selectors were bad

Note that the reason descendant selectors are considered "bad" is that browser implementations have sucked. There is nothing inherently wrong with them.