Bug 20180 - Matching :nth-child selectors is slow
Summary: Matching :nth-child selectors is slow
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: 528+ (Nightly build)
Hardware: Mac OS X 10.5
: P2 Normal
Assignee: David Smith
URL: http://webkit.org/perf/slickspeed
Keywords: Performance
Depends on:
Blocks:
 
Reported: 2008-07-26 12:35 PDT by David Smith
Modified: 2008-09-15 23:16 PDT (History)
2 users (show)

See Also:


Attachments
Exploratory patch (8.63 KB, patch)
2008-09-08 03:08 PDT, David Smith
no flags Details | Formatted Diff | Diff
Another unsatisfactory but working patch (783 bytes, patch)
2008-09-10 19:22 PDT, David Smith
no flags Details | Formatted Diff | Diff
Better patch (10.95 KB, patch)
2008-09-10 22:54 PDT, David Smith
no flags Details | Formatted Diff | Diff
Addresses a few pseudoreview comments (11.62 KB, patch)
2008-09-10 23:50 PDT, David Smith
darin: review+
Details | Formatted Diff | Diff
Cache an+b parsing (20.25 KB, patch)
2008-09-11 01:34 PDT, David Smith
no flags Details | Formatted Diff | Diff
Minor improvements (21.67 KB, patch)
2008-09-14 15:10 PDT, David Smith
sam: review+
Details | Formatted Diff | Diff
Enable an existing optimization that wasn't being triggered (1.54 KB, patch)
2008-09-15 01:06 PDT, David Smith
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description David Smith 2008-07-26 12:35:03 PDT
Firefox numbers (nightly build downloaded July 26th): 
div:nth-child(even)	25 ms | 106 found
div:nth-child(2n)	        26 ms | 106 found
div:nth-child(odd)	26 ms | 137 found
div:nth-child(2n+1)	26 ms | 137 found
div:nth-child(n)	        27 ms | 243 found

WebKit numbers (r35363 + patch to fix id optimization)
div:nth-child(even)	37 ms | 106 found
div:nth-child(2n)	        66 ms | 106 found
div:nth-child(odd)	38 ms | 137 found
div:nth-child(2n+1)	61 ms | 137 found
div:nth-child(n)	        54 ms | 243 found
Comment 1 David Smith 2008-09-08 03:08:41 PDT
Created attachment 23244 [details]
Exploratory patch

This patch caches the results of parsing the an+b expression in CSSSelector. The upside is obvious, the downside is that it adds two ints and a 1 bit flag (which should fit in the leftover bit from the enums) to CSSSelector. Please ignore lack of changelog, tabs, etc... this is just a "hey it's 1 AM and this sounds interesting" patch.

Suggestions on ways to avoid increasing the size of CSSSelector are welcome, although the comments indicate that it may be tricky/impossible :/

New numbers (same machine, but now based on r36246)

div:nth-child(even)	36 ms | 106 found
div:nth-child(2n)	36 ms | 106 found
div:nth-child(odd)	37 ms | 137 found
div:nth-child(2n+1)	36 ms | 137 found
div:nth-child(n)	39 ms | 243 found
Comment 2 David Smith 2008-09-10 19:22:26 PDT
Created attachment 23333 [details]
Another unsatisfactory but working patch

This is about a 10% speedup for :nth-child on slickspeed. Unfortunately, it's basically just manually inlining part of Node::renderStyle(). The correct solution is to make renderStyle() non-virtual and have it call a virtual function only in the uncommon case, but for reasons unknown that doesn't speed things up. Attempts to inline it (which presumably would speed things up as this does) have run into circular include issues.
Comment 3 David Smith 2008-09-10 22:54:59 PDT
Created attachment 23336 [details]
Better patch

Thanks to some suggestions from Maciej, this avoids having to do the silly manual inlining thing.
Comment 4 David Smith 2008-09-10 22:58:09 PDT
Oops, forgot to add this: http://dscoder.com/renderstylebenchmark.html is a tiny qsa benchmark that I've been using to check this.
Comment 5 David Smith 2008-09-10 23:50:32 PDT
Created attachment 23337 [details]
Addresses a few pseudoreview comments

Ordered includes, and made a blind but hopefully correct attempt to update the windows project file
Comment 6 David Smith 2008-09-11 01:34:13 PDT
Created attachment 23341 [details]
Cache an+b parsing

This avoids the size hit on CSSSelector by making an :nth-* specific subclass that handles the caching.
Comment 7 Darin Adler 2008-09-11 10:32:46 PDT
Comment on attachment 23337 [details]
Addresses a few pseudoreview comments

This is a great fix!

+    virtual RenderStyle* privateRenderStyle() const;

In other places I've called this sort of function virtualXXX, such as virtualFirstChild. But also, this function isn't really just a private renderStyle() function. It's specifically renderStyle() when m_renderer is 0. I'd suggest a name more like nonRendererRenderStyle().

I'm a little sad that we have to add a whole header just for a single function. Is there a better name for that header that could allow us to use it for more things in the future? It's not scalable to have a separate header for each function.

+    virtual RenderStyle* privateRenderStyle() const { return m_style; }
     String m_value;

I'd like to see a blank line here between the functions and data members.

It's not so great to have these virtual functions defined in the class definition. There's no reason to inline them.

It's a little strange to have this 10% speedup be a patch on a bug saying "Firefox is faster". I'd prefer to not have bugs where the titles are are literally about competitive comparisons, although I think it's great to compare us with other engines to figure out what we should work on. Does this actually make WebKit faster than some version of Gecko? Could we retitle the bug or something.

r=me, but please consider my suggestions
Comment 8 David Smith 2008-09-11 13:12:00 PDT
Bug title changed; I wasn't thinking in terms of it going on a changelog when I filed it ;)

privateRenderStyle() is definitely a lame name for it. mitzpettel and I have been tossing ideas back and forth on #webkit but hadn't come up with anything we liked. nonRendererRenderStyle() is... sorta awful, but at least it describes what it does.

The separate header is there to avoid a circular dependency problem. I'm really not sure how to improve that :/

I'll make the other changes before I commit. Sadly even the combination of these patches doesn't make us faster than Gecko :/ they're still 25-27ms to WebKit's 34-35.
Comment 9 David Smith 2008-09-11 14:56:37 PDT
Committed the renderStyle() inlining patch with the suggested changes in r36339
Comment 10 David Smith 2008-09-14 15:10:55 PDT
Created attachment 23417 [details]
Minor improvements

This gets rid of the copy-pasted headers from CSSSelector since this is a new file, and moves m_parsedNth to CSSSelector since it has a free bit.
Comment 11 David Smith 2008-09-15 01:06:53 PDT
Created attachment 23432 [details]
Enable an existing optimization that wasn't being triggered

Slightly more than a 2x speedup (testing with the other patches applied).
Comment 12 David Smith 2008-09-15 01:46:31 PDT
Comment on attachment 23432 [details]
Enable an existing optimization that wasn't being triggered

A better version of this was committed in http://trac.webkit.org/changeset/36432
Comment 13 Sam Weinig 2008-09-15 22:57:16 PDT
Comment on attachment 23417 [details]
Minor improvements

r=me.  You need to make sure to update all the build systems before landing.  Nice work!
Comment 14 David Smith 2008-09-15 23:16:10 PDT
Committed that change in http://trac.webkit.org/changeset/36485. That brings us to 15-17ms on all the relevant tests on my machine, significantly faster than Firefox :)