Bug 74004

Summary: Web Inspector: Implement CSS selector profiler
Product: WebKit Reporter: Alexander Pavlov (apavlov) <apavlov>
Component: Web Inspector (Deprecated)Assignee: Alexander Pavlov (apavlov) <apavlov>
Status: RESOLVED FIXED    
Severity: Normal CC: apavlov, bweinstein, darin, hyatt, joepeck, kangax, keishi, kling, koivisto, komoroske, loislo, macpherson, mathias, paulirish, peter, pfeldman, pmuellr, rik, rniwa, timothy, webkit.review.bot, yurys
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on: 74269, 74391, 74603, 75228    
Bug Blocks:    
Attachments:
Description Flags
Patch
none
[IMAGE] Screenshot of a CSS selector profile
none
[PATCH] Fix Mac compilability
none
[PATCH] Instrumented applyMatchedDeclaration(), added PoC selectorText caching
none
Patch
none
Patch
none
Patch pfeldman: review+

Description Alexander Pavlov (apavlov) 2011-12-07 09:12:28 PST
Patch to follow
Comment 1 Alexander Pavlov (apavlov) 2011-12-07 09:30:31 PST
Created attachment 118218 [details]
Patch
Comment 2 Alexander Pavlov (apavlov) 2011-12-07 09:38:13 PST
Created attachment 118220 [details]
[IMAGE] Screenshot of a CSS selector profile

- Percentages or absolute values can be displayed
- The profile can be sorted and searched by all columns, like CPU profiles (see "< 10 matches" in the Search field)
- The profile name reflects the aggregate time of all selector matching during profiling
- The "Start/stop CSS profiling" button is far from perfect, I know :)

Please speak up if you have any questions, comments, concerns!..
Comment 3 Alexander Pavlov (apavlov) 2011-12-07 09:45:40 PST
Created attachment 118221 [details]
[PATCH] Fix Mac compilability
Comment 4 Antti Koivisto 2011-12-07 10:07:53 PST
Comment on attachment 118221 [details]
[PATCH] Fix Mac compilability

View in context: https://bugs.webkit.org/attachment.cgi?id=118221&action=review

I don't quite understand what this patch is measuring. Could you explain what the measured times exactly mean?

> Source/WebCore/css/CSSStyleSelector.cpp:712
>      unsigned size = rules->size();
>      for (unsigned i = 0; i < size; ++i) {
>          const RuleData& ruleData = rules->at(i);
> -        if (canUseFastReject && m_checker.fastRejectSelector<RuleData::maximumIdentifierCount>(ruleData.descendantSelectorIdentifierHashes()))
> +        CSSStyleRule* rule = ruleData.rule();
> +        InspectorInstrumentationCookie cookie = InspectorInstrumentation::willMatchRule(document(), rule);
> +        if (canUseFastReject && m_checker.fastRejectSelector<RuleData::maximumIdentifierCount>(ruleData.descendantSelectorIdentifierHashes())) {
> +            InspectorInstrumentation::didMatchRule(cookie, false);
>              continue;
> +        }

This loop is super hot and the change will likely regress performance. You would proabably need a templated version of the function so you won't affect the normal case at all.
Comment 5 Pavel Feldman 2011-12-07 10:16:01 PST
Comment on attachment 118221 [details]
[PATCH] Fix Mac compilability

Before we introduce a new capability / perspective / functionality, it would be great to figure out what it measures and whether it is useful to the end users. I can see selector matching taking single digit milliseconds. I know from the timeline panel that layout takes hundres of milliseconds or even seconds.

I don't think we should expose this type of profiling, but should rather figure out how to help developers to figure out why layout is so slow. I am putting r- here unless we figure out whether we need this type of profile.
Comment 6 Timothy Hatcher 2011-12-07 13:57:37 PST
FWIW, Opera is adding this to Dragonfly.

http://my.opera.com/dragonfly/blog/style-profiler-preview
Comment 7 Luke Macpherson 2011-12-07 14:35:50 PST
@Pavel: In all the webkit profiling I've done, CSS selector matching is the hottest code in the browser for page load. This profiling information looks like it would help web authors improve their CSS rules in ways that would significantly improve page load times.

Before I started working on webkit I did a lot of work on Google Wave's CSS - at that time having access to a tool like this would have made a huge difference to us. Back then I knew the general principles of what the browser was doing and roughly which selectors to optimize, but a tool like this would make that process much simpler and accessible to many more web developers.
Comment 8 Pavel Feldman 2011-12-07 15:19:19 PST
(In reply to comment #6)
> FWIW, Opera is adding this to Dragonfly.
> 
> http://my.opera.com/dragonfly/blog/style-profiler-preview

That's why I asked Alexander to look into it originally.

(In reply to comment #7)
> @Pavel: In all the webkit profiling I've done, CSS selector matching is the hottest code in the browser for page load. This profiling information looks like it would help web authors improve their CSS rules in ways that would significantly improve page load times.
> 
> Before I started working on webkit I did a lot of work on Google Wave's CSS - at that time having access to a tool like this would have made a huge difference to us. Back then I knew the general principles of what the browser was doing and roughly which selectors to optimize, but a tool like this would make that process much simpler and accessible to many more web developers.

I am all for the useful tools, but as long as I see matching styles taking 10ms followed by a layout that takes 500ms I can't think of it as of useful tool. Either this profiler is broken or it exposes neglectable data. Could you suggest Alexander a site or use case that would end up with the numbers that are worth optimizing?
Comment 9 Antti Koivisto 2011-12-07 15:27:21 PST
(In reply to comment #6)
> FWIW, Opera is adding this to Dragonfly.
> 
> http://my.opera.com/dragonfly/blog/style-profiler-preview

Based on that blog post the feature is mostly useful because some basic selector types in their engine are not particularly fast and need workarounds.
Comment 10 Luke Macpherson 2011-12-07 16:16:41 PST
(In reply to comment #8)
> I am all for the useful tools, but as long as I see matching styles taking 10ms followed by a layout that takes 500ms I can't think of it as of useful tool. Either this profiler is broken or it exposes neglectable data. Could you suggest Alexander a site or use case that would end up with the numbers that are worth optimizing?

Where are you getting those numbers? It sounds like you're comparing matching a single rule to laying out the entire document. Even the simple page in the screenshot is showing over 100 ms, and I can see from the number of rules that it's not a particularly complex case.

I would suggest something like http://www.w3.org/TR/2011/WD-html5-20110405/Overview.html is a good large page with simple layout for testing.

Keep in mind that this kind of profiling is particularly useful even without even considering how many ms are spent. When optimizing it is very common to look at a selector and wonder how expensive it is - for example many developers just don't realize that doing ".foo .bar div" is particularly expensive, because they think about and read selectors left-to-right. They don't consider that this has the potential to match this every div in the document. Just showing them that these are the expensive selectors on the page is very useful. And these are costs that you can't really optimize out of the browser - it is trivial in CSS to generate selectors that are very expensive to handle in all browsers.
Comment 11 Luke Macpherson 2011-12-07 19:27:11 PST
(In reply to comment #10)
> (In reply to comment #8)
> > I am all for the useful tools, but as long as I see matching styles taking 10ms followed by a layout that takes 500ms I can't think of it as of useful tool. Either this profiler is broken or it exposes neglectable data. Could you suggest Alexander a site or use case that would end up with the numbers that are worth optimizing?
> 
> Where are you getting those numbers? It sounds like you're comparing matching a single rule to laying out the entire document. Even the simple page in the screenshot is showing over 100 ms, and I can see from the number of rules that it's not a particularly complex case.
> 
> I would suggest something like http://www.w3.org/TR/2011/WD-html5-20110405/Overview.html is a good large page with simple layout for testing.
> 
> Keep in mind that this kind of profiling is particularly useful even without even considering how many ms are spent. When optimizing it is very common to look at a selector and wonder how expensive it is - for example many developers just don't realize that doing ".foo .bar div" is particularly expensive, because they think about and read selectors left-to-right. They don't consider that this has the potential to match this every div in the document. Just showing them that these are the expensive selectors on the page is very useful. And these are costs that you can't really optimize out of the browser - it is trivial in CSS to generate selectors that are very expensive to handle in all browsers.

Apologies, the link should have been http://www.whatwg.org/specs/web-apps/current-work/
Comment 12 Antti Koivisto 2011-12-08 04:15:44 PST
(In reply to comment #10)
> I would suggest something like http://www.w3.org/TR/2011/WD-html5-20110405/Overview.html is a good large page with simple layout for testing.

The stylesheet of the HTML spec (whatwg version ) is fairly simple. On my MBP the current ToT WebKit spends ~550ms in selector matching (~5% of total CPU time) over a page load. This time is significant because the document has very large number of elements to style, not because any particularly slow features used on the stylesheet (only ~120ms is spent on the slow path for the most complex selectors). 

In others words, trying to optimize that page with tool like this would be waste of time.

>When optimizing it is very common to look at a selector and wonder how expensive it is - for example many developers just don't realize that doing ".foo .bar div" is particularly expensive, because they think about and read selectors left-to-right. 

This is old information. WebKit has rather efficient implementation of simple descendant selectors like that (ever since http://trac.webkit.org/changeset/77740).

Generally if we find that some selector types are causing real world performance problems we are going to optimize them.
Comment 13 Luke Macpherson 2011-12-08 14:23:58 PST
(In reply to comment #12)
> This is old information. WebKit has rather efficient implementation of simple descendant selectors like that (ever since http://trac.webkit.org/changeset/77740).
> 
> Generally if we find that some selector types are causing real world performance problems we are going to optimize them.

Are you claiming that it is no longer possible to create pathological selectors in CSS? Are you claiming that it is not possible for developers to improve the performance of their web pages by optimizing their CSS? Should the thousands of web devs out there that make a living doing this just give up?

Whether we have good or bad implementations of selector matching is a bad argument against providing information to developers. We can and should keep optimizing the browser, but we should also help developers make their content better by making browser behavior less of a black box.
Comment 14 komoroske 2011-12-08 14:57:15 PST
(In reply to comment #13)
> (In reply to comment #12)
> > This is old information. WebKit has rather efficient implementation of simple descendant selectors like that (ever since http://trac.webkit.org/changeset/77740).
> > 
> > Generally if we find that some selector types are causing real world performance problems we are going to optimize them.
> 
> Are you claiming that it is no longer possible to create pathological selectors in CSS? Are you claiming that it is not possible for developers to improve the performance of their web pages by optimizing their CSS? Should the thousands of web devs out there that make a living doing this just give up?
> 
> Whether we have good or bad implementations of selector matching is a bad argument against providing information to developers. We can and should keep optimizing the browser, but we should also help developers make their content better by making browser behavior less of a black box.

I wanted to support Luke's position with an argument that more data is better based on a specific example.

The Google Spreadsheets engineering team is very skilled and they've built one of the most advanced web apps in common usage.  At various times in its development they've done deep investigations into the performance of various implementation strategies. It's a significant investment because it's hard to do correctly.

The result is that they come away with very good understanding of the performance characteristics of the web platform _at the moment they conduct the investigation_. Over time, however, as the web platform evolves underneath them what once was knowledge slowly becomes "lore" and finally ends up as "superstitions."  Because it takes so much investment to do deep investigate performance characteristics, they rarely do it without some active need for more information.  We uncovered and dispelled a number of those superstitions when some WebKit folks sat down with them to brainstorm how to improve scrolling performance recently.

These superstitions, in my experience, are even more prevalent among the general population of web developers. Making the performance characteristics easier to understand would make web developers more likely to adapt to the changing conditions of the web platform instead of relying on old superstitions.
Comment 15 Ryosuke Niwa 2011-12-08 14:59:20 PST
(In reply to comment #14)
> I wanted to support Luke's position with an argument that more data is better based on a specific example.

There's always a trade-off between exposing useful information vs. exposing too much and letting developers over-fit.

> The result is that they come away with very good understanding of the performance characteristics of the web platform _at the moment they conduct the investigation_. Over time, however, as the web platform evolves underneath them what once was knowledge slowly becomes "lore" and finally ends up as "superstitions."  Because it takes so much investment to do deep investigate performance characteristics, they rarely do it without some active need for more information.  We uncovered and dispelled a number of those superstitions when some WebKit folks sat down with them to brainstorm how to improve scrolling performance recently.

This sounds like a really strong argument for us to add this feature.

> These superstitions, in my experience, are even more prevalent among the general population of web developers. Making the performance characteristics easier to understand would make web developers more likely to adapt to the changing conditions of the web platform instead of relying on old superstitions.

This sounds like a good goal in general. We want to avoid developers optimizing for old legacy behaviors.
Comment 16 Antti Koivisto 2011-12-08 15:36:30 PST
(In reply to comment #13)
> Are you claiming that it is no longer possible to create pathological selectors in CSS? Are you claiming that it is not possible for developers to improve the performance of their web pages by optimizing their CSS? Should the thousands of web devs out there that make a living doing this just give up?

No, I didn't say anything like that. I simply pointed out that the examples you cited would not have directly benefited from this kind of profiling with the current WebKit.

However....

(In reply to comment #14)
> These superstitions, in my experience, are even more prevalent among the general population of web developers. Making the performance characteristics easier to understand would make web developers more likely to adapt to the changing conditions of the web platform instead of relying on old superstitions.

This seems like a reasonable argument (considering that such a superstition was even cited in this thread with some conviction).

I'm still interested in hearing what exactly we should be measuring. Or maybe it doesn't really matter that much as something very rough would be sufficient for education purposes?
Comment 17 Alexander Pavlov (apavlov) 2011-12-09 05:31:41 PST
(In reply to comment #4)
> (From update of attachment 118221 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=118221&action=review
> 
> I don't quite understand what this patch is measuring. Could you explain what the measured times exactly mean?

For each selector the "Total" time is the time spent by WebCore in the CSSStyleSelector::matchRulesForList() loop for this particular selector (multiple runs for the same selector are aggregated). I did not find any other code fragments that would 

"Hits" is the number of runs of the said loop for the particular selector (i.e. a match was attempted against DOM elements).

"Matches" is the number of times the particular selector was really applied to DOM elements, their style declarations taking effect on the element's computed style.

> > Source/WebCore/css/CSSStyleSelector.cpp:712
> >      unsigned size = rules->size();
> >      for (unsigned i = 0; i < size; ++i) {
> >          const RuleData& ruleData = rules->at(i);
> > -        if (canUseFastReject &&
...
> >              continue;
> > +        }
> 
> This loop is super hot and the change will likely regress performance. You would proabably need a templated version of the function so you won't affect the normal case at all.

OK, let's be number-driven :)

I ran 3 experiments: for the ToT, Profiler OFF, and Profiler ON environments. Experiment: seven 5-iteration page cycler runs in Chromium Linux (Release build), steps described in the Quickstart section of http://www.chromium.org/developers/testing/page-cyclers. The greatest and smallest values were not considered (thus, 5 effective results in each experiment), and we suppose a normal distribution here.

Here are the figures (each experiment totaled roughly 14900 selector hits using the definition above):

ToT: 1165.2 1166.4 1161 1162 1163.8
Average = 1163.68ms
StDev = 2.22

Profiler OFF: 1164.4 1167 1172.2 1165.8 1165.4
Average = 1166.96ms
StDev = 3.07

Profiler ON: 1177.4 1171.4 1172.4 1173.8 1173.4
Average = 1173.68ms
StDev = 2.28

The ToT and Profiler OFF averages are only 3.28, which is slightly above one sigma, so it looks like the difference is statistically insignificant.

Feel free to check the results and speak up if I did something wrong (it's been a while since I last did something statistics-related...)
Comment 18 Alexander Pavlov (apavlov) 2011-12-09 05:32:32 PST
> The ToT and Profiler OFF averages are only 3.28, which is slightly above one sigma, so it looks like the difference is statistically insignificant.

...meaning "The ToT and Profiler OFF averages are only 3.28 APART"
Comment 19 Antti Koivisto 2011-12-09 05:45:45 PST
Please also try PerformanceTests/Parser/html5-full-render.html.

For full analysis, please use a sampling profiler (like Instruments) over page load of something sufficiently complex (the full html5) and look at the sample counts in the function.

In light of the discussion above I don't think you want to count the ancestor identifier filter rejections (fastRejectSelector) anyway as those are very fast and don't have much impact to the overall performance picture. You probably only care about selectors that pass that test.
Comment 20 Alexander Pavlov (apavlov) 2011-12-09 09:11:10 PST
Created attachment 118586 [details]
[PATCH] Instrumented applyMatchedDeclaration(), added PoC selectorText caching

Perhaps the m_hasCachedSelectorText flag should be moved up into CSSRule (and take 1 bit from m_sourceLine?)
Comment 21 Andreas Kling 2011-12-09 09:16:57 PST
(In reply to comment #20)
> Created an attachment (id=118586) [details]
> [PATCH] Instrumented applyMatchedDeclaration(), added PoC selectorText caching
> 
> Perhaps the m_hasCachedSelectorText flag should be moved up into CSSRule (and take 1 bit from m_sourceLine?)

Definitely!
Comment 22 Pavel Feldman 2011-12-09 09:24:10 PST
Could you split this change into the backend and front-end changes to ease the review? I'm sure Antti will be more comfortable reviewing the changes to the CSS and instrumentation there, while I'll take care of the general inspector plumbings and the front-end parts.
Comment 23 Ryosuke Niwa 2011-12-09 09:46:43 PST
(In reply to comment #17)
> ToT: 1165.2 1166.4 1161 1162 1163.8
> Average = 1163.68ms
> StDev = 2.22
> 
> Profiler OFF: 1164.4 1167 1172.2 1165.8 1165.4
> Average = 1166.96ms
> StDev = 3.07
> 
> Profiler ON: 1177.4 1171.4 1172.4 1173.8 1173.4
> Average = 1173.68ms
> StDev = 2.28

The sample size of 5 is way too small for this stat. to be meaningful. For a sample size of this magnitude, standard deviation is not so useful either.
Comment 24 Antti Koivisto 2011-12-09 09:57:44 PST
(In reply to comment #21)
> (In reply to comment #20)
> > Created an attachment (id=118586) [details] [details]
> > [PATCH] Instrumented applyMatchedDeclaration(), added PoC selectorText caching
> > 
> > Perhaps the m_hasCachedSelectorText flag should be moved up into CSSRule (and take 1 bit from m_sourceLine?)
> 
> Definitely!

It would be better to implement the cache under a separate bug.
Comment 25 Pavel Feldman 2011-12-11 09:43:48 PST
Comment on attachment 118586 [details]
[PATCH] Instrumented applyMatchedDeclaration(), added PoC selectorText caching

Wrong patch? I can still see instrumentations, caching, changes to the inspector backend and the front-end in the same patch. Should be 4 separate bugs / patches.
Comment 26 Alexander Pavlov (apavlov) 2011-12-26 03:44:23 PST
Created attachment 120545 [details]
Patch
Comment 27 Pavel Feldman 2011-12-26 05:10:53 PST
Comment on attachment 120545 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=120545&action=review

> Source/WebCore/inspector/front-end/CSSSelectorProfileView.js:206
> +    performSearch: function(query, finishedCallback)

I think we should put generic search functionality into DataGrid.
Comment 28 Alexander Pavlov (apavlov) 2011-12-28 09:40:17 PST
Created attachment 120682 [details]
Patch
Comment 29 Alexander Pavlov (apavlov) 2011-12-29 03:57:17 PST
Created attachment 120730 [details]
Patch
Comment 30 Pavel Feldman 2011-12-29 08:26:07 PST
Comment on attachment 120730 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=120730&action=review

> Source/WebCore/inspector/front-end/CSSSelectorProfileView.js:2
> + * Copyright (C) 2008 Apple Inc. All Rights Reserved.

Isn't it a new code?

> Source/WebCore/inspector/front-end/CSSSelectorProfileView.js:49
> +

Extra blank lines.

> Source/WebCore/inspector/front-end/CSSSelectorProfileView.js:137
> +    refreshVisibleData: function()

Function naming is misleading here: refresh actually rebuilds the content, while refreshVisibleData is calling refresh on the data members.

> Source/WebCore/inspector/front-end/CSSSelectorProfileView.js:280
> +    setRecordingProfile: function(isProfiling)

What is this method used for?
Comment 31 Alexander Pavlov (apavlov) 2011-12-29 08:35:15 PST
(In reply to comment #30)
> (From update of attachment 120730 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=120730&action=review
> 
> > Source/WebCore/inspector/front-end/CSSSelectorProfileView.js:2
> > + * Copyright (C) 2008 Apple Inc. All Rights Reserved.
> 
> Isn't it a new code?

Removing this line.

> > Source/WebCore/inspector/front-end/CSSSelectorProfileView.js:49
> > +
> 
> Extra blank lines.

Fixed.

> > Source/WebCore/inspector/front-end/CSSSelectorProfileView.js:137
> > +    refreshVisibleData: function()
> 
> Function naming is misleading here: refresh actually rebuilds the content, while refreshVisibleData is calling refresh on the data members.

Renaming refresh -> rebuildGridItems, refreshVisibleData -> refreshData

> > Source/WebCore/inspector/front-end/CSSSelectorProfileView.js:280
> > +    setRecordingProfile: function(isProfiling)
> 
> What is this method used for?

It's a callback invoked from ProfilesPanel.
Comment 32 Alexander Pavlov (apavlov) 2011-12-29 09:03:08 PST
Committed r103803: <http://trac.webkit.org/changeset/103803>