Bug 74004

Summary:

Web Inspector: Implement CSS selector profiler

Product:

WebKit

Reporter:

Alexander Pavlov (apavlov) <apavlov>

Component:

Web Inspector (Deprecated)

Assignee:

Alexander Pavlov (apavlov) <apavlov>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

apavlov, bweinstein, darin, hyatt, joepeck, kangax, keishi, kling, koivisto, komoroske, loislo, macpherson, mathias, paulirish, peter, pfeldman, pmuellr, rik, rniwa, timothy, webkit.review.bot, yurys

Priority:

Version:

528+ (Nightly build)

Hardware:

All

OS:

All

Bug Depends on:

74269, 74391, 74603, 75228

Bug Blocks:

Attachments:

Description	Flags
Patch	none
[IMAGE] Screenshot of a CSS selector profile	none
[PATCH] Fix Mac compilability	none
[PATCH] Instrumented applyMatchedDeclaration(), added PoC selectorText caching	none
Patch	none
Patch	none
Patch	pfeldman: review+

Alexander Pavlov (apavlov)

Reported 2011-12-07 09:12:28 PST

Patch to follow

Attachments
Patch (76.57 KB, patch) 2011-12-07 09:30 PST, Alexander Pavlov (apavlov)	no flags	Details Formatted Diff Diff
[IMAGE] Screenshot of a CSS selector profile (91.37 KB, image/png) 2011-12-07 09:38 PST, Alexander Pavlov (apavlov)	no flags	Details
[PATCH] Fix Mac compilability (76.46 KB, patch) 2011-12-07 09:45 PST, Alexander Pavlov (apavlov)	no flags	Details Formatted Diff Diff
[PATCH] Instrumented applyMatchedDeclaration(), added PoC selectorText caching (84.83 KB, patch) 2011-12-09 09:11 PST, Alexander Pavlov (apavlov)	no flags	Details Formatted Diff Diff
Patch (47.12 KB, patch) 2011-12-26 03:44 PST, Alexander Pavlov (apavlov)	no flags	Details Formatted Diff Diff
Patch (28.26 KB, patch) 2011-12-28 09:40 PST, Alexander Pavlov (apavlov)	no flags	Details Formatted Diff Diff
Patch (27.67 KB, patch) 2011-12-29 03:57 PST, Alexander Pavlov (apavlov)	pfeldman: review+	Details Formatted Diff Diff
Show Obsolete (5) View All Add attachment proposed patch, testcase, etc.

Alexander Pavlov (apavlov)

Comment 1 2011-12-07 09:30:31 PST

Created attachment 118218 [details] Patch

Alexander Pavlov (apavlov)

Comment 2 2011-12-07 09:38:13 PST

Created attachment 118220 [details] [IMAGE] Screenshot of a CSS selector profile - Percentages or absolute values can be displayed - The profile can be sorted and searched by all columns, like CPU profiles (see "< 10 matches" in the Search field) - The profile name reflects the aggregate time of all selector matching during profiling - The "Start/stop CSS profiling" button is far from perfect, I know :) Please speak up if you have any questions, comments, concerns!..

Alexander Pavlov (apavlov)

Comment 3 2011-12-07 09:45:40 PST

Created attachment 118221 [details] [PATCH] Fix Mac compilability

Antti Koivisto

Comment 4 2011-12-07 10:07:53 PST

Comment on attachment 118221 [details] [PATCH] Fix Mac compilability View in context: https://bugs.webkit.org/attachment.cgi?id=118221&action=review I don't quite understand what this patch is measuring. Could you explain what the measured times exactly mean? > Source/WebCore/css/CSSStyleSelector.cpp:712 > unsigned size = rules->size(); > for (unsigned i = 0; i < size; ++i) { > const RuleData& ruleData = rules->at(i); > - if (canUseFastReject && m_checker.fastRejectSelector<RuleData::maximumIdentifierCount>(ruleData.descendantSelectorIdentifierHashes())) > + CSSStyleRule* rule = ruleData.rule(); > + InspectorInstrumentationCookie cookie = InspectorInstrumentation::willMatchRule(document(), rule); > + if (canUseFastReject && m_checker.fastRejectSelector<RuleData::maximumIdentifierCount>(ruleData.descendantSelectorIdentifierHashes())) { > + InspectorInstrumentation::didMatchRule(cookie, false); > continue; > + } This loop is super hot and the change will likely regress performance. You would proabably need a templated version of the function so you won't affect the normal case at all.

Pavel Feldman

Comment 5 2011-12-07 10:16:01 PST

Comment on attachment 118221 [details] [PATCH] Fix Mac compilability Before we introduce a new capability / perspective / functionality, it would be great to figure out what it measures and whether it is useful to the end users. I can see selector matching taking single digit milliseconds. I know from the timeline panel that layout takes hundres of milliseconds or even seconds. I don't think we should expose this type of profiling, but should rather figure out how to help developers to figure out why layout is so slow. I am putting r- here unless we figure out whether we need this type of profile.

Timothy Hatcher

Comment 6 2011-12-07 13:57:37 PST

FWIW, Opera is adding this to Dragonfly. http://my.opera.com/dragonfly/blog/style-profiler-preview

Luke Macpherson

Comment 7 2011-12-07 14:35:50 PST

@Pavel: In all the webkit profiling I've done, CSS selector matching is the hottest code in the browser for page load. This profiling information looks like it would help web authors improve their CSS rules in ways that would significantly improve page load times. Before I started working on webkit I did a lot of work on Google Wave's CSS - at that time having access to a tool like this would have made a huge difference to us. Back then I knew the general principles of what the browser was doing and roughly which selectors to optimize, but a tool like this would make that process much simpler and accessible to many more web developers.

Pavel Feldman

Comment 8 2011-12-07 15:19:19 PST

(In reply to comment #6) > FWIW, Opera is adding this to Dragonfly. > > http://my.opera.com/dragonfly/blog/style-profiler-preview That's why I asked Alexander to look into it originally. (In reply to comment #7) > @Pavel: In all the webkit profiling I've done, CSS selector matching is the hottest code in the browser for page load. This profiling information looks like it would help web authors improve their CSS rules in ways that would significantly improve page load times. > > Before I started working on webkit I did a lot of work on Google Wave's CSS - at that time having access to a tool like this would have made a huge difference to us. Back then I knew the general principles of what the browser was doing and roughly which selectors to optimize, but a tool like this would make that process much simpler and accessible to many more web developers. I am all for the useful tools, but as long as I see matching styles taking 10ms followed by a layout that takes 500ms I can't think of it as of useful tool. Either this profiler is broken or it exposes neglectable data. Could you suggest Alexander a site or use case that would end up with the numbers that are worth optimizing?

Antti Koivisto

Comment 9 2011-12-07 15:27:21 PST

(In reply to comment #6) > FWIW, Opera is adding this to Dragonfly. > > http://my.opera.com/dragonfly/blog/style-profiler-preview Based on that blog post the feature is mostly useful because some basic selector types in their engine are not particularly fast and need workarounds.

Luke Macpherson

Comment 10 2011-12-07 16:16:41 PST

(In reply to comment #8) > I am all for the useful tools, but as long as I see matching styles taking 10ms followed by a layout that takes 500ms I can't think of it as of useful tool. Either this profiler is broken or it exposes neglectable data. Could you suggest Alexander a site or use case that would end up with the numbers that are worth optimizing? Where are you getting those numbers? It sounds like you're comparing matching a single rule to laying out the entire document. Even the simple page in the screenshot is showing over 100 ms, and I can see from the number of rules that it's not a particularly complex case. I would suggest something like http://www.w3.org/TR/2011/WD-html5-20110405/Overview.html is a good large page with simple layout for testing. Keep in mind that this kind of profiling is particularly useful even without even considering how many ms are spent. When optimizing it is very common to look at a selector and wonder how expensive it is - for example many developers just don't realize that doing ".foo .bar div" is particularly expensive, because they think about and read selectors left-to-right. They don't consider that this has the potential to match this every div in the document. Just showing them that these are the expensive selectors on the page is very useful. And these are costs that you can't really optimize out of the browser - it is trivial in CSS to generate selectors that are very expensive to handle in all browsers.

Luke Macpherson

Comment 11 2011-12-07 19:27:11 PST

(In reply to comment #10) > (In reply to comment #8) > > I am all for the useful tools, but as long as I see matching styles taking 10ms followed by a layout that takes 500ms I can't think of it as of useful tool. Either this profiler is broken or it exposes neglectable data. Could you suggest Alexander a site or use case that would end up with the numbers that are worth optimizing? > > Where are you getting those numbers? It sounds like you're comparing matching a single rule to laying out the entire document. Even the simple page in the screenshot is showing over 100 ms, and I can see from the number of rules that it's not a particularly complex case. > > I would suggest something like http://www.w3.org/TR/2011/WD-html5-20110405/Overview.html is a good large page with simple layout for testing. > > Keep in mind that this kind of profiling is particularly useful even without even considering how many ms are spent. When optimizing it is very common to look at a selector and wonder how expensive it is - for example many developers just don't realize that doing ".foo .bar div" is particularly expensive, because they think about and read selectors left-to-right. They don't consider that this has the potential to match this every div in the document. Just showing them that these are the expensive selectors on the page is very useful. And these are costs that you can't really optimize out of the browser - it is trivial in CSS to generate selectors that are very expensive to handle in all browsers. Apologies, the link should have been http://www.whatwg.org/specs/web-apps/current-work/

Antti Koivisto

Comment 12 2011-12-08 04:15:44 PST

(In reply to comment #10) > I would suggest something like http://www.w3.org/TR/2011/WD-html5-20110405/Overview.html is a good large page with simple layout for testing. The stylesheet of the HTML spec (whatwg version ) is fairly simple. On my MBP the current ToT WebKit spends ~550ms in selector matching (~5% of total CPU time) over a page load. This time is significant because the document has very large number of elements to style, not because any particularly slow features used on the stylesheet (only ~120ms is spent on the slow path for the most complex selectors). In others words, trying to optimize that page with tool like this would be waste of time. >When optimizing it is very common to look at a selector and wonder how expensive it is - for example many developers just don't realize that doing ".foo .bar div" is particularly expensive, because they think about and read selectors left-to-right. This is old information. WebKit has rather efficient implementation of simple descendant selectors like that (ever since http://trac.webkit.org/changeset/77740). Generally if we find that some selector types are causing real world performance problems we are going to optimize them.

Luke Macpherson

Comment 13 2011-12-08 14:23:58 PST

(In reply to comment #12) > This is old information. WebKit has rather efficient implementation of simple descendant selectors like that (ever since http://trac.webkit.org/changeset/77740). > > Generally if we find that some selector types are causing real world performance problems we are going to optimize them. Are you claiming that it is no longer possible to create pathological selectors in CSS? Are you claiming that it is not possible for developers to improve the performance of their web pages by optimizing their CSS? Should the thousands of web devs out there that make a living doing this just give up? Whether we have good or bad implementations of selector matching is a bad argument against providing information to developers. We can and should keep optimizing the browser, but we should also help developers make their content better by making browser behavior less of a black box.

komoroske

Comment 14 2011-12-08 14:57:15 PST

(In reply to comment #13) > (In reply to comment #12) > > This is old information. WebKit has rather efficient implementation of simple descendant selectors like that (ever since http://trac.webkit.org/changeset/77740). > > > > Generally if we find that some selector types are causing real world performance problems we are going to optimize them. > > Are you claiming that it is no longer possible to create pathological selectors in CSS? Are you claiming that it is not possible for developers to improve the performance of their web pages by optimizing their CSS? Should the thousands of web devs out there that make a living doing this just give up? > > Whether we have good or bad implementations of selector matching is a bad argument against providing information to developers. We can and should keep optimizing the browser, but we should also help developers make their content better by making browser behavior less of a black box. I wanted to support Luke's position with an argument that more data is better based on a specific example. The Google Spreadsheets engineering team is very skilled and they've built one of the most advanced web apps in common usage. At various times in its development they've done deep investigations into the performance of various implementation strategies. It's a significant investment because it's hard to do correctly. The result is that they come away with very good understanding of the performance characteristics of the web platform _at the moment they conduct the investigation_. Over time, however, as the web platform evolves underneath them what once was knowledge slowly becomes "lore" and finally ends up as "superstitions." Because it takes so much investment to do deep investigate performance characteristics, they rarely do it without some active need for more information. We uncovered and dispelled a number of those superstitions when some WebKit folks sat down with them to brainstorm how to improve scrolling performance recently. These superstitions, in my experience, are even more prevalent among the general population of web developers. Making the performance characteristics easier to understand would make web developers more likely to adapt to the changing conditions of the web platform instead of relying on old superstitions.

Ryosuke Niwa

Comment 15 2011-12-08 14:59:20 PST

(In reply to comment #14) > I wanted to support Luke's position with an argument that more data is better based on a specific example. There's always a trade-off between exposing useful information vs. exposing too much and letting developers over-fit. > The result is that they come away with very good understanding of the performance characteristics of the web platform _at the moment they conduct the investigation_. Over time, however, as the web platform evolves underneath them what once was knowledge slowly becomes "lore" and finally ends up as "superstitions." Because it takes so much investment to do deep investigate performance characteristics, they rarely do it without some active need for more information. We uncovered and dispelled a number of those superstitions when some WebKit folks sat down with them to brainstorm how to improve scrolling performance recently. This sounds like a really strong argument for us to add this feature. > These superstitions, in my experience, are even more prevalent among the general population of web developers. Making the performance characteristics easier to understand would make web developers more likely to adapt to the changing conditions of the web platform instead of relying on old superstitions. This sounds like a good goal in general. We want to avoid developers optimizing for old legacy behaviors.

Antti Koivisto

Comment 16 2011-12-08 15:36:30 PST

(In reply to comment #13) > Are you claiming that it is no longer possible to create pathological selectors in CSS? Are you claiming that it is not possible for developers to improve the performance of their web pages by optimizing their CSS? Should the thousands of web devs out there that make a living doing this just give up? No, I didn't say anything like that. I simply pointed out that the examples you cited would not have directly benefited from this kind of profiling with the current WebKit. However.... (In reply to comment #14) > These superstitions, in my experience, are even more prevalent among the general population of web developers. Making the performance characteristics easier to understand would make web developers more likely to adapt to the changing conditions of the web platform instead of relying on old superstitions. This seems like a reasonable argument (considering that such a superstition was even cited in this thread with some conviction). I'm still interested in hearing what exactly we should be measuring. Or maybe it doesn't really matter that much as something very rough would be sufficient for education purposes?

Alexander Pavlov (apavlov)

Comment 17 2011-12-09 05:31:41 PST

(In reply to comment #4) > (From update of attachment 118221 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=118221&action=review > > I don't quite understand what this patch is measuring. Could you explain what the measured times exactly mean? For each selector the "Total" time is the time spent by WebCore in the CSSStyleSelector::matchRulesForList() loop for this particular selector (multiple runs for the same selector are aggregated). I did not find any other code fragments that would "Hits" is the number of runs of the said loop for the particular selector (i.e. a match was attempted against DOM elements). "Matches" is the number of times the particular selector was really applied to DOM elements, their style declarations taking effect on the element's computed style. > > Source/WebCore/css/CSSStyleSelector.cpp:712 > > unsigned size = rules->size(); > > for (unsigned i = 0; i < size; ++i) { > > const RuleData& ruleData = rules->at(i); > > - if (canUseFastReject && ... > > continue; > > + } > > This loop is super hot and the change will likely regress performance. You would proabably need a templated version of the function so you won't affect the normal case at all. OK, let's be number-driven :) I ran 3 experiments: for the ToT, Profiler OFF, and Profiler ON environments. Experiment: seven 5-iteration page cycler runs in Chromium Linux (Release build), steps described in the Quickstart section of http://www.chromium.org/developers/testing/page-cyclers. The greatest and smallest values were not considered (thus, 5 effective results in each experiment), and we suppose a normal distribution here. Here are the figures (each experiment totaled roughly 14900 selector hits using the definition above): ToT: 1165.2 1166.4 1161 1162 1163.8 Average = 1163.68ms StDev = 2.22 Profiler OFF: 1164.4 1167 1172.2 1165.8 1165.4 Average = 1166.96ms StDev = 3.07 Profiler ON: 1177.4 1171.4 1172.4 1173.8 1173.4 Average = 1173.68ms StDev = 2.28 The ToT and Profiler OFF averages are only 3.28, which is slightly above one sigma, so it looks like the difference is statistically insignificant. Feel free to check the results and speak up if I did something wrong (it's been a while since I last did something statistics-related...)

Alexander Pavlov (apavlov)

Comment 18 2011-12-09 05:32:32 PST

> The ToT and Profiler OFF averages are only 3.28, which is slightly above one sigma, so it looks like the difference is statistically insignificant. ...meaning "The ToT and Profiler OFF averages are only 3.28 APART"

Antti Koivisto

Comment 19 2011-12-09 05:45:45 PST

Please also try PerformanceTests/Parser/html5-full-render.html. For full analysis, please use a sampling profiler (like Instruments) over page load of something sufficiently complex (the full html5) and look at the sample counts in the function. In light of the discussion above I don't think you want to count the ancestor identifier filter rejections (fastRejectSelector) anyway as those are very fast and don't have much impact to the overall performance picture. You probably only care about selectors that pass that test.

Alexander Pavlov (apavlov)

Comment 20 2011-12-09 09:11:10 PST

Created attachment 118586 [details] [PATCH] Instrumented applyMatchedDeclaration(), added PoC selectorText caching Perhaps the m_hasCachedSelectorText flag should be moved up into CSSRule (and take 1 bit from m_sourceLine?)

Andreas Kling

Comment 21 2011-12-09 09:16:57 PST

(In reply to comment #20) > Created an attachment (id=118586) [details] > [PATCH] Instrumented applyMatchedDeclaration(), added PoC selectorText caching > > Perhaps the m_hasCachedSelectorText flag should be moved up into CSSRule (and take 1 bit from m_sourceLine?) Definitely!

Pavel Feldman

Comment 22 2011-12-09 09:24:10 PST

Could you split this change into the backend and front-end changes to ease the review? I'm sure Antti will be more comfortable reviewing the changes to the CSS and instrumentation there, while I'll take care of the general inspector plumbings and the front-end parts.

Ryosuke Niwa

Comment 23 2011-12-09 09:46:43 PST

(In reply to comment #17) > ToT: 1165.2 1166.4 1161 1162 1163.8 > Average = 1163.68ms > StDev = 2.22 > > Profiler OFF: 1164.4 1167 1172.2 1165.8 1165.4 > Average = 1166.96ms > StDev = 3.07 > > Profiler ON: 1177.4 1171.4 1172.4 1173.8 1173.4 > Average = 1173.68ms > StDev = 2.28 The sample size of 5 is way too small for this stat. to be meaningful. For a sample size of this magnitude, standard deviation is not so useful either.

Antti Koivisto

Comment 24 2011-12-09 09:57:44 PST

(In reply to comment #21) > (In reply to comment #20) > > Created an attachment (id=118586) [details] [details] > > [PATCH] Instrumented applyMatchedDeclaration(), added PoC selectorText caching > > > > Perhaps the m_hasCachedSelectorText flag should be moved up into CSSRule (and take 1 bit from m_sourceLine?) > > Definitely! It would be better to implement the cache under a separate bug.

Pavel Feldman

Comment 25 2011-12-11 09:43:48 PST

Comment on attachment 118586 [details] [PATCH] Instrumented applyMatchedDeclaration(), added PoC selectorText caching Wrong patch? I can still see instrumentations, caching, changes to the inspector backend and the front-end in the same patch. Should be 4 separate bugs / patches.

Alexander Pavlov (apavlov)

Comment 26 2011-12-26 03:44:23 PST

Created attachment 120545 [details] Patch

Pavel Feldman

Comment 27 2011-12-26 05:10:53 PST

Comment on attachment 120545 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=120545&action=review > Source/WebCore/inspector/front-end/CSSSelectorProfileView.js:206 > + performSearch: function(query, finishedCallback) I think we should put generic search functionality into DataGrid.

Alexander Pavlov (apavlov)

Comment 28 2011-12-28 09:40:17 PST

Created attachment 120682 [details] Patch

Alexander Pavlov (apavlov)

Comment 29 2011-12-29 03:57:17 PST

Created attachment 120730 [details] Patch

Pavel Feldman

Comment 30 2011-12-29 08:26:07 PST

Comment on attachment 120730 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=120730&action=review > Source/WebCore/inspector/front-end/CSSSelectorProfileView.js:2 > + * Copyright (C) 2008 Apple Inc. All Rights Reserved. Isn't it a new code? > Source/WebCore/inspector/front-end/CSSSelectorProfileView.js:49 > + Extra blank lines. > Source/WebCore/inspector/front-end/CSSSelectorProfileView.js:137 > + refreshVisibleData: function() Function naming is misleading here: refresh actually rebuilds the content, while refreshVisibleData is calling refresh on the data members. > Source/WebCore/inspector/front-end/CSSSelectorProfileView.js:280 > + setRecordingProfile: function(isProfiling) What is this method used for?

Alexander Pavlov (apavlov)

Comment 31 2011-12-29 08:35:15 PST

(In reply to comment #30) > (From update of attachment 120730 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=120730&action=review > > > Source/WebCore/inspector/front-end/CSSSelectorProfileView.js:2 > > + * Copyright (C) 2008 Apple Inc. All Rights Reserved. > > Isn't it a new code? Removing this line. > > Source/WebCore/inspector/front-end/CSSSelectorProfileView.js:49 > > + > > Extra blank lines. Fixed. > > Source/WebCore/inspector/front-end/CSSSelectorProfileView.js:137 > > + refreshVisibleData: function() > > Function naming is misleading here: refresh actually rebuilds the content, while refreshVisibleData is calling refresh on the data members. Renaming refresh -> rebuildGridItems, refreshVisibleData -> refreshData > > Source/WebCore/inspector/front-end/CSSSelectorProfileView.js:280 > > + setRecordingProfile: function(isProfiling) > > What is this method used for? It's a callback invoked from ProfilesPanel.

Alexander Pavlov (apavlov)

Comment 32 2011-12-29 09:03:08 PST

Committed r103803: <http://trac.webkit.org/changeset/103803>

Note You need to log in before you can comment on or make changes to this bug.