Bug 74863

Summary: Web Inspector: Add CSSStyleSelector instrumentation calls towards implementing a CSS selector profiler
Product: WebKit Reporter: Alexander Pavlov (apavlov) <apavlov>
Component: Web Inspector (Deprecated)Assignee: Alexander Pavlov (apavlov) <apavlov>
Status: RESOLVED FIXED    
Severity: Normal CC: apavlov, bweinstein, dglazkov, joepeck, keishi, kling, koivisto, loislo, macpherson, pfeldman, pmuellr, rik, timothy, webkit.review.bot, yurys
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
Patch
none
Patch koivisto: review+, webkit.review.bot: commit-queue-

Description Alexander Pavlov (apavlov) 2011-12-19 07:38:46 PST
Patch to follow.
Comment 1 Alexander Pavlov (apavlov) 2011-12-19 09:59:50 PST
Created attachment 119877 [details]
Patch
Comment 2 Antti Koivisto 2011-12-19 10:54:10 PST
Comment on attachment 119877 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=119877&action=review

r- on perf concerns.

> Source/WebCore/css/CSSStyleSelector.cpp:722
> +        CSSStyleRule* rule = ruleData.rule();
> +        InspectorInstrumentationCookie cookie = InspectorInstrumentation::willMatchRule(document(), rule);
> +        if (canUseFastReject && m_checker.fastRejectSelector<RuleData::maximumIdentifierCount>(ruleData.descendantSelectorIdentifierHashes())) {
> +            InspectorInstrumentation::didMatchRule(cookie, false);
>              continue;
> +        }

As said before, I don't think you should put anything on the fastRejectSelector path. You are not measuring anything interesting (the selectors rejected by this loop are not any more relevant than the selectors rejected by earlier hash lookup optimizations which you don't measure) and very little time is spent here anyway (bloom filter lookup is very fast).

Extra branches here are very likely to regress performance. If you look at the assembly, you will see that this code compiles to a very tight loop. You would need to prove that it doesn't on ARM where this kind of stuff can really matter.

Other additional checks in this patch are probably fine.
Comment 3 Alexander Pavlov (apavlov) 2011-12-20 04:21:31 PST
Created attachment 120006 [details]
Patch
Comment 4 Alexander Pavlov (apavlov) 2011-12-20 04:44:56 PST
(In reply to comment #2)
> (From update of attachment 119877 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=119877&action=review
> 
> r- on perf concerns.
> 
> > Source/WebCore/css/CSSStyleSelector.cpp:722
> > +        CSSStyleRule* rule = ruleData.rule();
> > +        InspectorInstrumentationCookie cookie = InspectorInstrumentation::willMatchRule(document(), rule);
> > +        if (canUseFastReject && m_checker.fastRejectSelector<RuleData::maximumIdentifierCount>(ruleData.descendantSelectorIdentifierHashes())) {
> > +            InspectorInstrumentation::didMatchRule(cookie, false);
> >              continue;
> > +        }
> 
> As said before, I don't think you should put anything on the fastRejectSelector path. You are not measuring anything interesting (the selectors rejected by this loop are not any more relevant than the selectors rejected by earlier hash lookup optimizations which you don't measure) and very little time is spent here anyway (bloom filter lookup is very fast).

My concern was profiling the invocation count, but since you say a lot of things have been filtered out by this point, it's probably okay to omit this check, too.

> Extra branches here are very likely to regress performance. If you look at the assembly, you will see that this code compiles to a very tight loop. You would need to prove that it doesn't on ARM where this kind of stuff can really matter.

NB: The following paragraph treats the case of a closed Web Inspector.

I've trimmed the total footprint down to 28 CPU instructions per loop (on my x86_64) in the latest patch. The original loop totals to around 210 CPU instruction at the top level, all instructions in calls excluded. That said, an rdtsc-based measurement has shown that typically the frontend check takes about 40 CPU cycles (occasionally - 100 to 180 iterations) on the first iteration and then gets cached by the CPU, so subsequent iterations typically evaluate to 0 - 10 cycles, depending on the CPU cache state.
Comment 5 Antti Koivisto 2011-12-20 11:18:31 PST
The UI should probably show the (approximate) time spent matching each selector and the match count but not the rejection count. Every selector that doesn't match is rejected (only the optimization stage we figure it out varies).
Comment 6 WebKit Review Bot 2011-12-20 16:22:43 PST
Comment on attachment 120006 [details]
Patch

Attachment 120006 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/10987563

New failing tests:
http/tests/inspector/extensions-headers.html
http/tests/inspector/extensions-useragent.html
http/tests/inspector/resource-har-headers.html
http/tests/inspector/console-xhr-logging.html
http/tests/inspector/console-cd.html
http/tests/inspector/console-cd-completions.html
http/tests/inspector/injected-script-for-origin.html
http/tests/inspector/extensions-network-redirect.html
http/tests/inspector/console-websocket-error.html
http/tests/inspector/compiler-source-mapping.html
http/tests/inspector/extensions-ignore-cache.html
http/tests/inspector/change-iframe-src.html
http/tests/inspector/inspect-iframe-from-different-domain.html
http/tests/inspector/compiler-source-mapping-debug.html
http/tests/inspector/resource-har-conversion.html
http/tests/inspector/console-resource-errors.html
http/tests/inspector/resource-parameters.html
Comment 7 Antti Koivisto 2011-12-20 22:47:00 PST
Comment on attachment 120006 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=120006&action=review

r=me, with some comments

> Source/WebCore/css/CSSStyleSelector.cpp:725
> +        InspectorInstrumentationCookie cookie;
> +#if ENABLE(INSPECTOR)
> +        if (InspectorInstrumentation::hasFrontends())
> +            cookie = InspectorInstrumentation::willMatchRule(document(), rule);
> +#endif

It would be cleaner to inline the front end and enable checks inside the InspectorInstrumentation call. Returning a null cookie probably doesn't have meaningful cost.

> Source/WebCore/css/CSSStyleSelector.cpp:2187
> +    InspectorInstrumentationCookie cookie;
> +#if ENABLE(INSPECTOR)
> +    if (InspectorInstrumentation::hasFrontends())
> +        cookie = InspectorInstrumentation::willProcessRule(document(), styleDeclaration->parentRule());
> +#endif

Same comment as above.
Comment 8 Alexander Pavlov (apavlov) 2011-12-21 06:10:50 PST
(In reply to comment #7)
> (From update of attachment 120006 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=120006&action=review
> 
> r=me, with some comments
> 
> > Source/WebCore/css/CSSStyleSelector.cpp:725
> > +        InspectorInstrumentationCookie cookie;
> > +#if ENABLE(INSPECTOR)
> > +        if (InspectorInstrumentation::hasFrontends())
> > +            cookie = InspectorInstrumentation::willMatchRule(document(), rule);
> > +#endif
> 
> It would be cleaner to inline the front end and enable checks inside the InspectorInstrumentation call. Returning a null cookie probably doesn't have meaningful cost.

So, it looks like your only concern in the previous patch was the will... call _before_ the fast rejection path, and returning an empty cookie is fine, right?
Comment 9 Alexander Pavlov (apavlov) 2011-12-23 02:07:19 PST
Committed r103615: <http://trac.webkit.org/changeset/103615>
Comment 10 Antti Koivisto 2011-12-23 09:20:05 PST
(In reply to comment #8)
> So, it looks like your only concern in the previous patch was the will... call _before_ the fast rejection path, and returning an empty cookie is fine, right?

Right, fastReject can throw out >90% of candidates so things are much hotter before it. Also, as I said I dont think measuring it is particularly interesting.
Comment 11 Alexander Pavlov (apavlov) 2011-12-23 09:43:32 PST
(In reply to comment #10)
> (In reply to comment #8)
> > So, it looks like your only concern in the previous patch was the will... call _before_ the fast rejection path, and returning an empty cookie is fine, right?
> 
> Right, fastReject can throw out >90% of candidates so things are much hotter before it. Also, as I said I dont think measuring it is particularly interesting.

Thanks for the clarification, Antti. The landed patch excludes fastReject from the measurement.