Summary: | [Refactoring] Introduce a traversal strategy in SelectorChecker | ||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Shinya Kawanaka <shinyak> | ||||||||||||||||||||||||||||
Component: | CSS | Assignee: | Shinya Kawanaka <shinyak> | ||||||||||||||||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||||||||||||||||
Severity: | Normal | CC: | cmarcelo, dglazkov, hayato, kling, koivisto, macpherson, menard, rafael.lobo, webcomponents-bugzilla, webkit.review.bot | ||||||||||||||||||||||||||||
Priority: | P2 | ||||||||||||||||||||||||||||||
Version: | 528+ (Nightly build) | ||||||||||||||||||||||||||||||
Hardware: | Unspecified | ||||||||||||||||||||||||||||||
OS: | Unspecified | ||||||||||||||||||||||||||||||
Bug Depends on: | |||||||||||||||||||||||||||||||
Bug Blocks: | 96990 | ||||||||||||||||||||||||||||||
Attachments: |
|
Description
Shinya Kawanaka
2012-09-20 23:43:49 PDT
Created attachment 165065 [details]
Patch
Comment on attachment 165065 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=165065&action=review I am still happy with the general approach, just need to keep improving. > Source/WebCore/ChangeLog:14 > + Since this code path is very hot, we were very careful not to regress performance. > + We used template and inline method, also we did not add a new argument to SelectorChecker::checkOneSelector, > + but replaced the SelectorCheckingContext to Strategy. When we added a new argument to checkOneSelector, > + we detected 3% ~ 5% performance regression. This seems like black magic. Why would the performance regress like this? Also, I am fine with Context containing Strategy, but not the other way around. Strategy is an abstract notion and shouldn't really have state. Context contains specifics of selector-checking. Strategy taking Context as argument also seems fine. > Source/WebCore/ChangeLog:42 > + (WebCore::SelectorChecker::DOMTraversalStrategy::isFirstChild): This kind of methods takes arguments > + instead of having a member. When we use member, we have detected 10% performance regression. I don't understand this comment. Maybe remove it? :) > Source/WebCore/css/SelectorChecker.h:268 > +inline bool SelectorChecker::DOMTraversalStrategy::isFirstOfType(Element* element, const QualifiedName& type) const Do all these methods have to live in the header? That seems sad. > PerformanceTests/CSS/pseudoclass-selectors.html:6 > + <span></span> I wonder if you would make the tree a bit more complex, you'd see more interesting data. I worry this is too small to give you any meaningful results. Factoring like this makes no sense to me. You are invoking the DOMTraversalStrategy functions with 'this' pointer but none of them actually use it. The added abstraction just makes the code harder to understand. If there are code sharing opportunities please just factor them into static inline functions. Generally, I'd like to hear what the goal here is. Refactoring without some measurable end goal (enabling performance improvements or new features for example) does not necessarily lead to good results. (In reply to comment #3) > Factoring like this makes no sense to me. You are invoking the DOMTraversalStrategy functions with 'this' pointer but none of them actually use it. That is a really good point. > The added abstraction just makes the code harder to understand. If there are code sharing opportunities please just factor them into static inline functions. > Generally, I'd like to hear what the goal here is. Refactoring without some measurable end goal (enabling performance improvements or new features for example) does not necessarily lead to good results. Here's the larger patch on bug 96990 (that I asked Shinya to break up) to show the context: https://bugs.webkit.org/attachment.cgi?id=164899&action=review The trick here is that in Shadow DOM, we need the selector to match a list of nodes that is not (necessarily) the list of children of an element. Some of the nodes are, but some may have been projected as children by an enclosing shadow DOM subtree. (In reply to comment #2) > (From update of attachment 165065 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=165065&action=review > > I am still happy with the general approach, just need to keep improving. > > > Source/WebCore/ChangeLog:14 > > + Since this code path is very hot, we were very careful not to regress performance. > > + We used template and inline method, also we did not add a new argument to SelectorChecker::checkOneSelector, > > + but replaced the SelectorCheckingContext to Strategy. When we added a new argument to checkOneSelector, > > + we detected 3% ~ 5% performance regression. > > This seems like black magic. Why would the performance regress like this? The 4th argument is passed with a register, and it will cause a few extra memory operations in the function, as far as I investigated the assembly code. > > PerformanceTests/CSS/pseudoclass-selectors.html:6 > > + <span></span> > > I wonder if you would make the tree a bit more complex, you'd see more interesting data. I worry this is too small to give you any meaningful results. Since we want to measure the overhead of template function call, it seems simpler is better. A simple test will magnify the overhead of this patch if any. > > > PerformanceTests/CSS/pseudoclass-selectors.html:6
> > > + <span></span>
> >
> > I wonder if you would make the tree a bit more complex, you'd see more interesting data. I worry this is too small to give you any meaningful results.
>
> Since we want to measure the overhead of template function call, it seems simpler is better. A simple test will magnify the overhead of this patch if any.
Oops, this is not correct so much. checkSelector will be called for all elements which we try to match. So I think complexity does not matter so much. Very simple one is not good, though.
Created attachment 165529 [details]
Patch
Comment on attachment 165529 [details] Patch Attachment 165529 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/14006377 Created attachment 165545 [details]
Patch
Comment on attachment 165545 [details]
Patch
This doesn't pass tests
Created attachment 165548 [details]
Patch
Created attachment 165550 [details]
WIP including other patches
Created attachment 165552 [details]
Patch
(In reply to comment #5) > > > + but replaced the SelectorCheckingContext to Strategy. When we added a new argument to checkOneSelector, > > > + we detected 3% ~ 5% performance regression. > > > > This seems like black magic. Why would the performance regress like this? > > The 4th argument is passed with a register, and it will cause a few extra memory operations in the function, as far as I investigated the assembly code. Whoa. Comment on attachment 165552 [details]
Patch
Hmm.. The template turducken is not enjoyable. It's clever, but at least to me, it's a decrease in clarity. Let me ponder this...
(In reply to comment #14) > (In reply to comment #5) > > > > > + but replaced the SelectorCheckingContext to Strategy. When we added a new argument to checkOneSelector, > > > > + we detected 3% ~ 5% performance regression. > > > > > > This seems like black magic. Why would the performance regress like this? > > > > The 4th argument is passed with a register, and it will cause a few extra memory operations in the function, as far as I investigated the assembly code. > > Whoa. BTW, I will reduce function signature to 2 arguments in bug 96445, which I'll finish as soon as I am done with <insert word that rhymes with nerf>. I wonder if we could lower the complexity by making SelectorCheckingContext a template and specializing it _before_ it's passes to checkSelector? (In reply to comment #17) > I wonder if we could lower the complexity by making SelectorCheckingContext a template and specializing it _before_ it's passes to checkSelector? I've thought the following ideas before. # I prefer to making SelectorCheckingContext and SelectorCheckingContextForShadow incompatible, since they are not have is-a relation. (type A) template<Strategy> struct SelectorCheckingContext { ... Strategy strategy; }; When we should not have Vector<Node*> for strategy, we have to pass SelectorCheckingContext to strategy anyway. Seems strange for me. (type B) template<Strategy> struct SelectorCheckingContextBase { typedef Strategy TraversalStrategy; }; and struct SelectCheckingContext : public SelectorCheckingContextBase<DOMTraversalStrategy> { }; struct SelectCheckingContextForShadow : public SelectorCheckingContextBase<ShadowDOMTraversalStrategy> { Vector<Node*> nodes; int nth; }; and calls CheckingContext::TraversalStrategy::isFirstChild(context, element) etc. This is more complicated than now, I think. (type C) struct SelectorCheckingContextBase { // fields are the same as exiting SelectorCheckingContext. }; and struct SelectCheckingContext : public SelectorCheckingContextBase { bool isFirstChild(Element*) const; ... }; struct SelectCheckingContextForShadow : public SelectorCheckingContextBase<ShadowDOMTraversalStrategy> { bool isFirstChild(Element*) const; ... Vector<Node*> nodes; int nth; }; SelectorCheckingContext and TraversalStrategy are mixed. I don't like this design. (In reply to comment #17) > I wonder if we could lower the complexity by making SelectorCheckingContext a template and specializing it _before_ it's passes to checkSelector? I'm not sure what this means, so can you give me some example? (In reply to comment #18) > (In reply to comment #17) > > I wonder if we could lower the complexity by making SelectorCheckingContext a template and specializing it _before_ it's passes to checkSelector? > > I've thought the following ideas before. > # I prefer to making SelectorCheckingContext and SelectorCheckingContextForShadow incompatible, since they are not have is-a relation. > > (type A) > > template<Strategy> > struct SelectorCheckingContext { > ... > Strategy strategy; > }; This is the one I meant. > When we should not have Vector<Node*> for strategy, we have to pass SelectorCheckingContext to strategy anyway. Seems strange for me. Can you explain this a bit? I am not sure I understand. > > When we should not have Vector<Node*> for strategy, we have to pass SelectorCheckingContext to strategy anyway. Seems strange for me.
>
> Can you explain this a bit? I am not sure I understand.
Since we don't want to have state in Strategy, we will have to have Vector<Node*> in SelectorCheckingContext (for Shadow). Strategy for ShadowDOM needs Vector<Node*>, we have to pass SelectorCheckingContext to Strategy as an argument. The method signature of Strategy will be like this.
class Strategy {
...
bool isFirstChild(const SelectorCheckingContext<Strategy>& context, Element*) const;
...
};
Though strategy is owned by Context, we have to pass Context to strategy. This seems a bit strange for me. It's OK in some case, but I hesitate to do like this.
# Here, we have to pass Element to isFirstChild not to regress performance. Though context has the same object, reading form a class field needs memory operation in the current compiler.
If it's OK to have Vector<Node*> in Strategy, this discussion does not hold, and I don't have any objection for this, since we don't have to pass SelectorCheckingContext to Strategy.
(In reply to comment #21) > > > When we should not have Vector<Node*> for strategy, we have to pass SelectorCheckingContext to strategy anyway. Seems strange for me. > > > > Can you explain this a bit? I am not sure I understand. > > Since we don't want to have state in Strategy, we will have to have Vector<Node*> in SelectorCheckingContext (for Shadow). Strategy for ShadowDOM needs Vector<Node*>, we have to pass SelectorCheckingContext to Strategy as an argument. The method signature of Strategy will be like this. > > class Strategy { > ... > bool isFirstChild(const SelectorCheckingContext<Strategy>& context, Element*) const; > ... > }; > > Though strategy is owned by Context, we have to pass Context to strategy. This seems a bit strange for me. It's OK in some case, but I hesitate to do like this. > > # Here, we have to pass Element to isFirstChild not to regress performance. Though context has the same object, reading form a class field needs memory operation in the current compiler. > > If it's OK to have Vector<Node*> in Strategy, this discussion does not hold, and I don't have any objection for this, since we don't have to pass SelectorCheckingContext to Strategy. I am ok with relaxing my view that Strategies should not hold state, if it will make code more readable. Let's do it :) Wanted to check and see how things are going here. I think I am waiting on a new patch? Created attachment 166868 [details]
wip
Created attachment 166869 [details]
Sync with ToT
I've updated the patch, syncing with ToT. I am afraid that I have not addressed every comments in the discussion. Please let me know if I miss something. (In reply to comment #23) > Wanted to check and see how things are going here. I think I am waiting on a new patch? Created attachment 166936 [details]
WIP: Strategy as a member of SelectorCheckingContext.
(In reply to comment #27) > Created an attachment (id=166936) [details] > WIP: Strategy as a member of SelectorCheckingContext. In this patch, I tried the approach A from comment 18. Performance stays within the noise range, but the templates are still quite ugly. Let me give this another try. Comment on attachment 166869 [details] Sync with ToT View in context: https://bugs.webkit.org/attachment.cgi?id=166869&action=review > Source/WebCore/ChangeLog:33 > + Test: PerformanceTests/CSS?pseudoclass-selectors.html Please use CamelCase for the test name. Created attachment 166950 [details]
WIP: Strategy as interface member of SelectorCheckingContext.
Comment on attachment 166950 [details] WIP: Strategy as interface member of SelectorCheckingContext. View in context: https://bugs.webkit.org/attachment.cgi?id=166950&action=review Personally, I like this one the best. Running the perf test, I couldn't discern any differences between either of the three approaches. WDYT? > Source/WebCore/css/SelectorChecker.h:122 > + bool checkOneSelector(const SelectorCheckingContext&) const; This can probably move back. The code became much cleaner and simpler. I like that. I am afraid that shinya has intentionally introduced an template approach to avoid the performance regression. I guess shinya noticed the performance regression, but if perf test cannot tell any differences between these approaches, I prefer the clean one. Let me investigate further. (In reply to comment #31) > (From update of attachment 166950 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=166950&action=review > > Personally, I like this one the best. Running the perf test, I couldn't discern any differences between either of the three approaches. WDYT? > > > Source/WebCore/css/SelectorChecker.h:122 > > + bool checkOneSelector(const SelectorCheckingContext&) const; > > This can probably move back. (In reply to comment #32) Antti, which one do you prefer? (In reply to comment #33) > (In reply to comment #32) > > Antti, which one do you prefer? This one. https://bugs.webkit.org/attachment.cgi?id=166950&action=review Let me rewrite the patch in https://bugs.webkit.org/show_bug.cgi?id=96990 based on attachment 166950 [details] and see how things work. Created attachment 167045 [details]
Fix a performance test and update the benchmark result.
I fixed the performance test and update the benchmark result in the ChangeLog. It seems there is no performance regression. (In reply to comment #35) > Created an attachment (id=167045) [details] > Fix a performance test and update the benchmark result. Comment on attachment 167045 [details] Fix a performance test and update the benchmark result. Attachment 167045 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/14168197 Created attachment 167073 [details]
Fix a performance test.
Comment on attachment 167073 [details] Fix a performance test. View in context: https://bugs.webkit.org/attachment.cgi?id=167073&action=review I'm bit unhappy about having to make this code generic and so harder to understand. The reasons given are about Shadow DOM and I can't evaluate if they are valid or not. The patch itself seems ok so I'll r+ it anyway. > Source/WebCore/ChangeLog:29 > + The performance of the added test before using this patch was: > + avg 3669.143396776992 runs/s > + median 3667.9394112696314 runs/s > + stdev 21.12345296601369 runs/s > + min 3594.5452776430134 runs/s > + max 3707.408251056703 runs/s > + > + When we used this patch, the performance was: > + avg 3851.376555590946 runs/s > + median 3858.343345260435 runs/s > + stdev 25.245683967249494 runs/s > + min 3782.510878241754 runs/s > + max 3878.4305938425096 runs/s Can you explain where the performance gains are coming from? (In reply to comment #39) > (From update of attachment 167073 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=167073&action=review > > I'm bit unhappy about having to make this code generic and so harder to understand. The reasons given are about Shadow DOM and I can't evaluate if they are valid or not. The patch itself seems ok so I'll r+ it anyway. I understand the concern. That's our biggest concern too. We are still seeking and trying various approaches which keep code clean. Let me land this now and try another approaches as in another bugs. > > > Source/WebCore/ChangeLog:29 > > + The performance of the added test before using this patch was: > > + avg 3669.143396776992 runs/s > > + median 3667.9394112696314 runs/s > > + stdev 21.12345296601369 runs/s > > + min 3594.5452776430134 runs/s > > + max 3707.408251056703 runs/s > > + > > + When we used this patch, the performance was: > > + avg 3851.376555590946 runs/s > > + median 3858.343345260435 runs/s > > + stdev 25.245683967249494 runs/s > > + min 3782.510878241754 runs/s > > + max 3878.4305938425096 runs/s > > Can you explain where the performance gains are coming from? I picked the benchmark results simply at ramdom (no intention) to be honest. As a result, unfortunately, the number became confusing since the benchmark results on my machine are unstable. Let me update the benchmark result which runs on stable machine. We already tried that. Then land it. Created attachment 167226 [details]
Patch for landing
Comment on attachment 167226 [details] Patch for landing Clearing flags on attachment: 167226 Committed r130459: <http://trac.webkit.org/changeset/130459> All reviewed patches have been landed. Closing bug. |