Speed up HitTestResult
Created attachment 86365 [details] Patch
Attachment 86365 [details] did not build on qt: Build output: http://queues.webkit.org/results/8212864
Comment on attachment 86365 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=86365&action=review Looks reasonable, aside from a style nitpick. > Source/WebCore/rendering/HitTestResult.cpp:583 > + for (NodeSet::const_iterator it = other.m_rectBasedTestResult->begin(), > + last = other.m_rectBasedTestResult->end(); The indent is weird here. Come to the dark side... Put it on one line :)
Attachment 86365 [details] did not build on win: Build output: http://queues.webkit.org/results/8214838
... and the bot gods look angry.
Created attachment 86372 [details] Patch
Thanks for the review! OK, I put the "for (...)" on one line. The bot anger was because some compilers are not optimizing "if (!x) { if (x) ...; y; }" to "if (!x) y;" before instantiating templates. That should be fixed by moving the offending code to the .cpp file where it has the definition of WebCore::Node available.
Comment on attachment 86372 [details] Patch let's give this puppy a whirl.
Comment on attachment 86372 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=86372&action=review > Source/WebCore/rendering/HitTestResult.cpp:108 > + m_rectBasedTestResult.set(other.m_rectBasedTestResult ? new NodeSet(*other.m_rectBasedTestResult) : 0); Please use adoptPtr rather than the obsolete set. > Source/WebCore/rendering/HitTestResult.cpp:134 > + m_rectBasedTestResult.set(other.m_rectBasedTestResult ? new NodeSet(*other.m_rectBasedTestResult) : 0); Please use adoptPtr rather than the obsolete set. > Source/WebCore/rendering/HitTestResult.cpp:562 > - m_rectBasedTestResult.add(node); > + getOrCreateNodeSet().add(node); I think the function could just be named nodeSet(). But also, before it was named based on its purpose and now you named it based on its data type. > Source/WebCore/rendering/HitTestResult.cpp:602 > + m_rectBasedTestResult.set(new NodeSet()); Please use adoptPtr instead of set. No need for the parentheses after NodeSet. > Source/WebCore/rendering/HitTestResult.h:123 > + const NodeSet& rectBasedTestResult() { return getOrCreateNodeSet(); } The old function was const, but your new function is not. But it should be. > Source/WebCore/rendering/HitTestResult.h:127 > + NodeSet& getOrCreateNodeSet(); This should be const. > Source/WebCore/rendering/HitTestResult.h:146 > + OwnPtr<NodeSet> m_rectBasedTestResult; This should be mutable.
Comment on attachment 86372 [details] Patch darin has good comments, as always.
Thanks for the review! > > Source/WebCore/rendering/HitTestResult.h:123 > > + const NodeSet& rectBasedTestResult() { return getOrCreateNodeSet(); } > > The old function was const, but your new function is not. But it should be. I considered that option but was concerned about thread safety. If a method is const then I would expect it to be safe to call without locking. If I take your suggestion should I also add an explanatory comment?
There is no expectation that const methods in WebKit are safe to call from any thread. We actually prefer designs where a given object only has its methods called from a single thread. Besides, nothing in DOM or layout code is thread safe.
Created attachment 86507 [details] Patch
+ cost of HitTestResult(IntPoint()) in EventHandler::mouseMoved() + from ~1700 cycles to ~300 cycles. What are the reasons to believe that it's worth making the code more complicated for this? The improvement is on microsecond scale. Historically, we had a lot of success making many 1% performance improvements, but it's not obvious if there is any scenario where this would provide 1%.
Thanks for the review! I agree this is not an earth-shattering improvement. I found it when informal testing and instrumentation suggested that this code path is high on the list of things that drive memory allocation and deallocation. And it's worse than the typical allocation, because it allocates several kilobytes and memsets the whole thing and then, typically, uses none of it. I estimate that the change can save a few ms of CPU per wall-clock second when the mouse is moving. It may be useful under other circumstances as well, I'm not sure. If you want more information or would like to suggest other areas where performance tuning would be more fruitful, please let me know.
> I estimate that the change can save a few ms of CPU per wall-clock second when the mouse is moving. That sounds more substantial than what I mentioned. Could you please share your calculation?
The savings per HitTestResult object is say 1400 cycles plus a bit more for the destructor. Call it .5 to 1 us. If you start Chrome and rapidly click links for 20 seconds and quit, several thousand HitTestResult objects are created and destroyed per second. That suggests a realistic peak savings of a few CPU ms per second when the mouse is moving. Of course, most of the time it's not. Even so, it's not that ugly a change for the benefit it provides.
This still looks like a good change to me :) Alexey, your thoughts?
Comment on attachment 86507 [details] Patch This looks good now.
Comment on attachment 86507 [details] Patch Clearing flags on attachment: 86507 Committed r82340: <http://trac.webkit.org/changeset/82340>
All reviewed patches have been landed. Closing bug.