Bug 73853

Summary: Inserting nodes is slow due to Node::notifyNodeListsAttributeChanged (20%+)
Product: WebKit Reporter: Ryosuke Niwa <rniwa>
Component: DOMAssignee: Ryosuke Niwa <rniwa>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, arv, catfish.man, darin, dglazkov, dslomov, esprehn, haraken, kbr, kling, koivisto, ojan, rafael.lobo, sam, tkent, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 73737, 73961, 73969, 74692, 75600, 89036, 89730, 89841, 100057    
Bug Blocks:    
Attachments:
Description Flags
perf test
none
profile result
none
diff I used to collect the stat.
none
updated stat collecting diff
none
Raw data for hit rate in DynamicNodeList::item (_ for misses, | for hits, and + for hits that will be missed in my proposal)
none
work in progress
none
Patch
none
kling's approach
none
Changed per Darin's comment
none
Fixed a bug
none
new patch
none
Removed more code
none
Fixed Mac build & per Ojan's comment
none
Updated ChangeLog & reverted xcodeproj change andersca: review+

Ryosuke Niwa
Reported 2011-12-05 12:27:11 PST
I've done more profiling on Shark, and it appears that we're spending 20%+ of the time inside Node::notifyNodeListsAttributeChanged. We should use some heuristics to speed up common cases.
Attachments
perf test (1.11 KB, text/html)
2011-12-05 12:28 PST, Ryosuke Niwa
no flags
profile result (deleted)
2011-12-05 12:34 PST, Ryosuke Niwa
no flags
diff I used to collect the stat. (8.05 KB, patch)
2011-12-06 12:16 PST, Ryosuke Niwa
no flags
updated stat collecting diff (10.83 KB, patch)
2011-12-06 15:14 PST, Ryosuke Niwa
no flags
Raw data for hit rate in DynamicNodeList::item (_ for misses, | for hits, and + for hits that will be missed in my proposal) (deleted)
2011-12-06 15:19 PST, Ryosuke Niwa
no flags
work in progress (22.54 KB, patch)
2011-12-17 11:55 PST, Ryosuke Niwa
no flags
Patch (30.16 KB, patch)
2011-12-20 12:13 PST, Ryosuke Niwa
no flags
kling's approach (18.99 KB, patch)
2011-12-21 18:19 PST, Ryosuke Niwa
no flags
Changed per Darin's comment (18.72 KB, patch)
2011-12-30 23:00 PST, Ryosuke Niwa
no flags
Fixed a bug (19.01 KB, patch)
2012-01-05 03:18 PST, Ryosuke Niwa
no flags
new patch (12.50 KB, patch)
2012-06-22 19:14 PDT, Ryosuke Niwa
no flags
Removed more code (15.96 KB, patch)
2012-06-22 23:46 PDT, Ryosuke Niwa
no flags
Fixed Mac build & per Ojan's comment (25.44 KB, patch)
2012-06-23 13:25 PDT, Ryosuke Niwa
no flags
Updated ChangeLog & reverted xcodeproj change (21.47 KB, patch)
2012-06-23 13:33 PDT, Ryosuke Niwa
andersca: review+
Ryosuke Niwa
Comment 1 2011-12-05 12:28:59 PST
Created attachment 117915 [details] perf test
Ryosuke Niwa
Comment 2 2011-12-05 12:34:44 PST
Created attachment 117919 [details] profile result 20.7% 20.7% WebCore WebCore::Node::notifyNodeListsAttributeChanged() 0.0% 20.6% WebCore WebCore::Node::dispatchSubtreeModifiedEvent() 0.0% 0.1% WebCore WebCore::ContainerNode::insertBefore(WTF::PassRefPtr<WebCore::Node>, WebCore::Node*, int&, bool) 0.0% 0.0% WebCore WebCore::ContainerNode::appendChild(WTF::PassRefPtr<WebCore::Node>, int&, bool) 3.3% 3.3% WebCore WebCore::QualifiedName::init(WTF::AtomicString const&, WTF::AtomicString const&, WTF::AtomicString const&) 3.3% 3.3% WebCore WebCore::JSNodeOwner::isReachableFromOpaqueRoots(JSC::Handle<JSC::Unknown>, void*, JSC::SlotVisitor&) 2.8% 2.8% WebCore WebCore::JSNodeOwner::finalize(JSC::Handle<JSC::Unknown>, void*) 2.8% 2.8% WebCore WebCore::createHTMLSpanElementWrapper(JSC::ExecState*, WebCore::JSDOMGlobalObject*, WTF::PassRefPtr<WebCore::HTMLElement>) 2.6% 2.6% JavaScriptCore WTF::AtomicString::addSlowCase(WTF::StringImpl*) 2.0% 2.0% WebCore WTF::HashMap<WTF::AtomicStringImpl*, WTF::PassRefPtr<WebCore::HTMLElement> (*)(WebCore::QualifiedName const&, WebCore::Document*, WebCore::HTMLFormElement*, bool), WTF::PtrHash<WTF::AtomicStringImpl*>, WTF::HashTraits<WTF::AtomicStringImpl*>, WTF::HashTraits<WTF::PassRefPtr<WebCore::HTMLElement> (*)(WebCore::QualifiedName const&, WebCore::Document*, WebCore::HTMLFormElement*, bool)> >::get(WTF::AtomicStringImpl* const&) const 1.9% 1.9% WebCore WTF::HashMap<WTF::AtomicStringImpl*, WebCore::JSDOMWrapper* (*)(JSC::ExecState*, WebCore::JSDOMGlobalObject*, WTF::PassRefPtr<WebCore::HTMLElement>), WTF::PtrHash<WTF::AtomicStringImpl*>, WTF::HashTraits<WTF::AtomicStringImpl*>, WTF::HashTraits<WebCore::JSDOMWrapper* (*)(JSC::ExecState*, WebCore::JSDOMGlobalObject*, WTF::PassRefPtr<WebCore::HTMLElement>)> >::get(WTF::AtomicStringImpl* const&) const 1.8% 1.8% WebCore WebCore::jsDocumentPrototypeFunctionCreateElement(JSC::ExecState*)
Sam Weinig
Comment 3 2011-12-05 16:20:29 PST
If this is the call to notifyNodeListsAttributeChanged() from Node::dispatchSubtreeModifiedEvent() (and I think it probably is) there is this comment: notifyNodeListsAttributeChanged(); // FIXME: Can do better some day. Really only care about the name attribute changing. Not sure if that is still true, but it is telling. The most obvious optimization that comes to mind is keeping a bit on the Document noting if there are any NodeLists that need to be invalidated (like we do with certain event listeners). The effectiveness of that optimization would be dependent on how many pages use NamedNodeLists and/or ClassNodeList. A better approach is probably to look at the callers to dispatchSubtreeModifiedEvent(), and see how many of them know that these caches don't need to be invalidated. The one in CharacterData, for instance, probably knows that those node lists can't have changed.
Ryosuke Niwa
Comment 4 2011-12-05 16:35:27 PST
Thanks for the comment, Sam! (In reply to comment #3) > notifyNodeListsAttributeChanged(); // FIXME: Can do better some day. Really only care about the name attribute changing. > > Not sure if that is still true, but it is telling. Yes. This is exactly the issue. We're walking up the tree (ancestors) per every node insertion and removal. > The most obvious optimization that comes to mind is keeping a bit on the Document noting if there are any NodeLists that need to be invalidated (like we do with certain event listeners). The effectiveness of that optimization would be dependent on how many pages use NamedNodeLists and/or ClassNodeList. We already do this (m_numNodeListCaches in TreeScope) but turned out be not so useful because almost every single web page uses querySelector/getElementById, etc... > A better approach is probably to look at the callers to dispatchSubtreeModifiedEvent(), and see how many of them know that these caches don't need to be invalidated. The one in CharacterData, for instance, probably knows that those node lists can't have changed. That's a good point. I hadn't thought about this case.
Alexey Proskuryakov
Comment 5 2011-12-05 22:58:38 PST
See also: bug 70810. This code really needs attention.
Ryosuke Niwa
Comment 6 2011-12-06 12:07:40 PST
I took some statistics on what kind of node lists are used in wild (again, gmail, facebook, twitter, wordpress.com, etc...): names: 3 0: 98.559% 4: 1.269% 2: 0.172% classes: 3 0: 75.054% 1: 8.839% 2: 8.581% 10: 2.602% 200: 1.484% 4: 1.054% 6: 0.946% 100: 0.430% 3: 0.323% 9: 0.280% labels: 3 0: 100.000% childs: 3 0: 44.581% 10: 10.129% 2: 7.699% 1: 6.839% 4: 5.570% 30: 5.312% 20: 2.946% 3: 2.387% 300: 1.849% 6: 1.656% tags: 3 4: 22.258% 0: 17.806% 10: 9.892% 6: 7.054% 20: 6.215% 40: 5.849% 30: 3.376% 7: 3.204% 3: 2.989% 5: 2.946%
Ryosuke Niwa
Comment 7 2011-12-06 12:08:09 PST
The sample size was 4650.
Ryosuke Niwa
Comment 8 2011-12-06 12:16:50 PST
Created attachment 118087 [details] diff I used to collect the stat.
Sam Weinig
Comment 9 2011-12-06 14:49:46 PST
Can you explain what those numbers mean. I am a bit confused by them.
Ryosuke Niwa
Comment 10 2011-12-06 14:56:05 PST
I have a hypothesis that we can just invalidate all caches whenever DOM mutation happens. In this world, each document will have a hashmap from node -> node list (probably excluding childnodes), and we'll remove all caches whenever a DOM mutation happens. I've collected 191220 samples while browsing websites like Gmail, Google+, Twitter, Facebook, etc... total: 191220 hits: 185772 (97.151%) misses: 3559 (1.861%) hits that will be missed under this approach: 1889 (0.988%)
Ryosuke Niwa
Comment 11 2011-12-06 14:57:58 PST
(In reply to comment #9) > Can you explain what those numbers mean. I am a bit confused by them. Oh sorry I guess I wasn't clear about this. These are numbers of instances that were alive at a time of DOM mutation (when dispatchChildInsertionEvents is called) and the percentage next to it shows the frequency of the number of alive objects for each type.
Ryosuke Niwa
Comment 12 2011-12-06 15:14:48 PST
Created attachment 118119 [details] updated stat collecting diff
Ryosuke Niwa
Comment 13 2011-12-06 15:19:34 PST
Created attachment 118121 [details] Raw data for hit rate in DynamicNodeList::item (_ for misses, | for hits, and + for hits that will be missed in my proposal)
Ojan Vafai
Comment 14 2011-12-06 15:31:45 PST
(In reply to comment #10) > I have a hypothesis that we can just invalidate all caches whenever DOM mutation happens. In this world, each document will have a hashmap from node -> node list (probably excluding childnodes), and we'll remove all caches whenever a DOM mutation happens. > > I've collected 191220 samples while browsing websites like Gmail, Google+, Twitter, Facebook, etc... > > total: 191220 > hits: 185772 (97.151%) > misses: 3559 (1.861%) > hits that will be missed under this approach: 1889 (0.988%) Does this exclude childNodes nodelists?
Ryosuke Niwa
Comment 15 2011-12-06 15:33:58 PST
(In reply to comment #14) > (In reply to comment #10) > > I have a hypothesis that we can just invalidate all caches whenever DOM mutation happens. In this world, each document will have a hashmap from node -> node list (probably excluding childnodes), and we'll remove all caches whenever a DOM mutation happens. > > > > I've collected 191220 samples while browsing websites like Gmail, Google+, Twitter, Facebook, etc... > > > > total: 191220 > > hits: 185772 (97.151%) > > misses: 3559 (1.861%) > > hits that will be missed under this approach: 1889 (0.988%) > > Does this exclude childNodes nodelists? Yes. ChildNodeList::item overrides DynamicNodeList's item.
Ryosuke Niwa
Comment 16 2011-12-17 11:55:54 PST
Created attachment 119733 [details] work in progress
Ryosuke Niwa
Comment 17 2011-12-19 12:23:49 PST
Good and bad news everyone: The good news is that my patch have successfully eliminated invalidateCaches from the profiler. The bad news is that new profile result indicates our hashmap/hashset may not be as efficient as it should be. In fact, we seem to spend 30-50% of time in hashmap/hashset at the moment. And the time we spend in them increase linearly with respect to the number of nodes inserted. New profile result: 4.5% 238 WebCore WebCore::createHTMLSpanElementWrapper(JSC::ExecState*, WebCore::JSDOMGlobalObject*, WTF::PassRefPtr<WebCore::HTMLElement>) 0.0% 237 WebCore WebCore::createJSHTMLWrapper(JSC::ExecState*, WebCore::JSDOMGlobalObject*, WTF::PassRefPtr<WebCore::HTMLElement>) 0.0% 237 WebCore WebCore::toJSNewlyCreated(JSC::ExecState*, WebCore::JSDOMGlobalObject*, WebCore::Element*) 0.0% 237 WebCore WebCore::jsDocumentPrototypeFunctionCreateElement(JSC::ExecState*) 0.0% 237 Unknown Library 0x239a85c01538 [unknown] 0.0% 1 WebCore WebCore::toJSNewlyCreated(JSC::ExecState*, WebCore::JSDOMGlobalObject*, WebCore::Element*) 4.0% 213 WebCore WebCore::JSNodeOwner::finalize(JSC::Handle<JSC::Unknown>, void*) 0.0% 213 JavaScriptCore JSC::HandleHeap::finalizeWeakHandles() 0.0% 4 WebProcess 0xbea [unreadable] 4.0% 212 WebCore WebCore::QualifiedName::init(WTF::AtomicString const&, WTF::AtomicString const&, WTF::AtomicString const&) 0.0% 207 WebCore WebCore::HTMLDocument::createElement(WTF::AtomicString const&, int&) 0.0% 207 WebCore WebCore::jsDocumentPrototypeFunctionCreateElement(JSC::ExecState*) 0.0% 207 Unknown Library 0x239a85c01538 [unknown] 0.0% 5 WebCore WebCore::jsDocumentPrototypeFunctionCreateElement(JSC::ExecState*) 3.1% 166 JavaScriptCore WTF::AtomicString::addSlowCase(WTF::StringImpl*) 0.0% 153 Unknown Library 0x1140469f8 [unknown] 0.0% 8 Unknown Library 0x239a85c01538 [unknown] 0.0% 3 WebCore WebCore::jsDocumentPrototypeFunctionCreateElement(JSC::ExecState*) 0.0% 1 WebCore WebCore::HTMLDocument::createElement(WTF::AtomicString const&, int&) 0.0% 1 Unknown Library 0x1140469d0 [unknown] 3.1% 164 WebCore WebCore::JSNodeOwner::isReachableFromOpaqueRoots(JSC::Handle<JSC::Unknown>, void*, JSC::SlotVisitor&) 0.0% 161 JavaScriptCore JSC::HandleHeap::visitWeakHandles(JSC::HeapRootVisitor&) 0.0% 3 WebProcess 0x1000000 [unreadable] 2.3% 124 WebCore WebCore::jsDocumentPrototypeFunctionCreateElement(JSC::ExecState*) 2.1% 114 JavaScriptCore WTF::TCMalloc_Central_FreeList::FetchFromSpansSafe() 2.0% 106 WebCore WebCore::TreeScope::clearNodeListCaches(WebCore::DynamicSubtreeNodeListType) 0.0% 83 WebCore WebCore::TreeScope::clearNodeListCachesForAllTypes() 0.0% 23 WebCore WebCore::Element::childrenChanged(bool, WebCore::Node*, WebCore::Node*, int) 1.9% 102 WebCore WTF::HashMap<WTF::AtomicStringImpl*, WTF::PassRefPtr<WebCore::HTMLElement> (*)(WebCore::QualifiedName const&, WebCore::Document*, WebCore::HTMLFormElement*, bool), WTF::PtrHash<WTF::AtomicStringImpl*>, WTF::HashTraits<WTF::AtomicStringImpl*>, WTF::HashTraits<WTF::PassRefPtr<WebCore::HTMLElement> (*)(WebCore::QualifiedName const&, WebCore::Document*, WebCore::HTMLFormElement*, bool)> >::get(WTF::AtomicStringImpl* const&) const 1.9% 102 WebCore WebCore::ContainerNode::insertBefore(WTF::PassRefPtr<WebCore::Node>, WebCore::Node*, int&, bool) 0.0% 99 WebCore WebCore::Node::insertBefore(WTF::PassRefPtr<WebCore::Node>, WebCore::Node*, int&, bool) 0.0% 99 WebCore WebCore::JSNode::insertBefore(JSC::ExecState*) 0.0% 99 WebCore WebCore::jsNodePrototypeFunctionInsertBefore(JSC::ExecState*) 0.0% 3 WebCore WebCore::JSNode::insertBefore(JSC::ExecState*) 1.9% 99 JavaScriptCore WTF::StringImpl::lower() 1.9% 99 WebCore WTF::HashMap<JSC::ClassInfo const*, JSC::WriteBarrier<JSC::Structure>, WTF::PtrHash<JSC::ClassInfo const*>, WTF::HashTraits<JSC::ClassInfo const*>, WTF::HashTraits<JSC::WriteBarrier<JSC::Structure> > >::get(JSC::ClassInfo const* const&) const 1.7% 93 WebCore WebCore::ContainerNode::appendChild(WTF::PassRefPtr<WebCore::Node>, int&, bool) 1.7% 89 WebCore void WebCore::Private::addChildNodesToDeletionQueue<WebCore::Node, WebCore::ContainerNode>(WebCore::Node*&, WebCore::Node*&, WebCore::ContainerNode*) 1.7% 88 WebCore WebCore::HTMLDocument::createElement(WTF::AtomicString const&, int&) 1.6% 87 JavaScriptCore WTF::fastMalloc(unsigned long) 1.4% 75 WebCore WebCore::notifyChildInserted(WebCore::Node*)
Ryosuke Niwa
Comment 18 2011-12-20 10:51:31 PST
Here are some performance test results on r103144: PerformanceTests/Parser/xml-parser.html (2% improvement) Without patch: avg 1969.4 median 1953.5 stdev 25.63669245437094 min 1941 max 2002 With patch: avg 1935.4 median 1921 stdev 26.115895542753268 min 1903 max 1969 PerformanceTests/Parser/tiny-innerHTML.html (5% improvement) Without patch: avg 2284.15 median 2281.5 stdev 15.919406395968412 min 2257 max 2321 With patch: avg 2172.5 median 2170 stdev 13.966030216206752 min 2155 max 2215
Darin Adler
Comment 19 2011-12-20 11:09:01 PST
(In reply to comment #17) > The bad news is that new profile result indicates our hashmap/hashset may not be as efficient as it should be. In fact, we seem to spend 30-50% of time in hashmap/hashset at the moment. And the time we spend in them increase linearly with respect to the number of nodes inserted. I’m not sure what performance you are expecting from the HashMap. I believe that reading from it should be somewhere between O(1) and O(log n). Thus reading from it n times should be between O(n) and O(n log n). So if you are saying that time in the HashMap function is linear, that’s exactly what’s expected: O(n). We can achieve further speed by making a data structure that’s more specialized than a hash table for the operation in question. For example, for the HTML element factory, if we know the frequency of various types of tags, we can do a linear search in an array that is sorted by frequency instead of the hash map or as a first pass before falling back on it. I find all that time spent in StringImpl::lower() alarming. And I’m quite interested in the WTF::AtomicString::addSlowCase time as well. There’s a lot of juicy interesting things that could possibly be sped up here.
Ryosuke Niwa
Comment 20 2011-12-20 11:24:14 PST
Unfortunately, there appears to be 3% regression in PerformanceTests/Parser/html-parser.html Without patch: avg 21238.6 median 21242.5 stdev 42.886361468420226 min 21168 max 21322 With patch: avg 21714.15 median 21710.5 stdev 61.88317622746912 min 21614 max 21847 However, profile result doesn't tell us why this regression had happened. The regression doesn't appear to be a test flake given I've tried a couple of times and got a similar result. Here's profile result: 5.5% 5.5% WebCore WebCore::HTMLTokenizer::nextToken(WebCore::SegmentedString&, WebCore::HTMLToken&) 3.0% 3.0% WebCore WebCore::CSSStyleSelector::matchRulesForList(WTF::Vector<WebCore::RuleData, 0ul> const*, int&, int&, bool) 1.9% 1.9% JavaScriptCore WTF::fastFree(void*) 1.8% 1.8% JavaScriptCore WTF::fastMalloc(unsigned long) 1.4% 1.4% WebCore WebCore::CSSStyleSelector::applyProperty(int, WebCore::CSSValue*) 1.3% 1.3% WebCore void WebCore::Private::addChildNodesToDeletionQueue<WebCore::Node, WebCore::ContainerNode>(WebCore::Node*&, WebCore::Node*&, WebCore::ContainerNode*) 1.3% 1.3% WebCore WebCore::MarkupTokenizerBase<WebCore::HTMLToken, WebCore::HTMLTokenizerState>::InputStreamPreprocessor::advance(WebCore::SegmentedString&, int&) 1.2% 1.2% WebCore void WebCore::removeAllChildrenInContainer<WebCore::Node, WebCore::ContainerNode>(WebCore::ContainerNode*) 1.1% 1.1% JavaScriptCore WTF::AtomicString::add(unsigned short const*, unsigned int) 1.1% 1.1% JavaScriptCore WTF::AtomicString::addSlowCase(WTF::StringImpl*) 1.1% 1.1% WebCore WebCore::SelectorChecker::checkOneSelector(WebCore::CSSSelector*, WebCore::Element*, WebCore::PseudoId&, bool, WebCore::SelectorChecker::VisitedMatchType, WebCore::RenderStyle*, WebCore::RenderStyle*) const 0.9% 0.9% WebCore WebCore::visitedLinkHash(WebCore::KURL const&, WTF::AtomicString const&) 0.9% 0.9% JavaScriptCore WTF::TCMalloc_Central_FreeList::FetchFromSpansSafe() 0.9% 0.9% WebCore WebCore::HTMLTreeBuilder::processStartTagForInBody(WebCore::AtomicHTMLToken&) 0.9% 0.9% WebCore WebCore::CSSStyleSelector::adjustRenderStyle(WebCore::RenderStyle*, WebCore::RenderStyle*, WebCore::Element*) 0.7% 0.7% WebCore WTF::HashMap<WebCore::RenderBoxModelObject const*, WebCore::RenderBoxModelObject*, WTF::PtrHash<WebCore::RenderBoxModelObject const*>, WTF::HashTraits<WebCore::RenderBoxModelObject const*>, WTF::HashTraits<WebCore::RenderBoxModelObject*> >::get(WebCore::RenderBoxModelObject const* const&) const 0.7% 0.7% WebCore WebCore::HTMLConstructionSite::insertTextNode(WTF::String const&, WebCore::WhitespaceMode) 0.7% 0.7% libSystem.B.dylib __memcpy 0.7% 0.7% WebCore WebCore::CSSStyleSelector::styleForElement(WebCore::Element*, WebCore::RenderStyle*, bool, bool, WebCore::RenderRegion*) 0.7% 0.7% JavaScriptCore WTF::TCMalloc_Central_FreeList::RemoveRange(void**, void**, int*) 0.7% 0.7% WebCore WebCore::SelectorChecker::fastCheckSelector(WebCore::CSSSelector const*, WebCore::Element const*) const 0.7% 0.7% WebCore WebCore::SegmentedString::operator=(WebCore::SegmentedString const&) 0.6% 0.6% JavaScriptCore WTF::TCMalloc_Central_FreeList::ReleaseListToSpans(void*) 0.6% 0.6% WebCore WebCore::RenderObject::RenderObject(WebCore::Node*) 0.6% 0.6% WebCore void WebCore::CSSStyleSelector::applyDeclaration<true>(WebCore::CSSMutableStyleDeclaration*, bool, bool) 0.6% 0.6% WebCore WebCore::CSSStyleSelector::matchRules(WebCore::RuleSet*, int&, int&, bool) 0.6% 0.6% WebCore WTF::HashMap<WTF::AtomicStringImpl*, WTF::OwnPtr<WTF::Vector<WebCore::RuleData, 0ul> >, WTF::PtrHash<WTF::AtomicStringImpl*>, WTF::HashTraits<WTF::AtomicStringImpl*>, WTF::HashTraits<WTF::OwnPtr<WTF::Vector<WebCore::RuleData, 0ul> > > >::get(WTF::AtomicStringImpl* const&) const 0.6% 0.6% WebCore WebCore::ContainerNode::detach() 0.6% 0.6% libSystem.B.dylib pthread_getspecific 0.6% 0.6% WebCore WebCore::QualifiedName::init(WTF::AtomicString const&, WTF::AtomicString const&, WTF::AtomicString const&) 0.6% 0.6% WebCore WebCore::AtomicHTMLToken::AtomicHTMLToken(WebCore::HTMLToken&) 0.5% 0.5% WebCore WebCore::RenderObjectChildList::appendChildNode(WebCore::RenderObject*, WebCore::RenderObject*, bool) 0.5% 0.5% WebCore WebCore::Text::~Text() 0.5% 0.5% WebCore WebCore::HTMLSourceTracker::start(WebCore::HTMLInputStream const&, WebCore::HTMLToken&) 0.5% 0.5% WebCore WebCore::Element::setAttributeMap(WTF::PassRefPtr<WebCore::NamedNodeMap>, WebCore::FragmentScriptingPermission) 0.5% 0.5% WebCore WebCore::mergeDoubleSlashes(WTF::Vector<unsigned short, 512ul>&, unsigned long) 0.5% 0.5% WebCore WebCore::RenderObject::setStyle(WTF::PassRefPtr<WebCore::RenderStyle>) 0.5% 0.5% WebCore WebCore::HTMLTreeBuilder::constructTreeFromAtomicToken(WebCore::AtomicHTMLToken&) 0.5% 0.5% WebCore WebCore::RenderText::RenderText(WebCore::Node*, WTF::PassRefPtr<WTF::StringImpl>) 0.5% 0.5% WebCore WebCore::Document::removeAllEventListeners() 0.5% 0.5% WebCore WebCore::HTMLTreeBuilder::constructTreeFromToken(WebCore::HTMLToken&)
Ryosuke Niwa
Comment 21 2011-12-20 11:41:06 PST
(In reply to comment #19) > (In reply to comment #17) > > The bad news is that new profile result indicates our hashmap/hashset may not be as efficient as it should be. In fact, we seem to spend 30-50% of time in hashmap/hashset at the moment. And the time we spend in them increase linearly with respect to the number of nodes inserted. > > I’m not sure what performance you are expecting from the HashMap. I believe that reading from it should be somewhere between O(1) and O(log n). Thus reading from it n times should be between O(n) and O(n log n). So if you are saying that time in the HashMap function is linear, that’s exactly what’s expected: O(n). No, what I'm saying is that the number of lookup time appears to be increasing with respect to the number of nodes in the hashmap/hashset. To give you more concrete idea, download https://bug-73737-attachments.webkit.org/attachment.cgi?id=117785 and open it on ToT WebKit. You get results like: Shallow tree: 88 ms Deep tree (20+): 93 ms Deeper tree (50+): 143 ms Now, reverse the order in which we run tests; e.g. replace lines 44-46 by: document.getElementById('shallow').innerHTML = repeatTest(0); document.getElementById('deep').innerHTML = repeatTest(20); document.getElementById('deeper').innerHTML = repeatTest(50); Then if you open it again in ToT WebKit, you get results like: Shallow tree: 104 ms Deep tree (20+): 84 ms Deeper tree (50+): 86 ms There might be some flakes but the overall, this is the trend I get if I repeat the experiment over and over.
Ryosuke Niwa
Comment 22 2011-12-20 12:13:58 PST
Ryosuke Niwa
Comment 23 2011-12-20 12:21:56 PST
Here's my patch. Given the regression in PerformanceTests/Parser/html-parser.html and the amount of code I'm adding, I'm not certain if this is a total win. It might be better to just have per-tree-scope cache counter but not manage node list themselves at the tree scope so that we can detect cases where we don't have any valid caches. Having said that, this patch eliminates O(n) performance in many cases and clearly improves the result of some performance tests so I'll leave it for other reviewers to decide whether this is a good patch or not.
Ryosuke Niwa
Comment 24 2011-12-20 17:49:10 PST
kling pointed out that I can lazily invalidate the node list cache by recording DOMVersion and comparing it when we access the cache next time. That sounds like a really good good idea. I'll try that out next week (I'll be stuck "gardening" WebKit for the remainder of the week).
Darin Adler
Comment 25 2011-12-21 10:15:03 PST
Comment on attachment 120057 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=120057&action=review Looks OK. I wasn’t able to think through everything about the data structure in detail. I would have liked to see a more thorough test, though. Given how much different code there is in here I’d like to see something that covers more of the code for different types, for example. > Source/WebCore/dom/DynamicNodeList.cpp:37 > + if (type != LabelNodeListType) > + rootNode()->registerDynamicSubtreeNodeList(type, this); This mysterious if condition needs a comment. > Source/WebCore/dom/DynamicNodeList.cpp:49 > + if (!hasValidCache()) > + node()->treeScope()->nodeListCachedValue(m_type); It’s a little strange to call a function saying “I cached a value” before caching the value. > Source/WebCore/dom/Node.cpp:487 > + for (NodeListsNodeData::ClassNodeListCache::const_iterator it = nodeLists->m_classNodeListCache.begin(); > + it != nodeLists->m_classNodeListCache.end(); ++it) { > + if (currentTreeScope) > + currentTreeScope->removeNodeListCache(ClassNodeListType, it->second); > + newTreeScope->addNodeListCache(ClassNodeListType, it->second); > + } So ugly. I can’t wait until we support only compilers with good C++11 support. If we had one of those then we could write: for (const auto& value: nodeLists->m_classNodeListCache.values()) { if (currentTreeScope) currentTreeScope->removeNodeListCache(ClassNodeListType, value); newTreeScope->addNodeListCache(ClassNodeListType, value); } I wonder if we can achieve something similar with a macro. > Source/WebCore/dom/Node.cpp:543 > + // FIXME: assert that nodes don't have node lists Sentence style please. Not sure that an idea for an assertion really qualifies as FIXME. I might just make a comment about the possible improvement without using FIXME. Recently I have been thinking we should strive to consistently use FIXME only when we think something is wrong, not when there is simply room for improvement. > Source/WebCore/dom/Node.cpp:1064 > +static DynamicSubtreeNodeListType attrToNodeListType(const QualifiedName& attrName) I think a better name is: nodeListTypeForAttribute These noun-type names work better if the beginning of the name is the noun. > Source/WebCore/dom/Node.cpp:1080 > + return InvalidNodeListType; > + // LabelsNodeList is invalidated via notifyLocalNodeListsLabelChanged in parseMappedAttribute Missing period in this sentence-style comment. Comment would be better before the return statement. > Source/WebCore/dom/Node.cpp:1102 > + // FIXME: We don't have to invalidate caches if the inserted or removed node > + // didn't have an attribute we care about. I think that for clarity we should say “any of the attributes” rather than “an attribute”. Also would be clearer if this stated that this is a performance idea. > Source/WebCore/dom/Node.cpp:3025 > -void NodeRareData::createNodeLists(Node* node) > +void NodeRareData::createNodeLists(Node*) Perhaps we should remove the unused argument entirely. > Source/WebCore/dom/Node.h:114 > +enum DynamicSubtreeNodeListType { > + ClassNodeListType, > + NameNodeListType, > + TagNodeListType, > + TagNodeListNSType, > +#if ENABLE(MICRODATA) > + MicroDataItemListType, > +#endif > + LabelNodeListType, > + InvalidNodeListType > +}; I’d suggest: 1) Adding a brief comment about why Invalid must be the last value. 2) Sorting the other values alphabetically as we do headers, in a separate paragraph. 3) Putting the #if value in its own separate paragraph instead of mixing it in with the unconditional ones. > Source/WebCore/dom/Node.h:115 > +const unsigned numDynamicSubtreeNodeListType = InvalidNodeListType; Why make the constant unsigned instead of of the enum type? I believe you could use the enum type consistently, even for the loop index, and avoid all the static_cast calls in the code below. The name should be plural with a trailing “s”: “number of types”, not “number of type”. > Source/WebCore/dom/TreeScope.h:63 > + void nodeListCachedValue(DynamicSubtreeNodeListType type) This sounds like a getter that returns a “cached value” for a “node list”. This kind of situation is what leads to names like nodeListDidCacheValue or nodeListWillCacheValue, either of which will be clearer than this. > Source/WebCore/dom/TreeScope.h:69 > + void nodeListInvalidatedCache(DynamicSubtreeNodeListType type) I think that nodeListDidInvalidateCache is a slightly better name. > Source/WebCore/html/LabelsNodeList.cpp:35 > +LabelsNodeList::LabelsNodeList(Node* forNode) The old name here, “for node”, is not good naming. We want to use a noun phrase. There might even be a good idea for the name in the HTML specification itself. But you need not change that name in this patch.
Ryosuke Niwa
Comment 26 2011-12-21 10:25:20 PST
Thanks for the review, Darin. But I want to try out kling's approach compare the performance first. I might have a time today; if not, definitely on next Wed.
Darin Adler
Comment 27 2011-12-21 11:09:44 PST
(In reply to comment #24) > kling pointed out that I can lazily invalidate the node list cache by recording DOMVersion and comparing it when we access the cache next time. That sounds like a really good good idea. Makes invalidation really cheap, but also really imprecise. I agree it’s worth investigating, but it’s not clear how to investigate the implications of this sufficiently.
Andreas Kling
Comment 28 2011-12-21 14:51:59 PST
(In reply to comment #27) > (In reply to comment #24) > > kling pointed out that I can lazily invalidate the node list cache by recording DOMVersion and comparing it when we access the cache next time. That sounds like a really good good idea. > > Makes invalidation really cheap, but also really imprecise. I agree it’s worth investigating, but it’s not clear how to investigate the implications of this sufficiently. One idea would be to keep additional version numbers that would always be >=DOMVersion, and would be >DOMVersion when e.g an attribute changes, allowing us to only invalidate those caches that care about attributes. incDOMTreeVersion() would then bring all version numbers to the same level.
Andreas Kling
Comment 29 2011-12-21 14:55:05 PST
(In reply to comment #28) > (In reply to comment #27) > > (In reply to comment #24) > > > kling pointed out that I can lazily invalidate the node list cache by recording DOMVersion and comparing it when we access the cache next time. That sounds like a really good good idea. > > > > Makes invalidation really cheap, but also really imprecise. I agree it’s worth investigating, but it’s not clear how to investigate the implications of this sufficiently. > > One idea would be to keep additional version numbers that would always be >=DOMVersion, and would be >DOMVersion when e.g an attribute changes, allowing us to only invalidate those caches that care about attributes. incDOMTreeVersion() would then bring all version numbers to the same level. Ehm. I was apparently typing a bit faster than I was thinking here. Changing an attribute would obviously bump DOMVersion as well. It would have to work the other way around; the more specific versions could lag behind DOMVersion.
Ryosuke Niwa
Comment 30 2011-12-21 15:20:23 PST
I implemented kling's approach and tested https://bug-73737-attachments.webkit.org/attachment.cgi?id=117785 on r103398 Here's result: Shallow tree: 104.2±2.9 -> 96.8±1.6 (7.6% reduction) Deep tree (20+): 105.4±1.9 -> 96.7±1.2 (9.0% reduction) Deeper tree (50+): 137.9±1.1 -> 134.9±1.1 (2.2% reduction)
Ryosuke Niwa
Comment 31 2011-12-21 16:55:46 PST
The new approach is a clear winner here: PerformanceTests/Parser/xml-parser.html (12.6% reduction) Without patch avg 2147.75 median 2140 stdev 22.144694624220943 min 2111 max 2187 With patch avg 1876.95 median 1868 stdev 28.452548216284598 min 1839 max 1934 PerformanceTests/Parser/tiny-innerHTML.html (9.1% reduction) Without patch avg 20932.75 median 20929 stdev 47.63493990759304 min 20832 max 21084 With patch avg 1919.3 median 1903.5 stdev 27.087081791880053 min 1888 max 1964 PerformanceTests/Parser/html-parser.html (1.7% reduction) Without patch avg 21291.15 median 21294.5 stdev 35.27360911503103 min 21227 max 21369 With patch avg 20932.75 median 20929 stdev 47.63493990759304 min 20832 max 21084
Ryosuke Niwa
Comment 32 2011-12-21 18:19:25 PST
Created attachment 120253 [details] kling's approach
Darin Adler
Comment 33 2011-12-21 21:17:02 PST
(In reply to comment #31) > The new approach is a clear winner here: We knew the new approach would be better when doing lots of modifications and not much access to the node list. But did you think about what the bad case would be for the new approach, and measure that? How much worse are the cases that get worse?
Ryosuke Niwa
Comment 34 2011-12-21 21:30:16 PST
(In reply to comment #33) > (In reply to comment #31) > > The new approach is a clear winner here: > > We knew the new approach would be better when doing lots of modifications and not much access to the node list. But did you think about what the bad case would be for the new approach, and measure that? How much worse are the cases that get worse? The worst case senario is that we'll end up invalidating cache every time we access it. However this was already the case before the patch; e.g. if you had written scripts like: var divs = element.getElementsByTagName('div'); for (var i = 0; i < divs.length; i++) divs[i].appendChild(document.createElement('br')); then we would end up invalidating the cache for divs along with node lists caches of element's ancestors. Luckily, this is unlikely to affect our performance on popular websites according to the statistics on comment #10: > I have a hypothesis that we can just invalidate all caches whenever DOM mutation happens. ... > I've collected 191220 samples while browsing websites like Gmail, Google+, Twitter, Facebook, etc... > > total: 191220 > hits: 185772 (97.151%) > misses: 3559 (1.861%) > hits that will be missed under this approach: 1889 (0.988%)
Ryosuke Niwa
Comment 35 2011-12-26 22:01:38 PST
Ping reviewers.
Darin Adler
Comment 36 2011-12-28 12:44:42 PST
Comment on attachment 120253 [details] kling's approach View in context: https://bugs.webkit.org/attachment.cgi?id=120253&action=review > Source/WebCore/dom/ChildNodeList.cpp:39 > + if (m_caches->m_isLengthCacheValid) > + return m_caches->m_length; It’s strange to call this “length cache” in one case, and “length” in another. I’d like to see these two data members use the same terminology. > Source/WebCore/dom/DynamicNodeList.h:48 > - unsigned cachedLength; > - Node* lastItem; > - unsigned lastItemOffset; > - bool isLengthCacheValid : 1; > - bool isItemCacheValid : 1; > + > + unsigned m_length; > + Node* m_lastItem; > + unsigned m_lastItemOffset; > + bool m_isLengthCacheValid : 1; > + bool m_isItemCacheValid : 1; I don’t like this change: Given that these are public data members, they should not have the m_ prefix added. If we want to use the m_ prefix, then we should make this a class with accessor functions instead. > Source/WebCore/dom/DynamicNodeList.h:120 > + DynamicNodeList::Caches::m_length; > + DynamicNodeList::Caches::m_lastItem; > + DynamicNodeList::Caches::m_lastItemOffset; > + DynamicNodeList::Caches::m_isLengthCacheValid; > + DynamicNodeList::Caches::m_isItemCacheValid; I have never seen this syntax before and I am surprised it works. I think the correct syntax is: using DynamicNodeList::Caches::m_length; But also, this syntax works: using Caches::m_length; I believe deriving from a base class makes the class name usable without a prefix. > Source/WebCore/dom/Node.cpp:1023 > -void Node::invalidateNodeListsCacheAfterAttributeChanged(const QualifiedName& attrName) > +void Node::invalidateNodeListsCacheAfterAttributeChanged(const QualifiedName&) We should remove the argument entirely, not just have it be unused. > Source/WebCore/dom/Node.cpp:2353 > #if ENABLE(MICRODATA) > invalidateMicrodataItemListCaches(); > #endif > + > } Shouldn’t add this blank line.
Ryosuke Niwa
Comment 37 2011-12-30 23:00:02 PST
Created attachment 120828 [details] Changed per Darin's comment
Ryosuke Niwa
Comment 38 2011-12-30 23:04:19 PST
(In reply to comment #36) > (From update of attachment 120253 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=120253&action=review > > > Source/WebCore/dom/ChildNodeList.cpp:39 > > + if (m_caches->m_isLengthCacheValid) > > + return m_caches->m_length; > > It’s strange to call this “length cache” in one case, and “length” in another. I’d like to see these two data members use the same terminology. Okay, I've renamed m_length to m_cachedLength and m_lastItem to m_cachedItem to be consistent. > > Source/WebCore/dom/DynamicNodeList.h:48 > > - unsigned cachedLength; > > - Node* lastItem; > > - unsigned lastItemOffset; > > - bool isLengthCacheValid : 1; > > - bool isItemCacheValid : 1; > > + > > + unsigned m_length; > > + Node* m_lastItem; > > + unsigned m_lastItemOffset; > > + bool m_isLengthCacheValid : 1; > > + bool m_isItemCacheValid : 1; > > I don’t like this change: Given that these are public data members, they should not have the m_ prefix added. If we want to use the m_ prefix, then we should make this a class with accessor functions instead. Yeah, I didn't like that either. I've made DynamicSubtreeNodeList::SubtreeCaches not inherit from DynamicNodeList::Caches. I think it's cleaner approach overall. SubtreeCaches no longer needs to be ref-counted and all variables and member functions can be appropriately without using and other name-scope hacks. > > Source/WebCore/dom/Node.cpp:2353 > > #if ENABLE(MICRODATA) > > invalidateMicrodataItemListCaches(); > > #endif > > + > > } > > Shouldn’t add this blank line. Oops, it seems like I missed this comment. Will fix once I get the next round of reviews.
WebKit Review Bot
Comment 39 2011-12-31 00:12:11 PST
Comment on attachment 120828 [details] Changed per Darin's comment Attachment 120828 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/10954364 New failing tests: http/tests/inspector/resource-tree/resource-tree-document-url.html
Ryosuke Niwa
Comment 40 2012-01-04 13:23:22 PST
Ping reviewers.
Antti Koivisto
Comment 41 2012-01-04 14:15:15 PST
Comment on attachment 120828 [details] Changed per Darin's comment View in context: https://bugs.webkit.org/attachment.cgi?id=120828&action=review > Source/WebCore/dom/DynamicNodeList.h:109 > + void setLengthCache(Node* node, unsigned length) > + { > + if (m_isItemCacheValid && !domVersionIsConsistent()) { > + m_cachedItem = node; > + m_isItemCacheValid = false; > + } > + m_cachedLength = length; > + m_isLengthCacheValid = true; > + } You should move the functions to cpp (with inline keyword) as that is the only place they are used. > Source/WebCore/dom/DynamicNodeList.h:132 > + unsigned m_cachedLength; > + bool m_isLengthCacheValid : 1; > + bool m_isItemCacheValid : 1; You could steal the space for these bits from some other field.
Ryosuke Niwa
Comment 42 2012-01-04 15:08:56 PST
Ryosuke Niwa
Comment 43 2012-01-04 21:01:00 PST
Reopen the bug since the patch was rolled out.
Dmitry Lomov
Comment 44 2012-01-04 21:05:12 PST
Patch rolled out: apparently regressed dom-perf after being rolled into chromium: http://build.chromium.org/f/chromium/perf/linux-release/dom_perf/report.html?history=150&rev=116444
Ryosuke Niwa
Comment 45 2012-01-04 21:23:47 PST
(In reply to comment #44) > Patch rolled out: apparently regressed dom-perf after being rolled into chromium: > http://build.chromium.org/f/chromium/perf/linux-release/dom_perf/report.html?history=150&rev=116444 For whatever reason, these tests are not available publicly so I can't disclose any information as to what's going on but the high level summary is that the test is running multiple tests at once and because some tests modify DOM, we'll end up invalidating caches more often than we used to. We'll probably need per-document counter instead.
Ryosuke Niwa
Comment 46 2012-01-05 03:18:10 PST
Created attachment 121252 [details] Fixed a bug
Ryosuke Niwa
Comment 47 2012-01-05 03:24:44 PST
The perf. regression was caused by the DOM version not being updated after caching values :( I've ran dom_perf locally and took 7 samples (larger value is better): Without patch: 734 758 785 791 780 784 779 avg: 773, stdev: 20 With patch: 765 761 750 765 741 770 762 avg: 759, stdev: 10 While there's a slight decline (1.8%) in the average value, it's statistically insignificant. Furthermore, the scores indicate that we've improved DOM modifications at the cost of slight regression in DOM traversal via dynamic node list, which is exactly what we expect from this patch. Based on improvements in WebKit's performance tests posted on comment #31, I think we should proceed with this patch.
Antti Koivisto
Comment 48 2012-01-05 10:59:27 PST
Comment on attachment 121252 [details] Fixed a bug sounds like a reasonable tradeoff, r=me
Ryosuke Niwa
Comment 49 2012-01-05 13:48:53 PST
Comment on attachment 121252 [details] Fixed a bug Clearing flags on attachment: 121252 Committed r104210: <http://trac.webkit.org/changeset/104210>
Ryosuke Niwa
Comment 50 2012-01-05 13:49:01 PST
All reviewed patches have been landed. Closing bug.
Dmitry Lomov
Comment 51 2012-01-05 17:21:28 PST
Ryosuke Niwa
Comment 52 2012-01-05 17:22:33 PST
(In reply to comment #51) > dom_perf no longer regressed - thanks for the fix! > http://build.chromium.org/f/chromium/perf/linux-release/dom_perf/report.html?history=150&rev=116590 Actually, the patch hasn't been landed yet :(
Dmitry Lomov
Comment 53 2012-01-05 17:25:27 PST
(In reply to comment #52) > (In reply to comment #51) > > dom_perf no longer regressed - thanks for the fix! > > http://build.chromium.org/f/chromium/perf/linux-release/dom_perf/report.html?history=150&rev=116590 > > Actually, the patch hasn't been landed yet :( Isn't r114210 the patch? What am I missing?
Dmitry Lomov
Comment 54 2012-01-05 17:26:01 PST
(In reply to comment #53) > (In reply to comment #52) > > (In reply to comment #51) > > > dom_perf no longer regressed - thanks for the fix! > > > http://build.chromium.org/f/chromium/perf/linux-release/dom_perf/report.html?history=150&rev=116590 > > > > Actually, the patch hasn't been landed yet :( > > Isn't r114210 the patch? What am I missing? s/r114210/r104210/
Ryosuke Niwa
Comment 55 2012-01-05 17:33:27 PST
(In reply to comment #54) > (In reply to comment #53) > > (In reply to comment #52) > > > (In reply to comment #51) > > > > dom_perf no longer regressed - thanks for the fix! > > > > http://build.chromium.org/f/chromium/perf/linux-release/dom_perf/report.html?history=150&rev=116590 > > > > > > Actually, the patch hasn't been landed yet :( > > > > Isn't r114210 the patch? What am I missing? > s/r114210/r104210/ Huh, didn't notice that. Hm... it seems like I've regressed this one: http://build.chromium.org/f/chromium/perf/xp-release-dual-core/dromaeo_domcore/report.html?history=50&rev=-1&graph=dom_query_getElementsByName__not_in_document_ I think I know how to fix it though.
Ojan Vafai
Comment 56 2012-01-05 18:54:45 PST
(In reply to comment #55) > Huh, didn't notice that. Hm... it seems like I've regressed this one: http://build.chromium.org/f/chromium/perf/xp-release-dual-core/dromaeo_domcore/report.html?history=50&rev=-1&graph=dom_query_getElementsByName__not_in_document_ > > I think I know how to fix it though. Maybe roll it out while you work out a fix?
Ryosuke Niwa
Comment 57 2012-01-05 19:31:35 PST
The regression was fixed in http://trac.webkit.org/changeset/104263.
Ryosuke Niwa
Comment 58 2012-01-10 23:03:58 PST
Reopen the bug since the patch was rolled out in http://trac.webkit.org/changeset/104674 and http://trac.webkit.org/changeset/104673.
Ryosuke Niwa
Comment 59 2012-01-10 23:15:42 PST
Build fix for the revert landed in http://trac.webkit.org/changeset/104675.
Ryosuke Niwa
Comment 60 2012-06-22 19:14:56 PDT
Created attachment 149156 [details] new patch
Ryosuke Niwa
Comment 61 2012-06-22 19:15:59 PDT
6% improvement on Dromaeo/dom-modify.html Before: RESULT Dromaeo: dom-modify= 3897.79625691 runs/s median= 0.0 runs/s, stdev= 52.9873315944 runs/s, min= 3801.28571429 runs/s, max= 4016.84667667 runs/s RESULT Dromaeo: dom-modify= 3878.28725467 runs/s median= 0.0 runs/s, stdev= 66.6932298708 runs/s, min= 3774.65020127 runs/s, max= 4050.16268955 runs/s After: RESULT Dromaeo: dom-modify= 4146.1717243 runs/s median= 0.0 runs/s, stdev= 54.9684861451 runs/s, min= 4013.73249528 runs/s, max= 4276.54651536 runs/s RESULT Dromaeo: dom-modify= 4109.48266573 runs/s median= 0.0 runs/s, stdev= 68.4080593631 runs/s, min= 4000.8610192 runs/s, max= 4289.26268699 runs/s 6.6% improvement on DOM/DOMDivWalk.html Before: RESULT DOM: DOMDivWalk= 58.4666666667 ms median= 59.0 ms, stdev= 1.36788563525 ms, min= 55.8333333333 ms, max= 60.6666666667 ms RESULT DOM: DOMDivWalk= 59.4083333333 ms median= 59.0833333333 ms, stdev= 1.38089644796 ms, min= 57.5 ms, max= 63.3333333333 ms After: RESULT DOM: DOMDivWalk= 54.9 ms median= 54.8333333333 ms, stdev= 0.925562891794 ms, min= 53.5 ms, max= 58.0 ms RESULT DOM: DOMDivWalk= 55.525 ms median= 55.4166666667 ms, stdev= 1.01553899208 ms, min= 53.8333333333 ms, max= 58.0 ms No statistically significant differences on xml-parser.html and html-parser.html Before: RESULT Parser: xml-parser= 6.97092743469 runs/s median= 6.96055684455 runs/s, stdev= 0.0682343218493 runs/s, min= 6.82593856655 runs/s, max= 7.11743772242 runs/s RESULT Parser: xml-parser= 6.89537037269 runs/s median= 6.90848347453 runs/s, stdev= 0.116432899383 runs/s, min= 6.62251655629 runs/s, max= 7.05052878966 runs/s After: RESULT Parser: xml-parser= 7.06476592238 runs/s median= 7.06713780919 runs/s, stdev= 0.0613748572558 runs/s, min= 6.86498855835 runs/s, max= 7.16845878136 runs/s RESULT Parser: xml-parser= 7.05728470567 runs/s median= 7.05052878966 runs/s, stdev= 0.0293092648714 runs/s, min= 7.00116686114 runs/s, max= 7.10900473934 runs/s Before: RESULT Parser: html-parser= 1952.45 ms median= 1956.0 ms, stdev= 16.8210433684 ms, min= 1931.0 ms, max= 1985.0 ms RESULT Parser: html-parser= 1961.3 ms median= 1966.0 ms, stdev= 11.6150764096 ms, min= 1938.0 ms, max= 1982.0 ms After: RESULT Parser: html-parser= 1961.1 ms median= 1959.5 ms, stdev= 11.7426572802 ms, min= 1937.0 ms, max= 1990.0 ms RESULT Parser: html-parser= 1972.85 ms median= 1970.0 ms, stdev= 13.5951278037 ms, min= 1944.0 ms, max= 2002.0 ms
Ryosuke Niwa
Comment 62 2012-06-22 22:19:56 PDT
Comment on attachment 149156 [details] new patch Just realized that I can remove more code.
Ryosuke Niwa
Comment 63 2012-06-22 23:46:07 PDT
Created attachment 149165 [details] Removed more code
Build Bot
Comment 64 2012-06-23 00:43:49 PDT
Comment on attachment 149165 [details] Removed more code Attachment 149165 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/13030592
Ojan Vafai
Comment 65 2012-06-23 07:56:45 PDT
Comment on attachment 149165 [details] Removed more code View in context: https://bugs.webkit.org/attachment.cgi?id=149165&action=review Overall, this looks like a big improvement! This patch also uses an extra pointer per DynamicNodeList now because it adds more of them to m_listsInvalidatedAtDocument, right? I think that's fine, but the ChangeLog should state it. I don't have a good intuition for how many concurrent DynamicNodeLists a give page has. Have you instrumented gmail or facebook to get a sense of what the memory impact of this change would be? > Source/WebCore/dom/Document.cpp:3879 > + if (!attrName || (*it)->shouldInvalidateOnAttributeChange()) As the code is, it's weird for this to take a QualifiedName instead of an enum or something (e.g. ShouldClearAttributeCaches). But, I think in the future, we probably want to be even more fine-grained and only clear the attribute caches that are affected by this attribute name. Mind putting a FIXME to that effect? With the FIXME there it makes more sense to pass in a QualifiedName. > Source/WebCore/dom/Node.cpp:967 > + document()->clearNodeListCaches(&attrName); The comment on lines 950-955 probably needs to be updated.
Ryosuke Niwa
Comment 66 2012-06-23 12:23:28 PDT
Thanks for the review! (In reply to comment #65) > This patch also uses an extra pointer per DynamicNodeList now because it adds more of them to m_listsInvalidatedAtDocument, right? I think that's fine, but the ChangeLog should state it. Fair point. > I don't have a good intuition for how many concurrent DynamicNodeLists a give page has. Have you instrumented gmail or facebook to get a sense of what the memory impact of this change would be? We used to do this unintentionally until I fixed it in http://trac.webkit.org/changeset/110797 three months ago, so I don't think there will be any impact. > > Source/WebCore/dom/Document.cpp:3879 > > + if (!attrName || (*it)->shouldInvalidateOnAttributeChange()) > > As the code is, it's weird for this to take a QualifiedName instead of an enum or something (e.g. ShouldClearAttributeCaches). But, I think in the future, we probably want to be even more fine-grained and only clear the attribute caches that are affected by this attribute name. Mind putting a FIXME to that effect? With the FIXME there it makes more sense to pass in a QualifiedName. Right, that's the intention. I'll have a follow up patch that'll only invalidate node lists that match QualifiedName. I guess we can have a FIXME until then. > > Source/WebCore/dom/Node.cpp:967 > > + document()->clearNodeListCaches(&attrName); > > The comment on lines 950-955 probably needs to be updated. Will do.
Ryosuke Niwa
Comment 67 2012-06-23 13:25:58 PDT
Created attachment 149179 [details] Fixed Mac build & per Ojan's comment
Ryosuke Niwa
Comment 68 2012-06-23 13:33:30 PDT
Created attachment 149180 [details] Updated ChangeLog & reverted xcodeproj change
Ryosuke Niwa
Comment 69 2012-06-23 20:13:26 PDT
WebKit Review Bot
Comment 70 2012-06-24 17:35:56 PDT
Re-opened since this is blocked by 89841
Ryosuke Niwa
Comment 71 2012-06-24 17:40:47 PDT
I'm rolling out the patch because it caused 85-90% regressions on Dromaeo/jslib-modify-jquery.html http://webkit-perf.appspot.com/graph.html#tests=[[41019,2001,173262],[41019,2001,32196]]&sel=1340303912887.2336,1340533472011.3213,0,13125&displayrange=7&datatype=running Turns out jquery depends on our current optimization :( On one hand, the very perf test framework I added caught this regression so I should be happy but on the other hand, this would mean that we can never fix this bug so I'm sad.
Ryosuke Niwa
Comment 72 2012-06-24 17:49:33 PDT
Note 85-90% regression means that our code has gotten roughly 10 times slower (not as twice as slower).
Kentaro Hara
Comment 73 2012-06-24 19:49:59 PDT
(In reply to comment #72) > Note 85-90% regression means that our code has gotten roughly 10 times slower (not as twice as slower). We thought that this might be a bug of GC (bug 85162 and 88739), but it does not seem to be a bug of GC. As you pointed out, jQuery depends on our current optimization, and your patch ended up having jQuery.domManip() call jQuery.clone() every time (PerformanceTests/Dromaeo/resources/dromaeo/web/lib/jquery-1.6.4.js, line 5857). So the reason is not that GC is doing something wrong, but that with your patch jQuery calls jQuery.clone() a lot of times, which causes the memory bloat.
Ryosuke Niwa
Comment 74 2012-06-24 20:01:00 PDT
(In reply to comment #73) > (In reply to comment #72) > > Note 85-90% regression means that our code has gotten roughly 10 times slower (not as twice as slower). > > We thought that this might be a bug of GC (bug 85162 and 88739), but it does not seem to be a bug of GC. You're right but my point on g+ is that the fact calling clone() more ends up reducing the overall memory usage (the tests stopped crashing when this regression was introduced) means that the bugs 85162 and 88739 are indeed some problems in GC.
Kentaro Hara
Comment 75 2012-06-24 20:40:10 PDT
(In reply to comment #74) > You're right but my point on g+ is that the fact calling clone() more ends up reducing the overall memory usage (the tests stopped crashing when this regression was introduced) means that the bugs 85162 and 88739 are indeed some problems in GC. I am a bit confused. In my understanding: (1) Before r121106, jslib-modify-jquery.html was working fine. (2) After r121106, jslib-modify-jquery.html caused memory bloat. Some bots start crashing due to the memory bloat. (3) After reverting r121106 back, jslib-modify-jquery.html is working fine again. Where is a problem?
Ryosuke Niwa
Comment 76 2012-06-24 20:41:49 PDT
(In reply to comment #75) > (In reply to comment #74) > > You're right but my point on g+ is that the fact calling clone() more ends up reducing the overall memory usage (the tests stopped crashing when this regression was introduced) means that the bugs 85162 and 88739 are indeed some problems in GC. > > I am a bit confused. In my understanding: > > (1) Before r121106, jslib-modify-jquery.html was working fine. > (2) After r121106, jslib-modify-jquery.html caused memory bloat. Some bots start crashing due to the memory bloat. > (3) After reverting r121106 back, jslib-modify-jquery.html is working fine again. No. Before r121106, jslib-modify-jquery.html was crashing due to the memory bloat. After r121106 until it was landed, the test was running fine.
Ryosuke Niwa
Comment 77 2012-06-24 20:42:36 PDT
(In reply to comment #76) > No. Before r121106, jslib-modify-jquery.html was crashing due to the memory bloat. After r121106 until it was landed, the test was running fine. I meant to say that the test started running fine after r121106 until it was rolled out.
Kentaro Hara
Comment 78 2012-06-24 23:31:43 PDT
I confirmed that the memory bloat is still happening with r121106. (1) Before r121106, jslib-modify-jquery.html was crashing due to the memory bloat. (2) After r121106 and until r121106 is reverted, jslib-modify-jquery.html was working fine, although the performance regressed by 10x. Our question was why r121106 fixed the memory bloat. Actually, the memory bloat was happening even after r121106. However, since r121106 regressed performance by 10x, jslib-modify-jquery.html completed without reaching the memory bloat. In other words, r121106 was slowly causing the memory bloat. I confirmed that if I run jslib-modify-jquery.html for a long time, then r121106 also causes the memory bloat. So the current problem can be summarized into the following two: [1] r121106 regressed performance by 10x. [2] (Regardless of with or without r121106,) jslib-modify-jquery.html causes the memory bloat. Regarding [1], we already know the reason; i.e. jQuery is assuming our current optimization and not expecting rniwa's patch, unfortunately. Regarding [2], it seems that the memory bloat is just because jQuery clones too much nodes. I inserted window.internals.numberOfLiveNodes() in jslib-modify-jquery.html, and confirmed that the number of live nodes are monotonically increases both in Chromium/Linux and AppleWebKit/Mac. It seems that this is not a problem of GC; this is just a problem of jQuery cloning too much nodes.
Ryosuke Niwa
Comment 79 2012-06-25 10:17:10 PDT
Unfortunately, the only thing we can do to address this bug is to fix the bug 89635 to speed-up the access to NodeListsNodeData at this point.
Elliott Sprehn
Comment 80 2012-11-07 16:05:09 PST
(In reply to comment #79) > Unfortunately, the only thing we can do to address this bug is to fix the bug 89635 to speed-up the access to NodeListsNodeData at this point. The rare data map was removed in Bug 100057 which means Bug 89635 is resolved. I assume that means this is too?
Ryosuke Niwa
Comment 81 2012-11-07 16:23:39 PST
(In reply to comment #80) > (In reply to comment #79) > > Unfortunately, the only thing we can do to address this bug is to fix the bug 89635 to speed-up the access to NodeListsNodeData at this point. > > The rare data map was removed in Bug 100057 which means Bug 89635 is resolved. I assume that means this is too? Probably. Try profiling it?
Elliott Sprehn
Comment 82 2012-11-07 18:10:28 PST
(In reply to comment #81) > (In reply to comment #80) > > (In reply to comment #79) > > > Unfortunately, the only thing we can do to address this bug is to fix the bug 89635 to speed-up the access to NodeListsNodeData at this point. > > > > The rare data map was removed in Bug 100057 which means Bug 89635 is resolved. I assume that means this is too? > > Probably. Try profiling it? Ran the profile and the node list methods don't show up in the trace: Running Time Self Symbol Name 396.0ms 4.0% 396.0 WebCore::JSNodeOwner::isReachableFromOpaqueRoots(JSC::Handle<JSC::Unknown>, void*, JSC::SlotVisitor&) 377.0ms 3.8% 377.0 void WebCore::removeAllChildrenInContainer<WebCore::Node, WebCore::ContainerNode>(WebCore::ContainerNode*) 272.0ms 2.7% 272.0 JSC::WeakBlock::visit(JSC::HeapRootVisitor&) 269.0ms 2.7% 269.0 WTF::KeyValuePair<WTF::AtomicStringImpl*, WebCore::JSDOMWrapper* (*)(JSC::ExecState*, WebCore::JSDOMGlobalObject*, WTF::PassRefPtr<WebCore::HTMLElement>)>* WTF::HashTable<WTF::AtomicStringImpl*, WTF::KeyValuePair<WTF::AtomicStringImpl*, WebCore::JSDOMWrapper* (*)(JSC::ExecState*, WebCore::JSDOMGlobalObject*, WTF::PassRefPtr<WebCore::HTMLElement>)>, WTF::KeyValuePairKeyExtractor<WTF::KeyValuePair<WTF::AtomicStringImpl*, WebCore::JSDOMWrapper* (*)(JSC::ExecState*, WebCore::JSDOMGlobalObject*, WTF::PassRefPtr<WebCore::HTMLElement>)> >, WTF::PtrHash<WTF::AtomicStringImpl*>, WTF::HashMapValueTraits<WTF::HashTraits<WTF::AtomicStringImpl*>, WTF::HashTraits<WebCore::JSDOMWrapper* (*)(JSC::ExecState*, WebCore::JSDOMGlobalObject*, WTF::PassRefPtr<WebCore::HTMLElement>)> >, WTF::HashTraits<WTF::AtomicStringImpl*> >::lookup<WTF::IdentityHashTranslator<WTF::PtrHash<WTF::AtomicStringImpl*> >, WTF::AtomicStringImpl*>(WTF::AtomicStringImpl* const&) 256.0ms 2.6% 256.0 WebCore::jsDocumentPrototypeFunctionCreateElement(JSC::ExecState*) 241.0ms 2.4% 241.0 _ZN7WebCoreL24updateTreeAfterInsertionEPNS_13ContainerNodeEPNS_4NodeEb 204.0ms 2.0% 204.0 WTF::AtomicString::addSlowCase(WTF::StringImpl*) 189.0ms 1.9% 189.0 WebCore::StyledElement::StyledElement(WebCore::QualifiedName const&, WebCore::Document*, WebCore::Node::ConstructionType) 186.0ms 1.9% 186.0 _ZN7WebCoreL28createHTMLSpanElementWrapperEPN3JSC9ExecStateEPNS_17JSDOMGlobalObjectEN3WTF10PassRefPtrINS_11HTMLElementEEE
Ryosuke Niwa
Comment 83 2012-11-07 20:02:26 PST
(In reply to comment #82) > (In reply to comment #81) > > (In reply to comment #80) > > > (In reply to comment #79) > > > > Unfortunately, the only thing we can do to address this bug is to fix the bug 89635 to speed-up the access to NodeListsNodeData at this point. > > > > > > The rare data map was removed in Bug 100057 which means Bug 89635 is resolved. I assume that means this is too? > > > > Probably. Try profiling it? > > Ran the profile and the node list methods don't show up in the trace: Super!
Note You need to log in before you can comment on or make changes to this bug.