Bug 73853

Summary: Inserting nodes is slow due to Node::notifyNodeListsAttributeChanged (20%+)
Product: WebKit Reporter: Ryosuke Niwa <rniwa>
Component: DOMAssignee: Ryosuke Niwa <rniwa>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, arv, catfish.man, darin, dglazkov, dslomov, esprehn, haraken, kbr, kling, koivisto, ojan, rafael.lobo, sam, tkent, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 73737, 73961, 73969, 74692, 75600, 89036, 89730, 89841, 100057    
Bug Blocks:    
Attachments:
Description Flags
perf test
none
profile result
none
diff I used to collect the stat.
none
updated stat collecting diff
none
Raw data for hit rate in DynamicNodeList::item (_ for misses, | for hits, and + for hits that will be missed in my proposal)
none
work in progress
none
Patch
none
kling's approach
none
Changed per Darin's comment
none
Fixed a bug
none
new patch
none
Removed more code
none
Fixed Mac build & per Ojan's comment
none
Updated ChangeLog & reverted xcodeproj change andersca: review+

Description Ryosuke Niwa 2011-12-05 12:27:11 PST
I've done more profiling on Shark, and it appears that we're spending 20%+ of the time inside Node::notifyNodeListsAttributeChanged.

We should use some heuristics to speed up common cases.
Comment 1 Ryosuke Niwa 2011-12-05 12:28:59 PST
Created attachment 117915 [details]
perf test
Comment 2 Ryosuke Niwa 2011-12-05 12:34:44 PST
Created attachment 117919 [details]
profile result

20.7%   20.7%   WebCore    WebCore::Node::notifyNodeListsAttributeChanged()
0.0%    20.6%   WebCore      WebCore::Node::dispatchSubtreeModifiedEvent()
0.0%    0.1%    WebCore      WebCore::ContainerNode::insertBefore(WTF::PassRefPtr<WebCore::Node>, WebCore::Node*, int&, bool)
0.0%    0.0%    WebCore      WebCore::ContainerNode::appendChild(WTF::PassRefPtr<WebCore::Node>, int&, bool)
3.3%    3.3%    WebCore    WebCore::QualifiedName::init(WTF::AtomicString const&, WTF::AtomicString const&, WTF::AtomicString const&)
3.3%    3.3%    WebCore    WebCore::JSNodeOwner::isReachableFromOpaqueRoots(JSC::Handle<JSC::Unknown>, void*, JSC::SlotVisitor&)
2.8%    2.8%    WebCore    WebCore::JSNodeOwner::finalize(JSC::Handle<JSC::Unknown>, void*)
2.8%    2.8%    WebCore    WebCore::createHTMLSpanElementWrapper(JSC::ExecState*, WebCore::JSDOMGlobalObject*, WTF::PassRefPtr<WebCore::HTMLElement>)
2.6%    2.6%    JavaScriptCore WTF::AtomicString::addSlowCase(WTF::StringImpl*)
2.0%    2.0%    WebCore    WTF::HashMap<WTF::AtomicStringImpl*, WTF::PassRefPtr<WebCore::HTMLElement> (*)(WebCore::QualifiedName const&, WebCore::Document*, WebCore::HTMLFormElement*, bool), WTF::PtrHash<WTF::AtomicStringImpl*>, WTF::HashTraits<WTF::AtomicStringImpl*>, WTF::HashTraits<WTF::PassRefPtr<WebCore::HTMLElement> (*)(WebCore::QualifiedName const&, WebCore::Document*, WebCore::HTMLFormElement*, bool)> >::get(WTF::AtomicStringImpl* const&) const
1.9%    1.9%    WebCore    WTF::HashMap<WTF::AtomicStringImpl*, WebCore::JSDOMWrapper* (*)(JSC::ExecState*, WebCore::JSDOMGlobalObject*, WTF::PassRefPtr<WebCore::HTMLElement>), WTF::PtrHash<WTF::AtomicStringImpl*>, WTF::HashTraits<WTF::AtomicStringImpl*>, WTF::HashTraits<WebCore::JSDOMWrapper* (*)(JSC::ExecState*, WebCore::JSDOMGlobalObject*, WTF::PassRefPtr<WebCore::HTMLElement>)> >::get(WTF::AtomicStringImpl* const&) const
1.8%    1.8%    WebCore    WebCore::jsDocumentPrototypeFunctionCreateElement(JSC::ExecState*)
Comment 3 Sam Weinig 2011-12-05 16:20:29 PST
If this is the call to notifyNodeListsAttributeChanged() from Node::dispatchSubtreeModifiedEvent() (and I think it probably is) there is this comment:

    notifyNodeListsAttributeChanged(); // FIXME: Can do better some day. Really only care about the name attribute changing.

Not sure if that is still true, but it is telling.

The most obvious optimization that comes to mind is keeping a bit on the Document noting if there are any NodeLists that need to be invalidated (like we do with certain event listeners).  The effectiveness of that optimization would be dependent on how many pages use NamedNodeLists and/or ClassNodeList.

A better approach is probably to look at the callers to dispatchSubtreeModifiedEvent(), and see how many of them know that these caches don't need to be invalidated.  The one in CharacterData, for instance, probably knows that those node lists can't have changed.
Comment 4 Ryosuke Niwa 2011-12-05 16:35:27 PST
Thanks for the comment, Sam!

(In reply to comment #3)
>     notifyNodeListsAttributeChanged(); // FIXME: Can do better some day. Really only care about the name attribute changing.
> 
> Not sure if that is still true, but it is telling.

Yes. This is exactly the issue. We're walking up the tree (ancestors) per every node insertion and removal.

> The most obvious optimization that comes to mind is keeping a bit on the Document noting if there are any NodeLists that need to be invalidated (like we do with certain event listeners).  The effectiveness of that optimization would be dependent on how many pages use NamedNodeLists and/or ClassNodeList.

We already do this (m_numNodeListCaches in TreeScope) but turned out be not so useful because almost every single web page uses querySelector/getElementById, etc...

> A better approach is probably to look at the callers to dispatchSubtreeModifiedEvent(), and see how many of them know that these caches don't need to be invalidated.  The one in CharacterData, for instance, probably knows that those node lists can't have changed.

That's a good point. I hadn't thought about this case.
Comment 5 Alexey Proskuryakov 2011-12-05 22:58:38 PST
See also: bug 70810. This code really needs attention.
Comment 6 Ryosuke Niwa 2011-12-06 12:07:40 PST
I took some statistics on what kind of node lists are used in wild (again, gmail, facebook, twitter, wordpress.com, etc...):

names: 3
0: 98.559%
4: 1.269%
2: 0.172%

classes: 3
0: 75.054%
1: 8.839%
2: 8.581%
10: 2.602%
200: 1.484%
4: 1.054%
6: 0.946%
100: 0.430%
3: 0.323%
9: 0.280%

labels: 3
0: 100.000%

childs: 3
0: 44.581%
10: 10.129%
2: 7.699%
1: 6.839%
4: 5.570%
30: 5.312%
20: 2.946%
3: 2.387%
300: 1.849%
6: 1.656%

tags: 3
4: 22.258%
0: 17.806%
10: 9.892%
6: 7.054%
20: 6.215%
40: 5.849%
30: 3.376%
7: 3.204%
3: 2.989%
5: 2.946%
Comment 7 Ryosuke Niwa 2011-12-06 12:08:09 PST
The sample size was 4650.
Comment 8 Ryosuke Niwa 2011-12-06 12:16:50 PST
Created attachment 118087 [details]
diff I used to collect the stat.
Comment 9 Sam Weinig 2011-12-06 14:49:46 PST
Can you explain what those numbers mean.  I am a bit confused by them.
Comment 10 Ryosuke Niwa 2011-12-06 14:56:05 PST
I have a hypothesis that we can just invalidate all caches whenever DOM mutation happens. In this world, each document will have a hashmap from node -> node list (probably excluding childnodes), and we'll remove all caches whenever a DOM mutation happens.

I've collected 191220 samples while browsing websites like Gmail, Google+, Twitter, Facebook, etc...

total:  191220
hits: 185772 (97.151%)
misses: 3559 (1.861%)
hits that will be missed under this approach: 1889 (0.988%)
Comment 11 Ryosuke Niwa 2011-12-06 14:57:58 PST
(In reply to comment #9)
> Can you explain what those numbers mean.  I am a bit confused by them.

Oh sorry I guess I wasn't clear about this. These are numbers of instances that were alive at a time of DOM mutation (when dispatchChildInsertionEvents is called) and the percentage next to it shows the frequency of the number of alive objects for each type.
Comment 12 Ryosuke Niwa 2011-12-06 15:14:48 PST
Created attachment 118119 [details]
updated stat collecting diff
Comment 13 Ryosuke Niwa 2011-12-06 15:19:34 PST
Created attachment 118121 [details]
Raw data for hit rate in DynamicNodeList::item (_ for misses, | for hits, and + for hits that will be missed in my proposal)
Comment 14 Ojan Vafai 2011-12-06 15:31:45 PST
(In reply to comment #10)
> I have a hypothesis that we can just invalidate all caches whenever DOM mutation happens. In this world, each document will have a hashmap from node -> node list (probably excluding childnodes), and we'll remove all caches whenever a DOM mutation happens.
> 
> I've collected 191220 samples while browsing websites like Gmail, Google+, Twitter, Facebook, etc...
> 
> total:  191220
> hits: 185772 (97.151%)
> misses: 3559 (1.861%)
> hits that will be missed under this approach: 1889 (0.988%)

Does this exclude childNodes nodelists?
Comment 15 Ryosuke Niwa 2011-12-06 15:33:58 PST
(In reply to comment #14)
> (In reply to comment #10)
> > I have a hypothesis that we can just invalidate all caches whenever DOM mutation happens. In this world, each document will have a hashmap from node -> node list (probably excluding childnodes), and we'll remove all caches whenever a DOM mutation happens.
> > 
> > I've collected 191220 samples while browsing websites like Gmail, Google+, Twitter, Facebook, etc...
> > 
> > total:  191220
> > hits: 185772 (97.151%)
> > misses: 3559 (1.861%)
> > hits that will be missed under this approach: 1889 (0.988%)
> 
> Does this exclude childNodes nodelists?

Yes. ChildNodeList::item overrides DynamicNodeList's item.
Comment 16 Ryosuke Niwa 2011-12-17 11:55:54 PST
Created attachment 119733 [details]
work in progress
Comment 17 Ryosuke Niwa 2011-12-19 12:23:49 PST
Good and bad news everyone:

The good news is that my patch have successfully eliminated invalidateCaches from the profiler. The bad news is that new profile result indicates our hashmap/hashset may not be as efficient as it should be. In fact, we seem to spend 30-50% of time in hashmap/hashset at the moment. And the time we spend in them increase linearly with respect to the number of nodes inserted.

New profile result:
	4.5%	238	WebCore	WebCore::createHTMLSpanElementWrapper(JSC::ExecState*, WebCore::JSDOMGlobalObject*, WTF::PassRefPtr<WebCore::HTMLElement>)
	0.0%	237	WebCore	 WebCore::createJSHTMLWrapper(JSC::ExecState*, WebCore::JSDOMGlobalObject*, WTF::PassRefPtr<WebCore::HTMLElement>)
	0.0%	237	WebCore	  WebCore::toJSNewlyCreated(JSC::ExecState*, WebCore::JSDOMGlobalObject*, WebCore::Element*)
	0.0%	237	WebCore	   WebCore::jsDocumentPrototypeFunctionCreateElement(JSC::ExecState*)
	0.0%	237	Unknown Library	    0x239a85c01538 [unknown]
	0.0%	1	WebCore	 WebCore::toJSNewlyCreated(JSC::ExecState*, WebCore::JSDOMGlobalObject*, WebCore::Element*)
	4.0%	213	WebCore	WebCore::JSNodeOwner::finalize(JSC::Handle<JSC::Unknown>, void*)
	0.0%	213	JavaScriptCore	 JSC::HandleHeap::finalizeWeakHandles()
	0.0%	4	WebProcess	  0xbea [unreadable]
	4.0%	212	WebCore	WebCore::QualifiedName::init(WTF::AtomicString const&, WTF::AtomicString const&, WTF::AtomicString const&)
	0.0%	207	WebCore	 WebCore::HTMLDocument::createElement(WTF::AtomicString const&, int&)
	0.0%	207	WebCore	  WebCore::jsDocumentPrototypeFunctionCreateElement(JSC::ExecState*)
	0.0%	207	Unknown Library	   0x239a85c01538 [unknown]
	0.0%	5	WebCore	 WebCore::jsDocumentPrototypeFunctionCreateElement(JSC::ExecState*)
	3.1%	166	JavaScriptCore	WTF::AtomicString::addSlowCase(WTF::StringImpl*)
	0.0%	153	Unknown Library	 0x1140469f8 [unknown]
	0.0%	8	Unknown Library	 0x239a85c01538 [unknown]
	0.0%	3	WebCore	 WebCore::jsDocumentPrototypeFunctionCreateElement(JSC::ExecState*)
	0.0%	1	WebCore	 WebCore::HTMLDocument::createElement(WTF::AtomicString const&, int&)
	0.0%	1	Unknown Library	 0x1140469d0 [unknown]
	3.1%	164	WebCore	WebCore::JSNodeOwner::isReachableFromOpaqueRoots(JSC::Handle<JSC::Unknown>, void*, JSC::SlotVisitor&)
	0.0%	161	JavaScriptCore	 JSC::HandleHeap::visitWeakHandles(JSC::HeapRootVisitor&)
	0.0%	3	WebProcess	 0x1000000 [unreadable]
	2.3%	124	WebCore	WebCore::jsDocumentPrototypeFunctionCreateElement(JSC::ExecState*)
	2.1%	114	JavaScriptCore	WTF::TCMalloc_Central_FreeList::FetchFromSpansSafe()
	2.0%	106	WebCore	WebCore::TreeScope::clearNodeListCaches(WebCore::DynamicSubtreeNodeListType)
	0.0%	83	WebCore	 WebCore::TreeScope::clearNodeListCachesForAllTypes()
	0.0%	23	WebCore	 WebCore::Element::childrenChanged(bool, WebCore::Node*, WebCore::Node*, int)
	1.9%	102	WebCore	WTF::HashMap<WTF::AtomicStringImpl*, WTF::PassRefPtr<WebCore::HTMLElement> (*)(WebCore::QualifiedName const&, WebCore::Document*, WebCore::HTMLFormElement*, bool), WTF::PtrHash<WTF::AtomicStringImpl*>, WTF::HashTraits<WTF::AtomicStringImpl*>, WTF::HashTraits<WTF::PassRefPtr<WebCore::HTMLElement> (*)(WebCore::QualifiedName const&, WebCore::Document*, WebCore::HTMLFormElement*, bool)> >::get(WTF::AtomicStringImpl* const&) const
	1.9%	102	WebCore	WebCore::ContainerNode::insertBefore(WTF::PassRefPtr<WebCore::Node>, WebCore::Node*, int&, bool)
	0.0%	99	WebCore	 WebCore::Node::insertBefore(WTF::PassRefPtr<WebCore::Node>, WebCore::Node*, int&, bool)
	0.0%	99	WebCore	  WebCore::JSNode::insertBefore(JSC::ExecState*)
	0.0%	99	WebCore	   WebCore::jsNodePrototypeFunctionInsertBefore(JSC::ExecState*)
	0.0%	3	WebCore	 WebCore::JSNode::insertBefore(JSC::ExecState*)
	1.9%	99	JavaScriptCore	WTF::StringImpl::lower()
	1.9%	99	WebCore	WTF::HashMap<JSC::ClassInfo const*, JSC::WriteBarrier<JSC::Structure>, WTF::PtrHash<JSC::ClassInfo const*>, WTF::HashTraits<JSC::ClassInfo const*>, WTF::HashTraits<JSC::WriteBarrier<JSC::Structure> > >::get(JSC::ClassInfo const* const&) const
	1.7%	93	WebCore	WebCore::ContainerNode::appendChild(WTF::PassRefPtr<WebCore::Node>, int&, bool)
	1.7%	89	WebCore	void WebCore::Private::addChildNodesToDeletionQueue<WebCore::Node, WebCore::ContainerNode>(WebCore::Node*&, WebCore::Node*&, WebCore::ContainerNode*)
	1.7%	88	WebCore	WebCore::HTMLDocument::createElement(WTF::AtomicString const&, int&)
	1.6%	87	JavaScriptCore	WTF::fastMalloc(unsigned long)
	1.4%	75	WebCore	WebCore::notifyChildInserted(WebCore::Node*)
Comment 18 Ryosuke Niwa 2011-12-20 10:51:31 PST
Here are some performance test results on r103144:

PerformanceTests/Parser/xml-parser.html (2% improvement)
Without patch:
avg 1969.4
median 1953.5
stdev 25.63669245437094
min 1941
max 2002

With patch:
avg 1935.4
median 1921
stdev 26.115895542753268
min 1903
max 1969

PerformanceTests/Parser/tiny-innerHTML.html (5% improvement)
Without patch:
avg 2284.15
median 2281.5
stdev 15.919406395968412
min 2257
max 2321

With patch:
avg 2172.5
median 2170
stdev 13.966030216206752
min 2155
max 2215
Comment 19 Darin Adler 2011-12-20 11:09:01 PST
(In reply to comment #17)
> The bad news is that new profile result indicates our hashmap/hashset may not be as efficient as it should be. In fact, we seem to spend 30-50% of time in hashmap/hashset at the moment. And the time we spend in them increase linearly with respect to the number of nodes inserted.

I’m not sure what performance you are expecting from the HashMap. I believe that reading from it should be somewhere between O(1) and O(log n). Thus reading from it n times should be between O(n) and O(n log n). So if you are saying that time in the HashMap function is linear, that’s exactly what’s expected: O(n).

We can achieve further speed by making a data structure that’s more specialized than a hash table for the operation in question. For example, for the HTML element factory, if we know the frequency of various types of tags, we can do a linear search in an array that is sorted by frequency instead of the hash map or as a first pass before falling back on it.

I find all that time spent in StringImpl::lower() alarming. And I’m quite interested in the WTF::AtomicString::addSlowCase time as well. There’s a lot of juicy interesting things that could possibly be sped up here.
Comment 20 Ryosuke Niwa 2011-12-20 11:24:14 PST
Unfortunately, there appears to be 3% regression in PerformanceTests/Parser/html-parser.html

Without patch:
avg 21238.6
median 21242.5
stdev 42.886361468420226
min 21168
max 21322

With patch:
avg 21714.15
median 21710.5
stdev 61.88317622746912
min 21614
max 21847

However, profile result doesn't tell us why this regression had happened. The regression doesn't appear to be a test flake given I've tried a couple of times and got a similar result.

Here's profile result:

5.5%	5.5%	WebCore	WebCore::HTMLTokenizer::nextToken(WebCore::SegmentedString&, WebCore::HTMLToken&)
3.0%	3.0%	WebCore	WebCore::CSSStyleSelector::matchRulesForList(WTF::Vector<WebCore::RuleData, 0ul> const*, int&, int&, bool)
1.9%	1.9%	JavaScriptCore	WTF::fastFree(void*)
1.8%	1.8%	JavaScriptCore	WTF::fastMalloc(unsigned long)
1.4%	1.4%	WebCore	WebCore::CSSStyleSelector::applyProperty(int, WebCore::CSSValue*)
1.3%	1.3%	WebCore	void WebCore::Private::addChildNodesToDeletionQueue<WebCore::Node, WebCore::ContainerNode>(WebCore::Node*&, WebCore::Node*&, WebCore::ContainerNode*)
1.3%	1.3%	WebCore	WebCore::MarkupTokenizerBase<WebCore::HTMLToken, WebCore::HTMLTokenizerState>::InputStreamPreprocessor::advance(WebCore::SegmentedString&, int&)
1.2%	1.2%	WebCore	void WebCore::removeAllChildrenInContainer<WebCore::Node, WebCore::ContainerNode>(WebCore::ContainerNode*)
1.1%	1.1%	JavaScriptCore	WTF::AtomicString::add(unsigned short const*, unsigned int)
1.1%	1.1%	JavaScriptCore	WTF::AtomicString::addSlowCase(WTF::StringImpl*)
1.1%	1.1%	WebCore	WebCore::SelectorChecker::checkOneSelector(WebCore::CSSSelector*, WebCore::Element*, WebCore::PseudoId&, bool, WebCore::SelectorChecker::VisitedMatchType, WebCore::RenderStyle*, WebCore::RenderStyle*) const
0.9%	0.9%	WebCore	WebCore::visitedLinkHash(WebCore::KURL const&, WTF::AtomicString const&)
0.9%	0.9%	JavaScriptCore	WTF::TCMalloc_Central_FreeList::FetchFromSpansSafe()
0.9%	0.9%	WebCore	WebCore::HTMLTreeBuilder::processStartTagForInBody(WebCore::AtomicHTMLToken&)
0.9%	0.9%	WebCore	WebCore::CSSStyleSelector::adjustRenderStyle(WebCore::RenderStyle*, WebCore::RenderStyle*, WebCore::Element*)
0.7%	0.7%	WebCore	WTF::HashMap<WebCore::RenderBoxModelObject const*, WebCore::RenderBoxModelObject*, WTF::PtrHash<WebCore::RenderBoxModelObject const*>, WTF::HashTraits<WebCore::RenderBoxModelObject const*>, WTF::HashTraits<WebCore::RenderBoxModelObject*> >::get(WebCore::RenderBoxModelObject const* const&) const
0.7%	0.7%	WebCore	WebCore::HTMLConstructionSite::insertTextNode(WTF::String const&, WebCore::WhitespaceMode)
0.7%	0.7%	libSystem.B.dylib	__memcpy
0.7%	0.7%	WebCore	WebCore::CSSStyleSelector::styleForElement(WebCore::Element*, WebCore::RenderStyle*, bool, bool, WebCore::RenderRegion*)
0.7%	0.7%	JavaScriptCore	WTF::TCMalloc_Central_FreeList::RemoveRange(void**, void**, int*)
0.7%	0.7%	WebCore	WebCore::SelectorChecker::fastCheckSelector(WebCore::CSSSelector const*, WebCore::Element const*) const
0.7%	0.7%	WebCore	WebCore::SegmentedString::operator=(WebCore::SegmentedString const&)
0.6%	0.6%	JavaScriptCore	WTF::TCMalloc_Central_FreeList::ReleaseListToSpans(void*)
0.6%	0.6%	WebCore	WebCore::RenderObject::RenderObject(WebCore::Node*)
0.6%	0.6%	WebCore	void WebCore::CSSStyleSelector::applyDeclaration<true>(WebCore::CSSMutableStyleDeclaration*, bool, bool)
0.6%	0.6%	WebCore	WebCore::CSSStyleSelector::matchRules(WebCore::RuleSet*, int&, int&, bool)
0.6%	0.6%	WebCore	WTF::HashMap<WTF::AtomicStringImpl*, WTF::OwnPtr<WTF::Vector<WebCore::RuleData, 0ul> >, WTF::PtrHash<WTF::AtomicStringImpl*>, WTF::HashTraits<WTF::AtomicStringImpl*>, WTF::HashTraits<WTF::OwnPtr<WTF::Vector<WebCore::RuleData, 0ul> > > >::get(WTF::AtomicStringImpl* const&) const
0.6%	0.6%	WebCore	WebCore::ContainerNode::detach()
0.6%	0.6%	libSystem.B.dylib	pthread_getspecific
0.6%	0.6%	WebCore	WebCore::QualifiedName::init(WTF::AtomicString const&, WTF::AtomicString const&, WTF::AtomicString const&)
0.6%	0.6%	WebCore	WebCore::AtomicHTMLToken::AtomicHTMLToken(WebCore::HTMLToken&)
0.5%	0.5%	WebCore	WebCore::RenderObjectChildList::appendChildNode(WebCore::RenderObject*, WebCore::RenderObject*, bool)
0.5%	0.5%	WebCore	WebCore::Text::~Text()
0.5%	0.5%	WebCore	WebCore::HTMLSourceTracker::start(WebCore::HTMLInputStream const&, WebCore::HTMLToken&)
0.5%	0.5%	WebCore	WebCore::Element::setAttributeMap(WTF::PassRefPtr<WebCore::NamedNodeMap>, WebCore::FragmentScriptingPermission)
0.5%	0.5%	WebCore	WebCore::mergeDoubleSlashes(WTF::Vector<unsigned short, 512ul>&, unsigned long)
0.5%	0.5%	WebCore	WebCore::RenderObject::setStyle(WTF::PassRefPtr<WebCore::RenderStyle>)
0.5%	0.5%	WebCore	WebCore::HTMLTreeBuilder::constructTreeFromAtomicToken(WebCore::AtomicHTMLToken&)
0.5%	0.5%	WebCore	WebCore::RenderText::RenderText(WebCore::Node*, WTF::PassRefPtr<WTF::StringImpl>)
0.5%	0.5%	WebCore	WebCore::Document::removeAllEventListeners()
0.5%	0.5%	WebCore	WebCore::HTMLTreeBuilder::constructTreeFromToken(WebCore::HTMLToken&)
Comment 21 Ryosuke Niwa 2011-12-20 11:41:06 PST
(In reply to comment #19)
> (In reply to comment #17)
> > The bad news is that new profile result indicates our hashmap/hashset may not be as efficient as it should be. In fact, we seem to spend 30-50% of time in hashmap/hashset at the moment. And the time we spend in them increase linearly with respect to the number of nodes inserted.
> 
> I’m not sure what performance you are expecting from the HashMap. I believe that reading from it should be somewhere between O(1) and O(log n). Thus reading from it n times should be between O(n) and O(n log n). So if you are saying that time in the HashMap function is linear, that’s exactly what’s expected: O(n).

No, what I'm saying is that the number of lookup time appears to be increasing with respect to the number of nodes in the hashmap/hashset.

To give you more concrete idea, download https://bug-73737-attachments.webkit.org/attachment.cgi?id=117785 and open it on ToT WebKit. You get results like:

Shallow tree: 88 ms
Deep tree (20+): 93 ms
Deeper tree (50+): 143 ms

Now, reverse the order in which we run tests; e.g. replace lines 44-46 by:
document.getElementById('shallow').innerHTML = repeatTest(0);
document.getElementById('deep').innerHTML = repeatTest(20);
document.getElementById('deeper').innerHTML = repeatTest(50);

Then if you open it again in ToT WebKit, you get results like:

Shallow tree: 104 ms
Deep tree (20+): 84 ms
Deeper tree (50+): 86 ms

There might be some flakes but the overall, this is the trend I get if I repeat the experiment over and over.
Comment 22 Ryosuke Niwa 2011-12-20 12:13:58 PST
Created attachment 120057 [details]
Patch
Comment 23 Ryosuke Niwa 2011-12-20 12:21:56 PST
Here's my patch. Given the regression in PerformanceTests/Parser/html-parser.html and the amount of code I'm adding, I'm not certain if this is a total win. It might be better to just have per-tree-scope cache counter but not manage node list themselves at the tree scope so that we can detect cases where we don't have any valid caches.

Having said that, this patch eliminates O(n) performance in many cases and clearly improves the result of some performance tests so I'll leave it for other reviewers to decide whether this is a good patch or not.
Comment 24 Ryosuke Niwa 2011-12-20 17:49:10 PST
kling pointed out that I can lazily invalidate the node list cache by recording DOMVersion and comparing it when we access the cache next time. That sounds like a really good good idea. I'll try that out next week (I'll be stuck "gardening" WebKit for the remainder of the week).
Comment 25 Darin Adler 2011-12-21 10:15:03 PST
Comment on attachment 120057 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=120057&action=review

Looks OK. I wasn’t able to think through everything about the data structure in detail. I would have liked to see a more thorough test, though. Given how much different code there is in here I’d like to see something that covers more of the code for different types, for example.

> Source/WebCore/dom/DynamicNodeList.cpp:37
> +    if (type != LabelNodeListType)
> +        rootNode()->registerDynamicSubtreeNodeList(type, this);

This mysterious if condition needs a comment.

> Source/WebCore/dom/DynamicNodeList.cpp:49
> +    if (!hasValidCache())
> +        node()->treeScope()->nodeListCachedValue(m_type);

It’s a little strange to call a function saying “I cached a value” before caching the value.

> Source/WebCore/dom/Node.cpp:487
> +    for (NodeListsNodeData::ClassNodeListCache::const_iterator it = nodeLists->m_classNodeListCache.begin();
> +         it != nodeLists->m_classNodeListCache.end(); ++it) {
> +        if (currentTreeScope)
> +            currentTreeScope->removeNodeListCache(ClassNodeListType, it->second);
> +        newTreeScope->addNodeListCache(ClassNodeListType, it->second);
> +    }

So ugly. I can’t wait until we support only compilers with good C++11 support. If we had one of those then we could write:

    for (const auto& value: nodeLists->m_classNodeListCache.values()) {
        if (currentTreeScope)
            currentTreeScope->removeNodeListCache(ClassNodeListType, value);
        newTreeScope->addNodeListCache(ClassNodeListType, value);
    }

I wonder if we can achieve something similar with a macro.

> Source/WebCore/dom/Node.cpp:543
> +        // FIXME: assert that nodes don't have node lists

Sentence style please.

Not sure that an idea for an assertion really qualifies as FIXME. I might just make a comment about the possible improvement without using FIXME. Recently I have been thinking we should strive to consistently use FIXME only when we think something is wrong, not when there is simply room for improvement.

> Source/WebCore/dom/Node.cpp:1064
> +static DynamicSubtreeNodeListType attrToNodeListType(const QualifiedName& attrName)

I think a better name is:

    nodeListTypeForAttribute

These noun-type names work better if the beginning of the name is the noun.

> Source/WebCore/dom/Node.cpp:1080
> +    return InvalidNodeListType;
> +    // LabelsNodeList is invalidated via notifyLocalNodeListsLabelChanged in parseMappedAttribute

Missing period in this sentence-style comment. Comment would be better before the return statement.

> Source/WebCore/dom/Node.cpp:1102
> +    // FIXME: We don't have to invalidate caches if the inserted or removed node
> +    // didn't have an attribute we care about.

I think that for clarity we should say “any of the attributes” rather than “an attribute”. Also would be clearer if this stated that this is a performance idea.

> Source/WebCore/dom/Node.cpp:3025
> -void NodeRareData::createNodeLists(Node* node)
> +void NodeRareData::createNodeLists(Node*)

Perhaps we should remove the unused argument entirely.

> Source/WebCore/dom/Node.h:114
> +enum DynamicSubtreeNodeListType {
> +    ClassNodeListType,
> +    NameNodeListType,
> +    TagNodeListType,
> +    TagNodeListNSType,
> +#if ENABLE(MICRODATA)
> +    MicroDataItemListType,
> +#endif
> +    LabelNodeListType,
> +    InvalidNodeListType
> +};

I’d suggest:

1) Adding a brief comment about why Invalid must be the last value.
2) Sorting the other values alphabetically as we do headers, in a separate paragraph.
3) Putting the #if value in its own separate paragraph instead of mixing it in with the unconditional ones.

> Source/WebCore/dom/Node.h:115
> +const unsigned numDynamicSubtreeNodeListType = InvalidNodeListType;

Why make the constant unsigned instead of of the enum type? I believe you could use the enum type consistently, even for the loop index, and avoid all the static_cast calls in the code below.

The name should be plural with a trailing “s”: “number of types”, not “number of type”.

> Source/WebCore/dom/TreeScope.h:63
> +    void nodeListCachedValue(DynamicSubtreeNodeListType type)

This sounds like a getter that returns a “cached value” for a “node list”.

This kind of situation is what leads to names like nodeListDidCacheValue or nodeListWillCacheValue, either of which will be clearer than this.

> Source/WebCore/dom/TreeScope.h:69
> +    void nodeListInvalidatedCache(DynamicSubtreeNodeListType type)

I think that nodeListDidInvalidateCache is a slightly better name.

> Source/WebCore/html/LabelsNodeList.cpp:35
> +LabelsNodeList::LabelsNodeList(Node* forNode)

The old name here, “for node”, is not good naming. We want to use a noun phrase. There might even be a good idea for the name in the HTML specification itself.

But you need not change that name in this patch.
Comment 26 Ryosuke Niwa 2011-12-21 10:25:20 PST
Thanks for the review, Darin. But I want to try out kling's approach compare the performance first. I might have a time today; if not, definitely on next Wed.
Comment 27 Darin Adler 2011-12-21 11:09:44 PST
(In reply to comment #24)
> kling pointed out that I can lazily invalidate the node list cache by recording DOMVersion and comparing it when we access the cache next time. That sounds like a really good good idea.

Makes invalidation really cheap, but also really imprecise. I agree it’s worth investigating, but it’s not clear how to investigate the implications of this sufficiently.
Comment 28 Andreas Kling 2011-12-21 14:51:59 PST
(In reply to comment #27)
> (In reply to comment #24)
> > kling pointed out that I can lazily invalidate the node list cache by recording DOMVersion and comparing it when we access the cache next time. That sounds like a really good good idea.
> 
> Makes invalidation really cheap, but also really imprecise. I agree it’s worth investigating, but it’s not clear how to investigate the implications of this sufficiently.

One idea would be to keep additional version numbers that would always be >=DOMVersion, and would be >DOMVersion when e.g an attribute changes, allowing us to only invalidate those caches that care about attributes. incDOMTreeVersion() would then bring all version numbers to the same level.
Comment 29 Andreas Kling 2011-12-21 14:55:05 PST
(In reply to comment #28)
> (In reply to comment #27)
> > (In reply to comment #24)
> > > kling pointed out that I can lazily invalidate the node list cache by recording DOMVersion and comparing it when we access the cache next time. That sounds like a really good good idea.
> > 
> > Makes invalidation really cheap, but also really imprecise. I agree it’s worth investigating, but it’s not clear how to investigate the implications of this sufficiently.
> 
> One idea would be to keep additional version numbers that would always be >=DOMVersion, and would be >DOMVersion when e.g an attribute changes, allowing us to only invalidate those caches that care about attributes. incDOMTreeVersion() would then bring all version numbers to the same level.

Ehm. I was apparently typing a bit faster than I was thinking here. Changing an attribute would obviously bump DOMVersion as well. It would have to work the other way around; the more specific versions could lag behind DOMVersion.
Comment 30 Ryosuke Niwa 2011-12-21 15:20:23 PST
I implemented kling's approach and tested https://bug-73737-attachments.webkit.org/attachment.cgi?id=117785 on r103398

Here's result:
Shallow tree: 104.2±2.9 -> 96.8±1.6 (7.6% reduction)
Deep tree (20+): 105.4±1.9 -> 96.7±1.2 (9.0% reduction)
Deeper tree (50+): 137.9±1.1 -> 134.9±1.1 (2.2% reduction)
Comment 31 Ryosuke Niwa 2011-12-21 16:55:46 PST
The new approach is a clear winner here:

PerformanceTests/Parser/xml-parser.html (12.6% reduction)
Without patch
avg 2147.75
median 2140
stdev 22.144694624220943
min 2111
max 2187

With patch
avg 1876.95
median 1868
stdev 28.452548216284598
min 1839
max 1934

PerformanceTests/Parser/tiny-innerHTML.html (9.1% reduction)
Without patch
avg 20932.75
median 20929
stdev 47.63493990759304
min 20832
max 21084

With patch
avg 1919.3
median 1903.5
stdev 27.087081791880053
min 1888
max 1964

PerformanceTests/Parser/html-parser.html (1.7% reduction)
Without patch
avg 21291.15
median 21294.5
stdev 35.27360911503103
min 21227
max 21369

With patch
avg 20932.75
median 20929
stdev 47.63493990759304
min 20832
max 21084
Comment 32 Ryosuke Niwa 2011-12-21 18:19:25 PST
Created attachment 120253 [details]
kling's approach
Comment 33 Darin Adler 2011-12-21 21:17:02 PST
(In reply to comment #31)
> The new approach is a clear winner here:

We knew the new approach would be better when doing lots of modifications and not much access to the node list. But did you think about what the bad case would be for the new approach, and measure that? How much worse are the cases that get worse?
Comment 34 Ryosuke Niwa 2011-12-21 21:30:16 PST
(In reply to comment #33)
> (In reply to comment #31)
> > The new approach is a clear winner here:
> 
> We knew the new approach would be better when doing lots of modifications and not much access to the node list. But did you think about what the bad case would be for the new approach, and measure that? How much worse are the cases that get worse?

The worst case senario is that we'll end up invalidating cache every time we access it. However this was already the case before the patch; e.g. if you had written scripts like:

var divs = element.getElementsByTagName('div');
for (var i = 0; i < divs.length; i++)
    divs[i].appendChild(document.createElement('br'));

then we would end up invalidating the cache for divs along with node lists caches of element's ancestors.

Luckily, this is unlikely to affect our performance on popular websites according to the statistics on comment #10:

> I have a hypothesis that we can just invalidate all caches whenever DOM mutation happens.
...
> I've collected 191220 samples while browsing websites like Gmail, Google+, Twitter, Facebook, etc...
> 
> total:  191220
> hits: 185772 (97.151%)
> misses: 3559 (1.861%)
> hits that will be missed under this approach: 1889 (0.988%)
Comment 35 Ryosuke Niwa 2011-12-26 22:01:38 PST
Ping reviewers.
Comment 36 Darin Adler 2011-12-28 12:44:42 PST
Comment on attachment 120253 [details]
kling's approach

View in context: https://bugs.webkit.org/attachment.cgi?id=120253&action=review

> Source/WebCore/dom/ChildNodeList.cpp:39
> +    if (m_caches->m_isLengthCacheValid)
> +        return m_caches->m_length;

It’s strange to call this “length cache” in one case, and “length” in another. I’d like to see these two data members use the same terminology.

> Source/WebCore/dom/DynamicNodeList.h:48
> -        unsigned cachedLength;
> -        Node* lastItem;
> -        unsigned lastItemOffset;
> -        bool isLengthCacheValid : 1;
> -        bool isItemCacheValid : 1;
> +
> +        unsigned m_length;
> +        Node* m_lastItem;
> +        unsigned m_lastItemOffset;
> +        bool m_isLengthCacheValid : 1;
> +        bool m_isItemCacheValid : 1;

I don’t like this change: Given that these are public data members, they should not have the m_ prefix added. If we want to use the m_ prefix, then we should make this a class with accessor functions instead.

> Source/WebCore/dom/DynamicNodeList.h:120
> +        DynamicNodeList::Caches::m_length;
> +        DynamicNodeList::Caches::m_lastItem;
> +        DynamicNodeList::Caches::m_lastItemOffset;
> +        DynamicNodeList::Caches::m_isLengthCacheValid;
> +        DynamicNodeList::Caches::m_isItemCacheValid;

I have never seen this syntax before and I am surprised it works. I think the correct syntax is:

    using DynamicNodeList::Caches::m_length;

But also, this syntax works:

    using Caches::m_length;

I believe deriving from a base class makes the class name usable without a prefix.

> Source/WebCore/dom/Node.cpp:1023
> -void Node::invalidateNodeListsCacheAfterAttributeChanged(const QualifiedName& attrName)
> +void Node::invalidateNodeListsCacheAfterAttributeChanged(const QualifiedName&)

We should remove the argument entirely, not just have it be unused.

> Source/WebCore/dom/Node.cpp:2353
>  #if ENABLE(MICRODATA)
>      invalidateMicrodataItemListCaches();
>  #endif
> +    
>  }

Shouldn’t add this blank line.
Comment 37 Ryosuke Niwa 2011-12-30 23:00:02 PST
Created attachment 120828 [details]
Changed per Darin's comment
Comment 38 Ryosuke Niwa 2011-12-30 23:04:19 PST
(In reply to comment #36)
> (From update of attachment 120253 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=120253&action=review
> 
> > Source/WebCore/dom/ChildNodeList.cpp:39
> > +    if (m_caches->m_isLengthCacheValid)
> > +        return m_caches->m_length;
> 
> It’s strange to call this “length cache” in one case, and “length” in another. I’d like to see these two data members use the same terminology.

Okay, I've renamed m_length to m_cachedLength and m_lastItem to m_cachedItem to be consistent.

> > Source/WebCore/dom/DynamicNodeList.h:48
> > -        unsigned cachedLength;
> > -        Node* lastItem;
> > -        unsigned lastItemOffset;
> > -        bool isLengthCacheValid : 1;
> > -        bool isItemCacheValid : 1;
> > +
> > +        unsigned m_length;
> > +        Node* m_lastItem;
> > +        unsigned m_lastItemOffset;
> > +        bool m_isLengthCacheValid : 1;
> > +        bool m_isItemCacheValid : 1;
> 
> I don’t like this change: Given that these are public data members, they should not have the m_ prefix added. If we want to use the m_ prefix, then we should make this a class with accessor functions instead.

Yeah, I didn't like that either. I've made DynamicSubtreeNodeList::SubtreeCaches not inherit from DynamicNodeList::Caches. I think it's cleaner approach overall. SubtreeCaches no longer needs to be ref-counted and all variables and member functions can be appropriately without using and other name-scope hacks.

> > Source/WebCore/dom/Node.cpp:2353
> >  #if ENABLE(MICRODATA)
> >      invalidateMicrodataItemListCaches();
> >  #endif
> > +    
> >  }
> 
> Shouldn’t add this blank line.

Oops, it seems like I missed this comment. Will fix once I get the next round of reviews.
Comment 39 WebKit Review Bot 2011-12-31 00:12:11 PST
Comment on attachment 120828 [details]
Changed per Darin's comment

Attachment 120828 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/10954364

New failing tests:
http/tests/inspector/resource-tree/resource-tree-document-url.html
Comment 40 Ryosuke Niwa 2012-01-04 13:23:22 PST
Ping reviewers.
Comment 41 Antti Koivisto 2012-01-04 14:15:15 PST
Comment on attachment 120828 [details]
Changed per Darin's comment

View in context: https://bugs.webkit.org/attachment.cgi?id=120828&action=review

> Source/WebCore/dom/DynamicNodeList.h:109
> +        void setLengthCache(Node* node, unsigned length)
> +        {
> +            if (m_isItemCacheValid && !domVersionIsConsistent()) {
> +                m_cachedItem = node;
> +                m_isItemCacheValid = false;
> +            }
> +            m_cachedLength = length;
> +            m_isLengthCacheValid = true;
> +        }

You should move the functions to cpp (with inline keyword) as that is the only place they are used.

> Source/WebCore/dom/DynamicNodeList.h:132
> +        unsigned m_cachedLength;
> +        bool m_isLengthCacheValid : 1;
> +        bool m_isItemCacheValid : 1;

You could steal the space for these bits from some other field.
Comment 42 Ryosuke Niwa 2012-01-04 15:08:56 PST
Committed r104084: <http://trac.webkit.org/changeset/104084>
Comment 43 Ryosuke Niwa 2012-01-04 21:01:00 PST
Reopen the bug since the patch was rolled out.
Comment 44 Dmitry Lomov 2012-01-04 21:05:12 PST
Patch rolled out: apparently regressed dom-perf after being rolled into chromium:
http://build.chromium.org/f/chromium/perf/linux-release/dom_perf/report.html?history=150&rev=116444
Comment 45 Ryosuke Niwa 2012-01-04 21:23:47 PST
(In reply to comment #44)
> Patch rolled out: apparently regressed dom-perf after being rolled into chromium:
> http://build.chromium.org/f/chromium/perf/linux-release/dom_perf/report.html?history=150&rev=116444

For whatever reason, these tests are not available publicly so I can't disclose any information as to what's going on but the high level summary is that the test is running multiple tests at once and because some tests modify DOM, we'll end up invalidating caches more often than we used to.

We'll probably need per-document counter instead.
Comment 46 Ryosuke Niwa 2012-01-05 03:18:10 PST
Created attachment 121252 [details]
Fixed a bug
Comment 47 Ryosuke Niwa 2012-01-05 03:24:44 PST
The perf. regression was caused by the DOM version not being updated after caching values :(

I've ran dom_perf locally and took 7 samples (larger value is better):

Without patch:
734
758
785
791
780
784
779
avg: 773, stdev: 20

With patch:
765
761
750
765
741
770
762
avg: 759, stdev: 10

While there's a slight decline (1.8%) in the average value, it's statistically insignificant. Furthermore, the scores indicate that we've improved DOM modifications at the cost of slight regression in DOM traversal via dynamic node list, which is exactly what we expect from this patch. Based on improvements in WebKit's performance tests posted on comment #31, I think we should proceed with this patch.
Comment 48 Antti Koivisto 2012-01-05 10:59:27 PST
Comment on attachment 121252 [details]
Fixed a bug

sounds like a reasonable tradeoff, r=me
Comment 49 Ryosuke Niwa 2012-01-05 13:48:53 PST
Comment on attachment 121252 [details]
Fixed a bug

Clearing flags on attachment: 121252

Committed r104210: <http://trac.webkit.org/changeset/104210>
Comment 50 Ryosuke Niwa 2012-01-05 13:49:01 PST
All reviewed patches have been landed.  Closing bug.
Comment 51 Dmitry Lomov 2012-01-05 17:21:28 PST
dom_perf no longer regressed - thanks for the fix!
http://build.chromium.org/f/chromium/perf/linux-release/dom_perf/report.html?history=150&rev=116590
Comment 52 Ryosuke Niwa 2012-01-05 17:22:33 PST
(In reply to comment #51)
> dom_perf no longer regressed - thanks for the fix!
> http://build.chromium.org/f/chromium/perf/linux-release/dom_perf/report.html?history=150&rev=116590

Actually, the patch hasn't been landed yet :(
Comment 53 Dmitry Lomov 2012-01-05 17:25:27 PST
(In reply to comment #52)
> (In reply to comment #51)
> > dom_perf no longer regressed - thanks for the fix!
> > http://build.chromium.org/f/chromium/perf/linux-release/dom_perf/report.html?history=150&rev=116590
> 
> Actually, the patch hasn't been landed yet :(

Isn't r114210 the patch? What am I missing?
Comment 54 Dmitry Lomov 2012-01-05 17:26:01 PST
(In reply to comment #53)
> (In reply to comment #52)
> > (In reply to comment #51)
> > > dom_perf no longer regressed - thanks for the fix!
> > > http://build.chromium.org/f/chromium/perf/linux-release/dom_perf/report.html?history=150&rev=116590
> > 
> > Actually, the patch hasn't been landed yet :(
> 
> Isn't r114210 the patch? What am I missing?
s/r114210/r104210/
Comment 55 Ryosuke Niwa 2012-01-05 17:33:27 PST
(In reply to comment #54)
> (In reply to comment #53)
> > (In reply to comment #52)
> > > (In reply to comment #51)
> > > > dom_perf no longer regressed - thanks for the fix!
> > > > http://build.chromium.org/f/chromium/perf/linux-release/dom_perf/report.html?history=150&rev=116590
> > > 
> > > Actually, the patch hasn't been landed yet :(
> > 
> > Isn't r114210 the patch? What am I missing?
> s/r114210/r104210/

Huh, didn't notice that. Hm... it seems like I've regressed this one: http://build.chromium.org/f/chromium/perf/xp-release-dual-core/dromaeo_domcore/report.html?history=50&rev=-1&graph=dom_query_getElementsByName__not_in_document_

I think I know how to fix it though.
Comment 56 Ojan Vafai 2012-01-05 18:54:45 PST
(In reply to comment #55)
> Huh, didn't notice that. Hm... it seems like I've regressed this one: http://build.chromium.org/f/chromium/perf/xp-release-dual-core/dromaeo_domcore/report.html?history=50&rev=-1&graph=dom_query_getElementsByName__not_in_document_
> 
> I think I know how to fix it though.

Maybe roll it out while you work out a fix?
Comment 57 Ryosuke Niwa 2012-01-05 19:31:35 PST
The regression was fixed in http://trac.webkit.org/changeset/104263.
Comment 58 Ryosuke Niwa 2012-01-10 23:03:58 PST
Reopen the bug since the patch was rolled out in http://trac.webkit.org/changeset/104674 and http://trac.webkit.org/changeset/104673.
Comment 59 Ryosuke Niwa 2012-01-10 23:15:42 PST
Build fix for the revert landed in http://trac.webkit.org/changeset/104675.
Comment 60 Ryosuke Niwa 2012-06-22 19:14:56 PDT
Created attachment 149156 [details]
new patch
Comment 61 Ryosuke Niwa 2012-06-22 19:15:59 PDT
6% improvement on Dromaeo/dom-modify.html

Before:
RESULT Dromaeo: dom-modify= 3897.79625691 runs/s
median= 0.0 runs/s, stdev= 52.9873315944 runs/s, min= 3801.28571429 runs/s, max= 4016.84667667 runs/s

RESULT Dromaeo: dom-modify= 3878.28725467 runs/s
median= 0.0 runs/s, stdev= 66.6932298708 runs/s, min= 3774.65020127 runs/s, max= 4050.16268955 runs/s

After:
RESULT Dromaeo: dom-modify= 4146.1717243 runs/s
median= 0.0 runs/s, stdev= 54.9684861451 runs/s, min= 4013.73249528 runs/s, max= 4276.54651536 runs/s

RESULT Dromaeo: dom-modify= 4109.48266573 runs/s
median= 0.0 runs/s, stdev= 68.4080593631 runs/s, min= 4000.8610192 runs/s, max= 4289.26268699 runs/s


6.6% improvement on DOM/DOMDivWalk.html

Before:
RESULT DOM: DOMDivWalk= 58.4666666667 ms
median= 59.0 ms, stdev= 1.36788563525 ms, min= 55.8333333333 ms, max= 60.6666666667 ms

RESULT DOM: DOMDivWalk= 59.4083333333 ms
median= 59.0833333333 ms, stdev= 1.38089644796 ms, min= 57.5 ms, max= 63.3333333333 ms

After:
RESULT DOM: DOMDivWalk= 54.9 ms
median= 54.8333333333 ms, stdev= 0.925562891794 ms, min= 53.5 ms, max= 58.0 ms

RESULT DOM: DOMDivWalk= 55.525 ms
median= 55.4166666667 ms, stdev= 1.01553899208 ms, min= 53.8333333333 ms, max= 58.0 ms


No statistically significant differences on xml-parser.html and html-parser.html

Before:
RESULT Parser: xml-parser= 6.97092743469 runs/s
median= 6.96055684455 runs/s, stdev= 0.0682343218493 runs/s, min= 6.82593856655 runs/s, max= 7.11743772242 runs/s

RESULT Parser: xml-parser= 6.89537037269 runs/s
median= 6.90848347453 runs/s, stdev= 0.116432899383 runs/s, min= 6.62251655629 runs/s, max= 7.05052878966 runs/s

After:
RESULT Parser: xml-parser= 7.06476592238 runs/s
median= 7.06713780919 runs/s, stdev= 0.0613748572558 runs/s, min= 6.86498855835 runs/s, max= 7.16845878136 runs/s

RESULT Parser: xml-parser= 7.05728470567 runs/s
median= 7.05052878966 runs/s, stdev= 0.0293092648714 runs/s, min= 7.00116686114 runs/s, max= 7.10900473934 runs/s


Before:
RESULT Parser: html-parser= 1952.45 ms
median= 1956.0 ms, stdev= 16.8210433684 ms, min= 1931.0 ms, max= 1985.0 ms

RESULT Parser: html-parser= 1961.3 ms
median= 1966.0 ms, stdev= 11.6150764096 ms, min= 1938.0 ms, max= 1982.0 ms

After:
RESULT Parser: html-parser= 1961.1 ms
median= 1959.5 ms, stdev= 11.7426572802 ms, min= 1937.0 ms, max= 1990.0 ms

RESULT Parser: html-parser= 1972.85 ms
median= 1970.0 ms, stdev= 13.5951278037 ms, min= 1944.0 ms, max= 2002.0 ms
Comment 62 Ryosuke Niwa 2012-06-22 22:19:56 PDT
Comment on attachment 149156 [details]
new patch

Just realized that I can remove more code.
Comment 63 Ryosuke Niwa 2012-06-22 23:46:07 PDT
Created attachment 149165 [details]
Removed more code
Comment 64 Build Bot 2012-06-23 00:43:49 PDT
Comment on attachment 149165 [details]
Removed more code

Attachment 149165 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/13030592
Comment 65 Ojan Vafai 2012-06-23 07:56:45 PDT
Comment on attachment 149165 [details]
Removed more code

View in context: https://bugs.webkit.org/attachment.cgi?id=149165&action=review

Overall, this looks like a big improvement!

This patch also uses an extra pointer per DynamicNodeList now because it adds more of them to m_listsInvalidatedAtDocument, right? I think that's fine, but the ChangeLog should state it. I don't have a good intuition for how many concurrent DynamicNodeLists a give page has. Have you instrumented gmail or facebook to get a sense of what the memory impact of this change would be?

> Source/WebCore/dom/Document.cpp:3879
> +        if (!attrName || (*it)->shouldInvalidateOnAttributeChange())

As the code is, it's weird for this to take a QualifiedName instead of an enum or something (e.g. ShouldClearAttributeCaches). But, I think in the future, we probably want to be even more fine-grained and only clear the attribute caches that are affected by this attribute name. Mind putting a FIXME to that effect? With the FIXME there it makes more sense to pass in a QualifiedName.

> Source/WebCore/dom/Node.cpp:967
> +    document()->clearNodeListCaches(&attrName);

The comment on lines 950-955 probably needs to be updated.
Comment 66 Ryosuke Niwa 2012-06-23 12:23:28 PDT
Thanks for the review!

(In reply to comment #65)
> This patch also uses an extra pointer per DynamicNodeList now because it adds more of them to m_listsInvalidatedAtDocument, right? I think that's fine, but the ChangeLog should state it.

Fair point.

> I don't have a good intuition for how many concurrent DynamicNodeLists a give page has. Have you instrumented gmail or facebook to get a sense of what the memory impact of this change would be?

We used to do this unintentionally until I fixed it in http://trac.webkit.org/changeset/110797 three months ago, so I don't think there will be any impact.
 
> > Source/WebCore/dom/Document.cpp:3879
> > +        if (!attrName || (*it)->shouldInvalidateOnAttributeChange())
> 
> As the code is, it's weird for this to take a QualifiedName instead of an enum or something (e.g. ShouldClearAttributeCaches). But, I think in the future, we probably want to be even more fine-grained and only clear the attribute caches that are affected by this attribute name. Mind putting a FIXME to that effect? With the FIXME there it makes more sense to pass in a QualifiedName.

Right, that's the intention. I'll have a follow up patch that'll only invalidate node lists that match QualifiedName. I guess we can have a FIXME until then.

> > Source/WebCore/dom/Node.cpp:967
> > +    document()->clearNodeListCaches(&attrName);
> 
> The comment on lines 950-955 probably needs to be updated.

Will do.
Comment 67 Ryosuke Niwa 2012-06-23 13:25:58 PDT
Created attachment 149179 [details]
Fixed Mac build & per Ojan's comment
Comment 68 Ryosuke Niwa 2012-06-23 13:33:30 PDT
Created attachment 149180 [details]
Updated ChangeLog & reverted xcodeproj change
Comment 69 Ryosuke Niwa 2012-06-23 20:13:26 PDT
Committed r121106: <http://trac.webkit.org/changeset/121106>
Comment 70 WebKit Review Bot 2012-06-24 17:35:56 PDT
Re-opened since this is blocked by 89841
Comment 71 Ryosuke Niwa 2012-06-24 17:40:47 PDT
I'm rolling out the patch because it caused 85-90% regressions on Dromaeo/jslib-modify-jquery.html

http://webkit-perf.appspot.com/graph.html#tests=[[41019,2001,173262],[41019,2001,32196]]&sel=1340303912887.2336,1340533472011.3213,0,13125&displayrange=7&datatype=running

Turns out jquery depends on our current optimization :(

On one hand, the very perf test framework I added caught this regression so I should be happy but on the other hand, this would mean that we can never fix this bug so I'm sad.
Comment 72 Ryosuke Niwa 2012-06-24 17:49:33 PDT
Note 85-90% regression means that our code has gotten roughly 10 times slower (not as twice as slower).
Comment 73 Kentaro Hara 2012-06-24 19:49:59 PDT
(In reply to comment #72)
> Note 85-90% regression means that our code has gotten roughly 10 times slower (not as twice as slower).

We thought that this might be a bug of GC (bug 85162 and 88739), but it does not seem to be a bug of GC.

As you pointed out, jQuery depends on our current optimization, and your patch ended up having jQuery.domManip() call jQuery.clone() every time (PerformanceTests/Dromaeo/resources/dromaeo/web/lib/jquery-1.6.4.js, line 5857).

So the reason is not that GC is doing something wrong, but that with your patch jQuery calls jQuery.clone() a lot of times, which causes the memory bloat.
Comment 74 Ryosuke Niwa 2012-06-24 20:01:00 PDT
(In reply to comment #73)
> (In reply to comment #72)
> > Note 85-90% regression means that our code has gotten roughly 10 times slower (not as twice as slower).
> 
> We thought that this might be a bug of GC (bug 85162 and 88739), but it does not seem to be a bug of GC.

You're right but my point on g+ is that the fact calling clone() more ends up reducing the overall memory usage (the tests stopped crashing when this regression was introduced) means that the bugs 85162 and 88739 are indeed some problems in GC.
Comment 75 Kentaro Hara 2012-06-24 20:40:10 PDT
(In reply to comment #74)
> You're right but my point on g+ is that the fact calling clone() more ends up reducing the overall memory usage (the tests stopped crashing when this regression was introduced) means that the bugs 85162 and 88739 are indeed some problems in GC.

I am a bit confused. In my understanding:

(1) Before r121106, jslib-modify-jquery.html was working fine.
(2) After r121106, jslib-modify-jquery.html caused memory bloat. Some bots start crashing due to the memory bloat.
(3) After reverting r121106 back, jslib-modify-jquery.html is working fine again.

Where is a problem?
Comment 76 Ryosuke Niwa 2012-06-24 20:41:49 PDT
(In reply to comment #75)
> (In reply to comment #74)
> > You're right but my point on g+ is that the fact calling clone() more ends up reducing the overall memory usage (the tests stopped crashing when this regression was introduced) means that the bugs 85162 and 88739 are indeed some problems in GC.
> 
> I am a bit confused. In my understanding:
> 
> (1) Before r121106, jslib-modify-jquery.html was working fine.
> (2) After r121106, jslib-modify-jquery.html caused memory bloat. Some bots start crashing due to the memory bloat.
> (3) After reverting r121106 back, jslib-modify-jquery.html is working fine again.

No. Before r121106, jslib-modify-jquery.html was crashing due to the memory bloat. After r121106 until it was landed, the test was running fine.
Comment 77 Ryosuke Niwa 2012-06-24 20:42:36 PDT
(In reply to comment #76)
> No. Before r121106, jslib-modify-jquery.html was crashing due to the memory bloat. After r121106 until it was landed, the test was running fine.

I meant to say that the test started running fine after r121106 until it was rolled out.
Comment 78 Kentaro Hara 2012-06-24 23:31:43 PDT
I confirmed that the memory bloat is still happening with r121106.

(1) Before r121106, jslib-modify-jquery.html was crashing due to the memory bloat.
(2) After r121106 and until r121106 is reverted, jslib-modify-jquery.html was working fine, although the performance regressed by 10x. Our question was why r121106 fixed the memory bloat.

Actually, the memory bloat was happening even after r121106. However, since r121106 regressed performance by 10x, jslib-modify-jquery.html completed without reaching the memory bloat. In other words, r121106 was slowly causing the memory bloat. I confirmed that if I run jslib-modify-jquery.html for a long time, then r121106 also causes the memory bloat.

So the current problem can be summarized into the following two:

[1] r121106 regressed performance by 10x.
[2] (Regardless of with or without r121106,) jslib-modify-jquery.html causes the memory bloat.

Regarding [1], we already know the reason; i.e. jQuery is assuming our current optimization and not expecting rniwa's patch, unfortunately.

Regarding [2], it seems that the memory bloat is just because jQuery clones too much nodes. I inserted window.internals.numberOfLiveNodes() in jslib-modify-jquery.html, and confirmed that the number of live nodes are monotonically increases both in Chromium/Linux and AppleWebKit/Mac. It seems that this is not a problem of GC; this is just a problem of jQuery cloning too much nodes.
Comment 79 Ryosuke Niwa 2012-06-25 10:17:10 PDT
Unfortunately, the only thing we can do to address this bug is to fix the bug 89635 to speed-up the access to NodeListsNodeData at this point.
Comment 80 Elliott Sprehn 2012-11-07 16:05:09 PST
(In reply to comment #79)
> Unfortunately, the only thing we can do to address this bug is to fix the bug 89635 to speed-up the access to NodeListsNodeData at this point.

The rare data map was removed in Bug 100057 which means Bug 89635 is resolved. I assume that means this is too?
Comment 81 Ryosuke Niwa 2012-11-07 16:23:39 PST
(In reply to comment #80)
> (In reply to comment #79)
> > Unfortunately, the only thing we can do to address this bug is to fix the bug 89635 to speed-up the access to NodeListsNodeData at this point.
> 
> The rare data map was removed in Bug 100057 which means Bug 89635 is resolved. I assume that means this is too?

Probably. Try profiling it?
Comment 82 Elliott Sprehn 2012-11-07 18:10:28 PST
(In reply to comment #81)
> (In reply to comment #80)
> > (In reply to comment #79)
> > > Unfortunately, the only thing we can do to address this bug is to fix the bug 89635 to speed-up the access to NodeListsNodeData at this point.
> > 
> > The rare data map was removed in Bug 100057 which means Bug 89635 is resolved. I assume that means this is too?
> 
> Probably. Try profiling it?

Ran the profile and the node list methods don't show up in the trace:

Running Time	Self		Symbol Name
396.0ms    4.0%	396.0	 	WebCore::JSNodeOwner::isReachableFromOpaqueRoots(JSC::Handle<JSC::Unknown>, void*, JSC::SlotVisitor&)
377.0ms    3.8%	377.0	 	void WebCore::removeAllChildrenInContainer<WebCore::Node, WebCore::ContainerNode>(WebCore::ContainerNode*)
272.0ms    2.7%	272.0	 	JSC::WeakBlock::visit(JSC::HeapRootVisitor&)
269.0ms    2.7%	269.0	 	WTF::KeyValuePair<WTF::AtomicStringImpl*, WebCore::JSDOMWrapper* (*)(JSC::ExecState*, WebCore::JSDOMGlobalObject*, WTF::PassRefPtr<WebCore::HTMLElement>)>* WTF::HashTable<WTF::AtomicStringImpl*, WTF::KeyValuePair<WTF::AtomicStringImpl*, WebCore::JSDOMWrapper* (*)(JSC::ExecState*, WebCore::JSDOMGlobalObject*, WTF::PassRefPtr<WebCore::HTMLElement>)>, WTF::KeyValuePairKeyExtractor<WTF::KeyValuePair<WTF::AtomicStringImpl*, WebCore::JSDOMWrapper* (*)(JSC::ExecState*, WebCore::JSDOMGlobalObject*, WTF::PassRefPtr<WebCore::HTMLElement>)> >, WTF::PtrHash<WTF::AtomicStringImpl*>, WTF::HashMapValueTraits<WTF::HashTraits<WTF::AtomicStringImpl*>, WTF::HashTraits<WebCore::JSDOMWrapper* (*)(JSC::ExecState*, WebCore::JSDOMGlobalObject*, WTF::PassRefPtr<WebCore::HTMLElement>)> >, WTF::HashTraits<WTF::AtomicStringImpl*> >::lookup<WTF::IdentityHashTranslator<WTF::PtrHash<WTF::AtomicStringImpl*> >, WTF::AtomicStringImpl*>(WTF::AtomicStringImpl* const&)
256.0ms    2.6%	256.0	 	WebCore::jsDocumentPrototypeFunctionCreateElement(JSC::ExecState*)
241.0ms    2.4%	241.0	 	_ZN7WebCoreL24updateTreeAfterInsertionEPNS_13ContainerNodeEPNS_4NodeEb
204.0ms    2.0%	204.0	 	WTF::AtomicString::addSlowCase(WTF::StringImpl*)
189.0ms    1.9%	189.0	 	WebCore::StyledElement::StyledElement(WebCore::QualifiedName const&, WebCore::Document*, WebCore::Node::ConstructionType)
186.0ms    1.9%	186.0	 	_ZN7WebCoreL28createHTMLSpanElementWrapperEPN3JSC9ExecStateEPNS_17JSDOMGlobalObjectEN3WTF10PassRefPtrINS_11HTMLElementEEE
Comment 83 Ryosuke Niwa 2012-11-07 20:02:26 PST
(In reply to comment #82)
> (In reply to comment #81)
> > (In reply to comment #80)
> > > (In reply to comment #79)
> > > > Unfortunately, the only thing we can do to address this bug is to fix the bug 89635 to speed-up the access to NodeListsNodeData at this point.
> > > 
> > > The rare data map was removed in Bug 100057 which means Bug 89635 is resolved. I assume that means this is too?
> > 
> > Probably. Try profiling it?
> 
> Ran the profile and the node list methods don't show up in the trace:

Super!