REOPENED 103868
Change ChildNodeInsertionNotifier::m_postInsertionsNotificationTargets from a Vector to an OwnPtr
https://bugs.webkit.org/show_bug.cgi?id=103868
Summary Change ChildNodeInsertionNotifier::m_postInsertionsNotificationTargets from a...
Kentaro Hara
Reported 2012-12-03 01:49:42 PST
ChildNodeInsertionNotifier::m_postInsertionsNotificationTargets is used for HTMLFrameElement and HTMLBodyElement only, which would be a cold path. We can change it from a Vector to an OwnPtr<Vector> so that ChildNodeInsertionNotifier can be allocated faster. This will slightly optimize Dromaeo/dom-modify.
Attachments
Patch (4.13 KB, patch)
2012-12-03 01:54 PST, Kentaro Hara
no flags
patch for landing (4.10 KB, patch)
2012-12-03 17:12 PST, Kentaro Hara
no flags
Kentaro Hara
Comment 1 2012-12-03 01:54:04 PST
Andreas Kling
Comment 2 2012-12-03 06:25:10 PST
Are we confident that this is a real improvement? It looks to me like the Vector is never appended to in the common case, which means it would never allocate a buffer. While this patch shrinks ChildNodeInsertionNotifier, it shouldn't cause any fewer allocations. In fact, it adds allocations to the iframe and body cases.
Kentaro Hara
Comment 3 2012-12-03 07:36:04 PST
> Are we confident that this is a real improvement? Let me check it again in micro benchmarks tomorrow. This is one of the optimizations I tried in bug 88834 to offset performance regression introduced by another patch (the patch didn't land).
Darin Adler
Comment 4 2012-12-03 10:38:00 PST
Comment on attachment 177209 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=177209&action=review > Source/WebCore/dom/ContainerNodeAlgorithms.h:50 > + OwnPtr<Vector< RefPtr<Node> > > m_postInsertionNotificationTargets; The space in Vector< RefPtr is not the correct style. It should be Vector<RefPtr. > Source/WebCore/dom/ContainerNodeAlgorithms.h:197 > + if (UNLIKELY(Node::InsertionShouldCallDidNotifySubtreeInsertions == node->insertedInto(m_insertionPoint))) { Are you sure this improved performance? Normally we only use LIKELY and UNLIKELY when we have concrete evidence it’s helpful. > Source/WebCore/dom/ContainerNodeAlgorithms.h:199 > + m_postInsertionNotificationTargets = adoptPtr(new Vector<RefPtr<Node> >()); The () here is not needed. > Source/WebCore/dom/ContainerNodeAlgorithms.h:213 > + m_postInsertionNotificationTargets = adoptPtr(new Vector<RefPtr<Node> >()); The () here is not needed.
Kentaro Hara
Comment 5 2012-12-03 17:12:03 PST
Created attachment 177378 [details] patch for landing
Kentaro Hara
Comment 6 2012-12-03 17:13:31 PST
(In reply to comment #4) > (From update of attachment 177209 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=177209&action=review > > > Source/WebCore/dom/ContainerNodeAlgorithms.h:50 > > + OwnPtr<Vector< RefPtr<Node> > > m_postInsertionNotificationTargets; > > The space in Vector< RefPtr is not the correct style. It should be Vector<RefPtr. Fixed. > > Source/WebCore/dom/ContainerNodeAlgorithms.h:197 > > + if (UNLIKELY(Node::InsertionShouldCallDidNotifySubtreeInsertions == node->insertedInto(m_insertionPoint))) { > > Are you sure this improved performance? Normally we only use LIKELY and UNLIKELY when we have concrete evidence it’s helpful. You're right. I couldn't observe any effect. Removed. > > Source/WebCore/dom/ContainerNodeAlgorithms.h:199 > > + m_postInsertionNotificationTargets = adoptPtr(new Vector<RefPtr<Node> >()); > > The () here is not needed. > > > Source/WebCore/dom/ContainerNodeAlgorithms.h:213 > > + m_postInsertionNotificationTargets = adoptPtr(new Vector<RefPtr<Node> >()); > > The () here is not needed. Fixed.
Kentaro Hara
Comment 7 2012-12-03 17:15:02 PST
(In reply to comment #2) > Are we confident that this is a real improvement? I confirmed, yes. When we insert a tree whose depth is 2 or larger, insertion time is improved.
Andreas Kling
Comment 8 2012-12-03 17:17:34 PST
This still looks wrong to me. ChildNodeInsertionNotifier is only created on the stack, so all this patch does is save 8 bytes of stack space in the common case, and adds a heap allocation for body and iframe elements. Where is the improvement coming from?
Kentaro Hara
Comment 9 2012-12-03 17:20:05 PST
Comment on attachment 177378 [details] patch for landing ok, then let me investigate the details later. (Maybe instruction cache? A Vector allocation code is removed from a hot call path, which is larger than a OwnPtr allocation code?)
Kentaro Hara
Comment 10 2012-12-03 23:04:38 PST
(In reply to comment #8) > This still looks wrong to me. ChildNodeInsertionNotifier is only created on the stack, so all this patch does is save 8 bytes of stack space in the common case, and adds a heap allocation for body and iframe elements. Where is the improvement coming from? for (NodeVector::const_iterator it = targets.begin(); it != targets.end(); ++it) { Node* child = it->get(); insertBeforeCommon(next.get(), child); childrenChanged(true, nextChildPreviousSibling.get(), nextChild, 1); asm("_ppppp:"); ChildNodeInsertionNotifier tmp(this); asm("_qqqqq:"); tmp.notify(child); } Here are assemblies for between _ppppp and _qqqqq. Before this patch: 00000000005f9d6b <_pppppp>: 5f9d6b: 48 89 6c 24 10 mov %rbp,0x10(%rsp) 5f9d70: 48 8d 44 24 18 lea 0x18(%rsp),%rax 5f9d75: 48 c7 40 10 00 00 00 movq $0x0,0x10(%rax) 5f9d7c: 00 5f9d7d: 48 c7 40 08 00 00 00 movq $0x0,0x8(%rax) 5f9d84: 00 5f9d85: 48 c7 00 00 00 00 00 movq $0x0,(%rax) 00000000005f9d8c <_qqqqqq>: After this patch: 00000000005f9c5d <_pppppp>: 5f9c5d: 48 89 1c 24 mov %rbx,(%rsp) 5f9c61: 48 c7 44 24 08 00 00 movq $0x0,0x8(%rsp) 5f9c68: 00 00 00000000005f9c6a <_qqqqqq>: In this way, instructions per ChildNodeInsertionNotifier's constructor are reduced. In particular, note that ChildNodeInsertionNotifier's constructor is used inside for loops.
WebKit Review Bot
Comment 11 2012-12-04 00:05:21 PST
Comment on attachment 177378 [details] patch for landing Clearing flags on attachment: 177378 Committed r136481: <http://trac.webkit.org/changeset/136481>
WebKit Review Bot
Comment 12 2012-12-04 00:05:26 PST
All reviewed patches have been landed. Closing bug.
Andreas Kling
Comment 13 2012-12-04 18:12:59 PST
(In reply to comment #10) > (In reply to comment #8) > > This still looks wrong to me. ChildNodeInsertionNotifier is only created on the stack, so all this patch does is save 8 bytes of stack space in the common case, and adds a heap allocation for body and iframe elements. Where is the improvement coming from? > > for (NodeVector::const_iterator it = targets.begin(); it != targets.end(); ++it) { > Node* child = it->get(); > insertBeforeCommon(next.get(), child); > childrenChanged(true, nextChildPreviousSibling.get(), nextChild, 1); > asm("_ppppp:"); > ChildNodeInsertionNotifier tmp(this); > asm("_qqqqq:"); > tmp.notify(child); > } > > > Here are assemblies for between _ppppp and _qqqqq. > > > Before this patch: > > 00000000005f9d6b <_pppppp>: > 5f9d6b: 48 89 6c 24 10 mov %rbp,0x10(%rsp) > 5f9d70: 48 8d 44 24 18 lea 0x18(%rsp),%rax > 5f9d75: 48 c7 40 10 00 00 00 movq $0x0,0x10(%rax) > 5f9d7c: 00 > 5f9d7d: 48 c7 40 08 00 00 00 movq $0x0,0x8(%rax) > 5f9d84: 00 > 5f9d85: 48 c7 00 00 00 00 00 movq $0x0,(%rax) > > 00000000005f9d8c <_qqqqqq>: > > After this patch: > > 00000000005f9c5d <_pppppp>: > 5f9c5d: 48 89 1c 24 mov %rbx,(%rsp) > 5f9c61: 48 c7 44 24 08 00 00 movq $0x0,0x8(%rsp) > 5f9c68: 00 00 > > 00000000005f9c6a <_qqqqqq>: > > > > In this way, instructions per ChildNodeInsertionNotifier's constructor are reduced. In particular, note that ChildNodeInsertionNotifier's constructor is used inside for loops. If making the ChildNodeInsertionNotifier constructor 3 instructions shorter gives you a ~1.4% speed boost on Dromaeo/dom-modify, you should probably move construction out of the loop which would be trivial since you're always passing 'this' to the constructor. You could also move the reffing/unreffing of the Document out of ChildNodeInsertionNotifier::notify() since that Document* should be the same for every call. Personally, I don't trust these numbers but if they are indeed true then there is much work here to be done.
Kentaro Hara
Comment 14 2012-12-05 16:32:29 PST
Reverted r136481 for reason: it might have regressed dom_perf/CloneNodes (See bug 104177) Committed r136777: <http://trac.webkit.org/changeset/136777>
Note You need to log in before you can comment on or make changes to this bug.