WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
101144
SimplifyMarkupCommand takes a disproportionally long time to run when there are many nodes to remove
https://bugs.webkit.org/show_bug.cgi?id=101144
Summary
SimplifyMarkupCommand takes a disproportionally long time to run when there a...
Ryosuke Niwa
Reported
2012-11-02 22:04:41 PDT
SimplifyMarkupCommand can take Θ(n^2) time to run because mass removal doesn't really do the mass removal. For each node removal we do, we end up re-attaching lots of render objects. We can do better by avoiding attach()es and doing some out-of-order removal to minimize the number of removals we do. <
rdar://problem/12179712
>
Attachments
Work in progress patch
(7.38 KB, patch)
2012-11-02 22:05 PDT
,
Ryosuke Niwa
no flags
Details
Formatted Diff
Diff
Fixes the bug
(6.59 KB, patch)
2012-11-05 10:42 PST
,
Ryosuke Niwa
no flags
Details
Formatted Diff
Diff
Lazy attach + avoid calling isContentEditable
(23.76 KB, patch)
2012-11-05 17:03 PST
,
Ryosuke Niwa
no flags
Details
Formatted Diff
Diff
Keep assertions in appendNode and insertNodeBefore
(23.72 KB, patch)
2012-11-05 18:33 PST
,
Ryosuke Niwa
no flags
Details
Formatted Diff
Diff
Fixed password-echo-passnode.html
(24.87 KB, patch)
2012-11-07 13:57 PST
,
Ryosuke Niwa
enrica
: review+
Details
Formatted Diff
Diff
Show Obsolete
(4)
View All
Add attachment
proposed patch, testcase, etc.
Ryosuke Niwa
Comment 1
2012-11-02 22:05:52 PDT
Created
attachment 172207
[details]
Work in progress patch
Ryosuke Niwa
Comment 2
2012-11-05 10:42:56 PST
Created
attachment 172359
[details]
Fixes the bug
WebKit Review Bot
Comment 3
2012-11-05 10:51:43 PST
Attachment 172359
[details]
did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/ChangeLog:26: Line contains tab character. [whitespace/tab] [5] Total errors found: 1 in 4 files If any of these errors are false positives, please file a bug against check-webkit-style.
Enrica Casucci
Comment 4
2012-11-05 10:58:32 PST
Comment on
attachment 172359
[details]
Fixes the bug View in context:
https://bugs.webkit.org/attachment.cgi?id=172359&action=review
I'm not thrilled at the idea of removing the root editable element in principle, but I don't see how else you could do it otherwise.
> Source/WebCore/editing/RemoveNodeCommand.cpp:45 > + if (!parent || (!parent->isContentEditable(Node::UserSelectAllIsAlwaysNonEditable) && parent->attached()))
Why do you need to check if it is attached?
> Source/WebCore/editing/SimplifyMarkupCommand.cpp:120 > + || nodesToRemove[pastLastNodeToRemove]->firstChild() != nodesToRemove[pastLastNodeToRemove]->lastChild())
I don't think the second condition could ever be true. If I recall correctly I never place in the list of nodes to remove nodes with more than one child.
Ryosuke Niwa
Comment 5
2012-11-05 11:02:40 PST
Comment on
attachment 172359
[details]
Fixes the bug View in context:
https://bugs.webkit.org/attachment.cgi?id=172359&action=review
>> Source/WebCore/editing/SimplifyMarkupCommand.cpp:120 >> + || nodesToRemove[pastLastNodeToRemove]->firstChild() != nodesToRemove[pastLastNodeToRemove]->lastChild()) > > I don't think the second condition could ever be true. If I recall correctly I never place in the list of nodes to remove nodes with more than one child.
That’s a good point. I’ll turn that into an assertion for documentation purposes.
Ryosuke Niwa
Comment 6
2012-11-05 11:03:45 PST
Comment on
attachment 172359
[details]
Fixes the bug View in context:
https://bugs.webkit.org/attachment.cgi?id=172359&action=review
>> Source/WebCore/editing/RemoveNodeCommand.cpp:45 >> + if (!parent || (!parent->isContentEditable(Node::UserSelectAllIsAlwaysNonEditable) && parent->attached())) > > Why do you need to check if it is attached?
Unfortunately isContentEditable returns false for a node without attached ancestor :( Gotta hate -webkit-user-modify.
Ojan Vafai
Comment 7
2012-11-05 11:11:47 PST
Comment on
attachment 172359
[details]
Fixes the bug So, we are knowingly creating a bug here? For example, if you have a blur event handler on one of the nodes removed and in that handler you grab the computed style of the contentEditable element, we'll return the wrong thing? That doesn't seem great.
Ryosuke Niwa
Comment 8
2012-11-05 11:14:00 PST
(In reply to
comment #7
)
> (From update of
attachment 172359
[details]
) > So, we are knowingly creating a bug here? For example, if you have a blur event handler on one of the nodes removed and in that handler you grab the computed style of the contentEditable element, we'll return the wrong thing? > > That doesn't seem great.
It's not great but I can't think of an alternative solution here. The problem is that what's slow is traversing nodes and attach()ing render object so any solution that involves traversing nodes will be too slow.
Ojan Vafai
Comment 9
2012-11-05 11:23:29 PST
You could imagine having a RAII object that would delay attaching any renderers until the RAII object goes away or we try to read out the computedStyle or other layout dependent data off the node.
Ryosuke Niwa
Comment 10
2012-11-05 11:27:02 PST
(In reply to
comment #9
)
> You could imagine having a RAII object that would delay attaching any renderers until the RAII object goes away or we try to read out the computedStyle or other layout dependent data off the node.
Possibly. The problem is that this RAII object will then hold onto ~8*n million entries where n is the average number of children those 8 million nodes have.
Ryosuke Niwa
Comment 11
2012-11-05 11:29:32 PST
Oh well, I guess there will be only 8 million entries since we can just hold onto the nodes themselves.
Ojan Vafai
Comment 12
2012-11-05 11:40:04 PST
Comment on
attachment 172359
[details]
Fixes the bug There has to be a less hacky solution to this that performs well. The RAII object doesn't need to hold on to the nodes. It just needs to keep them from attaching. In its destructor we could walk the root editable elements tree and attach anything that's not attached. That should perform comparably to what you have here.
Ryosuke Niwa
Comment 13
2012-11-05 11:45:16 PST
(In reply to
comment #12
)
> (From update of
attachment 172359
[details]
) > There has to be a less hacky solution to this that performs well. The RAII object doesn't need to hold on to the nodes. It just needs to keep them from attaching. In its destructor we could walk the root editable elements tree and attach anything that's not attached. That should perform comparably to what you have here.
Another problem is isContentEditable. Even if we delayed attach, we’ll still force style recall in isContentEditable.
Ryosuke Niwa
Comment 14
2012-11-05 11:50:51 PST
recalc*
Ryosuke Niwa
Comment 15
2012-11-05 17:03:46 PST
Created
attachment 172446
[details]
Lazy attach + avoid calling isContentEditable
Ryosuke Niwa
Comment 16
2012-11-05 17:06:20 PST
Note we can’t have a RAII object to avoid calling isContentEditable because accessibility code, author scripts, and others could synchronously call isContentEditable or rendererIsEditable while we’re removing the nodes.
Ojan Vafai
Comment 17
2012-11-05 17:40:40 PST
Comment on
attachment 172446
[details]
Lazy attach + avoid calling isContentEditable View in context:
https://bugs.webkit.org/attachment.cgi?id=172446&action=review
Can you include a manual test-case? That way if anyone needs to work on this code in the future, they will be able to make sure they don't regress it if they need to change this significantly.
> Source/WebCore/ChangeLog:19 > + this specific case is not the worth the increase in the bot cycle time. I'll note that the email > + attached in the original radar bug (<
rdar://problem/12179712
>) took 100 seconds to open now only takes > + 7 seconds to open on my MacPro.
Do you still get the same numbers with this new patch?
> Source/WebCore/editing/ApplyStyleCommand.cpp:945 > + node->document()->updateStyleIfNeeded();
Is this needed now because we lazy attach? If so, would be good to put a comment to that effect.
> Source/WebCore/editing/SimplifyMarkupCommand.cpp:44 > + Vector<RefPtr<Node> > nodesToRemove;
This looks like a behavior change to me. As in, it looks like you are fixing a use-after-free bug here. Mind making a test-case for this?
> Source/WebCore/editing/SimplifyMarkupCommand.cpp:90 > + int numPrunedAncestors = pruneSubsequentAncestorsToRemove(nodesToRemove, i);
It seems to me that we could do something much simpler here. In the loop above, have two vectors: nodesToRemove and nodeWithChildrenToPreserve (needs a better name). Only add the highest ancestor to remove to nodesToRemove and the lowest ancestor to remove to nodeWithChildrenToPreserve. Then, create a version of removeNodePreservingChildren that takes a first argument of the highest ancestor to remove and a second argument of the lowest ancestor to remove. Then removeNodePreservingChildren just reparents the children of the lowest ancestor and then removes the highest ancestor. In addition to being simpler, I expect that would perform much better as well. Also, then I don't think we'd need to do any of this lazyAttach or AssumeContentIsAlwaysEditable stuff.
Ryosuke Niwa
Comment 18
2012-11-05 18:19:01 PST
Comment on
attachment 172446
[details]
Lazy attach + avoid calling isContentEditable View in context:
https://bugs.webkit.org/attachment.cgi?id=172446&action=review
>> Source/WebCore/ChangeLog:19 >> + 7 seconds to open on my MacPro. > > Do you still get the same numbers with this new patch?
Yes.
>> Source/WebCore/editing/ApplyStyleCommand.cpp:945 >> + node->document()->updateStyleIfNeeded(); > > Is this needed now because we lazy attach? If so, would be good to put a comment to that effect.
Yes. The reason is self-evident from the fact the line line below touches node->renderer().
>> Source/WebCore/editing/SimplifyMarkupCommand.cpp:44 >> + Vector<RefPtr<Node> > nodesToRemove; > > This looks like a behavior change to me. As in, it looks like you are fixing a use-after-free bug here. Mind making a test-case for this?
No. The reason I have to make this RefPtr is the optimization I added below. We used to always moves from one place in the document to another place. This patch implements a mass removal, which necessities all nodes to be retained.
>> Source/WebCore/editing/SimplifyMarkupCommand.cpp:90 >> + int numPrunedAncestors = pruneSubsequentAncestorsToRemove(nodesToRemove, i); > > It seems to me that we could do something much simpler here. In the loop above, have two vectors: > nodesToRemove and nodeWithChildrenToPreserve (needs a better name). > > Only add the highest ancestor to remove to nodesToRemove and the lowest ancestor to remove to nodeWithChildrenToPreserve. Then, create a version of removeNodePreservingChildren that takes a first argument of the highest ancestor to remove and a second argument of the lowest ancestor to remove. > > Then removeNodePreservingChildren just reparents the children of the lowest ancestor and then removes the highest ancestor. > > In addition to being simpler, I expect that would perform much better as well. Also, then I don't think we'd need to do any of this lazyAttach or AssumeContentIsAlwaysEditable stuff.
I had already tried that but even checking whether a node is the highest ancestor to remove or not will significantly increases the runtime because that’s an O(n^2) operation where n is 8 million. And even if we could implement a clever algorithm to do that in O(n), we still need AssumeContentIsAlwaysEditable because that’s what gives us 50% of my optimization.
Ryosuke Niwa
Comment 19
2012-11-05 18:20:46 PST
(In reply to
comment #17
)
> (From update of
attachment 172446
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=172446&action=review
> > Can you include a manual test-case? That way if anyone needs to work on this code in the future, they will be able to make sure they don't regress it if they need to change this significantly.
I don’t think that’s useful. Literally nobody runs manual tests.
Ryosuke Niwa
Comment 20
2012-11-05 18:21:07 PST
Comment on
attachment 172446
[details]
Lazy attach + avoid calling isContentEditable With that, I’m going to re-iterate my r?.
Ryosuke Niwa
Comment 21
2012-11-05 18:33:07 PST
Created
attachment 172462
[details]
Keep assertions in appendNode and insertNodeBefore
Ryosuke Niwa
Comment 22
2012-11-05 19:00:15 PST
(In reply to
comment #18
)
> (From update of
attachment 172446
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=172446&action=review
>
> No. The reason I have to make this RefPtr is the optimization I added below. We used to always moves from one place in the document to another place. > This patch implements a mass removal, which necessities all nodes to be retained.
I’ll elaborate on this a little more since the reasoning isn’t so obvious. On the contrary to what I said in the
comment #16
, it’s extremely hard (if not impossible) to make WebKit synchronously callback to author scripts while we're removing nodes in SimplifyMarkupCommand (we still have to worry about AX code, WebKit API, etc… so we still can’t use RAII to override isContentEditable) because all nodes and their descendants we’re removing there are contents pasted by the user. The only way we could have had a security vulnerability was if there was a way for ReplaceSelectionCommand to fire a synchronous event after nodes are inserted and before SimplifyMarkupCommand is called (lines 958 through 1023 in ReplaceSelectionCommand.cpp). I can’t think of a way to do that. As far as I understand the ode, we don’t remove any node that had not been inserted by ReplaceSelectionCommand itself. Now, if ReplaceSelectionCommand DID fire a synchronous event, we can insert some element like input element wrapped in a redundant element like div without any attributes, set the focus there, and wait for SimplifyMarkupCommand to blur it.
Build Bot
Comment 23
2012-11-06 02:05:35 PST
Comment on
attachment 172462
[details]
Keep assertions in appendNode and insertNodeBefore
Attachment 172462
[details]
did not pass mac-ews (mac): Output:
http://queues.webkit.org/results/14665480
New failing tests: editing/input/password-echo-passnode.html
WebKit Review Bot
Comment 24
2012-11-06 05:34:11 PST
Comment on
attachment 172462
[details]
Keep assertions in appendNode and insertNodeBefore
Attachment 172462
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/14731747
New failing tests: editing/execCommand/switch-list-type-with-orphaned-li.html editing/input/password-echo-passnode.html
Ryosuke Niwa
Comment 25
2012-11-07 13:57:59 PST
Created
attachment 172866
[details]
Fixed password-echo-passnode.html
Enrica Casucci
Comment 26
2012-11-07 16:18:57 PST
Comment on
attachment 172866
[details]
Fixed password-echo-passnode.html It would have been nice not having to repeat all the logic around ShouldAssumeContentIsAlwaysEditable in every command affected. Is there any way to move it to the parent class? I know some of the commands are simple and other are composite. If there is no way to move it to a common place it is fine too, not holding the review for this.
Ryosuke Niwa
Comment 27
2012-11-07 16:26:59 PST
Thanks for r+. (In reply to
comment #26
)
> (From update of
attachment 172866
[details]
) > It would have been nice not having to repeat all the logic around ShouldAssumeContentIsAlwaysEditable in every command affected. Is there any way to move it to the parent class? I know some of the commands are simple and other are composite. If there is no way to move it to a common place it is fine too, not holding the review for this.
The problem is that AppendNodeCommand, etc... inherits from SimpleEditCommand and thus needs to call its constructor. Even if we made this flag set by a member function of SimpleEditCommand, then we'll still need to modify those commands and then wrapper functions in CompositeEditCommand. In the end, that didn't seem like a win.
Ryosuke Niwa
Comment 28
2012-11-07 16:36:16 PST
Committed
r133820
: <
http://trac.webkit.org/changeset/133820
>
Ojan Vafai
Comment 29
2012-11-07 17:05:39 PST
(In reply to
comment #19
)
> (In reply to
comment #17
) > > (From update of
attachment 172446
[details]
[details]) > > View in context:
https://bugs.webkit.org/attachment.cgi?id=172446&action=review
> > > > Can you include a manual test-case? That way if anyone needs to work on this code in the future, they will be able to make sure they don't regress it if they need to change this significantly. > > I don’t think that’s useful. Literally nobody runs manual tests.
I strongly disagree with this. When someone looks at the history of this code, they will be able to find the patch that committed the code and find the performance test. At the very least, can you upload a performance test to this bug? As it is, there's no way for someone hacking on this code to duplicate your results.
Ryosuke Niwa
Comment 30
2012-11-07 17:13:41 PST
(In reply to
comment #29
)
> I strongly disagree with this. When someone looks at the history of this code, they will be able to find the patch that committed the code and find the performance test. At the very least, can you upload a performance test to this bug? As it is, there's no way for someone hacking on this code to duplicate your results.
Unfortunately, the original reproduction of this bug is a confidential e-mail received on Mail and it's about 1MB in size. Note that this bug only reproduces on Mail.
Ryosuke Niwa
Comment 31
2012-11-07 17:18:12 PST
To clarify this point, ReplaceSelectionCommand DOES use SimplifyMarkupCommand on the pasted content but it does so after calling removeRedundantStylesAndKeepStyleSpanInline on the pasted content, which is orders of magnitude slower than SimplifyMarkupCommand itself. If we wanted to make the paste faster, then we have to reimplement removeRedundantStylesAndKeepStyleSpanInline using techniques used in SimplifyMarkupCommand. I do agree that adding a performance test for pasting a large content where removeRedundantStylesAndKeepStyleSpanInline and SimplifyMarkupCommand do a lot of work is valuable.
Ojan Vafai
Comment 32
2012-11-07 17:18:47 PST
Comment on
attachment 172446
[details]
Lazy attach + avoid calling isContentEditable View in context:
https://bugs.webkit.org/attachment.cgi?id=172446&action=review
Sorry I didn't get to respond before this got committed. I still don't understand why we need such a complex solution.
>>> Source/WebCore/editing/ApplyStyleCommand.cpp:945 >>> + node->document()->updateStyleIfNeeded(); >> >> Is this needed now because we lazy attach? If so, would be good to put a comment to that effect. > > Yes. The reason is self-evident from the fact the line line below touches node->renderer().
I disagree. We didn't need it before this patch right? So, the code below doesn't clearly require it before this patch. We don't call updateStyleIfNeeded throughout the code base every time we touch a renderer and the places we have thrown in these calls have often turned out to be unintended performance bottlenecks. We touch node->renderer(), but we updateStyleIfNeeded on the entire document.
>>> Source/WebCore/editing/SimplifyMarkupCommand.cpp:44 >>> + Vector<RefPtr<Node> > nodesToRemove; >> >> This looks like a behavior change to me. As in, it looks like you are fixing a use-after-free bug here. Mind making a test-case for this? > > No. The reason I have to make this RefPtr is the optimization I added below. We used to always moves from one place in the document to another place. > This patch implements a mass removal, which necessities all nodes to be retained.
I don't understand why we need the RefPtr if we can't fire events or execute JS code.
>>> Source/WebCore/editing/SimplifyMarkupCommand.cpp:90 >>> + int numPrunedAncestors = pruneSubsequentAncestorsToRemove(nodesToRemove, i); >> >> It seems to me that we could do something much simpler here. In the loop above, have two vectors: >> nodesToRemove and nodeWithChildrenToPreserve (needs a better name). >> >> Only add the highest ancestor to remove to nodesToRemove and the lowest ancestor to remove to nodeWithChildrenToPreserve. Then, create a version of removeNodePreservingChildren that takes a first argument of the highest ancestor to remove and a second argument of the lowest ancestor to remove. >> >> Then removeNodePreservingChildren just reparents the children of the lowest ancestor and then removes the highest ancestor. >> >> In addition to being simpler, I expect that would perform much better as well. Also, then I don't think we'd need to do any of this lazyAttach or AssumeContentIsAlwaysEditable stuff. > > I had already tried that but even checking whether a node is the highest ancestor to remove or not will significantly increases the runtime because that’s an O(n^2) operation where n is 8 million. > And even if we could implement a clever algorithm to do that in O(n), we still need AssumeContentIsAlwaysEditable because that’s what gives us 50% of my optimization.
I don't understand. We are already doing the work to calculate both these nodes as best I can tell. In the code above, startingNode==nodeWithChildrenToPreserve and topNodeWithStartingStyle.firstChild == highestAncestorToRemove. What am I missing? Even if we can't get rid of AssumeContentIsAlwaysEditable (which I'm still not convinced of), getting rid of the updateStyleIfNeeded would be a big improvement.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug