Bug 42941

Summary: Redo fails after text node is split by SplitTextNodeCommand
Product: WebKit Reporter: Ryosuke Niwa <rniwa>
Component: HTML EditingAssignee: Ryosuke Niwa <rniwa>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, darin, enrica, eric, ojan, tkent, tony, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 27156    
Attachments:
Description Flags
demonstrates the bug
none
Added doReapply to SplitTextNodeCommand
darin: review-
Added doReapply to SplitTextNodeCommand (asserts parent node is editable)
none
Added doReapply to SplitTextNodeCommand (asserts parent node is editable)
darin: review-
Added a test and fix per Darin's comment. darin: review+

Description Ryosuke Niwa 2010-07-24 18:42:53 PDT
Created attachment 62516 [details]
demonstrates the bug

Because SplitTextNodeCommand's doReapply is not implemented and creates new text node when reapplying, re-applying inline style to the first half of the split text node fails.

Reproduction steps:
1. Type in "hello"
2. Split into "he" and "llo" by inserting a new paragraph
3. Apply any inline style "he"
4. Undo twice
5. Redo twice

Expected result:
"he" is bolded

Actual result:
"he" is not bolded.
Comment 1 Ryosuke Niwa 2010-07-24 19:27:07 PDT
Created attachment 62518 [details]
Added doReapply to SplitTextNodeCommand

Trivial fix.
Comment 2 Ryosuke Niwa 2010-07-25 14:05:32 PDT
Created attachment 62531 [details]
Added doReapply to SplitTextNodeCommand (asserts parent node is editable)

Instead of bailing out of doApply when the parent node is not editable, assert that the parent node is editable because callee shouldn't be calling SplitTextNode on a non-editable text node.
Comment 3 Eric Seidel (no email) 2010-07-25 14:30:28 PDT
Attachment 62531 [details] did not build on mac:
Build output: http://queues.webkit.org/results/3583474
Comment 4 Ryosuke Niwa 2010-07-25 15:00:25 PDT
Created attachment 62534 [details]
Added doReapply to SplitTextNodeCommand (asserts parent node is editable)

(In reply to comment #3)
> Attachment 62531 [details] did not build on mac:
> Build output: http://queues.webkit.org/results/3583474

Oh right, ASSERT will be gone in release build and parent isn't accessed elsewhere. Fixed.
Comment 5 Darin Adler 2010-07-25 19:36:04 PDT
Comment on attachment 62534 [details]
Added doReapply to SplitTextNodeCommand (asserts parent node is editable)

I don’t understand why this patch is needed.

By default, doReapply calls doApply. The existing logic in SplitTextNodeCommand::doApply looks like it would work fine if you called it again after doUnapply was called. I trust that there is a failure because you provide a test case. Could you be more specific about exactly what fails?

I suspect there is a much simpler way to fix this.

> -    Node* parent = m_text2->parentNode();
> -    if (!parent || !parent->isContentEditable())
> -        return;
> +    ASSERT(m_text2->parentNode() && m_text2->parentNode()->isContentEditable());

I don’t think it’s a good idea to change this into an assertion. What’s the guarantee this function won’t be called incorrectly? In general, the approach we’ve been using in the editing commands is to have the low level commands themselves do reality checks. This makes it a a little harder to cause crashes by using the functions wrong and also makes it safer to use a doApply function later as a doReapply function, where there has been time for the organization of the document to possibly have been changed a bit.

I’m not sure why you decided to reverse this design direction for this particular command.

> +    executeApply();

I think the function name "execute apply" is vague; two verbs in a row is also hard to parse mentally. The function names for commands, such as doApply and doReapply, are already quirky and a little hard to understand. Adding a function named executeApply adds a bit to the confusion.

I won’t suggest another name, because I think there is no need to add functions here at all.
Comment 6 Ryosuke Niwa 2010-07-25 20:04:47 PDT
(In reply to comment #5)
> By default, doReapply calls doApply. The existing logic in SplitTextNodeCommand::doApply looks like it would work fine if you called it again after doUnapply was called. I trust that there is a failure because you provide a test case. Could you be more specific about exactly what fails?

Consider a case where a text node <text> is split into <text#1><text#2> (note <text#1> is the new node and <text#2> is the old <text>). Now, suppose we insert a span before <text#1> to wrap the text. At this point, SplitTextCommand and InsertNodeBeforeCommand are added to in our undo stack.

However, when we undo these steps and redo them, SplitTextCommand creates <text#3><text#2> and InsertNodeBeforeCommand fails to insert the span because its m_refChild is still pointing to <text#1>.  My test case simulates this process using ApplyStyleCommand (it splits the text node and wrap it with a span).

> > -    Node* parent = m_text2->parentNode();
> > -    if (!parent || !parent->isContentEditable())
> > -        return;
> > +    ASSERT(m_text2->parentNode() && m_text2->parentNode()->isContentEditable());
> 
> I don’t think it’s a good idea to change this into an assertion. What’s the guarantee this function won’t be called incorrectly? In general, the approach we’ve been using in the editing commands is to have the low level commands themselves do reality checks. This makes it a a little harder to cause crashes by using the functions wrong and also makes it safer to use a doApply function later as a doReapply function, where there has been time for the organization of the document to possibly have been changed a bit.

The reason I made this change was because there is no way for the caller to know whether or not the text node was really split other than remembering neighboring nodes and comparing it after the call.  I don't think simpler commands should be smart and defensive.  That makes the code in composite edit commands to miss condition checks before executing simple commands and assume that the command always succeeds or end up having awkward check & branching for two different cases.

> I’m not sure why you decided to reverse this design direction for this particular command.

i.e. I don't personally like this design decision.  But that's totally off from what the bug is addressing so I could just revert this change.

> > +    executeApply();
> 
> I think the function name "execute apply" is vague; two verbs in a row is also hard to parse mentally. The function names for commands, such as doApply and doReapply, are already quirky and a little hard to understand. Adding a function named executeApply adds a bit to the confusion.

I followed the pattern for SplitElementCommand.
Comment 7 Ryosuke Niwa 2010-07-25 20:31:52 PDT
Comment on attachment 62518 [details]
Added doReapply to SplitTextNodeCommand

In fact, I'll do that now.  Reverted the ASSERT change.
Comment 8 Darin Adler 2010-07-25 20:35:09 PDT
(In reply to comment #6)
> I don't think simpler commands should be smart and defensive.

In the past we had serious problems with crashes when we try to undo a command after changes to the document. I changed all the simple command undo and redo code to be defensive; after that change those crashes are almost completely a thing of the past. We should not go back to the old way.

I agree that the simple command steps should not be "smart". They should know only enough to do a mechanical operation, and not have some higher level view of how this operation fits into a larger comment.

Generally speaking I am not fond of functions that can be "called wrong"; ones that have to do a lot of assertions on their arguments. Overall they are harder to use than functions that do something well defined in all cases, make few assumptions, and hence have to make few assertions.

In the initial command execution, in doApply, we could use assertions instead of reality checks. If we wanted to turn the command overall into a “don’t call this wrong” type of function. Something I’d generally prefer not to do.

But in undo and redo, in doUnappy and doReapply, we’ll need sufficient checks to make sure the code does not lead to a crash or problem if a careless or malicious JavaScript programmer changes the document between the initial command execution, undo, and redo.

> That makes the code in composite edit commands to miss condition checks before executing simple commands and assume that the command always succeeds or end up having awkward check & branching for two different cases.

I don’t think these complexities you mention in composite edit commands are affected much at all by the simple edit command policy. I could imagine us adopting a new policy that we do assertions on the initial command execution and limit the runtime checks to the undo/redo code. But I don’t think this is the place to start in that new direction and I’m not sure it’s the right direction.

> But that's totally off from what the bug is addressing so I could just revert this change.

This is the kind of thing we should talk over before we do it, so I’m glad I had a chance to express my thoughts on this.

> > > +    executeApply();
> > 
> > I think the function name "execute apply" is vague; two verbs in a row is also hard to parse mentally. The function names for commands, such as doApply and doReapply, are already quirky and a little hard to understand. Adding a function named executeApply adds a bit to the confusion.
> 
> I followed the pattern for SplitElementCommand.

Yes, I don’t like the name there either.

OK, here are some other comments on the patch now that I know the basic approach of keeping the node around is needed:

- With the patch, the redo code path will crash if m_text2 no longer has a parent node. This can arise when JavaScript programmers change the document between the undo and redo operations. So the code we removed from doApply does need to be kept around for doReapply.

- With the patch, the redo code path will now operate on the nodes even if they are now part of non-editable content. That’s a policy change that is probably not an improvement. The code we removed from doApply probably does need to be kept around for doReapply

- Since we’re going to keep the text node around, we may need to do some reality checks on it and not do the redo command in certain cases. For example, if the text node has a parent we may want to do nothing. This can arise when JavaScript programmers change the document between the undo and redo operations. One of the benefits of the old approach was that changes to that text node could not interfere with future undo/redo.

Otherwise it seems like this patch will be OK.
Comment 9 Darin Adler 2010-07-25 20:35:47 PDT
Comment on attachment 62518 [details]
Added doReapply to SplitTextNodeCommand

review- because this removes the needed null check of m_text2->parentNode() as well as the possibly-needed content-editable check from the redo operation.
Comment 10 Darin Adler 2010-07-25 20:36:10 PDT
Comment on attachment 62518 [details]
Added doReapply to SplitTextNodeCommand

WebCore/editing/SplitTextNodeCommand.cpp:99
 +      int ec = 0;
We normally use the type ExceptionCode rather than just "int".
Comment 11 Ryosuke Niwa 2010-07-25 23:07:54 PDT
Created attachment 62546 [details]
Added a test and fix per Darin's comment.

(In reply to comment #8)
> In the past we had serious problems with crashes when we try to undo a command after changes to the document. I changed all the simple command undo and redo code to be defensive; after that change those crashes are almost completely a thing of the past. We should not go back to the old way.
> 
> I agree that the simple command steps should not be "smart". They should know only enough to do a mechanical operation, and not have some higher level view of how this operation fits into a larger comment.
> 
> Generally speaking I am not fond of functions that can be "called wrong"; ones that have to do a lot of assertions on their arguments. Overall they are harder to use than functions that do something well defined in all cases, make few assumptions, and hence have to make few assertions.

I see. Thanks for the detailed explanation.

> But in undo and redo, in doUnappy and doReapply, we’ll need sufficient checks to make sure the code does not lead to a crash or problem if a careless or malicious JavaScript programmer changes the document between the initial command execution, undo, and redo.

Agreed.

> > I followed the pattern for SplitElementCommand.
> 
> Yes, I don’t like the name there either.

Renamed but let me know if you have a better one.


> - With the patch, the redo code path will crash if m_text2 no longer has a parent node. This can arise when JavaScript programmers change the document between the undo and redo operations. So the code we removed from doApply does need to be kept around for doReapply.

Fixed with a test. But AppendChildNode is adding back "hello" and I can't spot where it's coming from. It should probably be addressed in another bug though.

> - With the patch, the redo code path will now operate on the nodes even if they are now part of non-editable content. That’s a policy change that is probably not an improvement. The code we removed from doApply probably does need to be kept around for doReapply

Fixed.

> - Since we’re going to keep the text node around, we may need to do some reality checks on it and not do the redo command in certain cases. For example, if the text node has a parent we may want to do nothing. This can arise when JavaScript programmers change the document between the undo and redo operations. One of the benefits of the old approach was that changes to that text node could not interfere with future undo/redo.

Are you talking about the case where m_text1 gets a parent?

(In reply to comment #10)
> (From update of attachment 62518 [details])
> WebCore/editing/SplitTextNodeCommand.cpp:99
>  +      int ec = 0;
> We normally use the type ExceptionCode rather than just "int".

Oops, fixed.
Comment 12 Darin Adler 2010-07-25 23:17:24 PDT
Comment on attachment 62546 [details]
Added a test and fix per Darin's comment.

>      ExceptionCode ec = 0;
>      m_text2->insertData(0, prefixText, ec);
> +    ASSERT(!ec);

Asserting here is OK because insertData can only return an exception when the offset is larger than the text size, and we are passing an offset of 0.

> +    m_text2->parentNode()->insertBefore(m_text1.get(), m_text2.get(), ec);
> +    ASSERT(!ec);

But asserting here seems possibly wrong. Are you sure there are no exceptions that insertBefore might raise in unusual circumstances?
Comment 13 Darin Adler 2010-07-25 23:17:48 PDT
(In reply to comment #11)
> > - Since we’re going to keep the text node around, we may need to do some reality checks on it and not do the redo command in certain cases. For example, if the text node has a parent we may want to do nothing. This can arise when JavaScript programmers change the document between the undo and redo operations. One of the benefits of the old approach was that changes to that text node could not interfere with future undo/redo.
> 
> Are you talking about the case where m_text1 gets a parent?

That was one case I thought of. And it should not be a problem because insertBefore takes care of that. But perhaps there are other ways the text node could be in an unusual state.
Comment 14 Ryosuke Niwa 2010-07-25 23:30:20 PDT
(In reply to comment #12)
> > +    m_text2->parentNode()->insertBefore(m_text1.get(), m_text2.get(), ec);
> > +    ASSERT(!ec);
> 
> But asserting here seems possibly wrong. Are you sure there are no exceptions that insertBefore might raise in unusual circumstances?

The only case in which exception occurs is if checkAddChild fails on m_text1 (which seem to happen only if m_text1 was moved out of the document), or removeChild on m_text1's parent caused an exception.

I don't think either case is realistic in most cases but I could change it to bail out just to be safe and also to match with the defensive approach taken elsewhere. Should I change it to bail out before I commit or would you prefer to review the patch again before I commit?
Comment 15 Darin Adler 2010-07-25 23:36:40 PDT
(In reply to comment #14)
> The only case in which exception occurs is if checkAddChild fails on m_text1 (which seem to happen only if m_text1 was moved out of the document), or removeChild on m_text1's parent caused an exception.
> 
> I don't think either case is realistic in most cases but I could change it to bail out just to be safe and also to match with the defensive approach taken elsewhere. Should I change it to bail out before I commit or would you prefer to review the patch again before I commit?

I don’t know exactly what you mean by realistic. I suppose the question is twofold:

    1) Can we prove that an exception never will occur?

    2) If an exception does occur, is it important that we do not go on with the removal from the text node?

The point of having code like this is to avoid unpleasant crashes that could possibly be security vulnerabilities, and also to avoid possible data loss if half of an editing operation happens and the other half does not.

In this case, I think we’re really close to proving (1). I can’t come up with a case where an exception could occur. But usually the burden of proof goes in the other direction.

But more importantly, I think the answer to (2) is no. So it’s probably OK to keep running even if we do have an assertion.

It goes against good programming practice to assert something that probably won’t happen. We should assert only things we know won’t happen. Assertions are not for wishful thinking.

So on balance, I’d suggest removing the assertion, but I don’t really care if we add back code to exit when we get an exception. And I stand by my review+ so you are also welcome to land the patch as-is.
Comment 16 Ryosuke Niwa 2010-07-26 00:33:49 PDT
(In reply to comment #15)
> > I don't think either case is realistic in most cases but I could change it to bail out just to be safe and also to match with the defensive approach taken elsewhere. Should I change it to bail out before I commit or would you prefer to review the patch again before I commit?
> 
> I don’t know exactly what you mean by realistic.

I meant that I couldn't think of a way that the condition occurs but I say it with 100% assurance.

>     1) Can we prove that an exception never will occur?

No.

>     2) If an exception does occur, is it important that we do not go on with the removal from the text node?
...
> But more importantly, I think the answer to (2) is no. So it’s probably OK to keep running even if we do have an assertion.

I'm not sure I fully understand you but if we somehow fail to insert the node and remove the text, then we end up losing text.  So I think the answer to (2) is yes.

> It goes against good programming practice to assert something that probably won’t happen. We should assert only things we know won’t happen. Assertions are not for wishful thinking.

I agree.

> So on balance, I’d suggest removing the assertion, but I don’t really care if we add back code to exit when we get an exception. And I stand by my review+ so you are also welcome to land the patch as-is.

I'll add back the bail out since that was the original code to begin with.

Thank you again for the detailed reviews.
Comment 17 WebKit Review Bot 2010-07-26 01:40:13 PDT
http://trac.webkit.org/changeset/64034 might have broken Qt Linux Release
Comment 18 Ryosuke Niwa 2010-07-26 02:03:43 PDT
Landed as http://trac.webkit.org/changeset/64034.

I don't think the test failure on Qt platform is caused by this patch but will check back tomorrow morning.