Bug 56874 - Repeated copy and paste-in-place operation results in increasingly verbose HTML
Summary: Repeated copy and paste-in-place operation results in increasingly verbose HTML
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: HTML Editing (show other bugs)
Version: 528+ (Nightly build)
Hardware: All OS X 10.5
: P2 Normal
Assignee: Enrica Casucci
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2011-03-22 15:41 PDT by Enrica Casucci
Modified: 2011-03-25 09:21 PDT (History)
4 users (show)

See Also:


Attachments
Patch (24.71 KB, patch)
2011-03-22 16:44 PDT, Enrica Casucci
no flags Details | Formatted Diff | Diff
Patch2 (24.71 KB, patch)
2011-03-22 16:58 PDT, Enrica Casucci
darin: review+
Details | Formatted Diff | Diff
Patch3 (24.05 KB, patch)
2011-03-23 11:32 PDT, Enrica Casucci
darin: review-
Details | Formatted Diff | Diff
Patch4 (24.07 KB, patch)
2011-03-23 20:21 PDT, Enrica Casucci
darin: review-
Details | Formatted Diff | Diff
Patch5 (24.19 KB, patch)
2011-03-24 11:32 PDT, Enrica Casucci
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Enrica Casucci 2011-03-22 15:41:23 PDT
* SUMMARY
A repeated copy and paste-in-place operation on right-aligned bold "hello" results in increasingly verbose HTML

* STEPS TO REPRODUCE
Setup:
1. Open TextEdit
2. Type "hello"
3. CMD+A to select all
4. Right-align the text, bold it, and increase the font to be pretty large.
5. CMD+C to copy all selected text

5. Open http://www.mozilla.org/editor/midasdemo/
6. Paste.  (then check View HTML Source)
7. CMD+A then CMD+C to copy
8. Paste again in place (then check View HTML Source)

* RESULTS
Expect the pasted HTML to be the same
Instead, it is more verbose
Comment 1 Enrica Casucci 2011-03-22 16:44:36 PDT
Created attachment 86540 [details]
Patch
Comment 2 WebKit Review Bot 2011-03-22 16:46:41 PDT
Attachment 86540 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/edit..." exit_code: 1

Source/WebCore/editing/ReplaceSelectionCommand.cpp:782:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
Source/WebCore/editing/ReplaceSelectionCommand.cpp:783:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
Source/WebCore/editing/ReplaceSelectionCommand.cpp:784:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
Source/WebCore/editing/ReplaceSelectionCommand.cpp:785:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
Source/WebCore/editing/ReplaceSelectionCommand.cpp:786:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
Source/WebCore/editing/ReplaceSelectionCommand.cpp:787:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
Source/WebCore/editing/ReplaceSelectionCommand.cpp:788:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
Source/WebCore/editing/ReplaceSelectionCommand.cpp:789:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
Source/WebCore/editing/ReplaceSelectionCommand.cpp:790:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
Total errors found: 9 in 10 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Enrica Casucci 2011-03-22 16:58:15 PDT
Created attachment 86545 [details]
Patch2

Fixed style.
Comment 4 Darin Adler 2011-03-22 18:47:46 PDT
Comment on attachment 86545 [details]
Patch2

View in context: https://bugs.webkit.org/attachment.cgi?id=86545&action=review

I’m going to say review+, but I think there is also room for improvement.

> Source/WebCore/editing/ReplaceSelectionCommand.cpp:794
> +    if (node->hasTagName(spanTag) && node->renderer() && node->renderer()->isInline()) {

I don’t understand why for all those other tags we are assuming they are always inline, but for span we are checking if a renderer exists and is actually an inline. Pages can style those other types of tags to be non-inline too; there is nothing special about span in this respect.

> Source/WebCore/editing/ReplaceSelectionCommand.cpp:796
> +        const HTMLElement* elem = static_cast<const HTMLElement*>(node);
> +        return elem->hasAttribute(styleAttr);

I don’t think you need a local variable for this. If you did want to keep it, I’d suggest naming it element, not elem.

> Source/WebCore/editing/ReplaceSelectionCommand.cpp:947
> +    // If we are not trying to match the destination style and we are not
> +    // inside a list element we prefer a position that is outside inline
> +    // elements that provide style. This way we can produce a less verbose
> +    // markup.
> +    if (!m_matchStyle && !enclosingList(insertionPos.deprecatedNode())) {

Why is being in a list relevant? Comments need to focus on the question “why” so the comment should explain.

Why are you using the deprecatedNode function? Isn’t there a more modern way to write this, avoiding the functions with “deprecated” in their names?

> Source/WebCore/editing/ReplaceSelectionCommand.cpp:956
> +            if (parentNode->childNodes()->length() > 1)

There are more efficient ways to check if something has more than one child. This will walk the entire list of child nodes to count them all just to check if it is > 1. You could just check parent->firstChild()->nextSibling() to see if it is non-zero, with the appropriate null checks.

> Source/WebCore/editing/ReplaceSelectionCommand.cpp:957
> +                splitElement(static_cast<Element*>(parentNode), parentNode->childNodes()->item(insertionPos.deprecatedEditingOffset()));

Is there a guarantee that the parent is not the document? I just want to make sure the cast is safe.
Comment 5 Justin Garcia 2011-03-22 22:42:03 PDT
(In reply to comment #0)
> 5. CMD+C to copy all selected text
> 
> 5. Open http://www.mozilla.org/editor/midasdemo/
> 6. Paste.  (then check View HTML Source)
> 7. CMD+A then CMD+C to copy
> 8. Paste again in place (then check View HTML Source)

When you copy the styled content and paste it over itself, why doesn't WebKit remove the styled inline elements during the delete phase? Why are they still there at Paste time? That seems like the real bug.

In TextEdit, if I type:

hello
world

Select "hello", bold it, delete it, move the selection elsewhere, move the selection back to the first line and start typing, the inserted text *is not* bold.

It *is* bold with WebKit.

If we don't leave those styled elements around in the document in the delete phase, then at Paste time, I think ReplaceSelectionCommand won't have a problem keeping the DOM clean.
Comment 6 Darin Adler 2011-03-23 00:18:01 PDT
(In reply to comment #5)
> If we don't leave those styled elements around in the document in the delete phase, then at Paste time, I think ReplaceSelectionCommand won't have a problem keeping the DOM clean.

Enrica considered this.

It helps in the delete everything case, but as soon as you leave even a single character behind it falls apart. A fix that prevents the increasing nesting even if you leave a little behind is even better.
Comment 7 Darin Adler 2011-03-23 00:19:55 PDT
Note also that if we delete the <b> element then typed text won’t be bold. So doing the “delete more” optimization means we’d have to delete differently when replacing than when just deleting alone.
Comment 8 Ryosuke Niwa 2011-03-23 00:53:39 PDT
Comment on attachment 86545 [details]
Patch2

View in context: https://bugs.webkit.org/attachment.cgi?id=86545&action=review

> Source/WebCore/editing/ReplaceSelectionCommand.cpp:792
> +    if (node->hasTagName(bTag)
> +        || node->hasTagName(fontTag)
> +        || node->hasTagName(iTag)
> +        || node->hasTagName(sTag)
> +        || node->hasTagName(strikeTag)
> +        || node->hasTagName(strongTag)
> +        || node->hasTagName(strikeTag)
> +        || node->hasTagName(subTag)
> +        || node->hasTagName(supTag)
> +        || node->hasTagName(uTag))
> +        return true;

This duplicates the tables in conflictsWithImplicitStyleOfElement and htmlAttributeEquivalents.  Ideally, we share code with those functions.

> Source/WebCore/editing/ReplaceSelectionCommand.cpp:946
> +    // If we are not trying to match the destination style and we are not
> +    // inside a list element we prefer a position that is outside inline
> +    // elements that provide style. This way we can produce a less verbose
> +    // markup.

I'd really prefer to have a properly named function than having a comment here since this function is already one of the most complicated doApply functions.

>> Source/WebCore/editing/ReplaceSelectionCommand.cpp:947
>> +    if (!m_matchStyle && !enclosingList(insertionPos.deprecatedNode())) {
> 
> Why is being in a list relevant? Comments need to focus on the question “why” so the comment should explain.
> 
> Why are you using the deprecatedNode function? Isn’t there a more modern way to write this, avoiding the functions with “deprecated” in their names?

To add to Darin's comment, you should be calling containerNode() here since you shouldn't consider position before/after a list element as inside the list.

Also, given the condition, can't we just do this in the else clause around the line 949 where we call insertNodeAtAndUpdateNodesInserted?

> LayoutTests/editing/pasteboard/paste-text-with-style.html:1
> +<html> 

Missing DOCTYPE.

> LayoutTests/editing/pasteboard/paste-text-with-style.html:19
> +<script src=../editing.js language="JavaScript" type="text/JavaScript" ></script>

I don't think we need to put language or type attributes.

> LayoutTests/editing/pasteboard/paste-text-with-style.html:33
> +function editingTest() {
> +    var markupBefore = document.getElementById("test").innerHTML;
> +    log("Markup before: " + markupBefore);
> +    selectAllCommand();
> +    copyCommand();
> +    pasteCommand();
> +    var markupAfter = document.getElementById("test").innerHTML;
> +    log("Markup after: " + markupAfter);
> +}

It seems to me that what you want to do is to use dump-as-markup.js.
Comment 9 Enrica Casucci 2011-03-23 09:32:27 PDT
Thank you all for the review and feedback. I'll post another patch shortly to address your comments.
Comment 10 Justin Garcia 2011-03-23 09:58:40 PDT
> Note also that if we delete the <b> element then typed text won’t be bold.

No, I think it would be bold even without the <b> because DeleteSelectionCommand::saveTypingStyleState() would preserve the typing style.

> So doing the “delete more” optimization means we’d have to delete differently when replacing than when just deleting alone.

I'm saying that delete should delete more in both the replace case and the delete case. The pure delete case does not match TextEdit right now as I mentioned. But maybe that is intentional.

> It helps in the delete everything case, but as soon as you leave even a single character behind it falls apart. A fix that prevents the increasing nesting even if you leave a little behind is even better.

True.
Comment 11 Ryosuke Niwa 2011-03-23 10:15:59 PDT
(In reply to comment #10)
> > So doing the “delete more” optimization means we’d have to delete differently when replacing than when just deleting alone.
> 
> I'm saying that delete should delete more in both the replace case and the delete case. The pure delete case does not match TextEdit right now as I mentioned. But maybe that is intentional.

Can we check what IE/FF does?  Also we should check what Microsoft Word and WordPad do on Windows.  It might be one of those things where Mac/Windows behaviors differ.
Comment 12 Enrica Casucci 2011-03-23 11:32:49 PDT
Created attachment 86650 [details]
Patch3

I've changed the code a bit and I like this implementation better. I believe I've addressed most of the comments.
We could explore the direction of sanitizing the markup upon delete in the future.
There are many more improvements to make in the generated markup when copying and pasting.
Comment 13 Darin Adler 2011-03-23 12:08:16 PDT
Comment on attachment 86650 [details]
Patch3

View in context: https://bugs.webkit.org/attachment.cgi?id=86650&action=review

> Source/WebCore/editing/ReplaceSelectionCommand.cpp:786
> +    if (node->hasTagName(spanTag))
> +        return static_cast<const HTMLElement*>(node)->hasAttribute(styleAttr);
> +    return true;

The concept here is good but I think this logic is backwards. Tags like <b> add style even though they have no style attribute. But a custom tag like <oog> would behave the same as span. So it’s not span that’s the special case, it’s all the other elements that contribute style without a style attribute. We need a list of tags that add style even when they have no style attribute.

> Source/WebCore/editing/ReplaceSelectionCommand.cpp:945
> +            if (parentNode->firstChild()->nextSibling())
> +                splitElement(static_cast<Element*>(parentNode), parentNode->childNodes()->item(insertionPos.offsetInContainerNode()));

I don’t think this is right in the case that the anchor node is a text node.
Comment 14 Darin Adler 2011-03-23 12:59:51 PDT
Comment on attachment 86650 [details]
Patch3

View in context: https://bugs.webkit.org/attachment.cgi?id=86650&action=review

> Source/WebCore/editing/ReplaceSelectionCommand.cpp:941
> +            if (insertionPos.anchorNode()->isTextNode() && insertionPos.offsetInContainerNode())
> +                splitTextNodeContainingElement(static_cast<Text*>(insertionPos.anchorNode()), insertionPos.offsetInContainerNode());

I think this needs one more check, a check that anchorType() is PositionIsOffsetInAnchor.

>> Source/WebCore/editing/ReplaceSelectionCommand.cpp:945

> 
> I don’t think this is right in the case that the anchor node is a text node.

Rather than parentNode->childNodes()->item(insertionPos.offsetInContainerNode()) I think we should pass in insertionPos.computeNodeAfterPosition() here.
Comment 15 Enrica Casucci 2011-03-23 20:21:48 PDT
Created attachment 86738 [details]
Patch4
Comment 16 Darin Adler 2011-03-24 10:21:01 PDT
Comment on attachment 86738 [details]
Patch4

View in context: https://bugs.webkit.org/attachment.cgi?id=86738&action=review

review- because of the bad HTMLElement cast

> Source/WebCore/editing/ReplaceSelectionCommand.cpp:788
> +    const AtomicString classAttributeValue = static_cast<const HTMLElement*>(node)->getAttribute(classAttr);

It’s not safe to cast node to HTMLElement* without first checking if it is an HTMLElement.

The type here should be const AtomicString&, not const AtomicString, to avoid an extra bit of reference count churn.

> Source/WebCore/editing/ReplaceSelectionCommand.cpp:797
> +    const NamedNodeMap* attributesMap = static_cast<const HTMLElement*>(node)->attributeMap();

This should just be attributeMap, not attributesMap.

> Source/WebCore/editing/ReplaceSelectionCommand.cpp:957
> +            // If we are in the middle of a text node, we need to split it before we can
> +            // move the insertion position.
> +            if (insertionPos.anchorNode()->isTextNode() && insertionPos.anchorType() == Position::PositionIsOffsetInAnchor && insertionPos.offsetInContainerNode())
> +                splitTextNodeContainingElement(static_cast<Text*>(insertionPos.anchorNode()), insertionPos.offsetInContainerNode());

Does this handle the case where the offset is at the end of a text node? I think besides checking the offset for 0 we might need to check if it’s the same as the length of the text.
Comment 17 Enrica Casucci 2011-03-24 10:25:24 PDT
(In reply to comment #16)
> (From update of attachment 86738 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=86738&action=review
> 
> review- because of the bad HTMLElement cast
> 
> > Source/WebCore/editing/ReplaceSelectionCommand.cpp:788
> > +    const AtomicString classAttributeValue = static_cast<const HTMLElement*>(node)->getAttribute(classAttr);
> 
> It’s not safe to cast node to HTMLElement* without first checking if it is an HTMLElement.
>
I don't think this was ever possible since we are only calling this function on elements that are containers, but I'll add it anyway.
 
> The type here should be const AtomicString&, not const AtomicString, to avoid an extra bit of reference count churn.
>
ok. 
> > Source/WebCore/editing/ReplaceSelectionCommand.cpp:797
> > +    const NamedNodeMap* attributesMap = static_cast<const HTMLElement*>(node)->attributeMap();
> 
> This should just be attributeMap, not attributesMap.
> 
> > Source/WebCore/editing/ReplaceSelectionCommand.cpp:957
> > +            // If we are in the middle of a text node, we need to split it before we can
> > +            // move the insertion position.
> > +            if (insertionPos.anchorNode()->isTextNode() && insertionPos.anchorType() == Position::PositionIsOffsetInAnchor && insertionPos.offsetInContainerNode())
> > +                splitTextNodeContainingElement(static_cast<Text*>(insertionPos.anchorNode()), insertionPos.offsetInContainerNode());
> 
> Does this handle the case where the offset is at the end of a text node? I think besides checking the offset for 0 we might need to check if it’s the same as the length of the text.
I did test that and it works correctly.
Comment 18 Darin Adler 2011-03-24 10:30:43 PDT
(In reply to comment #17)
> > It’s not safe to cast node to HTMLElement* without first checking if it is an HTMLElement.
> >
> I don't think this was ever possible since we are only calling this function on elements that are containers, but I'll add it anyway.

Besides HTMLElement, a container can be a non-HTML element, which is an Element* and not an HTMLElement* or something like a Document* or DocumentFragment*.

I think in practice the Element case could really happen, although probably not the others.
Comment 19 Darin Adler 2011-03-24 10:31:37 PDT
> > Does this handle the case where the offset is at the end of a text node? I think besides checking the offset for 0 we might need to check if it’s the same as the length of the text.
> I did test that and it works correctly.

Why does it work correctly? Because splitTextNodeContainingElement correctly does nothing? Should we remove the check for 0 then? Maybe splitTextNodeContainingElement correctly does nothing in that case too.
Comment 20 Enrica Casucci 2011-03-24 11:05:50 PDT
(In reply to comment #19)
> > > Does this handle the case where the offset is at the end of a text node? I think besides checking the offset for 0 we might need to check if it’s the same as the length of the text.
> > I did test that and it works correctly.
> 
> Why does it work correctly? Because splitTextNodeContainingElement correctly does nothing? Should we remove the check for 0 then? Maybe splitTextNodeContainingElement correctly does nothing in that case too.

No, the 0 case needs to be checked. I'll investigate more to make sure we don't need an additional check
Comment 21 Enrica Casucci 2011-03-24 11:32:47 PDT
Created attachment 86803 [details]
Patch5

One more iteration.
Comment 22 Enrica Casucci 2011-03-24 12:48:46 PDT
http://trac.webkit.org/changeset/81887
Comment 23 Ryosuke Niwa 2011-03-24 19:52:00 PDT
Comment on attachment 86803 [details]
Patch5

View in context: https://bugs.webkit.org/attachment.cgi?id=86803&action=review

> Source/WebCore/editing/ReplaceSelectionCommand.cpp:956
> +    if (!m_matchStyle && !enclosingList(insertionPos.anchorNode()) && isStyleSpan(fragment.firstChild())) {
> +        Node* parentNode = insertionPos.anchorNode()->parentNode();

I don't think this is right.  How does it make sense to use the anchor node here?  Why should we care about the list element before or after the insertion pos?
Comment 24 Enrica Casucci 2011-03-25 09:04:00 PDT
(In reply to comment #23)
> (From update of attachment 86803 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=86803&action=review
> 
> > Source/WebCore/editing/ReplaceSelectionCommand.cpp:956
> > +    if (!m_matchStyle && !enclosingList(insertionPos.anchorNode()) && isStyleSpan(fragment.firstChild())) {
> > +        Node* parentNode = insertionPos.anchorNode()->parentNode();
> 
> I don't think this is right.  How does it make sense to use the anchor node here?  Why should we care about the list element before or after the insertion pos?

I'm not sure I understand what you mean here. I want to avoid list items, since the correct type of splitting is done in insertAsListItem below.
Comment 25 Ryosuke Niwa 2011-03-25 09:21:16 PDT
Comment on attachment 86803 [details]
Patch5

View in context: https://bugs.webkit.org/attachment.cgi?id=86803&action=review

>>> Source/WebCore/editing/ReplaceSelectionCommand.cpp:956
>>> +        Node* parentNode = insertionPos.anchorNode()->parentNode();
>> 
>> I don't think this is right.  How does it make sense to use the anchor node here?  Why should we care about the list element before or after the insertion pos?
> 
> I'm not sure I understand what you mean here. I want to avoid list items, since the correct type of splitting is done in insertAsListItem below.

Right, but you only care when insertPos is IN the list, right? (i.e. insertionPos is enclosed by some list)  Not when insertionPos is before or after a list (i.e. insertionPos is adjacent to some list).