RESOLVED FIXED 56874
Repeated copy and paste-in-place operation results in increasingly verbose HTML
https://bugs.webkit.org/show_bug.cgi?id=56874
Summary Repeated copy and paste-in-place operation results in increasingly verbose HTML
Enrica Casucci
Reported 2011-03-22 15:41:23 PDT
* SUMMARY A repeated copy and paste-in-place operation on right-aligned bold "hello" results in increasingly verbose HTML * STEPS TO REPRODUCE Setup: 1. Open TextEdit 2. Type "hello" 3. CMD+A to select all 4. Right-align the text, bold it, and increase the font to be pretty large. 5. CMD+C to copy all selected text 5. Open http://www.mozilla.org/editor/midasdemo/ 6. Paste. (then check View HTML Source) 7. CMD+A then CMD+C to copy 8. Paste again in place (then check View HTML Source) * RESULTS Expect the pasted HTML to be the same Instead, it is more verbose
Attachments
Patch (24.71 KB, patch)
2011-03-22 16:44 PDT, Enrica Casucci
no flags
Patch2 (24.71 KB, patch)
2011-03-22 16:58 PDT, Enrica Casucci
darin: review+
Patch3 (24.05 KB, patch)
2011-03-23 11:32 PDT, Enrica Casucci
darin: review-
Patch4 (24.07 KB, patch)
2011-03-23 20:21 PDT, Enrica Casucci
darin: review-
Patch5 (24.19 KB, patch)
2011-03-24 11:32 PDT, Enrica Casucci
darin: review+
Enrica Casucci
Comment 1 2011-03-22 16:44:36 PDT
WebKit Review Bot
Comment 2 2011-03-22 16:46:41 PDT
Attachment 86540 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/edit..." exit_code: 1 Source/WebCore/editing/ReplaceSelectionCommand.cpp:782: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Source/WebCore/editing/ReplaceSelectionCommand.cpp:783: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Source/WebCore/editing/ReplaceSelectionCommand.cpp:784: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Source/WebCore/editing/ReplaceSelectionCommand.cpp:785: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Source/WebCore/editing/ReplaceSelectionCommand.cpp:786: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Source/WebCore/editing/ReplaceSelectionCommand.cpp:787: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Source/WebCore/editing/ReplaceSelectionCommand.cpp:788: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Source/WebCore/editing/ReplaceSelectionCommand.cpp:789: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Source/WebCore/editing/ReplaceSelectionCommand.cpp:790: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Total errors found: 9 in 10 files If any of these errors are false positives, please file a bug against check-webkit-style.
Enrica Casucci
Comment 3 2011-03-22 16:58:15 PDT
Created attachment 86545 [details] Patch2 Fixed style.
Darin Adler
Comment 4 2011-03-22 18:47:46 PDT
Comment on attachment 86545 [details] Patch2 View in context: https://bugs.webkit.org/attachment.cgi?id=86545&action=review I’m going to say review+, but I think there is also room for improvement. > Source/WebCore/editing/ReplaceSelectionCommand.cpp:794 > + if (node->hasTagName(spanTag) && node->renderer() && node->renderer()->isInline()) { I don’t understand why for all those other tags we are assuming they are always inline, but for span we are checking if a renderer exists and is actually an inline. Pages can style those other types of tags to be non-inline too; there is nothing special about span in this respect. > Source/WebCore/editing/ReplaceSelectionCommand.cpp:796 > + const HTMLElement* elem = static_cast<const HTMLElement*>(node); > + return elem->hasAttribute(styleAttr); I don’t think you need a local variable for this. If you did want to keep it, I’d suggest naming it element, not elem. > Source/WebCore/editing/ReplaceSelectionCommand.cpp:947 > + // If we are not trying to match the destination style and we are not > + // inside a list element we prefer a position that is outside inline > + // elements that provide style. This way we can produce a less verbose > + // markup. > + if (!m_matchStyle && !enclosingList(insertionPos.deprecatedNode())) { Why is being in a list relevant? Comments need to focus on the question “why” so the comment should explain. Why are you using the deprecatedNode function? Isn’t there a more modern way to write this, avoiding the functions with “deprecated” in their names? > Source/WebCore/editing/ReplaceSelectionCommand.cpp:956 > + if (parentNode->childNodes()->length() > 1) There are more efficient ways to check if something has more than one child. This will walk the entire list of child nodes to count them all just to check if it is > 1. You could just check parent->firstChild()->nextSibling() to see if it is non-zero, with the appropriate null checks. > Source/WebCore/editing/ReplaceSelectionCommand.cpp:957 > + splitElement(static_cast<Element*>(parentNode), parentNode->childNodes()->item(insertionPos.deprecatedEditingOffset())); Is there a guarantee that the parent is not the document? I just want to make sure the cast is safe.
Justin Garcia
Comment 5 2011-03-22 22:42:03 PDT
(In reply to comment #0) > 5. CMD+C to copy all selected text > > 5. Open http://www.mozilla.org/editor/midasdemo/ > 6. Paste. (then check View HTML Source) > 7. CMD+A then CMD+C to copy > 8. Paste again in place (then check View HTML Source) When you copy the styled content and paste it over itself, why doesn't WebKit remove the styled inline elements during the delete phase? Why are they still there at Paste time? That seems like the real bug. In TextEdit, if I type: hello world Select "hello", bold it, delete it, move the selection elsewhere, move the selection back to the first line and start typing, the inserted text *is not* bold. It *is* bold with WebKit. If we don't leave those styled elements around in the document in the delete phase, then at Paste time, I think ReplaceSelectionCommand won't have a problem keeping the DOM clean.
Darin Adler
Comment 6 2011-03-23 00:18:01 PDT
(In reply to comment #5) > If we don't leave those styled elements around in the document in the delete phase, then at Paste time, I think ReplaceSelectionCommand won't have a problem keeping the DOM clean. Enrica considered this. It helps in the delete everything case, but as soon as you leave even a single character behind it falls apart. A fix that prevents the increasing nesting even if you leave a little behind is even better.
Darin Adler
Comment 7 2011-03-23 00:19:55 PDT
Note also that if we delete the <b> element then typed text won’t be bold. So doing the “delete more” optimization means we’d have to delete differently when replacing than when just deleting alone.
Ryosuke Niwa
Comment 8 2011-03-23 00:53:39 PDT
Comment on attachment 86545 [details] Patch2 View in context: https://bugs.webkit.org/attachment.cgi?id=86545&action=review > Source/WebCore/editing/ReplaceSelectionCommand.cpp:792 > + if (node->hasTagName(bTag) > + || node->hasTagName(fontTag) > + || node->hasTagName(iTag) > + || node->hasTagName(sTag) > + || node->hasTagName(strikeTag) > + || node->hasTagName(strongTag) > + || node->hasTagName(strikeTag) > + || node->hasTagName(subTag) > + || node->hasTagName(supTag) > + || node->hasTagName(uTag)) > + return true; This duplicates the tables in conflictsWithImplicitStyleOfElement and htmlAttributeEquivalents. Ideally, we share code with those functions. > Source/WebCore/editing/ReplaceSelectionCommand.cpp:946 > + // If we are not trying to match the destination style and we are not > + // inside a list element we prefer a position that is outside inline > + // elements that provide style. This way we can produce a less verbose > + // markup. I'd really prefer to have a properly named function than having a comment here since this function is already one of the most complicated doApply functions. >> Source/WebCore/editing/ReplaceSelectionCommand.cpp:947 >> + if (!m_matchStyle && !enclosingList(insertionPos.deprecatedNode())) { > > Why is being in a list relevant? Comments need to focus on the question “why” so the comment should explain. > > Why are you using the deprecatedNode function? Isn’t there a more modern way to write this, avoiding the functions with “deprecated” in their names? To add to Darin's comment, you should be calling containerNode() here since you shouldn't consider position before/after a list element as inside the list. Also, given the condition, can't we just do this in the else clause around the line 949 where we call insertNodeAtAndUpdateNodesInserted? > LayoutTests/editing/pasteboard/paste-text-with-style.html:1 > +<html> Missing DOCTYPE. > LayoutTests/editing/pasteboard/paste-text-with-style.html:19 > +<script src=../editing.js language="JavaScript" type="text/JavaScript" ></script> I don't think we need to put language or type attributes. > LayoutTests/editing/pasteboard/paste-text-with-style.html:33 > +function editingTest() { > + var markupBefore = document.getElementById("test").innerHTML; > + log("Markup before: " + markupBefore); > + selectAllCommand(); > + copyCommand(); > + pasteCommand(); > + var markupAfter = document.getElementById("test").innerHTML; > + log("Markup after: " + markupAfter); > +} It seems to me that what you want to do is to use dump-as-markup.js.
Enrica Casucci
Comment 9 2011-03-23 09:32:27 PDT
Thank you all for the review and feedback. I'll post another patch shortly to address your comments.
Justin Garcia
Comment 10 2011-03-23 09:58:40 PDT
> Note also that if we delete the <b> element then typed text won’t be bold. No, I think it would be bold even without the <b> because DeleteSelectionCommand::saveTypingStyleState() would preserve the typing style. > So doing the “delete more” optimization means we’d have to delete differently when replacing than when just deleting alone. I'm saying that delete should delete more in both the replace case and the delete case. The pure delete case does not match TextEdit right now as I mentioned. But maybe that is intentional. > It helps in the delete everything case, but as soon as you leave even a single character behind it falls apart. A fix that prevents the increasing nesting even if you leave a little behind is even better. True.
Ryosuke Niwa
Comment 11 2011-03-23 10:15:59 PDT
(In reply to comment #10) > > So doing the “delete more” optimization means we’d have to delete differently when replacing than when just deleting alone. > > I'm saying that delete should delete more in both the replace case and the delete case. The pure delete case does not match TextEdit right now as I mentioned. But maybe that is intentional. Can we check what IE/FF does? Also we should check what Microsoft Word and WordPad do on Windows. It might be one of those things where Mac/Windows behaviors differ.
Enrica Casucci
Comment 12 2011-03-23 11:32:49 PDT
Created attachment 86650 [details] Patch3 I've changed the code a bit and I like this implementation better. I believe I've addressed most of the comments. We could explore the direction of sanitizing the markup upon delete in the future. There are many more improvements to make in the generated markup when copying and pasting.
Darin Adler
Comment 13 2011-03-23 12:08:16 PDT
Comment on attachment 86650 [details] Patch3 View in context: https://bugs.webkit.org/attachment.cgi?id=86650&action=review > Source/WebCore/editing/ReplaceSelectionCommand.cpp:786 > + if (node->hasTagName(spanTag)) > + return static_cast<const HTMLElement*>(node)->hasAttribute(styleAttr); > + return true; The concept here is good but I think this logic is backwards. Tags like <b> add style even though they have no style attribute. But a custom tag like <oog> would behave the same as span. So it’s not span that’s the special case, it’s all the other elements that contribute style without a style attribute. We need a list of tags that add style even when they have no style attribute. > Source/WebCore/editing/ReplaceSelectionCommand.cpp:945 > + if (parentNode->firstChild()->nextSibling()) > + splitElement(static_cast<Element*>(parentNode), parentNode->childNodes()->item(insertionPos.offsetInContainerNode())); I don’t think this is right in the case that the anchor node is a text node.
Darin Adler
Comment 14 2011-03-23 12:59:51 PDT
Comment on attachment 86650 [details] Patch3 View in context: https://bugs.webkit.org/attachment.cgi?id=86650&action=review > Source/WebCore/editing/ReplaceSelectionCommand.cpp:941 > + if (insertionPos.anchorNode()->isTextNode() && insertionPos.offsetInContainerNode()) > + splitTextNodeContainingElement(static_cast<Text*>(insertionPos.anchorNode()), insertionPos.offsetInContainerNode()); I think this needs one more check, a check that anchorType() is PositionIsOffsetInAnchor. >> Source/WebCore/editing/ReplaceSelectionCommand.cpp:945 > > I don’t think this is right in the case that the anchor node is a text node. Rather than parentNode->childNodes()->item(insertionPos.offsetInContainerNode()) I think we should pass in insertionPos.computeNodeAfterPosition() here.
Enrica Casucci
Comment 15 2011-03-23 20:21:48 PDT
Darin Adler
Comment 16 2011-03-24 10:21:01 PDT
Comment on attachment 86738 [details] Patch4 View in context: https://bugs.webkit.org/attachment.cgi?id=86738&action=review review- because of the bad HTMLElement cast > Source/WebCore/editing/ReplaceSelectionCommand.cpp:788 > + const AtomicString classAttributeValue = static_cast<const HTMLElement*>(node)->getAttribute(classAttr); It’s not safe to cast node to HTMLElement* without first checking if it is an HTMLElement. The type here should be const AtomicString&, not const AtomicString, to avoid an extra bit of reference count churn. > Source/WebCore/editing/ReplaceSelectionCommand.cpp:797 > + const NamedNodeMap* attributesMap = static_cast<const HTMLElement*>(node)->attributeMap(); This should just be attributeMap, not attributesMap. > Source/WebCore/editing/ReplaceSelectionCommand.cpp:957 > + // If we are in the middle of a text node, we need to split it before we can > + // move the insertion position. > + if (insertionPos.anchorNode()->isTextNode() && insertionPos.anchorType() == Position::PositionIsOffsetInAnchor && insertionPos.offsetInContainerNode()) > + splitTextNodeContainingElement(static_cast<Text*>(insertionPos.anchorNode()), insertionPos.offsetInContainerNode()); Does this handle the case where the offset is at the end of a text node? I think besides checking the offset for 0 we might need to check if it’s the same as the length of the text.
Enrica Casucci
Comment 17 2011-03-24 10:25:24 PDT
(In reply to comment #16) > (From update of attachment 86738 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=86738&action=review > > review- because of the bad HTMLElement cast > > > Source/WebCore/editing/ReplaceSelectionCommand.cpp:788 > > + const AtomicString classAttributeValue = static_cast<const HTMLElement*>(node)->getAttribute(classAttr); > > It’s not safe to cast node to HTMLElement* without first checking if it is an HTMLElement. > I don't think this was ever possible since we are only calling this function on elements that are containers, but I'll add it anyway. > The type here should be const AtomicString&, not const AtomicString, to avoid an extra bit of reference count churn. > ok. > > Source/WebCore/editing/ReplaceSelectionCommand.cpp:797 > > + const NamedNodeMap* attributesMap = static_cast<const HTMLElement*>(node)->attributeMap(); > > This should just be attributeMap, not attributesMap. > > > Source/WebCore/editing/ReplaceSelectionCommand.cpp:957 > > + // If we are in the middle of a text node, we need to split it before we can > > + // move the insertion position. > > + if (insertionPos.anchorNode()->isTextNode() && insertionPos.anchorType() == Position::PositionIsOffsetInAnchor && insertionPos.offsetInContainerNode()) > > + splitTextNodeContainingElement(static_cast<Text*>(insertionPos.anchorNode()), insertionPos.offsetInContainerNode()); > > Does this handle the case where the offset is at the end of a text node? I think besides checking the offset for 0 we might need to check if it’s the same as the length of the text. I did test that and it works correctly.
Darin Adler
Comment 18 2011-03-24 10:30:43 PDT
(In reply to comment #17) > > It’s not safe to cast node to HTMLElement* without first checking if it is an HTMLElement. > > > I don't think this was ever possible since we are only calling this function on elements that are containers, but I'll add it anyway. Besides HTMLElement, a container can be a non-HTML element, which is an Element* and not an HTMLElement* or something like a Document* or DocumentFragment*. I think in practice the Element case could really happen, although probably not the others.
Darin Adler
Comment 19 2011-03-24 10:31:37 PDT
> > Does this handle the case where the offset is at the end of a text node? I think besides checking the offset for 0 we might need to check if it’s the same as the length of the text. > I did test that and it works correctly. Why does it work correctly? Because splitTextNodeContainingElement correctly does nothing? Should we remove the check for 0 then? Maybe splitTextNodeContainingElement correctly does nothing in that case too.
Enrica Casucci
Comment 20 2011-03-24 11:05:50 PDT
(In reply to comment #19) > > > Does this handle the case where the offset is at the end of a text node? I think besides checking the offset for 0 we might need to check if it’s the same as the length of the text. > > I did test that and it works correctly. > > Why does it work correctly? Because splitTextNodeContainingElement correctly does nothing? Should we remove the check for 0 then? Maybe splitTextNodeContainingElement correctly does nothing in that case too. No, the 0 case needs to be checked. I'll investigate more to make sure we don't need an additional check
Enrica Casucci
Comment 21 2011-03-24 11:32:47 PDT
Created attachment 86803 [details] Patch5 One more iteration.
Enrica Casucci
Comment 22 2011-03-24 12:48:46 PDT
Ryosuke Niwa
Comment 23 2011-03-24 19:52:00 PDT
Comment on attachment 86803 [details] Patch5 View in context: https://bugs.webkit.org/attachment.cgi?id=86803&action=review > Source/WebCore/editing/ReplaceSelectionCommand.cpp:956 > + if (!m_matchStyle && !enclosingList(insertionPos.anchorNode()) && isStyleSpan(fragment.firstChild())) { > + Node* parentNode = insertionPos.anchorNode()->parentNode(); I don't think this is right. How does it make sense to use the anchor node here? Why should we care about the list element before or after the insertion pos?
Enrica Casucci
Comment 24 2011-03-25 09:04:00 PDT
(In reply to comment #23) > (From update of attachment 86803 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=86803&action=review > > > Source/WebCore/editing/ReplaceSelectionCommand.cpp:956 > > + if (!m_matchStyle && !enclosingList(insertionPos.anchorNode()) && isStyleSpan(fragment.firstChild())) { > > + Node* parentNode = insertionPos.anchorNode()->parentNode(); > > I don't think this is right. How does it make sense to use the anchor node here? Why should we care about the list element before or after the insertion pos? I'm not sure I understand what you mean here. I want to avoid list items, since the correct type of splitting is done in insertAsListItem below.
Ryosuke Niwa
Comment 25 2011-03-25 09:21:16 PDT
Comment on attachment 86803 [details] Patch5 View in context: https://bugs.webkit.org/attachment.cgi?id=86803&action=review >>> Source/WebCore/editing/ReplaceSelectionCommand.cpp:956 >>> + Node* parentNode = insertionPos.anchorNode()->parentNode(); >> >> I don't think this is right. How does it make sense to use the anchor node here? Why should we care about the list element before or after the insertion pos? > > I'm not sure I understand what you mean here. I want to avoid list items, since the correct type of splitting is done in insertAsListItem below. Right, but you only care when insertPos is IN the list, right? (i.e. insertionPos is enclosed by some list) Not when insertionPos is before or after a list (i.e. insertionPos is adjacent to some list).
Note You need to log in before you can comment on or make changes to this bug.