WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
56874
Repeated copy and paste-in-place operation results in increasingly verbose HTML
https://bugs.webkit.org/show_bug.cgi?id=56874
Summary
Repeated copy and paste-in-place operation results in increasingly verbose HTML
Enrica Casucci
Reported
2011-03-22 15:41:23 PDT
* SUMMARY A repeated copy and paste-in-place operation on right-aligned bold "hello" results in increasingly verbose HTML * STEPS TO REPRODUCE Setup: 1. Open TextEdit 2. Type "hello" 3. CMD+A to select all 4. Right-align the text, bold it, and increase the font to be pretty large. 5. CMD+C to copy all selected text 5. Open
http://www.mozilla.org/editor/midasdemo/
6. Paste. (then check View HTML Source) 7. CMD+A then CMD+C to copy 8. Paste again in place (then check View HTML Source) * RESULTS Expect the pasted HTML to be the same Instead, it is more verbose
Attachments
Patch
(24.71 KB, patch)
2011-03-22 16:44 PDT
,
Enrica Casucci
no flags
Details
Formatted Diff
Diff
Patch2
(24.71 KB, patch)
2011-03-22 16:58 PDT
,
Enrica Casucci
darin
: review+
Details
Formatted Diff
Diff
Patch3
(24.05 KB, patch)
2011-03-23 11:32 PDT
,
Enrica Casucci
darin
: review-
Details
Formatted Diff
Diff
Patch4
(24.07 KB, patch)
2011-03-23 20:21 PDT
,
Enrica Casucci
darin
: review-
Details
Formatted Diff
Diff
Patch5
(24.19 KB, patch)
2011-03-24 11:32 PDT
,
Enrica Casucci
darin
: review+
Details
Formatted Diff
Diff
Show Obsolete
(4)
View All
Add attachment
proposed patch, testcase, etc.
Enrica Casucci
Comment 1
2011-03-22 16:44:36 PDT
Created
attachment 86540
[details]
Patch
WebKit Review Bot
Comment 2
2011-03-22 16:46:41 PDT
Attachment 86540
[details]
did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/edit..." exit_code: 1 Source/WebCore/editing/ReplaceSelectionCommand.cpp:782: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Source/WebCore/editing/ReplaceSelectionCommand.cpp:783: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Source/WebCore/editing/ReplaceSelectionCommand.cpp:784: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Source/WebCore/editing/ReplaceSelectionCommand.cpp:785: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Source/WebCore/editing/ReplaceSelectionCommand.cpp:786: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Source/WebCore/editing/ReplaceSelectionCommand.cpp:787: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Source/WebCore/editing/ReplaceSelectionCommand.cpp:788: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Source/WebCore/editing/ReplaceSelectionCommand.cpp:789: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Source/WebCore/editing/ReplaceSelectionCommand.cpp:790: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Total errors found: 9 in 10 files If any of these errors are false positives, please file a bug against check-webkit-style.
Enrica Casucci
Comment 3
2011-03-22 16:58:15 PDT
Created
attachment 86545
[details]
Patch2 Fixed style.
Darin Adler
Comment 4
2011-03-22 18:47:46 PDT
Comment on
attachment 86545
[details]
Patch2 View in context:
https://bugs.webkit.org/attachment.cgi?id=86545&action=review
I’m going to say review+, but I think there is also room for improvement.
> Source/WebCore/editing/ReplaceSelectionCommand.cpp:794 > + if (node->hasTagName(spanTag) && node->renderer() && node->renderer()->isInline()) {
I don’t understand why for all those other tags we are assuming they are always inline, but for span we are checking if a renderer exists and is actually an inline. Pages can style those other types of tags to be non-inline too; there is nothing special about span in this respect.
> Source/WebCore/editing/ReplaceSelectionCommand.cpp:796 > + const HTMLElement* elem = static_cast<const HTMLElement*>(node); > + return elem->hasAttribute(styleAttr);
I don’t think you need a local variable for this. If you did want to keep it, I’d suggest naming it element, not elem.
> Source/WebCore/editing/ReplaceSelectionCommand.cpp:947 > + // If we are not trying to match the destination style and we are not > + // inside a list element we prefer a position that is outside inline > + // elements that provide style. This way we can produce a less verbose > + // markup. > + if (!m_matchStyle && !enclosingList(insertionPos.deprecatedNode())) {
Why is being in a list relevant? Comments need to focus on the question “why” so the comment should explain. Why are you using the deprecatedNode function? Isn’t there a more modern way to write this, avoiding the functions with “deprecated” in their names?
> Source/WebCore/editing/ReplaceSelectionCommand.cpp:956 > + if (parentNode->childNodes()->length() > 1)
There are more efficient ways to check if something has more than one child. This will walk the entire list of child nodes to count them all just to check if it is > 1. You could just check parent->firstChild()->nextSibling() to see if it is non-zero, with the appropriate null checks.
> Source/WebCore/editing/ReplaceSelectionCommand.cpp:957 > + splitElement(static_cast<Element*>(parentNode), parentNode->childNodes()->item(insertionPos.deprecatedEditingOffset()));
Is there a guarantee that the parent is not the document? I just want to make sure the cast is safe.
Justin Garcia
Comment 5
2011-03-22 22:42:03 PDT
(In reply to
comment #0
)
> 5. CMD+C to copy all selected text > > 5. Open
http://www.mozilla.org/editor/midasdemo/
> 6. Paste. (then check View HTML Source) > 7. CMD+A then CMD+C to copy > 8. Paste again in place (then check View HTML Source)
When you copy the styled content and paste it over itself, why doesn't WebKit remove the styled inline elements during the delete phase? Why are they still there at Paste time? That seems like the real bug. In TextEdit, if I type: hello world Select "hello", bold it, delete it, move the selection elsewhere, move the selection back to the first line and start typing, the inserted text *is not* bold. It *is* bold with WebKit. If we don't leave those styled elements around in the document in the delete phase, then at Paste time, I think ReplaceSelectionCommand won't have a problem keeping the DOM clean.
Darin Adler
Comment 6
2011-03-23 00:18:01 PDT
(In reply to
comment #5
)
> If we don't leave those styled elements around in the document in the delete phase, then at Paste time, I think ReplaceSelectionCommand won't have a problem keeping the DOM clean.
Enrica considered this. It helps in the delete everything case, but as soon as you leave even a single character behind it falls apart. A fix that prevents the increasing nesting even if you leave a little behind is even better.
Darin Adler
Comment 7
2011-03-23 00:19:55 PDT
Note also that if we delete the <b> element then typed text won’t be bold. So doing the “delete more” optimization means we’d have to delete differently when replacing than when just deleting alone.
Ryosuke Niwa
Comment 8
2011-03-23 00:53:39 PDT
Comment on
attachment 86545
[details]
Patch2 View in context:
https://bugs.webkit.org/attachment.cgi?id=86545&action=review
> Source/WebCore/editing/ReplaceSelectionCommand.cpp:792 > + if (node->hasTagName(bTag) > + || node->hasTagName(fontTag) > + || node->hasTagName(iTag) > + || node->hasTagName(sTag) > + || node->hasTagName(strikeTag) > + || node->hasTagName(strongTag) > + || node->hasTagName(strikeTag) > + || node->hasTagName(subTag) > + || node->hasTagName(supTag) > + || node->hasTagName(uTag)) > + return true;
This duplicates the tables in conflictsWithImplicitStyleOfElement and htmlAttributeEquivalents. Ideally, we share code with those functions.
> Source/WebCore/editing/ReplaceSelectionCommand.cpp:946 > + // If we are not trying to match the destination style and we are not > + // inside a list element we prefer a position that is outside inline > + // elements that provide style. This way we can produce a less verbose > + // markup.
I'd really prefer to have a properly named function than having a comment here since this function is already one of the most complicated doApply functions.
>> Source/WebCore/editing/ReplaceSelectionCommand.cpp:947 >> + if (!m_matchStyle && !enclosingList(insertionPos.deprecatedNode())) { > > Why is being in a list relevant? Comments need to focus on the question “why” so the comment should explain. > > Why are you using the deprecatedNode function? Isn’t there a more modern way to write this, avoiding the functions with “deprecated” in their names?
To add to Darin's comment, you should be calling containerNode() here since you shouldn't consider position before/after a list element as inside the list. Also, given the condition, can't we just do this in the else clause around the line 949 where we call insertNodeAtAndUpdateNodesInserted?
> LayoutTests/editing/pasteboard/paste-text-with-style.html:1 > +<html>
Missing DOCTYPE.
> LayoutTests/editing/pasteboard/paste-text-with-style.html:19 > +<script src=../editing.js language="JavaScript" type="text/JavaScript" ></script>
I don't think we need to put language or type attributes.
> LayoutTests/editing/pasteboard/paste-text-with-style.html:33 > +function editingTest() { > + var markupBefore = document.getElementById("test").innerHTML; > + log("Markup before: " + markupBefore); > + selectAllCommand(); > + copyCommand(); > + pasteCommand(); > + var markupAfter = document.getElementById("test").innerHTML; > + log("Markup after: " + markupAfter); > +}
It seems to me that what you want to do is to use dump-as-markup.js.
Enrica Casucci
Comment 9
2011-03-23 09:32:27 PDT
Thank you all for the review and feedback. I'll post another patch shortly to address your comments.
Justin Garcia
Comment 10
2011-03-23 09:58:40 PDT
> Note also that if we delete the <b> element then typed text won’t be bold.
No, I think it would be bold even without the <b> because DeleteSelectionCommand::saveTypingStyleState() would preserve the typing style.
> So doing the “delete more” optimization means we’d have to delete differently when replacing than when just deleting alone.
I'm saying that delete should delete more in both the replace case and the delete case. The pure delete case does not match TextEdit right now as I mentioned. But maybe that is intentional.
> It helps in the delete everything case, but as soon as you leave even a single character behind it falls apart. A fix that prevents the increasing nesting even if you leave a little behind is even better.
True.
Ryosuke Niwa
Comment 11
2011-03-23 10:15:59 PDT
(In reply to
comment #10
)
> > So doing the “delete more” optimization means we’d have to delete differently when replacing than when just deleting alone. > > I'm saying that delete should delete more in both the replace case and the delete case. The pure delete case does not match TextEdit right now as I mentioned. But maybe that is intentional.
Can we check what IE/FF does? Also we should check what Microsoft Word and WordPad do on Windows. It might be one of those things where Mac/Windows behaviors differ.
Enrica Casucci
Comment 12
2011-03-23 11:32:49 PDT
Created
attachment 86650
[details]
Patch3 I've changed the code a bit and I like this implementation better. I believe I've addressed most of the comments. We could explore the direction of sanitizing the markup upon delete in the future. There are many more improvements to make in the generated markup when copying and pasting.
Darin Adler
Comment 13
2011-03-23 12:08:16 PDT
Comment on
attachment 86650
[details]
Patch3 View in context:
https://bugs.webkit.org/attachment.cgi?id=86650&action=review
> Source/WebCore/editing/ReplaceSelectionCommand.cpp:786 > + if (node->hasTagName(spanTag)) > + return static_cast<const HTMLElement*>(node)->hasAttribute(styleAttr); > + return true;
The concept here is good but I think this logic is backwards. Tags like <b> add style even though they have no style attribute. But a custom tag like <oog> would behave the same as span. So it’s not span that’s the special case, it’s all the other elements that contribute style without a style attribute. We need a list of tags that add style even when they have no style attribute.
> Source/WebCore/editing/ReplaceSelectionCommand.cpp:945 > + if (parentNode->firstChild()->nextSibling()) > + splitElement(static_cast<Element*>(parentNode), parentNode->childNodes()->item(insertionPos.offsetInContainerNode()));
I don’t think this is right in the case that the anchor node is a text node.
Darin Adler
Comment 14
2011-03-23 12:59:51 PDT
Comment on
attachment 86650
[details]
Patch3 View in context:
https://bugs.webkit.org/attachment.cgi?id=86650&action=review
> Source/WebCore/editing/ReplaceSelectionCommand.cpp:941 > + if (insertionPos.anchorNode()->isTextNode() && insertionPos.offsetInContainerNode()) > + splitTextNodeContainingElement(static_cast<Text*>(insertionPos.anchorNode()), insertionPos.offsetInContainerNode());
I think this needs one more check, a check that anchorType() is PositionIsOffsetInAnchor.
>> Source/WebCore/editing/ReplaceSelectionCommand.cpp:945
> > I don’t think this is right in the case that the anchor node is a text node.
Rather than parentNode->childNodes()->item(insertionPos.offsetInContainerNode()) I think we should pass in insertionPos.computeNodeAfterPosition() here.
Enrica Casucci
Comment 15
2011-03-23 20:21:48 PDT
Created
attachment 86738
[details]
Patch4
Darin Adler
Comment 16
2011-03-24 10:21:01 PDT
Comment on
attachment 86738
[details]
Patch4 View in context:
https://bugs.webkit.org/attachment.cgi?id=86738&action=review
review- because of the bad HTMLElement cast
> Source/WebCore/editing/ReplaceSelectionCommand.cpp:788 > + const AtomicString classAttributeValue = static_cast<const HTMLElement*>(node)->getAttribute(classAttr);
It’s not safe to cast node to HTMLElement* without first checking if it is an HTMLElement. The type here should be const AtomicString&, not const AtomicString, to avoid an extra bit of reference count churn.
> Source/WebCore/editing/ReplaceSelectionCommand.cpp:797 > + const NamedNodeMap* attributesMap = static_cast<const HTMLElement*>(node)->attributeMap();
This should just be attributeMap, not attributesMap.
> Source/WebCore/editing/ReplaceSelectionCommand.cpp:957 > + // If we are in the middle of a text node, we need to split it before we can > + // move the insertion position. > + if (insertionPos.anchorNode()->isTextNode() && insertionPos.anchorType() == Position::PositionIsOffsetInAnchor && insertionPos.offsetInContainerNode()) > + splitTextNodeContainingElement(static_cast<Text*>(insertionPos.anchorNode()), insertionPos.offsetInContainerNode());
Does this handle the case where the offset is at the end of a text node? I think besides checking the offset for 0 we might need to check if it’s the same as the length of the text.
Enrica Casucci
Comment 17
2011-03-24 10:25:24 PDT
(In reply to
comment #16
)
> (From update of
attachment 86738
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=86738&action=review
> > review- because of the bad HTMLElement cast > > > Source/WebCore/editing/ReplaceSelectionCommand.cpp:788 > > + const AtomicString classAttributeValue = static_cast<const HTMLElement*>(node)->getAttribute(classAttr); > > It’s not safe to cast node to HTMLElement* without first checking if it is an HTMLElement.
> I don't think this was ever possible since we are only calling this function on elements that are containers, but I'll add it anyway.
> The type here should be const AtomicString&, not const AtomicString, to avoid an extra bit of reference count churn.
> ok.
> > Source/WebCore/editing/ReplaceSelectionCommand.cpp:797 > > + const NamedNodeMap* attributesMap = static_cast<const HTMLElement*>(node)->attributeMap(); > > This should just be attributeMap, not attributesMap. > > > Source/WebCore/editing/ReplaceSelectionCommand.cpp:957 > > + // If we are in the middle of a text node, we need to split it before we can > > + // move the insertion position. > > + if (insertionPos.anchorNode()->isTextNode() && insertionPos.anchorType() == Position::PositionIsOffsetInAnchor && insertionPos.offsetInContainerNode()) > > + splitTextNodeContainingElement(static_cast<Text*>(insertionPos.anchorNode()), insertionPos.offsetInContainerNode()); > > Does this handle the case where the offset is at the end of a text node? I think besides checking the offset for 0 we might need to check if it’s the same as the length of the text.
I did test that and it works correctly.
Darin Adler
Comment 18
2011-03-24 10:30:43 PDT
(In reply to
comment #17
)
> > It’s not safe to cast node to HTMLElement* without first checking if it is an HTMLElement. > > > I don't think this was ever possible since we are only calling this function on elements that are containers, but I'll add it anyway.
Besides HTMLElement, a container can be a non-HTML element, which is an Element* and not an HTMLElement* or something like a Document* or DocumentFragment*. I think in practice the Element case could really happen, although probably not the others.
Darin Adler
Comment 19
2011-03-24 10:31:37 PDT
> > Does this handle the case where the offset is at the end of a text node? I think besides checking the offset for 0 we might need to check if it’s the same as the length of the text. > I did test that and it works correctly.
Why does it work correctly? Because splitTextNodeContainingElement correctly does nothing? Should we remove the check for 0 then? Maybe splitTextNodeContainingElement correctly does nothing in that case too.
Enrica Casucci
Comment 20
2011-03-24 11:05:50 PDT
(In reply to
comment #19
)
> > > Does this handle the case where the offset is at the end of a text node? I think besides checking the offset for 0 we might need to check if it’s the same as the length of the text. > > I did test that and it works correctly. > > Why does it work correctly? Because splitTextNodeContainingElement correctly does nothing? Should we remove the check for 0 then? Maybe splitTextNodeContainingElement correctly does nothing in that case too.
No, the 0 case needs to be checked. I'll investigate more to make sure we don't need an additional check
Enrica Casucci
Comment 21
2011-03-24 11:32:47 PDT
Created
attachment 86803
[details]
Patch5 One more iteration.
Enrica Casucci
Comment 22
2011-03-24 12:48:46 PDT
http://trac.webkit.org/changeset/81887
Ryosuke Niwa
Comment 23
2011-03-24 19:52:00 PDT
Comment on
attachment 86803
[details]
Patch5 View in context:
https://bugs.webkit.org/attachment.cgi?id=86803&action=review
> Source/WebCore/editing/ReplaceSelectionCommand.cpp:956 > + if (!m_matchStyle && !enclosingList(insertionPos.anchorNode()) && isStyleSpan(fragment.firstChild())) { > + Node* parentNode = insertionPos.anchorNode()->parentNode();
I don't think this is right. How does it make sense to use the anchor node here? Why should we care about the list element before or after the insertion pos?
Enrica Casucci
Comment 24
2011-03-25 09:04:00 PDT
(In reply to
comment #23
)
> (From update of
attachment 86803
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=86803&action=review
> > > Source/WebCore/editing/ReplaceSelectionCommand.cpp:956 > > + if (!m_matchStyle && !enclosingList(insertionPos.anchorNode()) && isStyleSpan(fragment.firstChild())) { > > + Node* parentNode = insertionPos.anchorNode()->parentNode(); > > I don't think this is right. How does it make sense to use the anchor node here? Why should we care about the list element before or after the insertion pos?
I'm not sure I understand what you mean here. I want to avoid list items, since the correct type of splitting is done in insertAsListItem below.
Ryosuke Niwa
Comment 25
2011-03-25 09:21:16 PDT
Comment on
attachment 86803
[details]
Patch5 View in context:
https://bugs.webkit.org/attachment.cgi?id=86803&action=review
>>> Source/WebCore/editing/ReplaceSelectionCommand.cpp:956 >>> + Node* parentNode = insertionPos.anchorNode()->parentNode(); >> >> I don't think this is right. How does it make sense to use the anchor node here? Why should we care about the list element before or after the insertion pos? > > I'm not sure I understand what you mean here. I want to avoid list items, since the correct type of splitting is done in insertAsListItem below.
Right, but you only care when insertPos is IN the list, right? (i.e. insertionPos is enclosed by some list) Not when insertionPos is before or after a list (i.e. insertionPos is adjacent to some list).
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug