Bug 26937 - Copying and pasting into a contenteditable area can create <div>s surrounded by <span>s
Summary: Copying and pasting into a contenteditable area can create <div>s surrounded ...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: HTML Editing (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC All
: P2 Normal
Assignee: Nobody
URL:
Keywords:
: 26633 (view as bug list)
Depends on:
Blocks:
 
Reported: 2009-07-02 15:24 PDT by Annie Sullivan
Modified: 2010-02-21 20:23 PST (History)
10 users (show)

See Also:


Attachments
demonstrates the bug by indent/outdent (960 bytes, text/html)
2009-07-31 01:48 PDT, Ryosuke Niwa
no flags Details
Patch (2.35 KB, patch)
2010-02-03 00:46 PST, Tony Chang
no flags Details | Formatted Diff | Diff
Patch (14.96 KB, patch)
2010-02-03 23:55 PST, Tony Chang
eric: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Annie Sullivan 2009-07-02 15:24:47 PDT
STEPS TO REPRODUCE:
1. Create a web page and enter the following HTML:
<style>div{font-family:Courier; font-size:small;}</style>
<div>one</div>
<div>two</div>
2. Copy the text "one two" from the page and paste it into a contenteditable area on another page (e.g. midas demo)
3. HTML that is pasted into the contenteditable area:
<span class="Apple-style-span" style="font-family: Courier; font-size: small; "><div style="font-family: Courier; font-size: small; ">one</div><div style="font-family: Courier; font-size: small; ">two</div></span>

This creates redundant styles, which can be difficult for other browser rich text editors to remove, and also block-level tags (divs) which are children of inline tags (spans), which is invalid HTML. 

The styles should be applied to the copied text in the same way they would be applied if the contentedtable area contaied the HTML "<div>one</div><div>two</div>" in the contenteditable area, it was selected, and the font and text size were changed with execCommand:
<div><font class="Apple-style-span" face="Courier"><span class="Apple-style-span" style="font-size: small;">one</span></font></div> <div><font class="Apple-style-span" face="Courier"><span class="Apple-style-span" style="font-size: small;">two</span></font></div>
Comment 1 Julie Parent 2009-07-02 15:26:58 PDT
+ojan, who was looking recently into copy/paste in rich text issues.
Comment 2 Ryosuke Niwa 2009-07-31 01:48:44 PDT
Created attachment 33862 [details]
demonstrates the bug by indent/outdent

I think this is a bug in ReplaceSelectionCommand.  In particular, I suspect it's in createMarkup.  For this reason, this bug can be demonstrated by indent/outdent as well.
Comment 3 Justin Garcia 2009-07-31 12:22:21 PDT
In one document I have:

<style>div{font-family:Courier; font-size:small;}</style>
<div>one</div>
<div>two</div>

In another:

<div id="div" contentEditable="true"></div>
<input type="button" value="click" onClick="alert(document.getElementById('div').innerHTML)">

I copy from the first and paste intot he second, then inspect the markup, and I don't have the redundant span that you describe.  What am I missing?
Comment 4 Ryosuke Niwa 2009-07-31 12:30:42 PDT
Copy & paste bug appears when
1. There is style applied outside of the region copied (must be block)
2. The region being copied contains at least two block elements with at least one having a different style than outside region.

On TOT, copying & paste the following HTML reproduces the bug. 
<div style="text-decoration: none;">
<div style="font-weight: bold;">one</div>
<div>two</div>
</div>

These bugs are caused by the lines 1010-1039 of markup.cpp.  In particular, we shouldn't be adding span if the fragment we copied contained any block element.
Comment 5 Ryosuke Niwa 2009-07-31 18:20:11 PDT
It turned out that fixing this bug is rather challenging because so much of existing code depends on the fact that we pass Apple-style-span.  In particular, when we add div, we end up producing a lot of extra div's, and removing them later is hard because removing arbitrary div would cause paragraph to disappear.  What we really need to do is to re-design how we handle style during copy & paste ground up.  Unfortunately, I don't think I can fix this bug anytime soon.

Here is my note for future reference.

1. The bug is caused by the last two calls to addStyleMarkup in createMarkup.

2. I added new boolean type to getStartMarkup that keeps track of whether we encountered block flow element yet or not.  If we did, then we should use div instead of span as the wrapping node.  But it turned out that this is not a sufficient condition beacuse if we have a <span>foo</span><div>bar</div>, and copy and paste this, we should not be wrapping it with a div.

3. To remove redundant styles and divs, I edited handleStyleSpans and handleStyleSpansBeforeInsertion.  But removing extra div caused another problem that we don't know whether we can remove a given div safely or not.

4. For the re-deign of copy&paste logic, we should push down all relevant comptuedStyles to each root node we're copying.  e.g. when we're copying <span>foo</span><div>bar</div>, we should apply computedStyles to each span and div node, instead of adding extra node to preserve style.  This would prevent us from producing extra spans and there will be no need to remove span's/div's later.

With this approach, only case in which we might need to produce a span is when we have a plain text and copying it without wrapping HTML element.  But this is highly unlikely because we can safely assume that any text node is a child of some HTML element that applies the style to the text.  If there was no HTML element to split, then we should treat it as plain text anyways.
Comment 6 Tony Chang 2010-01-28 18:56:16 PST
I'm unable to repro this bug.  In the original test case, when I paste the HTML, I get:

<div style="font-family: Courier; font-size: small; ">one</div><div style="font-family: Courier; font-size: small; ">two</div>

I tested in Chrome Mac (4.0.302.2) and Safari (4.0.4 (5531.21.10).

In the attachment by Ryosuke, the underline is preserved and the "after" html is:
a <blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"></blockquote><div id="start" style="text-decoration: underline; display: inline !important; ">hello</div><br><div id="end">world</div> b

The empty blockquote is a little unfortunate, but it doesn't seem to cause problems in this test.

Is this bug still valid?
Comment 7 Ryosuke Niwa 2010-01-28 19:57:51 PST
Tony, new reproduction steps,

1. go to http://www.mozilla.org/editor/midasdemo/
2. paste
<div style="text-decoration: none;">
<div style="font-weight: bold;">one</div>
<div>two</div>
</div>
3. go to rich text editing and select all
4. cut & paste

actual (r53842):
<div style="text-decoration: none;"> <div style="font-weight: bold;"><span class="Apple-style-span" style="font-weight: normal; "><div style="font-weight: bold; ">one</div><div>two</div></span></div>

expected:
no div inside a span.

Look at my comments 4-5. This bug addresses one serious issue in createMarkup, and now styles are handled.
Comment 8 Tony Chang 2010-01-28 21:04:00 PST
Ah, that repros for me.  Thanks for the updated test case.
Comment 9 Tony Chang 2010-02-01 21:23:42 PST
(In reply to comment #5)
> 4. For the re-deign of copy&paste logic, we should push down all relevant
> comptuedStyles to each root node we're copying.  e.g. when we're copying
> <span>foo</span><div>bar</div>, we should apply computedStyles to each span and
> div node, instead of adding extra node to preserve style.  This would prevent
> us from producing extra spans and there will be no need to remove span's/div's
> later.

I looked into this, but it's a bit tricky because the point where generate the computedStyle is after we've generated the markup as a string.  I guess we could convert into a document fragment so we could add these styles, but that seems slow.

We could also handle this on the paste side.  In ReplaceSelectionCommand::handleStyleSpan, we could take the style span attributes and move them to the child nodes like you suggest doing during the copy.  This would result in less styles copied and ideally in the example you'd get:
<div style="text-decoration: none;"> <div style="font-weight: bold;"><div >one</div><div style="font-weight: normal>two</div></div>

I think the main downside to this approach is that you still have the invalid markup in your clipboard.

Ryosuke, what do you think?
Comment 10 Ryosuke Niwa 2010-02-01 23:46:53 PST
(In reply to comment #9)
> I looked into this, but it's a bit tricky because the point where generate the
> computedStyle is after we've generated the markup as a string.  I guess we
> could convert into a document fragment so we could add these styles, but that
> seems slow.

Right, I thought of several different approaches but all of them turned out to be bad because it affects so many other places and causes a lot of problems for us. And we definitely don't want to convert back to a document fragment for the reason you mentioned.

> We could also handle this on the paste side.  In
> ReplaceSelectionCommand::handleStyleSpan, we could take the style span
> attributes and move them to the child nodes like you suggest doing during the
> copy.  This would result in less styles copied and ideally in the example you'd
> get:
> <div style="text-decoration: none;"> <div style="font-weight: bold;"><div
> >one</div><div style="font-weight: normal>two</div></div>

This was one of the suggestions I got as well but...

> I think the main downside to this approach is that you still have the invalid
> markup in your clipboard.

This is a quite serious problem as users may use other Web browsers and word processors while using WebKit. We probably want more input from other reviewers/committers because if we fix this problem by introducing new code on the paste side, it might have a profound impact on how we implement other editing code later.
Comment 11 Enrica Casucci 2010-02-02 09:24:56 PST
(In reply to comment #10)
> (In reply to comment #9)
> > I looked into this, but it's a bit tricky because the point where generate the
> > computedStyle is after we've generated the markup as a string.  I guess we
> > could convert into a document fragment so we could add these styles, but that
> > seems slow.
> 
> Right, I thought of several different approaches but all of them turned out to
> be bad because it affects so many other places and causes a lot of problems for
> us. And we definitely don't want to convert back to a document fragment for the
> reason you mentioned.
> 
> > We could also handle this on the paste side.  In
> > ReplaceSelectionCommand::handleStyleSpan, we could take the style span
> > attributes and move them to the child nodes like you suggest doing during the
> > copy.  This would result in less styles copied and ideally in the example you'd
> > get:
> > <div style="text-decoration: none;"> <div style="font-weight: bold;"><div
> > >one</div><div style="font-weight: normal>two</div></div>
> 
> This was one of the suggestions I got as well but...
> 
> > I think the main downside to this approach is that you still have the invalid
> > markup in your clipboard.
> 
> This is a quite serious problem as users may use other Web browsers and word
> processors while using WebKit. We probably want more input from other
> reviewers/committers because if we fix this problem by introducing new code on
> the paste side, it might have a profound impact on how we implement other
> editing code later.

I think that trying to address the problem on the paste side is correct.
If you try the same repro steps, but instead of pasting back to the same editable area you paste into a different one (or you press delete after cut to wipe out any remaining markup), you'll have a much simpler markup.
All the copy code can do, is capture is the computed style on the selection and add as much style as needed to fully describe it. It is up to the paste code, that knows about the style of the destination to perform any optimization, also because we could be pasting markup that has been placed on the pasteboard by something other than WebKit.
Comment 12 Tony Chang 2010-02-03 00:46:37 PST
Created attachment 48000 [details]
Patch
Comment 13 Tony Chang 2010-02-03 00:50:07 PST
(In reply to comment #12)
> Created an attachment (id=48000) [details]
> Patch

This is an example of handling the extra span on paste.  After this, I think all the code paths in handleStyleSpans remove the style span.  With this patch, when I paste the example in comment #7 I get:

<div style="text-decoration: none;"> <div style="font-weight: bold;"><div style="font-weight: bold; ">one</div><div style="font-weight: normal; ">two</div></div> </div>

The redundant font-weight:bold is a bit unfortunate, but it's better than before.

I still need a layouttest, but I have to run now.  Enrica, is this like what you're suggesting?
Comment 14 Enrica Casucci 2010-02-03 10:17:58 PST
Comment on attachment 48000 [details]
Patch

> diff --git a/WebCore/ChangeLog b/WebCore/ChangeLog
> index 67f5b56..c65fe8b 100644
> --- a/WebCore/ChangeLog
> +++ b/WebCore/ChangeLog
> @@ -1,3 +1,19 @@
> +2010-02-03  Tony Chang  <tony@chromium.org>
> +
> +        Reviewed by NOBODY (OOPS!).
> +
> +        Copying and pasting into a contenteditable area can create <div>s surrounded by <span>s
> +
> +        This happens because of a span added when we copy that is used to
> +        preserve styles.  To avoid this, when we paste, make sure to apply
> +        the styles to the span's children and then remove the style span.
> +        https://bugs.webkit.org/show_bug.cgi?id=26937
> +
> +        No new tests. (OOPS!)
> +
> +        * editing/ReplaceSelectionCommand.cpp:
> +        (WebCore::ReplaceSelectionCommand::handleStyleSpans):
> +
>  2010-02-02  Steve Falkenburg  <sfalken@apple.com>
>  
>          Reviewed by Darin Adler.
> diff --git a/WebCore/editing/ReplaceSelectionCommand.cpp b/WebCore/editing/ReplaceSelectionCommand.cpp
> index 85a4471..49189e5 100644
> --- a/WebCore/editing/ReplaceSelectionCommand.cpp
> +++ b/WebCore/editing/ReplaceSelectionCommand.cpp
> @@ -638,10 +638,20 @@ void ReplaceSelectionCommand::handleStyleSpans()
>      }
>      
>      // There are non-redundant styles on sourceDocumentStyleSpan, but there is no
> -    // copiedRangeStyleSpan.  Clear the redundant styles from sourceDocumentStyleSpan
> -    // and return.
> +    // copiedRangeStyleSpan.  Remove the span, because it could be surrounding block elements,
> +    // and apply the styles to its children.
>      if (sourceDocumentStyle->length() > 0 && !copiedRangeStyleSpan) {
> -        setNodeAttribute(static_cast<Element*>(sourceDocumentStyleSpan), styleAttr, sourceDocumentStyle->cssText());
> +        for (Node* childNode = sourceDocumentStyleSpan->firstChild(); childNode; childNode = childNode->nextSibling()) {
> +            if (!childNode->isHTMLElement())
> +                continue;
> +            RefPtr<CSSMutableStyleDeclaration> newStyles = sourceDocumentStyle->copy();
> +            HTMLElement* childElement = static_cast<HTMLElement*>(childNode);
> +            RefPtr<CSSMutableStyleDeclaration> existingStyles = childElement->getInlineStyleDecl()->copy();
> +            existingStyles->merge(newStyles.get(), false);
> +            setNodeAttribute(childElement, styleAttr, existingStyles->cssText());
> +        }
> +
> +        removeNodePreservingChildren(sourceDocumentStyleSpan);
>          return;
>      }
>      

Yes, that's the idea. I like it.
Comment 15 Ryosuke Niwa 2010-02-03 17:46:40 PST
(In reply to comment #12)
> Created an attachment (id=48000) [details]
> Patch

It looks nice but is it guaranteed that sourceDocumentStyleSpan always has a child?  What if the fragment was <span class="Apple-style-span" style="...">&nbsp;</span>?  Another concern is that the fact that malformed HTML is copied into clipboard will be forgotten if this bug was fixed at the pasting time. I think we should file another bug that address this issue if there isn't one already.
Comment 16 Tony Chang 2010-02-03 23:55:31 PST
Created attachment 48104 [details]
Patch
Comment 17 Tony Chang 2010-02-04 00:03:13 PST
(In reply to comment #15)
> It looks nice but is it guaranteed that sourceDocumentStyleSpan always has a
> child?  What if the fragment was <span class="Apple-style-span"
> style="...">&nbsp;</span>?

You're right; I hadn't run the layout tests yet.  This version handles that by copying the span tag to each child that's not a block (like text nodes) so in those cases, it's not much different than before (i.e., the generated HTML will be the same as before).

> Another concern is that the fact that malformed
> HTML is copied into clipboard will be forgotten if this bug was fixed at the
> pasting time. I think we should file another bug that address this issue if
> there isn't one already.

Ok, but 34564 filed.

New patch that passes layout tests up.
Comment 18 Eric Seidel (no email) 2010-02-17 14:37:16 PST
Comment on attachment 48104 [details]
Patch

I'm surprised the "wrap in span" logic doesn't already exist somewhere.

Otherwise this looks OK to me.
Comment 19 Ryosuke Niwa 2010-02-17 17:37:53 PST
It might be sensible to use ApplyStyleCommand here. Because if we're applying style, we might want to consider converting them to u, i, etc... tags as well.

@Tony, have you looked into this point? i.e. when user is editing html in styleWithCSS=false, doesn't your code provide new style attribute rather than decorating them with tags?
Comment 20 Tony Chang 2010-02-17 22:35:30 PST
(In reply to comment #19)
> It might be sensible to use ApplyStyleCommand here. Because if we're applying
> style, we might want to consider converting them to u, i, etc... tags as well.
> 
> @Tony, have you looked into this point? i.e. when user is editing html in
> styleWithCSS=false, doesn't your code provide new style attribute rather than
> decorating them with tags?

I hadn't looked into it, so I tried it locally.  There are a couple problems with using apply style:

1) In some cases, ApplyStyleCommand copies nodes.  This causes some nodes pointed to by ReplaceSelectionCommand to become stale.  This is probably fixable, but non-trivial.

2) It would mean rebaselining a a few dozen tests.  Since we always put a span on the pasteboard, these tests expect the span when pasted, regardless of whether we're in stylewithcss mode or not.

I think it's possible, but it's not really regressing anything here.  Previously we had a span around divs with styleWithCSS=false.  Maybe file a new bug about this?  I'm not sure it's worth the effort to fix (i.e., I'm not sure what actual sites are impacted by not honoring styleWithCSS when pasting).
Comment 21 Tony Chang 2010-02-17 23:12:28 PST
Committed r54933: <http://trac.webkit.org/changeset/54933>
Comment 22 Tony Chang 2010-02-21 20:23:06 PST
*** Bug 26633 has been marked as a duplicate of this bug. ***