Bug 63233

Summary: Copying (createMarkup) wrapping text results in space between wrapped lines stripped
Product: WebKit Reporter: Malcolm MacLeod <mmacleod>
Component: DOMAssignee: Chang Shu <cshu>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, commit-queue, cshu, darin, enrica, eric, hyatt, leviw, rniwa
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 129609    
Attachments:
Description Flags
Basic html sample with which bug can be reproduced
none
fix patch
none
fix patch 2 none

Description Malcolm MacLeod 2011-06-23 01:44:17 PDT
I've run into a bug where WebKit based browsers are throwing away whitespace during a copy operation (For CF_HTML based copying but not for text copying).
The issue occurs when wrapping between text and certain tags e.g. "The <b>fat</b> cat <i>sat</i> on the mat" will be copied as "The <b>fat</b>cat <i>sat</i> on the mat" if word wrapping occurs just before cat and "The <b>fat</b> cat<i>sat</i> on the mat" if word wrapping occurs just before sat, etc.

Problem verified on:
Chrome - latest stable as well as an older version
Safari for windows
wxWebKit

I've tracked this issue down to line 545 in TextIterator.cpp:
unsigned runEnd = min(textBoxEnd, end);

textBoxEnd is one less then the actual end of the text, so this causes the last character (the wrapped space) not to be considered by createMarkup.
As an experiment I have changed the line to:
unsigned runEnd = max(textBoxEnd, end);
which resolves the problem.
This is of course probably not the actual way to fix the problem, unfortunately I do not have the knowledge of this code required to produce a proper fix/patch but it should at least give a good starting point of where to look.
Comment 1 Ryosuke Niwa 2011-06-23 13:59:03 PDT
I can't reproduce this bug.  Could you give us exact reproduction steps?
Comment 2 Malcolm MacLeod 2011-06-23 23:28:16 PDT
Created attachment 98462 [details]
Basic html sample with which bug can be reproduced
Comment 3 Malcolm MacLeod 2011-06-23 23:31:29 PDT
1) Open minimal sample "wrapbug.html"
2) resize browser window so that line wrapping occurs between one of the bolded words and a normal word
3) highlight all the text 
4) copy
5) paste the text into any application that supports the CF_HTML clipboard format instead of the plain text format (bug does not occur in plain text as that uses different code to generate) - for most obvious results ensure that the target application has a different window size so that wrapping is not the same.
6) observe that in the pasted text the bolded word and the normal word are now next to each other with no spacing
Comment 4 Malcolm MacLeod 2011-06-23 23:34:58 PDT
8) View clipboard contents directly using a raw clipboard viewer - e.g."ClipView.exe"
9) Observe that there is no spacing in the raw CF_HTML on the clipboard either, so it is definitely a bug in generating the CF_HTML and not a bug in e.g. the receiving application reading the CF_HTML wrong.
Comment 5 Darin Adler 2011-06-24 10:40:42 PDT
The plain text generation does use TextIterator, so it’s strange that a TextIterator bug would affect CF_HTML but not plain text.
Comment 6 Ryosuke Niwa 2011-06-24 10:44:28 PDT
I think the problem comes from the fact createMarkup uses renderer to serialize text nodes and we don't render trailing whitespace (collapse them).  We'd either have to special-case it in StylizedMarkupAccumulator or create inline boxes for unrendered trailing whitespace.
Comment 7 Ryosuke Niwa 2011-11-25 16:46:55 PST
This bug is related to https://bugs.webkit.org/show_bug.cgi?id=54598.
Comment 8 Chang Shu 2014-02-05 10:59:54 PST
The test case below failed on latest webkit.
http://jsfiddle.net/Wu2Gy/1/
The problem is StyledMarkupAccumulator uses renderedText and the space after the text has been stripped when the following tag was wrapped.
Comment 9 Chang Shu 2014-02-05 11:06:52 PST
The following test case failed on latest webkit.
http://jsfiddle.net/Wu2Gy/1/

The problem is StyledMarkupAccumulator uses renderedText and the space at the end of the text has been stripped when the tag after the text was wrapped.
Comment 10 Chang Shu 2014-02-05 11:11:40 PST
Created attachment 223249 [details]
fix patch
Comment 11 Ryosuke Niwa 2014-02-11 13:28:16 PST
Comment on attachment 223249 [details]
fix patch

View in context: https://bugs.webkit.org/attachment.cgi?id=223249&action=review

> LayoutTests/editing/pasteboard/copy-text-with-wrapped-tag-expected.txt:1
> +Success

This output doesn't tell us anything about the nature of the test.

> LayoutTests/editing/pasteboard/copy-text-with-wrapped-tag.html:25
> +expected = 'Copy this text&nbsp;<a href="http://www.google.com">AVeryLongWordThatWillWrap</a>';
> +actual = paste.innerHTML;
> +
> +if (window.testRunner && actual == expected)
> +    document.body.innerText = "Success";
> +else
> +    document.body.innerText = "Failure.  Found: " + actual + ", but expected: " + expected;
> +

Please use dump-as-markup, etc... to output the serialized markup
so that expected result is self explanatory.

> Source/WebCore/ChangeLog:8
> +        The problem is StyledMarkupAccumulator uses renderedText and the space at
> +        the end of the text has been stripped when the tag after the text was wrapped.
> +

Please add inline function description as to how you're fixing this bug.

> Source/WebCore/ChangeLog:9
> +        Reviewed by NOBODY (OOPS!).

This line should appear after the bug URL but before the long description.

> Source/WebCore/editing/TextIterator.cpp:651
> +                    if (!nextTextBox && m_isNotLastNode && (str.length() == runEnd + 1)) // Restore space if followed by a non-text box
> +                        subrunEnd++;

It's not clear how incrementing subrunEnd++ ends up restoring space.
Also, it's better to define a local boolean like followedByNonTextBox instead of adding a comment like this.

> Source/WebCore/editing/markup.cpp:269
> +        behavior = TextIteratorIsNotLastNode;

I don't think this behavior is descriptive.  TextIterator isn't a node.  It iterates over nodes.
Looks like we need a behavior where text iterator behaves as if there are more nodes??
Comment 12 Ryosuke Niwa 2014-02-11 13:28:16 PST
Comment on attachment 223249 [details]
fix patch

View in context: https://bugs.webkit.org/attachment.cgi?id=223249&action=review

> LayoutTests/editing/pasteboard/copy-text-with-wrapped-tag-expected.txt:1
> +Success

This output doesn't tell us anything about the nature of the test.

> LayoutTests/editing/pasteboard/copy-text-with-wrapped-tag.html:25
> +expected = 'Copy this text&nbsp;<a href="http://www.google.com">AVeryLongWordThatWillWrap</a>';
> +actual = paste.innerHTML;
> +
> +if (window.testRunner && actual == expected)
> +    document.body.innerText = "Success";
> +else
> +    document.body.innerText = "Failure.  Found: " + actual + ", but expected: " + expected;
> +

Please use dump-as-markup, etc... to output the serialized markup
so that expected result is self explanatory.

> Source/WebCore/ChangeLog:8
> +        The problem is StyledMarkupAccumulator uses renderedText and the space at
> +        the end of the text has been stripped when the tag after the text was wrapped.
> +

Please add inline function description as to how you're fixing this bug.

> Source/WebCore/ChangeLog:9
> +        Reviewed by NOBODY (OOPS!).

This line should appear after the bug URL but before the long description.

> Source/WebCore/editing/TextIterator.cpp:651
> +                    if (!nextTextBox && m_isNotLastNode && (str.length() == runEnd + 1)) // Restore space if followed by a non-text box
> +                        subrunEnd++;

It's not clear how incrementing subrunEnd++ ends up restoring space.
Also, it's better to define a local boolean like followedByNonTextBox instead of adding a comment like this.

> Source/WebCore/editing/markup.cpp:269
> +        behavior = TextIteratorIsNotLastNode;

I don't think this behavior is descriptive.  TextIterator isn't a node.  It iterates over nodes.
Looks like we need a behavior where text iterator behaves as if there are more nodes??
Comment 13 Chang Shu 2014-02-12 07:02:17 PST
Created attachment 223968 [details]
fix patch 2

new patch to address the review comments.
Comment 14 WebKit Commit Bot 2014-02-13 12:40:33 PST
Comment on attachment 223968 [details]
fix patch 2

Clearing flags on attachment: 223968

Committed r164047: <http://trac.webkit.org/changeset/164047>
Comment 15 WebKit Commit Bot 2014-02-13 12:40:36 PST
All reviewed patches have been landed.  Closing bug.