Bug 52236

Summary:

Strip NUL character when copying text on Windows

Product:

WebKit

Reporter:

Erik Arvidsson <arv>

Component:

HTML Editing

Assignee:

Tony Chang <tony>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

abarth, ap, commit-queue, enrica, eric, mjs, rniwa, tony

Priority:

Version:

528+ (Nightly build)

Hardware:

OS:

Windows 7

Attachments:

Description	Flags
Test case	none
Shows that NUL is stripped through the parser	none
Patch	none
Patch for landing	none
Patch	none
Patch for landing	none

Erik Arvidsson

Reported 2011-01-11 12:21:16 PST

Windows does not allow NUL characters on the clipboard since they are used to terminate the clipboard data. We should strip NUL characters from the text before sending the text to the windows clipboard.

Attachments
Test case (97 bytes, text/html) 2011-01-11 12:32 PST, Erik Arvidsson	no flags	Details
Shows that NUL is stripped through the parser (682 bytes, text/html) 2011-01-11 14:37 PST, Erik Arvidsson	no flags	Details
Patch (6.47 KB, patch) 2011-01-13 14:30 PST, Tony Chang	no flags	Details Formatted Diff Diff
Patch for landing (7.70 KB, patch) 2011-01-14 14:35 PST, Tony Chang	no flags	Details Formatted Diff Diff
Patch (7.54 KB, patch) 2011-01-14 16:11 PST, Tony Chang	no flags	Details Formatted Diff Diff
Patch for landing (7.12 KB, patch) 2011-01-14 17:26 PST, Tony Chang	no flags	Details Formatted Diff Diff
Show Obsolete (3) View All Add attachment proposed patch, testcase, etc.

Erik Arvidsson

Comment 1 2011-01-11 12:32:43 PST

Created attachment 78579 [details] Test case

Tony Chang

Comment 2 2011-01-11 14:00:18 PST

Some testing: On Chromium Linux, Chromium Win, Safari Win, we copy up to the NUL character (i.e., the copied text is truncated). On Safari Mac and Chromium Mac, we ignore the NUL (when you paste, you get text without the NUL character). On IE9, Firefox Win and Firefox Linux, the DOM has the unicode replacement character. On Firefox Mac, the NUL is converted into a unicode replacement character (U+FFFD) in the pasted text. Actually, it may already be replaced in the DOM but perhaps there's no glyph for it.

Ryosuke Niwa

Comment 3 2011-01-11 14:02:29 PST

(In reply to comment #2) > On IE9, Firefox Win and Firefox Linux, the DOM has the unicode replacement character. We should most certainly replace with the unicode replace character.

Tony Chang

Comment 4 2011-01-11 14:18:38 PST

(In reply to comment #3) > (In reply to comment #2) > > On IE9, Firefox Win and Firefox Linux, the DOM has the unicode replacement character. > > We should most certainly replace with the unicode replace character. Yup, that's what the HTML5 spec says: http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#plaintext-state

Tony Chang

Comment 5 2011-01-11 14:25:18 PST

Actually, arv says that if he changes the test case to use innerHTML, the NUL character is stripped (he'll upload a test case). Seems like we need to have setting textContent go through the HTML parser and the HTML parser needs to insert the replacement character.

Adam Barth

Comment 6 2011-01-11 14:35:18 PST

> Seems like we need to have setting textContent go through the HTML parser and the HTML parser needs to insert the replacement character. That would be surprising. textContent is a DOM-level concept. It should ignore anything HTML specific. That said, stranger things have happened.

Erik Arvidsson

Comment 7 2011-01-11 14:37:35 PST

Created attachment 78601 [details] Shows that NUL is stripped through the parser

Alexey Proskuryakov

Comment 8 2011-01-11 14:43:02 PST

HTML5 goes to great lengths trying to avoid having null characters in Unicode strings, but I don't think that it generally makes sense. Replacing null with U+FFFD during parsing is particularly weird, because one can always use DOM manipulation to insert a null (as it's done in attached test case), so there is no guarantee of "no null in DOM" anyway. When copying, we should just make sure that the user gets expected results. As WebKit doesn't display nulls in content, they should be removed in copied text, not replaced with U+FFFD. > Seems like we need to have setting textContent go through the HTML parser and the HTML parser needs to insert the replacement character. I don't think that it's the right direction. There are many ways to insert a null via DOM manipulation, and even HTML5 doesn't prohibit those.

Erik Arvidsson

Comment 9 2011-01-11 15:03:23 PST

(In reply to comment #8) > When copying, we should just make sure that the user gets expected results. As WebKit doesn't display nulls in content, they should be removed in copied text, not replaced with U+FFFD. Agreed. That was how this started before it got side tracked...

Tony Chang

Comment 10 2011-01-11 15:05:53 PST

(In reply to comment #9) > (In reply to comment #8) > > When copying, we should just make sure that the user gets expected results. As WebKit doesn't display nulls in content, they should be removed in copied text, not replaced with U+FFFD. > > Agreed. That was how this started before it got side tracked... OK, patch coming for Chromium. I don't have a windows box handy, so Safari Win will come later.

Ryosuke Niwa

Comment 11 2011-01-11 15:09:47 PST

(In reply to comment #10) > OK, patch coming for Chromium. I don't have a windows box handy, so Safari Win will come later. Why don't we do this when we're serializing DOM instead of doing it in each platform? It's a simple modification to MarkupAccumulator.

Tony Chang

Comment 12 2011-01-13 14:30:34 PST

Created attachment 78856 [details] Patch

Tony Chang

Comment 13 2011-01-13 14:32:14 PST

(In reply to comment #12) > Created an attachment (id=78856) [details] > Patch I talked to Ryosuke offline and the code for putting plain text on the clipboard doesn't go through MarkupAccumulator, it goes thorough Editor::selectedText(). Adding the code there fixes Chromium Linux, but I need to do some more testing on a Windows machine to see if we need to make a similar fix in MarkupAccumulator (for the rich text case).

Ryosuke Niwa

Comment 14 2011-01-13 15:07:55 PST

(In reply to comment #13) > (In reply to comment #12) > > Created an attachment (id=78856) [details] [details] > > Patch > > I talked to Ryosuke offline and the code for putting plain text on the clipboard doesn't go through MarkupAccumulator, it goes thorough Editor::selectedText(). Adding the code there fixes Chromium Linux, but I need to do some more testing on a Windows machine to see if we need to make a similar fix in MarkupAccumulator (for the rich text case). Per discussion in person, Tony doesn't have access to a Windows machine, so I'll build & test his patch on my Windows machine.

Ryosuke Niwa

Comment 15 2011-01-13 19:27:51 PST

(In reply to comment #14) > Per discussion in person, Tony doesn't have access to a Windows machine, so I'll build & test his patch on my Windows machine. I confirmed that the bug does not produce on Windows with Tony's patch.

Ryosuke Niwa

Comment 16 2011-01-13 19:39:03 PST

Comment on attachment 78856 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=78856&action=review r=me provided the answers to the following two questions are both yes. > LayoutTests/editing/pasteboard/copy-null-characters.html:27 > + var source = document.getElementById("source"); > + var textWithNull = "Copy\0 paste me"; > + source.textContent = textWithNull; > + sel.setPosition(source, 0); > + document.execCommand("SelectAll"); > + document.execCommand("Copy"); > + // Remove the source text so we don't end up with a null character in the > + // expected output file. > + source.parentNode.removeChild(source); > + > + var destinationRichText = document.getElementById("destination-rich-text"); > + sel.setPosition(destinationRichText, 0); > + document.execCommand("Paste"); > + > + var destinationPlainText = document.getElementById("destination-plain-text"); > + destinationPlainText.focus(); > + document.execCommand("Paste"); Could you also add a test case where the copied contents are richly formatted? > Source/WebCore/platform/mac/PasteboardMac.mm:194 > - String text = selectedRange->text(); > + String text = frame->editor()->selectedText(); If I'm understanding this right, you'd expect expect Range::text() to include null char because it's a DOM API. Am I right?

Tony Chang

Comment 17 2011-01-14 14:35:09 PST

Created attachment 79001 [details] Patch for landing

Tony Chang

Comment 18 2011-01-14 14:36:36 PST

(In reply to comment #16) > Could you also add a test case where the copied contents are richly formatted? Done. > > Source/WebCore/platform/mac/PasteboardMac.mm:194 > > - String text = selectedRange->text(); > > + String text = frame->editor()->selectedText(); > > If I'm understanding this right, you'd expect expect Range::text() to include null char because it's a DOM API. Am I right? Yes, that's right. This has the side benefit that now all platforms use the same code path to get the plain text that is placed on the clipboard/pasteboard.

Alexey Proskuryakov

Comment 19 2011-01-14 14:56:20 PST

Comment on attachment 79001 [details] Patch for landing +static String removeNULLCharacters(const String& str) +{ + String copy = str; + return copy.removeCharacters(&isNULLCharacter); +} Wow, that's a pretty slow way to remove nulls. Are we sure that we want to land it?

Alexey Proskuryakov

Comment 20 2011-01-14 15:02:56 PST

User-initiated copying may not be the hottest code, but one function call per character is still a lot.

WebKit Commit Bot

Comment 21 2011-01-14 15:06:09 PST

The commit-queue encountered the following flaky tests while processing attachment 79001 [details]: http/tests/appcache/update-cache.html bug 52483 (author: ap@webkit.org) The commit-queue is continuing to process your patch.

Tony Chang

Comment 22 2011-01-14 15:11:26 PST

(In reply to comment #20) > User-initiated copying may not be the hottest code, but one function call per character is still a lot. I suspect the compiler would inline the function, but if you prefer, I can update the patch to use StringImpl::replace(UChar, StringImpl*).

Tony Chang

Comment 23 2011-01-14 15:13:54 PST

(In reply to comment #22) > (In reply to comment #20) > > User-initiated copying may not be the hottest code, but one function call per character is still a lot. > > I suspect the compiler would inline the function, but if you prefer, I can update the patch to use StringImpl::replace(UChar, StringImpl*). I could also use StringImpl::createStrippingNullCharactersSlowCase.

Alexey Proskuryakov

Comment 24 2011-01-14 15:35:20 PST

> I suspect the compiler would inline the function, but if you prefer, I can update the patch to use StringImpl::replace(UChar, StringImpl*). The compiler cannot inline the function, because String::removeCharacters() is not inline itself. The function's definition is not visible in StringImpl.cpp (which is in JavaScriptCore, so even cross-unit inlining wouldn't work).

Maciej Stachowiak

Comment 25 2011-01-14 15:48:30 PST

Better way to do it (untested): static String removeNULLCharacters(const String& str) { return str.replace(0, ""); }

Ryosuke Niwa

Comment 26 2011-01-14 15:51:47 PST

(In reply to comment #19) > (From update of attachment 79001 [details]) > +static String removeNULLCharacters(const String& str) > +{ > + String copy = str; > + return copy.removeCharacters(&isNULLCharacter); > +} > > Wow, that's a pretty slow way to remove nulls. Are we sure that we want to land it? Oops, I missed that. Let's fix this before landing.

Tony Chang

Comment 27 2011-01-14 16:11:14 PST

Created attachment 79021 [details] Patch

Alexey Proskuryakov

Comment 28 2011-01-14 16:34:31 PST

Comment on attachment 79021 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=79021&action=review > Source/WebCore/editing/Editor.cpp:133 > +static void removeNULLCharacters(String& str) > +{ > + str.replace(0, ""); > +} I didn't comment about the function name, because I assumed that you'd just inline str.replace(0, "") at call site. But NULL isn't great, s it usually refers to C null pointer macro, which is not a single byte, and which we don't use anyway.

Tony Chang

Comment 29 2011-01-14 17:26:06 PST

Created attachment 79039 [details] Patch for landing

WebKit Commit Bot

Comment 30 2011-01-14 18:06:23 PST

Comment on attachment 79039 [details] Patch for landing Clearing flags on attachment: 79039 Committed r75861: <http://trac.webkit.org/changeset/75861>

WebKit Commit Bot

Comment 31 2011-01-14 18:06:31 PST

All reviewed patches have been landed. Closing bug.

Note You need to log in before you can comment on or make changes to this bug.