RESOLVED FIXED 52236
Strip NUL character when copying text on Windows
https://bugs.webkit.org/show_bug.cgi?id=52236
Summary Strip NUL character when copying text on Windows
Erik Arvidsson
Reported 2011-01-11 12:21:16 PST
Windows does not allow NUL characters on the clipboard since they are used to terminate the clipboard data. We should strip NUL characters from the text before sending the text to the windows clipboard.
Attachments
Test case (97 bytes, text/html)
2011-01-11 12:32 PST, Erik Arvidsson
no flags
Shows that NUL is stripped through the parser (682 bytes, text/html)
2011-01-11 14:37 PST, Erik Arvidsson
no flags
Patch (6.47 KB, patch)
2011-01-13 14:30 PST, Tony Chang
no flags
Patch for landing (7.70 KB, patch)
2011-01-14 14:35 PST, Tony Chang
no flags
Patch (7.54 KB, patch)
2011-01-14 16:11 PST, Tony Chang
no flags
Patch for landing (7.12 KB, patch)
2011-01-14 17:26 PST, Tony Chang
no flags
Erik Arvidsson
Comment 1 2011-01-11 12:32:43 PST
Created attachment 78579 [details] Test case
Tony Chang
Comment 2 2011-01-11 14:00:18 PST
Some testing: On Chromium Linux, Chromium Win, Safari Win, we copy up to the NUL character (i.e., the copied text is truncated). On Safari Mac and Chromium Mac, we ignore the NUL (when you paste, you get text without the NUL character). On IE9, Firefox Win and Firefox Linux, the DOM has the unicode replacement character. On Firefox Mac, the NUL is converted into a unicode replacement character (U+FFFD) in the pasted text. Actually, it may already be replaced in the DOM but perhaps there's no glyph for it.
Ryosuke Niwa
Comment 3 2011-01-11 14:02:29 PST
(In reply to comment #2) > On IE9, Firefox Win and Firefox Linux, the DOM has the unicode replacement character. We should most certainly replace with the unicode replace character.
Tony Chang
Comment 4 2011-01-11 14:18:38 PST
(In reply to comment #3) > (In reply to comment #2) > > On IE9, Firefox Win and Firefox Linux, the DOM has the unicode replacement character. > > We should most certainly replace with the unicode replace character. Yup, that's what the HTML5 spec says: http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#plaintext-state
Tony Chang
Comment 5 2011-01-11 14:25:18 PST
Actually, arv says that if he changes the test case to use innerHTML, the NUL character is stripped (he'll upload a test case). Seems like we need to have setting textContent go through the HTML parser and the HTML parser needs to insert the replacement character.
Adam Barth
Comment 6 2011-01-11 14:35:18 PST
> Seems like we need to have setting textContent go through the HTML parser and the HTML parser needs to insert the replacement character. That would be surprising. textContent is a DOM-level concept. It should ignore anything HTML specific. That said, stranger things have happened.
Erik Arvidsson
Comment 7 2011-01-11 14:37:35 PST
Created attachment 78601 [details] Shows that NUL is stripped through the parser
Alexey Proskuryakov
Comment 8 2011-01-11 14:43:02 PST
HTML5 goes to great lengths trying to avoid having null characters in Unicode strings, but I don't think that it generally makes sense. Replacing null with U+FFFD during parsing is particularly weird, because one can always use DOM manipulation to insert a null (as it's done in attached test case), so there is no guarantee of "no null in DOM" anyway. When copying, we should just make sure that the user gets expected results. As WebKit doesn't display nulls in content, they should be removed in copied text, not replaced with U+FFFD. > Seems like we need to have setting textContent go through the HTML parser and the HTML parser needs to insert the replacement character. I don't think that it's the right direction. There are many ways to insert a null via DOM manipulation, and even HTML5 doesn't prohibit those.
Erik Arvidsson
Comment 9 2011-01-11 15:03:23 PST
(In reply to comment #8) > When copying, we should just make sure that the user gets expected results. As WebKit doesn't display nulls in content, they should be removed in copied text, not replaced with U+FFFD. Agreed. That was how this started before it got side tracked...
Tony Chang
Comment 10 2011-01-11 15:05:53 PST
(In reply to comment #9) > (In reply to comment #8) > > When copying, we should just make sure that the user gets expected results. As WebKit doesn't display nulls in content, they should be removed in copied text, not replaced with U+FFFD. > > Agreed. That was how this started before it got side tracked... OK, patch coming for Chromium. I don't have a windows box handy, so Safari Win will come later.
Ryosuke Niwa
Comment 11 2011-01-11 15:09:47 PST
(In reply to comment #10) > OK, patch coming for Chromium. I don't have a windows box handy, so Safari Win will come later. Why don't we do this when we're serializing DOM instead of doing it in each platform? It's a simple modification to MarkupAccumulator.
Tony Chang
Comment 12 2011-01-13 14:30:34 PST
Tony Chang
Comment 13 2011-01-13 14:32:14 PST
(In reply to comment #12) > Created an attachment (id=78856) [details] > Patch I talked to Ryosuke offline and the code for putting plain text on the clipboard doesn't go through MarkupAccumulator, it goes thorough Editor::selectedText(). Adding the code there fixes Chromium Linux, but I need to do some more testing on a Windows machine to see if we need to make a similar fix in MarkupAccumulator (for the rich text case).
Ryosuke Niwa
Comment 14 2011-01-13 15:07:55 PST
(In reply to comment #13) > (In reply to comment #12) > > Created an attachment (id=78856) [details] [details] > > Patch > > I talked to Ryosuke offline and the code for putting plain text on the clipboard doesn't go through MarkupAccumulator, it goes thorough Editor::selectedText(). Adding the code there fixes Chromium Linux, but I need to do some more testing on a Windows machine to see if we need to make a similar fix in MarkupAccumulator (for the rich text case). Per discussion in person, Tony doesn't have access to a Windows machine, so I'll build & test his patch on my Windows machine.
Ryosuke Niwa
Comment 15 2011-01-13 19:27:51 PST
(In reply to comment #14) > Per discussion in person, Tony doesn't have access to a Windows machine, so I'll build & test his patch on my Windows machine. I confirmed that the bug does not produce on Windows with Tony's patch.
Ryosuke Niwa
Comment 16 2011-01-13 19:39:03 PST
Comment on attachment 78856 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=78856&action=review r=me provided the answers to the following two questions are both yes. > LayoutTests/editing/pasteboard/copy-null-characters.html:27 > + var source = document.getElementById("source"); > + var textWithNull = "Copy\0 paste me"; > + source.textContent = textWithNull; > + sel.setPosition(source, 0); > + document.execCommand("SelectAll"); > + document.execCommand("Copy"); > + // Remove the source text so we don't end up with a null character in the > + // expected output file. > + source.parentNode.removeChild(source); > + > + var destinationRichText = document.getElementById("destination-rich-text"); > + sel.setPosition(destinationRichText, 0); > + document.execCommand("Paste"); > + > + var destinationPlainText = document.getElementById("destination-plain-text"); > + destinationPlainText.focus(); > + document.execCommand("Paste"); Could you also add a test case where the copied contents are richly formatted? > Source/WebCore/platform/mac/PasteboardMac.mm:194 > - String text = selectedRange->text(); > + String text = frame->editor()->selectedText(); If I'm understanding this right, you'd expect expect Range::text() to include null char because it's a DOM API. Am I right?
Tony Chang
Comment 17 2011-01-14 14:35:09 PST
Created attachment 79001 [details] Patch for landing
Tony Chang
Comment 18 2011-01-14 14:36:36 PST
(In reply to comment #16) > Could you also add a test case where the copied contents are richly formatted? Done. > > Source/WebCore/platform/mac/PasteboardMac.mm:194 > > - String text = selectedRange->text(); > > + String text = frame->editor()->selectedText(); > > If I'm understanding this right, you'd expect expect Range::text() to include null char because it's a DOM API. Am I right? Yes, that's right. This has the side benefit that now all platforms use the same code path to get the plain text that is placed on the clipboard/pasteboard.
Alexey Proskuryakov
Comment 19 2011-01-14 14:56:20 PST
Comment on attachment 79001 [details] Patch for landing +static String removeNULLCharacters(const String& str) +{ + String copy = str; + return copy.removeCharacters(&isNULLCharacter); +} Wow, that's a pretty slow way to remove nulls. Are we sure that we want to land it?
Alexey Proskuryakov
Comment 20 2011-01-14 15:02:56 PST
User-initiated copying may not be the hottest code, but one function call per character is still a lot.
WebKit Commit Bot
Comment 21 2011-01-14 15:06:09 PST
The commit-queue encountered the following flaky tests while processing attachment 79001 [details]: http/tests/appcache/update-cache.html bug 52483 (author: ap@webkit.org) The commit-queue is continuing to process your patch.
Tony Chang
Comment 22 2011-01-14 15:11:26 PST
(In reply to comment #20) > User-initiated copying may not be the hottest code, but one function call per character is still a lot. I suspect the compiler would inline the function, but if you prefer, I can update the patch to use StringImpl::replace(UChar, StringImpl*).
Tony Chang
Comment 23 2011-01-14 15:13:54 PST
(In reply to comment #22) > (In reply to comment #20) > > User-initiated copying may not be the hottest code, but one function call per character is still a lot. > > I suspect the compiler would inline the function, but if you prefer, I can update the patch to use StringImpl::replace(UChar, StringImpl*). I could also use StringImpl::createStrippingNullCharactersSlowCase.
Alexey Proskuryakov
Comment 24 2011-01-14 15:35:20 PST
> I suspect the compiler would inline the function, but if you prefer, I can update the patch to use StringImpl::replace(UChar, StringImpl*). The compiler cannot inline the function, because String::removeCharacters() is not inline itself. The function's definition is not visible in StringImpl.cpp (which is in JavaScriptCore, so even cross-unit inlining wouldn't work).
Maciej Stachowiak
Comment 25 2011-01-14 15:48:30 PST
Better way to do it (untested): static String removeNULLCharacters(const String& str) { return str.replace(0, ""); }
Ryosuke Niwa
Comment 26 2011-01-14 15:51:47 PST
(In reply to comment #19) > (From update of attachment 79001 [details]) > +static String removeNULLCharacters(const String& str) > +{ > + String copy = str; > + return copy.removeCharacters(&isNULLCharacter); > +} > > Wow, that's a pretty slow way to remove nulls. Are we sure that we want to land it? Oops, I missed that. Let's fix this before landing.
Tony Chang
Comment 27 2011-01-14 16:11:14 PST
Alexey Proskuryakov
Comment 28 2011-01-14 16:34:31 PST
Comment on attachment 79021 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=79021&action=review > Source/WebCore/editing/Editor.cpp:133 > +static void removeNULLCharacters(String& str) > +{ > + str.replace(0, ""); > +} I didn't comment about the function name, because I assumed that you'd just inline str.replace(0, "") at call site. But NULL isn't great, s it usually refers to C null pointer macro, which is not a single byte, and which we don't use anyway.
Tony Chang
Comment 29 2011-01-14 17:26:06 PST
Created attachment 79039 [details] Patch for landing
WebKit Commit Bot
Comment 30 2011-01-14 18:06:23 PST
Comment on attachment 79039 [details] Patch for landing Clearing flags on attachment: 79039 Committed r75861: <http://trac.webkit.org/changeset/75861>
WebKit Commit Bot
Comment 31 2011-01-14 18:06:31 PST
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.