Windows does not allow NUL characters on the clipboard since they are used to terminate the clipboard data. We should strip NUL characters from the text before sending the text to the windows clipboard.
Created attachment 78579 [details] Test case
Some testing: On Chromium Linux, Chromium Win, Safari Win, we copy up to the NUL character (i.e., the copied text is truncated). On Safari Mac and Chromium Mac, we ignore the NUL (when you paste, you get text without the NUL character). On IE9, Firefox Win and Firefox Linux, the DOM has the unicode replacement character. On Firefox Mac, the NUL is converted into a unicode replacement character (U+FFFD) in the pasted text. Actually, it may already be replaced in the DOM but perhaps there's no glyph for it.
(In reply to comment #2) > On IE9, Firefox Win and Firefox Linux, the DOM has the unicode replacement character. We should most certainly replace with the unicode replace character.
(In reply to comment #3) > (In reply to comment #2) > > On IE9, Firefox Win and Firefox Linux, the DOM has the unicode replacement character. > > We should most certainly replace with the unicode replace character. Yup, that's what the HTML5 spec says: http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#plaintext-state
Actually, arv says that if he changes the test case to use innerHTML, the NUL character is stripped (he'll upload a test case). Seems like we need to have setting textContent go through the HTML parser and the HTML parser needs to insert the replacement character.
> Seems like we need to have setting textContent go through the HTML parser and the HTML parser needs to insert the replacement character. That would be surprising. textContent is a DOM-level concept. It should ignore anything HTML specific. That said, stranger things have happened.
Created attachment 78601 [details] Shows that NUL is stripped through the parser
HTML5 goes to great lengths trying to avoid having null characters in Unicode strings, but I don't think that it generally makes sense. Replacing null with U+FFFD during parsing is particularly weird, because one can always use DOM manipulation to insert a null (as it's done in attached test case), so there is no guarantee of "no null in DOM" anyway. When copying, we should just make sure that the user gets expected results. As WebKit doesn't display nulls in content, they should be removed in copied text, not replaced with U+FFFD. > Seems like we need to have setting textContent go through the HTML parser and the HTML parser needs to insert the replacement character. I don't think that it's the right direction. There are many ways to insert a null via DOM manipulation, and even HTML5 doesn't prohibit those.
(In reply to comment #8) > When copying, we should just make sure that the user gets expected results. As WebKit doesn't display nulls in content, they should be removed in copied text, not replaced with U+FFFD. Agreed. That was how this started before it got side tracked...
(In reply to comment #9) > (In reply to comment #8) > > When copying, we should just make sure that the user gets expected results. As WebKit doesn't display nulls in content, they should be removed in copied text, not replaced with U+FFFD. > > Agreed. That was how this started before it got side tracked... OK, patch coming for Chromium. I don't have a windows box handy, so Safari Win will come later.
(In reply to comment #10) > OK, patch coming for Chromium. I don't have a windows box handy, so Safari Win will come later. Why don't we do this when we're serializing DOM instead of doing it in each platform? It's a simple modification to MarkupAccumulator.
Created attachment 78856 [details] Patch
(In reply to comment #12) > Created an attachment (id=78856) [details] > Patch I talked to Ryosuke offline and the code for putting plain text on the clipboard doesn't go through MarkupAccumulator, it goes thorough Editor::selectedText(). Adding the code there fixes Chromium Linux, but I need to do some more testing on a Windows machine to see if we need to make a similar fix in MarkupAccumulator (for the rich text case).
(In reply to comment #13) > (In reply to comment #12) > > Created an attachment (id=78856) [details] [details] > > Patch > > I talked to Ryosuke offline and the code for putting plain text on the clipboard doesn't go through MarkupAccumulator, it goes thorough Editor::selectedText(). Adding the code there fixes Chromium Linux, but I need to do some more testing on a Windows machine to see if we need to make a similar fix in MarkupAccumulator (for the rich text case). Per discussion in person, Tony doesn't have access to a Windows machine, so I'll build & test his patch on my Windows machine.
(In reply to comment #14) > Per discussion in person, Tony doesn't have access to a Windows machine, so I'll build & test his patch on my Windows machine. I confirmed that the bug does not produce on Windows with Tony's patch.
Comment on attachment 78856 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=78856&action=review r=me provided the answers to the following two questions are both yes. > LayoutTests/editing/pasteboard/copy-null-characters.html:27 > + var source = document.getElementById("source"); > + var textWithNull = "Copy\0 paste me"; > + source.textContent = textWithNull; > + sel.setPosition(source, 0); > + document.execCommand("SelectAll"); > + document.execCommand("Copy"); > + // Remove the source text so we don't end up with a null character in the > + // expected output file. > + source.parentNode.removeChild(source); > + > + var destinationRichText = document.getElementById("destination-rich-text"); > + sel.setPosition(destinationRichText, 0); > + document.execCommand("Paste"); > + > + var destinationPlainText = document.getElementById("destination-plain-text"); > + destinationPlainText.focus(); > + document.execCommand("Paste"); Could you also add a test case where the copied contents are richly formatted? > Source/WebCore/platform/mac/PasteboardMac.mm:194 > - String text = selectedRange->text(); > + String text = frame->editor()->selectedText(); If I'm understanding this right, you'd expect expect Range::text() to include null char because it's a DOM API. Am I right?
Created attachment 79001 [details] Patch for landing
(In reply to comment #16) > Could you also add a test case where the copied contents are richly formatted? Done. > > Source/WebCore/platform/mac/PasteboardMac.mm:194 > > - String text = selectedRange->text(); > > + String text = frame->editor()->selectedText(); > > If I'm understanding this right, you'd expect expect Range::text() to include null char because it's a DOM API. Am I right? Yes, that's right. This has the side benefit that now all platforms use the same code path to get the plain text that is placed on the clipboard/pasteboard.
Comment on attachment 79001 [details] Patch for landing +static String removeNULLCharacters(const String& str) +{ + String copy = str; + return copy.removeCharacters(&isNULLCharacter); +} Wow, that's a pretty slow way to remove nulls. Are we sure that we want to land it?
User-initiated copying may not be the hottest code, but one function call per character is still a lot.
The commit-queue encountered the following flaky tests while processing attachment 79001 [details]: http/tests/appcache/update-cache.html bug 52483 (author: ap@webkit.org) The commit-queue is continuing to process your patch.
(In reply to comment #20) > User-initiated copying may not be the hottest code, but one function call per character is still a lot. I suspect the compiler would inline the function, but if you prefer, I can update the patch to use StringImpl::replace(UChar, StringImpl*).
(In reply to comment #22) > (In reply to comment #20) > > User-initiated copying may not be the hottest code, but one function call per character is still a lot. > > I suspect the compiler would inline the function, but if you prefer, I can update the patch to use StringImpl::replace(UChar, StringImpl*). I could also use StringImpl::createStrippingNullCharactersSlowCase.
> I suspect the compiler would inline the function, but if you prefer, I can update the patch to use StringImpl::replace(UChar, StringImpl*). The compiler cannot inline the function, because String::removeCharacters() is not inline itself. The function's definition is not visible in StringImpl.cpp (which is in JavaScriptCore, so even cross-unit inlining wouldn't work).
Better way to do it (untested): static String removeNULLCharacters(const String& str) { return str.replace(0, ""); }
(In reply to comment #19) > (From update of attachment 79001 [details]) > +static String removeNULLCharacters(const String& str) > +{ > + String copy = str; > + return copy.removeCharacters(&isNULLCharacter); > +} > > Wow, that's a pretty slow way to remove nulls. Are we sure that we want to land it? Oops, I missed that. Let's fix this before landing.
Created attachment 79021 [details] Patch
Comment on attachment 79021 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=79021&action=review > Source/WebCore/editing/Editor.cpp:133 > +static void removeNULLCharacters(String& str) > +{ > + str.replace(0, ""); > +} I didn't comment about the function name, because I assumed that you'd just inline str.replace(0, "") at call site. But NULL isn't great, s it usually refers to C null pointer macro, which is not a single byte, and which we don't use anyway.
Created attachment 79039 [details] Patch for landing
Comment on attachment 79039 [details] Patch for landing Clearing flags on attachment: 79039 Committed r75861: <http://trac.webkit.org/changeset/75861>
All reviewed patches have been landed. Closing bug.