WebkitGtk HTML Editing not accustomed to UTF-16 clipboard. When paste rich text to editablearea, (like gmail Rich formatting editor, or any application use WebkitGtk as html editor) WebkitGtk assumed that SelectionData from gtk clipboard is always UTF-8 encoding. But ,in fact, some applications like FireFox put SelectionData for clipboard use UTF-16 encoding, so rich text markup garbled. To reproduce: 0. Launch FireFox to browse http://www.apple.com 1. Copy that text or picture from FireFox page 2. Launch a WebkitGtk browser (Midori or GtkLauncher) and open http://www.mozilla.org/editor/midasdemo/ , it can see a rich formatting editor 3. Paste in the rich formatting editor Results: pasted contents of editor is garbled markup
Created attachment 31999 [details] Fix problem with UTF-16 clipboard pasted
Created attachment 32013 [details] update patch to r45340
Comment on attachment 32013 [details] update patch to r45340 Can there be other encodings on the gtk pasteboard? Should we be using TextDecoder instead here? This is also really lame that this isn't standardized in some way...
(In reply to comment #3) > (From update of attachment 32013 [details] [review]) > Can there be other encodings on the gtk pasteboard? No, only UTF-8 and "Unicode" (UTF-16LE, UTF-16BE) now. General gtk apps use UTF-8, and others use Unicode. > Should we be using > TextDecoder instead here? This is also really lame that this isn't > standardized in some way... Thanks for your comments. They were£¬It was enough for UTF-8 and Unicode.
Comment on attachment 32013 [details] update patch to r45340 Won't this still end up with the UTF16 BOM char pasted into your web page? WebKit is smart enough to ignore BOMs in the middle of pages, but it seems silly to paste them in.
(In reply to comment #5) > (From update of attachment 32013 [details] [review]) > Won't this still end up with the UTF16 BOM char pasted into your web page? > WebKit is smart enough to ignore BOMs in the middle of pages, but it seems > silly to paste them in. > Hi, Eric, The problem of garbled is not on the BOMs, it is that String::fromUTF8() the code in WebCore/platform/text/String.cpp: > String String::fromUTF8(const char* string, size_t size) > > String String::fromUTF8(const char* string, size_t size) > { > if (!string) > return String(); > return UTF8Encoding().decode(string, size); > } the string must be legal UTF-8 encoding chars, and the size must be right length. When gtk.Clipbord put a UTF-16 data, the string is not legal UTF-8 encoding chars but UChar, and the length muse be Halved. So, the old code will wrong for encoding (multiple decoding) and size, the wrong of size (double the right length) will make markups confused. So, I maked this changes: > - String html = String::fromUTF8(reinterpret_cast<gchar*>(data->data), data->length * data->format / 8); > + String html; > + if (reinterpret_cast<UChar*>(data->data)[0] == 0xFEFF > + || reinterpret_cast<UChar*>(data->data)[0] == 0xFFFE) > + html = String(reinterpret_cast<UChar*>(data->data), data->length * data->format / 16); > + else > + html = String::fromUTF8(reinterpret_cast<gchar*>(data->data), data->length * data->format / 8); Let it use String::String() instead of String::fromUTF8 when gtk.Clipbord put a UTF-16 data (BOMs used to discern)
(In reply to comment #6) > Let it use String::String() instead of String::fromUTF8 > when gtk.Clipbord put a UTF-16 data (BOMs used to discern) > and Halved the length size ("data->length * data->format / 16" instead of "data->length * data->format / 8")
Created attachment 32072 [details] Fix the length size of GtkSelectionData from Pidgin and FireFox UTF-16 (with BOMs) GtkSelectionData must a halfed length, but some apps wrong for format size, in which case, I use "data->length / 2" to fix it.
Comment on attachment 32072 [details] Fix the length size of GtkSelectionData from Pidgin and FireFox UTF-16 (with BOMs) GtkSelectionData must a halfed length, but some apps wrong for format size, in which case, I use "data->length / 2" to fix it.
Comment on attachment 32072 [details] Fix the length size of GtkSelectionData from Pidgin and FireFox It seems like this would do a better job: TextEncoding clipboardEncoding; if (detectTextEncoding(data->data, data->len, "utf8", &clipboardEncoding)) html = clipboardEncoding.decode(data->data, data->len); This also needs a manual test in WebCore/manual-tests/gtk
(In reply to comment #10) > (From update of attachment 32072 [details] [review]) > It seems like this would do a better job: > Thanks, I appreciate your comments. I'm trying to add a manual-tests.
(In reply to comment #10) > (From update of attachment 32072 [details] [review]) > It seems like this would do a better job: > Sorry, I tried this code, but it didn't work. > String html; > TextEncoding clipboardEncoding; > if (detectTextEncoding(reinterpret_cast<char*>(data->data), data->length, "utf8", &clipboardEncoding)) > html = clipboardEncoding.decode(reinterpret_cast<char*>(data->data), data->length); I'm trying to find way
(In reply to comment #12) > Sorry, > I tried this code, but it didn't work. > It can paste text from FireFox, other WebkitGtk Page, Pidgin MessageView, but formats have been lost。 It seems detectTextEncoding always return False, so it fallback to PlainText.
(In reply to comment #12) > Sorry, > I tried this code, but it didn't work. > > > String html; > > TextEncoding clipboardEncoding; > > if (detectTextEncoding(reinterpret_cast<char*>(data->data), data->length, "utf8", &clipboardEncoding)) > > html = clipboardEncoding.decode(reinterpret_cast<char*>(data->data), data->length); Does "UTF-8" work? I admit, I haven't used detectTextEncoding much.
Oh, it's also possible that detectTextEncoding isn't wired up for the gtk port. TextEncodingDetectorICU.cpp is the file I would expect gtk to be using.
(In reply to comment #14) > Does "UTF-8" work? I admit, I haven't used detectTextEncoding much. > no :(
(In reply to comment #15) > Oh, it's also possible that detectTextEncoding isn't wired up for the gtk port. > TextEncodingDetectorICU.cpp is the file I would expect gtk to be using. > configure of WebkitGtk has this Optional: > --with-unicode-backend=[icu/glib] > Select Unicode backend (WARNING: the glib-based > backend is slow, and incomplete) default=icu I selected the default icu, and I had add > #include "TextEncodingDetector.h" in WebCore/platform/gtk/PasteboardGtk.cpp
I would encourage you to set a breakpoint in detectTextEncoding, because I believe your use-case should work fine if the gtk build has encoding detection wired up correctly.
(In reply to comment #18) > I would encourage you to set a breakpoint in detectTextEncoding, because I > believe your use-case should work fine if the gtk build has encoding detection > wired up correctly. Hi, I use TextResourceDecoder instead of simple detectTextEncoding. this's the code: > String html; > html = TextResourceDecoder::create("text/html", "UTF-8", true)->decode(reinterpret_cast<char*>(data->data), data->length); it works fine.
Created attachment 32238 [details] use TextResourceDecoder instead of simple detectTextEncoding change to this code: > String html; > RefPtr<TextResourceDecoder> decoder = TextResourceDecoder::create("text/html", "UTF-8", true); > html = decoder->decode(reinterpret_cast<char*>(data->data), data->length); it works. GtkSelectionData (html) from GTK clipboard can be some forms: 1. From other WebkitGtk page "text/html", 8bits, utf-8, no BOMs 2. From FireFox page "text/html", 8bits (wrong for firefox), utf-16, BOMs 3. From Pidgin messageview "text/html", 16bits, BOMs 4. From other Gtk App who use "text/html" when copy-paste "text/html", 8bits, utf-8, no BOMs ... it use TextResourceDecoder to auto detect encoding.
Created attachment 32239 [details] change mimeType to decrease the cost Use "text/plain" mimeType instead of "text/html" in TextResourceDecoder, it will decrease the checking cost.
(In reply to comment #21) > Created an attachment (id=32239) [details] > change mimeType to decrease the cost > > Use "text/plain" mimeType instead of "text/html" in TextResourceDecoder, > it will decrease the checking cost. Hi Jiahua. Are you able to add manual-test for this? That seems to be a missing piece in your patch. Thanks!
Created attachment 32264 [details] add a manual test in WebCore/manual-tests/gtk (In reply to comment #22) > Hi Jiahua. Are you able to add manual-test for this? That seems to be a missing > piece in your patch. > > Thanks! Hi Jan, WebCore/manual-tests/gtk/paste-richtext-from-firefox.html has been added.
Comment on attachment 32264 [details] add a manual test in WebCore/manual-tests/gtk > - String html = String::fromUTF8(reinterpret_cast<gchar*>(data->data), data->length * data->format / 8); > + String html; > + RefPtr<TextResourceDecoder> decoder = TextResourceDecoder::create("text/plain", "UTF-8", true); > + html = decoder->decode(reinterpret_cast<char*>(data->data), data->length); > + html += decoder->flush(); > gtk_selection_data_free(data); Declaration and the first assignment can be in the same line here. Looks fine and the patch works great! r=me.
(In reply to comment #24) > (From update of attachment 32264 [details]) > > - String html = String::fromUTF8(reinterpret_cast<gchar*>(data->data), data->length * data->format / 8); > > + String html; > > + RefPtr<TextResourceDecoder> decoder = TextResourceDecoder::create("text/plain", "UTF-8", true); > > + html = decoder->decode(reinterpret_cast<char*>(data->data), data->length); > > + html += decoder->flush(); > > gtk_selection_data_free(data); > > Declaration and the first assignment can be in the same line here. > > Looks fine and the patch works great! r=me. Thank you, Jan, I appreciate your comments. I observ'd also, when I select some html content, and Copy them use Ctrl-C, the Pasteboard/GtkClipboard can't handle the "text/html" target. I found that the "text/html" target only in X clipboard (GDK_SELECTION_CLIPBOARD), but not Desktop clipboard (GDK_SELECTION_PRIMARY). So we can't use Ctrl-C to Copy rich text from WebkitGtk. I use this Change: > Index: WebCore/platform/gtk/PasteboardGtk.cpp > =================================================================== > --- WebCore/platform/gtk/PasteboardGtk.cpp (revision 45546) > +++ WebCore/platform/gtk/PasteboardGtk.cpp (working copy) > @@ -37,8 +38,8 @@ namespace WebCore { > /* FIXME: we must get rid of this and use the enum in webkitwebview.h someway */ > typedef enum > { > - WEBKIT_WEB_VIEW_TARGET_INFO_HTML = - 1, > - WEBKIT_WEB_VIEW_TARGET_INFO_TEXT = - 2 > + WEBKIT_WEB_VIEW_TARGET_INFO_HTML, > + WEBKIT_WEB_VIEW_TARGET_INFO_TEXT > } WebKitWebViewTargetInfo; > it works. it can use Ctrl-C to Copy rich text from WebkitGtk page now. but I'm not sure if it will be a new patch. Jan, should fix it when landing, or should I create a new patch with ChangeLog, or open a new Bug? Regards
> > Looks fine and the patch works great! r=me. Landed in r45592. > > Index: WebCore/platform/gtk/PasteboardGtk.cpp > > =================================================================== > > --- WebCore/platform/gtk/PasteboardGtk.cpp (revision 45546) > > +++ WebCore/platform/gtk/PasteboardGtk.cpp (working copy) > > @@ -37,8 +38,8 @@ namespace WebCore { > > /* FIXME: we must get rid of this and use the enum in webkitwebview.h someway */ > > typedef enum > > { > > - WEBKIT_WEB_VIEW_TARGET_INFO_HTML = - 1, > > - WEBKIT_WEB_VIEW_TARGET_INFO_TEXT = - 2 > > + WEBKIT_WEB_VIEW_TARGET_INFO_HTML, > > + WEBKIT_WEB_VIEW_TARGET_INFO_TEXT > > } WebKitWebViewTargetInfo; > > > > it works. > it can use Ctrl-C to Copy rich text from WebkitGtk page now. > > > but I'm not sure if it will be a new patch. Please file a new bug with the problem that you just encountered (including your patch too). Please CC me on the bug as well. Thanks
(In reply to comment #26) > Please file a new bug with the problem that you just encountered (including > your patch too). Please CC me on the bug as well. > > Thanks Well, I see, wait for a moment。 Thanks. Should I mark this bug as CLOSED.
(In reply to comment #26) Hi, Jan, I had file a new bug on https://bugs.webkit.org/show_bug.cgi?id=27028 and CC you on the bug. but when I submitted for bug 27028, it say: > Excluding: > <your email>