WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
CLOSED FIXED
26791
[Gtk] Paste of rich text from firefox results garbled markup
https://bugs.webkit.org/show_bug.cgi?id=26791
Summary
[Gtk] Paste of rich text from firefox results garbled markup
Jiahua Huang
Reported
2009-06-29 00:57:44 PDT
WebkitGtk HTML Editing not accustomed to UTF-16 clipboard. When paste rich text to editablearea, (like gmail Rich formatting editor, or any application use WebkitGtk as html editor) WebkitGtk assumed that SelectionData from gtk clipboard is always UTF-8 encoding. But ,in fact, some applications like FireFox put SelectionData for clipboard use UTF-16 encoding, so rich text markup garbled. To reproduce: 0. Launch FireFox to browse
http://www.apple.com
1. Copy that text or picture from FireFox page 2. Launch a WebkitGtk browser (Midori or GtkLauncher) and open
http://www.mozilla.org/editor/midasdemo/
, it can see a rich formatting editor 3. Paste in the rich formatting editor Results: pasted contents of editor is garbled markup
Attachments
Fix problem with UTF-16 clipboard pasted
(1.51 KB, patch)
2009-06-29 01:04 PDT
,
Jiahua Huang
no flags
Details
Formatted Diff
Diff
update patch to r45340
(1.58 KB, patch)
2009-06-29 10:35 PDT
,
Jiahua Huang
no flags
Details
Formatted Diff
Diff
Fix the length size of GtkSelectionData from Pidgin and FireFox
(1.56 KB, patch)
2009-06-30 09:42 PDT
,
Jiahua Huang
eric
: review-
Details
Formatted Diff
Diff
use TextResourceDecoder instead of simple detectTextEncoding
(1.61 KB, patch)
2009-07-03 08:33 PDT
,
Jiahua Huang
no flags
Details
Formatted Diff
Diff
change mimeType to decrease the cost
(1.65 KB, patch)
2009-07-03 09:59 PDT
,
Jiahua Huang
no flags
Details
Formatted Diff
Diff
add a manual test in WebCore/manual-tests/gtk
(3.31 KB, patch)
2009-07-04 16:20 PDT
,
Jiahua Huang
jmalonzo
: review+
Details
Formatted Diff
Diff
Show Obsolete
(5)
View All
Add attachment
proposed patch, testcase, etc.
Jiahua Huang
Comment 1
2009-06-29 01:04:06 PDT
Created
attachment 31999
[details]
Fix problem with UTF-16 clipboard pasted
Jiahua Huang
Comment 2
2009-06-29 10:35:00 PDT
Created
attachment 32013
[details]
update patch to
r45340
Eric Seidel (no email)
Comment 3
2009-06-30 00:40:38 PDT
Comment on
attachment 32013
[details]
update patch to
r45340
Can there be other encodings on the gtk pasteboard? Should we be using TextDecoder instead here? This is also really lame that this isn't standardized in some way...
Jiahua Huang
Comment 4
2009-06-30 01:14:28 PDT
(In reply to
comment #3
)
> (From update of
attachment 32013
[details]
[review]) > Can there be other encodings on the gtk pasteboard?
No, only UTF-8 and "Unicode" (UTF-16LE, UTF-16BE) now. General gtk apps use UTF-8, and others use Unicode.
> Should we be using > TextDecoder instead here? This is also really lame that this isn't > standardized in some way...
Thanks for your comments. They were£¬It was enough for UTF-8 and Unicode.
Eric Seidel (no email)
Comment 5
2009-06-30 02:53:10 PDT
Comment on
attachment 32013
[details]
update patch to
r45340
Won't this still end up with the UTF16 BOM char pasted into your web page? WebKit is smart enough to ignore BOMs in the middle of pages, but it seems silly to paste them in.
Jiahua Huang
Comment 6
2009-06-30 03:34:52 PDT
(In reply to
comment #5
)
> (From update of
attachment 32013
[details]
[review]) > Won't this still end up with the UTF16 BOM char pasted into your web page? > WebKit is smart enough to ignore BOMs in the middle of pages, but it seems > silly to paste them in. >
Hi, Eric, The problem of garbled is not on the BOMs, it is that String::fromUTF8() the code in WebCore/platform/text/String.cpp:
> String String::fromUTF8(const char* string, size_t size) > > String String::fromUTF8(const char* string, size_t size) > { > if (!string) > return String(); > return UTF8Encoding().decode(string, size); > }
the string must be legal UTF-8 encoding chars, and the size must be right length. When gtk.Clipbord put a UTF-16 data, the string is not legal UTF-8 encoding chars but UChar, and the length muse be Halved. So, the old code will wrong for encoding (multiple decoding) and size, the wrong of size (double the right length) will make markups confused. So, I maked this changes:
> - String html = String::fromUTF8(reinterpret_cast<gchar*>(data->data), data->length * data->format / 8); > + String html; > + if (reinterpret_cast<UChar*>(data->data)[0] == 0xFEFF > + || reinterpret_cast<UChar*>(data->data)[0] == 0xFFFE) > + html = String(reinterpret_cast<UChar*>(data->data), data->length * data->format / 16); > + else > + html = String::fromUTF8(reinterpret_cast<gchar*>(data->data), data->length * data->format / 8);
Let it use String::String() instead of String::fromUTF8 when gtk.Clipbord put a UTF-16 data (BOMs used to discern)
Jiahua Huang
Comment 7
2009-06-30 03:38:06 PDT
(In reply to
comment #6
)
> Let it use String::String() instead of String::fromUTF8 > when gtk.Clipbord put a UTF-16 data (BOMs used to discern) >
and Halved the length size ("data->length * data->format / 16" instead of "data->length * data->format / 8")
Jiahua Huang
Comment 8
2009-06-30 09:42:13 PDT
Created
attachment 32072
[details]
Fix the length size of GtkSelectionData from Pidgin and FireFox UTF-16 (with BOMs) GtkSelectionData must a halfed length, but some apps wrong for format size, in which case, I use "data->length / 2" to fix it.
Jiahua Huang
Comment 9
2009-06-30 10:24:47 PDT
Comment on
attachment 32072
[details]
Fix the length size of GtkSelectionData from Pidgin and FireFox UTF-16 (with BOMs) GtkSelectionData must a halfed length, but some apps wrong for format size, in which case, I use "data->length / 2" to fix it.
Eric Seidel (no email)
Comment 10
2009-06-30 10:59:59 PDT
Comment on
attachment 32072
[details]
Fix the length size of GtkSelectionData from Pidgin and FireFox It seems like this would do a better job: TextEncoding clipboardEncoding; if (detectTextEncoding(data->data, data->len, "utf8", &clipboardEncoding)) html = clipboardEncoding.decode(data->data, data->len); This also needs a manual test in WebCore/manual-tests/gtk
Jiahua Huang
Comment 11
2009-06-30 11:19:39 PDT
(In reply to
comment #10
)
> (From update of
attachment 32072
[details]
[review]) > It seems like this would do a better job: >
Thanks, I appreciate your comments. I'm trying to add a manual-tests.
Jiahua Huang
Comment 12
2009-06-30 11:30:32 PDT
(In reply to
comment #10
)
> (From update of
attachment 32072
[details]
[review]) > It seems like this would do a better job: >
Sorry, I tried this code, but it didn't work.
> String html; > TextEncoding clipboardEncoding; > if (detectTextEncoding(reinterpret_cast<char*>(data->data), data->length, "utf8", &clipboardEncoding)) > html = clipboardEncoding.decode(reinterpret_cast<char*>(data->data), data->length);
I'm trying to find way
Jiahua Huang
Comment 13
2009-06-30 11:34:06 PDT
(In reply to
comment #12
)
> Sorry, > I tried this code, but it didn't work. >
It can paste text from FireFox, other WebkitGtk Page, Pidgin MessageView, but formats have been lost。 It seems detectTextEncoding always return False, so it fallback to PlainText.
Eric Seidel (no email)
Comment 14
2009-06-30 11:34:58 PDT
(In reply to
comment #12
)
> Sorry, > I tried this code, but it didn't work. > > > String html; > > TextEncoding clipboardEncoding; > > if (detectTextEncoding(reinterpret_cast<char*>(data->data), data->length, "utf8", &clipboardEncoding)) > > html = clipboardEncoding.decode(reinterpret_cast<char*>(data->data), data->length);
Does "UTF-8" work? I admit, I haven't used detectTextEncoding much.
Eric Seidel (no email)
Comment 15
2009-06-30 11:36:08 PDT
Oh, it's also possible that detectTextEncoding isn't wired up for the gtk port. TextEncodingDetectorICU.cpp is the file I would expect gtk to be using.
Jiahua Huang
Comment 16
2009-06-30 11:42:01 PDT
(In reply to
comment #14
)
> Does "UTF-8" work? I admit, I haven't used detectTextEncoding much. >
no :(
Jiahua Huang
Comment 17
2009-06-30 11:46:39 PDT
(In reply to
comment #15
)
> Oh, it's also possible that detectTextEncoding isn't wired up for the gtk port. > TextEncodingDetectorICU.cpp is the file I would expect gtk to be using. >
configure of WebkitGtk has this Optional:
> --with-unicode-backend=[icu/glib] > Select Unicode backend (WARNING: the glib-based > backend is slow, and incomplete) default=icu
I selected the default icu, and I had add
> #include "TextEncodingDetector.h"
in WebCore/platform/gtk/PasteboardGtk.cpp
Eric Seidel (no email)
Comment 18
2009-06-30 12:14:13 PDT
I would encourage you to set a breakpoint in detectTextEncoding, because I believe your use-case should work fine if the gtk build has encoding detection wired up correctly.
Jiahua Huang
Comment 19
2009-07-03 08:21:46 PDT
(In reply to
comment #18
)
> I would encourage you to set a breakpoint in detectTextEncoding, because I > believe your use-case should work fine if the gtk build has encoding detection > wired up correctly.
Hi, I use TextResourceDecoder instead of simple detectTextEncoding. this's the code:
> String html; > html = TextResourceDecoder::create("text/html", "UTF-8", true)->decode(reinterpret_cast<char*>(data->data), data->length);
it works fine.
Jiahua Huang
Comment 20
2009-07-03 08:33:33 PDT
Created
attachment 32238
[details]
use TextResourceDecoder instead of simple detectTextEncoding change to this code:
> String html; > RefPtr<TextResourceDecoder> decoder = TextResourceDecoder::create("text/html", "UTF-8", true); > html = decoder->decode(reinterpret_cast<char*>(data->data), data->length);
it works. GtkSelectionData (html) from GTK clipboard can be some forms: 1. From other WebkitGtk page "text/html", 8bits, utf-8, no BOMs 2. From FireFox page "text/html", 8bits (wrong for firefox), utf-16, BOMs 3. From Pidgin messageview "text/html", 16bits, BOMs 4. From other Gtk App who use "text/html" when copy-paste "text/html", 8bits, utf-8, no BOMs ... it use TextResourceDecoder to auto detect encoding.
Jiahua Huang
Comment 21
2009-07-03 09:59:46 PDT
Created
attachment 32239
[details]
change mimeType to decrease the cost Use "text/plain" mimeType instead of "text/html" in TextResourceDecoder, it will decrease the checking cost.
Jan Alonzo
Comment 22
2009-07-04 14:55:43 PDT
(In reply to
comment #21
)
> Created an attachment (id=32239) [details] > change mimeType to decrease the cost > > Use "text/plain" mimeType instead of "text/html" in TextResourceDecoder, > it will decrease the checking cost.
Hi Jiahua. Are you able to add manual-test for this? That seems to be a missing piece in your patch. Thanks!
Jiahua Huang
Comment 23
2009-07-04 16:20:41 PDT
Created
attachment 32264
[details]
add a manual test in WebCore/manual-tests/gtk (In reply to
comment #22
)
> Hi Jiahua. Are you able to add manual-test for this? That seems to be a missing > piece in your patch. > > Thanks!
Hi Jan, WebCore/manual-tests/gtk/paste-richtext-from-firefox.html has been added.
Jan Alonzo
Comment 24
2009-07-07 05:17:10 PDT
Comment on
attachment 32264
[details]
add a manual test in WebCore/manual-tests/gtk
> - String html = String::fromUTF8(reinterpret_cast<gchar*>(data->data), data->length * data->format / 8); > + String html; > + RefPtr<TextResourceDecoder> decoder = TextResourceDecoder::create("text/plain", "UTF-8", true); > + html = decoder->decode(reinterpret_cast<char*>(data->data), data->length); > + html += decoder->flush(); > gtk_selection_data_free(data);
Declaration and the first assignment can be in the same line here. Looks fine and the patch works great! r=me.
Jiahua Huang
Comment 25
2009-07-07 05:44:29 PDT
(In reply to
comment #24
)
> (From update of
attachment 32264
[details]
) > > - String html = String::fromUTF8(reinterpret_cast<gchar*>(data->data), data->length * data->format / 8); > > + String html; > > + RefPtr<TextResourceDecoder> decoder = TextResourceDecoder::create("text/plain", "UTF-8", true); > > + html = decoder->decode(reinterpret_cast<char*>(data->data), data->length); > > + html += decoder->flush(); > > gtk_selection_data_free(data); > > Declaration and the first assignment can be in the same line here. > > Looks fine and the patch works great! r=me.
Thank you, Jan, I appreciate your comments. I observ'd also, when I select some html content, and Copy them use Ctrl-C, the Pasteboard/GtkClipboard can't handle the "text/html" target. I found that the "text/html" target only in X clipboard (GDK_SELECTION_CLIPBOARD), but not Desktop clipboard (GDK_SELECTION_PRIMARY). So we can't use Ctrl-C to Copy rich text from WebkitGtk. I use this Change:
> Index: WebCore/platform/gtk/PasteboardGtk.cpp > =================================================================== > --- WebCore/platform/gtk/PasteboardGtk.cpp (revision 45546) > +++ WebCore/platform/gtk/PasteboardGtk.cpp (working copy) > @@ -37,8 +38,8 @@ namespace WebCore { > /* FIXME: we must get rid of this and use the enum in webkitwebview.h someway */ > typedef enum > { > - WEBKIT_WEB_VIEW_TARGET_INFO_HTML = - 1, > - WEBKIT_WEB_VIEW_TARGET_INFO_TEXT = - 2 > + WEBKIT_WEB_VIEW_TARGET_INFO_HTML, > + WEBKIT_WEB_VIEW_TARGET_INFO_TEXT > } WebKitWebViewTargetInfo; >
it works. it can use Ctrl-C to Copy rich text from WebkitGtk page now. but I'm not sure if it will be a new patch. Jan, should fix it when landing, or should I create a new patch with ChangeLog, or open a new Bug? Regards
Jan Alonzo
Comment 26
2009-07-07 05:59:05 PDT
> > Looks fine and the patch works great! r=me.
Landed in
r45592
.
> > Index: WebCore/platform/gtk/PasteboardGtk.cpp > > =================================================================== > > --- WebCore/platform/gtk/PasteboardGtk.cpp (revision 45546) > > +++ WebCore/platform/gtk/PasteboardGtk.cpp (working copy) > > @@ -37,8 +38,8 @@ namespace WebCore { > > /* FIXME: we must get rid of this and use the enum in webkitwebview.h someway */ > > typedef enum > > { > > - WEBKIT_WEB_VIEW_TARGET_INFO_HTML = - 1, > > - WEBKIT_WEB_VIEW_TARGET_INFO_TEXT = - 2 > > + WEBKIT_WEB_VIEW_TARGET_INFO_HTML, > > + WEBKIT_WEB_VIEW_TARGET_INFO_TEXT > > } WebKitWebViewTargetInfo; > > > > it works. > it can use Ctrl-C to Copy rich text from WebkitGtk page now. > > > but I'm not sure if it will be a new patch.
Please file a new bug with the problem that you just encountered (including your patch too). Please CC me on the bug as well. Thanks
Jiahua Huang
Comment 27
2009-07-07 06:45:21 PDT
(In reply to
comment #26
)
> Please file a new bug with the problem that you just encountered (including > your patch too). Please CC me on the bug as well. > > Thanks
Well, I see, wait for a moment。 Thanks. Should I mark this bug as CLOSED.
Jiahua Huang
Comment 28
2009-07-07 07:24:48 PDT
(In reply to
comment #26
) Hi, Jan, I had file a new bug on
https://bugs.webkit.org/show_bug.cgi?id=27028
and CC you on the bug. but when I submitted for
bug 27028
, it say:
> Excluding: > <your email>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug