RESOLVED FIXED Bug 27262
Chromium: HTML exported isn't marked as being UTF-8
https://bugs.webkit.org/show_bug.cgi?id=27262
Avi Drissman
Reported 2009-07-14 08:13:17 PDT
When exporting HTML for the clipboard or drag/drop, the charset isn't indicated. The Windows clipboard format is explicitly documented as being UTF-8, and all Linux apps assume UTF-8. On the Mac, though, unless otherwise indicated, ISO/IEC 8859-1 is assumed, which is wrong.
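One way to picture the kind of marking being asked for is the sketch below, which prefixes an exported HTML fragment with a charset declaration before writing it out as UTF-8 bytes. This is only an illustration under stated assumptions: the function name `mark_as_utf8` and the exact `<meta>` tag are hypothetical and are not taken from the attached patch.

```python
def mark_as_utf8(html_fragment: str) -> bytes:
    """Prefix an HTML fragment with a charset declaration and encode it as UTF-8.

    Hypothetical helper for illustration; the real patch may mark the
    encoding differently.
    """
    charset_tag = '<meta charset="utf-8">'  # assumed form of the declaration
    return (charset_tag + html_fragment).encode("utf-8")


# A fragment containing curly quotes, which are multi-byte in UTF-8.
exported = mark_as_utf8("<p>\u2018quoted\u2019</p>")
print(exported)
```

A consumer that honors the declaration decodes the bytes as UTF-8 instead of falling back to a platform default.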
Attachments
Patch to mark clipboard HTML as UTF-8 (2.19 KB, patch)
2009-07-14 08:14 PDT, Avi Drissman
fishd: review+
Links to the bug now; no other changes (2.25 KB, patch)
2009-07-14 09:41 PDT, Avi Drissman
fishd: review-
New version; addresses jshin's comments (2.25 KB, patch)
2009-07-14 11:33 PDT, Avi Drissman
fishd: review+
Avi Drissman
Comment 1 2009-07-14 08:14:52 PDT
Created attachment 32713
Patch to mark clipboard HTML as UTF-8

This corresponds to http://codereview.chromium.org/149414
Darin Fisher (:fishd, Google)
Comment 2 2009-07-14 09:38:26 PDT
Comment on attachment 32713
Patch to mark clipboard HTML as UTF-8

> Index: WebCore/ChangeLog
...
> +2009-07-14 Avi Drissman <avi@chromium.org>
> +
> + Reviewed by NOBODY (OOPS!).
> +
> + Explicitly mark the HTML generated for the Mac as being UTF-8 encoded.
> + The Windows clipboard format is explicitly documented as being UTF-8,
> + and all Linux apps assume UTF-8. On the Mac, though, unless otherwise
> + indicated, ISO/IEC 8859-1 is assumed, which is wrong.

nit: Your ChangeLog should include a link to this bug.

Otherwise, R=me
Avi Drissman
Comment 3 2009-07-14 09:41:28 PDT
Created attachment 32718
Links to the bug now; no other changes
Jungshik Shin
Comment 4 2009-07-14 10:23:50 PDT
nit: the comment and the bug description need a small change. Judging from the way it's broken without your patch, what's assumed is neither ISO-8859-1 nor MacRoman but windows-1252 (it's a bit odd to see that on Mac OS X :-)). For instance, U+2018 (Left Single Quotation Mark), whose UTF-8 representation is "0xE2, 0x80, 0x98", is converted to "U+00E2, U+20AC, U+02DC". If it were interpreted as ISO-8859-1, it would be converted to "U+00E2, U+0080, U+0098".
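The diagnosis above can be reproduced with a short sketch that decodes the UTF-8 bytes of U+2018 under each candidate encoding. It uses Python's standard codecs (`cp1252`, `latin-1`, `mac_roman`) as stand-ins for what the receiving application might assume; the helper name `code_points` is made up for this illustration.

```python
def code_points(text):
    """Format a string as a list of U+XXXX code points."""
    return [f"U+{ord(ch):04X}" for ch in text]


# U+2018 LEFT SINGLE QUOTATION MARK, encoded as UTF-8.
utf8_bytes = "\u2018".encode("utf-8")
raw_bytes = [f"0x{b:02X}" for b in utf8_bytes]

# Misinterpret the same bytes under each candidate single-byte encoding.
as_cp1252 = code_points(utf8_bytes.decode("cp1252"))
as_latin1 = code_points(utf8_bytes.decode("latin-1"))
as_macroman = code_points(utf8_bytes.decode("mac_roman"))

print("UTF-8 bytes: ", raw_bytes)
print("windows-1252:", as_cp1252)
print("ISO-8859-1:  ", as_latin1)
print("MacRoman:    ", as_macroman)
```

Only the windows-1252 decoding yields U+20AC and U+02DC for the second and third bytes, which matches the observed breakage.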
Darin Fisher (:fishd, Google)
Comment 5 2009-07-14 10:48:44 PDT
Comment on attachment 32718
Links to the bug now; no other changes

r- for revised changelog per feedback from jshin. i'll commit the next patch.
-darin
Avi Drissman
Comment 6 2009-07-14 11:33:52 PDT
Created attachment 32724
New version; addresses jshin's comments
Darin Fisher (:fishd, Google)
Comment 7 2009-07-14 15:51:14 PDT
Landed as: http://trac.webkit.org/changeset/45878

(The patch didn't apply cleanly... hand-editing in the ChangeLog portion of the diff?)
Avi Drissman
Comment 8 2009-07-14 15:54:42 PDT
(In reply to comment #7)
> (The patch didn't apply cleanly... hand-editing in the ChangeLog portion of the
> diff?)

Yes, that's the precise reason. Bad me; I'll not do that next time.