Bug 27262 - Chromium: HTML exported isn't marked as being UTF-8
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Platform
Version: 528+ (Nightly build)
Hardware: Mac OS X 10.5
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-07-14 08:13 PDT by Avi Drissman
Modified: 2009-07-14 15:54 PDT
CC List: 2 users

See Also:


Attachments
Patch to mark clipboard HTML as UTF-8 (2.19 KB, patch)
2009-07-14 08:14 PDT, Avi Drissman
fishd: review+
Links to the bug now; no other changes (2.25 KB, patch)
2009-07-14 09:41 PDT, Avi Drissman
fishd: review-
New version; addresses jshin's comments (2.25 KB, patch)
2009-07-14 11:33 PDT, Avi Drissman
fishd: review+

Description Avi Drissman 2009-07-14 08:13:17 PDT
When exporting HTML for the clipboard or drag/drop, the charset isn't indicated. The Windows clipboard format is explicitly documented as being UTF-8, and all Linux apps assume UTF-8. On the Mac, though, unless otherwise indicated, ISO/IEC 8859-1 is assumed, which is wrong.
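One way to indicate the charset is to prefix the exported fragment with a charset declaration before it is placed on the pasteboard. The Python sketch below only illustrates that idea: the helper name `mark_html_as_utf8` and the exact `<meta>` tag are assumptions made for this example and are not taken from the attached patch.

```python
# Hypothetical helper for illustration only; the name and the exact tag
# are assumptions, not the contents of the attached patch.
UTF8_META = '<meta charset="utf-8">'


def mark_html_as_utf8(fragment: str) -> bytes:
    """Prefix an HTML fragment with a charset declaration, then encode it as UTF-8."""
    # Avoid adding a second declaration if the fragment already starts with one.
    if not fragment.lstrip().lower().startswith("<meta charset"):
        fragment = UTF8_META + fragment
    return fragment.encode("utf-8")


# A fragment containing curly quotes (U+2018, U+2019), which are non-ASCII.
payload = mark_html_as_utf8("<p>\u2018quoted\u2019</p>")
```

A consumer that honors the declaration would then decode the payload as UTF-8 rather than falling back to a platform default.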
Comment 1 Avi Drissman 2009-07-14 08:14:52 PDT
Created attachment 32713
Patch to mark clipboard HTML as UTF-8

This corresponds to http://codereview.chromium.org/149414
Comment 2 Darin Fisher (:fishd, Google) 2009-07-14 09:38:26 PDT
Comment on attachment 32713
Patch to mark clipboard HTML as UTF-8

> Index: WebCore/ChangeLog
...
> +2009-07-14  Avi Drissman  <avi@chromium.org>
> +
> +        Reviewed by NOBODY (OOPS!).
> +
> +        Explicitly mark the HTML generated for the Mac as being UTF-8 encoded.
> +        The Windows clipboard format is explicitly documented as being UTF-8,
> +        and all Linux apps assume UTF-8. On the Mac, though, unless otherwise
> +        indicated, ISO/IEC 8859-1 is assumed, which is wrong.

nit: Your ChangeLog should include a link to this bug.

Otherwise, R=me
Comment 3 Avi Drissman 2009-07-14 09:41:28 PDT
Created attachment 32718
Links to the bug now; no other changes
Comment 4 Jungshik Shin 2009-07-14 10:23:50 PDT
nit: the comment and the bug description need a small change.

Judging from the way it's broken without your patch, what's assumed is neither ISO-8859-1 nor MacRoman but windows-1252 (it's a bit odd to see that on Mac OS X :-)).

For instance, U+2018 (Left Single Quotation Mark), whose UTF-8 representation is "0xE2, 0x80, 0x98", is converted to "U+00E2, U+20AC, U+02DC". If it were interpreted as ISO-8859-1, it would be converted to "U+00E2, U+0080, U+0098".
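
The comparison above can be reproduced with a short Python sketch. It is not part of the patch; it only decodes the same three bytes with each candidate encoding, using Python's standard `cp1252` and `latin-1` codecs as stand-ins for windows-1252 and ISO-8859-1.

```python
# UTF-8 bytes for U+2018 (Left Single Quotation Mark).
utf8_bytes = "\u2018".encode("utf-8")


def code_points(text):
    """Format each character of text as a U+XXXX code point string."""
    return ["U+%04X" % ord(ch) for ch in text]


# Decode the same bytes under each candidate (wrong) encoding.
as_cp1252 = code_points(utf8_bytes.decode("cp1252"))
as_latin1 = code_points(utf8_bytes.decode("latin-1"))

print(", ".join("0x%02X" % b for b in utf8_bytes))  # 0xE2, 0x80, 0x98
print(as_cp1252)  # ['U+00E2', 'U+20AC', 'U+02DC']
print(as_latin1)  # ['U+00E2', 'U+0080', 'U+0098']
```

Only the windows-1252 decoding yields U+20AC and U+02DC, which matches the breakage observed without the patch.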
Comment 5 Darin Fisher (:fishd, Google) 2009-07-14 10:48:44 PDT
Comment on attachment 32718
Links to the bug now; no other changes

r- for revised changelog per feedback from jshin.  i'll commit the next patch.
-darin
Comment 6 Avi Drissman 2009-07-14 11:33:52 PDT
Created attachment 32724
New version; addresses jshin's comments
Comment 7 Darin Fisher (:fishd, Google) 2009-07-14 15:51:14 PDT
Landed as: http://trac.webkit.org/changeset/45878

(The patch didn't apply cleanly... hand-editing in the ChangeLog portion of the diff?)
Comment 8 Avi Drissman 2009-07-14 15:54:42 PDT
(In reply to comment #7)
> (The patch didn't apply cleanly... hand-editing in the ChangeLog portion of the
> diff?)

Yes, that's the precise reason. Bad me; I'll not do that next time.