Bug 75394 - “Formatted Diff” view mangles non-ASCII characters
Summary: “Formatted Diff” view mangles non-ASCII characters
Status: REOPENED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests
Version: 528+ (Nightly build)
Hardware: Unspecified (OS: Unspecified)
Importance: P2 Normal
Assignee: Nobody
URL: https://bugs.webkit.org/attachment.cg...
Keywords:
Depends on:
Blocks:
 
Reported: 2011-12-30 12:21 PST by mitz
Modified: 2016-07-10 22:43 PDT
CC List: 10 users

See Also:


Attachments

Description mitz 2011-12-30 12:21:35 PST
To reproduce, compare <https://bugs.webkit.org/attachment.cgi?id=120812&action=prettypatch> to <https://bugs.webkit.org/attachment.cgi?id=120812>. Notice that every apostrophe in the latter appears as â in the former.

The plain diff is encoded and served as UTF-8, with the apostrophe (U+2019 RIGHT SINGLE QUOTATION MARK) encoded as E2 80 99. In the formatted diff, each of those bytes appears to have been interpreted as Latin-1 and then encoded as UTF-8, resulting in the sequence C3 A2 C2 80 C2 99.
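The double encoding described above can be reproduced in a few lines of Perl using the core Encode module. This is only a sketch of the byte transformation, not of Bugzilla's actual code path. It also shows why only "â" is visible: U+0080 and U+0099 are C1 control characters, which browsers typically do not render.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# U+2019 RIGHT SINGLE QUOTATION MARK, as the three UTF-8 bytes in the plain diff
my $utf8_bytes = "\xE2\x80\x99";

# Misinterpret each byte as a Latin-1 character (U+00E2, U+0080, U+0099)...
my $as_latin1 = decode('ISO-8859-1', $utf8_bytes);

# ...then encode those three characters as UTF-8 again.
my $double_encoded = encode('UTF-8', $as_latin1);

# Dump the resulting bytes in hex.
my $hex = join ' ', map { sprintf '%02X', ord } split //, $double_encoded;
print "$hex\n";    # prints "C3 A2 C2 80 C2 99"
```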
Comment 1 Kent Tamura 2012-03-21 23:22:06 PDT
Bugzilla.pm:
> sub init_page {
>     (binmode STDOUT, ':utf8') if Bugzilla->params->{'utf8'};

attachment.cgi sub prettyPatch:
>    open2(\*OUT, \*IN, "/usr/bin/ruby", "-I", "PrettyPatch", "PrettyPatch/prettify.rb", "--html-exceptions");
>    $ENV{'PATH'} = $orig_path;
>    print IN $attachment->data;
>    close(IN);
>    while (<OUT>) {
>        print;
>    }
>    close(OUT);

I guess OUT is read as raw bytes, and because binmode puts a :utf8 layer on STDOUT, "print" treats each byte of a line as a Latin-1 character and encodes it as UTF-8.
Would calling utf8::decode($_); before print fix this?
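The suggested fix can be sketched as below. This is an untested illustration under stated assumptions, not a patch against Bugzilla: the in-memory handle `$out` stands in for the OUT pipe opened by open2, and the hypothetical helper `through_utf8_layer` models what printing to a `:utf8` STDOUT does to a string (encoding its characters as UTF-8).

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumed input: the UTF-8 bytes prettify.rb would write to OUT for "it’s".
my $raw = "it\xE2\x80\x99s\n";

# Hypothetical helper: models a :utf8 output layer by encoding
# the string's characters as UTF-8.
sub through_utf8_layer { return encode('UTF-8', $_[0]) }

# In-memory handle standing in for the OUT pipe; lines read are byte strings.
open my $out, '<', \$raw or die "cannot open in-memory handle: $!";

my ($buggy, $fixed) = ('', '');
while (my $line = <$out>) {
    # Current behavior: each byte is treated as a Latin-1 character.
    $buggy .= through_utf8_layer($line);

    # Proposed fix: turn the UTF-8 bytes into characters first.
    my $copy = $line;
    utf8::decode($copy);
    $fixed .= through_utf8_layer($copy);
}
close $out;
```

Decoding line by line should be safe here because the newline byte 0x0A never occurs inside a multi-byte UTF-8 sequence. An alternative with the same intent would be to put a decoding layer on the pipe itself, e.g. `binmode OUT, ':encoding(UTF-8)';` after the open2 call and before the read loop.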
Comment 2 Martin Robinson 2013-07-09 10:07:56 PDT

*** This bug has been marked as a duplicate of bug 45760 ***
Comment 3 mitz 2013-09-12 00:10:28 PDT
This is not fixed.