Bug 100384 - Webkit 1.10.1 shows "question mark" (?) instead utf8 char
Status: RESOLVED INVALID
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK
Version: 528+ (Nightly build)
Hardware: PC Linux
Importance: P2 Normal
Assignee: Nobody
Reported: 2012-10-25 08:45 PDT by Dâniel Fraga
Modified: 2012-12-11 08:48 PST

Attachments
Wrong chars (utf8 problem?) (65.56 KB, image/png)
2012-10-25 08:45 PDT, Dâniel Fraga
Sample youtube e-mail (10.58 KB, text/plain)
2012-10-30 17:45 PDT, Dâniel Fraga
Midori output (69.22 KB, image/png)
2012-10-30 18:57 PDT, Dâniel Fraga
Sample youtube notification (8.28 KB, text/plain)
2012-11-03 15:55 PDT, Dâniel Fraga

Description Dâniel Fraga 2012-10-25 08:45:29 PDT
Created attachment 170661
Wrong chars (utf8 problem?)

After YouTube changed the format of its comment notification e-mails, I can't see any accented characters (for example: é, á, í).

I use the Fancy plugin for Claws Mail, which uses WebKit to render HTML e-mail. The Fancy plugin's author said it is a WebKit bug.

This only happens with YouTube comment e-mails. Here is the e-mail content:

    <html lang="pt">

  <head>
    <title>
      Comentário postado sobre "Ração MAIS CARA para cães e gatos"

    </title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>

  <body>
    <table width="620" cellspacing="0" cellpadding="0" border="0" align="center"><tr><td bgcolor="#F0F0F0">
      <table width="578" cellspacing="0" cellpadding="0" border="0" align="center">
        <tr>
          <td height="16"></td>
        </tr>
        <tr>
          <td>
            <img src="http://s.ytimg.com/yt/img/email/digest/email_header.png">
          </td>
        </tr>
        <tr>
          <td height="16"></td>
        </tr>

        <tr>
          <td align="left" bgcolor="#FFFFFF">
            <div style="border-style:solid; border-width:1px; border-color:#CCCCCC;">
              <table width="578" cellspacing="0" cellpadding="0" border="0" align="center">
                <tr>
                  <td height="22" colspan="3"></td>
                </tr>

                <tr>
                  <td width="40"></td>
                  <td width="498">
                    <div style="
  font-family:arial,Arial,sans-serif;
">  
                                <table cellspacing="0" cellpadding="0" border="0">
    <tr>
        <td bgcolor="#FFFFFF" align="left" width="50">
          <img src="https://lh4.googleusercontent.com/-0raL6fsqLd8/AAAAAAAAAAI/AAAAAAAAAAA/-kee1yVLSUM/s28-c-k/photo.jpg" height="50" width="50">
        </td>
        <td width="16"></td>

      <td>
        <div style="
  font-family:arial,Arial,sans-serif; font-size:18px; color:#333333; line-height:24px;
" height:"59" dir="ltr">
          

<a href="http://www.youtube.com/user/iaralice?feature=em-comment_received" style="text-decoration:none; color:#1C62B9;">Iara Alice Raymundo</a> fez um comentário sobre <a href="http://www.youtube.com/watch?v=nSikPg7XXgk&lc=d7sgBrAnVdmfpXDPp640qKEngjBPWwpuXfEPY5omtUs&lch=email&feature=em-comment_received" style="text-decoration:none; color:#1C62B9;" dir="ltr">Ração MAIS CARA para cães e gatos</a>

        </div>
      </td>
    </tr>
  </table>


      <table cellspacing="0" cellpadding="0" border="0">
    <tr>
      <td width="498">
        <div style="font-family:arial,Arial,sans-serif; font-size:13px; color:#333333; line-height:16px;" dir="ltr">
          
  <div style="
  font-family:arial,Arial,sans-serif; font-size:11px; color:#999999; line-height:14px;
">  
Para responder a este comentário, <a href="http://www.youtube.com/watch?v=nSikPg7XXgk&lcor=1&lc=d7sgBrAnVdmfpXDPp640qKEngjBPWwpuXfEPY5omtUs&lch=email&feature=em
Comment 1 Dâniel Fraga 2012-10-25 08:45:59 PDT
This also happens with WebKitGTK+ versions 1.8.3 and 1.8.1.
Comment 2 Martin Robinson 2012-10-30 11:37:27 PDT
I saved the email you included in your comment and cannot reproduce this issue with GtkLauncher or Epiphany. What version of WebKitGTK+ are you using?
Comment 3 Martin Robinson 2012-10-30 11:37:45 PDT
(In reply to comment #2)
> I saved the email you included in your comment and cannot reproduce this issue with GtkLauncher or Epiphany. What version of WebKitGTK+ are you using?

Ah, sorry. I see that you already included that information.
Comment 4 Martin Robinson 2012-10-30 11:44:08 PDT
(In reply to comment #3)
> (In reply to comment #2)
> > I saved the email you included in your comment and cannot reproduce this issue with GtkLauncher or Epiphany. What version of WebKitGTK+ are you using?
> 
> Ah, sorry. I see that you already included that information.

Fancy seems to be trying to override the default encoding of the email:

g_object_set(viewer->settings, "default-encoding", charset, NULL);

from fancy_viewer.c line 141. It doesn't seem like that should cause an issue, but I wonder in your case what encoding is being used here. If I knew, I could test locally.

Right before that line is a line like this:

debug_print("using %s charset\n", charset);

Maybe you can try to get that output.
Comment 5 Dâniel Fraga 2012-10-30 12:50:49 PDT
> Fancy seems to be trying to override the default encoding of the email:
> 
> g_object_set(viewer->settings, "default-encoding", charset, NULL);
> 
> from fancy_viewer.c line 141. It doesn't seem like that should cause an issue, but I wonder in your case what encoding is being used here. If I knew, I could test locally.
> 
> Right before that line is a line like this:
> 
> debug_print("using %s charset\n", charset);
> 
> Maybe you can try to get that output.

Hi Martin! Thanks for the reply. Fancy returns the following:

fancy_viewer.c:141:using windows-1252 charset

So maybe this is the problem, right? Is it Fancy's fault?
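
For reference, a minimal Python sketch of what a windows-1252 fallback would do to UTF-8 text. The sample word comes from the e-mail title above, but the bytes are generated here rather than read from the actual message, so this is only an illustration under that assumption.

```python
# Hypothetical sample: a word from the e-mail title, encoded here as UTF-8.
text = "Ração"
utf8_bytes = text.encode("utf-8")

# Misread the UTF-8 bytes as windows-1252 (Python codec name: cp1252),
# which is what a wrong fallback charset would do.
as_cp1252 = utf8_bytes.decode("cp1252")

print(as_cp1252)
```

UTF-8 bytes misread as windows-1252 usually turn each accented letter into a two-character sequence, not into a question mark, which may hint that the fallback charset alone does not explain the symptom.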
Comment 6 Martin Robinson 2012-10-30 16:59:33 PDT
I can't seem to reproduce the problem just by setting the default-encoding alone. Are you sure that the data that Fancy is interpreting has the content-type meta tag? Is Claws or Fancy stripping it or not inserting it in some cases? Can you try opening this file in Epiphany or Midori on your computer?
Comment 7 Dâniel Fraga 2012-10-30 17:45:12 PDT
(In reply to comment #6)
> I can't seem to reproduce the problem just by setting the default-encoding alone. Are you sure that the data that Fancy is interpreting has the content-type meta tag? Is Claws or Fancy stripping it or not inserting it in some cases? Can you try opening this file in Epiphany or Midori on your computer?

Hi Martin, I'm waiting for the Fancy developer's answer. I don't use Epiphany, Midori, or GNOME. Do I need GNOME to install Epiphany? That would be overkill.

But I'm attaching the complete e-mail from YouTube. If you can't reproduce it there, then it's surely a Fancy bug.

The attachment is youtube-message-utf8.txt
Comment 8 Dâniel Fraga 2012-10-30 17:45:43 PDT
Created attachment 171559
Sample youtube e-mail
Comment 9 Martin Robinson 2012-10-30 18:12:27 PDT
(In reply to comment #8)
> Created an attachment (id=171559)
> Sample youtube e-mail

Epiphany has some Gnome dependencies, but Midori has very few.
Comment 10 Dâniel Fraga 2012-10-30 18:57:54 PDT
Created attachment 171565
Midori output 

OK, I installed Midori and here's the result: same problem.

So it's a problem with WebKit, right?
Comment 11 Martin Robinson 2012-10-31 15:09:38 PDT
(In reply to comment #10)
> Created an attachment (id=171565)
> Midori output 
> 
> Ok, I installed Midori and here's the result: same problem.
> 
> So it's a problem with webkit right?

Perhaps it's something to do with your environment settings?
Comment 12 Dâniel Fraga 2012-10-31 15:19:03 PDT
(In reply to comment #11)

> Perhaps it's something to do with your environment settings?

Hmm maybe, but what settings? I have, for example:

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

***

Is there any other environment variable I should check?
Comment 13 Dâniel Fraga 2012-11-03 00:55:50 PDT
(In reply to comment #11)
> Perhaps it's something to do with your environment settings?

Martin, are you the maintainer of webkit gtk?

Is there any other place I can ask about this?

This seems very difficult to solve :(
Comment 14 Dâniel Fraga 2012-11-03 15:55:14 PDT
Created attachment 172232
Sample youtube notification

Hi, I think I found the problem (please see the attachment):

    <meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3Dutf-8=
">

The YouTube notification e-mail breaks this line in the middle of the tag, so WebKitGTK+
gets confused by it. If I change it to:

    <meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3Dutf-8=">

it works. So is it a bug?
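
The two-line tag above looks like quoted-printable transfer encoding, where "=3D" encodes "=" and a "=" at the very end of a line is a soft line break that a decoder removes. A minimal Python sketch using the standard quopri module, assuming the attachment is the raw, still-encoded message body (the thread does not confirm this):

```python
import quopri

# The meta tag as it appears in the attachment, including the trailing "="
# that quoted-printable uses to mark a soft line break.
raw = (
    b'<meta http-equiv=3D"Content-Type" '
    b'content=3D"text/html; charset=3Dutf-8=\n'
    b'">\n'
)

# Decoding replaces "=3D" with "=" and joins the soft-broken line.
decoded = quopri.decodestring(raw)

print(decoded.decode("ascii"))
```

If that assumption holds, the line break is part of the transfer encoding and disappears once the mail client decodes the body. Quoted-printable limits encoded lines to 76 characters, which would also account for where the break falls.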
Comment 15 Martin Robinson 2012-11-05 08:51:17 PST
(In reply to comment #14)
> Created an attachment (id=172232)
> Sample youtube notification
> 
> Hi, I think I found the problem (please see the attachment):
> 
>     <meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3Dutf-8=
> ">
> 
> The youtube notification sends the e-mail with a broken line, so webkit gtk 
> gets confused about this tag. If I change it to:
> 
>     <meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3Dutf-8=">
> 
> it works. So is it a bug?

Great find. I'm not sure whether this should be interpreted correctly with the newline in place. You'd probably need to check the HTML5 spec to see what should happen in this case. Most likely, though, Claws is adding the newline and corrupting the HTML.
Comment 16 Martin Robinson 2012-12-11 02:26:08 PST
I find it very suspicious that the line length is around 80 characters. I'm going to close this bug since it looks like a bug in Claws.
Comment 17 Dâniel Fraga 2012-12-11 08:08:48 PST
(In reply to comment #16)
> I find it very suspicious that the line length is around 80 characters. I'm going to close this bug since it looks like a bug in Claws.

But it happens in Midori too :(

The Claws developers wrote that it is a bug in YouTube... I'll try to contact YouTube again (so difficult).

Thank you!
Comment 18 Martin Robinson 2012-12-11 08:37:07 PST
Did you save the output of the email and then load it in Midori? Are you sure that claws isn't line-wrapping the content before you load it in Midori?
Comment 19 Dâniel Fraga 2012-12-11 08:48:31 PST
(In reply to comment #18)
> Did you save the output of the email and then load it in Midori? Are you sure that claws isn't line-wrapping the content before you load it in Midori?

Hi Martin. Yes, I'm sure. You can confirm it here:

http://www.thewildbeast.co.uk/claws-mail/bugzilla/show_bug.cgi?id=2768

The developer wrote that it is a YouTube bug... maybe? Here is what he wrote:

**********************************

If I change the head line to:

<meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3Diso-8859-1=
">

It renders perfectly, as the HTML is in fact ISO, not UTF-8.

Sorry, but I'm afraid this is a Youtube generator bug.

*********************************

So I'm afraid it is a YouTube bug. Anyway, thanks, let's hope YouTube can fix it.
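
The Claws developer's diagnosis can be illustrated with a minimal Python sketch. It assumes, as he reports, that the body bytes are ISO-8859-1 while the meta tag declares UTF-8. The sample word is taken from the e-mail text, not from the actual bytes of the message.

```python
# Hypothetical sample: a word from the e-mail, encoded here as ISO-8859-1.
text = "comentário"
latin1_bytes = text.encode("iso-8859-1")

# Honor the declared charset (UTF-8): the byte 0xE1 followed by an ASCII
# letter is not a valid UTF-8 sequence, so it becomes U+FFFD.
wrong = latin1_bytes.decode("utf-8", errors="replace")

# Decode with the charset the bytes are actually in.
right = latin1_bytes.decode("iso-8859-1")

print(wrong)
print(right)
```

U+FFFD, the replacement character, is usually drawn as a question mark, which would match the symptom in the screenshots if the assumption about the bytes is correct.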