<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>50843</bug_id>
          
          <creation_ts>2010-12-10 13:57:47 -0800</creation_ts>
          <short_desc>gtk-ews having trouble with non-ascii characters</short_desc>
          <delta_ts>2011-06-27 10:16:45 -0700</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>Tools / Tests</component>
          <version>528+ (Nightly build)</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>NEW</bug_status>
          <resolution></resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          <blocked>63452</blocked>
          <everconfirmed>1</everconfirmed>
          <reporter name="Adam Barth">abarth</reporter>
          <assigned_to name="Nobody">webkit-unassigned</assigned_to>
          <cc>eric</cc>
    
    <cc>leandro</cc>
    
    <cc>mrobinson</cc>
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>320392</commentid>
    <comment_count>0</comment_count>
    <who name="Adam Barth">abarth</who>
    <bug_when>2010-12-10 13:57:47 -0800</bug_when>
    <thetext>../../JavaScriptCore/wtf/TCPageMap.h: In function ‘size_t WTF::fastMallocSize(const void*)’:
Traceback (most recent call last):
  File &quot;/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/tool/bot/queueengine.py&quot;, line 108, in run
    if not self._delegate.process_work_item(work_item):
  File &quot;/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/tool/commands/queues.py&quot;, line 362, in process_work_item
    if not self.review_patch(patch):
  File &quot;/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/tool/commands/earlywarningsystem.py&quot;, line 92, in review_patch
    if not self._can_build():
  File &quot;/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/tool/commands/earlywarningsystem.py&quot;, line 53, in _can_build
    &quot;--no-update&quot;])
  File &quot;/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/tool/commands/queues.py&quot;, line 96, in run_webkit_patch
    return self._tool.executive.run_and_throw_if_fail(webkit_patch_args)
  File &quot;/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/common/system/executive.py&quot;, line 141, in run_and_throw_if_fail
    exit_code = self._run_command_with_teed_output(args, child_stdout)
  File &quot;/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/common/system/executive.py&quot;, line 126, in _run_command_with_teed_output
    teed_output.write(output_line)
  File &quot;/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/common/system/deprecated_logging.py&quot;, line 55, in write
    file.write(bytes)
  File &quot;/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/common/system/deprecated_logging.py&quot;, line 55, in write
    file.write(bytes)
  File &quot;/usr/lib/python2.6/codecs.py&quot;, line 691, in write
    return self.writer.write(data)
  File &quot;/usr/lib/python2.6/codecs.py&quot;, line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: &apos;ascii&apos; codec can&apos;t decode byte 0xe2 in position 50: ordinal not in range(128)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>320400</commentid>
    <comment_count>1</comment_count>
    <who name="Eric Seidel (no email)">eric</who>
    <bug_when>2010-12-10 14:05:28 -0800</bug_when>
    <thetext>I&apos;m not sure how to solve this.  I remember explicitly moving tee() to operate on bytes instead of unicode strings long ago.  This seems to suggest that the logging module is using a codecs.open&apos;d log file and trying to decode the byte stream we&apos;re sending to it.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>320402</commentid>
    <comment_count>2</comment_count>
    <who name="Eric Seidel (no email)">eric</who>
    <bug_when>2010-12-10 14:07:39 -0800</bug_when>
    <thetext>Maybe it doesn&apos;t make sense to write bytes to std out?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>322018</commentid>
    <comment_count>3</comment_count>
    <who name="Leandro Pereira">leandro</who>
    <bug_when>2010-12-14 09:49:44 -0800</bug_when>
    <thetext>(In reply to comment #0)
&gt; UnicodeDecodeError: &apos;ascii&apos; codec can&apos;t decode byte 0xe2 in position 50: 
&gt; ordinal not in range(128)

BTW, having the same problem with EFL-EWS.

(In reply to comment #2)
&gt; Maybe it doesn&apos;t make sense to write bytes to std out?

Maybe only saving to a log file and printing the file name would help? Less noise on EWS output, and debuggable whenever needed.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>322085</commentid>
    <comment_count>4</comment_count>
    <who name="Eric Seidel (no email)">eric</who>
    <bug_when>2010-12-14 11:53:45 -0800</bug_when>
    <thetext>We haven&apos;t yet been able to produce a minimal reduction.

However we used:
setenv LANG en_US.US-ASCII
to work around the issue on the gtk-ews for the moment.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>322086</commentid>
    <comment_count>5</comment_count>
    <who name="Eric Seidel (no email)">eric</who>
    <bug_when>2010-12-14 11:54:07 -0800</bug_when>
    <thetext>gcc seems to like to print fancy quotes in recent versions</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>322114</commentid>
    <comment_count>6</comment_count>
    <who name="Leandro Pereira">leandro</who>
    <bug_when>2010-12-14 12:24:07 -0800</bug_when>
    <thetext>(In reply to comment #4)
&gt; However we used:
&gt; setenv LANG en_US.US-ASCII
&gt; to work around the issue on the gtk-ews for the moment.

Using the same workaround on EFL-EWS. Seems it&apos;s working.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>424104</commentid>
    <comment_count>7</comment_count>
    <who name="Eric Seidel (no email)">eric</who>
    <bug_when>2011-06-20 17:12:34 -0700</bug_when>
    <thetext>I&apos;m struggling to reproduce this with a minimal example.  I&apos;m not sure how we&apos;re hitting this.

I could see we might hit a decoding error with run_and_throw_if_fail(cmd, silent=True), because /dev/null is opened w/o any encoding.

But I don&apos;t see how we hit this case.  What stream are we opening with encoding of ascii?  The logging stream?  Maybe the python on that system doesn&apos;t correctly default to utf8?
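A minimal Python 3 sketch of the suspected failure. It assumes that the gcc diagnostic quoted at the top of comment #0 is the line being teed. The traceback frames in codecs.py (line 691, then line 351) show a codecs.open()'d file: its write() delegates to StreamWriter.write(), which passes the data to the codec's encode function. In Python 2, when that data is a byte str, it is first decoded implicitly with sys.getdefaultencoding() ('ascii'). That is why a UnicodeDecodeError comes out of encode(). Byte 0xe2 is the first UTF-8 byte of U+2018, the curly quote gcc prints. In that gcc line it sits at offset 50, which matches "position 50" in the error. The helper names below are hypothetical illustrations, not webkitpy functions.

```python
# Assumption: the gcc diagnostic from comment #0, with the curly quotes
# (U+2018 / U+2019) that gcc prints in a UTF-8 locale.
GCC_LINE = u"../../JavaScriptCore/wtf/TCPageMap.h: In function \u2018size_t WTF::fastMallocSize(const void*)\u2019:\n"

# The child process hands the tee raw bytes, UTF-8 encoded.
output_line = GCC_LINE.encode("utf-8")


def implicit_ascii_decode(data):
    """Hypothetical helper: do explicitly what Python 2 does implicitly when a
    byte str reaches a codecs writer (decode with the 'ascii' default)."""
    return data.decode("ascii")


def safe_decode(data):
    """Hypothetical fix, not what webkitpy does: decode the child's output as
    UTF-8 before handing it to a text writer."""
    return data.decode("utf-8", "replace")


try:
    implicit_ascii_decode(output_line)
except UnicodeDecodeError as error:
    # Prints: 'ascii' codec can't decode byte 0xe2 in position 50: ordinal not in range(128)
    print(error)

print(safe_decode(output_line) == GCC_LINE)  # True
```

Two changes avoid the implicit ASCII step: decoding with an explicit "utf-8", or writing raw bytes only to byte streams. The LANG=en_US.US-ASCII workaround from comment #4 presumably works differently, by making gcc print plain ASCII quotes in the first place.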

What&apos;s the lang value before we override it to US-ASCII?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>424217</commentid>
    <comment_count>8</comment_count>
    <who name="Martin Robinson">mrobinson</who>
    <bug_when>2011-06-20 20:18:31 -0700</bug_when>
    <thetext>(In reply to comment #7)
&gt; But I don&apos;t see how we hit this case.  What stream are we opening with encoding of ascii?  The logging stream?  Maybe the python on that system doesn&apos;t correctly default to utf8?

Looks like you can figure out the default encoding in Python by running this in the REPL:

import sys
sys.getdefaultencoding()</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>424271</commentid>
    <comment_count>9</comment_count>
    <who name="Eric Seidel (no email)">eric</who>
    <bug_when>2011-06-20 22:47:52 -0700</bug_when>
    <thetext>On both my mac and on linux, sys.getdefaultencoding() returns &apos;ascii&apos;.
It does seem like that must be the encoding we&apos;re hitting. The question is which file is opened with the default encoding. I assume it must be stderr/stdout.  But why?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>428045</commentid>
    <comment_count>10</comment_count>
    <who name="Eric Seidel (no email)">eric</who>
    <bug_when>2011-06-27 10:16:45 -0700</bug_when>
    <thetext>If I were able to reproduce this, this would be easy to fix.  But I failed to make a reduced Python script on our EC2 bots.  I could probably just edit a webkit file and wait for a whole build to fail, but I was too lazy to try that.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>