Bug 50843

Summary: gtk-ews having trouble with non-ascii characters
Product: WebKit
Component: Tools / Tests
Status: NEW
Severity: Normal
Priority: P2
Version: 528+ (Nightly build)
Hardware: All
OS: All
Reporter: Adam Barth <abarth>
Assignee: Nobody <webkit-unassigned>
CC: eric, leandro, mrobinson
Bug Depends on:
Bug Blocks: 63452

Description Adam Barth 2010-12-10 13:57:47 PST
../../JavaScriptCore/wtf/TCPageMap.h: In function ‘size_t WTF::fastMallocSize(const void*)’:
Traceback (most recent call last):
  File "/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/tool/bot/queueengine.py", line 108, in run
    if not self._delegate.process_work_item(work_item):
  File "/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/tool/commands/queues.py", line 362, in process_work_item
    if not self.review_patch(patch):
  File "/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/tool/commands/earlywarningsystem.py", line 92, in review_patch
    if not self._can_build():
  File "/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/tool/commands/earlywarningsystem.py", line 53, in _can_build
    "--no-update"])
  File "/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/tool/commands/queues.py", line 96, in run_webkit_patch
    return self._tool.executive.run_and_throw_if_fail(webkit_patch_args)
  File "/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/common/system/executive.py", line 141, in run_and_throw_if_fail
    exit_code = self._run_command_with_teed_output(args, child_stdout)
  File "/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/common/system/executive.py", line 126, in _run_command_with_teed_output
    teed_output.write(output_line)
  File "/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/common/system/deprecated_logging.py", line 55, in write
    file.write(bytes)
  File "/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/common/system/deprecated_logging.py", line 55, in write
    file.write(bytes)
  File "/usr/lib/python2.6/codecs.py", line 691, in write
    return self.writer.write(data)
  File "/usr/lib/python2.6/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 50: ordinal not in range(128)
Comment 1 Eric Seidel (no email) 2010-12-10 14:05:28 PST
I'm not sure how to solve this.  I remember explicitly moving tee() to operate on bytes instead of unicode strings long ago.  This seems to suggest that the logging module is using a codecs.open'd log file and trying to decode the byte stream we're sending to it.
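A minimal sketch of the suspected failure mode, written for Python 3 as an assumption (the traceback comes from Python 2.6). In Python 2, handing a byte string to a codecs stream writer made the encoder first decode the bytes with the default 'ascii' codec. Python 3 does not do this implicitly, so the sketch performs that decode step explicitly. The helper name implicit_decode_like_python2 is hypothetical and is not part of webkitpy.

```python
# Hypothetical helper: mimics the implicit str -> unicode step that a
# Python 2 codecs writer performed when it was handed raw bytes.
def implicit_decode_like_python2(data, default_encoding="ascii"):
    return data.decode(default_encoding)


# UTF-8 bytes containing typographic quotes, similar to the gcc output
# at the top of this report.
line = "In function \u2018size_t\u2019".encode("utf-8")

try:
    implicit_decode_like_python2(line)
    error = None
except UnicodeDecodeError as exc:
    # Same exception class and codec as in the traceback above.
    error = exc

print(error)
```

If this reading is right, the bytes passed to tee() are fine and the problem only appears once they reach a stream that expects text.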
Comment 2 Eric Seidel (no email) 2010-12-10 14:07:39 PST
Maybe it doesn't make sense to write bytes to std out?
Comment 3 Leandro Pereira 2010-12-14 09:49:44 PST
(In reply to comment #0)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 50: 
> ordinal not in range(128)

BTW, having the same problem with EFL-EWS.

(In reply to comment #2)
> Maybe it doesn't make sense to write bytes to std out?

Maybe only saving to a log file and printing the file name would help? Less noise on EWS output, and debuggable whenever needed.
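A rough sketch of that idea, assuming Python 3 and using only the standard library. The function name save_build_output and the file prefix are hypothetical. The file is opened in binary mode, so the build output is stored as raw bytes and no codec is involved when writing.

```python
import os
import tempfile


def save_build_output(chunks, prefix="ews-build-"):
    """Write raw output chunks to a temporary log file and return its path."""
    # mkstemp returns an open file descriptor plus the path of the new file.
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=".log")
    # Binary mode: bytes are written untouched, non-ASCII included.
    with os.fdopen(fd, "wb") as log:
        for chunk in chunks:
            log.write(chunk)
    return path


sample = [b"In function \xe2\x80\x98size_t\xe2\x80\x99:\n"]
log_path = save_build_output(sample)
print("Build output saved to", log_path)
```

Only the ASCII file name reaches the console in this variant, so the console encoding would no longer matter for the build output itself.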
Comment 4 Eric Seidel (no email) 2010-12-14 11:53:45 PST
We haven't yet been able to produce a minimal reduction.

However we used:
setenv LANG en_US.US-ASCII
to work around the issue on the gtk-ews for the moment.
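For reference, the same workaround could be applied per child process instead of in the bot's shell. This is only a sketch, assuming Python 3; ascii_locale_env is a hypothetical helper, and the child command is a stand-in for the real build command. Note that LC_ALL or LC_MESSAGES, if set, take precedence over LANG.

```python
import os
import subprocess
import sys


def ascii_locale_env(base=None):
    """Return a copy of the environment with LANG forced to US-ASCII."""
    env = dict(os.environ if base is None else base)
    env["LANG"] = "en_US.US-ASCII"  # value taken from the workaround above
    return env


# Stand-in child process; a real bot would launch the build command here.
child = [sys.executable, "-c", "import os; print(os.environ['LANG'])"]
output = subprocess.check_output(child, env=ascii_locale_env())
reported = output.decode("ascii").strip()
print(reported)
```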
Comment 5 Eric Seidel (no email) 2010-12-14 11:54:07 PST
gcc seems to like to print fancy quotes in recent versions
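A quick check of that theory, assuming Python 3: the left single quotation mark U+2018 encodes in UTF-8 to three bytes starting with 0xe2, and in the gcc line quoted in the description the first non-ASCII byte sits at offset 50, which appears to match the position reported in the traceback.

```python
# The gcc diagnostic quoted at the top of this report.
line = (
    "../../JavaScriptCore/wtf/TCPageMap.h: In function "
    "\u2018size_t WTF::fastMallocSize(const void*)\u2019:"
)

encoded = line.encode("utf-8")

# UTF-8 encoding of the opening quote character on its own.
quote_bytes = "\u2018".encode("utf-8")

# Offset of the first byte outside the 7-bit ASCII range.
first_non_ascii = next(i for i, b in enumerate(encoded) if b > 0x7F)

print(quote_bytes, first_non_ascii)
```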
Comment 6 Leandro Pereira 2010-12-14 12:24:07 PST
(In reply to comment #4)
> However we used:
> setenv LANG en_US.US-ASCII
> to work around the issue on the gtk-ews for the moment.

Using the same workaround on EFL-EWS. Seems it's working.
Comment 7 Eric Seidel (no email) 2011-06-20 17:12:34 PDT
I'm struggling to reproduce this with a minimal example.  I'm not sure how we're hitting this.

I could see we might hit a decoding error with run_and_throw_if_fail(cmd, silent=True), because /dev/null is opened w/o any encoding.

But I don't see how we hit this case.  What stream are we opening with encoding of ascii?  The logging stream?  Maybe the python on that system doesn't correctly default to utf8?

What's the lang value before we override it to US-ASCII?
Comment 8 Martin Robinson 2011-06-20 20:18:31 PDT
(In reply to comment #7)
> But I don't see how we hit this case.  What stream are we opening with encoding of ascii?  The logging stream?  Maybe the python on that system doesn't correctly default to utf8?

Looks like you can figure out the default encoding in Python by running this in the REPL:

import sys
sys.getdefaultencoding()
Comment 9 Eric Seidel (no email) 2011-06-20 22:47:52 PDT
On both my Mac and on Linux, sys.getdefaultencoding() returns 'ascii'.
It does seem like that must be the encoding we're hitting. The question is: what file is opened with the default encoding?  I assume it must be stderr/stdout.  But why?
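To narrow down which stream is involved, it may help to print every encoding Python reports on the affected bot. The sketch below assumes Python 3, where sys.getdefaultencoding() returns 'utf-8' rather than the 'ascii' seen here; describe_encodings is a hypothetical helper name.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python reports for this process."""
    return {
        "default": sys.getdefaultencoding(),
        # Replaced or redirected streams may lack an encoding attribute.
        "stdout": getattr(sys.stdout, "encoding", None),
        "stderr": getattr(sys.stderr, "encoding", None),
        # False: query the locale without changing it.
        "locale": locale.getpreferredencoding(False),
    }


for name, value in sorted(describe_encodings().items()):
    print(name, value)
```

Running this once with the original LANG value and once with the US-ASCII override would show whether the locale changes what stdout and stderr report.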
Comment 10 Eric Seidel (no email) 2011-06-27 10:16:45 PDT
If I were able to reproduce this, it would be easy to fix.  But I failed to make a reduced Python script that triggers it on our EC2 bots.  I could probably just edit a WebKit file and wait for a whole build to fail, but I was too lazy to try that.