Bug 50843

Summary:	gtk-ews having trouble with non-ascii characters
Product:	WebKit	Reporter:	Adam Barth <abarth>
Component:	Tools / Tests	Assignee:	Nobody <webkit-unassigned>
Status:	NEW
Severity:	Normal	CC:	eric, leandro, mrobinson
Priority:	P2
Version:	528+ (Nightly build)
Hardware:	All
OS:	All
Bug Depends on:
Bug Blocks:	63452

Adam Barth

Reported 2010-12-10 13:57:47 PST

../../JavaScriptCore/wtf/TCPageMap.h: In function ‘size_t WTF::fastMallocSize(const void*)’:
Traceback (most recent call last):
  File "/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/tool/bot/queueengine.py", line 108, in run
    if not self._delegate.process_work_item(work_item):
  File "/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/tool/commands/queues.py", line 362, in process_work_item
    if not self.review_patch(patch):
  File "/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/tool/commands/earlywarningsystem.py", line 92, in review_patch
    if not self._can_build():
  File "/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/tool/commands/earlywarningsystem.py", line 53, in _can_build
    "--no-update"])
  File "/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/tool/commands/queues.py", line 96, in run_webkit_patch
    return self._tool.executive.run_and_throw_if_fail(webkit_patch_args)
  File "/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/common/system/executive.py", line 141, in run_and_throw_if_fail
    exit_code = self._run_command_with_teed_output(args, child_stdout)
  File "/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/common/system/executive.py", line 126, in _run_command_with_teed_output
    teed_output.write(output_line)
  File "/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/common/system/deprecated_logging.py", line 55, in write
    file.write(bytes)
  File "/mnt/git/webkit-gtk-ews/WebKitTools/Scripts/webkitpy/common/system/deprecated_logging.py", line 55, in write
    file.write(bytes)
  File "/usr/lib/python2.6/codecs.py", line 691, in write
    return self.writer.write(data)
  File "/usr/lib/python2.6/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 50: ordinal not in range(128)

Eric Seidel (no email)

Comment 1 2010-12-10 14:05:28 PST

I'm not sure how to solve this. I remember explicitly moving tee() to operate on bytes instead of unicode strings long ago. This seems to suggest that the logging module is using a codecs.open'd log file and trying to decode the byte stream we're sending to it.
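One possible way around the implicit decode described above is to decode the byte stream explicitly before handing it to a codecs-wrapped file. This is a minimal sketch written for Python 3, not the actual webkitpy code. The helper name `write_bytes_safely` is hypothetical, and it assumes the tool's output is UTF-8.

```python
import codecs
import io


def write_bytes_safely(writer, raw_line, encoding="utf-8"):
    """Decode raw bytes explicitly, then write text to a codecs writer.

    Hypothetical helper: undecodable bytes are replaced with U+FFFD
    instead of raising, so a stray byte cannot abort the caller.
    """
    text = raw_line.decode(encoding, errors="replace")
    writer.write(text)


# An in-memory stand-in for a log file opened through the codecs module.
sink = io.BytesIO()
log = codecs.getwriter("utf-8")(sink)

# A line containing the curly quotes that gcc printed in the report.
raw = "In function \u2018size_t\u2019\n".encode("utf-8")
write_bytes_safely(log, raw)
```

The design choice here is to make the decoding step explicit at the boundary, so the writer only ever sees text and never has to guess an encoding.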

Eric Seidel (no email)

Comment 2 2010-12-10 14:07:39 PST

Maybe it doesn't make sense to write bytes to std out?

Leandro Pereira

Comment 3 2010-12-14 09:49:44 PST

(In reply to comment #0)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 50:
> ordinal not in range(128)

BTW, having the same problem with EFL-EWS.

(In reply to comment #2)
> Maybe it doesn't make sense to write bytes to std out?

Maybe only saving to a log file and printing the file name would help? Less noise on EWS output, and debuggable whenever needed.

Eric Seidel (no email)

Comment 4 2010-12-14 11:53:45 PST

We haven't yet been able to produce a minimal reduction. However we used:
setenv LANG en_US.US-ASCII
to work around the issue on the gtk-ews for the moment.
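The same workaround can be applied per child process rather than to the whole shell. The following is only a sketch under assumptions: `run_with_ascii_locale` is a hypothetical helper, not part of webkitpy, and removing `LC_ALL` and `LC_MESSAGES` is an added precaution because those variables take precedence over `LANG` when they are set.

```python
import os
import subprocess
import sys


def run_with_ascii_locale(args):
    """Run a command with LANG forced to the ASCII locale from the workaround.

    Hypothetical helper. Returns the child's combined stdout/stderr as bytes.
    """
    env = dict(os.environ)
    env["LANG"] = "en_US.US-ASCII"  # value taken from the workaround above
    # Assumption: drop overrides that would otherwise win over LANG.
    env.pop("LC_ALL", None)
    env.pop("LC_MESSAGES", None)
    result = subprocess.run(
        args,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        check=False,
    )
    return result.stdout


# Demonstration: the child simply reports the LANG value it received.
output = run_with_ascii_locale(
    [sys.executable, "-c", "import os; print(os.environ['LANG'])"]
)
```

This keeps the parent environment untouched, so only the build command sees the ASCII locale.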

Eric Seidel (no email)

Comment 5 2010-12-14 11:54:07 PST

gcc seems to like to print fancy quotes in recent versions
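The fancy quotes line up with the error message. A small check, written for Python 3 and assuming gcc emitted its diagnostics as UTF-8, shows that the opening quote in the first line of the report sits at byte offset 50 and starts with 0xe2, exactly what the traceback complains about:

```python
# First line of the reported output, with the curly quotes gcc printed.
line = (
    "../../JavaScriptCore/wtf/TCPageMap.h: In function "
    "\u2018size_t WTF::fastMallocSize(const void*)\u2019:"
)

# Assumption: the compiler output was UTF-8 encoded.
raw = line.encode("utf-8")

# Decoding as ASCII is what Python 2 does implicitly when a byte string
# is passed to a codecs writer; here the step is made explicit.
try:
    raw.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc

if error is not None:
    print(error.start, hex(raw[error.start]))
```

U+2018 encodes to the three bytes e2 80 98 in UTF-8, so the first of them is the byte the ASCII codec rejects.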

Leandro Pereira

Comment 6 2010-12-14 12:24:07 PST

(In reply to comment #4)
> However we used:
> setenv LANG en_US.US-ASCII
> to work around the issue on the gtk-ews for the moment.

Using the same workaround on EFL-EWS. Seems it's working.

Eric Seidel (no email)

Comment 7 2011-06-20 17:12:34 PDT

I'm struggling to reproduce this with a minimal example. I'm not sure how we're hitting this. I could see we might hit a decoding error with run_and_throw_if_fail(cmd, silent=True), because /dev/null is opened w/o any encoding. But I don't see how we hit this case. What stream are we opening with encoding of ascii? The logging stream? Maybe the python on that system doesn't correctly default to utf8? What's the lang value before we override it to US-ASCII?

Martin Robinson

Comment 8 2011-06-20 20:18:31 PDT

(In reply to comment #7)
> But I don't see how we hit this case. What stream are we opening with encoding of ascii? The logging stream? Maybe the python on that system doesn't correctly default to utf8?

Looks like you can figure out the default encoding in Python by running this in the REPL:

import sys
sys.getdefaultencoding()
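Since the default encoding is only one of several encodings in play, it may help to print the others alongside it when debugging a bot. This is a sketch, with `describe_encodings` as a hypothetical helper name. Note that `sys.getdefaultencoding()` returns 'utf-8' on Python 3, unlike the Python 2.6 interpreter in the traceback.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings that can affect how output is written.

    Hypothetical helper for debugging. The stdout entry may be None
    when stdout has been replaced by an object without an encoding.
    """
    return {
        "default": sys.getdefaultencoding(),
        "stdout": getattr(sys.stdout, "encoding", None),
        "locale": locale.getpreferredencoding(False),
    }


info = describe_encodings()
for name, value in sorted(info.items()):
    print(name, value)
```

Comparing these values before and after overriding LANG would show which one the locale setting actually changes.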

Eric Seidel (no email)

Comment 9 2011-06-20 22:47:52 PDT

On both my mac and on linux, sys.getdefaultencoding() returns 'ascii'. It does seem like that must be the encoding we're hitting. The question is what file is opened with default encoding? I assume it must be stderr/stdout. But why?

Eric Seidel (no email)

Comment 10 2011-06-27 10:16:45 PDT

If I were able to reproduce this, it would be easy to fix. But I failed to make a reduced python script on our EC2 bots. I could probably just edit a webkit file and wait for a whole build to fail, but I was too lazy to try that.
