Summary: | gtk-ews having trouble with non-ascii characters | ||
---|---|---|---|
Product: | WebKit | Reporter: | Adam Barth <abarth> |
Component: | Tools / Tests | Assignee: | Nobody <webkit-unassigned> |
Status: | NEW --- | ||
Severity: | Normal | CC: | eric, leandro, mrobinson |
Priority: | P2 | ||
Version: | 528+ (Nightly build) | ||
Hardware: | All | ||
OS: | All | ||
Bug Depends on: | |||
Bug Blocks: | 63452 |
Description
Adam Barth
2010-12-10 13:57:47 PST
I'm not sure how to solve this. I remember explicitly moving tee() to operate on bytes instead of unicode strings long ago. This seems to suggest that the logging module is using a codecs.open'd log file and trying to decode the byte stream we're sending to it. Maybe it doesn't make sense to write bytes to std out?

(In reply to comment #0)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 50:
> ordinal not in range(128)

BTW, having the same problem with EFL-EWS.

(In reply to comment #2)
> Maybe it doesn't make sense to write bytes to std out?

Maybe only saving to a log file and printing the file name would help? Less noise on EWS output, and debuggable whenever needed.

We haven't yet been able to produce a minimal reduction. However we used:

    setenv LANG en_US.US-ASCII

to work around the issue on the gtk-ews for the moment.

gcc seems to like to print fancy quotes in recent versions

(In reply to comment #4)
> However we used:
> setenv LANG en_US.US-ASCII
> to work around the issue on the gtk-ews for the moment.

Using the same workaround on EFL-EWS. Seems it's working.

I'm struggling to reproduce this with a minimal example. I'm not sure how we're hitting this. I could see we might hit a decoding error with run_and_throw_if_fail(cmd, silent=True), because /dev/null is opened w/o any encoding. But I don't see how we hit this case. What stream are we opening with encoding of ascii? The logging stream? Maybe the python on that system doesn't correctly default to utf8? What's the lang value before we override it to US-ASCII?

(In reply to comment #7)
> But I don't see how we hit this case. What stream are we opening with encoding of ascii? The logging stream? Maybe the python on that system doesn't correctly default to utf8?

Looks like you can figure out the default encoding in Python by running this in the REPL:

    import sys
    sys.getdefaultencoding()

On both my mac and on linux, sys.getdefaultencoding() returns 'ascii'.
It does seem like that must be the encoding we're hitting. The question is what file is opened with default encoding? I assume it must be stderr/stdout. But why? If I were able to reproduce this, this would be easy to fix. But I failed to make a reduced python script on our EC2 bots. I could probably just edit a webkit file and wait for a whole build to fail, but I was too lazy to try that.
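
For anyone trying to narrow this down, here is a rough sketch, not a confirmed reduction of the bug. It prints the encodings Python reports for the process and shows that decoding UTF-8 "fancy quote" bytes as ASCII produces the same kind of error as in comment #0. The sample warning text and the helper names `describe_encodings` and `try_decode` are made up for illustration. Note that on Python 3 `sys.getdefaultencoding()` is always 'utf-8', so the 'ascii' result mentioned above would only show up on a Python 2 interpreter.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python reports for this process."""
    return {
        "default": sys.getdefaultencoding(),
        "stdout": getattr(sys.stdout, "encoding", None),
        "stderr": getattr(sys.stderr, "encoding", None),
        "locale": locale.getpreferredencoding(False),
    }


def try_decode(data, encoding):
    """Decode bytes, returning (text, None) or (None, error message)."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as error:
        return None, str(error)


# Hypothetical compiler output: a warning whose quotes are U+2018/U+2019.
# Encoded as UTF-8, the first non-ASCII byte is 0xe2.
SAMPLE = "foo.cpp:1: warning: unused variable \u2018x\u2019\n".encode("utf-8")

if __name__ == "__main__":
    for name, value in sorted(describe_encodings().items()):
        print("%s: %s" % (name, value))
    # Decoding as ASCII fails on the 0xe2 byte; decoding as UTF-8 succeeds.
    print(try_decode(SAMPLE, "ascii")[1])
    print(repr(try_decode(SAMPLE, "utf-8")[0]))
```

Running this on a bot with and without the `LANG` override might show which of the reported encodings changes, though that alone would not prove which stream is doing the implicit decode.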