Bug 46892 - webkitpy.common.system.executive_unittest.ExecutiveTest.test_run_command_with_unicode fails on Windows
Summary: webkitpy.common.system.executive_unittest.ExecutiveTest.test_run_command_with...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC Windows XP
: P2 Normal
Assignee: Nobody
URL: http://build.webkit.org/builders/Wind...
Keywords: InRadar, PlatformOnly
Depends on:
Blocks:
 
Reported: 2010-09-30 06:12 PDT by Adam Roben (:aroben)
Modified: 2010-10-01 14:07 PDT (History)
3 users (show)

See Also:


Attachments
Encode Executive command arguments using UTF-8 on Cygwin (3.44 KB, patch)
2010-10-01 12:26 PDT, Adam Roben (:aroben)
abarth: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Adam Roben (:aroben) 2010-09-30 06:12:33 PDT
Here's the failure:


ERROR: Validate that it is safe to pass unicode() objects
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/buildbot/slave/win-debug-tests/build/WebKitTools/Scripts/webkitpy/common/system/executive_unittest.py", line 63, in test_run_command_with_unicode
    output = executive.run_command(["echo", "-n", unicode_tor])
  File "/home/buildbot/slave/win-debug-tests/build/WebKitTools/Scripts/webkitpy/common/system/executive.py", line 292, in run_command
    close_fds=self._should_close_fds())
  File "/usr/lib/python2.5/subprocess.py", line 594, in __init__
    errread, errwrite)
  File "/usr/lib/python2.5/subprocess.py", line 1091, in _execute_child
    raise child_exception
TypeError: execv() arg 2 must contain only strings
Comment 1 Adam Roben (:aroben) 2010-09-30 06:13:42 PDT
<rdar://problem/8496639>
Comment 2 Adam Roben (:aroben) 2010-09-30 06:31:10 PDT
I'm not really sure what to do with this one. Seems likely to be a limitation of Cygwin Python.
Comment 3 Adam Barth 2010-09-30 10:25:44 PDT
Maybe we need to encode in UTF8?  Sounds like a question for Eric.
Comment 4 Eric Seidel (no email) 2010-09-30 10:47:02 PDT
Looks like we'll need to encode the args for windows.

See:

        args = map(unicode, args)  # Popen will throw an exception if args are non-strings (like int())
in POpen.run_command

I don't know what encoding cygwin would want things passed to it on the command line, but utf8 sounds like a good try.

Just add an if block after that line which is:

# cygwin's os.execv in YOUR_PYTHON_VERSION doesn't seem to like unicode strings.
if sys.platform == "cygwin":
    args = [arg.encode("utf8") for arg in args]


If this is version specific, we'll need to add a version check there too.

I've long thought we should consider making a POpen subclass which we use everywhere to do fixups like this.  Maybe now is the time.  Popen.__init__ should be able to take unicode strings.  It has to for python 3.0 for sure!
Comment 5 Eric Seidel (no email) 2010-09-30 10:47:55 PDT
Actually, my comment is subtly wrong.  str is not really a "string".  It was renamed to bytearray() in python 3.x.  unicode is the one true string in any version of python.
Comment 6 Adam Roben (:aroben) 2010-09-30 13:53:11 PDT
(In reply to comment #4)
> Looks like we'll need to encode the args for windows.
> 
> See:
> 
>         args = map(unicode, args)  # Popen will throw an exception if args are non-strings (like int())
> in POpen.run_command
> 
> I don't know what encoding cygwin would want things passed to it on the command line, but utf8 sounds like a good try.

It looks like Cygwin ends up calling mbstowcs, which uses the current code page. I wonder how we can get access to that from Python...
Comment 7 Adam Roben (:aroben) 2010-10-01 06:57:45 PDT
(In reply to comment #6)
> (In reply to comment #4)
> > Looks like we'll need to encode the args for windows.
> > 
> > See:
> > 
> >         args = map(unicode, args)  # Popen will throw an exception if args are non-strings (like int())
> > in POpen.run_command
> > 
> > I don't know what encoding cygwin would want things passed to it on the command line, but utf8 sounds like a good try.
> 
> It looks like Cygwin ends up calling mbstowcs, which uses the current code page. I wonder how we can get access to that from Python...

Even if we were to do this, we'd run into the problem of code pages not being able to encode certain characters. For instance, the code page on my machine is 437, which corresponds to the cp437 codec in Python. It is not able to handle the strings in the failing test.
Comment 8 Adam Roben (:aroben) 2010-10-01 06:58:59 PDT
So I think we've hit a limitation of Cygwin here. Cygwin is not using UTF-16 everywhere like Windows normally does, and thus is not able to handle characters that don't have a representation in the current code page.
Comment 9 Eric Seidel (no email) 2010-10-01 09:03:07 PDT
We should just encode with replacement characters then.  Or some sort of reasonable fallback.  And have the test expect said reasonable fallback on cygwin.
Comment 10 Adam Roben (:aroben) 2010-10-01 12:26:07 PDT
Created attachment 69501 [details]
Encode Executive command arguments using UTF-8 on Cygwin
Comment 11 Adam Barth 2010-10-01 12:31:48 PDT
Comment on attachment 69501 [details]
Encode Executive command arguments using UTF-8 on Cygwin

Sorry abou the copy/paste.  It's a long standing problem that we need to unify these two codepaths in executive.
Comment 12 Adam Roben (:aroben) 2010-10-01 12:43:37 PDT
Committed r68913: <http://trac.webkit.org/changeset/68913>
Comment 13 WebKit Review Bot 2010-10-01 14:07:34 PDT
http://trac.webkit.org/changeset/68913 might have broken GTK Linux 32-bit Debug