Bug 35163 - autoinstall: should unzip downloaded zip and gzip files prior to importing
Summary: autoinstall: should unzip downloaded zip and gzip files prior to importing
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Chris Jerdonek
URL:
Keywords:
: 35164 (view as bug list)
Depends on: 35584
Blocks: 33639
  Show dependency treegraph
 
Reported: 2010-02-19 10:18 PST by Chris Jerdonek
Modified: 2010-03-31 23:20 PDT (History)
8 users (show)

See Also:


Attachments
Proposed patch (24.45 KB, patch)
2010-02-21 10:58 PST, Chris Jerdonek
cjerdonek: commit-queue-
Details | Formatted Diff | Diff
Proposed patch 2 (24.49 KB, patch)
2010-02-21 11:04 PST, Chris Jerdonek
cjerdonek: commit-queue-
Details | Formatted Diff | Diff
Proposed patch 3 (24.25 KB, patch)
2010-02-21 13:23 PST, Chris Jerdonek
cjerdonek: commit-queue-
Details | Formatted Diff | Diff
Proposed patch 4 (26.49 KB, patch)
2010-02-23 06:09 PST, Chris Jerdonek
cjerdonek: review-
cjerdonek: commit-queue-
Details | Formatted Diff | Diff
Proposed patch 5 (45.13 KB, patch)
2010-03-28 19:40 PDT, Chris Jerdonek
cjerdonek: commit-queue-
Details | Formatted Diff | Diff
Proposed patch 6 (45.97 KB, patch)
2010-03-31 19:16 PDT, Chris Jerdonek
levin: review+
cjerdonek: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Chris Jerdonek 2010-02-19 10:18:59 PST
Autoinstall should probably unzip downloaded zip files prior to importing.

See, for example, the following information from the zipimport documentation:

"Any files may be present in the ZIP archive, but only files .py and .py[co] are available for import.... Note that if an archive only contains .py files, Python will not attempt to modify the archive by adding the corresponding .pyc or .pyo file, meaning that if a ZIP archive doesn’t contain .pyc files, importing may be rather slow."

(from the beginning of http://docs.python.org/library/zipimport.html )

The two packages we currently autoinstall don't seem to have .pyc files.

Also, it seems like autoinstall's code should modify sys.path rather than using zipimport directly, as described in the following example:

http://docs.python.org/library/zipimport.html#examples
Comment 1 Eric Seidel (no email) 2010-02-19 11:44:27 PST
CCing the author of autoinstall.py
Comment 2 Chris Jerdonek 2010-02-19 14:36:41 PST
(In reply to comment #0)
> Also, it seems like autoinstall's code should modify sys.path rather than using
> zipimport directly, as described in the following example:
> 
> http://docs.python.org/library/zipimport.html#examples

Just to expand on this a little bit.  Based on the notes and sample code from the link above, I think the following might be a somewhat more natural flow for autoinstall:

If autoinstall(location) is called from __file__ on a particular URL location, then autoinstall would download the location's contents to an "autoinstalled_packages" directory in the same directory as __file__ (if it hasn't already been downloaded).  It would also unzip the contents if necessary (and unzip and extract the contents in the case of a gzipped tar archive).  The final step would be to add the unzipped and extracted folder to Python's sys.path.

Once this is done, any calls to "import" will now also search all autoinstalled_packages directories.

Some advantages of this approach are that--

(1) it uses Python's usual import process, without having to call zipimport explicitly or using custom import hooks of the type described here:

http://www.python.org/dev/peps/pep-0302/

(2) it allows .pyc files to be generated.

(2) it generalizes to tar files, etc. with no change.

(3) it allows the downloaded packages to be located inside the packages in which they are actually used if we want -- instead of requiring them all to be in a global autoinstall folder.  For example, if the style sub-package of webkitpy needs to autoinstall pep8, then we could put the autoinstall(<url_to_pep8>) line in webkitpy.style's __init__.py.  Then pep8 would get downloaded into the style folder.  Or if we wanted pep8 to be installed in the top-level autoinstall folder, we would simply call autoinstall() from webkitpy's __init__.py.
Comment 3 Eric Seidel (no email) 2010-02-19 14:41:50 PST
I like your proposal Chris.

I chose Daniel's Autoinstall because it was the only thing that looked remotely able to do this sort of auto-download out of the box (and I had no interest in writing my own).

I'm not sure that an autoinstall() call could easily detect the calling __file__, but I agree that that would be preferable.

I also think that autoinstall usage makes a lot of sense from an __init__.py file, since it's nice to set it up once and have it automatically used throughout the module.

It would be very nice if the modified autoinstall could handle gzip and other files yes.
Comment 4 Chris Jerdonek 2010-02-19 14:54:19 PST
(In reply to comment #3)
> I like your proposal Chris.
> 
> I chose Daniel's Autoinstall because it was the only thing that looked remotely
> able to do this sort of auto-download out of the box (and I had no interest in
> writing my own).

I think it's great -- thanks.  It's super compact, too, which helps a lot.

> I'm not sure that an autoinstall() call could easily detect the calling
> __file__, but I agree that that would be preferable.

I think you're right.  I realized this earlier, too -- so that __file__ (or its directory) would probably need to be passed in as a parameter.  Perhaps the autoinstall() method could accept a destination directory as a parameter and default to autoinstall.py's directory if the parameter is not present.

> I also think that autoinstall usage makes a lot of sense from an __init__.py
> file, since it's nice to set it up once and have it automatically used
> throughout the module.

Yes, I agree.  The package was a great find!  And thanks, Daniel!
Comment 5 Chris Jerdonek 2010-02-20 17:54:39 PST
I see that ClientForm is getting auto-installed, but I don't see where it's being imported.  All I see is this comment in bugzilla.py:

        bug_status = self.browser.find_control("bug_status", type="select")
        # This is a hack around the fact that ClientForm.ListControl seems to
        # have no simpler way to ask if a control has an item named "REOPENED"
        # without using exceptions for control flow.
        possible_bug_statuses = map(lambda item: item.name, bug_status.items)

Is ClientForm used anywhere?
Comment 6 Chris Jerdonek 2010-02-21 10:58:18 PST
Created attachment 49151 [details]
Proposed patch

This patch also addresses this report (re: *.tar.gz files for pep8):

https://bugs.webkit.org/show_bug.cgi?id=35164
Comment 7 WebKit Review Bot 2010-02-21 10:59:05 PST
Attachment 49151 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
WebKitTools/Scripts/webkitpy/__init__.py:18:  Line contains tab character.  [whitespace/tab] [5]
WebKitTools/Scripts/webkitpy/__init__.py:20:  Line contains tab character.  [whitespace/tab] [5]
WebKitTools/Scripts/webkitpy/__init__.py:22:  Line contains tab character.  [whitespace/tab] [5]
Total errors found: 3 in 7 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 8 Chris Jerdonek 2010-02-21 11:04:34 PST
Created attachment 49153 [details]
Proposed patch 2

Removed tabs.
Comment 9 Chris Jerdonek 2010-02-21 13:23:57 PST
Created attachment 49166 [details]
Proposed patch 3

Minor additions to code comments; modified autoinstall logging to be module-based instead of class-based.
Comment 10 Eric Seidel (no email) 2010-02-22 14:12:07 PST
I think this patch broke the Mac EWS. :(  we should fix the EWS to be more robust about python patches.
Comment 11 Eric Seidel (no email) 2010-02-22 14:15:56 PST
Comment on attachment 49166 [details]
Proposed patch 3

We do lose the ability to have a local copy already in the search path:
 48 from webkitpy.thirdparty.autoinstalled.mechanize import Browser

I'm not sure if that's good or bad.

Yeah... why wouldn't this new autoinstall just modify the search path to include these newly downloaded modules?
Comment 12 Chris Jerdonek 2010-02-22 15:49:50 PST
(In reply to comment #11)
> (From update of attachment 49166 [details])
> We do lose the ability to have a local copy already in the search path:
>  48 from webkitpy.thirdparty.autoinstalled.mechanize import Browser
> 
> I'm not sure if that's good or bad.
> 
> Yeah... why wouldn't this new autoinstall just modify the search path to
> include these newly downloaded modules?

This would be easy to add.  But if we want to do this, we may want to consider doing it outside of autoinstall.py, or at least not force it upon the caller.

It seems preferable to me because you can always add something to sys.path, but it's harder to remove.  For example, we could add--

thirdparty/autoinstalled

to sys.path inside webkitpy's __init__.py (though it would seem slightly odd for thirdparty/autoinstalled to be in the path and not thirdparty).

In general, it seems better to me to be explicit about where imports are coming from when possible because then there's less ambiguity, etc.  Modifying the search path seems a bit more hackish.  See for example--

https://bugs.webkit.org/show_bug.cgi?id=32971#c6

It seems like it should be relied upon more when there isn't really a good alternative -- e.g. when the code is in an external directory, in a place you don't have control over, or in some other awkward location (e.g. autoinstall.cache.d).  But now that we have control over the location, the technique isn't necessary.

In the extreme, we could add all of our directories to sys.path in __init__.py, and then we wouldn't have to know the location of any of our modules when importing. :) But that's obviously not good practice.

In terms of updating the import statements:

In the patch below, I experimented with an approach to addressing intra-package references in check-webkit-style:

https://bugs.webkit.org/show_bug.cgi?id=33791

I took that approach precisely to avoid having to do import search and replaces of the type found in the patch I'm submitting here (though I later found that PEP8 discourages relative imports in all cases in favor of absolute package path imports).

I'll think further about adding support for modifying sys.path in this patch, though I have mixed feelings about using that feature here.
Comment 13 Chris Jerdonek 2010-02-23 06:09:41 PST
Created attachment 49283 [details]
Proposed patch 4

Added support for adding the autoinstaller target directory to the sys.path search path.  Created temp directory lazily in try-except block of _create_scratch_directory() instead of in AutoInstaller constructor.  Other minor changes and clean-ups.
Comment 14 Chris Jerdonek 2010-02-23 06:18:37 PST
(In reply to comment #13)
> Created an attachment (id=49283) [details]
> Proposed patch 4
> 
> Added support for adding the autoinstaller target directory to the sys.path
> search path.  Created temp directory lazily in try-except block of
> _create_scratch_directory() instead of in AutoInstaller constructor.  Other
> minor changes and clean-ups.

I kept the fully-qualified import statements for the reasons I mentioned in comment 12.  For now, I think if we can do it, we should.  If it becomes a maintenance problem, we can revisit the issue and consider alternatives.
Comment 15 Chris Jerdonek 2010-02-24 14:46:44 PST
*** Bug 35164 has been marked as a duplicate of this bug. ***
Comment 16 Chris Jerdonek 2010-02-26 15:21:01 PST
Anything I can do to make reviewing this easier -- any volunteers?  It's the only thing stopping us from enabling Python style-checking.  Thanks a lot in advance!
Comment 17 David Levin 2010-02-26 15:37:09 PST
I'll try.
Comment 18 David Levin 2010-02-26 16:34:18 PST
Comment on attachment 49283 [details]
Proposed patch 4

Just some minor nits to consider addressing before landing.
 


> diff --git a/WebKitTools/Scripts/webkitpy/thirdparty/autoinstall.py b/WebKitTools/Scripts/webkitpy/thirdparty/autoinstall.py

You really re-wrote this. I don't think it should be in thirdparty anymore, even though it originated that way. At this point,
we can't take upstream changes and it should match WK coding guidelines but that can happen in another patch.



> +    def _extract_targz(self, path, scratch_dir):
> +        # tarfile.extractall() extracts to a path without the
> +        # trailing ".tar.gz".
> +        #
> +        # The substring ".tar.gz" has seven characters.
> +        target_basename = os.path.basename(path[:-7])

How about
  target_basename = os.path.basename(path[:-len(".tar.gz")])
with no comment?


> +        target_path = os.path.join(scratch_dir, target_basename)
> +
> +        self._log_transfer("Starting gunzip/extract...", path, target_path)
> +
> +        tfile = tarfile.open(path)

tfile is an odd abbreviation. How about tar_file or compressed_file, etc.?


> +    def _unzip(self, path, scratch_dir):
> +        # zipfile.extractall() extracts to a path without the
> +        # trailing ".zip".
> +        #
> +        # The substring ".zip" has four characters.
> +        target_basename = os.path.basename(path[:-4])

Or
        target_basename = os.path.basename(path[:-len(".zip")])

with no comment (about the length of ".zip").


> +    def _download(self, url, scratch_dir):
> +        """Download URL contents, and return the download path."""
> +        self._log_transfer("Starting download...", url, scratch_dir)
> +        url_path = urlparse.urlsplit(url)[2]
> +        url_path = os.path.normpath(url_path)  # Removes trailing slash.

Unless there is some pep8 rule, stick with the one space between code and comments.


> +
> +    def install(self, url, should_refresh=False, target_name=None,
> +                url_subpath=None):
> +
> +        target_path = os.path.join(self._target_dir, target_name)
> +        if (not should_refresh) and self._is_downloaded(target_name, url):

no need for parens around "not should_refresh"
Comment 19 Chris Jerdonek 2010-02-26 22:58:31 PST
(In reply to comment #18)
> (From update of attachment 49283 [details])

Thanks a lot for the comments and for your review, David.  I'll incorporate all of your suggestions.

> > +    def _download(self, url, scratch_dir):
> > +        """Download URL contents, and return the download path."""
> > +        self._log_transfer("Starting download...", url, scratch_dir)
> > +        url_path = urlparse.urlsplit(url)[2]
> > +        url_path = os.path.normpath(url_path)  # Removes trailing slash.
> 
> Unless there is some pep8 rule, stick with the one space between code and
> comments.

Yes, PEP8 has the following guideline:

An inline comment is a comment on the same line as a statement.  Inline
comments should be separated by at least two spaces from the statement.
They should start with a # and a single space.
Comment 20 Chris Jerdonek 2010-02-27 00:07:17 PST
Manually committed (via "git svn dcommit"):

http://trac.webkit.org/changeset/55348
Comment 21 Tony Chang 2010-02-28 23:21:28 PST
(In reply to comment #20)
> Manually committed (via "git svn dcommit"):
> 
> http://trac.webkit.org/changeset/55348

This uses python2.6 features so breaks those of us on Leopard (which uses python2.5).  Specifically, I get an error with zipfile.extractall:
  http://docs.python.org/library/zipfile.html

This is breaking the commit queue bots too:
  http://webkit-commit-queue.appspot.com/
Comment 22 Chris Jerdonek 2010-02-28 23:29:33 PST
(In reply to comment #21)
> (In reply to comment #20)
> > Manually committed (via "git svn dcommit"):
> > 
> > http://trac.webkit.org/changeset/55348
> 
> This uses python2.6 features so breaks those of us on Leopard (which uses
> python2.5).  Specifically, I get an error with zipfile.extractall:
>   http://docs.python.org/library/zipfile.html
> 
> This is breaking the commit queue bots too:
>   http://webkit-commit-queue.appspot.com/

I'm looking into it right now.  As you said, we basically need to replace the call to extractall() with Python 2.4 code.
Comment 23 Chris Jerdonek 2010-02-28 23:42:07 PST
(In reply to comment #22)
> (In reply to comment #21)
> > (In reply to comment #20)
> > > Manually committed (via "git svn dcommit"):
> > > 
> > > http://trac.webkit.org/changeset/55348
> > 
> > This uses python2.6 features so breaks those of us on Leopard (which uses
> > python2.5).  Specifically, I get an error with zipfile.extractall:
> >   http://docs.python.org/library/zipfile.html
> > 
> > This is breaking the commit queue bots too:
> >   http://webkit-commit-queue.appspot.com/
> 
> I'm looking into it right now.  As you said, we basically need to replace the
> call to extractall() with Python 2.4 code.

I think I should be able to fix this within an hour or so.
Comment 24 Chris Jerdonek 2010-03-01 00:38:29 PST
(In reply to comment #23)
> > I'm looking into it right now.  As you said, we basically need to replace the
> > call to extractall() with Python 2.4 code.
> 
> I think I should be able to fix this within an hour or so.

Well, it doesn't look like I'll be able to fix this soon after all.  Not even the ZipFile.open() method (which seems to do the actual unzipping) is available in 2.4.

The Python 2.6 code that implements open() isn't trivial, so I can't re-implement it right away.  For example, it contains this code:

        zef_file.seek(zinfo.header_offset, 0)

        # Skip the file header:
        fheader = zef_file.read(sizeFileHeader)
        if fheader[0:4] != stringFileHeader:
            raise BadZipfile, "Bad magic number for file header"

        fheader = struct.unpack(structFileHeader, fheader)
        fname = zef_file.read(fheader[_FH_FILENAME_LENGTH])
        if fheader[_FH_EXTRA_FIELD_LENGTH]:
            zef_file.read(fheader[_FH_EXTRA_FIELD_LENGTH])

        if fname != zinfo.orig_filename:
            raise BadZipfile, \
                      'File name in directory "%s" and header "%s" differ.' % (
                          zinfo.orig_filename, fname)

Maybe we will have to go back to using zipimport for the *.zip files (but we can stick with the approach of using gzip for pep8.py).

I believe rolling this out will require rolling out at least two patches (this in addition to the pep8 patch).  I'll communicate about this on IRC.
Comment 25 Chris Jerdonek 2010-03-01 02:10:02 PST
Manually rolled out (via "git svn dcommit"):

http://trac.webkit.org/changeset/55363
Comment 26 Chris Jerdonek 2010-03-02 14:57:28 PST
(In reply to comment #5)
> I see that ClientForm is getting auto-installed, but I don't see where it's
> being imported.  All I see is this comment in bugzilla.py:
> 
>         bug_status = self.browser.find_control("bug_status", type="select")
>         # This is a hack around the fact that ClientForm.ListControl seems to
>         # have no simpler way to ask if a control has an item named "REOPENED"
>         # without using exceptions for control flow.
>         possible_bug_statuses = map(lambda item: item.name, bug_status.items)
> 
> Is ClientForm used anywhere?

I discovered the answer to this question.  It is used in the mechanize package, in _html.py:

> import re, copy, htmlentitydefs
> import sgmllib, ClientForm

Thus, mechanize requires that ClientForm be in the search path.
Comment 27 Chris Jerdonek 2010-03-28 17:21:47 PDT
Comment on attachment 49283 [details]
Proposed patch 4

Changing patch 4 from r+ to r-.

I will be attaching an update within a couple days.
Comment 28 Chris Jerdonek 2010-03-28 19:40:54 PDT
Created attachment 51875 [details]
Proposed patch 5

It's been a while since patch 4 was rolled out due to an incompatibility with Python 2.5.  I was waiting for us to settle the versioning stuff and most of the re-organization before submitting an update.

The only substantive change here is writing a replacement for Python 2.6's extract_all method.  I also improved the log messages and  the information that renders when an error is raised.

After this lands, we'll be able to re-land the pep8 patch with essentially no change (just rebasing):

https://bugs.webkit.org/show_bug.cgi?id=33639

Thanks.
Comment 29 Eric Seidel (no email) 2010-03-29 19:54:38 PDT
Why are we still doing any bit of 2.4 dance?  Can't we just error out in 2.4?  I see no reason to have 2.4 soft-error code.
Comment 30 Chris Jerdonek 2010-03-30 09:34:04 PDT
(In reply to comment #29)
> Why are we still doing any bit of 2.4 dance?  Can't we just error out in 2.4? 
> I see no reason to have 2.4 soft-error code.

It's so we can provide a nice error message for people that haven't upgraded, like so--

python -V
Python 2.4.6
chris_g4@stonewall:trunk-git (python-test-webkitpy-clean)> test-webkitpy
[INFO] test-webkitpy: Suppressing most webkitpy logging while running unit tests.
[WARNING] test-webkitpy: WebKit Python scripts do not support your current Python version (2.4.6).  The minimum supported version is 2.5.
[INFO] webkitpy.test.main: Excluding: webkitpy.common.checkout.scm_unittest (use --all to include)
Traceback (most recent call last):
  File "/Users/chris_g4/dev/WebKit/trunk-git/WebKitTools/Scripts/test-webkitpy", line 189, in ?
    Tester().run_tests(sys.argv)

That reminds me.  I meant to add to that message a link to the wiki page with instructions on how to upgrade.
Comment 31 Chris Jerdonek 2010-03-30 09:40:01 PDT
(In reply to comment #29)
> Why are we still doing any bit of 2.4 dance?  Can't we just error out in 2.4? 
> I see no reason to have 2.4 soft-error code.

FYI, we also have that nice message for webkit-patch, too (which will also see the update with the upgrade link):

> python -V
Python 2.4.6
chris_g4@stonewall:trunk-git (python-test-webkitpy-clean)> webkit-patch --help
WARNING: WebKit Python scripts do not support your current Python version (2.4.6).  The minimum supported version is 2.5.
Traceback (most recent call last):
  File "/Users/chris_g4/dev/WebKit/trunk-git/WebKitTools/Scripts/webkit-patch", line 70, in ?
    main()
  File "/Users/chris_g4/dev/WebKit/trunk-git/WebKitTools/Scripts/webkit-patch", line 63, in main
    from webkitpy.tool.main import WebKitPatch
Comment 32 Chris Jerdonek 2010-03-31 19:16:03 PDT
Created attachment 52240 [details]
Proposed patch 6

Rebased to TOT.  Also removed an extraneous "raise" statement and an extraneous "pass" statement.
Comment 33 David Levin 2010-03-31 22:36:35 PDT
Comment on attachment 52240 [details]
Proposed patch 6


> diff --git a/WebKitTools/Scripts/test-webkitpy b/WebKitTools/Scripts/test-webkitpy.
> +        if record.name.startswith("webkitpy.common.system.autoinstall") or \
> +            record.name.startswith("webkitpy.test"):

I prefer using parenthesis for a case like this as opposed to the line continuation \ (but this is very much a personal preference).

(Probably because I learned python while using this style guide: http://google-styleguide.googlecode.com/svn/trunk/pyguide.html?showone=Line_length#Line_length)
Comment 34 Chris Jerdonek 2010-03-31 22:50:01 PDT
(In reply to comment #33)
> (From update of attachment 52240 [details])
> 
> > diff --git a/WebKitTools/Scripts/test-webkitpy b/WebKitTools/Scripts/test-webkitpy.
> > +        if record.name.startswith("webkitpy.common.system.autoinstall") or \
> > +            record.name.startswith("webkitpy.test"):
> 
> I prefer using parenthesis for a case like this as opposed to the line
> continuation \ (but this is very much a personal preference).
> 
> (Probably because I learned python while using this style guide:
> http://google-styleguide.googlecode.com/svn/trunk/pyguide.html?showone=Line_length#Line_length)

Thanks, David!  I guess I don't have a uniform preference on this yet.  Once WebKit adopts PEP8 officially (e.g. on the style guidelines web page), perhaps we can take a position on this issue, too, like Google did.
Comment 35 Chris Jerdonek 2010-03-31 23:10:31 PDT
Committed:

http://trac.webkit.org/changeset/56897
Comment 36 Eric Seidel (no email) 2010-03-31 23:11:39 PDT
webkit-patch land still no-workie for you?  Or webkit-patch mark-as-fixed even. :)
Comment 37 Chris Jerdonek 2010-03-31 23:19:38 PDT
(In reply to comment #36)
> webkit-patch land still no-workie for you?  Or webkit-patch mark-as-fixed even.
> :)

The last 3 times I tried it, it left my command in a half-completed state which has been more hurt than help for me.  :(

I think in all cases it was caused by outdated credentials in my keychain.  I've tried updating the credentials in my keychain using the KeyChain Access app, but it hasn't seemed to stick/work (there are like 5 or 6 keychains associated to bugs.webkit.org which seems like a separate issue in its own right).
Comment 38 Chris Jerdonek 2010-03-31 23:20:29 PDT
(In reply to comment #37)
> app, but it hasn't seemed to stick/work (there are like 5 or 6 keychains

keys not keychains