WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
35163
autoinstall: should unzip downloaded zip and gzip files prior to importing
https://bugs.webkit.org/show_bug.cgi?id=35163
Summary
autoinstall: should unzip downloaded zip and gzip files prior to importing
Chris Jerdonek
Reported
2010-02-19 10:18:59 PST
Autoinstall should probably unzip downloaded zip files prior to importing. See, for example, the following information from the zipimport documentation: "Any files may be present in the ZIP archive, but only files .py and .py[co] are available for import.... Note that if an archive only contains .py files, Python will not attempt to modify the archive by adding the corresponding .pyc or .pyo file, meaning that if a ZIP archive doesn’t contain .pyc files, importing may be rather slow." (from the beginning of
http://docs.python.org/library/zipimport.html
) The two packages we currently autoinstall don't seem to have .pyc files. Also, it seems like autoinstall's code should modify sys.path rather than using zipimport directly, as described in the following example:
http://docs.python.org/library/zipimport.html#examples
Attachments
Proposed patch
(24.45 KB, patch)
2010-02-21 10:58 PST
,
Chris Jerdonek
cjerdonek
: commit-queue-
Details
Formatted Diff
Diff
Proposed patch 2
(24.49 KB, patch)
2010-02-21 11:04 PST
,
Chris Jerdonek
cjerdonek
: commit-queue-
Details
Formatted Diff
Diff
Proposed patch 3
(24.25 KB, patch)
2010-02-21 13:23 PST
,
Chris Jerdonek
cjerdonek
: commit-queue-
Details
Formatted Diff
Diff
Proposed patch 4
(26.49 KB, patch)
2010-02-23 06:09 PST
,
Chris Jerdonek
cjerdonek
: review-
cjerdonek
: commit-queue-
Details
Formatted Diff
Diff
Proposed patch 5
(45.13 KB, patch)
2010-03-28 19:40 PDT
,
Chris Jerdonek
cjerdonek
: commit-queue-
Details
Formatted Diff
Diff
Proposed patch 6
(45.97 KB, patch)
2010-03-31 19:16 PDT
,
Chris Jerdonek
levin
: review+
cjerdonek
: commit-queue-
Details
Formatted Diff
Diff
Show Obsolete
(5)
View All
Add attachment
proposed patch, testcase, etc.
Eric Seidel (no email)
Comment 1
2010-02-19 11:44:27 PST
CCing the author of autoinstall.py
Chris Jerdonek
Comment 2
2010-02-19 14:36:41 PST
(In reply to
comment #0
)
> Also, it seems like autoinstall's code should modify sys.path rather than using > zipimport directly, as described in the following example: > >
http://docs.python.org/library/zipimport.html#examples
Just to expand on this a little bit. Based on the notes and sample code from the link above, I think the following might be a somewhat more natural flow for autoinstall: If autoinstall(location) is called from __file__ on a particular URL location, then autoinstall would download the location's contents to an "autoinstalled_packages" directory in the same directory as __file__ (if it hasn't already been downloaded). It would also unzip the contents if necessary (and unzip and extract the contents in the case of a gzipped tar archive). The final step would be to add the unzipped and extracted folder to Python's sys.path. Once this is done, any calls to "import" will now also search all autoinstalled_packages directories. Some advantages of this approach are that-- (1) it uses Python's usual import process, without having to call zipimport explicitly or using custom import hooks of the type described here:
http://www.python.org/dev/peps/pep-0302/
(2) it allows .pyc files to be generated. (2) it generalizes to tar files, etc. with no change. (3) it allows the downloaded packages to be located inside the packages in which they are actually used if we want -- instead of requiring them all to be in a global autoinstall folder. For example, if the style sub-package of webkitpy needs to autoinstall pep8, then we could put the autoinstall(<url_to_pep8>) line in webkitpy.style's __init__.py. Then pep8 would get downloaded into the style folder. Or if we wanted pep8 to be installed in the top-level autoinstall folder, we would simply call autoinstall() from webkitpy's __init__.py.
Eric Seidel (no email)
Comment 3
2010-02-19 14:41:50 PST
I like your proposal Chris. I chose Daniel's Autoinstall because it was the only thing that looked remotely able to do this sort of auto-download out of the box (and I had no interest in writing my own). I'm not sure that an autoinstall() call could easily detect the calling __file__, but I agree that that would be preferable. I also think that autoinstall usage makes a lot of sense from an __init__.py file, since it's nice to set it up once and have it automatically used throughout the module. It would be very nice if the modified autoinstall could handle gzip and other files yes.
Chris Jerdonek
Comment 4
2010-02-19 14:54:19 PST
(In reply to
comment #3
)
> I like your proposal Chris. > > I chose Daniel's Autoinstall because it was the only thing that looked remotely > able to do this sort of auto-download out of the box (and I had no interest in > writing my own).
I think it's great -- thanks. It's super compact, too, which helps a lot.
> I'm not sure that an autoinstall() call could easily detect the calling > __file__, but I agree that that would be preferable.
I think you're right. I realized this earlier, too -- so that __file__ (or its directory) would probably need to be passed in as a parameter. Perhaps the autoinstall() method could accept a destination directory as a parameter and default to autoinstall.py's directory if the parameter is not present.
> I also think that autoinstall usage makes a lot of sense from an __init__.py > file, since it's nice to set it up once and have it automatically used > throughout the module.
Yes, I agree. The package was a great find! And thanks, Daniel!
Chris Jerdonek
Comment 5
2010-02-20 17:54:39 PST
I see that ClientForm is getting auto-installed, but I don't see where it's being imported. All I see is this comment in bugzilla.py: bug_status = self.browser.find_control("bug_status", type="select") # This is a hack around the fact that ClientForm.ListControl seems to # have no simpler way to ask if a control has an item named "REOPENED" # without using exceptions for control flow. possible_bug_statuses = map(lambda item: item.name, bug_status.items) Is ClientForm used anywhere?
Chris Jerdonek
Comment 6
2010-02-21 10:58:18 PST
Created
attachment 49151
[details]
Proposed patch This patch also addresses this report (re: *.tar.gz files for pep8):
https://bugs.webkit.org/show_bug.cgi?id=35164
WebKit Review Bot
Comment 7
2010-02-21 10:59:05 PST
Attachment 49151
[details]
did not pass style-queue: Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1 WebKitTools/Scripts/webkitpy/__init__.py:18: Line contains tab character. [whitespace/tab] [5] WebKitTools/Scripts/webkitpy/__init__.py:20: Line contains tab character. [whitespace/tab] [5] WebKitTools/Scripts/webkitpy/__init__.py:22: Line contains tab character. [whitespace/tab] [5] Total errors found: 3 in 7 files If any of these errors are false positives, please file a bug against check-webkit-style.
Chris Jerdonek
Comment 8
2010-02-21 11:04:34 PST
Created
attachment 49153
[details]
Proposed patch 2 Removed tabs.
Chris Jerdonek
Comment 9
2010-02-21 13:23:57 PST
Created
attachment 49166
[details]
Proposed patch 3 Minor additions to code comments; modified autoinstall logging to be module-based instead of class-based.
Eric Seidel (no email)
Comment 10
2010-02-22 14:12:07 PST
I think this patch broke the Mac EWS. :( we should fix the EWS to be more robust about python patches.
Eric Seidel (no email)
Comment 11
2010-02-22 14:15:56 PST
Comment on
attachment 49166
[details]
Proposed patch 3 We do lose the ability to have a local copy already in the search path: 48 from webkitpy.thirdparty.autoinstalled.mechanize import Browser I'm not sure if that's good or bad. Yeah... why wouldn't this new autoinstall just modify the search path to include these newly downloaded modules?
Chris Jerdonek
Comment 12
2010-02-22 15:49:50 PST
(In reply to
comment #11
)
> (From update of
attachment 49166
[details]
) > We do lose the ability to have a local copy already in the search path: > 48 from webkitpy.thirdparty.autoinstalled.mechanize import Browser > > I'm not sure if that's good or bad. > > Yeah... why wouldn't this new autoinstall just modify the search path to > include these newly downloaded modules?
This would be easy to add. But if we want to do this, we may want to consider doing it outside of autoinstall.py, or at least not force it upon the caller. It seems preferable to me because you can always add something to sys.path, but it's harder to remove. For example, we could add-- thirdparty/autoinstalled to sys.path inside webkitpy's __init__.py (though it would seem slightly odd for thirdparty/autoinstalled to be in the path and not thirdparty). In general, it seems better to me to be explicit about where imports are coming from when possible because then there's less ambiguity, etc. Modifying the search path seems a bit more hackish. See for example--
https://bugs.webkit.org/show_bug.cgi?id=32971#c6
It seems like it should be relied upon more when there isn't really a good alternative -- e.g. when the code is in an external directory, in a place you don't have control over, or in some other awkward location (e.g. autoinstall.cache.d). But now that we have control over the location, the technique isn't necessary. In the extreme, we could add all of our directories to sys.path in __init__.py, and then we wouldn't have to know the location of any of our modules when importing. :) But that's obviously not good practice. In terms of updating the import statements: In the patch below, I experimented with an approach to addressing intra-package references in check-webkit-style:
https://bugs.webkit.org/show_bug.cgi?id=33791
I took that approach precisely to avoid having to do import search and replaces of the type found in the patch I'm submitting here (though I later found that PEP8 discourages relative imports in all cases in favor of absolute package path imports). I'll think further about adding support for modifying sys.path in this patch, though I have mixed feelings about using that feature here.
Chris Jerdonek
Comment 13
2010-02-23 06:09:41 PST
Created
attachment 49283
[details]
Proposed patch 4 Added support for adding the autoinstaller target directory to the sys.path search path. Created temp directory lazily in try-except block of _create_scratch_directory() instead of in AutoInstaller constructor. Other minor changes and clean-ups.
Chris Jerdonek
Comment 14
2010-02-23 06:18:37 PST
(In reply to
comment #13
)
> Created an attachment (id=49283) [details] > Proposed patch 4 > > Added support for adding the autoinstaller target directory to the sys.path > search path. Created temp directory lazily in try-except block of > _create_scratch_directory() instead of in AutoInstaller constructor. Other > minor changes and clean-ups.
I kept the fully-qualified import statements for the reasons I mentioned in
comment 12
. For now, I think if we can do it, we should. If it becomes a maintenance problem, we can revisit the issue and consider alternatives.
Chris Jerdonek
Comment 15
2010-02-24 14:46:44 PST
***
Bug 35164
has been marked as a duplicate of this bug. ***
Chris Jerdonek
Comment 16
2010-02-26 15:21:01 PST
Anything I can do to make reviewing this easier -- any volunteers? It's the only thing stopping us from enabling Python style-checking. Thanks a lot in advance!
David Levin
Comment 17
2010-02-26 15:37:09 PST
I'll try.
David Levin
Comment 18
2010-02-26 16:34:18 PST
Comment on
attachment 49283
[details]
Proposed patch 4 Just some minor nits to consider addressing before landing.
> diff --git a/WebKitTools/Scripts/webkitpy/thirdparty/autoinstall.py b/WebKitTools/Scripts/webkitpy/thirdparty/autoinstall.py
You really re-wrote this. I don't think it should be in thirdparty anymore, even though it originated that way. At this point, we can't take upstream changes and it should match WK coding guidelines but that can happen in another patch.
> + def _extract_targz(self, path, scratch_dir): > + # tarfile.extractall() extracts to a path without the > + # trailing ".tar.gz". > + # > + # The substring ".tar.gz" has seven characters. > + target_basename = os.path.basename(path[:-7])
How about target_basename = os.path.basename(path[:-len(".tar.gz")]) with no comment?
> + target_path = os.path.join(scratch_dir, target_basename) > + > + self._log_transfer("Starting gunzip/extract...", path, target_path) > + > + tfile = tarfile.open(path)
tfile is an odd abbreviation. How about tar_file or compressed_file, etc.?
> + def _unzip(self, path, scratch_dir): > + # zipfile.extractall() extracts to a path without the > + # trailing ".zip". > + # > + # The substring ".zip" has four characters. > + target_basename = os.path.basename(path[:-4])
Or target_basename = os.path.basename(path[:-len(".zip")]) with no comment (about the length of ".zip").
> + def _download(self, url, scratch_dir): > + """Download URL contents, and return the download path.""" > + self._log_transfer("Starting download...", url, scratch_dir) > + url_path = urlparse.urlsplit(url)[2] > + url_path = os.path.normpath(url_path) # Removes trailing slash.
Unless there is some pep8 rule, stick with the one space between code and comments.
> + > + def install(self, url, should_refresh=False, target_name=None, > + url_subpath=None): > + > + target_path = os.path.join(self._target_dir, target_name) > + if (not should_refresh) and self._is_downloaded(target_name, url):
no need for parens around "not should_refresh"
Chris Jerdonek
Comment 19
2010-02-26 22:58:31 PST
(In reply to
comment #18
)
> (From update of
attachment 49283
[details]
)
Thanks a lot for the comments and for your review, David. I'll incorporate all of your suggestions.
> > + def _download(self, url, scratch_dir): > > + """Download URL contents, and return the download path.""" > > + self._log_transfer("Starting download...", url, scratch_dir) > > + url_path = urlparse.urlsplit(url)[2] > > + url_path = os.path.normpath(url_path) # Removes trailing slash. > > Unless there is some pep8 rule, stick with the one space between code and > comments.
Yes, PEP8 has the following guideline: An inline comment is a comment on the same line as a statement. Inline comments should be separated by at least two spaces from the statement. They should start with a # and a single space.
Chris Jerdonek
Comment 20
2010-02-27 00:07:17 PST
Manually committed (via "git svn dcommit"):
http://trac.webkit.org/changeset/55348
Tony Chang
Comment 21
2010-02-28 23:21:28 PST
(In reply to
comment #20
)
> Manually committed (via "git svn dcommit"): > >
http://trac.webkit.org/changeset/55348
This uses python2.6 features so breaks those of us on Leopard (which uses python2.5). Specifically, I get an error with zipfile.extractall:
http://docs.python.org/library/zipfile.html
This is breaking the commit queue bots too:
http://webkit-commit-queue.appspot.com/
Chris Jerdonek
Comment 22
2010-02-28 23:29:33 PST
(In reply to
comment #21
)
> (In reply to
comment #20
) > > Manually committed (via "git svn dcommit"): > > > >
http://trac.webkit.org/changeset/55348
> > This uses python2.6 features so breaks those of us on Leopard (which uses > python2.5). Specifically, I get an error with zipfile.extractall: >
http://docs.python.org/library/zipfile.html
> > This is breaking the commit queue bots too: >
http://webkit-commit-queue.appspot.com/
I'm looking into it right now. As you said, we basically need to replace the call to extractall() with Python 2.4 code.
Chris Jerdonek
Comment 23
2010-02-28 23:42:07 PST
(In reply to
comment #22
)
> (In reply to
comment #21
) > > (In reply to
comment #20
) > > > Manually committed (via "git svn dcommit"): > > > > > >
http://trac.webkit.org/changeset/55348
> > > > This uses python2.6 features so breaks those of us on Leopard (which uses > > python2.5). Specifically, I get an error with zipfile.extractall: > >
http://docs.python.org/library/zipfile.html
> > > > This is breaking the commit queue bots too: > >
http://webkit-commit-queue.appspot.com/
> > I'm looking into it right now. As you said, we basically need to replace the > call to extractall() with Python 2.4 code.
I think I should be able to fix this within an hour or so.
Chris Jerdonek
Comment 24
2010-03-01 00:38:29 PST
(In reply to
comment #23
)
> > I'm looking into it right now. As you said, we basically need to replace the > > call to extractall() with Python 2.4 code. > > I think I should be able to fix this within an hour or so.
Well, it doesn't look like I'll be able to fix this soon after all. Not even the ZipFile.open() method (which seems to do the actual unzipping) is available in 2.4. The Python 2.6 code that implements open() isn't trivial, so I can't re-implement it right away. For example, it contains this code: zef_file.seek(zinfo.header_offset, 0) # Skip the file header: fheader = zef_file.read(sizeFileHeader) if fheader[0:4] != stringFileHeader: raise BadZipfile, "Bad magic number for file header" fheader = struct.unpack(structFileHeader, fheader) fname = zef_file.read(fheader[_FH_FILENAME_LENGTH]) if fheader[_FH_EXTRA_FIELD_LENGTH]: zef_file.read(fheader[_FH_EXTRA_FIELD_LENGTH]) if fname != zinfo.orig_filename: raise BadZipfile, \ 'File name in directory "%s" and header "%s" differ.' % ( zinfo.orig_filename, fname) Maybe we will have to go back to using zipimport for the *.zip files (but we can stick with the approach of using gzip for pep8.py). I believe rolling this out will require rolling out at least two patches (this in addition to the pep8 patch). I'll communicate about this on IRC.
Chris Jerdonek
Comment 25
2010-03-01 02:10:02 PST
Manually rolled out (via "git svn dcommit"):
http://trac.webkit.org/changeset/55363
Chris Jerdonek
Comment 26
2010-03-02 14:57:28 PST
(In reply to
comment #5
)
> I see that ClientForm is getting auto-installed, but I don't see where it's > being imported. All I see is this comment in bugzilla.py: > > bug_status = self.browser.find_control("bug_status", type="select") > # This is a hack around the fact that ClientForm.ListControl seems to > # have no simpler way to ask if a control has an item named "REOPENED" > # without using exceptions for control flow. > possible_bug_statuses = map(lambda item: item.name, bug_status.items) > > Is ClientForm used anywhere?
I discovered the answer to this question. It is used in the mechanize package, in _html.py:
> import re, copy, htmlentitydefs > import sgmllib, ClientForm
Thus, mechanize requires that ClientForm be in the search path.
Chris Jerdonek
Comment 27
2010-03-28 17:21:47 PDT
Comment on
attachment 49283
[details]
Proposed patch 4 Changing patch 4 from r+ to r-. I will be attaching an update within a couple days.
Chris Jerdonek
Comment 28
2010-03-28 19:40:54 PDT
Created
attachment 51875
[details]
Proposed patch 5 It's been a while since patch 4 was rolled out due to an incompatibility with Python 2.5. I was waiting for us to settle the versioning stuff and most of the re-organization before submitting an update. The only substantive change here is writing a replacement for Python 2.6's extract_all method. I also improved the log messages and the information that renders when an error is raised. After this lands, we'll be able to re-land the pep8 patch with essentially no change (just rebasing):
https://bugs.webkit.org/show_bug.cgi?id=33639
Thanks.
Eric Seidel (no email)
Comment 29
2010-03-29 19:54:38 PDT
Why are we still doing any bit of 2.4 dance? Can't we just error out in 2.4? I see no reason to have 2.4 soft-error code.
Chris Jerdonek
Comment 30
2010-03-30 09:34:04 PDT
(In reply to
comment #29
)
> Why are we still doing any bit of 2.4 dance? Can't we just error out in 2.4? > I see no reason to have 2.4 soft-error code.
It's so we can provide a nice error message for people that haven't upgraded, like so-- python -V Python 2.4.6 chris_g4@stonewall:trunk-git (python-test-webkitpy-clean)> test-webkitpy [INFO] test-webkitpy: Suppressing most webkitpy logging while running unit tests. [WARNING] test-webkitpy: WebKit Python scripts do not support your current Python version (2.4.6). The minimum supported version is 2.5. [INFO] webkitpy.test.main: Excluding: webkitpy.common.checkout.scm_unittest (use --all to include) Traceback (most recent call last): File "/Users/chris_g4/dev/WebKit/trunk-git/WebKitTools/Scripts/test-webkitpy", line 189, in ? Tester().run_tests(sys.argv) That reminds me. I meant to add to that message a link to the wiki page with instructions on how to upgrade.
Chris Jerdonek
Comment 31
2010-03-30 09:40:01 PDT
(In reply to
comment #29
)
> Why are we still doing any bit of 2.4 dance? Can't we just error out in 2.4? > I see no reason to have 2.4 soft-error code.
FYI, we also have that nice message for webkit-patch, too (which will also see the update with the upgrade link):
> python -V
Python 2.4.6 chris_g4@stonewall:trunk-git (python-test-webkitpy-clean)> webkit-patch --help WARNING: WebKit Python scripts do not support your current Python version (2.4.6). The minimum supported version is 2.5. Traceback (most recent call last): File "/Users/chris_g4/dev/WebKit/trunk-git/WebKitTools/Scripts/webkit-patch", line 70, in ? main() File "/Users/chris_g4/dev/WebKit/trunk-git/WebKitTools/Scripts/webkit-patch", line 63, in main from webkitpy.tool.main import WebKitPatch
Chris Jerdonek
Comment 32
2010-03-31 19:16:03 PDT
Created
attachment 52240
[details]
Proposed patch 6 Rebased to TOT. Also removed an extraneous "raise" statement and an extraneous "pass" statement.
David Levin
Comment 33
2010-03-31 22:36:35 PDT
Comment on
attachment 52240
[details]
Proposed patch 6
> diff --git a/WebKitTools/Scripts/test-webkitpy b/WebKitTools/Scripts/test-webkitpy. > + if record.name.startswith("webkitpy.common.system.autoinstall") or \ > + record.name.startswith("webkitpy.test"):
I prefer using parenthesis for a case like this as opposed to the line continuation \ (but this is very much a personal preference). (Probably because I learned python while using this style guide:
http://google-styleguide.googlecode.com/svn/trunk/pyguide.html?showone=Line_length#Line_length
)
Chris Jerdonek
Comment 34
2010-03-31 22:50:01 PDT
(In reply to
comment #33
)
> (From update of
attachment 52240
[details]
) > > > diff --git a/WebKitTools/Scripts/test-webkitpy b/WebKitTools/Scripts/test-webkitpy. > > + if record.name.startswith("webkitpy.common.system.autoinstall") or \ > > + record.name.startswith("webkitpy.test"): > > I prefer using parenthesis for a case like this as opposed to the line > continuation \ (but this is very much a personal preference). > > (Probably because I learned python while using this style guide: >
http://google-styleguide.googlecode.com/svn/trunk/pyguide.html?showone=Line_length#Line_length
)
Thanks, David! I guess I don't have a uniform preference on this yet. Once WebKit adopts PEP8 officially (e.g. on the style guidelines web page), perhaps we can take a position on this issue, too, like Google did.
Chris Jerdonek
Comment 35
2010-03-31 23:10:31 PDT
Committed:
http://trac.webkit.org/changeset/56897
Eric Seidel (no email)
Comment 36
2010-03-31 23:11:39 PDT
webkit-patch land still no-workie for you? Or webkit-patch mark-as-fixed even. :)
Chris Jerdonek
Comment 37
2010-03-31 23:19:38 PDT
(In reply to
comment #36
)
> webkit-patch land still no-workie for you? Or webkit-patch mark-as-fixed even. > :)
The last 3 times I tried it, it left my command in a half-completed state which has been more hurt than help for me. :( I think in all cases it was caused by outdated credentials in my keychain. I've tried updating the credentials in my keychain using the KeyChain Access app, but it hasn't seemed to stick/work (there are like 5 or 6 keychains associated to bugs.webkit.org which seems like a separate issue in its own right).
Chris Jerdonek
Comment 38
2010-03-31 23:20:29 PDT
(In reply to
comment #37
)
> app, but it hasn't seemed to stick/work (there are like 5 or 6 keychains
keys not keychains
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug