Bug 63883

Summary: Make SCM unit tests faster
Product: WebKit Reporter: Daniel Bates <dbates>
Component: Tools / TestsAssignee: Daniel Bates <dbates>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, cjerdonek, eric, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch eric: review+

Description Daniel Bates 2011-07-03 19:04:24 PDT
The SCM unit tests are very slow. We should speed them up.

I used Python's cProfile module to profile test-webkitpy on my Mac Mini and looking at only scm/scm_unittests funtions, sorted by cumulative time, we have:

   524364 function calls (519838 primitive calls) in 851.587 CPU seconds

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       94    0.038    0.000  711.856    7.573 /Users/dbates/Desktop/WebKit/Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:149(setup)
       59    0.010    0.000  516.529    8.755 /Users/dbates/Desktop/WebKit/Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:934(setUp)
       94    0.030    0.000  469.979    5.000 /Users/dbates/Desktop/WebKit/Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:113(_setup_test_commits)
      471    0.017    0.000  464.249    0.986 /Users/dbates/Desktop/WebKit/Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:109(_svn_commit)
       35    0.003    0.000  272.737    7.792 /Users/dbates/Desktop/WebKit/Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:586(setUp)
       60    0.008    0.000   75.167    1.253 /Users/dbates/Desktop/WebKit/Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:66(run_silent)
       59    0.008    0.000   74.148    1.257 /Users/dbates/Desktop/WebKit/Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:923(_setup_git_checkout)
      265    0.016    0.000   17.215    0.065 /Users/dbates/Desktop/WebKit/Tools/Scripts/webkitpy/common/checkout/scm/detection.py:72(detect_scm_system)
       94    0.012    0.000   11.979    0.127 /Users/dbates/Desktop/WebKit/Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:177(tear_down)
       59    0.006    0.000    8.359    0.142 /Users/dbates/Desktop/WebKit/Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:943(tearDown)

Notice, GitSVNTest.setUp() (scm_unittest.py:934) and SCMTest.setUp() (scm_unittest.py:586) call SVNTestRepository.setup() (scm_unittest.py:149). And SVNTestRepository.setup() has a cumulative time of 711.856 seconds.
Comment 1 Daniel Bates 2011-07-03 19:16:32 PDT
Created attachment 99589 [details]
Patch

Profiling with this patch:

   581795 function calls (573917 primitive calls) in 205.441 CPU seconds

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       59    0.010    0.000  113.431    1.923 /Users/dbates/Desktop/WebKit/Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:967(setUp)
       60    0.005    0.000   76.073    1.268 /Users/dbates/Desktop/WebKit/Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:80(run_silent)
       59    0.008    0.000   75.076    1.272 /Users/dbates/Desktop/WebKit/Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:956(_setup_git_checkout)
       94    0.011    0.000   62.900    0.669 /Users/dbates/Desktop/WebKit/Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:172(setup)
       35    0.003    0.000   28.019    0.801 /Users/dbates/Desktop/WebKit/Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:619(setUp)
      265    0.016    0.000   17.359    0.066 /Users/dbates/Desktop/WebKit/Tools/Scripts/webkitpy/common/checkout/scm/detection.py:72(detect_scm_system)
       94    0.006    0.000   11.032    0.117 /Users/dbates/Desktop/WebKit/Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:211(tear_down)
        1    0.000    0.000    8.149    8.149 /Users/dbates/Desktop/WebKit/Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:185(_setup_mock_repo)
       59    0.006    0.000    7.821    0.133 /Users/dbates/Desktop/WebKit/Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:976(tearDown)

I moved the actual creation of the mock SVN repo from SVNTestRepository.setup() into SVNTestRepository._setup_mock_repo(). The first call to SVNTestRepository.setup() calls SVNTestRepository._setup_mock_repo() once then subsequent calls creates a filesystem copy of the mock SVN repo M and performs an SVN checkout of M.
Comment 2 WebKit Review Bot 2011-07-03 19:19:15 PDT
Attachment 99589 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Tools/ChangeLog', u'Tools/Scripts/webkitpy..." exit_code: 1

Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:65:  expected 2 blank lines, found 1  [pep8/E302] [5]
Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:70:  expected 2 blank lines, found 1  [pep8/E302] [5]
Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:188:  at least two spaces before inline comment  [pep8/E261] [5]
Total errors found: 3 in 2 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Eric Seidel (no email) 2011-07-03 19:21:48 PDT
I <3 dan-the-man bates.
Comment 4 Daniel Bates 2011-07-04 13:15:54 PDT
Created attachment 99653 [details]
Patch

Fixes style issues.
Comment 5 Eric Seidel (no email) 2011-07-04 20:26:20 PDT
Comment on attachment 99653 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=99653&action=review

I like this patch.  I'm not sure I 100% follow.  Perhaps once you respond to my comments I will.

> Tools/ChangeLog:8
> +        Speeds up the SCM unit tests by a factor of 4. Currently, we create a
> +        mock SVN repo for each test_ method in SVNTest and GitTest and creating
> +        this repo is expensive.

Is this still the slowest step after your speedup?  Is there more work to do here?

> Tools/ChangeLog:17
> +
> +        Note, Python 2.7's unittest module implements support for per class and
> +        per module setup and tear down methods which could be used to implement
> +        similar functionality. At the time of writing, test-webkitpy is designed
> +        to support Python 2.5. So, we can't take advantage of these Python 2.7
> +        features :(

Could we instead use a global instance of some class with a destructor?  I'm not sure if destructors are run at shutdown.  It feels like our current hack to implement this is poorly encapsulated.

> Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:172
> +    # GitTest. We create a mock SVN repo once and then perform an SVN checkout from a filesystem copy of
> +    # it since it's expensive to create the mock repo.

Why do we even do a checkout?  Why not just cp the original shared checkout?  Wouldn't that be even faster?

> Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:183
> +        shutil.copytree(cached_svn_repo_path, test_object.svn_repo_path)

Maybe you are just copying?  I'm not sure I understand what's going on here.

> Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:193
> -        run_command(['svnadmin', 'create', '--pre-1.5-compatible', test_object.svn_repo_path])
> +        run_command(['svnadmin', 'create', '--pre-1.5-compatible', svn_repo_path])

I wonder why we're still passing the 1.5-compat flag here?
Comment 6 Eric Seidel (no email) 2011-07-04 20:27:22 PDT
It looks like svn setup is still hot. And git import from the svn is also still hot.  Should we look at caching the git one as well?
Comment 7 Daniel Bates 2011-07-04 22:52:14 PDT
(In reply to comment #5)
> (From update of attachment 99653 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=99653&action=review
> 
> I like this patch.  I'm not sure I 100% follow.  Perhaps once you respond to my comments I will.
> 
> > Tools/ChangeLog:8
> > +        Speeds up the SCM unit tests by a factor of 4. Currently, we create a
> > +        mock SVN repo for each test_ method in SVNTest and GitTest and creating
> > +        this repo is expensive.
> 
> Is this still the slowest step after your speedup?

No, the slowest step after this speedup patch is GitTest.setUp() (with a cumulative time of 113.431 seconds).  SVNTest.setup() is still slow with a cumulative time of 62.9 seconds, but it use to have a cumulative time of 711.856 seconds.

> Is there more work to do here?

More work can be done to further speed things up. For example, test cases that don't modify the state of the repo, say SVNTest.test_added_files(), could use a copy of an SVN checkout.

> 
> > Tools/ChangeLog:17
> > +
> > +        Note, Python 2.7's unittest module implements support for per class and
> > +        per module setup and tear down methods which could be used to implement
> > +        similar functionality. At the time of writing, test-webkitpy is designed
> > +        to support Python 2.5. So, we can't take advantage of these Python 2.7
> > +        features :(
> 
> Could we instead use a global instance of some class with a destructor?  I'm not sure if destructors are run at shutdown.  It feels like our current hack to implement this is poorly encapsulated.
>

I'll look into this.

> 
> > Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:172
> > +    # GitTest. We create a mock SVN repo once and then perform an SVN checkout from a filesystem copy of
> > +    # it since it's expensive to create the mock repo.
> 
> Why do we even do a checkout? 

We perform a new checkout so as to associate it with the URL to the newly copied SVN repo. And we create a copy of the SVN repo to ensure that that SVN operations in one test don't affect the state of the repo seen in another test. We probably could repurpose a single SVN checkout by changing the repository URL. We could address this repurposing of the SVN checkout in a subsequent bug.

> Why not just cp the original shared checkout?  

Notice that a commit to an SVN checkout isn't local in the sense of a "git commit". That is, when you run "svn commit" it sends the changes in your working copy to the SVN repository. Copying an SVN checkout directory would only copy any non-committed changes. We want to present a consistent SVN repository state for each test.

Some of the test methods ultimately "svn commit" to their checkout which records a new revision in the associated SVN repo and hence affects the results of other tests that depend on the revision history of the SVN repo.

For example, SVNTest.test_revisions_changing_file() ultimately asserts that the file test_file was changed in revisions [5, 4, 3, 2] (see remark (*)). Notice the test SVNTest.test_commit_text_parsing() commits a change this file. Suppose SVNTest.test_commit_text_parsing() is run before SVNTest.test_revisions_changing_file(). Then SVNTest.test_revisions_changing_file() will fail because the revision list for file test_file is [6, 5, 4, 3, 2] != [5, 4, 3, 2].

(*) This revision list was built up by in SVNTestRepository._setup_test_commits(). Notice, the first revision in the repo is the creation of the directory trunk.

> Wouldn't that be even faster?

Yes, it would be faster if we just copied the SVN checkout. We need to take care of when we do this so that one test doesn't affect another test. See the aforementioned discussion above.

> 
> > Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:183
> > +        shutil.copytree(cached_svn_repo_path, test_object.svn_repo_path)
> 
> Maybe you are just copying?  I'm not sure I understand what's going on here.

Yes, this line copies the SVN repository directory, located at cached_svn_repo_path, to a temporary directory at test_object.svn_repo_path. As aforementioned, we make a copy of the repo directory instead of the checkout directory because it's the repo that holds all of the state. By creating a copy of the repo before each test we can ensure that each test starts off with a repo with 5 test commits (see SVNTestRepository._setup_test_commits())

> 
> > Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:193
> > -        run_command(['svnadmin', 'create', '--pre-1.5-compatible', test_object.svn_repo_path])
> > +        run_command(['svnadmin', 'create', '--pre-1.5-compatible', svn_repo_path])
> 
> I wonder why we're still passing the 1.5-compat flag here?

I take it that we pass this compatibility flag to support Subversion 1.4 under Mac OS 10.4 Tiger as documented on <http://www.webkit.org/building/checkout.html>. Since support for Tiger has been phased out we can remove this --pre-1.5-compatible flag and update the WebKit.org checkout page. I suggest making such changes in a separate bug.
Comment 8 Daniel Bates 2011-07-04 22:53:01 PDT
(In reply to comment #6)
> It looks like svn setup is still hot. And git import from the svn is also still hot.  Should we look at caching the git one as well?

Yes, we should look into this further.
Comment 9 Eric Seidel (no email) 2011-07-04 23:07:02 PDT
I see.  I forgot that we actually pushed to the svn "server".

We could copy both the server and checkout to save the checkout.  But you're right, you might have to do an svn switch to change the url to point to the newly copied checkout.  I'm not sure how much faster a cp + svn switch would be vs. just a normal svn checkout.

This patch is a huge win and I'm happy to have you land it as-is.  I'm also happy to review any further revisions you'd like to post.  Completely up to you.
Comment 10 Eric Seidel (no email) 2011-07-04 23:09:44 PDT
I also just realized we could consider using function decorators to do the setup instead of using setup/teardown.

See memoized.py for an example of a function decorator declared in webkitpy.

That would allow us to do things like:

@svncheckout

@readonlysvncheckout

@gitcheckout

etc, and instead of having class-wide setup we could have per-function setup.

I suspect this is completely orthogonal to the change under consideration, but it just occurred to me and so I thought I'd record it.
Comment 11 Daniel Bates 2011-07-06 11:32:29 PDT
(In reply to comment #5)
> Could we instead use a global instance of some class with a destructor?

I'm unclear how to accomplish this given Python's sequencing of destructors. We eventually need some functionality to actually delete the directory and this functionality may not be available when we're called: "...when __del__() is invoked in response to a module being deleted (e.g., when execution of the program is done), other globals referenced by the __del__() method may already have been deleted or in the process of being torn down (e.g. the import machinery shutting down)." (<http://docs.python.org/reference/datamodel.html#customization>) 

> I'm not sure if destructors are run at shutdown.

From reading <http://docs.python.org/reference/datamodel.html#customization>, it states that "It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits."

> It feels like our current hack to implement this is poorly encapsulated.

Yes, I'm not happy with the atexit() and global variable usage. I am open to other suggestions on how to encapsulate this cleanup functionality.
Comment 12 Eric Seidel (no email) 2011-07-06 11:35:56 PDT
Comment on attachment 99653 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=99653&action=review

> Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:75
> +# We cache the mock SVN repo so that we don't create it again for each call to an SVNTest or GitTest test_ method.
> +# We store it in a global variable so that we can delete this cached repo on exit(3).
> +cached_svn_repo_path = None
> +
> +
> +def remove_dir(path):
> +    # Change directory to / to ensure that we aren't in the directory we want to delete.
> +    os.chdir('/')
> +    shutil.rmtree(path)
> +
> +
> +@atexit.register
> +def delete_cached_mock_repo_at_exit():
> +    if cached_svn_repo_path:
> +        remove_dir(cached_svn_repo_path)

Just add a FIXME That this all goes away when we move to 2.7.  LGTM.
Comment 13 Daniel Bates 2011-07-06 11:37:51 PDT
(In reply to comment #12)
> > Tools/Scripts/webkitpy/common/checkout/scm/scm_unittest.py:75
> > +# We cache the mock SVN repo so that we don't create it again for each call to an SVNTest or GitTest test_ method.
> > +# We store it in a global variable so that we can delete this cached repo on exit(3).
> > +cached_svn_repo_path = None
>> [...]
> > +@atexit.register
> > +def delete_cached_mock_repo_at_exit():
> > +    if cached_svn_repo_path:
> > +        remove_dir(cached_svn_repo_path)
> 
> Just add a FIXME That this all goes away when we move to 2.7.  LGTM.

Will do. Thanks for the review.
Comment 14 Daniel Bates 2011-07-06 12:10:42 PDT
Committed r90480: <http://trac.webkit.org/changeset/90480>