WebKit Bugzilla
Bug 65418 - RESOLVED FIXED
webkit-patch needs to be able to "optimize" the storage of baselines on disk
https://bugs.webkit.org/show_bug.cgi?id=65418
Adam Barth
Reported
2011-07-30 16:04:40 PDT
webkit-patch needs to be able to "optimize" the storage of baselines on disk
Attachments
works, needs tests (7.82 KB, patch), 2011-07-30 16:08 PDT, Adam Barth, no flags
Updated with support for invisible constraints (9.21 KB, patch), 2011-07-30 17:28 PDT, Adam Barth, no flags
Updated with smarter heuristic (10.41 KB, patch), 2011-07-31 23:09 PDT, Adam Barth, no flags
Patch (19.12 KB, patch), 2011-08-01 15:57 PDT, Adam Barth, no flags
Patch for landing (20.08 KB, patch), 2011-08-01 16:08 PDT, Adam Barth, no flags
Adam Barth
Comment 1
2011-07-30 16:08:13 PDT
Created attachment 102443 [details]: works, needs tests
Adam Barth
Comment 2
2011-07-30 16:10:35 PDT
It's slightly unclear to me what algorithm to use to compute the global optimum placement of results in the tree. This patch has a very simple heuristic that worked on a couple examples I tried. I'm uploading this patch to checkpoint my work. (Obviously tests are needed!) Thoughts on a better algorithm?
Adam Barth
Comment 3
2011-07-30 17:26:34 PDT
Interestingly, we need to take into account mac-future and the mysterious qt-unknown results in order not to over-optimize. mac-future and qt-unknown are invisible to us, but they prevent us from performing certain optimizations.
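A minimal sketch of the constraint, with illustrative (not actual) fallback paths: the optimizer cannot read results for the invisible ports, but any re-allocation must still leave each of them resolving to the same baseline.

FALLBACK_PATHS = {
    'mac-snowleopard': ['platform/mac-snowleopard', 'platform/mac', ''],
    'mac-future': ['platform/mac-future', 'platform/mac', ''],  # invisible to the optimizer
    'qt-unknown': ['platform/qt-unknown', 'platform/qt', ''],   # invisible to the optimizer
}

def effective_directory(port, directories_with_result):
    """Return the first directory on the port's fallback path that holds a baseline."""
    for directory in FALLBACK_PATHS[port]:
        if directory in directories_with_result:
            return directory
    return None

# A move of a baseline is only safe if effective_directory() still points at a
# directory with the same content for every port above, including the invisible ones.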
Adam Barth
Comment 4
2011-07-30 17:28:19 PDT
Created attachment 102445 [details]: Updated with support for invisible constraints
Dimitri Glazkov (Google)
Comment 5
2011-07-31 07:36:46 PDT
Comment on attachment 102445 [details]: Updated with support for invisible constraints

Freakish stuff.
Adam Barth
Comment 6
2011-07-31 23:09:23 PDT
Created attachment 102481 [details]: Updated with smarter heuristic
Adam Barth
Comment 7
2011-07-31 23:10:47 PDT
I ran into some cases where the old heuristic failed. This updated version of the patch applies the allocation strategy in a loop, hoping to reduce the set of unsatisfied ports to zero. I suspect this algorithm can now go into an infinite loop if it can't allocate everything, but that hasn't happened yet.
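A sketch of that loop structure, assuming a hypothetical try_to_place_results callback that returns the set of ports it satisfied on each pass; an explicit progress check turns the potential infinite loop into a clean failure.

def allocate_until_done(unsatisfied_ports, try_to_place_results):
    while unsatisfied_ports:
        newly_satisfied = try_to_place_results(unsatisfied_ports)
        if not newly_satisfied:
            break  # No progress this pass; stop rather than loop forever.
        unsatisfied_ports = unsatisfied_ports - newly_satisfied
    return unsatisfied_ports  # An empty set means every port was satisfied.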
Adam Barth
Comment 8
2011-08-01 15:57:28 PDT
Created attachment 102581 [details]: Patch
Dimitri Glazkov (Google)
Comment 9
2011-08-01 16:02:15 PDT
Comment on attachment 102581 [details]: Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=102581&action=review

> Tools/Scripts/webkitpy/common/checkout/baselineoptimizer.py:52
> +def _invert_dictionarty(dictionary):
Did you mean to suffix it with "arty"? I mean, it's pretty arty, but just want to make sure.

> Tools/Scripts/webkitpy/common/checkout/baselineoptimizer_unittest.py:47
> +class BaselineOptimizerTest(unittest.TestCase):
The tests seem skimpy, given the glorious complexity of the code above.
Adam Barth
Comment 10
2011-08-01 16:08:14 PDT
(In reply to comment #9)
> (From update of attachment 102581 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=102581&action=review
>
> > Tools/Scripts/webkitpy/common/checkout/baselineoptimizer.py:52
> > +def _invert_dictionarty(dictionary):
>
> Did you mean to suffix it with "arty"? I mean, it's pretty arty, but just want to make sure.

Fixed.

> > Tools/Scripts/webkitpy/common/checkout/baselineoptimizer_unittest.py:47
> > +class BaselineOptimizerTest(unittest.TestCase):
>
> The tests seem skimpy, given the glorious complexity of the code above.
Ok. I've added a couple more tricky cases.
Adam Barth
Comment 11
2011-08-01 16:08:34 PDT
Created attachment 102585 [details]: Patch for landing
WebKit Review Bot
Comment 12
2011-08-01 17:07:29 PDT
Comment on attachment 102585 [details]: Patch for landing

Clearing flags on attachment: 102585

Committed r92153: <http://trac.webkit.org/changeset/92153>
WebKit Review Bot
Comment 13
2011-08-01 17:07:36 PDT
All reviewed patches have been landed. Closing bug.
Dirk Pranke
Comment 14
2011-08-08 18:52:24 PDT
Hm. Okay, I wasn't a math major, but I would consider myself at least reasonably conversant in graph theory, and yet I have not encountered the concept of a "hypergraph" before. Further, the description in Wikipedia did not make things particularly clear to me. Would it be possible to get some explanation of (a) why the fallback path is a hypergraph and/or (b) what the algorithm is intended to be, as comments in either this bug or the code itself?

As far as (b) goes, I can accept the answer that it's not possible to summarize the algorithm/heuristic any more simply than the code does itself, in which case I can take another look at it. This is more intended to be the sort of feedback of "wow, this code is non-obvious and I'd really want more comments on it than it has".

I would also be curious to know what sort of improvement this optimizing gives us over the simpler deduplicating algorithms we have used in the past.
Adam Barth
Comment 15
2011-08-08 21:47:33 PDT
> Would it be possible to get some explanation of (a) why the fallback path is a hypergraph and/or
There are lots of different representations of hypergraphs. One that maps nicely onto this problem is to imagine traversing a graph, but that the edges available to add to your path at any given vertex depend on which edges you've already added to your path. In a normal graph, path construction is history-independent, but in a hypergraph, it is history dependent. It's easy to see that the fallback data structure is a hypergraph. Consider constructing a path that transits the "chromium" node in the graph (e.g., a fallback path that involves LayoutTests/platform/chromium). Which edges are available depends on how you got to this vertex. For example, if you came via chromium-mac, then an edge to mac-snowleopard is available, but if you came via chromium-win, then it is not.
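The same point in code, using illustrative fallback paths rather than the real WebKit configuration: both chromium ports transit platform/chromium, but the directories reachable from there depend on which port's path brought you in.

FALLBACK_PATHS = {
    'chromium-mac': ['platform/chromium-mac', 'platform/chromium',
                     'platform/mac-snowleopard', 'platform/mac', ''],
    'chromium-win': ['platform/chromium-win', 'platform/chromium',
                     'platform/win', ''],
}

def reachable_after(port, directory):
    """Directories still ahead of |directory| on this port's fallback path."""
    path = FALLBACK_PATHS[port]
    return path[path.index(directory) + 1:]

print(reachable_after('chromium-mac', 'platform/chromium'))  # ['platform/mac-snowleopard', 'platform/mac', '']
print(reachable_after('chromium-win', 'platform/chromium'))  # ['platform/win', '']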
> (b) what the algorithm is intended to be as comments in either this bug or the code itself?
I'm happy to make the code more readable. The outline of the algorithm is as follows:

1) Compute the operative expectations for every port (e.g., by walking all the fallback paths).
2) Compute the optimal directories to store expectations in the fallback hypergraph such that the operative expectations stay the same.
3) Compare the operative expectations from before and after the re-allocation to make sure we haven't screwed anything up.
4) Move the expected results around to realize the new storage locations.

That much of the algorithm is fairly straightforward. The tough part is step (2). For that, we take all the ports that have the same expected result, compute the set of directories they have in common, and then attempt to store the result in the most specific such directory (based on some scoring algorithm). Repeat this process until you've either satisfied all the ports or you fail to make progress.

This algorithm is just a heuristic. There's no guarantee that it works in the general case, which is why step (3) is important. It seems to work reasonably well on the examples I've tried, but I expect we'll need to refine it over time.
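A rough sketch of step (2) only, not the webkitpy implementation: the data shapes (port to result digest, port to ordered fallback directories) and the depth-based specificity score are assumptions for illustration.

def choose_directories(results_by_port, fallback_paths):
    placements = {}  # directory -> result digest
    unsatisfied = set(results_by_port)
    while unsatisfied:
        progressed = False
        for digest in set(results_by_port[port] for port in unsatisfied):
            ports = [port for port in unsatisfied if results_by_port[port] == digest]
            shared = set.intersection(*(set(fallback_paths[port]) for port in ports))
            candidates = [directory for directory in shared if directory not in placements]
            if not candidates:
                continue
            # Crude stand-in for "most specific": prefer deeper directories.
            best = max(candidates, key=lambda directory: directory.count('/'))
            placements[best] = digest
            unsatisfied -= set(ports)
            progressed = True
        if not progressed:
            break  # The heuristic stalled; step (3)'s before/after comparison catches mistakes.
    return placements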
> As far as (b) goes, I can accept the answer that it's not possible to summarize the algorithm/heuristic any more simply than the code does itself, in which case I can take another look at it.
What's more important is the test cases that show the before/after allocations. I'm not that keen on adding comments because I expect the algorithm will change over time as we learn more.
> This is more intended to be the sort of feedback that "wow this code is non-obvious and I'd really want more comments on it than it has".
Yes, this algorithm is very non-obvious. It's probably one of the trickier algorithms I've written in a while.
> I would also be curious to know what sort of improvements this optimizing gives us over the simpler deduplicating algorithms we have used in the past?
This algorithm is massively better than any previous algorithm we've had. For example, here's a patch that deletes tons of redundant baselines:
http://trac.webkit.org/changeset/92648
I've been generating many patches like that as I've been testing the algorithm. The main advantage that this algorithm has over its predecessors is that it knows how to *move* baselines around, whereas deduplicate-tests only knows how to delete results and rebaseline-chromium-whatever only knew how to not add redundant results. I would have preferred to change the fallback system to not be insanely complicated, which would have made the optimization algorithm trivial, but webkit-dev wasn't too keen on that idea. Rather than argue about it, I just made this algorithm smart and now I don't have to worry about it.
Dirk Pranke
Comment 16
2011-08-09 09:22:44 PDT
(In reply to comment #15)
> > Would it be possible to get some explanation of (a) why the fallback path is a hypergraph and/or
>
> There are lots of different representations of hypergraphs. One that maps nicely onto this problem is to imagine traversing a graph, but that the edges available to add to your path at any given vertex depend on which edges you've already added to your path. In a normal graph, path construction is history-independent, but in a hypergraph, it is history dependent.

Thanks, that makes sense, but I didn't get that from Wikipedia at all :(
> That much of the algorithm is fairly straightforward. The tough part is step (2). For that, we take all the ports that have the same expected result, compute the set of directories they have in common, and then attempt to store the result in the most specific such directory (based on some scoring algorithm). Repeat this process until you've either satisfied all the ports or you fail to make progress.
>
> This algorithm is just a heuristic. There's no guarantee that it works in the general case, which is why step (3) is important. It seems to work reasonably well on the examples I've tried, but I expect we'll need to refine it over time.
Hm. I will stare at this part of the code further. Agreed that this is the tough part :)
> What's more important is the test cases that show the before/after allocations. I'm not that keen on adding comments because I expect the algorithm will change over time as we learn more.

I agree that the test cases are the main thing; however, reading them to figure out what is actually happening can be hard and error-prone (since the paths and checksums are long). If you look at what I did in rebaseline-chromium-whatever's unit tests, I tried to add short comments indicating what and maybe a little why, but not how.
> This algorithm is massively better than any previous algorithm we've had.
No question, the previous algorithms were pretty simple. Do you have anything like aggregate stats for the tree, e.g., we had 1000 duplicated files before, 500 after? Thanks for explaining!
Adam Barth
Comment 17
2011-08-09 09:35:29 PDT
No stats yet. I'm optimizing a block at a time to test the algorithm. Basically any test that has the same results on chromium-mac and chromium-win will have redundant results today.
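A hypothetical helper for spotting that kind of redundancy; the directory layout and the "-expected.txt" suffix are illustrative.

import hashlib
import os

def has_redundant_chromium_baseline(test_name, root='LayoutTests'):
    """Return True if chromium-mac and chromium-win store byte-identical baselines."""
    paths = [os.path.join(root, 'platform', port, test_name + '-expected.txt')
             for port in ('chromium-mac', 'chromium-win')]
    if not all(os.path.exists(path) for path in paths):
        return False
    digests = [hashlib.sha1(open(path, 'rb').read()).hexdigest() for path in paths]
    return digests[0] == digests[1]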