Bug 81379

Summary: NRWT should not take memory used as disk cache into account when deciding how many processes to launch
Product: WebKit Reporter: Tim Horton <thorton>
Component: Tools / TestsAssignee: Dirk Pranke <dpranke>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, dpranke, eric, jberlin, lforschler, ojan, webkit.review.bot
Priority: P2 Keywords: NRWT
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
patch for landing - update tests none

Tim Horton
Reported 2012-03-16 11:49:17 PDT
On OS X: > run-webkit-tests fast This machine could support 24 child processes, but only has enough memory for 13. > purge > run-webkit-tests fast This machine could support 24 child processes, but only has enough memory for 23. "purge" empties purgeable disk cache; this memory shouldn't be included when measuring how many children NRWT can make, as it will quickly evict itself if and when DRT/WKTR want it. (please don't solve this by purging my disk cache :D)
Attachments
Patch (10.35 KB, patch)
2012-06-18 19:09 PDT, Dirk Pranke
no flags
patch for landing - update tests (15.92 KB, patch)
2012-06-19 11:20 PDT, Dirk Pranke
no flags
Dirk Pranke
Comment 1 2012-03-16 11:58:33 PDT
Eric's memory guessing game strikes again ...
Eric Seidel (no email)
Comment 2 2012-03-16 12:02:38 PDT
I don't like the memory guessing game. You're welcome to change it however you see fit. It was added to make the Leopard Mac build bots (which only have 3GB of memory) not hang themselves.
Eric Seidel (no email)
Comment 3 2012-03-16 12:05:52 PDT
Eric Seidel (no email)
Comment 4 2012-03-16 12:09:14 PDT
A slightly better heuristic would be for us to keep launching child processes while we believe we have enough memory and free cores for doing so. The question that this bug is really asking, is how to get the idea of "free memory" from Mac OS X, which it turns out is neigh impossible. :) But checking to see if we have enough memory before we launch each child woudl be more robust than the current solution which checks up-front, and guesses how many max children it should use based on that number.
Dirk Pranke
Comment 5 2012-03-16 12:16:31 PDT
remind me what was wrong with just using total memory?
Eric Seidel (no email)
Comment 6 2012-03-16 13:51:19 PDT
We can look through the bugs to see. I think the concern was that the OS takes up an appreciable fraction of low-memory systems, so we'd have to use a very large "DRT size" approximation in order to avoid overwhelming those systems, but using a very large "DRT size" would cause us to not run enough processes on systems with a smaller percentage used by the OS? But using "free" memory doesn't work, because supposedly OS X tries hard to never have any free memory (since we have idle memory sitting around). Which is how we ended up using the current "inactive pages", iirc. You'd have to look through the bug/commit list to know more.
Dirk Pranke
Comment 7 2012-03-16 14:03:23 PDT
(In reply to comment #6) > You'd have to look through the bug/commit list to know more. I did; it didn't tell me anything, but what you say makes sense. IIRC, it was something like trying to run 12 DRTs on a machine w/ 3GB of memory or something. Perhaps instead of playing games w/ free mem, we should just take (total mem - 1 GB) / 512MB and try that instead.
Eric Seidel (no email)
Comment 8 2012-03-16 14:07:18 PDT
You're welcome to experiment with the heuristic however you like. I suspect hte first thing that should be to change the "this machine could support..." log to explain how to override that with --child-processes=, so that people have immediate recourse. I doubt we'll ever be able to make all users happy with a heuristic here, but getting the 90% is the goal.
Eric Seidel (no email)
Comment 9 2012-04-26 15:23:10 PDT
I suspect our best long-term approach is dynamically launch DRTs as needed each 100 tests or so. We wouldn't then know how many we would launch up-front, we'd just launch more until we hit the max number of processors or something. And stop if we ever thought we would run out of memory.
Dirk Pranke
Comment 10 2012-06-18 19:09:43 PDT
Eric Seidel (no email)
Comment 11 2012-06-19 09:00:06 PDT
Comment on attachment 148225 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=148225&action=review I predict that you'll find folks who complain about this change. :) But it's fine with me. I don't believe it's possible to make everyone happy here. I might suggesting just changing the int/long thing first and seeing if that makes more people happy. :) > Tools/Scripts/webkitpy/common/system/platforminfo.py:86 > - return int(self._executive.run_command(["sysctl", "-n", "hw.memsize"])) > + return long(self._executive.run_command(["sysctl", "-n", "hw.memsize"])) I wonder if this fix alone would be enough. :) > Tools/Scripts/webkitpy/layout_tests/port/mac.py:131 > + overhead = 2048 * 1024 * 1024 # Assume we need 2GB free for the O/S This is similar to what I originally implemented when I first added this feature. Didn't work very well on loaded machines with other things running. But none of these heuristics work well in all cases. I'm happy to use whatever heuristics makes the largest % of people happy. If this is that, then great. :)
Dirk Pranke
Comment 12 2012-06-19 09:21:18 PDT
(In reply to comment #11) > (From update of attachment 148225 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=148225&action=review > > I predict that you'll find folks who complain about this change. :) But it's fine with me. I don't believe it's possible to make everyone happy here. > > I might suggesting just changing the int/long thing first and seeing if that makes more people happy. :) > > > Tools/Scripts/webkitpy/common/system/platforminfo.py:86 > > - return int(self._executive.run_command(["sysctl", "-n", "hw.memsize"])) > > + return long(self._executive.run_command(["sysctl", "-n", "hw.memsize"])) > > I wonder if this fix alone would be enough. :) > Could be. Thanks for the review. We'll see how this goes.
Dirk Pranke
Comment 13 2012-06-19 11:20:18 PDT
Created attachment 148367 [details] patch for landing - update tests
Dirk Pranke
Comment 14 2012-06-19 11:20:46 PDT
Comment on attachment 148367 [details] patch for landing - update tests apparently I never ran the unit tests on this ... fixed now.
Dirk Pranke
Comment 15 2012-06-19 13:47:24 PDT
Note You need to log in before you can comment on or make changes to this bug.