Bug 163470
| Summary: | run-webkit-tests consumes gigabytes of memory with --iterations 4294967300 | | |
|---|---|---|---|
| Product: | WebKit | Reporter: | David Kilzer (:ddkilzer) <ddkilzer> |
| Component: | Tools / Tests | Assignee: | Nobody <webkit-unassigned> |
| Status: | NEW | | |
| Severity: | Normal | CC: | ap, dean_johnson, lforschler, webkit-bug-importer |
| Priority: | P2 | Keywords: | InRadar |
| Version: | Safari 10 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
David Kilzer (:ddkilzer)
run-webkit-tests consumes gigabytes of memory with --iterations 4294967300:
```
$ ./Tools/Scripts/run-webkit-tests --release --no-build -1 --iterations 4294967300 --no-sample-on-timeout --no-timeout --child-processes=1 --batch-size=4294967300 --no-show-results compositing/color-matching/pdf-image-match.html
```
Are we hitting pathological behavior by specifying that many iterations?

David Kilzer (:ddkilzer)
<rdar://problem/28783669>
Dean Johnson
I would suspect the major issue here being this function:
OpenSource/Tools/Scripts/webkitpy/layout_tests/controllers/manager.py

```python
class Manager(object):
    ...
    def _get_test_inputs(self, tests_to_run, repeat_each, iterations):
        test_inputs = []
        for _ in xrange(iterations):
            for test in tests_to_run:
                for _ in xrange(repeat_each):
                    test_inputs.append(self._test_input_for_file(test))  # This line
        return test_inputs
```
Since it builds up all test_inputs before the tests are actually run, you'll see multiple gigabytes of data stored in memory for large iteration counts.
Is this really an issue? When would we ever run millions of iterations? If we do need to support this for some reason, can we cap it at 1000000?
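For a sense of scale, a rough back-of-the-envelope sketch of what the list-building approach would allocate for the command above (the per-entry byte size is an assumption for illustration; real TestInput objects are likely larger):

```python
# One test file on the command line, default repeat_each of 1.
iterations = 4294967300
num_tests = 1
repeat_each = 1

# Number of TestInput objects the list would hold before any test runs.
entries = iterations * num_tests * repeat_each

# Assuming ~100 bytes per entry (hypothetical lower bound), size in GiB:
approx_gib = entries * 100 / 2**30
print(entries, round(approx_gib))  # billions of entries, hundreds of GiB
```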
David Kilzer (:ddkilzer)
We could also make a smarter data structure that uses memory more efficiently, such as an iterator object that just returns the next test when asked, but internally stores the repetitive iteration state.
This is not a critical bug to fix, but I wanted to document the behavior that I saw when I ran into it.
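One way to sketch such an iterator without a hand-written class is to compose it from itertools. This is a standalone illustration, not the actual WebKit code; the function name is hypothetical:

```python
import itertools

def iter_test_inputs(tests_to_run, repeat_each, iterations):
    # One lazy pass over tests_to_run, each test repeated repeat_each times.
    one_pass = lambda: itertools.chain.from_iterable(
        itertools.repeat(test, repeat_each) for test in tests_to_run)
    # Chain `iterations` fresh passes together; nothing is materialized,
    # so memory stays constant regardless of the iteration count.
    return itertools.chain.from_iterable(
        one_pass() for _ in range(iterations))

# Usage: items are produced one at a time, on demand.
stream = iter_test_inputs(['a.html', 'b.html'], repeat_each=2, iterations=2)
print(next(stream))  # 'a.html'
```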
Dean Johnson
Python has a language feature called "generators" that works very much like what you describe.
As an example, the code I pasted before could be rewritten as follows, which evaluates each "test" at the time it is accessed instead of generating the whole set beforehand:
OpenSource/Tools/Scripts/webkitpy/layout_tests/controllers/manager.py

```python
class Manager(object):
    ...
    def _get_test_inputs_LIST(self, tests_to_run, repeat_each, iterations):
        test_inputs = []
        for _ in xrange(iterations):
            for test in tests_to_run:
                for _ in xrange(repeat_each):
                    test_inputs.append(self._test_input_for_file(test))  # This line
        return test_inputs

    def _get_test_inputs_GENERATOR(self, tests_to_run, repeat_each, iterations):
        for _ in xrange(iterations):
            for test in tests_to_run:
                for _ in xrange(repeat_each):
                    yield self._test_input_for_file(test)
```
Now, calling _get_test_inputs_GENERATOR(args) gives you a generator object, which evaluates and returns the next item in the "list" at access time. This *should* take the memory consumption from O(NUM_ITERATIONS * NUM_TESTS * REPEAT_EACH) down to O(1).
The only reason I have not just written a patch for this is that I suspect we use _get_test_inputs in places that iterate over the result more than once; a generator can only be consumed once, so adopting the new paradigm naively could break existing test infrastructure.
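The single-pass pitfall is easy to demonstrate in isolation: once a generator has been exhausted, iterating it again silently yields nothing, so any caller that walks the result twice would see an empty second pass. A standalone sketch:

```python
def numbers():
    # Minimal generator, standing in for _get_test_inputs_GENERATOR.
    for i in range(3):
        yield i

gen = numbers()
first_pass = list(gen)   # [0, 1, 2]
second_pass = list(gen)  # [] -- the generator is already exhausted
print(first_pass, second_pass)
```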