Google's page cycler and Apple's PLT test suites are both private due to copyright law restrictions, and people outside of these two organizations cannot see the contents. This severely limits the utility and the effectiveness of these test suites because not all contributors can run them locally. We should add a publicly distributable version of these two test suites that tests page loading time of real websites. The idea here is to load pages out of archive.org. Because archive.org provides a content snapshot at a specific time, we can always pull the same data by simply specifying the URL, eliminating the need for distributing the page contents with the test suite but still allowing all contributors to obtain the same page test data when running the tests. Credit: Greg Simon for this clever idea. In order to avoid DoS'ing archive.org, we can use web-page-replay: http://code.google.com/p/web-page-replay/ to cache data locally and make any other necessary modifications to the page for the purpose of performance tests.
Created attachment 137280 [details] Adds basic replay mechanism
Everything in Tools/WebPageReplay has been imported from http://code.google.com/p/web-page-replay/. I still need to do some polishing but the basic machinery is there already. You can add a bunch of URLs to Replay/replaytest.list and run-perf-tests will automatically create replay tests for you :)
Created attachment 137283 [details] Missing replaytest.py Note: to use this patch, you have to open System Preferences on the Mac and set up HTTP/HTTPS proxies on ports 8080 and 8413.
(In reply to comment #2)
> Everything in Tools/WebPageReplay has been imported from http://code.google.com/p/web-page-replay/.

Is there any chance that it will work on Linux if I install the dependencies?

> I still need to do some polishing but the basic machinery is there already. You can add a bunch of URLs to Replay/replaytest.list and run-perf-tests will automatically create replay tests for you :)

Hmm, is ResourceLoadDelegate.mm the only Mac-specific change?

Btw, do you have a more recent version of the patch, or is the latest one already here? :)

Thanks!
(In reply to comment #4)
> > I still need to do some polishing but the basic machinery is there already. You can add a bunch of URLs to Replay/replaytest.list and run-perf-tests will automatically create replay tests for you :)
>
> Hmm, is ResourceLoadDelegate.mm the only Mac-specific change?

I suspect a similar change needs to be made on all ports.

> Btw, do you have a more recent version of the patch, or is the latest one already here? :)

I'm working on it now.
Created attachment 140121 [details] work in progress Something is broken on the web-page-replay side, and I can't get it to work. Every request results in a 404 :(
Created attachment 140122 [details] work in progress (got rid of the useless change log entry)
Created attachment 141276 [details] work in progress
Here's a list of 60+ popular web pages (excluding ad providers, porn, etc.) that were available on archive.org:
http://web.archive.org/web/20110729013436/http://www.google.com/
http://web.archive.org/web/20110101045027/http://www.youtube.com/
http://web.archive.org/web/20101029215059/http://www.yahoo.com/
http://web.archive.org/web/20110726050652/http://www.baidu.com/
http://web.archive.org/web/20110713210358/http://news.baidu.com/
http://web.archive.org/web/20110729021258/http://en.wikipedia.org/wiki/Main_Page
http://web.archive.org/web/20110716004235/http://en.wikipedia.org/wiki/The_Beatles
http://web.archive.org/web/20110131013707/http://www.qq.com/
http://web.archive.org/web/20110729032327/http://www.amazon.com/
http://web.archive.org/web/20110729033222/http://googleblog.blogspot.com/
http://web.archive.org/web/20100209183352/http://www.taobao.com/index_global.php
http://web.archive.org/web/20110726224242/http://www.sina.com.cn/
http://web.archive.org/web/20110728224518/http://www.yahoo.co.jp/
http://web.archive.org/web/20110728062744/http://www.msn.com/
http://web.archive.org/web/20110729040307/http://wordpress.com/
http://web.archive.org/web/20110721134904/http://en.blog.wordpress.com/
http://web.archive.org/web/20110721020325/http://www.google.com.hk/
http://web.archive.org/web/20110729062331/http://www.ebay.com/
http://web.archive.org/web/20110721133155/http://www.yandex.ru/
http://web.archive.org/web/20110720113906/http://www.163.com/
http://web.archive.org/web/20110728224522/http://weibo.com/
http://web.archive.org/web/20110426114921/http://weibo.com/yaochen
http://web.archive.org/web/20110630155134/http://www.bing.com/news
http://web.archive.org/web/20110712090452/http://windows.microsoft.com/en-US/windows/home
http://web.archive.org/web/20110723131535/http://news.soso.com/
http://web.archive.org/web/20110721002024/http://staff.tumblr.com/
http://web.archive.org/web/20110728143621/https://www.paypal.com/
http://web.archive.org/web/20110608070205/https://www.apple.com/
http://web.archive.org/web/20110729034928/http://www.google.ru/
http://web.archive.org/web/20110728224803/http://www.sohu.com/
http://web.archive.org/web/20110725235527/http://sfbay.craigslist.org/
http://web.archive.org/web/20110624121311/http://sfbay.craigslist.org/apa/
http://web.archive.org/web/20110728165802/http://www.imdb.com/
http://web.archive.org/web/20110729013439/http://www.bbc.co.uk/
http://web.archive.org/web/20110728190248/http://www.tudou.com/
http://web.archive.org/web/20100613214720/http://sl65amg.blog32.fc2.com/
http://web.archive.org/web/20110708184935/http://blog.goo.ne.jp/isehakusandou
http://web.archive.org/web/20110721233757/http://www.ifeng.com/
http://web.archive.org/web/20110728092214/http://www.ask.com/
http://liveweb.archive.org/http://www.ask.com/wiki
http://web.archive.org/web/20110726103516/http://www.youku.com/
http://web.archive.org/web/20110710095406/http://imgur.com/gallery
http://web.archive.org/web/20110729013512/http://www.cnn.com/
http://web.archive.org/web/20110716175917/http://www.hao123.com/
http://web.archive.org/web/20110728094718/http://www.aol.com/
http://web.archive.org/web/20110728190308/http://espn.go.com/
http://web.archive.org/web/20110724020539/http://www.alibaba.com/
http://web.archive.org/web/20110725075627/http://www.avg.com/us-en/homepage
http://web.archive.org/web/20110723060303/http://www.rakuten.co.jp/
http://web.archive.org/web/20110707075549/http://www.about.com/travel/
http://web.archive.org/web/20110727034228/http://www.chinaz.com/
http://web.archive.org/web/20110729082516/http://wordpress.org/
http://web.archive.org/web/20110722032538/http://official.ameba.jp/ranking/day/officialAccessRankingTop.html
http://web.archive.org/web/20110709073905/https://www.alipay.com/
http://web.archive.org/web/20110722061518/http://www.uol.com.br/
http://web.archive.org/web/20110721223556/http://www.amazon.co.jp/
http://web.archive.org/web/20110512210519/http://stackoverflow.com/
http://web.archive.org/web/20110729043303/http://www.huffingtonpost.com/
http://web.archive.org/web/20100627183250/http://www.cnet.com/
http://web.archive.org/web/20110416045031/http://www.dailymotion.com/us
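For anyone adding to the list: almost every entry has the shape http://web.archive.org/web/<14-digit timestamp>/<original URL>, which is what pins the content to one snapshot. Below is a minimal sketch of building and splitting such URLs. The helper names (snapshot_url, parse_snapshot_url) are hypothetical and not part of any patch on this bug, and the 14-digit timestamp pattern is an assumption read off the entries in the list.

```python
import re

# Assumption: snapshot URLs look like the entries in the list on this bug,
# i.e. a 14-digit YYYYMMDDhhmmss timestamp followed by the original URL.
WAYBACK_PREFIX = 'http://web.archive.org/web/'
_SNAPSHOT_RE = re.compile(r'^https?://web\.archive\.org/web/(\d{14})/(.+)$')


def snapshot_url(timestamp, original_url):
    """Build a snapshot URL from a timestamp string and the original URL."""
    return '%s%s/%s' % (WAYBACK_PREFIX, timestamp, original_url)


def parse_snapshot_url(url):
    """Return (timestamp, original_url), or None if url is not a pinned snapshot."""
    match = _SNAPSHOT_RE.match(url)
    if not match:
        return None
    return match.group(1), match.group(2)
```

A check like parse_snapshot_url(url) is None would flag the liveweb.archive.org entry, which carries no timestamp and so may not be pinned to a fixed snapshot the way the others are.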
Created attachment 144180 [details] work in progress 4
It appears that some of the pages can't be loaded properly in DRT using web-page-replay.
Created attachment 144957 [details] Initial implementation
Comment on attachment 144957 [details] Initial implementation

View in context: https://bugs.webkit.org/attachment.cgi?id=144957&action=review

Patch generally looks good, just some minor comments

> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:533
> + command = driver_input.test_name

Seems like this should just go in is_http_test()? I understand that you might be thinking 'that's just for tests in the http/ directory', but it's probably more confusing this way.

> Tools/Scripts/webkitpy/performance_tests/perftest.py:72
> + if not output or output.text == None or output.error:

when would output be None here? driver.run_test() should always return something; can you call run_failed() outside of run()?

> Tools/Scripts/webkitpy/performance_tests/perftest.py:176
> + def run(self, port, driver, time_out_ms):

Does it make more sense to pass the port to __init__()?

> Tools/Scripts/webkitpy/performance_tests/perftest.py:210
> + return driver.run_test(DriverInput(self.path_or_url(), time_out_ms, None, False))

Maybe this should go up into PerfTest() and then PerfTest.run() can just call this to avoid the repetition?

> Tools/Scripts/webkitpy/performance_tests/perftest.py:219
> + replay_path = os.path.join(os.path.dirname(webkitpy.thirdparty.autoinstalled.webpagereplay.__file__), 'replay.py')

can you get this from just webkitpy.thirdparty.autoinstalled.webpagereplay.replay.__file__ ?

> Tools/Scripts/webkitpy/performance_tests/perftest.py:230
> + connection = socket.create_connection(('localhost', '8080'), timeout=100)

timeouts are in seconds, are you really trying to wait for 100 seconds and retrying 100 times, or did you mean for this to be msecs?

> Tools/Scripts/webkitpy/performance_tests/perftest.py:239
> + self._process.send_signal(signal.SIGINT)

curious why this is SIGINT instead of TERM or KILL?

> Tools/Scripts/webkitpy/performance_tests/perftest.py:311
> + filesystem = driver._port.host.filesystem

You can just access port.host.filesystem directly.
Comment on attachment 144957 [details] Initial implementation

View in context: https://bugs.webkit.org/attachment.cgi?id=144957&action=review

>> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:533
>> + command = driver_input.test_name
>
> Seems like this should just go in is_http_test()? I understand that you might be thinking 'that's just for tests in the http/ directory', but it's probably more confusing this way.

That's problematic because of code like:

    def test_to_uri(self, test_name):
        """Convert a test name to a URI."""
        if not self.is_http_test(test_name):
            return path.abspath_to_uri(self._port.abspath_for_test(test_name))

I've added is_external_http_test to base.py instead.

>> Tools/Scripts/webkitpy/performance_tests/perftest.py:72
>> + if not output or output.text == None or output.error:
>
> when would output be None here? driver.run_test() should always return something; can you call run_failed() outside of run()?

run_single will return None if the web page replay didn't start properly. Moved the condition to PageLoadingPerfTest.run.

>> Tools/Scripts/webkitpy/performance_tests/perftest.py:176
>> + def run(self, port, driver, time_out_ms):
>
> Does it make more sense to pass the port to __init__()?

That's a good idea now that both run and prepare take port.

>> Tools/Scripts/webkitpy/performance_tests/perftest.py:210
>> + return driver.run_test(DriverInput(self.path_or_url(), time_out_ms, None, False))
>
> Maybe this should go up into PerfTest() and then PerfTest.run() can just call this to avoid the repetition?

Done.

>> Tools/Scripts/webkitpy/performance_tests/perftest.py:219
>> + replay_path = os.path.join(os.path.dirname(webkitpy.thirdparty.autoinstalled.webpagereplay.__file__), 'replay.py')
>
> can you get this from just webkitpy.thirdparty.autoinstalled.webpagereplay.replay.__file__ ?

Done.

>> Tools/Scripts/webkitpy/performance_tests/perftest.py:230
>> + connection = socket.create_connection(('localhost', '8080'), timeout=100)
>
> timeouts are in seconds, are you really trying to wait for 100 seconds and retrying 100 times, or did you mean for this to be msecs?

Fixed. It appears that create_connection bails out immediately when there's nobody listening on the specified port. So I've added an explicit sleep of 1s to avoid the busy loop.

>> Tools/Scripts/webkitpy/performance_tests/perftest.py:239
>> + self._process.send_signal(signal.SIGINT)
>
> curious why this is SIGINT instead of TERM or KILL?

Changed it to SIGTERM.

>> Tools/Scripts/webkitpy/performance_tests/perftest.py:311
>> + filesystem = driver._port.host.filesystem
>
> You can just access port.host.filesystem directly.

Yeah, I forgot to update that. Fixed.
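For reference, the wait-for-replay-server loop discussed in this comment can be sketched roughly as follows. This is not the code from the patch: the function name wait_for_server and its parameters are hypothetical, and the default values are placeholders. The point it illustrates is the one made in the reply, that a refused connection returns at once, so the sleep between attempts is what keeps the loop from spinning.

```python
import socket
import time


def wait_for_server(host, port, attempts=10, delay_seconds=1.0,
                    connect_timeout_seconds=1.0):
    """Return True once something accepts TCP connections on (host, port).

    Hypothetical helper, not the code in the patch. Timeouts are in seconds.
    """
    for _ in range(attempts):
        try:
            connection = socket.create_connection(
                (host, port), timeout=connect_timeout_seconds)
        except socket.error:
            # Nobody is listening yet. The connect attempt fails right away
            # in that case, so sleep explicitly to avoid a busy loop.
            time.sleep(delay_seconds)
            continue
        connection.close()
        return True
    return False
```

With these placeholder defaults the worst case is about ten seconds of sleeping when nothing ever listens, plus up to one second per attempt if connects time out instead of being refused.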
Created attachment 145188 [details] Addressed Dirk's comments.
Comment on attachment 145188 [details] Addressed Dirk's comments.

View in context: https://bugs.webkit.org/attachment.cgi?id=145188&action=review

> Tools/Scripts/webkitpy/layout_tests/port/driver.py:134
> +

Turns out I don't want this ... see below (and sorry!)

> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:533
> + command = driver_input.test_name

Okay, I see the problem here, which is that driver_input.test_name isn't actually a test_name, it's a full URL. DriverInputs aren't supposed to be full URLs, they're supposed to be test names :).

However, it would be lame if that kept you from using the driver, so we need some other way of distinguishing that this is a URL and not a test name. I'm not sure what the best way to do that is yet, and I don't want to block you on this ... Can you add a #FIXME for now, flip this clause so that it's tested first, and go back to what you were doing before? e.g.:

    # FIXME: perf tests can pass in full URLs, not just test names ...
    if driver_input.test_name.startswith('http://') or driver_input.test_name.startswith('https://'):
        command = driver_input.test_name
    else:
        ...

sorry for the back-and-forth on this.

Everything else looks fine. Thanks!
Comment on attachment 145188 [details] Addressed Dirk's comments.

View in context: https://bugs.webkit.org/attachment.cgi?id=145188&action=review

>> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:533
>> + command = driver_input.test_name
>
> Okay, I see the problem here, which is that driver_input.test_name isn't actually a test_name, it's a full URL. DriverInputs aren't supposed to be full URLs, they're supposed to be test names :).
>
> However, it would be lame if that kept you from using the driver, so we need some other way of distinguishing that this is a URL and not a test name. I'm not sure what the best way to do that is yet, and I don't want to block you on this ... Can you add a #FIXME for now, flip this clause so that it's tested first, and go back to what you were doing before? e.g.:
>
>     # FIXME: perf tests can pass in full URLs, not just test names ...
>     if driver_input.test_name.startswith('http://') or driver_input.test_name.startswith('https://'):
>         command = driver_input.test_name
>     else:
>         ...
>
> sorry for the back-and-forth on this.
>
> Everything else looks fine. Thanks!

Okay. Will do that. I think we should just rename test_name to path_or_url.
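The URL-versus-test-name check from the review can also be written as a small standalone predicate. This is only an illustrative sketch: is_external_url is a hypothetical name (the thread mentions is_external_http_test, whose body is not shown here), and the argument is called path_or_url following the rename suggested above.

```python
def is_external_url(path_or_url):
    """Return True for full http(s) URLs, False for ordinary test names.

    Hypothetical helper; str.startswith accepts a tuple of prefixes, so this
    is equivalent to the two chained startswith() calls in the review.
    """
    return path_or_url.startswith(('http://', 'https://'))
```

Note that a layout test living under the http/ directory, such as a name beginning with http/tests/, is still treated as a test name here because it lacks the :// scheme separator.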
Committed r119188: <http://trac.webkit.org/changeset/119188>
Thanks for the review, Dirk. I'm thrilled to land this patch :) An open performance test suite is coming!
Mozilla-side work recorded here: https://bugzilla.mozilla.org/show_bug.cgi?id=760574
FYI, I've added some explanation about this on http://trac.webkit.org/wiki/Writing%20Performance%20Tests