Bug 84008

Summary: Add public page loading performance tests using web-page-replay
Product: WebKit Reporter: Ryosuke Niwa <rniwa>
Component: Tools / TestsAssignee: Ryosuke Niwa <rniwa>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, dongseong.hwang, dpranke, efidler, eric, gregsimon, hausmann, jhammel, joel.maher, kling, koivisto, morrita, ojan, slewis, svillar, tony, zoltan
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 86191, 87326, 87893    
Bug Blocks: 77037, 88041    
Attachments:
Description Flags
Adds basic replay mechanism
none
Missing replaytest.py
none
work in progress
none
work in progress (got rid of the useless change log entry)
none
work in progress
none
work in progress 4
none
Initial implementation
none
Addressed Dirk's comments. dpranke: review+

Description Ryosuke Niwa 2012-04-15 23:27:20 PDT
Google's page cycler and Apple's PLT test suites are both private due to copyright law restrictions, and people outside of these two organizations cannot see the contents. This severely limits the utility and the effectiveness of these test suites because not all contributors can run them locally.

We should add a publicly distributable version of these two test suites that tests page loading time of real websites. The idea here is to load pages out of archive.org. Because archive.org provides a content snapshot at a specific time, we can always pull the same data by simply specifying the URL, eliminating the need for distributing the page contents with the test suite but still allowing all contributors to obtain the same page test data when running the tests. Credit: Greg Simon for this clever idea.

In order to avoid DoS'ing archive.org, we can use web-page-replay: http://code.google.com/p/web-page-replay/ to cache data locally and make any other necessary modifications to the page for the purpose of performance tests.
Comment 1 Ryosuke Niwa 2012-04-15 23:31:12 PDT
Created attachment 137280 [details]
Adds basic replay mechanism
Comment 2 Ryosuke Niwa 2012-04-15 23:32:38 PDT
Everything in Tools/WebPageReplay has been imported from http://code.google.com/p/web-page-replay/.

I still need to do some polishing but the basic machinery is there already. You can add a bunch of URLs to Replay/replaytest.list and run-perf-tests will automatically create replay tests for you :)
Comment 3 Ryosuke Niwa 2012-04-15 23:43:13 PDT
Created attachment 137283 [details]
Missing replaytest.py

Note: to use this patch, you have to go to Mac's system preference and setup http/https proxies at 8080 and 8413
Comment 4 Zoltan Horvath 2012-05-03 01:38:43 PDT
(In reply to comment #2)
> Everything in Tools/WebPageReplay has been imported from http://code.google.com/p/web-page-replay/.

Is there any chance that it will work on linux if I install the dependencies?  

> I still need to do some polishing but the basic machinery is there already. You can add a bunch of URLs to Replay/replaytest.list and run-perf-tests will automatically create replay tests for you :)

Hmm, is the ResourceLoadDelegate.mm the only mac specific change?

Btw, do you have a recent version of the patch or the latest sits here? :)

Thanks!
Comment 5 Ryosuke Niwa 2012-05-03 03:27:02 PDT
(In reply to comment #4)
> > I still need to do some polishing but the basic machinery is there already. You can add a bunch of URLs to Replay/replaytest.list and run-perf-tests will automatically create replay tests for you :)
> 
> Hmm, is the ResourceLoadDelegate.mm the only mac specific change?

I suspect a similar change needs to be made on all ports.

> Btw, do you have a recent version of the patch or the latest sits here? :)

I'm working on it now.
Comment 6 Ryosuke Niwa 2012-05-03 16:17:58 PDT
Created attachment 140121 [details]
work in progress

Something is broken on web-page-replay side, and I can't get it to work. Everything results in 404 :(
Comment 7 Ryosuke Niwa 2012-05-03 16:19:13 PDT
Created attachment 140122 [details]
work in progress (got rid of the useless change log entry)
Comment 8 Ryosuke Niwa 2012-05-10 15:13:50 PDT
Created attachment 141276 [details]
work in progress
Comment 9 Ryosuke Niwa 2012-05-23 16:29:26 PDT
Here's a list of popular 60+ web pages (except ad provider, porn, etc...) that were available on archive.org:

http://web.archive.org/web/20110729013436/http://www.google.com/
http://web.archive.org/web/20110101045027/http://www.youtube.com/
http://web.archive.org/web/20101029215059/http://www.yahoo.com/
http://web.archive.org/web/20110726050652/http://www.baidu.com/
http://web.archive.org/web/20110713210358/http://news.baidu.com/
http://web.archive.org/web/20110729021258/http://en.wikipedia.org/wiki/Main_Page
http://web.archive.org/web/20110716004235/http://en.wikipedia.org/wiki/The_Beatles
http://web.archive.org/web/20110131013707/http://www.qq.com/
http://web.archive.org/web/20110729032327/http://www.amazon.com/
http://web.archive.org/web/20110729033222/http://googleblog.blogspot.com/
http://web.archive.org/web/20100209183352/http://www.taobao.com/index_global.php
http://web.archive.org/web/20110726224242/http://www.sina.com.cn/
http://web.archive.org/web/20110728224518/http://www.yahoo.co.jp/
http://web.archive.org/web/20110728062744/http://www.msn.com/
http://web.archive.org/web/20110729040307/http://wordpress.com/
http://web.archive.org/web/20110721134904/http://en.blog.wordpress.com/
http://web.archive.org/web/20110721020325/http://www.google.com.hk/
http://web.archive.org/web/20110729062331/http://www.ebay.com/
http://web.archive.org/web/20110721133155/http://www.yandex.ru/
http://web.archive.org/web/20110720113906/http://www.163.com/
http://web.archive.org/web/20110728224522/http://weibo.com/
http://web.archive.org/web/20110426114921/http://weibo.com/yaochen
http://web.archive.org/web/20110630155134/http://www.bing.com/news
http://web.archive.org/web/20110712090452/http://windows.microsoft.com/en-US/windows/home
http://web.archive.org/web/20110723131535/http://news.soso.com/
http://web.archive.org/web/20110721002024/http://staff.tumblr.com/
http://web.archive.org/web/20110728143621/https://www.paypal.com/
http://web.archive.org/web/20110608070205/https://www.apple.com/
http://web.archive.org/web/20110729034928/http://www.google.ru/
http://web.archive.org/web/20110728224803/http://www.sohu.com/
http://web.archive.org/web/20110725235527/http://sfbay.craigslist.org/
http://web.archive.org/web/20110624121311/http://sfbay.craigslist.org/apa/
http://web.archive.org/web/20110728165802/http://www.imdb.com/
http://web.archive.org/web/20110729013439/http://www.bbc.co.uk/
http://web.archive.org/web/20110728190248/http://www.tudou.com/
http://web.archive.org/web/20100613214720/http://sl65amg.blog32.fc2.com/
http://web.archive.org/web/20110708184935/http://blog.goo.ne.jp/isehakusandou
http://web.archive.org/web/20110721233757/http://www.ifeng.com/
http://web.archive.org/web/20110728092214/http://www.ask.com/
http://liveweb.archive.org/http://www.ask.com/wiki
http://web.archive.org/web/20110726103516/http://www.youku.com/
http://web.archive.org/web/20110710095406/http://imgur.com/gallery
http://web.archive.org/web/20110729013512/http://www.cnn.com/
http://web.archive.org/web/20110716175917/http://www.hao123.com/
http://web.archive.org/web/20110728094718/http://www.aol.com/
http://web.archive.org/web/20110728190308/http://espn.go.com/
http://web.archive.org/web/20110724020539/http://www.alibaba.com/
http://web.archive.org/web/20110725075627/http://www.avg.com/us-en/homepage
http://web.archive.org/web/20110723060303/http://www.rakuten.co.jp/
http://web.archive.org/web/20110707075549/http://www.about.com/travel/
http://web.archive.org/web/20110727034228/http://www.chinaz.com/
http://web.archive.org/web/20110729082516/http://wordpress.org/
http://web.archive.org/web/20110722032538/http://official.ameba.jp/ranking/day/officialAccessRankingTop.html
http://web.archive.org/web/20110709073905/https://www.alipay.com/
http://web.archive.org/web/20110722061518/http://www.uol.com.br/
http://web.archive.org/web/20110721223556/http://www.amazon.co.jp/
http://web.archive.org/web/20110512210519/http://stackoverflow.com/
http://web.archive.org/web/20110729043303/http://www.huffingtonpost.com/
http://web.archive.org/web/20100627183250/http://www.cnet.com/
http://web.archive.org/web/20110416045031/http://www.dailymotion.com/us
Comment 10 Ryosuke Niwa 2012-05-25 18:40:23 PDT
Created attachment 144180 [details]
work in progress 4
Comment 11 Ryosuke Niwa 2012-05-30 10:51:10 PDT
It appears that some of the pages can't be loaded properly in DRT using web-page-replay.
Comment 12 Ryosuke Niwa 2012-05-30 17:44:46 PDT
Created attachment 144957 [details]
Initial implementation
Comment 13 Dirk Pranke 2012-05-30 18:48:31 PDT
Comment on attachment 144957 [details]
Initial implementation

View in context: https://bugs.webkit.org/attachment.cgi?id=144957&action=review

Patch generally looks good, just some minor comments

> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:533
> +            command = driver_input.test_name

Seems like this should just go in is_http_test()? I understand that you might be thinking 'that's just for tests in the http/ directory', but it's probably more confusing this way.

> Tools/Scripts/webkitpy/performance_tests/perftest.py:72
> +        if not output or output.text == None or output.error:

when would output be None here? driver.run_test() should always return something; can you call run_failed() outside of run()?

> Tools/Scripts/webkitpy/performance_tests/perftest.py:176
> +    def run(self, port, driver, time_out_ms):

Does it make more sense to pass the port to __init__()?

> Tools/Scripts/webkitpy/performance_tests/perftest.py:210
> +        return driver.run_test(DriverInput(self.path_or_url(), time_out_ms, None, False))

Maybe this should go up into PerfTest() and then PerfTest.run() can just call this to avoid the repetition?

> Tools/Scripts/webkitpy/performance_tests/perftest.py:219
> +        replay_path = os.path.join(os.path.dirname(webkitpy.thirdparty.autoinstalled.webpagereplay.__file__), 'replay.py')

can you get this from just webkitpy.thirdparty.autoinstalled.webpagereplay.replay.__file__ ?

> Tools/Scripts/webkitpy/performance_tests/perftest.py:230
> +                connection = socket.create_connection(('localhost', '8080'), timeout=100)

timeouts are in seconds, are you really trying to wait for 100 seconds and retrying 100 times, or did you mean for this to be msecs?

> Tools/Scripts/webkitpy/performance_tests/perftest.py:239
> +            self._process.send_signal(signal.SIGINT)

curious why this is SIGINT instead of TERM or KILL?

> Tools/Scripts/webkitpy/performance_tests/perftest.py:311
> +            filesystem = driver._port.host.filesystem

You can just access port.host.filesystem directly.
Comment 14 Ryosuke Niwa 2012-05-31 17:55:44 PDT
Comment on attachment 144957 [details]
Initial implementation

View in context: https://bugs.webkit.org/attachment.cgi?id=144957&action=review

>> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:533
>> +            command = driver_input.test_name
> 
> Seems like this should just go in is_http_test()? I understand that you might be thinking 'that's just for tests in the http/ directory', but it's probably more confusing this way.

That's problematic because of code like:
    def test_to_uri(self, test_name):
        """Convert a test name to a URI."""
        if not self.is_http_test(test_name):
            return path.abspath_to_uri(self._port.abspath_for_test(test_name))

I've added is_external_http_test to base.py instead.

>> Tools/Scripts/webkitpy/performance_tests/perftest.py:72
>> +        if not output or output.text == None or output.error:
> 
> when would output be None here? driver.run_test() should always return something; can you call run_failed() outside of run()?

run_single will return None if the web page replay didn't start properly.
Moved the condition to PageLoadingPerfTest.run.

>> Tools/Scripts/webkitpy/performance_tests/perftest.py:176
>> +    def run(self, port, driver, time_out_ms):
> 
> Does it make more sense to pass the port to __init__()?

That's a good idea how that both run and prepare take port.

>> Tools/Scripts/webkitpy/performance_tests/perftest.py:210
>> +        return driver.run_test(DriverInput(self.path_or_url(), time_out_ms, None, False))
> 
> Maybe this should go up into PerfTest() and then PerfTest.run() can just call this to avoid the repetition?

Done.

>> Tools/Scripts/webkitpy/performance_tests/perftest.py:219
>> +        replay_path = os.path.join(os.path.dirname(webkitpy.thirdparty.autoinstalled.webpagereplay.__file__), 'replay.py')
> 
> can you get this from just webkitpy.thirdparty.autoinstalled.webpagereplay.replay.__file__ ?

Done.

>> Tools/Scripts/webkitpy/performance_tests/perftest.py:230
>> +                connection = socket.create_connection(('localhost', '8080'), timeout=100)
> 
> timeouts are in seconds, are you really trying to wait for 100 seconds and retrying 100 times, or did you mean for this to be msecs?

Fixed. It appears that create_connection bails out immediately when there's nobody listening on the specified port.
So I've added an explicit sleep of 1s to avoid the busy loop.

>> Tools/Scripts/webkitpy/performance_tests/perftest.py:239
>> +            self._process.send_signal(signal.SIGINT)
> 
> curious why this is SIGINT instead of TERM or KILL?

Changed it to SIGTERM.

>> Tools/Scripts/webkitpy/performance_tests/perftest.py:311
>> +            filesystem = driver._port.host.filesystem
> 
> You can just access port.host.filesystem directly.

Yeah, I forgot to update that. Fixed.
Comment 15 Ryosuke Niwa 2012-05-31 18:21:38 PDT
Created attachment 145188 [details]
Addressed Dirk's comments.
Comment 16 Dirk Pranke 2012-05-31 19:38:49 PDT
Comment on attachment 145188 [details]
Addressed Dirk's comments.

View in context: https://bugs.webkit.org/attachment.cgi?id=145188&action=review

> Tools/Scripts/webkitpy/layout_tests/port/driver.py:134
> +

Turns out I don't want this ... see below (and sorry!)

> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:533
> +            command = driver_input.test_name

Okay, I see the problem here, which is that driver_input.test_name isn't actually a test_name, it's a full URL. DriverInputs aren't supposed to be full URLs, they're supposed to be test names :). 

However, it would be lame if that kept you from using the driver, so we need some other way of distinguishing that this is a URL and not a test name. I'm not sure what the best way to do that is yet, and I don't want to block you on this ... Can you add a #FIXME for now, flip this clause so that it's tested first, and go back to what you were doing before? e.g.:

# FIXME: perf tests can pass in full URLs, not just test names ...
if driver_input.test_name.startswith('http://') or driver_input.test_name.startswith('https://'):
   command = driver_input.test_name
else:
   ...

sorry for the back-and-forth on this.

Everything else looks fine. Thanks!
Comment 17 Ryosuke Niwa 2012-05-31 19:43:23 PDT
Comment on attachment 145188 [details]
Addressed Dirk's comments.

View in context: https://bugs.webkit.org/attachment.cgi?id=145188&action=review

>> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:533
>> +            command = driver_input.test_name
> 
> Okay, I see the problem here, which is that driver_input.test_name isn't actually a test_name, it's a full URL. DriverInputs aren't supposed to be full URLs, they're supposed to be test names :). 
> 
> However, it would be lame if that kept you from using the driver, so we need some other way of distinguishing that this is a URL and not a test name. I'm not sure what the best way to do that is yet, and I don't want to block you on this ... Can you add a #FIXME for now, flip this clause so that it's tested first, and go back to what you were doing before? e.g.:
> 
> # FIXME: perf tests can pass in full URLs, not just test names ...
> if driver_input.test_name.startswith('http://') or driver_input.test_name.startswith('https://'):
>    command = driver_input.test_name
> else:
>    ...
> 
> sorry for the back-and-forth on this.
> 
> Everything else looks fine. Thanks!

Okay. WIll do that. I think we should just rename test_name to path_or_url.
Comment 18 Ryosuke Niwa 2012-05-31 20:19:37 PDT
Committed r119188: <http://trac.webkit.org/changeset/119188>
Comment 19 Ryosuke Niwa 2012-05-31 20:22:36 PDT
Thanks for the review, Dirk. I'm thrilled to land this patch :) An open performance test suite is coming!
Comment 20 Jeff Hammel 2012-06-01 09:59:58 PDT
Mozilla-side work recorded here: https://bugzilla.mozilla.org/show_bug.cgi?id=760574
Comment 21 Ryosuke Niwa 2012-07-27 00:14:53 PDT
FYI, I've added some explanation about this on http://trac.webkit.org/wiki/Writing%20Performance%20Tests