Bug 166885

Summary: [GTK] Add a script to upload the data generated with run-benchmark to a browserperfdash server and use it on the gtk perf bot
Product: WebKit    Reporter: Carlos Alberto Lopez Perez <clopez>
Component: Tools / Tests    Assignee: Carlos Alberto Lopez Perez <clopez>
Status: RESOLVED FIXED    
Severity: Normal CC: aakash_jain, bugs-noreply, calvaris, commit-queue, dbates, glenn, lforschler, mcatanzaro, ossy, rniwa, sbarati, webkit-bug-importer, ysuzuki
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Patch (flags: none)
Patch (flags: mcatanzaro: review+)

Description Carlos Alberto Lopez Perez 2017-01-10 02:54:09 PST
The WebKitGTK+ performance bot is automatically running the run-benchmark script with minibrowser-gtk, but we are not sending the performance data anywhere.
We have started to implement a web application that would be able to receive the data from run-benchmark and generate tables and graphs.

This web application [1] is still under development and not really useful right now, but we want to start sending data to it now, so we can work with real data while we develop and improve it.


[1]
test deployment: http://browserperfdash.igalia.com
source code: https://github.com/Igalia/browserperfdash
Comment 1 Carlos Alberto Lopez Perez 2017-01-10 05:06:16 PST
Created attachment 298457 [details]
Patch
Comment 2 Michael Catanzaro 2017-01-10 08:49:56 PST
Comment on attachment 298457 [details]
Patch

I didn't check it too closely, but looks sane.
Comment 3 Ryosuke Niwa 2017-01-10 09:33:47 PST
Comment on attachment 298457 [details]
Patch

Do you have your own version of a perf dashboard in addition to perf.webkit.org?
What you're adding is not compatible with what perf.webkit.org accepts.
Comment 4 Ryosuke Niwa 2017-01-10 09:35:51 PST
Comment on attachment 298457 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=298457&action=review

r- because I don't think we want to be creating yet another perf dashboard.
If there's a GTK+ specific perf dashboard, it would be preferable to have a separate script that submits results instead.

> Tools/Scripts/webkitpy/benchmark_runner/upload-results-config-file-example.txt:9
> +# Configuration for dashboard benchmark.webkit.org
> +[benchmark.webkit.org]
> +bot_id = gtk-release-perf-bot-64
> +bot_password = Rand0mPass
> +post_url = https://benchmark.webkit.org/dash/bot-report

This website doesn't exist. Are you creating a new one?
Why don't we just use perf.webkit.org instead?
Comment 5 Carlos Alberto Lopez Perez 2017-01-10 10:06:29 PST
(In reply to comment #4)
> Comment on attachment 298457 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=298457&action=review
> 
> r- because I don't think we want to be creating yet another perf dashboard.

We want another performance dashboard (read below). We don't intend to replace perf.webkit.org, but to run the other dashboard in parallel.

> If there's a GTK+ specific perf dashboard, it's more preferable to have a
> separate script that submit results instead.
> 

Other ports will be welcome there once we have it ready for production.
We are now running it only for the GTK+ performance bot because it's still under development.

Also, the patch is neutral to the run-benchmark script. We are only adding two optional new arguments. If you don't use those arguments you can continue to run the script as you always did.

So I don't agree that we should use a different script.

> > Tools/Scripts/webkitpy/benchmark_runner/upload-results-config-file-example.txt:9
> > +# Configuration for dashboard benchmark.webkit.org
> > +[benchmark.webkit.org]
> > +bot_id = gtk-release-perf-bot-64
> > +bot_password = Rand0mPass
> > +post_url = https://benchmark.webkit.org/dash/bot-report
> 
> This website doesn't exist. Are you creating a new one?

Yes. 
Not on that URL, though, that file is just an example of configuration.
The real config file should not be checked into SVN because it contains passwords.

Read the first comment of this bug to see the links to it.
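As an aside for readers following along: a config file in the shape of the example above can be parsed with Python's standard configparser module. The section name and values below are illustrative placeholders mirroring the example, not a real deployment:

```python
import configparser

# Example config in the same shape as the upload-results-config-file
# example from the patch; section name and values are placeholders.
EXAMPLE = """\
[benchmark.example.org]
bot_id = gtk-release-perf-bot-64
bot_password = Rand0mPass
post_url = https://benchmark.example.org/dash/bot-report
"""


def parse_upload_config(text):
    """Return one dict per dashboard section: the bot credentials plus
    the URL the benchmark results should be POSTed to."""
    config = configparser.ConfigParser()
    config.read_string(text)
    return {name: dict(config[name]) for name in config.sections()}
```

A real bot would read the file from disk (outside version control) with `config.read(path)` instead.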


> Why don't we just use perf.webkit.org instead?

I don't know if perf.webkit.org in its current state can understand the data from the run-benchmark script. Can it?

And even if it can, we are developing this new dashboard and we want the GTK bot to start sending data there so we can play with the data while we develop it.

Some of the reasons to develop this new dashboard instead of improving perf.webkit.org:

* I really miss being able to easily spot recent regressions on perf.webkit.org. Having to look at the graphs by hand one by one is not really usable. We want something that automatically shows a table with the most recent regressions.

* The run-benchmark script allows benchmarking multiple browsers: chrome, firefox, minibrowser, etc. This new performance dashboard we are developing has that use case in mind. We will end up deploying several performance dashboard instances (some public, some not) that will allow comparing and plotting performance graphs from multiple browsers and platforms.

* We have decided that it is easier for us to start this new project from scratch (with python-django) than to try to improve perf.webkit.org to fit our use cases (which may be different from the ones you want for perf.webkit.org).


Regarding the GTK+ performance bot. It currently runs two steps:

 * perf-step: this sends the data to perf.webkit.org and will continue to do so.
 * run-benchmark: only our bot runs this step, and we want to start sending this data to this new performance dashboard instead of sending it to /dev/null as currently happens.


So, it will be possible to have two performance dashboards:
the current perf.webkit.org and this new one that is still under development.
Comment 6 Ryosuke Niwa 2017-01-10 10:17:54 PST
(In reply to comment #5)
> (In reply to comment #4)
> > Comment on attachment 298457 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=298457&action=review
> > 
> > r- because I don't think we want to be creating yet another perf dashboard.
> 
> We want another performance dashboard (read below). We don't intend to
> replace perf.webkit.org but run the other dashboard in parallel. 
> 
> > If there's a GTK+ specific perf dashboard, it's more preferable to have a
> > separate script that submit results instead.
> > 
> 
> Other ports will be welcome there once we have it ready for production.
> We are now running it only for the GTK+ performance bot because its still
> under development.
> 
> Also, the patch is neutral to the script run-benchmark. We are only adding
> two optional new arguments. If you don't use those arguments you can
> continue to run the script as always you did. 
> 
> So I don't agree that we should use a different script.

The trouble is that we're using run-benchmark to submit results to an internal instance of perf.webkit.org.

Having this functionality built into run-benchmark only for GTK+ would make this very confusing.

> > Why don't we just use perf.webkit.org instead?
> 
> I don't know if perf.webkit.org in its current state can understand the data
> from run-benchmark script. Can it?

It certainly can. In fact, that's what we use it for on dozens upon dozens of Macs and iOS devices internally at Apple.

> And even if it can, we are developing this new dashboard and we want the GTK
> bot to start sending data there so we can play with the data meanwhile we
> develop it.
> 
> Some of the reasons to develop this new dashboard instead of improving
> perf.webkit.org:
> 
> * I really miss be able to easily spot recent regressions on
> perf.webkit.org. Having to look at the graphs by hand one by one is not
> really usable. We want something that automatically shows a table with the
> most recent regressions.

Have you tried using v3 UI?  It has the capability to mark each graph with annotations for this exact purpose.

> * The run-benchmark script allows to benchmark multiple browsers: chrome,
> firefox, minibrowser, etc.. This new performance dashboard we are developing
> has that use case on mind. We will end deploying several performance
> dashboards instances (some public, some not) that would allow to compare and
> plot performance graphs from multiple browsers and platforms.

This is an enhancement we're adding to perf.webkit.org.

> Regarding the GTK+ performance bot. It currently runs two steps:
> 
>  * perf-step: this sends the data to perf.webkit.org and will continue to do
> so.
>  * run-benchmark: only our bot runs this step, and we want to start sending
> this data to this new performance dashboard instead of sending it to
> /dev/null like currently happens.
> 
> 
> So, it will be possible to have two performance dashboard.
> The current perf.webkit.org and this new one that is still under development.

Having another dashboard for GTK+ is fine, and that's exactly what we do at Apple. However, please use a separate standalone script to submit results using the JSON run-benchmark generates instead of hard-coding your dashboard's format into run-benchmark because we're not interested in using your dashboard, and having that code in run-benchmark would make it harder for us to maintain the script.
Comment 7 Ryosuke Niwa 2017-01-10 10:23:57 PST
Note that --output-file specifies the filename of the JSON it generates. This uses the exact same format perf.webkit.org uses.

We have a simple Python script that grabs various repository info, etc., and submits the results to an internal instance of perf.webkit.org.

It should be simple to write a similar script and submit the results to your own dashboard.
Comment 8 Carlos Alberto Lopez Perez 2017-01-10 11:29:06 PST
(In reply to comment #6)
> (In reply to comment #5)
> > (In reply to comment #4)
> > > Comment on attachment 298457 [details]
> > > Patch
> > > 
> > > View in context:
> > > https://bugs.webkit.org/attachment.cgi?id=298457&action=review
> > > 
> > > r- because I don't think we want to be creating yet another perf dashboard.
> > 
> > We want another performance dashboard (read below). We don't intend to
> > replace perf.webkit.org but run the other dashboard in parallel. 
> > 
> > > If there's a GTK+ specific perf dashboard, it's more preferable to have a
> > > separate script that submit results instead.
> > > 
> > 
> > Other ports will be welcome there once we have it ready for production.
> > We are now running it only for the GTK+ performance bot because its still
> > under development.
> > 
> > Also, the patch is neutral to the script run-benchmark. We are only adding
> > two optional new arguments. If you don't use those arguments you can
> > continue to run the script as always you did. 
> > 
> > So I don't agree that we should use a different script.
> 
> The trouble is that we're using run-benchmark to submit results to an
> internal instance of perf.webkit.org.
> 
> Having this functionality built into run-benchmark only for GTK+ would make
> this very confusing.
> 


Why would it be very confusing? run-benchmark has other switches that we don't use (nor know what they are meant for), like --device-id, and that's fine.

Maybe moving the options --upload-results-config-file and --browser-version inside a GTK-specific argparse option group would make this acceptable to you?

> > > Why don't we just use perf.webkit.org instead?
> > 
> > I don't know if perf.webkit.org in its current state can understand the data
> > from run-benchmark script. Can it?
> 
> It certainly can. In fact, that's what we use to test it on dozens upon
> dozens of Macs and iOS devices internally at Apple.
> 
> > And even if it can, we are developing this new dashboard and we want the GTK
> > bot to start sending data there so we can play with the data meanwhile we
> > develop it.
> > 
> > Some of the reasons to develop this new dashboard instead of improving
> > perf.webkit.org:
> > 
> > * I really miss be able to easily spot recent regressions on
> > perf.webkit.org. Having to look at the graphs by hand one by one is not
> > really usable. We want something that automatically shows a table with the
> > most recent regressions.
> 
> Have you tried using v3 UI?  It has the capability to mark each graph with
> annotations for this exact purpose.
> 

No, I haven't.
How can I try this v3 UI in perf.webkit.org?


> > * The run-benchmark script allows to benchmark multiple browsers: chrome,
> > firefox, minibrowser, etc.. This new performance dashboard we are developing
> > has that use case on mind. We will end deploying several performance
> > dashboards instances (some public, some not) that would allow to compare and
> > plot performance graphs from multiple browsers and platforms.
> 
> This is an enhancement we're adding to perf.webkit.org.
> 
> > Regarding the GTK+ performance bot. It currently runs two steps:
> > 
> >  * perf-step: this sends the data to perf.webkit.org and will continue to do
> > so.
> >  * run-benchmark: only our bot runs this step, and we want to start sending
> > this data to this new performance dashboard instead of sending it to
> > /dev/null like currently happens.
> > 
> > 
> > So, it will be possible to have two performance dashboard.
> > The current perf.webkit.org and this new one that is still under development.
> 
> Having another dashboard for GTK+ is fine, and that's exactly what we do at
> Apple. However, please use a separate standalone script to submit results
> using the JSON run-benchmark generates instead of hard-coding your
> dashboard's format into run-benchmark because we're not interested in using
> your dashboard, and having that code in run-benchmark would make it harder
> for us to maintain the script.

I really think you are exaggerating here. I don't think having two extra optional arguments (which I can make clearer are GTK specific) and an extra _upload_results() function of 46 lines is really going to make the run-benchmark script harder to maintain or use.

Also, even if you are not interested in using our dashboard, other ports (EFL?) may be interested.

On top of that... As far as the open source WebKit project is concerned, we are the only ones that run this script on the bots. And I feel restricted in not being able to improve this script to run on the public GTK+ perf bot because you think that such an improvement will make your _private_ usage/maintenance of the script confusing/harder.
Comment 9 Ryosuke Niwa 2017-01-10 11:45:28 PST
(In reply to comment #8)
>
> Why will be very confusing? run-benchmark has another switches that we don't
> use (neither know what are meant for) like --device-id and its fine.
> 
> Maybe if we move the options --upload-results-config-file and
> --browser-version inside an argparse GTK-specific option group will make
> this acceptable to you?

They're just options listed. There's no maintenance cost associated with it.
On the other hand, the code to upload results to a dashboard has a significant maintenance cost.

In practice, this script is maintained by Apple contributors:
https://trac.webkit.org/log/trunk/Tools/Scripts/webkitpy/benchmark_runner

Having code that can't be verified by Apple contributors is going to impose an undue burden on all those contributors.

And unlike --device-id, which is needed to run benchmarks on iOS, submitting results to a dashboard is something that can be easily split into a separate script, as we've done for our internal instance of perf.webkit.org.

We've done our due diligence, please do yours.

> No, I haven't.. 
> How can I try this v3 UI in perf.webkit.org?

You need to create your own instance since I haven't had a chance to upgrade the database schema on perf.webkit.org. Please follow instructions in Websites/perf.webkit.org/Install.md.

> I really think you are exaggerating here. I don't think having two extra
> optional arguments that I can make clearer that are GTK specific and an
> _upload_results() extra function of 46 lines is going to really make harder
> to maintain or use the run-benchmark script. 

Yes, it does. I've maintained run-perf-tests and run-benchmark's submit-results script for our internal instance for years, and that's always the part that breaks.

> Also, even if you are not interested in using our dashboard other ports
> (EFL?) may be interested.
> 
> On top of that... As far as the opensource Webkit project is concerned, we
> are the only ones that run this script on the bots. And I feel restricted to
> not being able to improve this script to run on the public GTK+ perf bot
> because you think that such improvement will make your _private_
> usage/maintenance of the script confusing/harder.

You're free to use run-benchmark. Well, we could easily have not open sourced this code. If you're going to add this functionality, then we're likely to just not maintain the open source version, and keep our own version elsewhere.

On the other hand, why is it so important for you to have this functionality built into run-benchmark?

We don't do it for our own internal dashboard; we've found it much better for the submission step to happen in a separate build step so that we can differentiate test failures from submission failures.
Comment 10 Carlos Alberto Lopez Perez 2017-01-11 18:38:44 PST
(In reply to comment #9)
> (In reply to comment #8)
> >
> > Why will be very confusing? run-benchmark has another switches that we don't
> > use (neither know what are meant for) like --device-id and its fine.
> > 
> > Maybe if we move the options --upload-results-config-file and
> > --browser-version inside an argparse GTK-specific option group will make
> > this acceptable to you?
> 
> They're just options listed. There's no maintenance cost associated with it.
> On the other hand, the code to upload results to a dashboard has a
> significant maintenance cost.
> 
> In practice, this script is maintained by Apple contributors:
> https://trac.webkit.org/log/trunk/Tools/Scripts/webkitpy/benchmark_runner
> 
> Having code that can't be verified by Apple contributors is going to impose
> an undue burden on all those contributors.
> 

I think that having to deal with (a lot of) code that you are unable to verify is something very common for any WebKit developer.
When you work on a WebKit port it is totally normal that you are not able to verify the code of the other ports, yet you still see that code in your IDE.

Thankfully, we have a lot of automated testing that verifies and tests all this code automatically.

And this was not an exception: all the code I was submitting here was going to be run (and verified) automatically by our perf bot, so any regression introduced would be quickly detected.


> And unlike --device-id which is needed to run benchmarks in iOS, submitting
> results to a dashboard is something that can be easily split into a operate
> script as we've done so for our internal instance of perf.webkit.org
> 
> We've done our due diligence, please do yours.
> 
> > No, I haven't.. 
> > How can I try this v3 UI in perf.webkit.org?
> 
> You need to create your own instance since I haven't had a chance to upgrade
> the database schema on perf.webkit.org. Please follow instructions in
> Websites/perf.webkit.org/Install.md.
> 
> > I really think you are exaggerating here. I don't think having two extra
> > optional arguments that I can make clearer that are GTK specific and an
> > _upload_results() extra function of 46 lines is going to really make harder
> > to maintain or use the run-benchmark script. 
> 
> Yes, it does. I've maintained run-perf-tests and run-benchmark's
> submit-results script for internal instance for years, and that's always the
> part that breaks.
> 
> > Also, even if you are not interested in using our dashboard other ports
> > (EFL?) may be interested.
> > 
> > On top of that... As far as the opensource Webkit project is concerned, we
> > are the only ones that run this script on the bots. And I feel restricted to
> > not being able to improve this script to run on the public GTK+ perf bot
> > because you think that such improvement will make your _private_
> > usage/maintenance of the script confusing/harder.
> 
> You're free to use run-benchmark. Well, we could have easily not open
> sourced this code. If you're going to add this functionality, then we're
> likely to just not maintain the open source code version, and keep our own
> version elsewhere.
> 
> On the other hand, why is it so important for you to have this functionality
> built into run-benchmark?
> 
> We don't do it for our own internal dashboard, we've found it much better
> that the submission step happens in a separate build step so that we can
> differentiate test failures from submission failures.

Fair enough.

I will be moving all this code to a new script that then calls into run-benchmark as needed.
Comment 11 Ryosuke Niwa 2017-01-11 18:40:13 PST
By the way, just as heads up, I’m going to change the format of JSON generated by run-benchmark this or next week.
Comment 12 Ryosuke Niwa 2017-01-11 18:50:57 PST
(In reply to comment #10)
> (In reply to comment #9)
> > (In reply to comment #8)
> > >
> > > Why will be very confusing? run-benchmark has another switches that we don't
> > > use (neither know what are meant for) like --device-id and its fine.
> > > 
> > > Maybe if we move the options --upload-results-config-file and
> > > --browser-version inside an argparse GTK-specific option group will make
> > > this acceptable to you?
> > 
> > They're just options listed. There's no maintenance cost associated with it.
> > On the other hand, the code to upload results to a dashboard has a
> > significant maintenance cost.
> > 
> > In practice, this script is maintained by Apple contributors:
> > https://trac.webkit.org/log/trunk/Tools/Scripts/webkitpy/benchmark_runner
> > 
> > Having code that can't be verified by Apple contributors is going to impose
> > an undue burden on all those contributors.
> > 
> 
> I think that having to deal with (lot of) code that you are unable to verify
> is
> something very common for any WebKit developer.
> When you work on a WebKit port is totally normal that you are not able
> to verify the code of the other ports, yet you still can see that code
> in your IDE.

That’s not a justification for adding even more code that can’t be tested by other contributors. Over the years, we have always strived to reduce the amount of such code.

> Thankfully, we have lot of automated testing that verifies and tests all
> this code automatically.

Except this patch doesn’t have any tests.

> And this was not an exception, all the code I was submitting here was going
> to be ran (and verified) automatically by our Perf bot. So any regression
> introduced would be quickly detected.

That means we wouldn't know about any breakage until we land patches, which is highly undesirable.
Comment 13 Carlos Alberto Lopez Perez 2018-06-11 04:40:18 PDT
Per the previous discussion, I'm retitling the bug.

Also, I'm uploading for review a new approach to this problem: it's an independent script that will take care of executing run-benchmark and will upload the data it generates to a server running the browserperfdash app (like browserperfdash.igalia.com).

But run-benchmark itself won't be touched.
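The flow of such an independent script — running run-benchmark per plan with --output-file and then uploading the result — might look roughly as follows. The upload callable stands in for the dashboard-specific POST code, and the command and flag names mirror the public run-benchmark interface; this is a sketch, not the committed script:

```python
import os
import subprocess
import tempfile


def run_plan_and_upload(plan, upload, run_benchmark_cmd='run-benchmark'):
    """Run one benchmark plan via run-benchmark, then hand the JSON it
    wrote to the upload callable.  Returning False for either step lets
    a caller report benchmark failures separately from upload failures.

    run_benchmark_cmd and the flags are assumptions mirroring the
    public script; upload(path) is a placeholder for the POST code.
    """
    fd, output_file = tempfile.mkstemp(suffix='.json')
    os.close(fd)
    try:
        status = subprocess.call([run_benchmark_cmd, '--plan', plan,
                                  '--output-file', output_file])
        if status != 0:
            return False  # the benchmark itself failed
        return bool(upload(output_file))
    finally:
        os.unlink(output_file)
```

A wrapper would call this once per plan, collect the plans that returned False, and log them at the end, which is essentially what the committed script's error reporting does.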
Comment 14 Carlos Alberto Lopez Perez 2018-06-11 05:26:44 PDT
Created attachment 342418 [details]
Patch
Comment 15 Michael Catanzaro 2018-06-13 16:44:55 PDT
Comment on attachment 342418 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=342418&action=review

Just giving this a cursory review... I think this avoids Ryosuke's concerns, right?

> Tools/Scripts/webkitpy/browserperfdash/browserperfdash_runner.py:74
> +        # This is the dictionary that will be sent in the post petition that browserperfdash expects

"post petition"? What does this mean? I guess you meant to write something else here?

> Tools/Scripts/webkitpy/browserperfdash/browserperfdash_runner.py:194
> +            _log.error('The following benchmark plans have failed to run or to upload : {list_plan_failed}'.format(list_plan_failed=failed))

No space before the :
Comment 16 Carlos Alberto Lopez Perez 2018-06-13 17:55:54 PDT
(In reply to Michael Catanzaro from comment #15)
> Comment on attachment 342418 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=342418&action=review
> 
> Just giving this a cursory review... I think this avoids Ryusoke's concerns,
> right?
> 


I hope so, as this new patch doesn't touch or modify run-benchmark.

> > Tools/Scripts/webkitpy/browserperfdash/browserperfdash_runner.py:74
> > +        # This is the dictionary that will be sent in the post petition that browserperfdash expects
> 
> "post petition"? What does this mean? I guess you meant to write something
> else here?
> 

HTTP POST request

> > Tools/Scripts/webkitpy/browserperfdash/browserperfdash_runner.py:194
> > +            _log.error('The following benchmark plans have failed to run or to upload : {list_plan_failed}'.format(list_plan_failed=failed))
> 
> No space before the :
Comment 17 Carlos Alberto Lopez Perez 2018-06-13 18:04:55 PDT
Committed r232820: <https://trac.webkit.org/changeset/232820>
Comment 18 Radar WebKit Bug Importer 2018-06-13 18:05:22 PDT
<rdar://problem/41109801>