Bug 74929

Summary: webkit-patch apply-attachment is very slow for big patches
Product: WebKit Reporter: Csaba Osztrogonác <ossy>
Component: Tools / TestsAssignee: Csaba Osztrogonác <ossy>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, blambov, ddkilzer, dpranke, eric, krit, ojan, ossy, rwlbuis, webkit.review.bot, wsiegrist, zarvai, zimmermann
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 60223    
Attachments:
Description Flags
show.xml.tmpl
ddkilzer: review-, eric: commit-queue-
Diff between default and custom show.xml.tmpl
none
Untested fix to add "excludefield=attachmentdata" everywhere "ctype=xml" is used
none
r+-ed patch with changelog none

Description Csaba Osztrogonác 2011-12-20 07:20:35 PST
I ran into this problem, because Qt EWS was very very slow today.
And the slowness was caused by big patches in https://bugs.webkit.org/show_bug.cgi?id=73643

I measured a little bit and found that apply-attachment is absolutely non effective now.

apply-attachment: 640 sec
($ Tools/Scripts/webkit-patch apply-attachment 120013 [details])
 - Fetching: https://bugs.webkit.org/attachment.cgi?id=120013&action=edit : 100 sec
 - Fetching: https://bugs.webkit.org/show_bug.cgi?id=73643&ctype=xml : 420 sec
 - Processing patch 120013 from bug 73643 : 120 sec

my simple method: 85 sec
($ wget https://bugs.webkit.org/attachment.cgi?id=120013
 $ Tools/Scripts/svn-apply attachment.cgi?id=120013)
 - wget: 60 sec
 - svn-apply: 25 sec

It would be great if we can make webkit-patch apply-attachment faster for huge patches to make EWS bots faster.
Comment 1 Adam Barth 2011-12-20 11:41:11 PST
Assigned for investigation.  Feel free to steal.
Comment 2 Eric Seidel (no email) 2011-12-20 12:11:18 PST
I suspect it's because we load the entire attachment into memory.  Probably multiple times.

see attachment.py:

    def contents(self):
        # FIXME: We shouldn't be grabbing at _bugzilla.
        return self._bug._bugzilla.fetch_attachment_contents(self.id())

Then we apply the contents via applypatch.py:

        self._tool.checkout().apply_patch(state["patch"], force=self._options.non_interactive or self._options.force_patch)

which in turn pipes it through:
        self._executive.run_command(args, input=patch.contents())


I suspect that for starters, we should make sure contents() is never called more than once (And if so, cache the value).
Comment 3 Csaba Osztrogonác 2012-04-03 03:07:03 PDT
I ran into this bug again with a huge sized bug:

EWS did the following things:
Fetching: https://bugs.webkit.org/attachment.cgi?id=135006&action=edit
Fetching: https://bugs.webkit.org/show_bug.cgi?id=25206&ctype=xml
Fetching: https://bugs.webkit.org/attachment.cgi?id=135006&action=edit
Fetching: https://bugs.webkit.org/show_bug.cgi?id=25206&ctype=xml
Fetching: https://bugs.webkit.org/show_bug.cgi?id=25206&ctype=xml

Unfortunately https://bugs.webkit.org/show_bug.cgi?id=25206&ctype=xml 
is 104Mb sized and it takes ~9.5 minutes to fetch it.

I'm going to pick up this bug.
Comment 4 Nikolas Zimmermann 2012-04-03 03:12:42 PDT
(In reply to comment #3)
> I ran into this bug again with a huge sized bug:
> 
> EWS did the following things:
> Fetching: https://bugs.webkit.org/attachment.cgi?id=135006&action=edit
> Fetching: https://bugs.webkit.org/show_bug.cgi?id=25206&ctype=xml
> Fetching: https://bugs.webkit.org/attachment.cgi?id=135006&action=edit
> Fetching: https://bugs.webkit.org/show_bug.cgi?id=25206&ctype=xml
> Fetching: https://bugs.webkit.org/show_bug.cgi?id=25206&ctype=xml
> 
> Unfortunately https://bugs.webkit.org/show_bug.cgi?id=25206&ctype=xml 
> is 104Mb sized and it takes ~9.5 minutes to fetch it.
> 
> I'm going to pick up this bug.

Just a quick FYI:
WildFox: Ossy: nope, I have no clue of webkitpy, I only know how to make it check the big file checkbox 
WildFox: Ossy: Tools/Scripts/webkitpy/common/net/bugzilla/bugzilla.py
WildFox: Ossy: in "def _fill_attachment_form", right after "self.browser['ispatch'] = ("1",)" inject:
WildFox: Ossy: self.browser['bigfile'] = ("bigfile",)

This makes us always check the big file check box.. What we really should do is figure out the size before, and eventually check it, if needed (I'd even ask the user, your patch is XXX MB, do you really want to upload it...)
Comment 5 Eric Seidel (no email) 2012-04-03 03:20:35 PDT
We keep the patch in memory.  Maybe more than one copy. :)
Comment 6 Csaba Osztrogonác 2012-04-03 06:13:21 PDT
https://bugs.webkit.org/show_bug.cgi?id=25206

This bug has many 7.5-10Mb sized attachments from Chromium EWS.
It is overkiller for EWS bots fetch the whole bug 3 times. :(
( For Qt EWS it takes 30 minutes to process a patch from this bug. :( )

Can we make Chromium EWS not to upload huge diffs? 
URLs point to an external storage would be better.
Comment 7 Csaba Osztrogonác 2012-04-03 08:10:53 PDT
The size of the bug is 105Mb. :(( 

Our EWS always dies with OOM during processing this huge XML.
Comment 8 Rob Buis 2012-04-03 08:14:25 PDT
Hi Ossy,

(In reply to comment #7)
> The size of the bug is 105Mb. :(( 
> 
> Our EWS always dies with OOM during processing this huge XML.

I was wondering, would obsoleting those archive attachments help?
Cheers,

Rob.
Comment 9 Csaba Osztrogonác 2012-04-03 08:23:09 PDT
(In reply to comment #8)
> Hi Ossy,
> 
> (In reply to comment #7)
> > The size of the bug is 105Mb. :(( 
> > 
> > Our EWS always dies with OOM during processing this huge XML.
> 
> I was wondering, would obsoleting those archive attachments help?
> Cheers,
> 
> Rob.

I hope, let's see. I'll remove the r? from the patch, and then put 
it back after EWSs dropped it from their queue.
Comment 10 Eric Seidel (no email) 2012-04-03 12:34:15 PDT
I see.  This has little to do with the attachment fetch, and all do to with the fact that bugzilla at some point started including attachments in the ctype=xml response.  I've seen this for a while, but don't know how to fix it.  Presumably we'll have to move off of ctype=xml, onto some custom BeautifulSoup parsing of the normal page. :(

Ah!  I see.  It's a template option:
https://bugzilla.mozilla.org/show_bug.cgi?id=302669

We'll just have to fix our template.
Comment 11 Eric Seidel (no email) 2012-04-03 12:36:36 PDT
http://trac.webkit.org/browser/trunk/Websites/bugs.webkit.org/template/en/default/bug/show.xml.tmpl#L106

I'm not sure how we disable the "displayfields.attachmentdata" option, but I'm sure we can figure it out. :)

We can also copy that template over to the custom section and modify it as we see fit.
Comment 12 Zoltan Arvai 2012-04-12 07:45:06 PDT
Created attachment 136911 [details]
show.xml.tmpl

add bugs.webkit.org/template/en/custom/bug/show.xml.tmpl :

I replaced this section 

 [% IF displayfields.attachmentdata %]
 <data encoding="base64">[% a.data FILTER base64 %]</data>

with this content:

 [% IF displayfields.attachmentdata %]
 [%# Return URL of attachment instead of data #%]
 <url>[% urlbase FILTER uri FILTER html %]attachment.cgi?id=[% a.id %]</url>

Is it a valid solution for this problem? Is any other modification necessary to apply the new template file?
Comment 13 Eric Seidel (no email) 2012-04-12 10:37:06 PDT
Looks fine to me, but ddkilzer or wms would be better reviewers.
Comment 14 Eric Seidel (no email) 2012-04-12 10:37:50 PDT
Comment on attachment 136911 [details]
show.xml.tmpl

I'm going to r+, but please give them 24 hours to comment before changing to cq+.  We're in no rush here.
Comment 15 David Kilzer (:ddkilzer) 2012-04-18 07:39:51 PDT
Created attachment 137684 [details]
Diff between default and custom show.xml.tmpl
Comment 16 David Kilzer (:ddkilzer) 2012-04-18 07:53:02 PDT
(In reply to comment #10)
> I see.  This has little to do with the attachment fetch, and all do to with the fact that bugzilla at some point started including attachments in the ctype=xml response.  I've seen this for a while, but don't know how to fix it.  Presumably we'll have to move off of ctype=xml, onto some custom BeautifulSoup parsing of the normal page. :(
> 
> Ah!  I see.  It's a template option:
> https://bugzilla.mozilla.org/show_bug.cgi?id=302669
> 
> We'll just have to fix our template.

Looking at the patches on that b.m.o. bug, it seems like adding this query parameter to the URL fetching the data would do the same thing:

    excludefield=attachmentdata

Did anyone try modifying webkit-patch to add that query parameter to "ctype=xml"?

I only ask because modifying templates in Bugzilla means that the template modification has to be reapplied every time we update Bugzilla, which tends to be labor-intensive.  If we can just exclude the attachment data (and aren't using the attachment URL for anything), that seems like a better approach.

Yes, I really think we should add the "excludefield=attachmentdata" query parameter instead.
Comment 17 David Kilzer (:ddkilzer) 2012-04-18 07:53:50 PDT
Comment on attachment 136911 [details]
show.xml.tmpl

r- since we should just add "excludefield=attachmentdata" to the URL to exclude this field from the XML.  Compare these two URLs:

https://bugs.webkit.org/show_bug.cgi?id=25206&ctype=xml
https://bugs.webkit.org/show_bug.cgi?id=25206&ctype=xml&excludefield=attachmentdata
Comment 18 Csaba Osztrogonác 2012-04-18 07:56:54 PDT
(In reply to comment #17)
> (From update of attachment 136911 [details])
> r- since we should just add "excludefield=attachmentdata" to the URL to exclude this field from the XML.  Compare these two URLs:
> 
> https://bugs.webkit.org/show_bug.cgi?id=25206&ctype=xml
> https://bugs.webkit.org/show_bug.cgi?id=25206&ctype=xml&excludefield=attachmentdata

Yayyy, it will be so simpler. Zoltán, could you prepare a patch to 
make webkit-patch add "&excludefield=attachmentdata" to the query?
Comment 19 David Kilzer (:ddkilzer) 2012-04-18 08:19:37 PDT
Created attachment 137695 [details]
Untested fix to add "excludefield=attachmentdata" everywhere "ctype=xml" is used
Comment 20 David Kilzer (:ddkilzer) 2012-04-18 08:20:40 PDT
(In reply to comment #19)
> Created an attachment (id=137695) [details]
> Untested fix to add "excludefield=attachmentdata" everywhere "ctype=xml" is used

I don't have time to test this, but this is basically the type of fix I'm suggesting.
Comment 21 Eric Seidel (no email) 2012-04-18 10:39:55 PDT
Comment on attachment 137695 [details]
Untested fix to add "excludefield=attachmentdata" everywhere "ctype=xml" is used

If this works, this sounds like the better fix, yes.  I wonder what other options to show_bug.cgi I should know about. :)
Comment 22 Zoltan Arvai 2012-04-20 05:00:41 PDT
I ran some test with the patch, it seems ok.
Comment 23 Csaba Osztrogonác 2012-05-08 10:07:57 PDT
Oh ... I thought it is fixed long time ago. :) If you guys don't want to submit the full patch, I'll do it tomorrow.
Comment 24 Csaba Osztrogonác 2012-05-08 10:09:02 PDT
(In reply to comment #23)
> Oh ... I thought it is fixed long time ago. :) If you guys don't want to submit the full patch, I'll do it tomorrow.

We really need this patch, because Qt EWS bots are dying now, because they try to fetch https://bugs.webkit.org/show_bug.cgi?id=84614
Comment 25 Adam Barth 2012-05-08 10:11:53 PDT
Comment on attachment 137695 [details]
Untested fix to add "excludefield=attachmentdata" everywhere "ctype=xml" is used

Looks like this just needs a ChangeLog.
Comment 26 Csaba Osztrogonác 2012-05-09 00:59:40 PDT
Created attachment 140878 [details]
r+-ed patch with changelog
Comment 27 Csaba Osztrogonác 2012-05-09 01:00:53 PDT
Comment on attachment 140878 [details]
r+-ed patch with changelog

David, is it OK for you? :)
Comment 28 Csaba Osztrogonác 2012-05-14 04:16:21 PDT
Comment on attachment 140878 [details]
r+-ed patch with changelog

There wasn't any objection in the last 3 weeks, so we shouldn't wait more (and block fixing this annoying bug).
Comment 29 WebKit Review Bot 2012-05-14 05:07:34 PDT
Comment on attachment 140878 [details]
r+-ed patch with changelog

Clearing flags on attachment: 140878

Committed r116934: <http://trac.webkit.org/changeset/116934>
Comment 30 WebKit Review Bot 2012-05-14 05:07:42 PDT
All reviewed patches have been landed.  Closing bug.