Summary: | webkit-patch apply-attachment is very slow for big patches | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Csaba Osztrogonác <ossy> | ||||||||||
Component: | Tools / Tests | Assignee: | Csaba Osztrogonác <ossy> | ||||||||||
Status: | RESOLVED FIXED | ||||||||||||
Severity: | Normal | CC: | abarth, blambov, ddkilzer, dpranke, eric, krit, ojan, ossy, rwlbuis, webkit.review.bot, wsiegrist, zarvai, zimmermann | ||||||||||
Priority: | P2 | ||||||||||||
Version: | 528+ (Nightly build) | ||||||||||||
Hardware: | Unspecified | ||||||||||||
OS: | Unspecified | ||||||||||||
Bug Depends on: | |||||||||||||
Bug Blocks: | 60223 | ||||||||||||
Attachments: |
|
Description
Csaba Osztrogonác
2011-12-20 07:20:35 PST
Assigned for investigation. Feel free to steal. I suspect it's because we load the entire attachment into memory. Probably multiple times. see attachment.py: def contents(self): # FIXME: We shouldn't be grabbing at _bugzilla. return self._bug._bugzilla.fetch_attachment_contents(self.id()) Then we apply the contents via applypatch.py: self._tool.checkout().apply_patch(state["patch"], force=self._options.non_interactive or self._options.force_patch) which in turn pipes it through: self._executive.run_command(args, input=patch.contents()) I suspect that for starters, we should make sure contents() is never called more than once (And if so, cache the value). I ran into this bug again with a huge sized bug: EWS did the following things: Fetching: https://bugs.webkit.org/attachment.cgi?id=135006&action=edit Fetching: https://bugs.webkit.org/show_bug.cgi?id=25206&ctype=xml Fetching: https://bugs.webkit.org/attachment.cgi?id=135006&action=edit Fetching: https://bugs.webkit.org/show_bug.cgi?id=25206&ctype=xml Fetching: https://bugs.webkit.org/show_bug.cgi?id=25206&ctype=xml Unfortunately https://bugs.webkit.org/show_bug.cgi?id=25206&ctype=xml is 104Mb sized and it takes ~9.5 minutes to fetch it. I'm going to pick up this bug. (In reply to comment #3) > I ran into this bug again with a huge sized bug: > > EWS did the following things: > Fetching: https://bugs.webkit.org/attachment.cgi?id=135006&action=edit > Fetching: https://bugs.webkit.org/show_bug.cgi?id=25206&ctype=xml > Fetching: https://bugs.webkit.org/attachment.cgi?id=135006&action=edit > Fetching: https://bugs.webkit.org/show_bug.cgi?id=25206&ctype=xml > Fetching: https://bugs.webkit.org/show_bug.cgi?id=25206&ctype=xml > > Unfortunately https://bugs.webkit.org/show_bug.cgi?id=25206&ctype=xml > is 104Mb sized and it takes ~9.5 minutes to fetch it. > > I'm going to pick up this bug. Just a quick FYI: WildFox: Ossy: nope, I have no clue of webkitpy, I only know how to make it check the big file checkbox WildFox: Ossy: Tools/Scripts/webkitpy/common/net/bugzilla/bugzilla.py WildFox: Ossy: in "def _fill_attachment_form", right after "self.browser['ispatch'] = ("1",)" inject: WildFox: Ossy: self.browser['bigfile'] = ("bigfile",) This makes us always check the big file check box.. What we really should do is figure out the size before, and eventually check it, if needed (I'd even ask the user, your patch is XXX MB, do you really want to upload it...) We keep the patch in memory. Maybe more than one copy. :) https://bugs.webkit.org/show_bug.cgi?id=25206 This bug has many 7.5-10Mb sized attachments from Chromium EWS. It is overkiller for EWS bots fetch the whole bug 3 times. :( ( For Qt EWS it takes 30 minutes to process a patch from this bug. :( ) Can we make Chromium EWS not to upload huge diffs? URLs point to an external storage would be better. The size of the bug is 105Mb. :(( Our EWS always dies with OOM during processing this huge XML. Hi Ossy, (In reply to comment #7) > The size of the bug is 105Mb. :(( > > Our EWS always dies with OOM during processing this huge XML. I was wondering, would obsoleting those archive attachments help? Cheers, Rob. (In reply to comment #8) > Hi Ossy, > > (In reply to comment #7) > > The size of the bug is 105Mb. :(( > > > > Our EWS always dies with OOM during processing this huge XML. > > I was wondering, would obsoleting those archive attachments help? > Cheers, > > Rob. I hope, let's see. I'll remove the r? from the patch, and then put it back after EWSs dropped it from their queue. I see. This has little to do with the attachment fetch, and all do to with the fact that bugzilla at some point started including attachments in the ctype=xml response. I've seen this for a while, but don't know how to fix it. Presumably we'll have to move off of ctype=xml, onto some custom BeautifulSoup parsing of the normal page. :( Ah! I see. It's a template option: https://bugzilla.mozilla.org/show_bug.cgi?id=302669 We'll just have to fix our template. http://trac.webkit.org/browser/trunk/Websites/bugs.webkit.org/template/en/default/bug/show.xml.tmpl#L106 I'm not sure how we disable the "displayfields.attachmentdata" option, but I'm sure we can figure it out. :) We can also copy that template over to the custom section and modify it as we see fit. Created attachment 136911 [details]
show.xml.tmpl
add bugs.webkit.org/template/en/custom/bug/show.xml.tmpl :
I replaced this section
[% IF displayfields.attachmentdata %]
<data encoding="base64">[% a.data FILTER base64 %]</data>
with this content:
[% IF displayfields.attachmentdata %]
[%# Return URL of attachment instead of data #%]
<url>[% urlbase FILTER uri FILTER html %]attachment.cgi?id=[% a.id %]</url>
Is it a valid solution for this problem? Is any other modification necessary to apply the new template file?
Looks fine to me, but ddkilzer or wms would be better reviewers. Comment on attachment 136911 [details]
show.xml.tmpl
I'm going to r+, but please give them 24 hours to comment before changing to cq+. We're in no rush here.
Created attachment 137684 [details]
Diff between default and custom show.xml.tmpl
(In reply to comment #10) > I see. This has little to do with the attachment fetch, and all do to with the fact that bugzilla at some point started including attachments in the ctype=xml response. I've seen this for a while, but don't know how to fix it. Presumably we'll have to move off of ctype=xml, onto some custom BeautifulSoup parsing of the normal page. :( > > Ah! I see. It's a template option: > https://bugzilla.mozilla.org/show_bug.cgi?id=302669 > > We'll just have to fix our template. Looking at the patches on that b.m.o. bug, it seems like adding this query parameter to the URL fetching the data would do the same thing: excludefield=attachmentdata Did anyone try modifying webkit-patch to add that query parameter to "ctype=xml"? I only ask because modifying templates in Bugzilla means that the template modification has to be reapplied every time we update Bugzilla, which tends to be labor-intensive. If we can just exclude the attachment data (and aren't using the attachment URL for anything), that seems like a better approach. Yes, I really think we should add the "excludefield=attachmentdata" query parameter instead. Comment on attachment 136911 [details] show.xml.tmpl r- since we should just add "excludefield=attachmentdata" to the URL to exclude this field from the XML. Compare these two URLs: https://bugs.webkit.org/show_bug.cgi?id=25206&ctype=xml https://bugs.webkit.org/show_bug.cgi?id=25206&ctype=xml&excludefield=attachmentdata (In reply to comment #17) > (From update of attachment 136911 [details]) > r- since we should just add "excludefield=attachmentdata" to the URL to exclude this field from the XML. Compare these two URLs: > > https://bugs.webkit.org/show_bug.cgi?id=25206&ctype=xml > https://bugs.webkit.org/show_bug.cgi?id=25206&ctype=xml&excludefield=attachmentdata Yayyy, it will be so simpler. Zoltán, could you prepare a patch to make webkit-patch add "&excludefield=attachmentdata" to the query? Created attachment 137695 [details]
Untested fix to add "excludefield=attachmentdata" everywhere "ctype=xml" is used
(In reply to comment #19) > Created an attachment (id=137695) [details] > Untested fix to add "excludefield=attachmentdata" everywhere "ctype=xml" is used I don't have time to test this, but this is basically the type of fix I'm suggesting. Comment on attachment 137695 [details]
Untested fix to add "excludefield=attachmentdata" everywhere "ctype=xml" is used
If this works, this sounds like the better fix, yes. I wonder what other options to show_bug.cgi I should know about. :)
I ran some test with the patch, it seems ok. Oh ... I thought it is fixed long time ago. :) If you guys don't want to submit the full patch, I'll do it tomorrow. (In reply to comment #23) > Oh ... I thought it is fixed long time ago. :) If you guys don't want to submit the full patch, I'll do it tomorrow. We really need this patch, because Qt EWS bots are dying now, because they try to fetch https://bugs.webkit.org/show_bug.cgi?id=84614 Comment on attachment 137695 [details]
Untested fix to add "excludefield=attachmentdata" everywhere "ctype=xml" is used
Looks like this just needs a ChangeLog.
Created attachment 140878 [details]
r+-ed patch with changelog
Comment on attachment 140878 [details]
r+-ed patch with changelog
David, is it OK for you? :)
Comment on attachment 140878 [details]
r+-ed patch with changelog
There wasn't any objection in the last 3 weeks, so we shouldn't wait more (and block fixing this annoying bug).
Comment on attachment 140878 [details] r+-ed patch with changelog Clearing flags on attachment: 140878 Committed r116934: <http://trac.webkit.org/changeset/116934> All reviewed patches have been landed. Closing bug. |