Bug 37292

Summary: http://trac.webkit.org/changeset/57215 caused perf regressions
Product: WebKit Reporter: Ojan Vafai <ojan>
Component: TextAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: adele, ddkilzer, dglazkov, enrica, fishd, hyatt, mitz, slewis
Priority: P1    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: OS X 10.5   
Attachments:
Description Flags
zip of reduced perf test
none
Patch
none
Patch
none
Patch adele: review+

Ojan Vafai
Reported 2010-04-08 14:15:49 PDT
http://trac.webkit.org/changeset/57215 caused a ~6% perf regression on Mac on Chromium's page cycler intl2 test and a ~3% regression on page cycler intl2. It caused a 2-8% memory regression on Chromium's intl1, intl2 and moz page cyclers. I'm pretty sure Safari would see the same regressions. At the very least, Chromium uses the same font code as Safari on the Mac. Chromium will be cutting a release very soon (in a matter of days). Is there something we can do to address this regression before then? Can we put this behind a flag until the perf/memory issues are resolved?
Attachments
zip of reduced perf test (6.14 MB, application/zip)
2010-04-09 16:06 PDT, Ojan Vafai
no flags
Patch (56.05 KB, patch)
2010-04-21 14:48 PDT, Ojan Vafai
no flags
Patch (55.94 KB, patch)
2010-04-28 11:35 PDT, Ojan Vafai
no flags
Patch (56.00 KB, patch)
2010-04-28 12:34 PDT, Ojan Vafai
adele: review+
mitz
Comment 1 2010-04-08 14:22:21 PDT
Bug 6274 comment 52 mentions a couple of possible speed improvements, but further steps probably require more information. One important question is what fraction of the regression is due to sending more cases down the complex text code path. This can be easily tested by undoing the change to canUseGlyphCache(). Depending on the answer, the idea at the end of bug 6274 comment 25 may be worth pursuing.
Ojan Vafai
Comment 2 2010-04-08 14:32:33 PDT
(In reply to comment #1) > Bug 6274 comment 52 mentions a couple of possible speed improvements, but > further steps probably require more information. One important question is what > fraction of the regression is due to sending more cases down the complex text > code path. This can be easily tested by undoing the change to > canUseGlyphCache(). Depending on the answer, the idea at the end of bug 6274 > comment 25 may be worth pursuing. I would guess that's all of it. Would you be open to an experiment where we commit a change putting just that behind a flag or comment that code out? It's hard to get definitive answers testing this locally. Need to actually see it run on the bot to have confidence in the results.
mitz
Comment 3 2010-04-08 14:42:54 PDT
(In reply to comment #2) > (In reply to comment #1) > Would you be open to an experiment where we > commit a change putting just that behind a flag or comment that code out? It's > hard to get definitive answers testing this locally. Need to actually see it > run on the bot to have confidence in the results. Given the magnitude of the regression on the specific test, I expect that you should be able to get a good qualitative result from testing locally.
Ojan Vafai
Comment 4 2010-04-09 10:35:03 PDT
(In reply to comment #3) > (In reply to comment #2) > > (In reply to comment #1) > > Would you be open to an experiment where we > > commit a change putting just that behind a flag or comment that code out? It's > > hard to get definitive answers testing this locally. Need to actually see it > > run on the bot to have confidence in the results. > > Given the magnitude of the regression on the specific test, I expect that you > should be able to get a good qualitative result from testing locally. I'm looking into this today. I'm also trying to find the specific pages in those page cyclers that are slowed down. Overall though, running these tests takes a long time and I have to run them many times to get numbers I have confidence in. It will take at least most of today. I am not comfortable leaving a regression like this in the tree for long. Other perf regressions or improvements happen and it quickly becomes difficult to judge whether the fixes for the regression actually fully fixed it. My understanding is that WebKit has a strict no-regressions policy for PLT (am I wrong?). It seems like that should extend to cases like this. Is there something we can do in the interim while we figure out how to fix the regression? The simplest solutions are to put this behind a flag or to rollback. I'd obviously prefer an option that didn't require one of those two, but nothing comes to mind that doesn't require getting more data and working on fixes, which seems like it will take days at least. Anyways, I'll try to have some useful data ASAP.
Ojan Vafai
Comment 5 2010-04-09 16:05:13 PDT
OK. I've created a reduced perf test that I'll attach shortly. The page cycler runs through 30 pages and does 10 cycles. I took the tests where median_after_patch / median_before_patch > 1.1. In other words, the pages that regressed. In the test case, I load each one in an iframe 100 times. I ran each case 5 times in Safari. Numbers are in milliseconds. Before r57215: 106454, 105744, 102916, 101746, 103130 After r57215: 116417, 115711, 116228, 122100, 115701 After r57215 with canUseGlyphCache changes commented out: 105953, 106326, 105762, 105035, 105254 So, clearly the canUseGlyphCache changes account for all of the perf regression.
Ojan Vafai
Comment 6 2010-04-09 16:06:36 PDT
Created attachment 53010 [details] zip of reduced perf test To run, unzip, and run test.html. It alerts the results when it finishes.
mitz
Comment 7 2010-04-09 16:15:49 PDT
Ojan, thanks for doing this investigation! I think this is good news actually.
Ojan Vafai
Comment 8 2010-04-09 17:36:07 PDT
(In reply to comment #7) > Ojan, thanks for doing this investigation! I think this is good news actually. I agree. The ideas in bug 6274 seem likely to address the perf issue. Sorry to be a broken record, but, is there anything we can do in the meantime about the perf and/or memory regressions? For the perf regression, can we just delete the canUseGlyphCache changes? Then we'll get to fixing the bug in a way that doesn't cause a perf regression without leaving a perf regression in the tree in the interim. I don't know what to do about the memory regression. Enrica mentioned the idea of tweaking the size of the glyph cache. That seems reasonable to me, but I don't have a sense of the tradeoffs involved there.
Dave Hyatt
Comment 9 2010-04-12 10:51:59 PDT
Yeah I really should not have put r+ on the original bug. I'll take some of the blame for that. It was obvious that the bug was going to regress the performance of the complex text code path, and I shouldn't have let that slide. We should probably do a surgical patch that disables this tracking without removing all the code from the tree, so that we can get the performance back.
Dave Hyatt
Comment 10 2010-04-12 10:54:01 PDT
I should have said "It was obvious performance would regress because we'd use the complex text code path more."
mitz
Comment 11 2010-04-12 10:55:22 PDT
(In reply to comment #9) > Yeah I really should not have put r+ on the original bug. I'll take some of > the blame for that. It was obvious that the bug was going to regress the > performance of the complex text code path, and I shouldn't have let that slide. According to Ojan’s testing, that is not correct. The complex path has not become measurably slower. > We should probably do a surgical patch that disables this tracking without > removing all the code from the tree, so that we can get the performance back. I think the thinks to consider now are: 1) Add a third code path (fast w/glyph overflow) for the “stacked diacritics” range or if that turns out to be too hard 2) #ifdef-out the change to canUseGlyphCache()
Enrica Casucci
Comment 12 2010-04-12 11:01:44 PDT
(In reply to comment #11) > (In reply to comment #9) > > Yeah I really should not have put r+ on the original bug. I'll take some of > > the blame for that. It was obvious that the bug was going to regress the > > performance of the complex text code path, and I shouldn't have let that slide. > > According to Ojan’s testing, that is not correct. The complex path has not > become measurably slower. > > > We should probably do a surgical patch that disables this tracking without > > removing all the code from the tree, so that we can get the performance back. > > I think the thinks to consider now are: > 1) Add a third code path (fast w/glyph overflow) for the “stacked diacritics” > range > or if that turns out to be too hard > 2) #ifdef-out the change to canUseGlyphCache() I believe there is some confusion here. Unless Ojan tests provide a different picture of the problem, here is where I think we stand: 1. we no performance impact for the simple text path 2. we have very small impact on the complex text path (but I don't have the exact numbers) 3. where things have gotten worse is where we started using the complex path for cases where we used to use the simple one. I will address number 3 shortly.
Ojan Vafai
Comment 13 2010-04-21 14:48:12 PDT
Ojan Vafai
Comment 14 2010-04-21 14:54:40 PDT
(In reply to comment #12) > (In reply to comment #11) > > (In reply to comment #9) > > > Yeah I really should not have put r+ on the original bug. I'll take some of > > > the blame for that. It was obvious that the bug was going to regress the > > > performance of the complex text code path, and I shouldn't have let that slide. > > > > According to Ojan’s testing, that is not correct. The complex path has not > > become measurably slower. > > > Unless Ojan tests provide a different picture of the problem, here is where I > think we stand: > 1. we no performance impact for the simple text path > 2. we have very small impact on the complex text path (but I don't have the > exact numbers) > 3. where things have gotten worse is where we started using the complex path > for cases where we used to use the simple one. This is my understanding as well, but I don't know this code or this patch well enough to say. One thing to note, both my testcase and the Chromium page load tests do *not* restart the browser between loads of pages. So any perf hit from loading extra font data that is then cached would not appear on either test since each page is loaded many times. What does appear very clearly however is the memory regression. Is there anything we can do here? Can we store the bounding box only for a certain range of glyphs instead of all complex text?
Stephanie Lewis
Comment 15 2010-04-21 16:13:27 PDT
A patch for fixing the Membuster memory regression caused by 52715 is here: http://bugs.webkit.org/show_bug.cgi?id=37936
Adele Peterson
Comment 16 2010-04-21 16:18:12 PDT
Sorry, I didn't see this bug before working on my patch. I guess it should be a dupe of this bug. Anyway, my patch breaks up GlyphMetricsMap into GlyphWidthMap and GlyphBoundsMap. I'm still working on it, but maybe someone could help me test.
Ojan Vafai
Comment 17 2010-04-21 16:32:32 PDT
Seems fine to treat the perf and memory regressions as separate bugs.
Ojan Vafai
Comment 18 2010-04-25 09:01:01 PDT
Ping for the review.
mitz
Comment 19 2010-04-28 10:54:34 PDT
Comment on attachment 53994 [details] Patch > #define ROMAN_AND_GREEK_DIACRITICS_CAN_USE_GLYPH_CACHE 0 Isn’t this name backwards? When this is false, characters in U+1E00-U+2000 can (and will) use the glyph cache.
Ojan Vafai
Comment 20 2010-04-28 11:35:37 PDT
Ojan Vafai
Comment 21 2010-04-28 11:36:12 PDT
(In reply to comment #19) > (From update of attachment 53994 [details]) > > #define ROMAN_AND_GREEK_DIACRITICS_CAN_USE_GLYPH_CACHE 0 > > Isn’t this name backwards? When this is false, characters in U+1E00-U+2000 can > (and will) use the glyph cache. Heh. Whoops. Fixed.
Adele Peterson
Comment 22 2010-04-28 11:58:08 PDT
Comment on attachment 54597 [details] Patch I'd really like someone to take a crack at fixing the underlying problem. Dan has agreed to look into this, but others are welcome to try as well. Would it be possible to leave the #define set to 0 until someone tries and fails to fix this? If it really seems unlikely we'll be able to fix the underlying problem, then we can change the default value of the #define.
Ojan Vafai
Comment 23 2010-04-28 12:05:24 PDT
(In reply to comment #22) > (From update of attachment 54597 [details]) > I'd really like someone to take a crack at fixing the underlying problem. Dan > has agreed to look into this, but others are welcome to try as well. Would it > be possible to leave the #define set to 0 until someone tries and fails to fix > this? If it really seems unlikely we'll be able to fix the underlying problem, > then we can change the default value of the #define. It's certainly possible, but I don't think we should leave performance regressions in trunk by default. We wouldn't leave failing tests in the tree, would we? Performance and memory regressions are even worse since there is only one scale to measure them on. That means that other changes that affect performance will mask this one and it becomes nearly impossible to tell if the fix actually fixes the entirety of the regression.
Ojan Vafai
Comment 24 2010-04-28 12:12:03 PDT
Stated differently, what is the advantage to leaving it enabled?
Adele Peterson
Comment 25 2010-04-28 12:17:37 PDT
I guess I just want to make sure we still feel some pressure to do the right fix. But that's not too important. Various branches can twiddle the #ifdef as needed.
Ojan Vafai
Comment 26 2010-04-28 12:34:28 PDT
Ojan Vafai
Comment 27 2010-04-28 12:37:25 PDT
(In reply to comment #25) > I guess I just want to make sure we still feel some pressure to do the right > fix. But that's not too important. This regresses the bug fixed by r57215, so the pressure to do the right fix is to fix the original bug. Isn't this the same as leaving a failing test in the tree so that there is pressure to fix it? In that case, we'd expect the test to be fixed in a reasonable timeframe (hours, not weeks) or for the change to be reverted. > Various branches can twiddle the #ifdef as needed. Uploaded a new patch that's properly ifndef'ed so branches can actually twiddle. :)
Adele Peterson
Comment 28 2010-04-28 13:25:58 PDT
Comment on attachment 54611 [details] Patch OK sounds good. Please update https://bugs.webkit.org/show_bug.cgi?id=6274 and we'll keep working on the performance issue.
Ojan Vafai
Comment 29 2010-04-28 13:50:32 PDT
Note You need to log in before you can comment on or make changes to this bug.