37292 – http://trac.webkit.org/changeset/57215 caused perf regressions

RESOLVED FIXED37292

http://trac.webkit.org/changeset/57215 caused perf regressions

https://bugs.webkit.org/show_bug.cgi?id=37292

Summary http://trac.webkit.org/changeset/57215 caused perf regressions

Ojan Vafai

Reported 2010-04-08 14:15:49 PDT

http://trac.webkit.org/changeset/57215 caused a ~6% perf regression on Mac on Chromium's page cycler intl2 test and a ~3% regression on page cycler intl2. It caused a 2-8% memory regression on Chromium's intl1, intl2 and moz page cyclers. I'm pretty sure Safari would see the same regressions. At the very least, Chromium uses the same font code as Safari on the Mac. Chromium will be cutting a release very soon (in a matter of days). Is there something we can do to address this regression before then? Can we put this behind a flag until the perf/memory issues are resolved?

Attachments
zip of reduced perf test (6.14 MB, application/zip) 2010-04-09 16:06 PDT, Ojan Vafai	no flags	Details
Patch (56.05 KB, patch) 2010-04-21 14:48 PDT, Ojan Vafai	no flags	Details Formatted Diff Diff
Patch (55.94 KB, patch) 2010-04-28 11:35 PDT, Ojan Vafai	no flags	Details Formatted Diff Diff
Patch (56.00 KB, patch) 2010-04-28 12:34 PDT, Ojan Vafai	adele: review+	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

mitz

Comment 1 2010-04-08 14:22:21 PDT

Bug 6274 comment 52 mentions a couple of possible speed improvements, but further steps probably require more information. One important question is what fraction of the regression is due to sending more cases down the complex text code path. This can be easily tested by undoing the change to canUseGlyphCache(). Depending on the answer, the idea at the end of bug 6274 comment 25 may be worth pursuing.

Ojan Vafai

Comment 2 2010-04-08 14:32:33 PDT

(In reply to comment #1) > Bug 6274 comment 52 mentions a couple of possible speed improvements, but > further steps probably require more information. One important question is what > fraction of the regression is due to sending more cases down the complex text > code path. This can be easily tested by undoing the change to > canUseGlyphCache(). Depending on the answer, the idea at the end of bug 6274 > comment 25 may be worth pursuing. I would guess that's all of it. Would you be open to an experiment where we commit a change putting just that behind a flag or comment that code out? It's hard to get definitive answers testing this locally. Need to actually see it run on the bot to have confidence in the results.

mitz

Comment 3 2010-04-08 14:42:54 PDT

(In reply to comment #2) > (In reply to comment #1) > Would you be open to an experiment where we > commit a change putting just that behind a flag or comment that code out? It's > hard to get definitive answers testing this locally. Need to actually see it > run on the bot to have confidence in the results. Given the magnitude of the regression on the specific test, I expect that you should be able to get a good qualitative result from testing locally.

Ojan Vafai

Comment 4 2010-04-09 10:35:03 PDT

(In reply to comment #3) > (In reply to comment #2) > > (In reply to comment #1) > > Would you be open to an experiment where we > > commit a change putting just that behind a flag or comment that code out? It's > > hard to get definitive answers testing this locally. Need to actually see it > > run on the bot to have confidence in the results. > > Given the magnitude of the regression on the specific test, I expect that you > should be able to get a good qualitative result from testing locally. I'm looking into this today. I'm also trying to find the specific pages in those page cyclers that are slowed down. Overall though, running these tests takes a long time and I have to run them many times to get numbers I have confidence in. It will take at least most of today. I am not comfortable leaving a regression like this in the tree for long. Other perf regressions or improvements happen and it quickly becomes difficult to judge whether the fixes for the regression actually fully fixed it. My understanding is that WebKit has a strict no-regressions policy for PLT (am I wrong?). It seems like that should extend to cases like this. Is there something we can do in the interim while we figure out how to fix the regression? The simplest solutions are to put this behind a flag or to rollback. I'd obviously prefer an option that didn't require one of those two, but nothing comes to mind that doesn't require getting more data and working on fixes, which seems like it will take days at least. Anyways, I'll try to have some useful data ASAP.

Ojan Vafai

Comment 5 2010-04-09 16:05:13 PDT

OK. I've created a reduced perf test that I'll attach shortly. The page cycler runs through 30 pages and does 10 cycles. I took the tests where median_after_patch / median_before_patch > 1.1. In other words, the pages that regressed. In the test case, I load each one in an iframe 100 times. I ran each case 5 times in Safari. Numbers are in milliseconds. Before r57215: 106454, 105744, 102916, 101746, 103130 After r57215: 116417, 115711, 116228, 122100, 115701 After r57215 with canUseGlyphCache changes commented out: 105953, 106326, 105762, 105035, 105254 So, clearly the canUseGlyphCache changes account for all of the perf regression.

Ojan Vafai

Comment 6 2010-04-09 16:06:36 PDT

Created attachment 53010 [details] zip of reduced perf test To run, unzip, and run test.html. It alerts the results when it finishes.

mitz

Comment 7 2010-04-09 16:15:49 PDT

Ojan, thanks for doing this investigation! I think this is good news actually.

Ojan Vafai

Comment 8 2010-04-09 17:36:07 PDT

(In reply to comment #7) > Ojan, thanks for doing this investigation! I think this is good news actually. I agree. The ideas in bug 6274 seem likely to address the perf issue. Sorry to be a broken record, but, is there anything we can do in the meantime about the perf and/or memory regressions? For the perf regression, can we just delete the canUseGlyphCache changes? Then we'll get to fixing the bug in a way that doesn't cause a perf regression without leaving a perf regression in the tree in the interim. I don't know what to do about the memory regression. Enrica mentioned the idea of tweaking the size of the glyph cache. That seems reasonable to me, but I don't have a sense of the tradeoffs involved there.

Dave Hyatt

Comment 9 2010-04-12 10:51:59 PDT

Yeah I really should not have put r+ on the original bug. I'll take some of the blame for that. It was obvious that the bug was going to regress the performance of the complex text code path, and I shouldn't have let that slide. We should probably do a surgical patch that disables this tracking without removing all the code from the tree, so that we can get the performance back.

Dave Hyatt

Comment 10 2010-04-12 10:54:01 PDT

I should have said "It was obvious performance would regress because we'd use the complex text code path more."

mitz

Comment 11 2010-04-12 10:55:22 PDT

(In reply to comment #9) > Yeah I really should not have put r+ on the original bug. I'll take some of > the blame for that. It was obvious that the bug was going to regress the > performance of the complex text code path, and I shouldn't have let that slide. According to Ojan’s testing, that is not correct. The complex path has not become measurably slower. > We should probably do a surgical patch that disables this tracking without > removing all the code from the tree, so that we can get the performance back. I think the thinks to consider now are: 1) Add a third code path (fast w/glyph overflow) for the “stacked diacritics” range or if that turns out to be too hard 2) #ifdef-out the change to canUseGlyphCache()

Enrica Casucci

Comment 12 2010-04-12 11:01:44 PDT

(In reply to comment #11) > (In reply to comment #9) > > Yeah I really should not have put r+ on the original bug. I'll take some of > > the blame for that. It was obvious that the bug was going to regress the > > performance of the complex text code path, and I shouldn't have let that slide. > > According to Ojan’s testing, that is not correct. The complex path has not > become measurably slower. > > > We should probably do a surgical patch that disables this tracking without > > removing all the code from the tree, so that we can get the performance back. > > I think the thinks to consider now are: > 1) Add a third code path (fast w/glyph overflow) for the “stacked diacritics” > range > or if that turns out to be too hard > 2) #ifdef-out the change to canUseGlyphCache() I believe there is some confusion here. Unless Ojan tests provide a different picture of the problem, here is where I think we stand: 1. we no performance impact for the simple text path 2. we have very small impact on the complex text path (but I don't have the exact numbers) 3. where things have gotten worse is where we started using the complex path for cases where we used to use the simple one. I will address number 3 shortly.

Ojan Vafai

Comment 13 2010-04-21 14:48:12 PDT

Created attachment 53994 [details] Patch

Ojan Vafai

Comment 14 2010-04-21 14:54:40 PDT

(In reply to comment #12) > (In reply to comment #11) > > (In reply to comment #9) > > > Yeah I really should not have put r+ on the original bug. I'll take some of > > > the blame for that. It was obvious that the bug was going to regress the > > > performance of the complex text code path, and I shouldn't have let that slide. > > > > According to Ojan’s testing, that is not correct. The complex path has not > > become measurably slower. > > > Unless Ojan tests provide a different picture of the problem, here is where I > think we stand: > 1. we no performance impact for the simple text path > 2. we have very small impact on the complex text path (but I don't have the > exact numbers) > 3. where things have gotten worse is where we started using the complex path > for cases where we used to use the simple one. This is my understanding as well, but I don't know this code or this patch well enough to say. One thing to note, both my testcase and the Chromium page load tests do *not* restart the browser between loads of pages. So any perf hit from loading extra font data that is then cached would not appear on either test since each page is loaded many times. What does appear very clearly however is the memory regression. Is there anything we can do here? Can we store the bounding box only for a certain range of glyphs instead of all complex text?

Stephanie Lewis

Comment 15 2010-04-21 16:13:27 PDT

A patch for fixing the Membuster memory regression caused by 52715 is here: http://bugs.webkit.org/show_bug.cgi?id=37936

Adele Peterson

Comment 16 2010-04-21 16:18:12 PDT

Sorry, I didn't see this bug before working on my patch. I guess it should be a dupe of this bug. Anyway, my patch breaks up GlyphMetricsMap into GlyphWidthMap and GlyphBoundsMap. I'm still working on it, but maybe someone could help me test.

Ojan Vafai

Comment 17 2010-04-21 16:32:32 PDT

Seems fine to treat the perf and memory regressions as separate bugs.

Ojan Vafai

Comment 18 2010-04-25 09:01:01 PDT

Ping for the review.

mitz

Comment 19 2010-04-28 10:54:34 PDT

Comment on attachment 53994 [details] Patch > #define ROMAN_AND_GREEK_DIACRITICS_CAN_USE_GLYPH_CACHE 0 Isn’t this name backwards? When this is false, characters in U+1E00-U+2000 can (and will) use the glyph cache.

Ojan Vafai

Comment 20 2010-04-28 11:35:37 PDT

Created attachment 54597 [details] Patch

Ojan Vafai

Comment 21 2010-04-28 11:36:12 PDT

(In reply to comment #19) > (From update of attachment 53994 [details]) > > #define ROMAN_AND_GREEK_DIACRITICS_CAN_USE_GLYPH_CACHE 0 > > Isn’t this name backwards? When this is false, characters in U+1E00-U+2000 can > (and will) use the glyph cache. Heh. Whoops. Fixed.

Adele Peterson

Comment 22 2010-04-28 11:58:08 PDT

Comment on attachment 54597 [details] Patch I'd really like someone to take a crack at fixing the underlying problem. Dan has agreed to look into this, but others are welcome to try as well. Would it be possible to leave the #define set to 0 until someone tries and fails to fix this? If it really seems unlikely we'll be able to fix the underlying problem, then we can change the default value of the #define.

Ojan Vafai

Comment 23 2010-04-28 12:05:24 PDT

(In reply to comment #22) > (From update of attachment 54597 [details]) > I'd really like someone to take a crack at fixing the underlying problem. Dan > has agreed to look into this, but others are welcome to try as well. Would it > be possible to leave the #define set to 0 until someone tries and fails to fix > this? If it really seems unlikely we'll be able to fix the underlying problem, > then we can change the default value of the #define. It's certainly possible, but I don't think we should leave performance regressions in trunk by default. We wouldn't leave failing tests in the tree, would we? Performance and memory regressions are even worse since there is only one scale to measure them on. That means that other changes that affect performance will mask this one and it becomes nearly impossible to tell if the fix actually fixes the entirety of the regression.

Ojan Vafai

Comment 24 2010-04-28 12:12:03 PDT

Stated differently, what is the advantage to leaving it enabled?

Adele Peterson

Comment 25 2010-04-28 12:17:37 PDT

I guess I just want to make sure we still feel some pressure to do the right fix. But that's not too important. Various branches can twiddle the #ifdef as needed.

Ojan Vafai

Comment 26 2010-04-28 12:34:28 PDT

Created attachment 54611 [details] Patch

Ojan Vafai

Comment 27 2010-04-28 12:37:25 PDT

(In reply to comment #25) > I guess I just want to make sure we still feel some pressure to do the right > fix. But that's not too important. This regresses the bug fixed by r57215, so the pressure to do the right fix is to fix the original bug. Isn't this the same as leaving a failing test in the tree so that there is pressure to fix it? In that case, we'd expect the test to be fixed in a reasonable timeframe (hours, not weeks) or for the change to be reverted. > Various branches can twiddle the #ifdef as needed. Uploaded a new patch that's properly ifndef'ed so branches can actually twiddle. :)

Adele Peterson

Comment 28 2010-04-28 13:25:58 PDT

Comment on attachment 54611 [details] Patch OK sounds good. Please update https://bugs.webkit.org/show_bug.cgi?id=6274 and we'll keep working on the performance issue.

Ojan Vafai

Comment 29 2010-04-28 13:50:32 PDT

Committed r58426: <http://trac.webkit.org/changeset/58426>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P1

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware PC

OS OS X 10.5

Product WebKit

Component Text

Assignee

Nobody

Reported

2010-04-08 14:15 PDT

Modified

2010-04-28 13:50 PDT History

CC List

8 users Show

URL

Keywords

Depends on

Blocks