RESOLVED FIXED 142457
[iOS] Sweep all collected objects on critical memory pressure
https://bugs.webkit.org/show_bug.cgi?id=142457
Summary [iOS] Sweep all collected objects on critical memory pressure
Chris Dumez
Reported 2015-03-08 12:09:08 PDT
On memory pressure, we don't have much time to free-up memory. If we are not fast enough, the process can end up getting killed. The mobile version of cnn.com constructs a lot of canvases that use up a lot of memory. On memory pressure, we were calling gcController().garbageCollectSoon(). However, the garbage collection was not triggering fast enough and the WebContent process would often end up being killed. If we call gcController().garbageCollectNow() on memory pressure instead, cnn.com no longer crashes as the canvases get collected in time. Radar: <rdar://problem/20044440>
Attachments
Patch (2.31 KB, patch)
2015-03-08 12:11 PDT, Chris Dumez
no flags
Patch (2.48 KB, patch)
2015-03-08 12:25 PDT, Chris Dumez
no flags
Patch (4.37 KB, patch)
2015-03-09 17:24 PDT, Chris Dumez
no flags
Chris Dumez
Comment 1 2015-03-08 12:11:55 PDT
Chris Dumez
Comment 2 2015-03-08 12:16:56 PDT
Comment on attachment 248193 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=248193&action=review > Source/WebCore/platform/cocoa/MemoryPressureHandlerCocoa.mm:-84 > - gcController().garbageCollectSoon(); Maybe we could call garbageCollectSoon() on non-critical memory pressure still? This would decrease the likelihood of ending up under critical memory pressure.
Chris Dumez
Comment 3 2015-03-08 12:25:13 PDT
Sam Weinig
Comment 4 2015-03-08 13:37:20 PDT
Comment on attachment 248195 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=248195&action=review > Source/WebCore/platform/cocoa/MemoryPressureHandlerCocoa.mm:88 > - if (isUnderMemoryPressure()) { > - gcController().garbageCollectSoon(); > - } else { > - // If we're not under memory pressure, that means we're here due to impending process suspension. > - // Do a full GC since this is our last chance to run any code. > + { > ReliefLogger log("Collecting JavaScript garbage"); > - gcController().garbageCollectNow(); > + if (critical) > + gcController().garbageCollectNow(); > + else > + gcController().garbageCollectSoon(); It seems like you are changing the behavior of this code getting called when not under memory pressure (but rather "due to impending process suspension"), where it used to call garbageCollectNow() and now will call seemingly call garbageCollectSoon(). Is that intentional? Does an call to this in that scenario always have the critical bit set?
Chris Dumez
Comment 5 2015-03-08 13:40:00 PDT
Comment on attachment 248195 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=248195&action=review >> Source/WebCore/platform/cocoa/MemoryPressureHandlerCocoa.mm:88 >> + gcController().garbageCollectSoon(); > > It seems like you are changing the behavior of this code getting called when not under memory pressure (but rather "due to impending process suspension"), where it used to call garbageCollectNow() and now will call seemingly call garbageCollectSoon(). Is that intentional? Does an call to this in that scenario always have the critical bit set? No, I checked and on suspension we always set the critical flag to true. From WebProcess.cpp: void WebProcess::processWillSuspend() { MemoryPressureHandler::singleton().releaseMemory(true); ... } true is for the critical argument.
Chris Dumez
Comment 6 2015-03-08 13:41:00 PDT
And setting the "critical" flag to true upon suspension makes sense as we want to free-up as much memory as possible in this case.
Geoffrey Garen
Comment 7 2015-03-08 16:01:45 PDT
Comment on attachment 248195 [details] Patch A few thoughts about this patch: (1) Did anybody figure out why the JSC::Heap::reportExtraMemoryCost API was insufficient to collect the canvases at cnn.com? (2) What is the normal GC timer period at cnn.com, and why is it insufficient to collect the canvases? (3) It is preferable to fix (1) and/or (2) rather than respond to their failure after the fact in a low memory warning handler, since the low memory warning warning handler will have the negative side effect of dumping useful caches. (4) Is there any documented limit on the number of memory warnings we'll get, or the rate at which we'll get them? In the past, I have seen cases where there was no limit, and so we hung the browser and also out-competed useful responses to memory warning that would have reclaimed memory more effectively than garbage collection. That was a terrible bug, and this patch might introduce a regression in those cases, along with a ticking time bomb that might go off any time the implementation details of memory warnings change. I'd rather not reintroduce a well-known very bad performance pathology -- especially without clear evidence showing that no better solution is available.
Chris Dumez
Comment 8 2015-03-08 16:19:59 PDT
(In reply to comment #7) > Comment on attachment 248195 [details] > Patch > > A few thoughts about this patch: > > (1) Did anybody figure out why the JSC::Heap::reportExtraMemoryCost API was > insufficient to collect the canvases at cnn.com? > > (2) What is the normal GC timer period at cnn.com, and why is it > insufficient to collect the canvases? > > (3) It is preferable to fix (1) and/or (2) rather than respond to their > failure after the fact in a low memory warning handler, since the low memory > warning warning handler will have the negative side effect of dumping useful > caches. > > (4) Is there any documented limit on the number of memory warnings we'll > get, or the rate at which we'll get them? > > In the past, I have seen cases where there was no limit, and so we hung the > browser and also out-competed useful responses to memory warning that would > have reclaimed memory more effectively than garbage collection. That was a > terrible bug, and this patch might introduce a regression in those cases, > along with a ticking time bomb that might go off any time the implementation > details of memory warnings change. > > I'd rather not reintroduce a well-known very bad performance pathology -- > especially without clear evidence showing that no better solution is > available. Ok, thanks for the feedback. I'll try to take a look at (1) and (2).
Chris Dumez
Comment 9 2015-03-08 17:41:12 PDT
(In reply to comment #8) > (In reply to comment #7) > > Comment on attachment 248195 [details] > > Patch > > > > A few thoughts about this patch: > > > > (1) Did anybody figure out why the JSC::Heap::reportExtraMemoryCost API was > > insufficient to collect the canvases at cnn.com? > > > > (2) What is the normal GC timer period at cnn.com, and why is it > > insufficient to collect the canvases? I added some logging when loading cnn.com, I see: Mar 8 17:34:44 Chris-iPhone com.apple.WebKit.WebContent[3342] <Notice>: CHRIS: 0x107f78c80 - Time since last gc: 2044.99 ms Mar 8 17:34:49 Chris-iPhone com.apple.WebKit.WebContent[3342] <Notice>: CHRIS: 0x107f78c80 - Time since last gc: 4855.09 ms Mar 8 17:34:50 Chris-iPhone com.apple.WebKit.WebContent[3342] <Notice>: CHRIS: 0x107f78c80 - Time since last gc: 918.94 ms Mar 8 17:34:51 Chris-iPhone com.apple.WebKit.WebContent[3342] <Notice>: CHRIS: 0x107f78c80 - Time since last gc: 1246 ms Mar 8 17:34:57 Chris-iPhone com.apple.WebKit.WebContent[3342] <Notice>: CHRIS: Memory pressure, critical: 1 Mar 8 17:35:01 Chris-iPhone MobileSafari[3339] <Notice>: CHRIS: WebProcess CRASH As you can see, no gc is happening between 17:34:51 and the critical memory pressure signal at 17:34:57. The WebContent process killed is then killed at 17:35:01 and gc() still hasn't happened. Here is the code I used to do the logging (please check I put the logging in the right place as I am not familiar with JSC): http://pastebin.com/VD06nEfv
Chris Dumez
Comment 10 2015-03-08 17:48:53 PDT
I added logging in EdenGCCallback::doCollection() as well: Mar 8 17:47:11 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107f78c80 - FullGCActivityCallback::doCollection() Mar 8 17:47:12 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107fd8688 - EdenGCActivityCallback::doCollection() Mar 8 17:47:13 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107f78c80 - FullGCActivityCallback::doCollection() Mar 8 17:47:14 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107fd8688 - EdenGCActivityCallback::doCollection() Mar 8 17:47:16 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107fd8688 - EdenGCActivityCallback::doCollection() Mar 8 17:47:18 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107fd8688 - EdenGCActivityCallback::doCollection() Mar 8 17:47:18 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107fd8688 - EdenGCActivityCallback::doCollection() Mar 8 17:47:19 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107f78c80 - FullGCActivityCallback::doCollection() Mar 8 17:47:19 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107fd8688 - EdenGCActivityCallback::doCollection() Mar 8 17:47:23 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107fd8688 - EdenGCActivityCallback::doCollection() Mar 8 17:47:24 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107fd8688 - EdenGCActivityCallback::doCollection() Mar 8 17:47:26 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: Memory pressure, critical: 1 Mar 8 17:47:26 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107fd8688 - EdenGCActivityCallback::doCollection() Mar 8 17:47:26 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107fd8688 - EdenGCActivityCallback::doCollection() Mar 8 17:47:27 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: Memory pressure, critical: 1 Mar 8 17:47:27 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107fd8688 - EdenGCActivityCallback::doCollection() Mar 8 17:47:28 Chris-iPhone MobileSafari[4251] <Notice>: CHRIS: WebProcess CRASH
Chris Dumez
Comment 11 2015-03-08 18:03:53 PDT
Here is logging with the heap size before and after the call to collect: http://pastebin.com/1tZ37uju It claims the heap size is ~60MB at the point it crashes. However, I can see there are ~380 MB of IOSurfaces at the point it crashes.
Chris Dumez
Comment 12 2015-03-08 18:16:20 PDT
(In reply to comment #11) > Here is logging with the heap size before and after the call to collect: > http://pastebin.com/1tZ37uju > > It claims the heap size is ~60MB at the point it crashes. However, I can see > there are ~380 MB of IOSurfaces at the point it crashes. We are correctly calling reportExtraMemoryCost(2700000) in HTMLCanvasElement::createImageBuffer() for each Canvas construction. That's about 2.57MB which is very close to the actual size of the IOSurfaces (2.61MB): http://pastebin.com/x3Pj32vw
Chris Dumez
Comment 13 2015-03-08 18:51:50 PDT
Additionally logging when HTMLCanvasElement are constructed / destroyed: http://pastebin.com/JCVDgS0v
Chris Dumez
Comment 14 2015-03-08 19:01:42 PDT
At the point of the crash, there are 157 alive HTMLCanvasElements.
Geoffrey Garen
Comment 15 2015-03-08 23:07:02 PDT
> Here is the code I used to do the logging (please check I put the logging in > the right place as I am not familiar with JSC): http://pastebin.com/VD06nEfv This patch will log full GCs triggered by the full GC timer. (And it looks like you've got logging from the eden GC timer too, not included in this patch.) One thing we'll miss here is synchronous GCs. You can catch those by logging inside Heap::collect. (Unfortunately, then you'll see double logging for every timer-based GC: one in the timer, and one in Heap::collect.)
Geoffrey Garen
Comment 16 2015-03-08 23:32:31 PDT
> At the point of the crash, there are 157 alive HTMLCanvasElements. Crazy! > Mar 8 17:47:19 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107f78c80 - FullGCActivityCallback::doCollection() > ... > Mar 8 17:47:28 Chris-iPhone MobileSafari[4251] <Notice>: CHRIS: WebProcess CRASH So, we're doing eden GCs at a rate of about 1 per second, but 9 seconds have passed since the last full GC. A few thoughts about this: (1) So many canvases surviving GC seems like a bug. It would be interesting to log the number of live canvases at the point of the last full collection, and to use the tool Andreas has for GC backtracing to discover why they survived. Same with the last eden collection. (2) If there's some good reason for the canvases to survive -- something about the design of cnn.com that requires them to stay alive -- it might be a good idea to teach canvases to deallocate their IOSurfaces when they're not in the document and/or painting. This is probably not a substitute for fixing the GC issue, but it's probably desirable for those cases when a website's design simply requires keeping lots of canvases alive. (3) Can you log how long the full GC timer was scheduled for at the time of the crash? We know it was at least 9 seconds, but we don't know how long it was. It seems like there might be a bug in that timer setting. (4) Currently, extra cost only hastens the next eden collection. Perhaps it should hasten the next eden collection and the next full collection, in case the large objects happen to survive eden collection at a pathologically high rate. (5) Are these canvases actually surviving garbage collection, or have we just neglected to sweep them yet? Perhaps the low memory warning should eagerly run all outstanding destructors. The IncrementalSweeper can do this for us.
Chris Dumez
Comment 17 2015-03-09 10:27:54 PDT
(In reply to comment #15) > > Here is the code I used to do the logging (please check I put the logging in > > the right place as I am not familiar with JSC): http://pastebin.com/VD06nEfv > > This patch will log full GCs triggered by the full GC timer. (And it looks > like you've got logging from the eden GC timer too, not included in this > patch.) All the logging I provided was WITHOUT my patch, just vanilla WebKit ToT.
Chris Dumez
Comment 18 2015-03-09 10:32:46 PDT
(In reply to comment #16) > > At the point of the crash, there are 157 alive HTMLCanvasElements. > > Crazy! Over 500 are created so a good amount is getting destroyed by GC, just not enough or fast enough.
Geoffrey Garen
Comment 19 2015-03-09 12:17:58 PDT
Actually, it looks like the logs imply that canvas objects are being identified as garbage by GC, and just not swept promptly enough. Let's do an experiment to force sweeping inside the low memory handler, and see if that helps. It's seems like a problem that (a) Eden collection does not start the incremental sweeper (b) The incremental sweeper does not hasten itself after lots of extra cost has been reported.
Chris Dumez
Comment 20 2015-03-09 12:33:05 PDT
Crash reproduction case: http://jsfiddle.net/fvyw4ba0/
Darin Adler
Comment 21 2015-03-09 13:17:45 PDT
Comment on attachment 248195 [details] Patch OK.
Chris Dumez
Comment 22 2015-03-09 15:11:48 PDT
Geoff, is this the right way to trigger a full sweep on memory pressure? http://pastebin.com/0Wjz8muv If so, I'm afraid it doesn't help. The jsfiddle demo is still crashing for me.
Geoffrey Garen
Comment 23 2015-03-09 15:43:06 PDT
> Geoff, is this the right way to trigger a full sweep on memory pressure? > http://pastebin.com/0Wjz8muv Yes, that's right. > If so, I'm afraid it doesn't help. The jsfiddle demo is still crashing for > me. I think this means that the real bug here is that eden collection doesn't register its finalizers with the sweeper. Side note: I think it would be worthwhile to post your eager finalizer code as a stand-alone patch. It is a good thing to do -- just not for this test case.
Chris Dumez
Comment 24 2015-03-09 16:03:44 PDT
(In reply to comment #22) > Geoff, is this the right way to trigger a full sweep on memory pressure? > http://pastebin.com/0Wjz8muv > > If so, I'm afraid it doesn't help. The jsfiddle demo is still crashing for > me. I was an async repro case: http://jsfiddle.net/fvyw4ba0/3/ Full sweep on memory pressure also doesn't seem to help in this case. A gcController().garbageCollectNow() on memory pressure does prevent this reproduction case from crashing though.
Chris Dumez
Comment 25 2015-03-09 17:24:11 PDT
Chris Dumez
Comment 26 2015-03-09 17:24:59 PDT
This is not enough to fix the crash on cnn.com but this is a step in the right direction.
Geoffrey Garen
Comment 27 2015-03-09 21:49:20 PDT
Comment on attachment 248295 [details] Patch r=me
WebKit Commit Bot
Comment 28 2015-03-09 22:33:18 PDT
Comment on attachment 248295 [details] Patch Clearing flags on attachment: 248295 Committed r181310: <http://trac.webkit.org/changeset/181310>
WebKit Commit Bot
Comment 29 2015-03-09 22:33:22 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.