Bug 142457

Summary: [iOS] Sweep all collected objects on critical memory pressure
Product: WebKit Reporter: Chris Dumez <cdumez>
Component: JavaScriptCoreAssignee: Chris Dumez <cdumez>
Status: RESOLVED FIXED    
Severity: Normal CC: barraclough, commit-queue, ggaren, kling, sam
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
URL: http://jsfiddle.net/fvyw4ba0/3/
Bug Depends on: 142595, 142593    
Bug Blocks:    
Attachments:
Description Flags
Patch
none
Patch
none
Patch none

Description Chris Dumez 2015-03-08 12:09:08 PDT
On memory pressure, we don't have much time to free-up memory. If we are not fast enough, the process can end up getting killed.

The mobile version of cnn.com constructs a lot of canvases that use up a lot of memory. On memory pressure, we were calling gcController().garbageCollectSoon(). However, the garbage collection was not triggering fast enough and the WebContent process would often end up being killed. If we call gcController().garbageCollectNow() on memory pressure instead, cnn.com no longer crashes as the canvases get collected in time. 


Radar: <rdar://problem/20044440>
Comment 1 Chris Dumez 2015-03-08 12:11:55 PDT
Created attachment 248193 [details]
Patch
Comment 2 Chris Dumez 2015-03-08 12:16:56 PDT
Comment on attachment 248193 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=248193&action=review

> Source/WebCore/platform/cocoa/MemoryPressureHandlerCocoa.mm:-84
> -        gcController().garbageCollectSoon();

Maybe we could call garbageCollectSoon() on non-critical memory pressure still? This would decrease the likelihood of ending up under critical memory pressure.
Comment 3 Chris Dumez 2015-03-08 12:25:13 PDT
Created attachment 248195 [details]
Patch
Comment 4 Sam Weinig 2015-03-08 13:37:20 PDT
Comment on attachment 248195 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=248195&action=review

> Source/WebCore/platform/cocoa/MemoryPressureHandlerCocoa.mm:88
> -    if (isUnderMemoryPressure()) {
> -        gcController().garbageCollectSoon();
> -    } else {
> -        // If we're not under memory pressure, that means we're here due to impending process suspension.
> -        // Do a full GC since this is our last chance to run any code.
> +    {
>          ReliefLogger log("Collecting JavaScript garbage");
> -        gcController().garbageCollectNow();
> +        if (critical)
> +            gcController().garbageCollectNow();
> +        else
> +            gcController().garbageCollectSoon();

It seems like you are changing the behavior of this code getting called when not under memory pressure (but rather "due to impending process suspension"), where it used to call garbageCollectNow() and now will call seemingly call garbageCollectSoon().  Is that intentional?  Does an call to this in that scenario always have the critical bit set?
Comment 5 Chris Dumez 2015-03-08 13:40:00 PDT
Comment on attachment 248195 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=248195&action=review

>> Source/WebCore/platform/cocoa/MemoryPressureHandlerCocoa.mm:88
>> +            gcController().garbageCollectSoon();
> 
> It seems like you are changing the behavior of this code getting called when not under memory pressure (but rather "due to impending process suspension"), where it used to call garbageCollectNow() and now will call seemingly call garbageCollectSoon().  Is that intentional?  Does an call to this in that scenario always have the critical bit set?

No, I checked and on suspension we always set the critical flag to true.

From WebProcess.cpp:
void WebProcess::processWillSuspend()
{
    MemoryPressureHandler::singleton().releaseMemory(true);
...
}

true is for the critical argument.
Comment 6 Chris Dumez 2015-03-08 13:41:00 PDT
And setting the "critical" flag to true upon suspension makes sense as we want to free-up as much memory as possible in this case.
Comment 7 Geoffrey Garen 2015-03-08 16:01:45 PDT
Comment on attachment 248195 [details]
Patch

A few thoughts about this patch:

(1) Did anybody figure out why the JSC::Heap::reportExtraMemoryCost API was insufficient to collect the canvases at cnn.com?

(2) What is the normal GC timer period at cnn.com, and why is it insufficient to collect the canvases?

(3) It is preferable to fix (1) and/or (2) rather than respond to their failure after the fact in a low memory warning handler, since the low memory warning warning handler will have the negative side effect of dumping useful caches. 

(4) Is there any documented limit on the number of memory warnings we'll get, or the rate at which we'll get them?

In the past, I have seen cases where there was no limit, and so we hung the browser and also out-competed useful responses to memory warning that would have reclaimed memory more effectively than garbage collection. That was a terrible bug, and this patch might introduce a regression in those cases, along with a ticking time bomb that might go off any time the implementation details of memory warnings change.

I'd rather not reintroduce a well-known very bad performance pathology -- especially without clear evidence showing that no better solution is available.
Comment 8 Chris Dumez 2015-03-08 16:19:59 PDT
(In reply to comment #7)
> Comment on attachment 248195 [details]
> Patch
> 
> A few thoughts about this patch:
> 
> (1) Did anybody figure out why the JSC::Heap::reportExtraMemoryCost API was
> insufficient to collect the canvases at cnn.com?
> 
> (2) What is the normal GC timer period at cnn.com, and why is it
> insufficient to collect the canvases?
> 
> (3) It is preferable to fix (1) and/or (2) rather than respond to their
> failure after the fact in a low memory warning handler, since the low memory
> warning warning handler will have the negative side effect of dumping useful
> caches. 
> 
> (4) Is there any documented limit on the number of memory warnings we'll
> get, or the rate at which we'll get them?
> 
> In the past, I have seen cases where there was no limit, and so we hung the
> browser and also out-competed useful responses to memory warning that would
> have reclaimed memory more effectively than garbage collection. That was a
> terrible bug, and this patch might introduce a regression in those cases,
> along with a ticking time bomb that might go off any time the implementation
> details of memory warnings change.
> 
> I'd rather not reintroduce a well-known very bad performance pathology --
> especially without clear evidence showing that no better solution is
> available.

Ok, thanks for the feedback. I'll try to take a look at (1) and (2).
Comment 9 Chris Dumez 2015-03-08 17:41:12 PDT
(In reply to comment #8)
> (In reply to comment #7)
> > Comment on attachment 248195 [details]
> > Patch
> > 
> > A few thoughts about this patch:
> > 
> > (1) Did anybody figure out why the JSC::Heap::reportExtraMemoryCost API was
> > insufficient to collect the canvases at cnn.com?
> > 
> > (2) What is the normal GC timer period at cnn.com, and why is it
> > insufficient to collect the canvases?

I added some logging when loading cnn.com, I see:
Mar  8 17:34:44 Chris-iPhone com.apple.WebKit.WebContent[3342] <Notice>: CHRIS: 0x107f78c80 - Time since last gc: 2044.99 ms
Mar  8 17:34:49 Chris-iPhone com.apple.WebKit.WebContent[3342] <Notice>: CHRIS: 0x107f78c80 - Time since last gc: 4855.09 ms
Mar  8 17:34:50 Chris-iPhone com.apple.WebKit.WebContent[3342] <Notice>: CHRIS: 0x107f78c80 - Time since last gc: 918.94 ms
Mar  8 17:34:51 Chris-iPhone com.apple.WebKit.WebContent[3342] <Notice>: CHRIS: 0x107f78c80 - Time since last gc: 1246 ms
Mar  8 17:34:57 Chris-iPhone com.apple.WebKit.WebContent[3342] <Notice>: CHRIS: Memory pressure, critical: 1
Mar  8 17:35:01 Chris-iPhone MobileSafari[3339] <Notice>: CHRIS: WebProcess CRASH

As you can see, no gc is happening between 17:34:51 and the critical memory pressure signal at 17:34:57. The WebContent process killed is then killed at 17:35:01 and gc() still hasn't happened.

Here is the code I used to do the logging (please check I put the logging in the right place as I am not familiar with JSC): http://pastebin.com/VD06nEfv
Comment 10 Chris Dumez 2015-03-08 17:48:53 PDT
I added logging in EdenGCCallback::doCollection() as well:
Mar  8 17:47:11 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107f78c80 - FullGCActivityCallback::doCollection()
Mar  8 17:47:12 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107fd8688 - EdenGCActivityCallback::doCollection()
Mar  8 17:47:13 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107f78c80 - FullGCActivityCallback::doCollection()
Mar  8 17:47:14 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107fd8688 - EdenGCActivityCallback::doCollection()
Mar  8 17:47:16 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107fd8688 - EdenGCActivityCallback::doCollection()
Mar  8 17:47:18 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107fd8688 - EdenGCActivityCallback::doCollection()
Mar  8 17:47:18 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107fd8688 - EdenGCActivityCallback::doCollection()
Mar  8 17:47:19 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107f78c80 - FullGCActivityCallback::doCollection()
Mar  8 17:47:19 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107fd8688 - EdenGCActivityCallback::doCollection()
Mar  8 17:47:23 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107fd8688 - EdenGCActivityCallback::doCollection()
Mar  8 17:47:24 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107fd8688 - EdenGCActivityCallback::doCollection()
Mar  8 17:47:26 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: Memory pressure, critical: 1
Mar  8 17:47:26 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107fd8688 - EdenGCActivityCallback::doCollection()
Mar  8 17:47:26 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107fd8688 - EdenGCActivityCallback::doCollection()
Mar  8 17:47:27 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: Memory pressure, critical: 1
Mar  8 17:47:27 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107fd8688 - EdenGCActivityCallback::doCollection()
Mar  8 17:47:28 Chris-iPhone MobileSafari[4251] <Notice>: CHRIS: WebProcess CRASH
Comment 11 Chris Dumez 2015-03-08 18:03:53 PDT
Here is logging with the heap size before and after the call to collect:
http://pastebin.com/1tZ37uju

It claims the heap size is ~60MB at the point it crashes. However, I can see there are ~380 MB of IOSurfaces at the point it crashes.
Comment 12 Chris Dumez 2015-03-08 18:16:20 PDT
(In reply to comment #11)
> Here is logging with the heap size before and after the call to collect:
> http://pastebin.com/1tZ37uju
> 
> It claims the heap size is ~60MB at the point it crashes. However, I can see
> there are ~380 MB of IOSurfaces at the point it crashes.

We are correctly calling reportExtraMemoryCost(2700000) in HTMLCanvasElement::createImageBuffer() for each Canvas construction. That's about 2.57MB which is very close to the actual size of the IOSurfaces (2.61MB):
http://pastebin.com/x3Pj32vw
Comment 13 Chris Dumez 2015-03-08 18:51:50 PDT
Additionally logging when HTMLCanvasElement are constructed / destroyed:
http://pastebin.com/JCVDgS0v
Comment 14 Chris Dumez 2015-03-08 19:01:42 PDT
At the point of the crash, there are 157 alive HTMLCanvasElements.
Comment 15 Geoffrey Garen 2015-03-08 23:07:02 PDT
> Here is the code I used to do the logging (please check I put the logging in
> the right place as I am not familiar with JSC): http://pastebin.com/VD06nEfv

This patch will log full GCs triggered by the full GC timer. (And it looks like you've got logging from the eden GC timer too, not included in this patch.)

One thing we'll miss here is synchronous GCs. You can catch those by logging inside Heap::collect. (Unfortunately, then you'll see double logging for every timer-based GC: one in the timer, and one in Heap::collect.)
Comment 16 Geoffrey Garen 2015-03-08 23:32:31 PDT
> At the point of the crash, there are 157 alive HTMLCanvasElements.

Crazy!

> Mar  8 17:47:19 Chris-iPhone com.apple.WebKit.WebContent[4254] <Notice>: CHRIS: 0x107f78c80 - FullGCActivityCallback::doCollection()
> ...
> Mar  8 17:47:28 Chris-iPhone MobileSafari[4251] <Notice>: CHRIS: WebProcess CRASH

So, we're doing eden GCs at a rate of about 1 per second, but 9 seconds have passed since the last full GC.

A few thoughts about this:

(1) So many canvases surviving GC seems like a bug. It would be interesting to log the number of live canvases at the point of the last full collection, and to use the tool Andreas has for GC backtracing to discover why they survived. Same with the last eden collection.

(2) If there's some good reason for the canvases to survive -- something about the design of cnn.com that requires them to stay alive -- it might be a good idea to teach canvases to deallocate their IOSurfaces when they're not in the document and/or painting. This is probably not a substitute for fixing the GC issue, but it's probably desirable for those cases when a website's design simply requires keeping lots of canvases alive.

(3) Can you log how long the full GC timer was scheduled for at the time of the crash? We know it was at least 9 seconds, but we don't know how long it was. It seems like there might be a bug in that timer setting. 

(4) Currently, extra cost only hastens the next eden collection. Perhaps it should hasten the next eden collection and the next full collection, in case the large objects happen to survive eden collection at a pathologically high rate.

(5) Are these canvases actually surviving garbage collection, or have we just neglected to sweep them yet? Perhaps the low memory warning should eagerly run all outstanding destructors. The IncrementalSweeper can do this for us.
Comment 17 Chris Dumez 2015-03-09 10:27:54 PDT
(In reply to comment #15)
> > Here is the code I used to do the logging (please check I put the logging in
> > the right place as I am not familiar with JSC): http://pastebin.com/VD06nEfv
> 
> This patch will log full GCs triggered by the full GC timer. (And it looks
> like you've got logging from the eden GC timer too, not included in this
> patch.)

All the logging I provided was WITHOUT my patch, just vanilla WebKit ToT.
Comment 18 Chris Dumez 2015-03-09 10:32:46 PDT
(In reply to comment #16)
> > At the point of the crash, there are 157 alive HTMLCanvasElements.
> 
> Crazy!

Over 500 are created so a good amount is getting destroyed by GC, just not enough or fast enough.
Comment 19 Geoffrey Garen 2015-03-09 12:17:58 PDT
Actually, it looks like the logs imply that canvas objects are being identified as garbage by GC, and just not swept promptly enough.

Let's do an experiment to force sweeping inside the low memory handler, and see if that helps.

It's seems like a problem that

(a) Eden collection does not start the incremental sweeper

(b) The incremental sweeper does not hasten itself after lots of extra cost has been reported.
Comment 20 Chris Dumez 2015-03-09 12:33:05 PDT
Crash reproduction case:
http://jsfiddle.net/fvyw4ba0/
Comment 21 Darin Adler 2015-03-09 13:17:45 PDT
Comment on attachment 248195 [details]
Patch

OK.
Comment 22 Chris Dumez 2015-03-09 15:11:48 PDT
Geoff, is this the right way to trigger a full sweep on memory pressure?
http://pastebin.com/0Wjz8muv

If so, I'm afraid it doesn't help. The jsfiddle demo is still crashing for me.
Comment 23 Geoffrey Garen 2015-03-09 15:43:06 PDT
> Geoff, is this the right way to trigger a full sweep on memory pressure?
> http://pastebin.com/0Wjz8muv

Yes, that's right.

> If so, I'm afraid it doesn't help. The jsfiddle demo is still crashing for
> me.

I think this means that the real bug here is that eden collection doesn't register its finalizers with the sweeper.

Side note: I think it would be worthwhile to post your eager finalizer code as a stand-alone patch. It is a good thing to do -- just not for this test case.
Comment 24 Chris Dumez 2015-03-09 16:03:44 PDT
(In reply to comment #22)
> Geoff, is this the right way to trigger a full sweep on memory pressure?
> http://pastebin.com/0Wjz8muv
> 
> If so, I'm afraid it doesn't help. The jsfiddle demo is still crashing for
> me.

I was an async repro case: http://jsfiddle.net/fvyw4ba0/3/
Full sweep on memory pressure also doesn't seem to help in this case.

A gcController().garbageCollectNow() on memory pressure does prevent this reproduction case from crashing though.
Comment 25 Chris Dumez 2015-03-09 17:24:11 PDT
Created attachment 248295 [details]
Patch
Comment 26 Chris Dumez 2015-03-09 17:24:59 PDT
This is not enough to fix the crash on cnn.com but this is a step in the right direction.
Comment 27 Geoffrey Garen 2015-03-09 21:49:20 PDT
Comment on attachment 248295 [details]
Patch

r=me
Comment 28 WebKit Commit Bot 2015-03-09 22:33:18 PDT
Comment on attachment 248295 [details]
Patch

Clearing flags on attachment: 248295

Committed r181310: <http://trac.webkit.org/changeset/181310>
Comment 29 WebKit Commit Bot 2015-03-09 22:33:22 PDT
All reviewed patches have been landed.  Closing bug.