Bug 76225

Summary: [chromium] Hint garbage collector to run if page uses Canvas contexts
Product: WebKit Reporter: Kenneth Russell <kbr>
Component: WebGLAssignee: Kenneth Russell <kbr>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, andersca, atwilson, bajones, barraclough, cmarrin, danno, dglazkov, enne, fpizlo, ggaren, gman, haraken, jamesr, japhet, jkummerow, jochen, mrobinson, mstarzinger, oliver, sam, senorblanco, tonikitoo, twiz, webkit.review.bot, yongsheng.zhu, zhuyongsheng2010, zmo
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 92287, 92856, 104733    
Attachments:
Description Flags
Patch
none
Archive of layout-test-results from gce-cr-linux-03
none
Patch
none
Chart of live WebGLRenderingContexts over time
none
Chart of live WebGLRenderingContexts over time
none
Patch for landing none

Description Kenneth Russell 2012-01-12 16:39:12 PST
Currently, if a page fetches a WebGLRenderingContext for a canvas, and the user navigates away from that page, the WebGLRenderingContext won't be collected until the JavaScript engine's garbage collection runs. This is how ordinary disposal of DOM related objects works.

Web applications using WebGL tend to allocate more GPU resources than most other web applications, though. There have been a couple of reports that on resource constrained devices, it would be beneficial to more quickly release WebGL resources upon navigating away from a page. See http://code.google.com/p/chromium/issues/detail?id=104740 for the most recent report from Intel.

dglazkov@ suggests that a FrameDestructionObserver could be used to trigger deletion of the WebGL context and drawing buffer. I think we should do this.
Comment 1 yongsheng 2012-01-12 18:44:29 PST
I'll follow up the suggestion by dglazkov if no objection.
So I think the observer just triggers the deletion of webGL context and drawing buffer, not others, which is for the consideration of performance, right?
Comment 2 Kenneth Russell 2012-01-12 19:06:48 PST
(In reply to comment #1)
> I'll follow up the suggestion by dglazkov if no objection.
> So I think the observer just triggers the deletion of webGL context and drawing buffer, not others, which is for the consideration of performance, right?

Yes, and also to minimize the potential impact of the change.

This probably has to be written carefully to avoid reference counting cycles and stale observers.
Comment 3 yongsheng 2012-01-19 05:02:02 PST
Here is the rough idea:
Let HTMLCanvasElement inherits from FrameDestructionObserver. When the frame is destroyed, then the context (m_context) of HTMLCanvasElement will be cleared. Make sense?

My question is, is it okay to destroy the context for both WebGLRenderingContext and CanvasRenderingContext2D?
Comment 4 Kenneth Russell 2012-01-19 10:43:16 PST
(In reply to comment #3)
> Here is the rough idea:
> Let HTMLCanvasElement inherits from FrameDestructionObserver. When the frame is destroyed, then the context (m_context) of HTMLCanvasElement will be cleared. Make sense?
> 
> My question is, is it okay to destroy the context for both WebGLRenderingContext and CanvasRenderingContext2D?

I don't think that's the best way to do this. I would instead make a FrameDestructionObserver specifically for the WebGLRenderingContext, but make sure that if the WebGLRenderingContext is destroyed before the frame, that the observer gets unregistered.
Comment 5 yongsheng 2012-01-20 01:46:21 PST
FrameDestructionObserver doesn't make sense here because the frame seems not be destroyed if refreshing a page or switching to a new case. I find that when refreshing or a new page, Frame calls the 'detach' function of the document before the new dom tree is created. So i think in HTMLCanvasElement, by overriding the virtual 'detach' function, the WebGLRenderingContext and drawingbuffer can be freed. What's your opinion?

Another question, is it enough to do the test 'fast/canvas/webgl' of layout tests for this issue?
Comment 6 Kenneth Russell 2012-01-20 12:30:02 PST
(In reply to comment #5)
> FrameDestructionObserver doesn't make sense here because the frame seems not be destroyed if refreshing a page or switching to a new case. I find that when refreshing or a new page, Frame calls the 'detach' function of the document before the new dom tree is created. So i think in HTMLCanvasElement, by overriding the virtual 'detach' function, the WebGLRenderingContext and drawingbuffer can be freed. What's your opinion?

I've talked with jamesr about this and it sounds like the solution is not this simple. It's valid, and common in sophisticated web applications, to have two documents (for example, using window.open()) and move Canvases around between them. Using Canvas' detach() method as a hint, even just to "lose" the WebGLRenderingContext, will be too much of a penalty.

It seems that the only true way to determine that a WebGLRenderingContext or other DOM-related object is really unreachable is for all of its JavaScript references to go away. Unfortunately, as we've seen, doing so requires full GCs to run, and these are expensive.

A couple of other suggestions were discussed. One was to modify DumpRenderTree to force full GCs between test runs. This won't work well because it will only fix layout test runs. The WebGL conformance tests exhibit the same problem and run within the full browser.

Another suggestion was to use V8's V8::AdjustAmountOfExternalAllocatedMemory() API to artificially increase heap pressure when a WebGLRenderingContext is created. This, however, will cause more full GCs to be performed, and this is a really bad idea for the overall performance of web applications. Also, it's a solution that will only work on one JS engine.

There was an idea to watch for calls to the detach() method of the owning Document of the Canvas (rather than the Canvas itself), and use those as heuristics for when to lose the WebGLRenderingContext.

Another idea, possibly the most principled, would be to watch for destruction of the ScriptExecutionContext (see ContextDestructionObserver in ActiveDOMObject.h) and force loss of the WebGL context when that happens. From what I have heard, it may still be possible for another ScriptExecutionContext to have a reference to the same DOM object, but this is probably not a common case.

In summary, though, making correct WebGL test execution depend on eager destruction of the WebGL contexts is not going to be a reliable solution. At this point, I firmly believe that fixing Bug 76654 (reallocate the WebGL back buffer at a smaller size if allocation fails) is the best solution and should be the top priority.

> Another question, is it enough to do the test 'fast/canvas/webgl' of layout tests for this issue?

A full layout test run would definitely be required. Doing full layout test runs should be a habit.
Comment 7 yongsheng 2012-01-20 17:55:11 PST
> There was an idea to watch for calls to the detach() method of the owning Document of the Canvas (rather than the Canvas itself), and use those as heuristics for when to lose the WebGLRenderingContext.
> 
> Another idea, possibly the most principled, would be to watch for destruction of the ScriptExecutionContext (see ContextDestructionObserver in ActiveDOMObject.h) and force loss of the WebGL context when that happens. From what I have heard, it may still be possible for another ScriptExecutionContext to have a reference to the same DOM object, but this is probably not a common case.
emm... seems two better ideas than previous ones. I'll try to investigate them and find out whether they works.

> In summary, though, making correct WebGL test execution depend on eager destruction of the WebGL contexts is not going to be a reliable solution. At this point, I firmly believe that fixing Bug 76654 (reallocate the WebGL back buffer at a smaller size if allocation fails) is the best solution and should be the top priority.
> 
okay, i'm gonna fix that bug firstly if there is no owner working on it.
> > Another question, is it enough to do the test 'fast/canvas/webgl' of layout tests for this issue?
> 
> A full layout test run would definitely be required. Doing full layout test runs should be a habit.
Yes, of course. I ran the full layout tests on Ubuntu for the webkit chromium port without any code changes. However, it reports many regressions. Below are some failures:
  fast/canvas/arc360.html = IMAGE
  fast/canvas/canvas-before-css.html = IMAGE
  fast/canvas/canvas-composite.html = IMAGE
  fast/canvas/canvas-resize-reset.html = IMAGE
Have you met this kind of problem before?
Comment 8 Kenneth Russell 2012-01-20 18:15:10 PST
(In reply to comment #7)
> > There was an idea to watch for calls to the detach() method of the owning Document of the Canvas (rather than the Canvas itself), and use those as heuristics for when to lose the WebGLRenderingContext.
> > 
> > Another idea, possibly the most principled, would be to watch for destruction of the ScriptExecutionContext (see ContextDestructionObserver in ActiveDOMObject.h) and force loss of the WebGL context when that happens. From what I have heard, it may still be possible for another ScriptExecutionContext to have a reference to the same DOM object, but this is probably not a common case.
> emm... seems two better ideas than previous ones. I'll try to investigate them and find out whether they works.
> 
> > In summary, though, making correct WebGL test execution depend on eager destruction of the WebGL contexts is not going to be a reliable solution. At this point, I firmly believe that fixing Bug 76654 (reallocate the WebGL back buffer at a smaller size if allocation fails) is the best solution and should be the top priority.
> > 
> okay, i'm gonna fix that bug firstly if there is no owner working on it.

Nobody is working on it, so great, thanks. Do you have the ability to assign bugs to yourself? If not let me know.

> > > Another question, is it enough to do the test 'fast/canvas/webgl' of layout tests for this issue?
> > 
> > A full layout test run would definitely be required. Doing full layout test runs should be a habit.
> Yes, of course. I ran the full layout tests on Ubuntu for the webkit chromium port without any code changes. However, it reports many regressions. Below are some failures:
>   fast/canvas/arc360.html = IMAGE
>   fast/canvas/canvas-before-css.html = IMAGE
>   fast/canvas/canvas-composite.html = IMAGE
>   fast/canvas/canvas-resize-reset.html = IMAGE
> Have you met this kind of problem before?

Yes. The first step is to follow the instructions on http://code.google.com/p/chromium/wiki/LayoutTestsLinux if you haven't already. If you have, then you need to use your best judgment about whether the failure is related to your changes. I don't know how to precisely reproduce the configuration on the testing bots, nor why layout test runs seem to frequently have many failures.
Comment 9 yongsheng 2012-01-20 18:20:57 PST
> > okay, i'm gonna fix that bug firstly if there is no owner working on it.
> 
> Nobody is working on it, so great, thanks. Do you have the ability to assign bugs to yourself? If not let me know.
Sorry, I don't. Please help me.
Comment 10 Zhenyao Mo 2012-01-22 11:16:00 PST
(In reply to comment #9)
> > > okay, i'm gonna fix that bug firstly if there is no owner working on it.
> > 
> > Nobody is working on it, so great, thanks. Do you have the ability to assign bugs to yourself? If not let me know.
> Sorry, I don't. Please help me.

Done.
Comment 11 Kenneth Russell 2012-05-08 17:40:48 PDT
crogers and abarth pointed out that making WebGLRenderingContext an ActiveDOMObject would give it the desired semantics, or at least something very close to the desired semantics. abarth's high level point is that ContextDestructionObserver (which ActiveDOMObject subclasses) is good for shutting down things that are tied to the Document, where FrameDestructionObserver is good for shutting down things tied to the lifetime of the surrounding Frame. Since the WebGLRenderingContext is tied to the Canvas which is tied to the Document, it matches ActiveDOMObject more closely.

For correctness, we could just have the WebGLRenderingContext force a context lost event when the page is destroyed. That way if the canvas is still reachable from JavaScript after the document has been destroyed (for example, transplanted from one page to another) it can continue to work in the new page.
Comment 12 yongsheng 2012-05-09 00:10:23 PDT
(In reply to comment #11)
> crogers and abarth pointed out that making WebGLRenderingContext an ActiveDOMObject would give it the desired semantics, or at least something very close to the desired semantics. abarth's high level point is that ContextDestructionObserver (which ActiveDOMObject subclasses) is good for shutting down things that are tied to the Document, where FrameDestructionObserver is good for shutting down things tied to the lifetime of the surrounding Frame. Since the WebGLRenderingContext is tied to the Canvas which is tied to the Document, it matches ActiveDOMObject more closely.

Is this because FrameDestructionObserver will be notified once the page is destroyed, but ContextDestructionObserver won't be notified? So WebGLRenderingContext could be reused later as described below?

Frame Destruction is done once the page is destroyed, what about document destruction?

> 
> For correctness, we could just have the WebGLRenderingContext force a context lost event when the page is destroyed. That way if the canvas is still reachable from JavaScript after the document has been destroyed (for example, transplanted from one page to another) it can continue to work in the new page.
Comment 13 Kenneth Russell 2012-05-09 13:57:46 PDT
(In reply to comment #12)
]> Is this because FrameDestructionObserver will be notified once the page is destroyed, but ContextDestructionObserver won't be notified? So WebGLRenderingContext could be reused later as described below?

I think the issue is more that the lifetime of the Frame may differ from that of the Document. My understanding is that if WebGLRenderingContext were an ActiveDOMObject and the page were destroyed, the WebGLRenderingContext would still be lost, even if the Canvas had been moved to a document that was still alive. This is a rare enough situation that the improvements in resource consumption should far outweigh the pitfalls in this scenario.
Comment 14 Kenneth Russell 2012-05-25 16:05:27 PDT
mrobinson pointed out on IRC that after fixing Bug 80509 the GTK port passes the stress test https://cvs.khronos.org/svn/repos/registry/trunk/public/webgl/sdk/tests/conformance/context/context-creation-and-destruction.html ,, although it runs slowly -- and they were running on NVIDIA hardware, e.g. a real GPU. So it's possible there is a dangling pointer in the Chromium implementation that prevents these resources from being collected efficiently.

Still worth investigating when the Canvas and WebGLRenderingContext objects are GC'd in this example in other ports, and whether making WebGLRenderingContext an ActiveDOMObject would improve the situation.
Comment 15 yongsheng 2012-07-02 00:29:21 PDT
(In reply to comment #14)
> mrobinson pointed out on IRC that after fixing Bug 80509 the GTK port passes the stress test https://cvs.khronos.org/svn/repos/registry/trunk/public/webgl/sdk/tests/conformance/context/context-creation-and-destruction.html ,, although it runs slowly -- and they were running on NVIDIA hardware, e.g. a real GPU. So it's possible there is a dangling pointer in the Chromium implementation that prevents these resources from being collected efficiently.
> 
> Still worth investigating when the Canvas and WebGLRenderingContext objects are GC'd in this example in other ports, and whether making WebGLRenderingContext an ActiveDOMObject would improve the situation.
Anyone working on this?
If no, maybe i can do investigation and some experiments according to your suggestions and then try to draft out one patch.
Comment 16 Kenneth Russell 2012-07-10 17:05:24 PDT
(In reply to comment #15)
> (In reply to comment #14)
> > mrobinson pointed out on IRC that after fixing Bug 80509 the GTK port passes the stress test https://cvs.khronos.org/svn/repos/registry/trunk/public/webgl/sdk/tests/conformance/context/context-creation-and-destruction.html ,, although it runs slowly -- and they were running on NVIDIA hardware, e.g. a real GPU. So it's possible there is a dangling pointer in the Chromium implementation that prevents these resources from being collected efficiently.
> > 
> > Still worth investigating when the Canvas and WebGLRenderingContext objects are GC'd in this example in other ports, and whether making WebGLRenderingContext an ActiveDOMObject would improve the situation.
> Anyone working on this?
> If no, maybe i can do investigation and some experiments according to your suggestions and then try to draft out one patch.

Hi Yongsheng,

I'm currently investigating the WebGL resource leak in http://crbug.com/128484 . There is definitely a leak in this test case; I'm trying to track down where it is, and to see if it's a more general problem.
Comment 17 yongsheng 2012-07-10 18:16:46 PDT
> Hi Yongsheng,
> 
> I'm currently investigating the WebGL resource leak in http://crbug.com/128484 . There is definitely a leak in this test case; I'm trying to track down where it is, and to see if it's a more general problem.
ok, thanks for update. I'll track its status.
Comment 18 Kenneth Russell 2012-07-12 16:30:30 PDT
I've investigated the resource leak with the WebGL stress tests from NVIDIA attached to http://code.google.com/p/chromium/issues/detail?id=128484 . It's happening only in Chrome, and because full GCs aren't occurring often enough. Plenty of young generation scavenges occur, but the canvases and their associated WebGLRenderingContexts are only collected upon full GC. (Even if they could be collected in scavenges, these objects might be promoted to the old generation, in which case they would only be collected during a full GC.)

I also tested with the context-creation-and-destruction.html stress test in the WebGL conformance suite. In this test case we would also like to collect the canvases and contexts more eagerly.

After consideration, the behavior we want is to simply tell the garbage collector to run soon after WebGL context creation. WebGL context creation is already a relatively expensive operation, so even though forcing a GC is a not insignificant cost, we do not expect real applications to be impacted.

The other solutions considered above, such as triggering an event if the canvas is removed from the web page, don't really address the problem. For example, it's perfectly valid for a canvas to be used for WebGL rendering even if it isn't added to the document.

JSC seems to GC often enough that it isn't affected by these issues, but I'd still like to hint its garbage collector to run soon after a WebGL context is created so that if its GC algorithm is ever adjusted, the behavior for collection of WebGL contexts doesn't change significantly.

Patch forthcoming which implements this.
Comment 19 Kenneth Russell 2012-07-12 16:39:49 PDT
Created attachment 152094 [details]
Patch
Comment 20 Kenneth Russell 2012-07-12 16:40:39 PDT
Geoff, Oliver: could you take a look at the addition of ScriptController::lowMemoryNotification on the JSC side?
Comment 21 WebKit Review Bot 2012-07-12 18:59:35 PDT
Comment on attachment 152094 [details]
Patch

Attachment 152094 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/13200961

New failing tests:
http/tests/w3c/webperf/approved/navigation-timing/html/test_performance_attributes_exist_in_object.html
Comment 22 WebKit Review Bot 2012-07-12 18:59:40 PDT
Created attachment 152130 [details]
Archive of layout-test-results from gce-cr-linux-03

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: gce-cr-linux-03  Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'>  Platform: Linux-2.6.39-gcg-201203291735-x86_64-with-Ubuntu-10.04-lucid
Comment 23 Kenneth Russell 2012-07-13 11:06:59 PDT
The layout test failure on the cr-ews bot is definitely unrelated to this patch.
Comment 24 Kenneth Russell 2012-07-13 11:24:20 PDT
CC'ing several more people who have done reviews in JSC's ScriptController.h.

I don't expect the addition of the lowMemoryNotification method to be controversial. I do want to enforce uniform behavior, however, so that even if JSC's garbage collection algorithm changes, the behavior of these various WebGL stress tests will be the same.
Comment 25 Stephen White 2012-07-13 12:51:45 PDT
I don't feel qualified to review this, but I seem to remember that calls to manually trigger garbage collection are somewhat frowned on by the V8 team; should we get one of them to look in here?

If it is a good approach, is it something we should consider for accelerated canvas as well, since it also can hang onto GPU resources after navigating away?
Comment 26 Kenneth Russell 2012-07-13 13:18:08 PDT
=
Comment 27 Kenneth Russell 2012-07-13 13:23:30 PDT
(In reply to comment #25)
> I don't feel qualified to review this, but I seem to remember that calls to manually trigger garbage collection are somewhat frowned on by the V8 team; should we get one of them to look in here?

+danno from the V8 team for comment. However, I do not want to block this patch on the V8 team's approval. It fixes reliability problems with WebGL stress tests without impacting other areas of the web platform.

I'm aware of the pitfalls of triggering full GCs, but I've carefully considered all of the options and this is the solution which best fits the constraints and available primitives.

> If it is a good approach, is it something we should consider for accelerated canvas as well, since it also can hang onto GPU resources after navigating away?

Possibly. 2D Canvas is in much broader use than WebGL, though, so we might want to consider other options there. For example, it might be desirable to trigger GC soon after a page containing a 2D canvas is navigated away from, rather than waiting for the next time a 2D context is allocated.
Comment 28 Kenneth Russell 2012-07-13 14:13:49 PDT
At this point I'm looking for an r+ based on the code changes to WebGLRenderingContext and the explanation of the behavior above, in the ChangeLog and in the comments. I think the changes to JSC's ScriptController are trivial and non-controversial, and we can revisit them if necessary. I don't want to hold this patch up any longer as we want to re-enable these stress tests in the WebGL conformance suite and doing so is blocked by this.
Comment 29 Geoffrey Garen 2012-07-13 15:27:56 PDT
As a general rule, if you're thinking about triggering a GC to fix a correctness bug, you're doing it wrong. In particular, you're creating the potential for pathological collection churn, and you're not fixing the cases where relevant objects are, in fact, still reachable from some variable or closure somewhere.

We're working hard to remove existing explicit GC invocations because of the problems they have created and/or masked in the past. Please don't add new explicit GC invocations.

> Currently, if a page fetches a WebGLRenderingContext for a canvas, and the user navigates away from
> that page, the WebGLRenderingContext won't be collected until the JavaScript engine's garbage
> collection runs. This is how ordinary disposal of DOM related objects works.

It sounds like WebGL rendering contexts shouldn't work like ordinary DOM objects, since they're backed by a scarce resource. A better solution would be to add explicit pooled management for GPU resources.
Comment 30 Kenneth Russell 2012-07-13 15:58:50 PDT
(In reply to comment #29)
> It sounds like WebGL rendering contexts shouldn't work like ordinary DOM objects, since they're backed by a scarce resource. A better solution would be to add explicit pooled management for GPU resources.

There are already explicit management APIs (e.g., "delete") for the GPU resources allocated by the WebGLRenderingContext. Applications are expected to use those, rather than rely on GC of those resources, if they expect to stay within a certain memory footprint.

However, the context itself doesn't have a "delete" API. Neither does the CanvasRenderingContext2D. Both are fairly heavyweight at this point given the introduction of GPU accelerated rendering for the 2D canvas context. Some mechanism is necessary to ensure that the JavaScript references to these objects can be cleaned up in a timely fashion.

Since JSC doesn't currently use a generational garbage collection algorithm, any JavaScript object allocation eventually provokes a full GC which provides the opportunity to clean up unreferenced canvases and contexts (2D or 3D). This technical detail is the only reason these stress tests work in Safari and not Chromium.

I absolutely have to find some solution for the Chromium port. If you have suggestions about cross-port solutions you would find acceptable, please post them here. However, in the meantime I will re-cast this as a V8-specific patch.
Comment 31 Geoffrey Garen 2012-07-16 11:11:40 PDT
> There are already explicit management APIs (e.g., "delete") for the GPU resources allocated by the WebGLRenderingContext.

Actually, I'm suggesting explicit management inside WebCore -- not a Web API for explicit management.

I'm surprised that we already have some explicit "delete" APIs in the Web platform. Usually, we don't use that kind of API because it's too easy to get wrong, and a Web programmer doesn't have enough information about the platform to make the right decision.

> However, the context itself doesn't have a "delete" API. Neither does the CanvasRenderingContext2D.

Going forward, we need explicit management inside WebCore for any APIs that might render the GPU inoperative.

> Some mechanism is necessary to ensure that the JavaScript references to these objects can be cleaned up in a timely fashion.

JavaScript references are a red herring here. I think you're focusing on JavaScript references because it just-so-happens that this failing test operates on WebGL contexts whose sole references are (a) from JavaScript and (b) garbage-collectable. But it would be trivial to construct a test or a web app that operated on WebGL contexts that were referenced outside of JavaScript and/or couldn't be garbage collected. We need to make those apps work. We should learn the lesson this test is trying to teach us instead of sweeping it under the rug.

> However, in the meantime I will re-cast this as a V8-specific patch.

I don't think making this patch V8-specific would resolve any of the issues I've raised.
Comment 32 Zhenyao Mo 2012-07-18 12:56:07 PDT
(In reply to comment #29)
> As a general rule, if you're thinking about triggering a GC to fix a correctness bug, you're doing it wrong. In particular, you're creating the potential for pathological collection churn, and you're not fixing the cases where relevant objects are, in fact, still reachable from some variable or closure somewhere.
> 
> We're working hard to remove existing explicit GC invocations because of the problems they have created and/or masked in the past. Please don't add new explicit GC invocations.
> 
> > Currently, if a page fetches a WebGLRenderingContext for a canvas, and the user navigates away from
> > that page, the WebGLRenderingContext won't be collected until the JavaScript engine's garbage
> > collection runs. This is how ordinary disposal of DOM related objects works.
> 
> It sounds like WebGL rendering contexts shouldn't work like ordinary DOM objects, since they're backed by a scarce resource. A better solution would be to add explicit pooled management for GPU resources.

I don't think GPU resource management here would work.  For one thing, we don't know how many resources we can hold in the pool or manage.  Besides webkit, there are other consumers, and other applications besides our browser.
Comment 33 Kenneth Russell 2012-07-20 14:01:15 PDT
Created attachment 153596 [details]
Patch
Comment 34 Kenneth Russell 2012-07-20 14:07:25 PDT
I've recast this patch to incorporate the minimal change in behavior which allows V8 to reach parity with JSC's GC algorithm on WebGL stress tests. Now creation of a canvas context (either 2D or WebGL) hints to the isolate that upon the next page navigation, a low memory hint should be sent to V8, which is likely to cause it to do a full GC.

In the future, we do aim to track how many GPU resources a WebGLRenderingContext is holding on to, and forcibly send lost context notifications to contexts on backgrounded tabs holding on to lots of resources. This is a larger project, though; the current patch is a step toward a solution.
Comment 35 Kenneth Russell 2012-07-20 14:14:07 PDT
BTW, I compared SunSpider runs with and without this change and there was no difference.
Comment 36 Kentaro Hara 2012-07-21 01:41:37 PDT
Given the current design of V8's GC, the proposed patch looks reasonable to me.

[1] V8 cannot reclaim (dependent) DOM objects in the scavenger GC. All DOM objects have to wait for the full GC.

[2] Without V8::LowMemoryNotification() in your patch, V8 has no way to know that WebKit is wasting a lot of memory. If V8 doesn't notice the memory bloat, V8 won't trigger the full GC.

Provided that we accept the behavior [1], the only way to fix the problem would be to explicitly call V8::LowMemoryNotification(). In that sense, the proposed patch looks reasonable.

However, the root cause of the problem is [1]. The correct fix would be to reclaim DOM objects in the scavenger GC. Then the problem in the stress test in your patch will be automatically solved without any V8::LowMemoryNotification() call. (Of course, in an unfortunate case where wrapper objects are promoted to the old space, we have to wait for the full GC. But that's the way a generational GC works.)

I've been implementing the new scavenger GC that can reclaim DOM objects. However, since this is a big change across V8, V8 bindings and WebCore, it will take time to convince all the folks. In that sense, solving your problem with V8::LowMemoryNotification() for now sounds reasonable to me.

================

> This is a rare enough situation that the improvements in resource consumption should far outweigh the pitfalls in this scenario.

Just a comment: BTW, I don't like an approach that violates correctness in any way. Even if we believe it would be a rare case, it can cause a serious bug we couldn't come up with in the future, which will be difficult to debug.
Comment 37 Kenneth Russell 2012-07-23 10:13:47 PDT
(In reply to comment #36)
> I've been implementing the new scavenger GC that can reclaim DOM objects. However, since this is a big change across V8, V8 bindings and WebCore, it will take time to convince all the folks. In that sense, solving your problem with V8::LowMemoryNotification() for now sounds reasonable to me.

Thanks for your review and comment. In this case would you please r+ the patch?

> > This is a rare enough situation that the improvements in resource consumption should far outweigh the pitfalls in this scenario.
> 
> Just a comment: BTW, I don't like an approach that violates correctness in any way. Even if we believe it would be a rare case, it can cause a serious bug we couldn't come up with in the future, which will be difficult to debug.

Completely agree that correctness must not be violated. In the scenario in my comment #13 above, it would actually be legal to force a "lost context" event against a WebGLRenderingContext removed from one page and inserted into another. Currently, though, we're not pursuing that approach.
Comment 38 mstarzinger 2012-07-23 12:15:50 PDT
(In reply to comment #33)
> Created an attachment (id=153596) [details]

I have strong objections against this patch. It seems to call v8::V8::LowMemoryNotification() to force a collection. This API method is much more than a "hint". It will force V8 to perform a last-resort GC, that might take up to 7 (!) full GC cycles during which V8 tries to squeeze out and release every free byte on the collected heap it can find. This will introduce a pauses that is up to several seconds long, depending on the size of the heap. This is why we also call it "last-resort GC".

http://code.google.com/p/v8/source/browse/tags/3.12.14/src/heap.h#1064

On a more general level: Forcing a GC at frequent points in the code is generally not a good idea, because those forced GCs will be non-incremental and hence introduce significant pauses into the overall system. In most of the cases this trade-off plays out against forced GCs.

One way to tell V8 that a lot of external memory is currently attached to some object on the heap, is to use V8::AdjustAmountOfExternalAllocatedMemory() and tell V8 how much external memory is dangling off the heap so that the GC can account for that in its heuristics. If that doesn't have the desired effect we can work on tweaking that.

If this approach isn't applicable for WebGL rendering contexts, we can think of other hints that the bindings can pass to V8, that identify these contexts as "heavy" objects which should be reclaimed early. One idea that Danno mentioned today, was to (kind of) keep them in the new-space (V8's young generation) and prolong their promotion period. This way those objects will be scavenged frequently and hence disposed as quickly as possible.
Comment 39 mstarzinger 2012-07-23 13:44:16 PDT
As discussed offline with Kenneth: Using AdjustAmountOfExternalAllocatedMemory isn't feasible, because it introduces more full GCs instead of scavenges and WebGL apps benefit from frequent (smaller) scavenges. We have seen a similar situation for V8's Splay benchmark and the Spinning Balls benchmark in the past, so this is not at all uncommon.

As an interim solution it might be worth considering to use V8::IdleNotification() instead of V8::LowMemoryNotification(), which is a much "softer hint" and not force a last-resort GC. Using and idle notification is OK in this situation as far as I can tell.
Comment 40 Kenneth Russell 2012-07-23 15:13:20 PDT
Created attachment 153872 [details]
Chart of live WebGLRenderingContexts over time
Comment 41 jochen 2012-07-23 15:16:49 PDT
(In reply to comment #40)
> Created an attachment (id=153872) [details]
> Chart of live WebGLRenderingContexts over time

can you include a graph without any notification patched in?
Comment 42 Kenneth Russell 2012-07-23 15:29:50 PDT
(In reply to comment #39)
> As discussed offline with Kenneth: Using AdjustAmountOfExternalAllocatedMemory isn't feasible, because it introduces more full GCs instead of scavenges and WebGL apps benefit from frequent (smaller) scavenges. We have seen a similar situation for V8's Splay benchmark and the Spinning Balls benchmark in the past, so this is not at all uncommon.
> 
> As an interim solution it might be worth considering to use V8::IdleNotification() instead of V8::LowMemoryNotification(), which is a much "softer hint" and not force a last-resort GC. Using and idle notification is OK in this situation as far as I can tell.

Above is a chart comparing the use of V8::IdleNotification() in a loop (e.g., "while (!V8::IdleNotification()) {}") to V8::LowMemoryNotification for this use case. The stress test used was that from http://code.google.com/p/chromium/issues/detail?id=128484 . Each was called within the V8Proxy::hintForGCIfNecessary() method added in the patch above. The data points are the number of outstanding WebGLRenderingContext C++ objects. The measurements were taken immediately after calling each V8 API, during page navigation, before the new page started execution and consequently created new WebGLRenderingContexts.

Unfortunately, from this graph, it is clear that V8's IdleNotification is not doing as much work as its LowMemoryNotification. It is not discovering and cleaning up all of the unreachable WebGLRenderingContext references.

If one uses the Inspector's Heap Snapshot to see what is keeping the WebGLRenderingContext objects alive at the point where many are outstanding, only one WebGLRenderingContext is reported as alive -- the one the current demonstration is using. The reason is that the Inspector runs a full GC just before taking the heap snapshot.

I will be glad to evolve any heuristics in this area in the future as new V8 APIs are added to make different hints to the GC, and as new GC algorithms are added which can collect DOM nodes in young generation scavenges. Until that point, however, I request to be allowed to land this patch in its current form. It is the only proposed solution that achieves the desired results at the current time.
Comment 43 Kenneth Russell 2012-07-23 17:56:04 PDT
Created attachment 153923 [details]
Chart of live WebGLRenderingContexts over time
Comment 44 Kenneth Russell 2012-07-23 18:03:06 PDT
(In reply to comment #41)
> can you include a graph without any notification patched in?

Yes. Revised graph attached which includes this as well as tracks the number of live contexts over a longer period of time. It's easy to see that without any notifications the system eventually performs a full GC and reclaims all of the contexts, but it takes a long period of time.

Even though the idle notifications improve the situation, Chrome's WebGL implementation still occasionally fails to allocate a full-resolution back buffer at some points where the number of live contexts is high. Therefore I still would like to commit the version of this patch which sends low memory notifications rather than idle notifications. I fully expect we will iterate on this patch and add other heuristics and mechanisms, but again, I would first like to reach parity with how JSC behaves on this stress test.
Comment 45 Kentaro Hara 2012-07-23 18:11:01 PDT
General comments:

- The correct fix is to reclaim DOM nodes in the scavenger GC. But it takes time. Thus the goal of this patch would be to seek a _modest_temporary_ solution for the WebGLRenderingCotnext problem. Practically speaking, rather than seeking a beautiful solution, landing this patch quickly with a temporal solution would be much more important. In that sense, I think it is OK to hackily use the current V8 APIs and solve our problem somehow.

- So I am happy to r+ the LowMemoryNotification() patch, but I'd like to wait for approval from michael and danno before r+ing it.



A couple of questions:

> As discussed offline with Kenneth: Using AdjustAmountOfExternalAllocatedMemory isn't feasible, because it introduces more full GCs instead of scavenges and WebGL apps benefit from frequent (smaller) scavenges.

You mentioned that LowMemoryNotification(), which triggers 7 full GCs, works fine for your problem. Then why do you think that more full GCs caused by AdjustAmountOfExternalAllocatedMemory() are not feasible? More full GCs are what you are expecting, aren't they?

> As an interim solution it might be worth considering to use V8::IdleNotification() instead of V8::LowMemoryNotification(), which is a much "softer hint" and not force a last-resort GC. Using and idle notification is OK in this situation as far as I can tell.

What happens if you call IdleNotification() multiple times until it returns true? (true means that V8 has completed enough cleanup.)



Just a comment:

> One idea that Danno mentioned today, was to (kind of) keep them in the new-space (V8's young generation) and prolong their promotion period. This way those objects will be scavenged frequently and hence disposed as quickly as possible.

Yes, but this idea wouldn't be helpful in the current scavenger GC that cannot reclaim DOM nodes.
Comment 46 Kenneth Russell 2012-07-23 19:12:04 PDT
(In reply to comment #45)
> General comments:
> 
> - The correct fix is to reclaim DOM nodes in the scavenger GC. But it takes time. Thus the goal of this patch would be to seek a _modest_temporary_ solution for the WebGLRenderingCotnext problem. Practically speaking, rather than seeking a beautiful solution, landing this patch quickly with a temporal solution would be much more important. In that sense, I think it is OK to hackily use the current V8 APIs and solve our problem somehow.

I agree that being able to reclaim DOM nodes in young generation GCs will improve the situation -- but I am not 100% sure that it will completely fix it, because if the DOM node or the WebGLRenderingContext happens to get promoted, it won't get reclaimed until the next full GC.

I have a feeling that a more general solution might be to segregate all JS wrappers for C++ objects into their own generation, and collect that generation more often than the young gen, but less often than the old generation -- but this is a topic for an offline conversation.

> - So I am happy to r+ the LowMemoryNotification() patch, but I'd like to wait for approval from michael and danno before r+ing it.

Sure.

> A couple of questions:
> 
> > As discussed offline with Kenneth: Using AdjustAmountOfExternalAllocatedMemory isn't feasible, because it introduces more full GCs instead of scavenges and WebGL apps benefit from frequent (smaller) scavenges.
> 
> You mentioned that LowMemoryNotification(), which triggers 7 full GCs, works fine for your problem. Then why do you think that more full GCs caused by AdjustAmountOfExternalAllocatedMemory() are not feasible? More full GCs are what you are expecting, aren't they?

The full GCs are only desired at specific points in time. We do not want to increase memory pressure in general and cause full GCs to happen instead of scavenges. While a WebGL application is running, we want only scavenges to occur. If a full GC occurs the application will drop frames. I don't know how AdjustAmountOfExternalAllocatedMemory's heuristics work but I am 100% sure that it can not behave exactly as desired.

Note that this patch is only being proposed as an initial step. The current thought is that WebGL's lost context notifications will be forcibly used in some situations to release GPU resources: for example, upon navigating away from the page, or when many tabs are backgrounded that are using lots of resources. In these cases, the JavaScript and C++ objects will still be alive, but the resources they reference will be released.

> > As an interim solution it might be worth considering to use V8::IdleNotification() instead of V8::LowMemoryNotification(), which is a much "softer hint" and not force a last-resort GC. Using and idle notification is OK in this situation as far as I can tell.
> 
> What happens if you call IdleNotification() multiple times until it returns true? (true means that V8 has completed enough cleanup.)

That's shown in the attached chart. (In that experiment, IdleNotification() is called in a loop.) The result is that the number of live WebGLRenderingContexts hovers around 5, sometimes going a little higher, sometimes a little lower. Again, I don't know how the algorithm differs from LowMemoryNotification(), but it definitely isn't as aggressive, and in some situations it fails to collect "enough" contexts to prevent poor behavior.

> 
> Just a comment:
> 
> > One idea that Danno mentioned today, was to (kind of) keep them in the new-space (V8's young generation) and prolong their promotion period. This way those objects will be scavenged frequently and hence disposed as quickly as possible.
> 
> Yes, but this idea wouldn't be helpful in the current scavenger GC that cannot reclaim DOM nodes.

Right. That was being discussed as a longer-term solution, once your new scavenger GC is in place.
Comment 47 Kentaro Hara 2012-07-23 19:26:14 PDT
Thanks kbr, your point makes sense to me. Let's wait for opinions of V8 folks.
Comment 48 Daniel Clifford 2012-07-24 01:30:42 PDT
 I really don't like it, but as a quick fix it seems tolerable. Ken, go ahead and land this under one condition: Please work with Ulan when he returns next week to clean this up so you can avoid calling LowMemoryNotification. Specifically, I was under the impression that we already handle page navigations specially. Ulan specifically added special logic recently to recognize when navigating away from a page to trigger a full collection under some circumstances. Maybe the existing mechanism isn't catching your situation, or maybe it's not doing an aggressive enough GC. In any case, I would like to see your requirements put in single, common "clean slate" mechanism for navigating away from a page so we can contain the damage of this kind explicit GC to one place in the code.
Comment 49 Kentaro Hara 2012-07-24 01:55:48 PDT
Comment on attachment 153596 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=153596&action=review

Marking r+ based on the following points:
- danno says OK.
- This change affects V8 only.
- The implementation looks good to me.

BTW, is page navigation the only concern? For example, the test case in this patch is creating a lot of canvases without navigating pages.

> Source/WebCore/bindings/v8/V8Binding.cpp:114
> +void V8BindingPerIsolateData::setLowMemoryNotificationHint(bool hint)
> +{
> +    m_lowMemoryNotificationHint = hint;
> +}
> +
> +bool V8BindingPerIsolateData::getLowMemoryNotificationHint() const
> +{
> +    return m_lowMemoryNotificationHint;
> +}

Nit: You can write them in the header file.

> Source/WebCore/bindings/v8/V8Binding.h:228
> +        void setLowMemoryNotificationHint(bool);
> +        bool getLowMemoryNotificationHint() const;

Nit: I would prefer setLowMemoryNotificationHint(), clearLowMemoryNotificationHint() and isLowMemoryNotificationHint().

> LayoutTests/fast/canvas/webgl/context-creation-and-destruction.html:12
> +<div id="description"></div>
> +<div id="console"></div>

Nit: These are not needed. Will be added automatically.
Comment 50 Kenneth Russell 2012-07-24 14:34:33 PDT
(In reply to comment #48)
> Ken, go ahead and land this under one condition: Please work with Ulan when he returns next week to clean this up so you can avoid calling LowMemoryNotification.

I definitely will.
Comment 51 Kenneth Russell 2012-07-24 14:37:20 PDT
(In reply to comment #49)
> 
> BTW, is page navigation the only concern? For example, the test case in this patch is creating a lot of canvases without navigating pages.

It's a start, in order to get other WebGL endurance tests to run. It's mentioned in the ChangeLog that the test being added doesn't actually exercise the new code path, but needs to be expanded out and more stress tests added.

> > Source/WebCore/bindings/v8/V8Binding.cpp:114
> > +void V8BindingPerIsolateData::setLowMemoryNotificationHint(bool hint)
> > +{
> > +    m_lowMemoryNotificationHint = hint;
> > +}
> > +
> > +bool V8BindingPerIsolateData::getLowMemoryNotificationHint() const
> > +{
> > +    return m_lowMemoryNotificationHint;
> > +}
> 
> Nit: You can write them in the header file.

Done.

> > Source/WebCore/bindings/v8/V8Binding.h:228
> > +        void setLowMemoryNotificationHint(bool);
> > +        bool getLowMemoryNotificationHint() const;
> 
> Nit: I would prefer setLowMemoryNotificationHint(), clearLowMemoryNotificationHint() and isLowMemoryNotificationHint().

Thanks for the suggestion. Done.

> > LayoutTests/fast/canvas/webgl/context-creation-and-destruction.html:12
> > +<div id="description"></div>
> > +<div id="console"></div>
> 
> Nit: These are not needed. Will be added automatically.

Thanks. Removed.
Comment 52 Kenneth Russell 2012-07-24 14:43:33 PDT
Created attachment 154146 [details]
Patch for landing
Comment 53 WebKit Review Bot 2012-07-24 17:37:32 PDT
Comment on attachment 154146 [details]
Patch for landing

Clearing flags on attachment: 154146

Committed r123556: <http://trac.webkit.org/changeset/123556>
Comment 54 WebKit Review Bot 2012-07-24 17:37:46 PDT
All reviewed patches have been landed.  Closing bug.
Comment 55 Andrew Wilson 2012-07-25 12:50:21 PDT
This CL started making a couple of layout tests time out:

BUGCR76225 SLOW WIN MAC : fast/canvas/webgl/context-creation-and-destruction.html = PASS
BUGCR76225 SLOW WIN MAC : platform/chromium/virtual/gpu/fast/canvas/webgl/context-creation-and-destruction.html = PASS

I've marked these tests as SLOW for now, but you might want to look into this to see if perhaps this change is going to cause a real perf regression (anything that impacts our layout tests may also impact real pages).
Comment 57 Kenneth Russell 2012-07-25 13:13:00 PDT
(In reply to comment #55)
> This CL started making a couple of layout tests time out:
> 
> BUGCR76225 SLOW WIN MAC : fast/canvas/webgl/context-creation-and-destruction.html = PASS
> BUGCR76225 SLOW WIN MAC : platform/chromium/virtual/gpu/fast/canvas/webgl/context-creation-and-destruction.html = PASS
> 
> I've marked these tests as SLOW for now, but you might want to look into this to see if perhaps this change is going to cause a real perf regression (anything that impacts our layout tests may also impact real pages).

That test's new in this patch and doesn't actually trigger the full GCs added in this patch. We will need to watch for perf regressions in the real world, though.

Feel free to file a new bug about that timeout and assign it to me.
Comment 58 Kenneth Russell 2012-07-25 13:31:19 PDT
I'm closing this as fixed again. Bug 92287 tracks the slowness of the newly added context-creation-and-destruction test.
Comment 59 Kenneth Russell 2012-12-10 11:38:21 PST
@bajones is currently working on removing this hack by making WebGLRenderingContext an ActiveDOMObject and forcibly losing WebGL contexts when navigating away from a page containing their canvases.