Bug 43516

Summary: REGRESSION: Huge number of memory leaks after enabling MathML
Product: WebKit Reporter: Beth Dakin <bdakin>
Component: MathMLAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, alex, bdakin, darin, hyatt, mrowe, sausset
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: PC   
OS: OS X 10.5   
Attachments:
Description Flags
Patch
none
Cleaner Patch kenneth: review+

Description Beth Dakin 2010-08-04 15:50:20 PDT
r64589 enabled MathML.  The first build after that on the leak bot was <http://build.webkit.org/results/SnowLeopard%20Intel%20Leaks/r64592%20%289585%29/>, which showed over 80,000 leaks.  Over 350MB of leaks output.

<rdar://problem/8274123>
Comment 1 Alex Milowski 2010-08-05 05:31:08 PDT
OK.  I've asked before about trying to detect leaks.  Pardon my ignorance, but what exactly are we looking at in these DumpRenderTree*-leaks.txt files?  How do I understand what they are saying?

I have a hunch this all has to do with the anonymous render objects created by the MathML rendering code.  RenderMathMLOperator creates quite a few of these and will re-create the glyphs stack (and the anonymous objects) every time layout is called.
Comment 2 Alex Milowski 2010-08-05 06:43:48 PDT
I think this is related to RenderMathMLOperator.  Every call stack has that in it and it creates a lot of anonymous rendering objects for the glyph stacking.
Comment 3 Alex Milowski 2010-08-05 06:54:50 PDT
I believe this is the fix that is needed:


Index: RenderMathMLOperator.cpp
===================================================================
--- RenderMathMLOperator.cpp	(revision 64442)
+++ RenderMathMLOperator.cpp	(working copy)
@@ -121,6 +121,7 @@
     while (firstChild()) {
        RenderObject* obj = firstChild();
        removeChild(obj);
+       obj->destroy();
     }


I just don't have a test system to try this out right now.
Comment 4 Beth Dakin 2010-08-05 10:00:11 PDT
(In reply to comment #3)
> I believe this is the fix that is needed:
> 
> 
> Index: RenderMathMLOperator.cpp
> ===================================================================
> --- RenderMathMLOperator.cpp    (revision 64442)
> +++ RenderMathMLOperator.cpp    (working copy)
> @@ -121,6 +121,7 @@
>      while (firstChild()) {
>         RenderObject* obj = firstChild();
>         removeChild(obj);
> +       obj->destroy();
>      }
> 
> 
> I just don't have a test system to try this out right now.

Hey Alex! I pin-pointed that exact same problematic spot in the code myself. Unfortunately, the fix you suggest (which I came up with and tested last night), causes crashes. I am trying to work through a non-crashy solution now.
Comment 5 Beth Dakin 2010-08-05 10:06:14 PDT
(In reply to comment #1)
> OK.  I've asked before about trying to detect leaks.  Pardon my ignorance, but what exactly are we looking at in these DumpRenderTree*-leaks.txt files?  How do I understand what they are saying?
> 
> I have a hunch this all has to do with the anonymous render objects created by the MathML rendering code.  RenderMathMLOperator creates quite a few of these and will re-create the glyphs stack (and the anonymous objects) every time layout is called.

Seems like you figured this out on your own, but in general, in case you are still curious…

You can get the leaks on your own machine if you want rather than looking at the leaks bot logs by running "run-webkit-tests --leaks". In this case since we know we want to focus on MathML leaks, looking at "run-webkit-tests --leaks mathml/" works even better.

The spew is a backwards backtrace to the point in the stack where memory that has been leaked was first allocated. In this case, it is hard to look through the logs because it looks like we are leaking RenderObjects, so there are some (very helpful) backtraces that show a RenderObject being leaked…but then since the RenderObject is leaked, everything the RenderObject owns is also leaked, so we get a bunch of much-less-helpfull stacktraces showing, for example, a HashMap that has leaked from the Font class. This is less helpful because it is not directly the problem.

I hope that helps.
Comment 6 Alex Milowski 2010-08-05 10:50:56 PDT
(In reply to comment #4)
> 
> Hey Alex! I pin-pointed that exact same problematic spot in the code myself. Unfortunately, the fix you suggest (which I came up with and tested last night), causes crashes. I am trying to work through a non-crashy solution now.

I wish I could be of more help.  I'm just unable to do much without a stable machine to develop on.  Sausset may be taking a look at this.  You might want to confer with him.
Comment 7 Beth Dakin 2010-08-05 11:31:06 PDT
(In reply to comment #3)
> I believe this is the fix that is needed:
> 
> 
> Index: RenderMathMLOperator.cpp
> ===================================================================
> --- RenderMathMLOperator.cpp    (revision 64442)
> +++ RenderMathMLOperator.cpp    (working copy)
> @@ -121,6 +121,7 @@
>      while (firstChild()) {
>         RenderObject* obj = firstChild();
>         removeChild(obj);
> +       obj->destroy();
>      }
> 
> 
> I just don't have a test system to try this out right now.

Okay, I see why this solution crashes now. When you destroy a renderer, it sets its node's renderer to 0. But in MathML operator, the parent operator and all of the children share the same node! So when we try to destroy the children, we end up setting the parent's node's renderer to 0 as well. 

Since I haven't been involved in MathML lo these many months that you all have been working on it, I would love to be filled in on this design. Why does the MathMLOperator share a node with all of its children? Why aren't the children anonymous content instead?
Comment 8 Beth Dakin 2010-08-05 11:36:55 PDT
Index: mathml/RenderMathMLOperator.cpp
===================================================================
--- mathml/RenderMathMLOperator.cpp	(revision 64705)
+++ mathml/RenderMathMLOperator.cpp	(working copy)
@@ -118,10 +118,18 @@
 void RenderMathMLOperator::updateFromElement()
 {
     // clear our children
     while (firstChild()) {
        RenderObject* obj = firstChild();
        removeChild(obj);
+       obj->destroy();
     }
+
+    node()->setRenderer(this);

This fixes the leaks without any crashes. But I would really like the understand the design better before committing it. Are we totally sure we want all of these renderers to share a node?
Comment 9 François Sausset 2010-08-05 11:59:24 PDT
(In reply to comment #8)
> This fixes the leaks without any crashes. But I would really like the understand the design better before committing it. Are we totally sure we want all of these renderers to share a node?

It's Alex's design and I'm not familiar with the tree structure of all the MathML renderers.
So, I think he could answer better than me.

What do you mean by "anonymous content"?
Comment 10 Beth Dakin 2010-08-05 12:25:02 PDT
(In reply to comment #9)
> (In reply to comment #8)
> > This fixes the leaks without any crashes. But I would really like the understand the design better before committing it. Are we totally sure we want all of these renderers to share a node?
> 
> It's Alex's design and I'm not familiar with the tree structure of all the MathML renderers.
> So, I think he could answer better than me.
> 
> What do you mean by "anonymous content"?

Anonymous blocks are usually what we use in the render tree for render objects that are not represented by a node in the DOM tree. There is some more information about it here: http://webkit.org/blog/115/webcore-rendering-ii-blocks-and-inlines/
Comment 11 Beth Dakin 2010-08-05 13:28:38 PDT
Created attachment 63622 [details]
Patch

Here is a patch to fix the leaks. I still think that there are greater architecture issues to be considered here, but in the meantime, we may as well fix the leak in the current architecture.
Comment 12 Beth Dakin 2010-08-05 13:38:49 PDT
Created attachment 63625 [details]
Cleaner Patch
Comment 13 mitz 2010-08-05 13:51:54 PDT
Comment on attachment 63625 [details]
Cleaner Patch

> +    if (RenderObjectChildList* oldChildren = children())

I don’t think this null check is necessary.
Comment 14 Beth Dakin 2010-08-05 14:00:04 PDT
Thanks Kenneth and Dan! http://trac.webkit.org/changeset/64783
Comment 15 Alex Milowski 2010-08-06 04:26:24 PDT
I'm certainly open to other suggestions on how to approach operator stretch.  The general algorithm is if the operator height is above a certain threshold and is a stretchable operator, you stack glyphs using different glyphs for specific parts of the character's normal representation.  For example, a left curly bracket has glyphs for the top, bottom, middle, and extension.

When the element or stretch height changes, which can happen multiple times during layout of a row of mathematical constructs, the stretch height of the operator (i.e. the height of the containing row) will change.  Currently, we deal with this by assuming that the current set of stacked glyphs are invalid.  We then remove them from the render tree and re-stack the glyphs.

In the past, I've had strange issues when using document() for the backing node for the anonymous blocks.  These are briefly described here:

   https://lists.webkit.org/pipermail/webkit-dev/2010-February/011559.html

All these issues went away when I changed the backing node to be the operator node (e.g. mo element) itself.  Those particular issues were for mrow's rendering and I don't know if simply changing the backing node to document() in the glyph stack will make this anymore clear or similar to other render objects.

The technique of glyph stack is a fairly standard way to handle operator stretching when typesetting Mathematics.  In the end, we still need to stack glyphs.  With our current multi-pass row layout and the fact that the DOM can be changed at any time, I don't see how handle a change of height other than to delete the children and start over stacking a new set of glyphs.

I'm open to any suggestions but the current patch seems cleaner than what I had and also doesn't leak memory.  If it passes the tests, in my opinion, we should commit it.
Comment 16 Adam Barth 2010-08-10 23:10:30 PDT
This patch appears to have been landed.
Comment 17 Darin Adler 2010-08-11 09:21:33 PDT
Generally speaking, destroying the render tree every time updateFromElement is called may be simple from an implementation point of view, but is likely to be quite inefficient. I haven’t looked further at the implementation, but this is one of the things Beth might be referring to when she says the architecture may need work.
Comment 18 Alex Milowski 2010-08-11 09:29:51 PDT
(In reply to comment #17)
> Generally speaking, destroying the render tree every time updateFromElement is called may be simple from an implementation point of view, but is likely to be quite inefficient. I haven’t looked further at the implementation, but this is one of the things Beth might be referring to when she says the architecture may need work.

Yet, this is only for the 'mo' element and only in certain situations. If it is not a stretched glyph, there is no glyph stack and the children do not need to be destroyed.  I don't think we've made those optimization checks.  We'd have to determine from the stretch height has changed such that the glyph is stretched differently and so the children are invalid.

Keep in mind that the 'mo' element only contains text children and is only expected to contain a single unicode character (the operator).  The children of this are either (A) a single RenderText instance or (B) a stack of glyphs implemented as a sequence of vertically stacked RenderBlock instances.

As I've said before, vertical glyph stacking is a standard technique here.  If there was a way to write vertical line boxes with the text progression direction being orthogonal (e.g. left to right versus the top-to-bottom stacking), we might be able to do things differently.
Comment 19 François Sausset 2010-08-11 10:09:44 PDT
For sure, frequent allocations/deallocations are not good for performance.
But as Alex stated, it happens only in specific cases and it is not straightforward to use another method.

A least, we could check if the stretch height changed before destroying everything.

To finish, there is so much to do with MathML implementation that optimizations are not our top priority right now, but we should be more and more careful about that, as the current implementation is definitely not optimal from a performance point of view (see bug 43462, where it should be great to find the bottlenecks).