Bug 107074

Summary: Remove NodeListsNodeData when it's no longer needed
Product: WebKit Reporter: Ryosuke Niwa <rniwa>
Component: DOMAssignee: Ryosuke Niwa <rniwa>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, andersca, benjamin, darin, dpranke, eric, esprehn, kling, koivisto, ojan.autocc, sam, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 106586    
Attachments:
Description Flags
Fixes the bug
darin: review+
Perf. test results (open Memory tab) and look at Malloc values none

Description Ryosuke Niwa 2013-01-16 17:48:41 PST
Don't keep NodeListsNodeData forever.
Comment 1 Ryosuke Niwa 2013-01-16 18:14:08 PST
Created attachment 183081 [details]
Fixes the bug
Comment 2 Ryosuke Niwa 2013-01-16 18:15:20 PST
Created attachment 183082 [details]
Perf. test results (open Memory tab) and look at Malloc values
Comment 3 Ryosuke Niwa 2013-01-16 18:16:25 PST
This simple patch reduces the memory usage of the micro benchmark by 3%. Given that NodeListsNodeData is much smaller than NodeRareData, removing NodeRareData may reduce the overall memory consumption by as much as 6%-8%.
Comment 4 Darin Adler 2013-01-17 09:22:11 PST
Comment on attachment 183081 [details]
Fixes the bug

View in context: https://bugs.webkit.org/attachment.cgi?id=183081&action=review

Seems OK, although I have a few substantive thoughts below.

Somewhere there should be a comment about how we decided to delete these when no longer needed rather than keeping them around to optimize re-using them later. It’s the performance test, but something in the code should talk about how we made the tradeoff, if possible.

> Source/WebCore/ChangeLog:10
> +        If we detect that we have only one node lists left in the data structure,

Should be “node list” instead of “node lists”.

> Source/WebCore/ChangeLog:20
> +        (Node):

Bogus line should be removed. Or fix the script so it doesn’t add these.

> Source/WebCore/ChangeLog:26
> +        (NodeListsNodeData):

Bogus line should be removed. Or fix the script so it doesn’t add these.

> Source/WebCore/dom/NodeRareData.h:74
> +        if (!removeThisIfOnlyOneListIsLeft(list->ownerNode()))
> +            m_childNodeList = 0;

I think I’d like these better if they used an early return statement instead of nesting the remaining code. The logic is “if I can do this just by self destructing, then do that and return, otherwise do it the normal way”.

> Source/WebCore/dom/NodeRareData.h:225
> +    bool removeThisIfOnlyOneListIsLeft(Node*);

This function is confusing for a few reasons. The idea of this function is that we call it just before removing a list, which is why it checks for one remaining list rather than zero. That was completely non-obvious to me and I had to read the patch over and over again to figure that out. So I think that either the function name or a comment needs to explain that.

The trickiness and danger of using this function without looking at the boolean result would be clearer if it used the word “delete” instead of “remove”. On the other hand, “delete” does make it sound like this just deletes the object rather than cleanly removing it from the node.

Maybe a better name is deleteThisIfAboutToRemoveLastList? Or maybe selfDestructIfAboutToRemoveLastList? Maybe you can come up with an even better name.

I’m also not sure the factoring is quite right. It seems awkward to call this class to remove a list, then have this class recover the node pointer so it can do work on the node’s rare data. It’s kind of a messy relationship between the node, rare data, and lists classes. I understand that this is an easy place to correctly make the change, so I see the attraction of doing it this way, but the need to recover the owner node pointer is a factoring danger sign.

> Tools/Scripts/webkitpy/performance_tests/perftest.py:209
>          # Following is for Parser/html-parser.html
> -        re.compile(re.escape("""CONSOLE MESSAGE: Blocked script execution in 'html-parser.html' because the document's frame is sandboxed and the 'allow-scripts' permission is not set.""")),
> +        re.compile(r"CONSOLE MESSAGE: Blocked script execution in '[A-Za-z0-9\-\.]+' because the document's frame is sandboxed and the 'allow-scripts' permission is not set."),

Comment now seems wrong.
Comment 5 Ryosuke Niwa 2013-01-17 12:28:19 PST
(In reply to comment #4)
> (From update of attachment 183081 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=183081&action=review
> 
> Seems OK, although I have a few substantive thoughts below.
> 
> Somewhere there should be a comment about how we decided to delete these when no longer needed rather than keeping them around to optimize re-using them later. It’s the performance test, but something in the code should talk about how we made the tradeoff, if possible.

I forgot to mention this but this was the intended design back in the days although it never worked. I removed the code in http://trac.webkit.org/changeset/120367/trunk/Source/WebCore/dom/Node.cpp but the intent had always been to remove this data structure when we no longer need them. I'll clarify it in the change log.

> > Source/WebCore/ChangeLog:10
> > +        If we detect that we have only one node lists left in the data structure,
> 
> Should be “node list” instead of “node lists”.
> 
> > Source/WebCore/ChangeLog:20
> > +        (Node):
> 
> Bogus line should be removed. Or fix the script so it doesn’t add these.
> 
> > Source/WebCore/ChangeLog:26
> > +        (NodeListsNodeData):
> 
> Bogus line should be removed. Or fix the script so it doesn’t add these.

Fixed.

> > Source/WebCore/dom/NodeRareData.h:74
> > +        if (!removeThisIfOnlyOneListIsLeft(list->ownerNode()))
> > +            m_childNodeList = 0;
> 
> I think I’d like these better if they used an early return statement instead of nesting the remaining code. The logic is “if I can do this just by self destructing, then do that and return, otherwise do it the normal way”.
> 
> > Source/WebCore/dom/NodeRareData.h:225
> > +    bool removeThisIfOnlyOneListIsLeft(Node*);
> 
> This function is confusing for a few reasons. The idea of this function is that we call it just before removing a list, which is why it checks for one remaining list rather than zero. That was completely non-obvious to me and I had to read the patch over and over again to figure that out. So I think that either the function name or a comment needs to explain that.

The original design was to let each function delete the last time, and then removeThisIfPossible to check whether all hash maps & m_childNodes has been cleared or not That's definitely a much simpler design but incurs a runtime cost when we're deleting the last time.

> The trickiness and danger of using this function without looking at the boolean result would be clearer if it used the word “delete” instead of “remove”. On the other hand, “delete” does make it sound like this just deletes the object rather than cleanly removing it from the node.
> 
> Maybe a better name is deleteThisIfAboutToRemoveLastList? Or maybe selfDestructIfAboutToRemoveLastList? Maybe you can come up with an even better name.

I'm gonna go extra verbose and call it deleteThisAndUpdateNodeRareDataIfAboutToRemoveLastList.

> I’m also not sure the factoring is quite right. It seems awkward to call this class to remove a list, then have this class recover the node pointer so it can do work on the node’s rare data. It’s kind of a messy relationship between the node, rare data, and lists classes. I understand that this is an easy place to correctly make the change, so I see the attraction of doing it this way, but the need to recover the owner node pointer is a factoring danger sign.

Yeah, it's quite messy. On the other hand, making each node list calling clearNodeLists looked fragile. It's also important that we assert the list we're removing to be in the node lists because we've had quite few bugs where we had a mismatch in the past.

> 
> > Tools/Scripts/webkitpy/performance_tests/perftest.py:209
> >          # Following is for Parser/html-parser.html
> > -        re.compile(re.escape("""CONSOLE MESSAGE: Blocked script execution in 'html-parser.html' because the document's frame is sandboxed and the 'allow-scripts' permission is not set.""")),
> > +        re.compile(r"CONSOLE MESSAGE: Blocked script execution in '[A-Za-z0-9\-\.]+' because the document's frame is sandboxed and the 'allow-scripts' permission is not set."),
> 
> Comment now seems wrong.

Oops, you're right. Fixed.
Comment 6 Ryosuke Niwa 2013-01-17 16:37:13 PST
Committed r140070: <http://trac.webkit.org/changeset/140070>
Comment 7 Ryosuke Niwa 2013-01-18 01:53:31 PST
The bug 107239 tracks an unrelated crash.