Bug 55440

Summary: improve layout performance by reducing the traversal time of the floating objects
Product: WebKit Reporter: Yuqiang Xian <yuqiang.xian>
Component: Layout and RenderingAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, commit-queue, darin, eric, hyatt, mrowe, sam, staikos
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: All   
URL: http://ie.microsoft.com/testdrive/Performance/MazeSolver/Default.html
Attachments:
Description Flags
the patch
sam: review-
updated patch addressing comment #8
darin: review-, darin: commit-queue-
patch addressing comment #10
none
patch addressing comment #12 none

Description Yuqiang Xian 2011-02-28 19:19:13 PST
The WebKit layout engine performs bad on the case of MazeSolver from Microsoft (http://ie.microsoft.com/testdrive/Performance/MazeSolver/Default.html), which is caused by large overhead on traversing the floating objects list in logicalLeftOffsetForLine() and logicalRightOffsetForLine() especially when the list becomes enormous, for example in the default 30x30 maze there're >3700 floating objects. When placing a new floating object the entire list (from begin to end) is traversed for multiple times, and it requires O(n^2) time when we do a re-layout to place all the floating objects.
We need to improve such performance issue either 1) by improving the efficiency of the traversal (possibly by using other data structures to hold the floating objects), or 2) by reducing the chances to do the traversal. 
There's a low hanging fruit for approach 2) which is especially applicable in logicalLeftOffsetForLine and logicalRightOffsetForLine. As the two routines either cares about FloatLeft objects or FloatRight objects only, if we know there's no corresponding type floating objects in the list we can avoid the traversal actually. One thing we could do is to record the number of FloatLeft objects and the number of FloatRight objects and add a check before doing the traversal. This can improve the 30x30 MazeSolver performance by 80~90% measured on N450 Netbook MeeGo using latest Chromium browser 11 (i.e. from 503s to 269s to resolve the 30x30 maze). 
We could also think about traversing from the last visited object instead of from the beginning of the list each time when placing a new floating object, as long as we know that the state of the existing objects in the list doesn't change since last traversal, which could ideally convert the O(n^2) to O(n) when placing all the floating objects. But this is a bit tricky and needs further investigation.
Comment 1 Yuqiang Xian 2011-02-28 19:29:13 PST
Created attachment 84173 [details]
the patch
Comment 2 Yuqiang Xian 2011-02-28 19:31:18 PST
Add CC.
Comment 3 Mark Rowe (bdash) 2011-03-01 04:24:27 PST
Maybe I’m missing something, but how is going from 503s to 269s a 80~90% improvement?  That seems like a 45% improvement to me.
Comment 4 Yuqiang Xian 2011-03-01 06:17:51 PST
(In reply to comment #3)
> Maybe I’m missing something, but how is going from 503s to 269s a 80~90% improvement?  That seems like a 45% improvement to me.

Sorry for confusing you but I think it depends on how we describe the speedup. Yes the time is reduced by 45%, but considering the speed it's a 80~90% improvement. Just like we see "2X speedup" more often than "50% improvement" when the time to complete a job is shortened from 2s to 1s.
Comment 5 Mark Rowe (bdash) 2011-03-01 11:58:10 PST
(In reply to comment #4)
> (In reply to comment #3)
> > Maybe I’m missing something, but how is going from 503s to 269s a 80~90% improvement?  That seems like a 45% improvement to me.
> 
> Sorry for confusing you but I think it depends on how we describe the speedup. Yes the time is reduced by 45%, but considering the speed it's a 80~90% improvement. Just like we see "2X speedup" more often than "50% improvement" when the time to complete a job is shortened from 2s to 1s.

The change from 269 to 503 is ~85%. The change from 503 to 269 is -45%.  It seems nonsensical to describe the speedup in terms of what regression it would take to get back to the original performance.
Comment 6 Yuqiang Xian 2011-03-01 16:34:16 PST
(In reply to comment #5)
> (In reply to comment #4)
> > (In reply to comment #3)
> > > Maybe I’m missing something, but how is going from 503s to 269s a 80~90% improvement?  That seems like a 45% improvement to me.
> > 
> > Sorry for confusing you but I think it depends on how we describe the speedup. Yes the time is reduced by 45%, but considering the speed it's a 80~90% improvement. Just like we see "2X speedup" more often than "50% improvement" when the time to complete a job is shortened from 2s to 1s.
> 
> The change from 269 to 503 is ~85%. The change from 503 to 269 is -45%.  It seems nonsensical to describe the speedup in terms of what regression it would take to get back to the original performance.

OK. Can we just say that the proposed patch reduces the time from 503s to 269s and not to stick to the debate of how speedup is calculated?
Back to the issue itself, can anybody help review if this is a issue that needs to be addressed, and if current quick fix reasonable to do? Also I'd appreciate any discussions on how we will do in long term to address the floating objects performance issue when the list becomes enormous like the one demonstrated in this Microsoft test case.
Comment 7 Mark Rowe (bdash) 2011-03-01 16:36:22 PST
(In reply to comment #6)
> OK. Can we just say that the proposed patch reduces the time from 503s to 269s and not to stick to the debate of how speedup is calculated?

You’re free to say whatever you’d like so long as it is accurate.
Comment 8 Sam Weinig 2011-03-02 16:00:07 PST
Comment on attachment 84173 [details]
the patch

This will increase the size of all RenderBlocks which would be a pretty big memory regression. One option is to only allocate this memory when there are floating objects (allocate it with  m_floatingObjects).
Comment 9 Yuqiang Xian 2011-03-03 05:31:49 PST
Created attachment 84547 [details]
updated patch addressing comment #8

Introduce the left and right floats counters only when there're floating objects in the RenderBlock, by wrapping the counters and the floating object list in one structure which is allocated on demand.
Comment 10 Darin Adler 2011-03-03 11:23:14 PST
Comment on attachment 84547 [details]
updated patch addressing comment #8

View in context: https://bugs.webkit.org/attachment.cgi?id=84547&action=review

Seems like a good approach, but needs some style fixes.

> Source/WebCore/rendering/RenderBlock.h:716
> -    OwnPtr<FloatingObjectSet> m_floatingObjects;
> +    struct FloatingObjects {
> +        FloatingObjects()

I don’t think the functions in this class should all be defined inside the class definition. That just makes the class harder to read. I suggest putting inline function bodies elsewhere. Since they are only used in the one .cpp file, they can be inside the .cpp file and not in the header at all.

> Source/WebCore/rendering/RenderBlock.h:723
> +        virtual ~FloatingObjects() {ASSERT(m_floatingObjectSet); delete m_floatingObjectSet;}

It is incorrect to mark this virtual. Makes the entire object bigger and makes calling the destructor slower for no reason.

And you won’t need this destructor if you take my advice below about m_floatingObjectSet.

Formatting without spaces is wrong for our project.

> Source/WebCore/rendering/RenderBlock.h:729
> +            m_leftFloatsCount = 0;

I don’t think the same class should call floating objects both “floating objects” and “floats”. We should choose one name or the other.

> Source/WebCore/rendering/RenderBlock.h:741
> +        void increaseFloatsCount(FloatingObject::Type type)
> +        {
> +            type == FloatingObject::FloatLeft ? m_leftFloatsCount++ : m_rightFloatsCount++;
> +        }
> +        void decreaseFloatsCount(FloatingObject::Type type)
> +        {
> +            type == FloatingObject::FloatLeft ? m_leftFloatsCount-- : m_rightFloatsCount--;
> +            ASSERT(m_leftFloatsCount >= 0 && m_rightFloatsCount >= 0);
> +        }

I think these would read much better with if statements or switch statements. Using a ? : expression for its side effects is not a conventional readable coding style.

> Source/WebCore/rendering/RenderBlock.h:743
> +        bool hasLeftFloats() {return m_leftFloatsCount > 0;}

WebKit style puts spaces after these.

> Source/WebCore/rendering/RenderBlock.h:746
> +        FloatingObjectSet* m_floatingObjectSet;

This should be a set, not a pointer to a set. If it was a pointer it should be OwnPtr.

This should probably be called just m_set. It’s already in a class named FloatingObjects.

Data members of a struct should not have the m_ prefix. We use those only for private data members in classes, not public data members like these. Maybe this should be a class. I’m not sure clients need direct access to the floating object counts.

> Source/WebCore/rendering/RenderBlock.h:748
> +        int m_leftFloatsCount;
> +        int m_rightFloatsCount;

These should be unsigned integers, not signed. And maybe private as I mentioned above.

> Source/WebCore/rendering/RenderBlock.h:751
> +    FloatingObjectSet* getFloatingObjectSet() const {return m_floatingObjects ? m_floatingObjects->m_floatingObjectSet : 0;}

In the WebKit project we don’t use get in the function names of this type of function.

Further, the callers of this function all already have explicit checks to see if m_floatingObjects is 0, so you are adding a double check for null everywhere. I think you should probably dump this function and just use:

    m_floatingObjects->set()

everwherer. Or m_floatingObjects->set if you keep the set a public data member.
Comment 11 Yuqiang Xian 2011-03-03 18:30:38 PST
Created attachment 84677 [details]
patch addressing comment #10

Thanks a lot for the comments from Darin. This new patch addresses those comments.
Comment 12 Darin Adler 2011-03-03 18:36:40 PST
Comment on attachment 84677 [details]
patch addressing comment #10

View in context: https://bugs.webkit.org/attachment.cgi?id=84677&action=review

> Source/WebCore/rendering/RenderBlock.h:729
> +        FloatingObjectSet* set() const { return m_set.get(); }

Would be better to return a reference to the set instead of a pointer. Also no reason for this to be const.

> Source/WebCore/rendering/RenderBlock.h:732
> +        OwnPtr<FloatingObjectSet> m_set;

Why did you chose to use an OwnPtr here instead of just putting the set in? This should just be:

    FloatingObjectSet m_set;

> Source/WebCore/rendering/RenderBlock.h:734
> +        unsigned int m_leftObjectsCount;
> +        unsigned int m_rightObjectsCount;

Should just be unsigned, not unsigned int, given WebKit coding style.
Comment 13 Yuqiang Xian 2011-03-03 18:46:54 PST
Thanks for the reivew.

(In reply to comment #12)
> (From update of attachment 84677 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=84677&action=review
> 
> > Source/WebCore/rendering/RenderBlock.h:729
> > +        FloatingObjectSet* set() const { return m_set.get(); }
> 
> Would be better to return a reference to the set instead of a pointer. Also no reason for this to be const.

OK. Will modify it.

> 
> > Source/WebCore/rendering/RenderBlock.h:732
> > +        OwnPtr<FloatingObjectSet> m_set;
> 
> Why did you chose to use an OwnPtr here instead of just putting the set in? This should just be:
> 
>     FloatingObjectSet m_set;
> 

As the original code uses the FloatingObjectSet as an OwnPtr and I didn't want to change that. But yes it can be just an instance inside. Will modify it.

> > Source/WebCore/rendering/RenderBlock.h:734
> > +        unsigned int m_leftObjectsCount;
> > +        unsigned int m_rightObjectsCount;
> 
> Should just be unsigned, not unsigned int, given WebKit coding style.

Will modify it.
Comment 14 Yuqiang Xian 2011-03-03 19:23:48 PST
Created attachment 84682 [details]
patch addressing comment #12

Modify code to address comment #12 from Darin. Thanks for review.
Comment 15 Darin Adler 2011-03-04 12:06:11 PST
Comment on attachment 84682 [details]
patch addressing comment #12

View in context: https://bugs.webkit.org/attachment.cgi?id=84682&action=review

Can we double check that increase/decrease is called in all the right places?

I’m going to say review+ even though I think we may want a little more refinement.

> Source/WebCore/rendering/RenderBlock.cpp:6243
> +inline void RenderBlock::FloatingObjects::clear()

If these are marked inline, I think they need to be in the file *before* the first call to the function. Otherwise on at least some compilers I think this won’t be inlined.

> Source/WebCore/rendering/RenderBlock.cpp:6250
> +inline void RenderBlock::FloatingObjects::increaseObjectsCount(FloatingObject::Type type)

Ditto.

> Source/WebCore/rendering/RenderBlock.cpp:6258
> +inline void RenderBlock::FloatingObjects::decreaseObjectsCount(FloatingObject::Type type)

Ditto.

> Source/WebCore/rendering/RenderBlock.h:721
> +        FloatingObjects()
> +            : m_leftObjectsCount(0)
> +            , m_rightObjectsCount(0)
> +        {
> +        }

Ideally I’d like to see this function body inside the cpp file too; I think the class is easier to read without the function body here.
Comment 16 WebKit Commit Bot 2011-03-04 15:07:30 PST
Comment on attachment 84682 [details]
patch addressing comment #12

Clearing flags on attachment: 84682

Committed r80380: <http://trac.webkit.org/changeset/80380>
Comment 17 WebKit Commit Bot 2011-03-04 15:07:37 PST
All reviewed patches have been landed.  Closing bug.
Comment 18 Yuqiang Xian 2011-03-04 16:00:47 PST
(In reply to comment #15)
> (From update of attachment 84682 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=84682&action=review
> 
> Can we double check that increase/decrease is called in all the right places?
> 
> I’m going to say review+ even though I think we may want a little more refinement.
> 

Thanks for the review. Yes I've checked the places of adding/removing the floating objects. There're 3 "add" and 2 "remove" in RenderBlock.cpp and nothing elsewhere. The corresponding increase/decrease is called in such place. 
Maybe in the future we could make that be automatically performed, for example bridging the add/remove calls through FloatingObjects. I just didn't do that this time because there're many other functions which add or remove objects in FloatingObjectSet class and if we bridge all of them the FloatingObjects class would become somewhat complex.
I'll address the refinement in another change.