Bug 64262

Summary: Small speed up, which switches some virtual functions to inline ones.
Product: WebKit Reporter: Tamas Czene <tczene>
Component: CSSAssignee: Nobody <webkit-unassigned>
Status: RESOLVED WONTFIX    
Severity: Enhancement CC: ap, benjamin, dglazkov, eric, hyatt, jamesr, loki, sam, tonikitoo, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on: 64862, 64863, 64865    
Bug Blocks:    
Attachments:
Description Flags
Small speed up, which switches some virtual functions to inline ones.
zherczeg: review-
Archive of layout-test-results from ec2-cr-linux-01
none
Small speed up, which switches some virtual functions to inline ones.
none
Small speed up, which switches some virtual functions to inline ones.
benjamin: review-, hyatt: commit-queue-
Small speed up, which switches some virtual functions to inline ones.
eric: review-, webkit.review.bot: commit-queue-
Archive of layout-test-results from ec2-cr-linux-03
none
Idea for comments
none
Methanol results none

Description Tamas Czene 2011-07-11 04:56:42 PDT
Removing the virtual functions - removing the calls - makes measurable performance progression on CSS.

We measured the following performance progressions on some very popular websites, for example:
twitter.com: 5.5%
youtube.com: 7.4%
yahoo.com: 6.5%
Comment 1 Tamas Czene 2011-07-11 05:02:01 PDT
Created attachment 100268 [details]
Small speed up, which switches some virtual functions to inline ones.
Comment 2 WebKit Review Bot 2011-07-11 05:33:34 PDT
Comment on attachment 100268 [details]
Small speed up, which switches some virtual functions to inline ones.

Attachment 100268 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/9012419

New failing tests:
css2.1/t1202-counter-16-f.html
css2.1/t1202-counters-03-b.html
css2.1/t1202-counter-12-b.html
css2.1/t1202-counter-08-b.html
css2.1/t1202-counter-02-b.html
css2.1/t1202-counter-14-b.html
css2.1/t1202-counter-00-b.html
css2.1/t1202-counter-09-b.html
css2.1/t1202-counter-15-b.html
css2.1/t1202-counter-05-b.html
css2.1/t1202-counter-07-b.html
css2.1/t1202-counter-06-b.html
css2.1/t1202-counter-01-b.html
css2.1/t1202-counter-03-b.html
css2.1/t1202-counters-02-b.html
css2.1/t1202-counter-04-b.html
css2.1/t1202-counters-01-b.html
css2.1/t1202-counter-13-b.html
css2.1/t1202-counter-11-b.html
css2.1/t1202-counters-00-b.html
Comment 3 WebKit Review Bot 2011-07-11 05:33:38 PDT
Created attachment 100283 [details]
Archive of layout-test-results from ec2-cr-linux-01

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: ec2-cr-linux-01  Port: Chromium  Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
Comment 4 Gabor Loki 2011-07-11 05:49:26 PDT
Comment on attachment 100268 [details]
Small speed up, which switches some virtual functions to inline ones.

View in context: https://bugs.webkit.org/attachment.cgi?id=100268&action=review

> Source/WebCore/css/CSSImageGeneratorValue.h:70
> +    bool m_isFixedSize;

You should initialize it.

> Source/WebCore/css/CSSPrimitiveValue.cpp:115
>  CSSPrimitiveValue::CSSPrimitiveValue()

Could we move all CSSPrimitiveValue::CSSPrimitiveValue constructors into the header file?

> Source/WebCore/css/CSSPrimitiveValue.h:189
> +    bool m_isQuirkValue;

It should be protected.

> Source/WebCore/css/CSSValue.h:77
> +    bool m_isPrimitiveValue;

It should be protected as well.

> Source/WebCore/css/CSSValue.h:81
> +    CSSValue()
> +        :m_isPrimitiveValue(false)
> +    {
> +    }

It is not in WebKit style (the check-webkit-style should complain about it).
Please, move up the constructor after 'public:' and add space between ':' and m_isPrimitiveValue
Comment 5 Zoltan Herczeg 2011-07-11 06:05:42 PDT
Comment on attachment 100268 [details]
Small speed up, which switches some virtual functions to inline ones.

View in context: https://bugs.webkit.org/attachment.cgi?id=100268&action=review

> Source/WebCore/css/CSSImageGeneratorValue.h:53
>      virtual IntSize fixedSize(const RenderObject*) { return IntSize(); }

Such function must remain virtal, as it can be called from its base class (if exists), or from its descendants. I would not overwrite them except if we got a permission from the author of those functions, and done them one-by-one.

There are several others like this.
Comment 6 Gabor Loki 2011-07-11 06:45:35 PDT
> >      virtual IntSize fixedSize(const RenderObject*) { return IntSize(); }

I guess you wanted to point to 'virtual bool isFixedSize() const { return false; }' line. ;)

> Such function must remain virtal, as it can be called from its base class (if exists), or from its descendants. I would not overwrite them except if we got a permission from the author of those functions, and done them one-by-one.
> 
> There are several others like this.

Well, this code is 3 years old. Since then there were no attempt to introduce new descendant or to extend this virtual function. So, I do not expect this patch will crash a rapidly changing API. However, we can ask the author about this code.

David, what do you think about Tamas's change?
Comment 7 Alexey Proskuryakov 2011-07-11 10:16:53 PDT
Besides measuring speedup, one needs to consider memory use impact. How did you measure it?
Comment 8 Tamas Czene 2011-07-12 00:52:38 PDT
(In reply to comment #7)
> Besides measuring speedup, one needs to consider memory use impact. How did you measure it?

I made the measurements with Methanol, which measures page loads to locally stored sites. The measures were 5*10, and I analysed the average of these.
Binary size:
   References: 32804896
   Modified: 32800323
I will measures the change of the memory usage.
Comment 9 Tamas Czene 2011-07-15 00:51:24 PDT
The size of modified objects is increased by 4 bytes (measured at runtime)

The number of variables created during the test.
m_isPrimitiveValue 261596
m_isQuirkValue 229806 
m_isFixedSize 725

The max number of variables in the memory during the test.
m_isPrimitiveValue 468 
m_isQuirkValue 2749
m_isFixedsize 57
Comment 10 Tamas Czene 2011-07-15 06:16:40 PDT
Created attachment 100965 [details]
Small speed up, which switches some virtual functions to inline ones.
Comment 11 Tamas Czene 2011-07-18 05:09:46 PDT
Created attachment 101148 [details]
Small speed up, which switches some virtual functions to inline ones.

We measured the following performance progressions on some very popular websites, for example:
twitter.com: 5.5%
youtube.com: 7.4%
yahoo.com: 6.5%

The number of variables created during the test.
m_isPrimitiveValue 261596
m_isQuirkValue 229806 
m_isFixedSize 725

The max number of variables in the memory during the test.
m_isPrimitiveValue 468 
m_isQuirkValue 2749
m_isFixedsize 57
Comment 12 Benjamin Poulain 2011-07-18 05:29:50 PDT
Comment on attachment 100965 [details]
Small speed up, which switches some virtual functions to inline ones.

Marking obsolete, it seems to be the same as the last one.

You should try "webkit-patch upload" to update your patch, it obsoletes old patches for you.
Comment 13 Benjamin Poulain 2011-07-18 05:40:29 PDT
Comment on attachment 101148 [details]
Small speed up, which switches some virtual functions to inline ones.

View in context: https://bugs.webkit.org/attachment.cgi?id=101148&action=review

You are changing from virtual function to protected attribute for 3 features:
::isFixedSize()
::isQuirkValue()
::isPrimitiveValue()

In each case, you are increasing complexity of the code, and increasing the memory consumption per instance.

I would prefer to see 3 patches, one for each value. With numbers for performance and memory usage (ideally 3 bugs on bugzilla).

When changing something used all over the place like CSSValue, I think it is better to do small changes with a good rationale.

> Source/WebCore/css/CSSCanvasValue.h:52
> +        m_isFixedSize = true;

This should use the constructor initialiazation: m_isFixedSize(true);

> Source/WebCore/css/CSSPrimitiveValue.h:210
> +        m_isPrimitiveValue = true;

Initialization...

> Source/WebCore/css/CSSPrimitiveValue.h:220
> +        m_isPrimitiveValue = true;

Initialization...

> Source/WebCore/css/CSSPrimitiveValue.h:229
> +        m_isPrimitiveValue = true;

Initialization...

> Source/WebCore/css/CSSPrimitiveValue.h:243
> +        m_isPrimitiveValue = true;

Initialization...

> Source/WebCore/css/CSSPrimitiveValue.h:251
> +        m_isPrimitiveValue = true;

Initialization...

> Source/WebCore/css/CSSQuirkPrimitiveValue.h:43
> +        m_isQuirkValue = true;

Constructor initialization: m_isQuirkValue(true);
Comment 14 Gabor Loki 2011-07-18 06:49:37 PDT
> > Source/WebCore/css/CSSCanvasValue.h:52
> > +        m_isFixedSize = true;
> 
> This should use the constructor initialiazation: m_isFixedSize(true);

You cannot initialize any member of the base class from the delivered class like this. ;)

There are two popular ways to initialize the base members from delivered class:
 (1) using an assignment in the delivered constructor (as Tamas did) or
 (2) calling the base constructor with an extra parameter from the delivered constructor

Probably you wanted to suggest the second approach.
For example:
class CSSValue : public RefCounted<CSSValue> {
public:
  CSSValue(isPrimitiveValue = false)
    : m_isPrimitiveValue(isPrimitiveValue)
...
protected:
  bool m_isPrimitiveValue;
);
...
CSSPrimitiveValue(int ident)
  : CSSValue(true)
  , m_isQuirkValue(false)
  , m_type(CSS_IDENT)
  , m_hasCachedCSSText(false)
{
    m_value.ident = ident;
}
... etc ...
Comment 15 Benjamin Poulain 2011-07-18 07:54:31 PDT
(In reply to comment #14)
> > This should use the constructor initialiazation: m_isFixedSize(true);

My bad, I was not thinking straight, that code is obviously in the subclass.


> CSSPrimitiveValue(int ident)
>   : CSSValue(true)
>   , m_isQuirkValue(false)
>   , m_type(CSS_IDENT)
>   , m_hasCachedCSSText(false)
> {
>     m_value.ident = ident;
> }
> ... etc ...

Not necessarily. This make the internal variables m_isPrimitiveValue visible through the constructor which is not necessarily a good thing.

My comments were just wrong in this case. My point was that I would like this code to be split. I had a quick skim over the patch afterwards but I overlooked which variables come from which class. Just ignore me there :)
Comment 16 Dave Hyatt 2011-07-20 13:24:23 PDT
Comment on attachment 101148 [details]
Small speed up, which switches some virtual functions to inline ones.

I just don't believe these performance claims. Can you explain your methodology here? I'm not interested in taking a memory footprint hit unless the performance claims are authentic. I'm extremely skeptical that virtualization is costing us as much as you claim. I think something is flawed with your testing.
Comment 17 Tamas Czene 2011-07-25 00:36:24 PDT
This patch, i used gprof values. example:  WebCore::CSSStyleApplyProperty::applyValue(CSSStyleSelector* selector, CSSValue* value) const  cumulative seconds 158.50 self seconds 0.26 

If we remove virtual function we will makes small speed up.
example:
Virtual function:

#include <climits>
using namespace std;

class BaseClass {
public:
  int k;
  BaseClass() { 
     k = 6; 
  }
  virtual int myFunction(int i)
  {
    return k+i;        
  }
};

int main()
{
 BaseClass p;
 int a;
for(int j=0;j<300;j++)
 for(int i=0; i<INT_MAX;i++)
    a+=i+p.myFunction(i);
 return a;
}


czene@czene-CELSIUS-W ~/Asztal $ g++ -O3 virtual.cpp -o virtual
czene@czene-CELSIUS-W ~/Asztal $ time ./virtual 

real        6m6.756s
user        2m30.241s
sys        3m17.356s

Inline:

#include <climits>
using namespace std;

class BaseClass {
public:
  int k;
  BaseClass() { 
     k = 6; 
  }
  inline int myFunction(int i)
  {
    return k+i;        
  }
};

int main()
{
 BaseClass p;
 int a;
for(int j=0;j<300;j++)
 for(int i=0; i<INT_MAX;i++)
    a+=i+p.myFunction(i);
 return a;
}

czene@czene-CELSIUS-W ~/Asztal $ g++ -O3 inline.cpp -o inline
czene@czene-CELSIUS-W ~/Asztal $ time ./inline 

real        6m1.851s
user        2m29.433s
sys        3m17.628s


The msn.com value not valid, i copied wrong text in the comment.
Comment 18 Tamas Czene 2011-07-28 05:58:22 PDT
Created attachment 102248 [details]
Small speed up, which switches some virtual functions to inline ones.

We measured the following performance progressions on some very popular websites:
Qt-Linuxon:
twitter.com: 5.5%
youtube.com: 7.4%
Mac:
twitter.com: 6.9%
youtube.com: 7%
Comment 19 WebKit Review Bot 2011-07-28 06:31:09 PDT
Comment on attachment 102248 [details]
Small speed up, which switches some virtual functions to inline ones.

Attachment 102248 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/9265203

New failing tests:
css2.1/t1202-counter-16-f.html
css2.1/t1202-counters-03-b.html
css2.1/t1202-counter-12-b.html
css2.1/t1202-counter-08-b.html
css2.1/t1202-counter-02-b.html
css2.1/t1202-counter-14-b.html
css2.1/t1202-counter-00-b.html
css2.1/t1202-counter-09-b.html
css2.1/t1202-counter-15-b.html
css2.1/t1202-counter-05-b.html
css2.1/t1202-counter-07-b.html
css2.1/t1202-counter-06-b.html
css2.1/t1202-counter-01-b.html
css2.1/t1202-counter-03-b.html
css2.1/t1202-counters-02-b.html
css2.1/t1202-counter-04-b.html
css2.1/t1202-counters-01-b.html
css2.1/t1202-counter-13-b.html
css2.1/t1202-counter-11-b.html
css2.1/t1202-counters-00-b.html
Comment 20 WebKit Review Bot 2011-07-28 06:31:14 PDT
Created attachment 102252 [details]
Archive of layout-test-results from ec2-cr-linux-03

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: ec2-cr-linux-03  Port: Chromium  Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
Comment 21 Gabor Loki 2011-07-28 06:39:32 PDT
> Qt-Linuxon:
> twitter.com: 5.5%
> youtube.com: 7.4%


> Mac:

It was measured on Mac Pro 2x2.93GHz 6 Core Intel Xeon with native WK.

> twitter.com: 6.9%
> youtube.com: 7%

The benchmark engine was the Methanol (https://gitorious.org/methanol)
which can measure the page loading time of mirrored sites.

Tamas, did you see any performance regression?
Comment 22 Benjamin Poulain 2011-07-29 08:16:08 PDT
Created attachment 102363 [details]
Idea for comments

I have been thinking about this issue.

On one side, I understand how this can have a dramatic impact on performance. On the other side, I really do not like the patch because it is creating instance variable for something that should really be a class property.

Ideally, we would attach the booleans/enums to the vtable to the class but C++ does not provide anything to do that as far as I know. So a solution to have both speed and tight memory usage is to make a vtable-like structure ourself :-D
That should reduce the generated code size in this case, keep the same memory consumption as currently, and recuce the number of CPU cycles to access those properties.

The cost is:
-very C-like code, quite harder to read and edit than C++.
-less type checking at compile time, like instanciating a class with pure virtual functions

That is just an idea for comments. I attached a quick mockup of what CSSValue would look like.
I am willing to make such changes if we agree this kind of patch can land (and of course iff that change is indeed a significant speed improvement).
Comment 23 Sam Weinig 2011-07-29 08:27:52 PDT
I, like Dave, don't understand the testing methodology here.  Is this a speed up on page load time?
Comment 24 Benjamin Poulain 2011-07-29 08:30:04 PDT
I think so :) :

> The benchmark engine was the Methanol (https://gitorious.org/methanol)
> which can measure the page loading time of mirrored sites.
Comment 25 Benjamin Poulain 2011-07-29 08:35:11 PDT
> We measured the following performance progressions on some very popular websites:
> Qt-Linuxon:
> twitter.com: 5.5%
> youtube.com: 7.4%
> Mac:
> twitter.com: 6.9%
> youtube.com: 7%

Is that absolute time or valgrind CPU's cycles count? That seems like a  lot if this is actual time.
Comment 26 Gabor Loki 2011-07-29 09:25:53 PDT
> I, like Dave, don't understand the testing methodology here.  Is this a speed up on page load time?

Methanol measures only the page loads, nothing else.
Lets see how you can reproduce this numbers. ;)

1.- Make a local mirror of a webpage (for example: http://gitorious.org/methanol/methanol/trees/master/sample/openbsd.org )
2.- Add an event listener - which can call back the Methanol's engine - to that local webpage (see here: http://gitorious.org/methanol/methanol/blobs/master/sample/openbsd.org/index.html#line1 )
3.- Register the relative path of that website in the test container (lets say: sample.js).
4.- Open Methanol's URL from the browser which loads the local website several times.
5.- Each page loads are saved separately and the final page (results.html) shows the average of page loads and the standard deviation as well.

So, at the end we have the average of page loads (in millisecond). No gprof, valgrind or such thing is needed.


Tamas's percentages are computed with the following formula:
  ((old value) / (new value) -1).
For example: (100 ms)/(89 ms)-1 ~= 12.35%

I hope that I have been a help.
Comment 27 Sam Weinig 2011-07-29 10:12:07 PDT
(In reply to comment #26)
> > I, like Dave, don't understand the testing methodology here.  Is this a speed up on page load time?
> 
> Methanol measures only the page loads, nothing else.
> Lets see how you can reproduce this numbers. ;)
> 
> 1.- Make a local mirror of a webpage (for example: http://gitorious.org/methanol/methanol/trees/master/sample/openbsd.org )
> 2.- Add an event listener - which can call back the Methanol's engine - to that local webpage (see here: http://gitorious.org/methanol/methanol/blobs/master/sample/openbsd.org/index.html#line1 )
> 3.- Register the relative path of that website in the test container (lets say: sample.js).
> 4.- Open Methanol's URL from the browser which loads the local website several times.
> 5.- Each page loads are saved separately and the final page (results.html) shows the average of page loads and the standard deviation as well.

Do you have the complete set of data including the standard deviation?  The issue is that these results are very surprising; these functions have not traditionally shown us as this hot.
Comment 28 Gabor Loki 2011-07-29 11:38:11 PDT
I guess Tamas will be able to provide the raw page load data.
Comment 29 Tamas Czene 2011-08-01 04:45:19 PDT
Created attachment 102510 [details]
Methanol results
Comment 30 Benjamin Poulain 2011-08-01 08:43:33 PDT
I figured we might as weel get more data so I ran our "cycler" benchmark on my phone :)

Without patch:
        benchmark: cycler_one_qnetworkaccessmanager::load       http://www.facebook.com/
                mean: 309 msecs +/- 32 msecs, +/- 10.356 %
                avg:  318 msecs
                        407 311 309 310 308 309 305 311 307 309 
        benchmark: cycler_one_qnetworkaccessmanager::load       http://www.twitter.com/
                mean: 514 msecs +/- 6 msecs, +/- 1.16732 %
                avg:  512 msecs
                        517 514 507 504 509 506 514 511 519 521 
        benchmark: cycler_one_qnetworkaccessmanager::load       http://www.youtube.com/
                mean: 1930 msecs +/- 3 msecs, +/- 0.15544 %
                avg:  1930 msecs
                        1924 1931 1929 1928 1929 1927 1934 1937 1930 1931 
        benchmark: cycler_one_qnetworkaccessmanager::load       https://twitter.com/?lang=en&logged_out=1#!/search/webkit
                mean: 30 msecs +/- 11 msecs, +/- 36.6667 %
                avg:  23 msecs
                        14 31 15 33 16 30 15 31 15 32 


With patch: 
        benchmark: cycler_one_qnetworkaccessmanager::load       http://www.facebook.com/
                mean: 310 msecs +/- 31 msecs, +/- 10 %
                avg:  321 msecs
                        401 309 305 310 307 309 305 320 329 316 
        benchmark: cycler_one_qnetworkaccessmanager::load       http://www.twitter.com/
                mean: 511 msecs +/- 4 msecs, +/- 0.782779 %
                avg:  510 msecs
                        514 511 504 505 506 509 513 511 515 513 
        benchmark: cycler_one_qnetworkaccessmanager::load       http://www.youtube.com/
                mean: 1899 msecs +/- 4 msecs, +/- 0.210637 %
                avg:  1898 msecs
                        1889 1899 1900 1901 1896 1898 1898 1903 1895 1905 
        benchmark: cycler_one_qnetworkaccessmanager::load       https://twitter.com/?lang=en&logged_out=1#!/search/webkit
                mean: 31 msecs +/- 11 msecs, +/- 35.4839 %
                avg:  23 msecs
                        15 31 15 31 15 31 16 31 15 31 

So basically no difference.
I really doubt this patch improves loading performance. Maybe you should try to write a test page that keep changing the CSS values of the Elements to demonstrate how much time can be taken by CSSValue?
Comment 31 Eric Seidel (no email) 2011-09-12 17:03:24 PDT
Comment on attachment 102248 [details]
Small speed up, which switches some virtual functions to inline ones.

Memory.
Comment 32 Tamas Czene 2011-09-14 03:35:16 PDT
This patch uses too much memory and it doesnt worth the effort.
Comment 33 Alexey Proskuryakov 2011-09-14 08:19:29 PDT
Marking WONTFIX per above comments.