Bug 68944 - REGRESSION(80893): Reduce virtual dispatch overheads in CSSStyleSelector
Summary: REGRESSION(80893): Reduce virtual dispatch overheads in CSSStyleSelector
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Luke Macpherson
URL:
Keywords:
Depends on: 70009 69374
Blocks: 54707
  Show dependency treegraph
 
Reported: 2011-09-27 15:34 PDT by Eric Seidel (no email)
Modified: 2011-10-24 19:37 PDT (History)
9 users (show)

See Also:


Attachments
instruments trace from r96142 (352.19 KB, application/zip)
2011-09-27 15:36 PDT, Eric Seidel (no email)
no flags Details
Local copy of html spec for testing (1.95 MB, application/zip)
2011-09-27 15:37 PDT, Eric Seidel (no email)
no flags Details
Benchmark, but doesn't seem to test CSS as expected (3.70 KB, patch)
2011-09-28 16:26 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
revert all properties back to the giant switch (somewhat incomplete) (196.78 KB, patch)
2011-10-04 05:51 PDT, Antti Koivisto
no flags Details | Formatted Diff | Diff
Remove all virtual dispatch from CSSStyleApplyProperty (obsolete) (118.55 KB, patch)
2011-10-04 16:58 PDT, Luke Macpherson
no flags Details | Formatted Diff | Diff
Remove all virtual dispatch from CSSStyleApplyProperty (Passing tests). (129.29 KB, patch)
2011-10-04 21:58 PDT, Luke Macpherson
no flags Details | Formatted Diff | Diff
Remove virtual dispatch (updated to patch cleanly again) (129.28 KB, patch)
2011-10-12 23:01 PDT, Luke Macpherson
no flags Details | Formatted Diff | Diff
Patch (130.61 KB, patch)
2011-10-19 19:58 PDT, Luke Macpherson
no flags Details | Formatted Diff | Diff
Patch (121.21 KB, patch)
2011-10-19 20:20 PDT, Luke Macpherson
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Eric Seidel (no email) 2011-09-27 15:34:39 PDT
REGRESSION(80893): HTML5 spec takes 2s longer to load due to time spent in CSSStyleSelector

This is based on https://bugs.webkit.org/show_bug.cgi?id=54707#c61.

I've confirmed locally that CSSStyleSelector::applyProperty is the hottest symbol for this page (when loading from disk).  Luke is investigating.
Comment 1 Eric Seidel (no email) 2011-09-27 15:36:45 PDT
Created attachment 108915 [details]
instruments trace from r96142
Comment 2 Eric Seidel (no email) 2011-09-27 15:37:43 PDT
Created attachment 108917 [details]
Local copy of html spec for testing
Comment 3 Luke Macpherson 2011-09-27 16:41:57 PDT
I have uploaded a similar trace, along with some analysis to 54707.
Comment 4 Eric Seidel (no email) 2011-09-28 16:26:48 PDT
Created attachment 109086 [details]
Benchmark, but doesn't seem to test CSS as expected
Comment 5 Eric Seidel (no email) 2011-09-29 13:58:59 PDT
In my latest testing with Shark on my MPB i5, I no longer see CSSStyleSelector::applyProperty as the top sample when loading html5 from disk.

Finishing up peer reviews today (as have been distracting all googlers for the last 2 weeks).  Will update with more numbers, and hopefully finally a benchmark we can run against revisions pre/post the accused regression tomorrow.

Thank you again for your patience.
Comment 6 Antti Koivisto 2011-09-30 09:38:36 PDT
You can't do meaningful comparisons between different revisions as the more properties have been added to the new mechanism gradually and there have been many other changes that affect the amount of style applying done.

The only real way to compare is to implement the full revert patch and test that with the same base revision.
Comment 7 Eric Seidel (no email) 2011-10-03 13:34:56 PDT
Bug 69285 was another attempt at a microbenchmark.

I'm able to reproduce CSS stymbols showing up as hot when loading the HTML5 spec locally from disk in safari.  applyProperty is no longer the hotest symbol.

I'm still trying to come up with a consistent benchmark so we can actually make useful assertions about performance here.
Comment 8 Eric Seidel (no email) 2011-10-03 18:10:47 PDT
I spent the day investigating getting a consistent benchmark we can use to decide if there is a regression, and if so, how much.

I seem to be running into some sort of interaction with layout timers, or possibly some sort of "wait until the document is done before respecting stylesheets" trouble.

Sometimes LineBreaker::nextLineBreak is by far the hotest symbol.  It is always the hottest symbol if your benchmark writes everything into the document and then does one final style resolve at once.

Sometimes SelectorChecker::checkOneSelector (and other style-related) symbols are hot.  These normally show up as the hotest symbols when loading the html5 spec from disk in Safari (which I assume is what Antti was seeing when he initially reported this issue).  I cannot consistently get these symbols to appear hot in my benchmark, but often they do.

Sometimes RenderQuote::renderIsRemovedFromTree is the hottest symbol (mostly because my benchmark currently tears down the tree it creates each iteration, and includes that time in the statistics).  I'm not sure my current methodology of including the teardown is correct, but it does point out that RenderQuote::renderIsRemovedFromTree is O(T) where T is the size of the document, instead of being O(Q) where Q would be the number of quote elements.

I'm off to dinner now but I'll post my work-in-progress benchmark on a separate bug tomorrow.
Comment 9 Antti Koivisto 2011-10-04 05:49:53 PDT
I made a revert patch. It is the applyProperty() implementation from the revision prior to the first commit of CSSStyleApplyProperty, fixed up to work on the current TOT. It is slightly incomplete as some newer properties are missing. It hasn't been checked for correctness.

I see ~10% reduction in total time below applyProperty (~100ms our of ~1s) when loading a local copy of the HTML5 spec. In switch version about ~50% of this time is self-time in applyProperty so the overhead from the new mechanism can be estimated to be ~20% of the function execution time.
Comment 10 Antti Koivisto 2011-10-04 05:51:46 PDT
Created attachment 109612 [details]
revert all properties back to the giant switch (somewhat incomplete)
Comment 11 Eric Seidel (no email) 2011-10-04 11:39:32 PDT
Thank you for the patch antti.

I'm still struggling to make a repeatable number.  I assume you're just using a stopwatch and loading the HTML5 spec from disk?
Comment 12 Antti Koivisto 2011-10-04 11:46:29 PDT
(In reply to comment #11)
> Thank you for the patch antti.
> 
> I'm still struggling to make a repeatable number.  I assume you're just using a stopwatch and loading the HTML5 spec from disk?

I'm taking an Instruments sample over reload of the spec from the disk (with the portion of the script that shows the "slow load" popup disabled).
Comment 13 Eric Seidel (no email) 2011-10-04 15:50:40 PDT
So I ran Antti's patch through the new benchmark landed from bug 69374, and these were my results:

(For my testing I tweaked the current checked-in copy of PerformanceTests/Parser/html5-full-render.html to use 500k chunks instead of 750k chunks per https://bugs.webkit.org/show_bug.cgi?id=69374#c7):

r96649 (500k chunks) + antti patch
Testing 6092696 byte document in 13 500000 byte chunks.
Running 2 times
Ignoring warm-up run (48727)
47711
47973

avg 47842
median 47842
stdev 131
min 47711
max 47973

r96649 (500k chunks):
Testing 6092696 byte document in 13 500000 byte chunks.
Running 2 times
Ignoring warm-up run (45955)
45061
45491

avg 45276
median 45276
stdev 215
min 45061
max 45491

I didn't believe that antti's patch could be slower, so I ran the baseline again:

r96649 (500k chunks):
Testing 6092696 byte document in 13 500000 byte chunks.
Running 2 times
Ignoring warm-up run (46319)
45529
45189

avg 45359
median 45359
stdev 170
min 45189
max 45529

Still not believing that antti's patch could be slower, I increased the number of chunks (which from previous testing in instruments, leads me to believe that it should cause us to spend more time in style resolution relative to line layout):

r96649 (300k chunks):
Testing 6092696 byte document in 21 300000 byte chunks.
Running 2 times
Ignoring warm-up run (150514)
149989
149982

avg 149985.5
median 149985.5
stdev 3.5
min 149982
max 149989

r96649 (300k chunks) + antti's patch:
Testing 6092696 byte document in 21 300000 byte chunks.
Running 2 times
Ignoring warm-up run (154387)
152189
150989

avg 151589
median 151589
stdev 600
min 150989
max 152189

Still a 2s negative.  Not sure.  Maybe something funny is going on.  I would have expected the change (if any) between before/after antti's patch to be proportional to total time, not a fixed constant increase.


This may mean that my benchmark is not an accurate representation of time spent loading the HTML5 page.  I would strongly encourage any interested parties to come up with a better benchmark.  We certainly could use more! :)


This is the patch I had applied locally when testing:

diff --git a/PerformanceTests/Parser/html5-full-render.html b/PerformanceTests/Parser/html5-full-render.html
index 446a14d..58e9676 100644
--- a/PerformanceTests/Parser/html5-full-render.html
+++ b/PerformanceTests/Parser/html5-full-render.html
@@ -11,7 +11,7 @@ var chunks = [];
 // Larger chunk sizes will show more samples in line layout.
 // Smaller chunk sizes run slower overall, as the per-chunk overhead is high.
 // Testing on my machine has shown that we need 10-15 chunks before style resolution is always the top sample.
-var chunkSize = 750000; // 8mb / 750k = approx 12 chunks.
+var chunkSize = 500000; // 6.09mb / 500k = approx 13 chunks (thus 13 forced layouts/style resolves).
 var chunkCount = Math.ceil(spec.length / chunkSize);
 for (var chunkIndex = 0; chunkIndex < chunkCount; chunkIndex++) {
     var chunk = spec.substring(chunkIndex * chunkSize, chunkSize);
@@ -36,7 +36,7 @@ function loadChunkedSpecIntoIframe(iframe) {
         // Note that we won't cause a style resolve until we've encountered the <body> element.
         // Thus the number of chunks counted above is not exactly equal to the number of style resolves.
         if (iframe.contentDocument.body)
-            iframe.contentDocument.body.clientHeight; // Force a full style-resolve.
+            iframe.contentDocument.body.clientHeight; // Force a full layout/style-resolve.
     }
 
     iframe.contentDocument.close();
Comment 14 Eric Seidel (no email) 2011-10-04 15:59:55 PDT
As of http://trac.webkit.org/changeset/96658 the default chunkSize is now 500k (which matches the testing I did above with Antti's patch).

I would be very interested to know if others are able to replicate my results, and more generally if they find the benchmark useful or accurate!

There are numerous possibly juicy symbols which show up when sampling with instruments, including setAnimationValue and RenderQuote::removedFromTree which one might not initially expect to be hot. :)


I attempted to pull r80892 (one revision before Luke's first change) and test on my Lion box, but I was unable to get any of those old revisions to build.  I think mostly due to llvm compile errors.  I didn't investigate long.  It's possible someone could get r80892 or earlier to build and then we'd have a performance number to compare to tip of tree.  That wouldn't tell us what performance difference may have been caused by Luke's change, but it would give us some concept as to if there had been an overall regression in HTML5 spec page load time since before Luke's changes.
Comment 15 Luke Macpherson 2011-10-04 16:58:39 PDT
Created attachment 109718 [details]
Remove all virtual dispatch from CSSStyleApplyProperty (obsolete)
Comment 16 Early Warning System Bot 2011-10-04 17:16:01 PDT
Comment on attachment 109718 [details]
Remove all virtual dispatch from CSSStyleApplyProperty (obsolete)

Attachment 109718 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/9953296
Comment 17 WebKit Review Bot 2011-10-04 17:23:51 PDT
Comment on attachment 109718 [details]
Remove all virtual dispatch from CSSStyleApplyProperty (obsolete)

Attachment 109718 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/9942541
Comment 18 Luke Macpherson 2011-10-04 21:58:50 PDT
Created attachment 109744 [details]
Remove all virtual dispatch from CSSStyleApplyProperty (Passing tests).
Comment 19 Eric Seidel (no email) 2011-10-05 18:02:20 PDT
Luke:

If we believe this benchmark to be sound (which is yet to be proven, but is the current working theory).  Then we could end all this confusion by getting numbers from before your very first change.

Basically you'd need to do something like this:

cp -r PerformanceTests/Parser ./Benchmark
build-webkit && run-safari ./Bechmark/html5-full-render.html (to get tip-of-tree results for your test machine)
svn update -r 80892 && build-webkit && run-safari ./Bechmark/html5-full-render.html (to get a number from before your very first change).

If tip-of-tree is already faster than before your change, than we're done.  We can close this bug as fixed.

Otherwise the goal of all this work is to make the number for tip of tree as-fast-or-faster than the benchmark number from the revision before your first change.  At that point there will be no question as to whether any of your changes result in a regression or not.
Comment 20 Luke Macpherson 2011-10-05 18:14:37 PDT
Yeah, I spent a few days last week trying to get that to work, but new safari doesn't run with old webkit revisions since the switch to webkit2. This makes it hard to get an apples-apples comparison.

I'll look into downgrading safari to see if that can be workable, but unfortunately it's not quite so simple as reverting (or using bisect to pull down an old build).

I'll also post up some numbers for the virtual dispatch removal patch I posted yesterday to see how it compares with ToT.
Comment 21 Eric Seidel (no email) 2011-10-05 18:20:45 PDT
One last note about the benchmark:

If the benchmark is working correctly, it should show CSSStyleSelector symbols as the hottest symbols when running against current tip of tree.

It was designed to have a very similar instruments profile to loading the HTML5 spec directly in Safari.  At time of writing, the instruments profile looks very similar between the benchmark and loading the HTML5 spec in Safari.  My belief is that improvements to the benchmark score should have a very similar effect in improvements to total HTML5 spec load time in a browser (but testing with the benchmark is infinitely easier, more repeatable, and more precise).

Again, thanks for your work on this.  Best of luck.
Comment 22 Luke Macpherson 2011-10-05 23:30:19 PDT
run-safari PerformanceTests/Parser/html5-full-render.html

ToT:
Ignoring warm-up run (44544)
43468
43612

avg 43540
median 43540
stdev 72
min 43468
max 43612

With patch 109744:
Ignoring warm-up run (43861)
43296
43412

avg 43354
median 43354
stdev 58
min 43296
max 43412

The non-virtual-dispatch is slightly faster here, but I think more runs might be necessary to increase confidence.
Comment 23 Luke Macpherson 2011-10-06 17:30:07 PDT
I ran again with 20 iterations. Standard deviation is still a bit high, but judging by the medians it looks like removing the virtual dispatch gives around a 300ms (~0.7%) speedup.

Tot:
Testing 6092696 byte document in 13 500000 byte chunks.
Running 20 times
Ignoring warm-up run (45682)
44763 44732 44978 44921 43547 43064 43411 43473 43249 43554
43243 43348 44291 44962 43756 43319 43450 43335 43482 43584

avg 43823.1
median 43514.5
stdev 651.2832640257233
min 43064
max 44978

Patch without virtual dispatch:
Testing 6092696 byte document in 13 500000 byte chunks.
Running 20 times
Ignoring warm-up run (44155)
43399 43284 43277 43147 43377 43075 43395 43172 43217 43203
43212 43231 43205 42989 43189 43195 43219 43247 43123 43094

avg 43212.5
median 43208.5
stdev 100.96113113470946
min 42989
max 43399
Comment 24 Antti Koivisto 2011-10-07 02:17:14 PDT
(In reply to comment #19)
> If tip-of-tree is already faster than before your change, than we're done.  We can close this bug as fixed.
> 
> Otherwise the goal of all this work is to make the number for tip of tree as-fast-or-faster than the benchmark number from the revision before your first change.  At that point there will be no question as to whether any of your changes result in a regression or not.

I'm not sure why the focus is on "revision before the change".

1. No such revision exists. The initial commit introduced very few properties to the new path, more have been added over many moths.
2. Comparing any numbers between ToT and some version from past only tells you if ToT is faster than that revision. It doesn't tell you if a particular change made things faster or slower.

I haven't looked into this benchmark yet. If the results are surprising we need to understand why before drawing conclusions.
Comment 25 Luke Macpherson 2011-10-12 23:01:36 PDT
Created attachment 110801 [details]
Remove virtual dispatch (updated to patch cleanly again)
Comment 26 Eric Seidel (no email) 2011-10-12 23:07:32 PDT
Comment on attachment 110801 [details]
Remove virtual dispatch (updated to patch cleanly again)

View in context: https://bugs.webkit.org/attachment.cgi?id=110801&action=review

> Source/WebCore/css/CSSStyleApplyProperty.cpp:125
> +         const CSSStyleApplyProperty& table = CSSStyleApplyProperty::sharedCSSStyleApplyProperty();
> +         if (one != CSSPropertyInvalid) {
> +             const PropertyHandler& handlerOne = table.propertyHandler(one);
> +             if (handlerOne.isValid())
> +                 handlerOne.applyValue(selector, value);
> +         }
> +         if (two != CSSPropertyInvalid) {
> +             const PropertyHandler& handlerTwo = table.propertyHandler(two);
> +             if (handlerTwo.isValid())
> +                 handlerTwo.applyValue(selector, value);
> +         }
> +         if (three != CSSPropertyInvalid) {
> +             const PropertyHandler& handlerThree = table.propertyHandler(three);
> +             if (handlerThree.isValid())
> +                 handlerThree.applyValue(selector, value);
> +         }
> +         if (four != CSSPropertyInvalid) {
> +             const PropertyHandler& handlerFour = table.propertyHandler(four);
> +             if (handlerFour.isValid())
> +                 handlerFour.applyValue(selector, value);
> +         }
> +     }

I'm very confused by this.  Confused why it's needed, and why it wouldnt' be a loop.
Comment 27 Luke Macpherson 2011-10-13 15:14:45 PDT
Comment on attachment 110801 [details]
Remove virtual dispatch (updated to patch cleanly again)

View in context: https://bugs.webkit.org/attachment.cgi?id=110801&action=review

>> Source/WebCore/css/CSSStyleApplyProperty.cpp:125
>> +     }
> 
> I'm very confused by this.  Confused why it's needed, and why it wouldnt' be a loop.

I wanted (needed?) to make the property handlers template parameters.
1) This allows the compiler to generate roughly optimal code over the runtime loop. Some compilers I noticed even compile-time-checked (and presumably generated code for) the table lookups.
2) It means that no auxilliary storage for the expanded property ids is necessary. This would have meant adding unused parameters to apply*Value everywhere, and increased the size of the PropertyHandler class.
Comment 28 Eric Seidel (no email) 2011-10-19 15:56:59 PDT
Comment on attachment 110801 [details]
Remove virtual dispatch (updated to patch cleanly again)

I can't tell if net code size is getting smaller or larger.  I also can't tell if the one, two, three, four if branches which you repeat are going to be one, two, three, four, five... etc. in some later patch.
Comment 29 Luke Macpherson 2011-10-19 19:58:06 PDT
Created attachment 111712 [details]
Patch
Comment 30 Eric Seidel (no email) 2011-10-19 20:04:40 PDT
Comment on attachment 111712 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=111712&action=review

This seems like a wash for readability.  I would encourage you to mention the performance gains from (and hence justification for) this patch.

> Source/WebCore/css/CSSStyleSelector.cpp:3757
> +        ASSERT_NOT_REACHED();
>      case CSSPropertyDirection:
> +        ASSERT_NOT_REACHED();
>      case CSSPropertyBackgroundAttachment:
> +        ASSERT_NOT_REACHED();
>      case CSSPropertyBackgroundClip:
> +        ASSERT_NOT_REACHED();
>      case CSSPropertyWebkitBackgroundClip:
> +        ASSERT_NOT_REACHED();
>      case CSSPropertyWebkitBackgroundComposite:
> +        ASSERT_NOT_REACHED();
>      case CSSPropertyBackgroundOrigin:
> +        ASSERT_NOT_REACHED();

Why not have these all cascade to an aSSERT_NOT_REACHED like before?  these added ASSERTS aren't actually buying you anything, since youre' still cascading through each of these (and just asserting N times) in release mode.
Comment 31 Luke Macpherson 2011-10-19 20:20:15 PDT
Created attachment 111716 [details]
Patch
Comment 32 Eric Seidel (no email) 2011-10-19 20:24:48 PDT
Comment on attachment 111716 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=111716&action=review

> Source/WebCore/css/CSSStyleApplyProperty.cpp:77
> +        const CSSStyleApplyProperty& table = CSSStyleApplyProperty::sharedCSSStyleApplyProperty();
> +        const PropertyHandler& handler = table.propertyHandler(id);

Are we trading virtual dispatch for our own table-based dispatch?

> Source/WebCore/css/CSSStyleApplyProperty.cpp:192
> +Color defaultInitialColor();
> +Color defaultInitialColor() { return Color(); }

Why declare and define separately?
Comment 33 Luke Macpherson 2011-10-19 20:38:54 PDT
Comment on attachment 111716 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=111716&action=review

>> Source/WebCore/css/CSSStyleApplyProperty.cpp:77
>> +        const PropertyHandler& handler = table.propertyHandler(id);
> 
> Are we trading virtual dispatch for our own table-based dispatch?

Sort-of. Before we had a table look up for the class, followed by a virtual dispatch. Now we have just a single table lookup.
The cost of that lookup looks favorable when compared to a switch statement in the profiling results I've seen so far.

>> Source/WebCore/css/CSSStyleApplyProperty.cpp:192
>> +Color defaultInitialColor() { return Color(); }
> 
> Why declare and define separately?

I wonder the same thing. The compiler was issuing a warning about not having a prototype, so this was the easy fix. I still don't understand why this was necessary though.
Comment 34 Eric Seidel (no email) 2011-10-19 20:59:05 PDT
Comment on attachment 111716 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=111716&action=review

>>> Source/WebCore/css/CSSStyleApplyProperty.cpp:192
>>> +Color defaultInitialColor() { return Color(); }
>> 
>> Why declare and define separately?
> 
> I wonder the same thing. The compiler was issuing a warning about not having a prototype, so this was the easy fix. I still don't understand why this was necessary though.

Does the static keyword help?
Comment 35 Luke Macpherson 2011-10-19 21:13:15 PDT
(In reply to comment #34)
> (From update of attachment 111716 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=111716&action=review
> 
> >>> Source/WebCore/css/CSSStyleApplyProperty.cpp:192
> >>> +Color defaultInitialColor() { return Color(); }
> >> 
> >> Why declare and define separately?
> > 
> > I wonder the same thing. The compiler was issuing a warning about not having a prototype, so this was the easy fix. I still don't understand why this was necessary though.
> 
> Does the static keyword help?

Using static gives a different error:
‘WebCore::defaultInitialColor’ is not a valid template argument for type ‘WebCore::Color (*)()’ because function ‘WebCore::Color WebCore::defaultInitialColor()’ has not external linkage

Pretty weird huh, but that's why I ended up with a prototype there.
Comment 36 Antti Koivisto 2011-10-24 06:43:47 PDT
Comment on attachment 111716 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=111716&action=review

Looks like a good improvement. r=me

> Source/WebCore/css/CSSStyleApplyProperty.cpp:275
> +template <Length (RenderStyle::*getterFunction)() const,
> +          void (RenderStyle::*setterFunction)(Length),
> +          Length (*initialFunction)(),
> +          LengthAuto autoEnabled = AutoDisabled,
>            LengthIntrinsic intrinsicEnabled = IntrinsicDisabled,
>            LengthMinIntrinsic minIntrinsicEnabled = MinIntrinsicDisabled,
>            LengthNone noneEnabled = NoneDisabled,
>            LengthUndefined noneUndefined = UndefinedDisabled,
>            LengthFlexDirection flexDirection = FlexDirectionDisabled>
> -class ApplyPropertyLength : public ApplyPropertyDefaultBase<Length> {
> +class ApplyPropertyLength {
>  public:
> -    ApplyPropertyLength(GetterFunction getter, SetterFunction setter, InitialFunction initial)
> -        : ApplyPropertyDefaultBase<Length>(getter, setter, initial)
> -    {
> -    }
> -
> -private:
> -    virtual void applyValue(CSSStyleSelector* selector, CSSValue* value) const
> +    static void setValue(RenderStyle* style, Length value) { (style->*setterFunction)(value); }

Have you verified that the function pointers actually get optimized away and the RenderStyle setter functions inline using this syntax?

> Source/WebCore/css/CSSStyleApplyProperty.cpp:1019
> +    setPropertyHandler(CSSPropertyPaddingTop, ApplyPropertyLength<&RenderStyle::paddingTop, &RenderStyle::setPaddingTop, &RenderStyle::initialPadding>::createHandler());
> +    setPropertyHandler(CSSPropertyPaddingRight, ApplyPropertyLength<&RenderStyle::paddingRight, &RenderStyle::setPaddingRight, &RenderStyle::initialPadding>::createHandler());
> +    setPropertyHandler(CSSPropertyPaddingBottom, ApplyPropertyLength<&RenderStyle::paddingBottom, &RenderStyle::setPaddingBottom, &RenderStyle::initialPadding>::createHandler());

It would be nice to refactor this in the future so that it used a static table as input. This is pretty unreadable.
Comment 37 Antti Koivisto 2011-10-24 06:48:28 PDT
You should also update the bug title (2s was the total time under style applying, not the assumed regression, and is currently much less due other optimizations. Also the regression from refactoring was less than i orginally thought, ~10-20% from my later measurements).
Comment 38 Luke Macpherson 2011-10-24 18:32:38 PDT
(In reply to comment #36)
> Have you verified that the function pointers actually get optimized away and the RenderStyle setter functions inline using this syntax?

I don't think that is always the case, but I'm not 100% sure if I'm reading the assembly correctly. In some cases I see a call in the assembly, but sometimes not. I assume the compiler has some heuristics for deciding when to inline.
Comment 39 WebKit Review Bot 2011-10-24 19:37:35 PDT
Comment on attachment 111716 [details]
Patch

Clearing flags on attachment: 111716

Committed r98310: <http://trac.webkit.org/changeset/98310>
Comment 40 WebKit Review Bot 2011-10-24 19:37:47 PDT
All reviewed patches have been landed.  Closing bug.