Bug 119625 - [Performance]making HTMLSelectElement::usesMenuList() inline
Summary: [Performance]making HTMLSelectElement::usesMenuList() inline
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-08-09 05:53 PDT by Santosh Mahto
Modified: 2013-08-12 21:30 PDT (History)
6 users (show)

See Also:


Attachments
Patch (3.05 KB, patch)
2013-08-09 06:46 PDT, Santosh Mahto
no flags Details | Formatted Diff | Diff
Patch (3.15 KB, patch)
2013-08-09 10:45 PDT, Santosh Mahto
no flags Details | Formatted Diff | Diff
Patch (3.14 KB, patch)
2013-08-09 11:57 PDT, Santosh Mahto
benjamin: review-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Santosh Mahto 2013-08-09 05:53:23 PDT
Note: Related to Benjamin Blog ( https://www.webkit.org/blog/)  for improving performance of webcore 

The function HTMLSelectElement::usesMenuList() is used to decide how to render the select element(RenderBox or MenuList)
It has been used lot of places (22 times) in same class .
Its good to declare this  as inline ,
The size of this function is unusually big as in "every times  theme checking" is done.


So futher to improve performance I have adjusted the code to make it inline and  theme checking only once.
Comment 1 Santosh Mahto 2013-08-09 06:46:24 PDT
Created attachment 208430 [details]
Patch
Comment 2 Santosh Mahto 2013-08-09 10:45:57 PDT
Created attachment 208441 [details]
Patch
Comment 3 Sam Weinig 2013-08-09 11:35:39 PDT
Does this actually speed anything up?
Comment 4 Santosh Mahto 2013-08-09 11:57:20 PDT
Created attachment 208444 [details]
Patch
Comment 5 Santosh Mahto 2013-08-10 00:24:47 PDT
(In reply to comment #3)
> Does this actually speed anything up?

Isn't function usesMenuList() will take less machine code now .
+ inlining the function will reduce the machine cycle.

This function had useless overhead of theme checking every time.

it will definitely up this function performance logically. which is used many times(22) in HTMLSelectElement class. 
I think any improvement(minor/major) in WebCore is really good, as  it is difficult  for big architectural changes to improve performance.

This is just way of improving performance by Thinking a bit low level as explained by Benjamin in his blog.
Comment 6 Sam Weinig 2013-08-10 15:01:17 PDT
(In reply to comment #5)
> (In reply to comment #3)
> > Does this actually speed anything up?
> 
> Isn't function usesMenuList() will take less machine code now .
> + inlining the function will reduce the machine cycle.
> 
> This function had useless overhead of theme checking every time.
> 
> it will definitely up this function performance logically. which is used many times(22) in HTMLSelectElement class. 
> I think any improvement(minor/major) in WebCore is really good, as  it is difficult  for big architectural changes to improve performance.
> 
> This is just way of improving performance by Thinking a bit low level as explained by Benjamin in his blog.

Have you measured an improvement (speed / binary size/ etc), or are you just guessing?  Is usesMenuList() a hot function?
Comment 7 Benjamin Poulain 2013-08-10 15:21:17 PDT
Comment on attachment 208444 [details]
Patch

> > Does this actually speed anything up?
> 
> Isn't function usesMenuList() will take less machine code now .
> + inlining the function will reduce the machine cycle.

The burden of proof is on the one making the change. You should benchmark your changes and provide the numbers for the improvement.

> This is just way of improving performance by Thinking a bit low level as explained by Benjamin in his blog.

I think you are misunderstanding the outcome of your change.

Previously, all the calls to usesMenuList() were made in one instruction:
    call _Whatever_The_Symbol_is_for_usesMenuList.

With your change, you will get something like this:
    move 1 to r0
    load byte from m_isThemeSpecifyMenuRendering to registerX
    compare registerX with 0
    jump to (fail) if flag Equal is Set
    load byte m_multiple to registerX
    compare m_multiple with 0
    jump to (fail) if flag NonEqual is Set 
    load byte from m_size to registerX
    compare with 1
    jump to (fail) if flag BiggerThan is Set
    non conditional jump to (end).

    fail:
    move 0 to r0

    end:
    ret

Some architecture can make multiple of those pseudo instructions in one machine instruction, but no architecture will do all of that in just 1 instruction like the function call was doing.

My blog warn about this kind of changes. Sometimes we are too eager to inline things, and that make the code worse.
Comment 8 Santosh Mahto 2013-08-11 09:41:29 PDT
(In reply to comment #7)

> (From update of attachment 208444 [details])

> The burden of proof is on the one making the change. You should benchmark your changes and provide the numbers for the improvement.
This patch is more of code refactoring with a goal of better function complexity.
Keeping [performance] in title is misguiding as I found now.Because patch is not of any benchmark improvement.
I ask sorry for this crime.

Please ignore the patch and close the bug if this refactoring is not allowed now.
.
> 
> Previously, all the calls to usesMenuList() were made in one instruction:
>     call _Whatever_The_Symbol_is_for_usesMenuList. 
 
> With your change, you will get something like this:
>     move 1 to r0
>     load byte from m_isThemeSpecifyMenuRendering to registerX


I agree code size will increase with  inlining (although runtime speed will be better , no fucntion call)

We can avoid making it inline.
What is your view on avoiding inlining but keeping other modification.
I have removed theme style access everytime in that fucntion which is redundant for each call.
Comment 9 Benjamin Poulain 2013-08-11 21:08:45 PDT
(In reply to comment #8)
> What is your view on avoiding inlining but keeping other modification.
> I have removed theme style access everytime in that fucntion which is redundant for each call.

I think it makes the class worse.

You are adding one more binary state to a class, and it is effectively a cold cache for the result of a virtual function of an other class.
In the context of managing software complexity, this change is not in the interest of the project.
Comment 10 Santosh Mahto 2013-08-12 20:55:16 PDT
As per benjamin comments this bug can be closed.