Bug 69114 - DFG does not speculate aggressively enough on put_by_id
Summary: DFG does not speculate aggressively enough on put_by_id
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-09-29 17:28 PDT by Filip Pizlo
Modified: 2011-09-30 17:57 PDT (History)
5 users (show)

See Also:


Attachments
work in progress (24.65 KB, patch)
2011-09-29 18:58 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch - works on 64-bit (24.68 KB, patch)
2011-09-29 19:41 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch - 32_64 and 64 work (26.54 KB, patch)
2011-09-30 14:19 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch - fix style and merge (26.56 KB, patch)
2011-09-30 15:08 PDT, Filip Pizlo
oliver: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Filip Pizlo 2011-09-29 17:28:29 PDT
The DFG should implement the same by-offset optimizations for put_by_id as it already performs for get_by_id.
Comment 1 Filip Pizlo 2011-09-29 18:58:45 PDT
Created attachment 109236 [details]
work in progress

This seems to sort of work.  Still doing more tests.
Comment 2 Filip Pizlo 2011-09-29 19:15:11 PDT
Nice speed-up on V8.


Benchmark report for V8.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"PutByOffset" at /Volumes/Data/pizlo/septenary/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1
benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime()
function to get microsecond-level timing. Reporting benchmark execution times with 95%
confidence intervals in milliseconds.

                     TipOfTree              PutByOffset                                   

crypto            70.8626+-0.3336    !    72.0406+-0.3267       ! definitely 1.0166x slower
deltablue        227.4823+-1.2145        225.5072+-2.2500       
earley-boyer      90.0359+-0.8468    ^    87.4768+-0.3990       ^ definitely 1.0293x faster
raytrace          62.8978+-0.2882    ^    60.9215+-0.5096       ^ definitely 1.0324x faster
regexp           103.5795+-0.4278        102.7796+-0.4376       
richards         198.4522+-0.8880    ^   186.9556+-0.6424       ^ definitely 1.0615x faster
splay             90.2233+-0.3854    ?    91.0759+-0.7460       ?

<arithmetic>     120.5048+-0.2634    ^   118.1082+-0.3645       ^ definitely 1.0203x faster
<geometric>      107.8085+-0.1881    ^   106.1108+-0.2448       ^ definitely 1.0160x faster
<harmonic>        98.1841+-0.1934    ^    96.9183+-0.2551       ^ definitely 1.0131x faster
Comment 3 Filip Pizlo 2011-09-29 19:20:57 PDT
Looks like it's only a win on V8.



Benchmark report for SunSpider, V8, and Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"PutByOffset" at /Volumes/Data/pizlo/septenary/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM
invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting
benchmark execution times with 95% confidence intervals in milliseconds.

                                            TipOfTree              PutByOffset                                   
SunSpider:
   3d-cube                                7.4733+-0.2370    ?     7.4909+-0.2434       ?
   3d-morph                               7.3221+-0.1314    ?     7.4902+-0.1818       ? might be 1.0230x slower
   3d-raytrace                            8.2079+-0.2675    ?     8.2614+-0.2639       ?
   access-binary-trees                    2.0855+-0.0733    ^     1.7651+-0.0578       ^ definitely 1.1816x faster
   access-fannkuch                        6.5079+-0.1798          6.3323+-0.0819         might be 1.0277x faster
   access-nbody                           3.6268+-0.0917          3.4877+-0.0829         might be 1.0399x faster
   access-nsieve                          2.6092+-0.0605    ?     2.6369+-0.0700       ? might be 1.0106x slower
   bitops-3bit-bits-in-byte               1.7039+-0.0381    ?     1.7147+-0.0353       ?
   bitops-bits-in-byte                    2.7511+-0.0767    ?     2.7978+-0.0933       ? might be 1.0170x slower
   bitops-bitwise-and                     3.4511+-0.0848    ^     3.2750+-0.0814       ^ definitely 1.0538x faster
   bitops-nsieve-bits                     5.5331+-0.1009          5.4708+-0.0805         might be 1.0114x faster
   controlflow-recursive                  2.0846+-0.0539          2.0461+-0.0608         might be 1.0188x faster
   crypto-aes                             6.6652+-0.2079    ?     6.7579+-0.1919       ? might be 1.0139x slower
   crypto-md5                             2.8164+-0.0746    ?     2.8615+-0.0950       ? might be 1.0160x slower
   crypto-sha1                            2.5247+-0.0600          2.4973+-0.0644         might be 1.0110x faster
   date-format-tofte                     10.0118+-0.2674    ?    10.1345+-0.2735       ? might be 1.0123x slower
   date-format-xparb                      9.2736+-0.3044    ?     9.5957+-0.3815       ? might be 1.0347x slower
   math-cordic                            6.5799+-0.2671          6.4915+-0.1929         might be 1.0136x faster
   math-partial-sums                      7.5752+-0.1633    ?     7.6859+-0.1510       ? might be 1.0146x slower
   math-spectral-norm                     2.8814+-0.0999          2.8715+-0.0738       
   regexp-dna                            10.8292+-0.1427    ?    10.8850+-0.1618       ?
   string-base64                          6.0760+-0.2190    ?     6.1591+-0.2329       ? might be 1.0137x slower
   string-fasta                           7.0525+-0.2171    ?     7.1913+-0.3145       ? might be 1.0197x slower
   string-tagcloud                       11.8762+-0.3947    ?    11.9610+-0.3092       ?
   string-unpack-code                    21.3559+-0.4601         21.2620+-0.4328       
   string-validate-input                  6.2368+-0.1467    ?     6.3990+-0.2173       ? might be 1.0260x slower

   <arithmetic>                           6.3504+-0.0324    ?     6.3662+-0.0315       ?
   <geometric>                            5.2336+-0.0294          5.2091+-0.0221       
   <harmonic>                             4.3158+-0.0355    ^     4.2486+-0.0303       ^ definitely 1.0158x faster

                                            TipOfTree              PutByOffset                                   
V8:
   crypto                                71.1752+-0.4499    !    72.1471+-0.3928       ! definitely 1.0137x slower
   deltablue                            228.0470+-1.1437    ?   228.2657+-2.3416       ?
   earley-boyer                          89.8495+-0.6176    ^    88.1906+-0.5496       ^ definitely 1.0188x faster
   raytrace                              62.3209+-0.3664    ^    61.1129+-0.4345       ^ definitely 1.0198x faster
   regexp                               104.0114+-0.7839        103.5840+-0.6265       
   richards                             199.0179+-1.0726    ^   187.5831+-1.0648       ^ definitely 1.0610x faster
   splay                                 91.0199+-0.7136    ?    91.4576+-0.6711       ?

   <arithmetic>                         120.7774+-0.3729    ^   118.9059+-0.3800       ^ definitely 1.0157x faster
   <geometric>                          107.9834+-0.3348    ^   106.7217+-0.2557       ^ definitely 1.0118x faster
   <harmonic>                            98.2570+-0.3084    ^    97.3992+-0.2199       ^ definitely 1.0088x faster

                                            TipOfTree              PutByOffset                                   
Kraken:
   ai-astar                             493.2977+-4.6697        491.8601+-3.7826       
   audio-beat-detection                 190.4707+-1.7293    ?   190.7607+-1.5873       ?
   audio-dft                            280.9297+-2.4491        279.2958+-2.6566       
   audio-fft                            128.3063+-1.0409    ?   129.6684+-1.1003       ? might be 1.0106x slower
   audio-oscillator                     256.1838+-1.9104        255.2704+-2.2988       
   imaging-darkroom                     422.3094+-1.3344    ^   418.5479+-1.3490       ^ definitely 1.0090x faster
   imaging-desaturate                   223.5703+-0.8074    ?   224.3784+-1.7922       ?
   imaging-gaussian-blur                582.5605+-2.3726    ?   583.2726+-2.1460       ?
   json-parse-financial                  48.5278+-0.8188         47.7118+-0.3265         might be 1.0171x faster
   json-stringify-tinderbox              68.5631+-0.5214         68.4865+-0.3847       
   stanford-crypto-aes                  130.9508+-1.2565        130.0566+-1.3290       
   stanford-crypto-ccm                  101.1984+-0.9027    ?   102.0392+-0.9587       ?
   stanford-crypto-pbkdf2               193.0240+-1.6461    !   196.9612+-1.7595       ! definitely 1.0204x slower
   stanford-crypto-sha256-iterative      83.5403+-0.4886    !    85.1920+-0.5650       ! definitely 1.0198x slower

   <arithmetic>                         228.8166+-0.5134    ?   228.8215+-0.4248       ?
   <geometric>                          178.7126+-0.6271    ?   178.9531+-0.3788       ?
   <harmonic>                           138.6559+-0.7664    ?   138.7200+-0.3518       ?

                                            TipOfTree              PutByOffset                                   
All benchmarks:
   <arithmetic>                          89.6593+-0.1651         89.3907+-0.1544       
   <geometric>                           23.5139+-0.0709         23.4213+-0.0565       
   <harmonic>                             7.5847+-0.0606    ^     7.4693+-0.0520       ^ definitely 1.0155x faster
Comment 4 Filip Pizlo 2011-09-29 19:41:09 PDT
Created attachment 109243 [details]
the patch - works on 64-bit

Still need to do the 32_64 work.
Comment 5 Filip Pizlo 2011-09-30 14:19:15 PDT
Created attachment 109347 [details]
the patch - 32_64 and 64 work
Comment 6 WebKit Review Bot 2011-09-30 14:21:08 PDT
Attachment 109347 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/dfg/DFGPropagator.cpp:1034:  An else statement can be removed when the prior "if" concludes with a return, break, continue or goto statement.  [readability/control_flow] [4]
Total errors found: 1 in 9 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 7 Filip Pizlo 2011-09-30 15:08:44 PDT
Created attachment 109353 [details]
the patch - fix style and merge
Comment 8 Filip Pizlo 2011-09-30 15:14:01 PDT
Latest numbers, after merging and such.  It appears that there are slow-downs in some benchmarks, though the slow-downs are smaller than the speed-ups.  The slow-downs are all because PutById has no polymorphic case, and in some (rare!) cases the structure that it's patched on is no longer live.  We should fix that by blowing away all property access patches (and maybe all patches, period) whenever we trigger recompilation.  For now, I recommend landing this, because it's a bigger win (1.6% on V8) than it is a loss (0.6% on Kraken).



Benchmark report for SunSpider, V8, and Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"PutByOffset" at /Volumes/Data/pizlo/septenary/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM
invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting
benchmark execution times with 95% confidence intervals in milliseconds.

                                            TipOfTree              PutByOffset                                   
SunSpider:
   3d-cube                                7.6548+-0.3226          7.3917+-0.1907         might be 1.0356x faster
   3d-morph                               7.4151+-0.1395    ?     7.4507+-0.1249       ?
   3d-raytrace                            8.4812+-0.2948          8.3150+-0.2184         might be 1.0200x faster
   access-binary-trees                    2.1094+-0.0995    ^     1.8069+-0.0613       ^ definitely 1.1674x faster
   access-fannkuch                        6.5808+-0.1922    ?     6.7499+-0.3102       ? might be 1.0257x slower
   access-nbody                           3.6466+-0.0806          3.5337+-0.0536         might be 1.0320x faster
   access-nsieve                          2.6069+-0.0616    ?     2.6798+-0.0636       ? might be 1.0280x slower
   bitops-3bit-bits-in-byte               1.7354+-0.0394          1.7119+-0.0337         might be 1.0137x faster
   bitops-bits-in-byte                    2.7756+-0.0751          2.7511+-0.0885       
   bitops-bitwise-and                     3.4489+-0.0767    ^     3.2602+-0.0614       ^ definitely 1.0579x faster
   bitops-nsieve-bits                     5.5177+-0.1257    ?     5.5852+-0.2135       ? might be 1.0122x slower
   controlflow-recursive                  2.1451+-0.0706          2.1262+-0.1009       
   crypto-aes                             6.8211+-0.1833          6.7461+-0.2125         might be 1.0111x faster
   crypto-md5                             2.8203+-0.0738          2.7874+-0.0739         might be 1.0118x faster
   crypto-sha1                            2.5016+-0.0660    ?     2.5311+-0.0773       ? might be 1.0118x slower
   date-format-tofte                     10.0443+-0.2708    ?    10.2338+-0.2693       ? might be 1.0189x slower
   date-format-xparb                     10.0244+-0.4041          9.7601+-0.2739         might be 1.0271x faster
   math-cordic                            6.3638+-0.2144          6.3315+-0.0620       
   math-partial-sums                      7.7000+-0.0621    ?     7.7615+-0.1886       ?
   math-spectral-norm                     3.0138+-0.3396          2.8333+-0.0575         might be 1.0637x faster
   regexp-dna                            10.8769+-0.1672    ?    10.9393+-0.2064       ?
   string-base64                          6.2164+-0.1799          6.0559+-0.2046         might be 1.0265x faster
   string-fasta                           7.0881+-0.1568          6.8526+-0.1197         might be 1.0344x faster
   string-tagcloud                       12.1531+-0.2665         11.9861+-0.3321         might be 1.0139x faster
   string-unpack-code                    21.3522+-0.4916    ?    21.4023+-0.4834       ?
   string-validate-input                  6.3879+-0.2148    ?     6.4203+-0.2185       ?

   <arithmetic>                           6.4416+-0.0494          6.3848+-0.0395       
   <geometric>                            5.3031+-0.0494          5.2245+-0.0305         might be 1.0150x faster
   <harmonic>                             4.3660+-0.0548    ^     4.2681+-0.0382       ^ definitely 1.0229x faster

                                            TipOfTree              PutByOffset                                   
V8:
   crypto                                72.1369+-0.7540    ?    73.1136+-0.4965       ? might be 1.0135x slower
   deltablue                            230.9621+-0.8870    ^   226.2957+-1.9236       ^ definitely 1.0206x faster
   earley-boyer                          91.2661+-0.8792    ^    87.8639+-0.2933       ^ definitely 1.0387x faster
   raytrace                              63.6002+-0.4248    ^    62.0997+-1.0089       ^ definitely 1.0242x faster
   regexp                               104.6459+-0.6469    ?   106.7863+-1.6047       ? might be 1.0205x slower
   richards                             203.1269+-2.4756    ^   188.0540+-1.1212       ^ definitely 1.0802x faster
   splay                                 91.4723+-0.7206    ?    92.3211+-0.9555       ?

   <arithmetic>                         122.4586+-0.3488    ^   119.5049+-0.5252       ^ definitely 1.0247x faster
   <geometric>                          109.4324+-0.3007    ^   107.6235+-0.4169       ^ definitely 1.0168x faster
   <harmonic>                            99.5829+-0.3574    ^    98.4368+-0.4103       ^ definitely 1.0116x faster

                                            TipOfTree              PutByOffset                                   
Kraken:
   ai-astar                             492.2343+-4.4177    ?   500.2010+-5.8382       ? might be 1.0162x slower
   audio-beat-detection                 190.9514+-0.5210    ?   193.0569+-1.7599       ? might be 1.0110x slower
   audio-dft                            287.1908+-2.8916        284.1722+-4.7994         might be 1.0106x faster
   audio-fft                            128.8452+-1.2426    ?   128.9132+-1.2200       ?
   audio-oscillator                     260.5544+-1.9829        259.4385+-2.5186       
   imaging-darkroom                     420.9855+-1.0263    ?   421.8993+-1.7584       ?
   imaging-desaturate                   225.8952+-2.0593    ?   228.8723+-2.6980       ? might be 1.0132x slower
   imaging-gaussian-blur                586.2501+-4.6554    ?   589.6864+-4.7268       ?
   json-parse-financial                  48.4939+-0.2546    ?    48.7435+-1.2599       ?
   json-stringify-tinderbox              68.6375+-0.3577    !    70.0573+-0.9318       ! definitely 1.0207x slower
   stanford-crypto-aes                  133.3343+-1.9512        132.8290+-2.0660       
   stanford-crypto-ccm                  102.4281+-0.6916        102.1125+-1.6788       
   stanford-crypto-pbkdf2               196.7817+-1.7446    ?   201.3275+-4.3554       ? might be 1.0231x slower
   stanford-crypto-sha256-iterative      85.1036+-0.6282    ?    85.6230+-0.8534       ?

   <arithmetic>                         230.5490+-0.5868    !   231.9238+-0.7332       ! definitely 1.0060x slower
   <geometric>                          180.3196+-0.3844    !   181.3460+-0.6401       ! definitely 1.0057x slower
   <harmonic>                           139.7823+-0.3152    ?   140.6246+-0.7808       ?

                                            TipOfTree              PutByOffset                                   
All benchmarks:
   <arithmetic>                          90.4761+-0.2161         90.4143+-0.2503       
   <geometric>                           23.7963+-0.1389         23.5822+-0.0897       
   <harmonic>                             7.6727+-0.0940    ^     7.5051+-0.0658       ^ definitely 1.0223x faster
Comment 9 Oliver Hunt 2011-09-30 16:41:51 PDT
Comment on attachment 109353 [details]
the patch - fix style and merge

View in context: https://bugs.webkit.org/attachment.cgi?id=109353&action=review

r=me, but you should probably drop the write barrier if the cell is already in the old generation

> Source/JavaScriptCore/dfg/DFGJITCodeGenerator.cpp:270
> +void JITCodeGenerator::writeBarrier(GPRReg ownerGPR, JSCell* value, WriteBarrierUseKind useKind, GPRReg scratch1, GPRReg scratch2)

if value is a cell you can do a generation check (Heap::isMarked()) you can avoid planting this write barrier entirely
Comment 10 Filip Pizlo 2011-09-30 17:52:37 PDT
(In reply to comment #9)
> (From update of attachment 109353 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=109353&action=review
> 
> r=me, but you should probably drop the write barrier if the cell is already in the old generation
> 
> > Source/JavaScriptCore/dfg/DFGJITCodeGenerator.cpp:270
> > +void JITCodeGenerator::writeBarrier(GPRReg ownerGPR, JSCell* value, WriteBarrierUseKind useKind, GPRReg scratch1, GPRReg scratch2)
> 
> if value is a cell you can do a generation check (Heap::isMarked()) you can avoid planting this write barrier entirely

Right, good point!  I made that change, will land.
Comment 11 Filip Pizlo 2011-09-30 17:57:53 PDT
Landed in r96443.