Bug 67176 - JavaScriptCore does not have tiered compilation
Summary: JavaScriptCore does not have tiered compilation
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-08-29 18:53 PDT by Filip Pizlo
Modified: 2011-09-06 02:23 PDT (History)
8 users (show)

See Also:


Attachments
work in progress (66.70 KB, patch)
2011-08-29 19:43 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch (fix style) (66.61 KB, patch)
2011-08-29 20:04 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch - fix review (58.94 KB, patch)
2011-08-30 20:37 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
work in progress (132.56 KB, patch)
2011-09-02 19:50 PDT, Filip Pizlo
gyuyoung.kim: commit-queue-
Details | Formatted Diff | Diff
the patch (117.56 KB, patch)
2011-09-03 01:12 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch - fix conflicts, style (117.61 KB, patch)
2011-09-03 01:19 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch - removed spurious debug nonsense from v8-crypto (117.13 KB, patch)
2011-09-03 01:21 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch - fix windows (117.67 KB, patch)
2011-09-03 02:01 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch - another attempt to fix windows (119.41 KB, patch)
2011-09-03 02:53 PDT, Filip Pizlo
gustavo.noronha: commit-queue-
Details | Formatted Diff | Diff
the patch - fix GTK (120.03 KB, patch)
2011-09-03 21:38 PDT, Filip Pizlo
barraclough: review-
barraclough: commit-queue-
Details | Formatted Diff | Diff
the patch - partial fix review (130.88 KB, patch)
2011-09-05 17:49 PDT, Filip Pizlo
webkit-ews: commit-queue-
Details | Formatted Diff | Diff
the patch - partial fix review, fix Qt build (130.86 KB, patch)
2011-09-05 18:07 PDT, Filip Pizlo
barraclough: review+
barraclough: commit-queue-
Details | Formatted Diff | Diff
the patch - more fix review, sort of (130.12 KB, patch)
2011-09-05 21:16 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch (130.66 KB, patch)
2011-09-05 23:45 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Filip Pizlo 2011-08-29 18:53:37 PDT
JSC does not support the notion of a piece of JS code being recompiled.  This prevents the implementation of a number of interesting features:

1) Profile-driven optimization in DFG JIT.

2) JITing code only after the code is hot.

3) Having a JIT so powerful that it would be prohibitive to run it for every function.
Comment 1 Filip Pizlo 2011-08-29 19:21:48 PDT
Here is the performance of tiered compilation, without any optimizations that would benefit from tiered compilation.  As in:

1) There is no profile-driven compilation.

2) We always JIT code.  The first tier JIT is the old JIT, and the second tier JIT is the DFG.

3) The DFG JIT is not so much more expensive than the old JIT that tiering would reduce compile times significantly.

As in, this is an worst-case scenario measurement - it gives us the worst-case pathological cost of not immediately compiling with the DFG.


Benchmark report for SunSpider, V8, and Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"TieringEnabled" at /Volumes/Data/pizlo/octonary/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM
invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting
benchmark execution times with 95% confidence intervals in milliseconds.

                                            TipOfTree             TieringEnabled                                 
SunSpider:
   3d-cube                                7.5697+-0.1302    !     8.1947+-0.1789       ! definitely 1.0826x slower
   3d-morph                               7.3310+-0.1581    !     7.6346+-0.1198       ! definitely 1.0414x slower
   3d-raytrace                            7.5872+-0.1801    !     8.6595+-0.2115       ! definitely 1.1413x slower
   access-binary-trees                    2.3936+-0.1317    ?     2.4144+-0.0996       ?
   access-fannkuch                       11.7383+-0.1824    ?    12.0820+-0.2585       ? might be 1.0293x slower
   access-nbody                           4.2421+-0.0683    ?     4.3795+-0.1220       ? might be 1.0324x slower
   access-nsieve                          2.4851+-0.0591    !     2.6592+-0.0485       ! definitely 1.0701x slower
   bitops-3bit-bits-in-byte               1.7253+-0.0628    ?     1.8459+-0.0681       ? might be 1.0699x slower
   bitops-bits-in-byte                    4.5103+-0.2142    !     4.9207+-0.0933       ! definitely 1.0910x slower
   bitops-bitwise-and                     3.6722+-0.0712    !     4.0303+-0.1294       ! definitely 1.0975x slower
   bitops-nsieve-bits                     5.4084+-0.1234    ?     5.4185+-0.0880       ?
   controlflow-recursive                  2.0106+-0.0685    ?     2.0777+-0.0619       ? might be 1.0334x slower
   crypto-aes                             6.4807+-0.1584    !     7.1072+-0.2762       ! definitely 1.0967x slower
   crypto-md5                             2.8333+-0.0827    !     3.0637+-0.0605       ! definitely 1.0813x slower
   crypto-sha1                            2.1963+-0.0656    !     2.4276+-0.0576       ! definitely 1.1053x slower
   date-format-tofte                     10.1948+-0.3012    ?    10.3135+-0.3361       ? might be 1.0116x slower
   date-format-xparb                      8.3079+-0.1122    !     9.1526+-0.2566       ! definitely 1.1017x slower
   math-cordic                            6.4793+-0.3005    ?     6.5032+-0.1603       ?
   math-partial-sums                      7.7221+-0.1542          7.4640+-0.1467         might be 1.0346x faster
   math-spectral-norm                     2.4993+-0.0623    ?     2.6149+-0.0656       ? might be 1.0463x slower
   regexp-dna                            10.2265+-0.2753         10.0736+-0.2134         might be 1.0152x faster
   string-base64                          6.0017+-0.2027          6.0012+-0.0787       
   string-fasta                           7.4792+-0.2292          7.1326+-0.1780         might be 1.0486x faster
   string-tagcloud                       13.4894+-0.2745         13.0466+-0.2483         might be 1.0339x faster
   string-unpack-code                    18.3762+-0.3810    ?    19.0636+-0.4548       ? might be 1.0374x slower
   string-validate-input                  6.9764+-0.2309          6.8614+-0.2039         might be 1.0168x faster

   <arithmetic>                           6.5360+-0.0434    !     6.7363+-0.0402       ! definitely 1.0306x slower
   <geometric>                            5.4317+-0.0374    !     5.6435+-0.0279       ! definitely 1.0390x slower
   <harmonic>                             4.4484+-0.0301    !     4.6648+-0.0340       ! definitely 1.0486x slower

                                            TipOfTree             TieringEnabled                                 
V8:
   crypto                                90.1561+-0.4657    !    92.6007+-0.5567       ! definitely 1.0271x slower
   deltablue                            261.2825+-1.2059    !   265.1254+-1.7597       ! definitely 1.0147x slower
   earley-boyer                         101.1890+-0.4968    !   104.4750+-0.2499       ! definitely 1.0325x slower
   raytrace                              77.7841+-0.3200    ?    77.9236+-0.4897       ?
   regexp                               108.4768+-0.6381    !   111.4160+-0.3292       ! definitely 1.0271x slower
   richards                             241.2977+-1.2640    !   249.0259+-2.2002       ! definitely 1.0320x slower
   splay                                109.6627+-0.5339    ^   108.3732+-0.5277       ^ definitely 1.0119x faster

   <arithmetic>                         141.4070+-0.2960    !   144.1343+-0.6120       ! definitely 1.0193x slower
   <geometric>                          126.9741+-0.2363    !   129.1974+-0.3898       ! definitely 1.0175x slower
   <harmonic>                           116.3908+-0.2368    !   118.2490+-0.2897       ! definitely 1.0160x slower

                                            TipOfTree             TieringEnabled                                 
Kraken:
   ai-astar                            1099.4152+-8.6736    ^  1081.0538+-7.3193       ^ definitely 1.0170x faster
   audio-beat-detection                 464.7362+-1.6339    !   475.2486+-2.9215       ! definitely 1.0226x slower
   audio-dft                            421.8051+-3.9788    ?   423.8868+-3.3273       ?
   audio-fft                            374.3489+-3.1956    ^   369.9372+-0.9839       ^ definitely 1.0119x faster
   audio-oscillator                     375.2353+-0.4462    ^   357.1955+-3.6390       ^ definitely 1.0505x faster
   imaging-darkroom                     534.1020+-7.4953    ?   536.1056+-3.3782       ?
   imaging-desaturate                   591.7430+-6.5169    ?   596.3413+-6.6589       ?
   imaging-gaussian-blur               1727.0032+-7.2924    !  2019.4240+-18.6047      ! definitely 1.1693x slower
   json-parse-financial                  49.2286+-1.2706         48.5747+-0.3344         might be 1.0135x faster
   json-stringify-tinderbox              61.5364+-0.3364    ?    62.2929+-0.4507       ? might be 1.0123x slower
   stanford-crypto-aes                  145.0281+-0.7231    ?   147.0604+-1.3901       ? might be 1.0140x slower
   stanford-crypto-ccm                  110.8017+-0.2247    !   113.3705+-0.4937       ! definitely 1.0232x slower
   stanford-crypto-pbkdf2               337.3557+-1.4168    ?   339.8807+-3.2412       ?
   stanford-crypto-sha256-iterative     129.9748+-0.6521    ?   130.0296+-0.8055       ?

   <arithmetic>                         458.7367+-1.0545    !   478.6001+-1.4694       ! definitely 1.0433x slower
   <geometric>                          293.5137+-0.6624    !   296.9148+-0.5346       ! definitely 1.0116x slower
   <harmonic>                           179.3067+-1.1042    ?   179.8807+-0.5467       ?

                                            TipOfTree             TieringEnabled                                 
All benchmarks:
   <arithmetic>                         161.3213+-0.3186    !   167.7550+-0.4560       ! definitely 1.0399x slower
   <geometric>                           28.5043+-0.0949    !    29.2899+-0.0912       ! definitely 1.0276x slower
   <harmonic>                             7.8556+-0.0515    !     8.2301+-0.0588       ! definitely 1.0477x slower
Comment 2 Filip Pizlo 2011-08-29 19:38:00 PDT
Here's the performance with the patch (that I'm about to put up) with TIERED_COMPILATION disabled.  It appears to be neutral.



Benchmark report for SunSpider, V8, and Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"TieringDisabled" at /Volumes/Data/pizlo/octonary/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM
invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting
benchmark execution times with 95% confidence intervals in milliseconds.

                                            TipOfTree            TieringDisabled                                 
SunSpider:
   3d-cube                                7.5376+-0.1175    ?     7.7058+-0.3003       ? might be 1.0223x slower
   3d-morph                               7.3861+-0.1502    ?     7.4441+-0.1638       ?
   3d-raytrace                            7.9061+-0.1573          7.7273+-0.1972         might be 1.0231x faster
   access-binary-trees                    2.2056+-0.0434    ?     2.3275+-0.1729       ? might be 1.0553x slower
   access-fannkuch                       11.8176+-0.2460    ?    11.8344+-0.2259       ?
   access-nbody                           4.3464+-0.1199          4.2083+-0.0448         might be 1.0328x faster
   access-nsieve                          2.5201+-0.0780          2.4449+-0.0236         might be 1.0308x faster
   bitops-3bit-bits-in-byte               1.7132+-0.0409    ?     1.7336+-0.0583       ? might be 1.0119x slower
   bitops-bits-in-byte                    4.4561+-0.2215    ?     4.4569+-0.2233       ?
   bitops-bitwise-and                     3.7906+-0.1316          3.6795+-0.0828         might be 1.0302x faster
   bitops-nsieve-bits                     5.4563+-0.1567          5.4409+-0.1175       
   controlflow-recursive                  2.0028+-0.0456    ?     2.0519+-0.0506       ? might be 1.0245x slower
   crypto-aes                             6.5796+-0.1957          6.5665+-0.2603       
   crypto-md5                             2.7610+-0.0811    ?     2.7837+-0.1032       ?
   crypto-sha1                            2.2108+-0.0583    ?     2.2621+-0.0746       ? might be 1.0232x slower
   date-format-tofte                     10.0549+-0.2862    ?    10.0583+-0.2622       ?
   date-format-xparb                      8.5015+-0.2033    ?     8.6012+-0.2057       ? might be 1.0117x slower
   math-cordic                            6.3485+-0.1002          6.2499+-0.0975         might be 1.0158x faster
   math-partial-sums                      7.8015+-0.1462          7.6821+-0.1618         might be 1.0155x faster
   math-spectral-norm                     2.4839+-0.0601    ?     2.5007+-0.0551       ?
   regexp-dna                            10.2690+-0.2139    ?    10.4431+-0.1691       ? might be 1.0169x slower
   string-base64                          5.9300+-0.1266    ?     6.0768+-0.1648       ? might be 1.0248x slower
   string-fasta                           7.5471+-0.2306    ?     7.6356+-0.2971       ? might be 1.0117x slower
   string-tagcloud                       13.1309+-0.2623    ?    13.2849+-0.2355       ? might be 1.0117x slower
   string-unpack-code                    18.4339+-0.3087    ?    18.6805+-0.4475       ? might be 1.0134x slower
   string-validate-input                  6.9646+-0.2545    ?     7.2475+-0.1046       ? might be 1.0406x slower

   <arithmetic>                           6.5445+-0.0258    ?     6.5818+-0.0279       ?
   <geometric>                            5.4303+-0.0209    ?     5.4580+-0.0307       ?
   <harmonic>                             4.4312+-0.0178    ?     4.4605+-0.0364       ?

                                            TipOfTree            TieringDisabled                                 
V8:
   crypto                                89.7999+-0.4161    ?    90.5472+-0.3796       ?
   deltablue                            261.8698+-0.8554    !   267.0150+-1.3306       ! definitely 1.0196x slower
   earley-boyer                         101.2657+-0.6838    ?   102.4584+-1.3149       ? might be 1.0118x slower
   raytrace                              77.6936+-0.2382    ?    77.9529+-1.3854       ?
   regexp                               109.3492+-0.6109        108.4792+-0.4163       
   richards                             240.9417+-1.7815        240.3986+-0.6354       
   splay                                108.7129+-0.9346    ?   108.9099+-0.8099       ?

   <arithmetic>                         141.3761+-0.3684    !   142.2516+-0.3139       ! definitely 1.0062x slower
   <geometric>                          126.8950+-0.2565    ?   127.5128+-0.3834       ?
   <harmonic>                           116.2815+-0.1985    ?   116.7642+-0.4657       ?

                                            TipOfTree            TieringDisabled                                 
Kraken:
   ai-astar                            1103.8657+-8.3351    ^  1087.3269+-2.9530       ^ definitely 1.0152x faster
   audio-beat-detection                 467.9644+-1.6541    ?   469.2818+-1.6212       ?
   audio-dft                            419.3480+-5.3797        415.8305+-3.9171       
   audio-fft                            370.3477+-0.7266    !   372.8229+-0.6572       ! definitely 1.0067x slower
   audio-oscillator                     377.0762+-3.1503    !   384.9752+-2.9895       ! definitely 1.0209x slower
   imaging-darkroom                     529.8054+-1.8055    ?   533.5978+-5.9939       ?
   imaging-desaturate                   588.5433+-6.8542        587.1311+-6.6282       
   imaging-gaussian-blur               1736.0974+-10.3083      1722.0121+-6.5547       
   json-parse-financial                  48.0002+-0.3510    !    49.2782+-0.7477       ! definitely 1.0266x slower
   json-stringify-tinderbox              61.6009+-0.2921    !    62.3810+-0.3903       ! definitely 1.0127x slower
   stanford-crypto-aes                  144.3355+-0.5705    ?   145.9901+-2.8725       ? might be 1.0115x slower
   stanford-crypto-ccm                  111.3489+-0.3694    !   113.0364+-1.1566       ! definitely 1.0152x slower
   stanford-crypto-pbkdf2               336.8550+-1.3959    ?   336.9365+-1.4817       ?
   stanford-crypto-sha256-iterative     130.2361+-0.6723    ^   128.8800+-0.4144       ^ definitely 1.0105x faster

   <arithmetic>                         458.9589+-1.5070        457.8200+-1.0110       
   <geometric>                          292.8484+-0.6801    ?   294.0579+-0.7532       ?
   <harmonic>                           178.1834+-0.3880    !   180.2808+-1.0294       ! definitely 1.0118x slower

                                            TipOfTree            TieringDisabled                                 
All benchmarks:
   <arithmetic>                         161.3875+-0.4462        161.1994+-0.3013       
   <geometric>                           28.4787+-0.0634    ?    28.6145+-0.0824       ?
   <harmonic>                             7.8251+-0.0308    ?     7.8771+-0.0627       ?
Comment 3 Filip Pizlo 2011-08-29 19:43:10 PDT
Created attachment 105564 [details]
work in progress

Not yet ready for review (tests still running, some asm work not yet ported to other architectures).
Comment 4 WebKit Review Bot 2011-08-29 19:45:01 PDT
Attachment 105564 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/bytecode/CodeBlock.h:217:  The parameter name "symbolTable" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/runtime/Executable.h:311:  The parameter name "exec" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/runtime/Executable.h:311:  The parameter name "scopeChainNode" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/runtime/Executable.h:374:  The parameter name "exec" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/runtime/Executable.h:374:  The parameter name "scopeChainNode" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/runtime/Executable.h:455:  The parameter name "exec" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/runtime/Executable.h:455:  The parameter name "scopeChainNode" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/runtime/Executable.h:478:  The parameter name "exec" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/runtime/Executable.h:478:  The parameter name "scopeChainNode" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/JavaScriptCore/bytecode/CodeBlock.cpp:1454:  Should have only a single space after a punctuation in a comment.  [whitespace/comments] [5]
Source/JavaScriptCore/bytecode/CodeBlock.cpp:1814:  Should have only a single space after a punctuation in a comment.  [whitespace/comments] [5]
Total errors found: 11 in 20 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 5 Filip Pizlo 2011-08-29 20:04:05 PDT
Created attachment 105565 [details]
the patch (fix style)
Comment 6 Filip Pizlo 2011-08-29 20:04:55 PDT
(In reply to comment #5)
> Created an attachment (id=105565) [details]
> the patch (fix style)

This now passes tests with tiering disabled, and has style fixes.  Now I just want to test again, to make sure that the tiering-disabled fixes don't break tiering-enabled.
Comment 7 Filip Pizlo 2011-08-29 20:34:27 PDT
(In reply to comment #6)
> (In reply to comment #5)
> > Created an attachment (id=105565) [details] [details]
> > the patch (fix style)
> 
> This now passes tests with tiering disabled, and has style fixes.  Now I just want to test again, to make sure that the tiering-disabled fixes don't break tiering-enabled.

Tests pass with and without tiering enabled.  The web is usable with tiering enabled.  Now checking if the web is usable with tiering disabled...
Comment 8 Filip Pizlo 2011-08-29 21:11:47 PDT
Comment on attachment 105565 [details]
the patch (fix style)

All tests pass.  The web is usable with and without tiered compilation enabled.
Comment 9 Geoffrey Garen 2011-08-30 13:36:49 PDT
> wtf/DoublyLinkedSentinelList.h: Added.

I think WTF::SentinelLinkedList can work for your purposes. Would be nice to only have one such class.
Comment 10 Filip Pizlo 2011-08-30 13:41:05 PDT
(In reply to comment #9)
> > wtf/DoublyLinkedSentinelList.h: Added.
> 
> I think WTF::SentinelLinkedList can work for your purposes. Would be nice to only have one such class.

Note quite: SentinelLinkedList would result in inlining two copies of CallLinkInfo into CodeBlock.  CallLinkInfo is a big-ish structure.  I'm totally fine with doing that except for the increased mem usage.

So what do you think?  Have one such class but more space usage, or have two and win back ~10 words or so per CodeBlock?
Comment 11 Geoffrey Garen 2011-08-30 14:35:57 PDT
+// Finally, it differs from SentinelLinkedList by ensuring that the
+// sentinels only have next/prev pointers instead of the full contents of
+// the node structure. This makes DoublyLinkedSentinelList great for cases
+// where the nodes are large.

Sorry, didn't finish reading and see this comment.

> SentinelLinkedList would result in inlining two copies of CallLinkInfo into CodeBlock. 

What we definitely don't want is two classes with similar-sounding names that don't distinguish what's different between them. "DoublyLinkedSentinelList" isn't a very good way to distinguish vs "SentinelLinkedList", since they're both doubly linked, and it doesn't say what's actually different between them (the inlining of the full node vs just prev/next pointers).

We also probably don't want two different implementations of very similar algorithms, if we can avoid it.

To solve this problem, how about templatizing the existing SentinelLinkedList to be SentinelLinkedList<T, Node=SentinelLinkedListNode<T> >, to allow clients to specialize as necessary for a class that doesn't inherit from SentinelLinkedListNode? (Internally, SentinelLinkedList would only deal in Node*, but when returning a Node* to the client, it would static_cast to T*.)

Then, HandleHeap would use SentinelLinkedList<HandleHeap::Node, HandleHeap::Node>, and this patch would use SentinelLinkedList<CallLinkInfo>.
Comment 12 Filip Pizlo 2011-08-30 14:37:36 PDT
(In reply to comment #11)
> +// Finally, it differs from SentinelLinkedList by ensuring that the
> +// sentinels only have next/prev pointers instead of the full contents of
> +// the node structure. This makes DoublyLinkedSentinelList great for cases
> +// where the nodes are large.
> 
> Sorry, didn't finish reading and see this comment.
> 
> > SentinelLinkedList would result in inlining two copies of CallLinkInfo into CodeBlock. 
> 
> What we definitely don't want is two classes with similar-sounding names that don't distinguish what's different between them. "DoublyLinkedSentinelList" isn't a very good way to distinguish vs "SentinelLinkedList", since they're both doubly linked, and it doesn't say what's actually different between them (the inlining of the full node vs just prev/next pointers).
> 
> We also probably don't want two different implementations of very similar algorithms, if we can avoid it.
> 
> To solve this problem, how about templatizing the existing SentinelLinkedList to be SentinelLinkedList<T, Node=SentinelLinkedListNode<T> >, to allow clients to specialize as necessary for a class that doesn't inherit from SentinelLinkedListNode? (Internally, SentinelLinkedList would only deal in Node*, but when returning a Node* to the client, it would static_cast to T*.)
> 
> Then, HandleHeap would use SentinelLinkedList<HandleHeap::Node, HandleHeap::Node>, and this patch would use SentinelLinkedList<CallLinkInfo>.

This sounds good, I'll give it a shot.
Comment 13 Filip Pizlo 2011-08-30 20:37:04 PDT
Created attachment 105733 [details]
the patch - fix review
Comment 14 Filip Pizlo 2011-08-31 14:12:21 PDT
The latest performance of this patch:


Benchmark report for SunSpider, V8, and Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/secondary/OpenSource/WebKitBuild/Release/jsc
"TieringDisabled" at /Volumes/Data/pizlo/OpenSource/WebKitBuild/Release/jsc

Collected 18 samples per benchmark/VM, with 6 VM invocations per benchmark. Used 1 benchmark iteration per VM
invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting
benchmark execution times with 95% confidence intervals in milliseconds.

                                            TipOfTree            TieringDisabled                                 
SunSpider:
   3d-cube                                8.2834+-0.0364          8.2577+-0.0268       
   3d-morph                               8.1428+-0.0572          8.1077+-0.0321       
   3d-raytrace                            8.3447+-0.0734          8.2389+-0.0536         might be 1.0128x faster
   access-binary-trees                    2.5759+-0.0295          2.5376+-0.0195         might be 1.0151x faster
   access-fannkuch                       13.0586+-0.0459    ?    13.0897+-0.0386       ?
   access-nbody                           4.9319+-0.0224    ?     4.9500+-0.0322       ?
   access-nsieve                          3.0545+-0.0191    ?     3.0591+-0.0254       ?
   bitops-3bit-bits-in-byte               1.8822+-0.0140          1.8570+-0.0186         might be 1.0136x faster
   bitops-bits-in-byte                    5.6499+-0.0987          5.6485+-0.0961       
   bitops-bitwise-and                     4.1460+-0.0288          4.1221+-0.0225       
   bitops-nsieve-bits                     5.7399+-0.0354          5.7095+-0.0371       
   controlflow-recursive                  2.2823+-0.0281    ?     2.3029+-0.0331       ?
   crypto-aes                             6.8808+-0.0355          6.8423+-0.0293       
   crypto-md5                             3.0541+-0.0535          3.0423+-0.0357       
   crypto-sha1                            2.4634+-0.0340          2.4452+-0.0260       
   date-format-tofte                     11.1781+-0.1704         11.0855+-0.1969       
   date-format-xparb                      9.3850+-0.0766    !     9.7049+-0.0942       ! definitely 1.0341x slower
   math-cordic                            7.0896+-0.0331          7.0832+-0.0353       
   math-partial-sums                     10.7382+-0.0415    ^    10.6581+-0.0225       ^ definitely 1.0075x faster
   math-spectral-norm                     2.7366+-0.0245    ?     2.7499+-0.0275       ?
   regexp-dna                            12.0810+-0.0976    ?    12.1257+-0.0889       ?
   string-base64                          6.5605+-0.0861          6.5471+-0.0790       
   string-fasta                           8.3149+-0.0272    ?     8.3204+-0.0309       ?
   string-tagcloud                       15.0926+-0.1070    ?    15.2919+-0.1229       ? might be 1.0132x slower
   string-unpack-code                    20.6676+-0.0894    ?    20.6991+-0.1211       ?
   string-validate-input                  7.5367+-0.2107    ?     7.6801+-0.2017       ? might be 1.0190x slower

   <arithmetic>                           7.3797+-0.0296    ?     7.3906+-0.0288       ?
   <geometric>                            6.1216+-0.0265          6.1210+-0.0254       
   <harmonic>                             5.0008+-0.0258          4.9910+-0.0263       

                                            TipOfTree            TieringDisabled                                 
V8:
   crypto                               102.7261+-0.2975    ?   102.8874+-0.2085       ?
   deltablue                            298.8926+-1.6919        298.6987+-1.9454       
   earley-boyer                         122.4942+-0.3332    ?   122.6251+-0.2529       ?
   raytrace                              87.8440+-0.5329         87.6053+-0.7894       
   regexp                               131.1448+-0.5178    !   132.5521+-0.3833       ! definitely 1.0107x slower
   richards                             301.1331+-0.5884        300.9316+-1.3004       
   splay                                156.6207+-1.4762        156.5469+-0.9595       

   <arithmetic>                         171.5508+-0.3431    ?   171.6924+-0.3451       ?
   <geometric>                          153.8781+-0.2491    ?   154.0718+-0.2973       ?
   <harmonic>                           140.1027+-0.2176    ?   140.2896+-0.3298       ?

                                            TipOfTree            TieringDisabled                                 
Kraken:
   ai-astar                            1662.2271+-14.8888   ?  1662.3195+-15.2833      ?
   audio-beat-detection                 541.9177+-3.5377        538.8129+-1.4198       
   audio-dft                            451.6485+-2.3367        449.9374+-2.8209       
   audio-fft                            422.3728+-2.3541        419.6235+-0.6314       
   audio-oscillator                     408.6592+-3.3316    ^   404.3170+-0.6803       ^ definitely 1.0107x faster
   imaging-darkroom                     586.6719+-1.3957    ?   594.6635+-7.4560       ? might be 1.0136x slower
   imaging-desaturate                   635.8691+-13.4831   ?   636.3929+-13.6931      ?
   imaging-gaussian-blur               1854.6785+-2.7334       1850.4129+-2.9003       
   json-parse-financial                  62.4728+-0.4730    !    63.5460+-0.1337       ! definitely 1.0172x slower
   json-stringify-tinderbox              75.8496+-0.1552         75.5888+-0.1383       
   stanford-crypto-aes                  166.6060+-0.4238        166.5077+-0.4959       
   stanford-crypto-ccm                  130.7967+-0.3259    ?   130.9919+-0.4046       ?
   stanford-crypto-pbkdf2               376.4091+-1.3560    ^   371.4304+-1.1870       ^ definitely 1.0134x faster
   stanford-crypto-sha256-iterative     144.4347+-0.3728    ?   144.4837+-0.5826       ?

   <arithmetic>                         537.1867+-1.4454        536.3592+-1.7807       
   <geometric>                          339.6910+-0.7336        339.3740+-0.9549       
   <harmonic>                           213.0308+-0.4558    ?   213.5641+-0.4034       ?

                                            TipOfTree            TieringDisabled                                 
All benchmarks:
   <arithmetic>                         189.6455+-0.4455        189.4262+-0.5564       
   <geometric>                           32.7321+-0.0950         32.7274+-0.0929       
   <harmonic>                             8.8432+-0.0447          8.8266+-0.0455
Comment 15 Filip Pizlo 2011-09-02 19:50:03 PDT
Created attachment 106240 [details]
work in progress

Decided to combine the tiering patch with implementing dynamic optimization, because it turns out that my efforts to get dynamic optimization working revealed the need to make some changes to tiering heuristics.  This patch is not in reviewable state yet, because it depends on a value profiling patch that is still in the commit queue.
Comment 16 WebKit Review Bot 2011-09-02 19:53:52 PDT
Attachment 106240 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'PerformanceTests/SunSpider/tests/v8-v6/v8-..." exit_code: 1

Source/JavaScriptCore/dfg/DFGPropagation.cpp:62:  Should have only a single space after a punctuation in a comment.  [whitespace/comments] [5]
Source/JavaScriptCore/bytecode/ValueProfile.h:168:  One line control clauses should not use braces.  [whitespace/braces] [4]
Source/JavaScriptCore/bytecode/ValueProfile.h:172:  One line control clauses should not use braces.  [whitespace/braces] [4]
Source/JavaScriptCore/dfg/DFGGraph.h:261:  The parameter name "codeBlock" adds no information, so it should be removed.  [readability/parameter_name] [5]
Total errors found: 4 in 39 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 17 Gyuyoung Kim 2011-09-02 20:11:02 PDT
Comment on attachment 106240 [details]
work in progress

Attachment 106240 [details] did not pass efl-ews (efl):
Output: http://queues.webkit.org/results/9586660
Comment 18 Filip Pizlo 2011-09-02 22:23:57 PDT
Tiered compilation - even in this current fairly dumb form - is starting to show signs of goodness.  When combined with https://bugs.webkit.org/show_bug.cgi?id=67553, we get a 10% speed-up on v8-crypto.  Other benchmarks are still hurting because of a lack of OSR (so many benchmarks never tier up) and the fact that the DFG back-end still doesn't properly incorporate the rich set of numeric predictions we now have (bottom, int-static, int, double, number, top).


Benchmark report for SunSpider, V8, and Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"DynamicOpt" at /Volumes/Data/pizlo/quartary/OpenSource/WebKitBuild/Release/jsc
"DynamicOpt2" at /Volumes/Data/pizlo/octonary/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM invocation for warm-up.
Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence
intervals in milliseconds.

                                            TipOfTree               DynamicOpt             DynamicOpt2          DynamicOpt2 v. TipOfTree 
SunSpider:
   3d-cube                                7.9698+-0.2087    !    12.3583+-0.2144    ?    12.4259+-0.2686       ! definitely 1.5591x slower
   3d-morph                               7.7744+-0.1793    !     8.1831+-0.1140    ?     8.1930+-0.2560       ? might be 1.0538x slower
   3d-raytrace                            7.6502+-0.1953    !     8.4078+-0.1219    ?     8.7114+-0.3048       ! definitely 1.1387x slower
   access-binary-trees                    2.3893+-0.0906    ?     2.4728+-0.0687          2.4634+-0.0682       ? might be 1.0310x slower
   access-fannkuch                       11.9543+-0.2186    !    13.0364+-0.2499         13.0251+-0.2684       ! definitely 1.0896x slower
   access-nbody                           4.4484+-0.1560          4.3845+-0.0838          4.3802+-0.0679         might be 1.0156x faster
   access-nsieve                          2.5408+-0.0307    !     2.7409+-0.0567    ?     2.7937+-0.1355       ! definitely 1.0995x slower
   bitops-3bit-bits-in-byte               1.7154+-0.0824    !     1.9264+-0.0602          1.9255+-0.0443       ! definitely 1.1224x slower
   bitops-bits-in-byte                    4.5476+-0.2678    !     5.3181+-0.2523    ?     5.5533+-0.3096       ! definitely 1.2211x slower
   bitops-bitwise-and                     3.6850+-0.0556    !     4.0393+-0.1645    ?     4.0762+-0.0869       ! definitely 1.1062x slower
   bitops-nsieve-bits                     5.5807+-0.2016    !     5.9615+-0.1491    ?     5.9727+-0.1217       ! definitely 1.0702x slower
   controlflow-recursive                  2.0337+-0.0349    ?     2.0952+-0.0382          2.0585+-0.0367       ? might be 1.0122x slower
   crypto-aes                             6.8239+-0.1930    !     8.0399+-0.3227          7.8494+-0.3143       ! definitely 1.1503x slower
   crypto-md5                             2.8098+-0.0809    !     2.9741+-0.0799          2.9550+-0.0900       ? might be 1.0517x slower
   crypto-sha1                            2.2982+-0.0805    !     2.4675+-0.0638    ?     2.5981+-0.1117       ! definitely 1.1305x slower
   date-format-tofte                     10.2719+-0.2905    !    10.8794+-0.2106         10.5788+-0.2598       ? might be 1.0299x slower
   date-format-xparb                      8.7886+-0.1802    !     9.7157+-0.2797          9.5360+-0.2017       ! definitely 1.0850x slower
   math-cordic                            6.4300+-0.1255    !     6.8179+-0.0991          6.8140+-0.1386       ! definitely 1.0597x slower
   math-partial-sums                      7.7420+-0.1017          7.6207+-0.1555    ?     7.7644+-0.1373       ?
   math-spectral-norm                     2.5527+-0.0737    ?     2.6611+-0.0927    ?     2.7136+-0.0870       ! definitely 1.0630x slower
   regexp-dna                            10.5863+-0.2935         10.4799+-0.1675         10.3336+-0.1690         might be 1.0245x faster
   string-base64                          6.1758+-0.1848    ?     6.4751+-0.1442    ?     6.4844+-0.2474       ? might be 1.0500x slower
   string-fasta                           7.6886+-0.2365          7.4774+-0.1863          7.4341+-0.1208         might be 1.0342x faster
   string-tagcloud                       12.4531+-0.4789    ?    12.5956+-0.4655         12.5680+-0.3960       ?
   string-unpack-code                    18.8142+-0.2687    ?    19.4565+-0.5111         19.2330+-0.4460       ? might be 1.0223x slower
   string-validate-input                  7.2405+-0.3145          7.0192+-0.2957          6.8666+-0.1451         might be 1.0545x faster

   <arithmetic>                           6.6525+-0.0370    !     7.1386+-0.0312          7.1272+-0.0400       ! definitely 1.0714x slower
   <geometric>                            5.5352+-0.0272    !     5.9295+-0.0265    ?     5.9402+-0.0369       ! definitely 1.0732x slower
   <harmonic>                             4.5247+-0.0275    !     4.8366+-0.0343    ?     4.8590+-0.0477       ! definitely 1.0739x slower

                                            TipOfTree               DynamicOpt             DynamicOpt2          DynamicOpt2 v. TipOfTree 
V8:
   crypto                                92.1294+-0.3719    !    97.0536+-1.8706    ^    83.3741+-0.6893       ^ definitely 1.1050x faster
   deltablue                            143.2966+-1.3033        142.6860+-1.3562    ^   138.8982+-0.6696       ^ definitely 1.0317x faster
   earley-boyer                         100.6071+-0.4585    !   108.4473+-0.9126        108.2658+-1.2111       ! definitely 1.0761x slower
   raytrace                              52.3645+-0.4948    !    55.0936+-0.4014    ?    55.6850+-0.4855       ! definitely 1.0634x slower
   regexp                               110.4607+-0.8697    ?   110.7716+-0.3791    !   112.5372+-1.1640       ! definitely 1.0188x slower
   richards                             247.5590+-1.0163    ^   235.0075+-1.6766        234.1091+-0.7328       ^ definitely 1.0575x faster
   splay                                104.8805+-0.7566    ?   106.2095+-0.5770    ^   104.7761+-0.6962       

   <arithmetic>                         121.6140+-0.2401    ?   122.1813+-0.3841    ^   119.6636+-0.2768       ^ definitely 1.0163x faster
   <geometric>                          110.3617+-0.2388    !   112.5381+-0.4232    ^   109.8188+-0.2973       ^ definitely 1.0049x faster
   <harmonic>                           100.7142+-0.3088    !   103.8665+-0.4832    ^   101.3032+-0.3159       ?

                                            TipOfTree               DynamicOpt             DynamicOpt2          DynamicOpt2 v. TipOfTree 
Kraken:
   ai-astar                            1105.0592+-7.7066    !  1130.2690+-4.0420    !  1147.0835+-6.9637       ! definitely 1.0380x slower
   audio-beat-detection                 480.9968+-2.2409    !   510.7421+-2.2406    ?   516.0047+-4.2511       ! definitely 1.0728x slower
   audio-dft                            443.7356+-16.1293   !   477.5576+-6.2779    ?   479.8115+-6.7655       ! definitely 1.0813x slower
   audio-fft                            378.2326+-4.0596    !   394.4044+-3.0770    !   401.4458+-2.2673       ! definitely 1.0614x slower
   audio-oscillator                     381.0797+-1.3206    ^   355.8970+-2.5926        355.7188+-2.0398       ^ definitely 1.0713x faster
   imaging-darkroom                     540.2286+-3.5820    ?   550.3868+-7.1062    ^   540.4649+-2.3819       ?
   imaging-desaturate                   606.5337+-7.1713    ?   608.6832+-5.2899        606.8448+-1.6518       ?
   imaging-gaussian-blur               1753.3311+-11.1394   !  2310.2974+-5.2848    ?  2320.8847+-16.6530      ! definitely 1.3237x slower
   json-parse-financial                  48.0398+-0.4362    ^    47.3302+-0.2202    !    48.6760+-0.1937       ! definitely 1.0132x slower
   json-stringify-tinderbox              75.2160+-1.0468         74.0179+-0.2819    ?    74.3878+-0.9351         might be 1.0111x faster
   stanford-crypto-aes                  148.1361+-0.7944    !   163.0568+-0.9930        162.7343+-0.9379       ! definitely 1.0985x slower
   stanford-crypto-ccm                  113.2673+-1.2877    !   126.0253+-1.6922    ?   126.2694+-1.5381       ! definitely 1.1148x slower
   stanford-crypto-pbkdf2               343.6412+-1.8972    !   367.0378+-2.6543        363.8759+-2.2831       ! definitely 1.0589x slower
   stanford-crypto-sha256-iterative     133.1163+-0.9564    !   140.6644+-1.3553    ?   143.2696+-1.3289       ! definitely 1.0763x slower

   <arithmetic>                         467.9010+-1.5248    !   518.3121+-1.0232    !   520.5337+-0.8504       ! definitely 1.1125x slower
   <geometric>                          302.6273+-0.8312    !   318.6361+-0.7012    !   320.3008+-0.4731       ! definitely 1.0584x slower
   <harmonic>                           187.3953+-0.5625    !   192.5117+-0.5001    !   194.6711+-0.4082       ! definitely 1.0388x slower

                                            TipOfTree               DynamicOpt             DynamicOpt2          DynamicOpt2 v. TipOfTree 
All benchmarks:
   <arithmetic>                         161.1676+-0.4418    !   176.5371+-0.3106    ?   176.8176+-0.2616       ! definitely 1.0971x slower
   <geometric>                           28.4665+-0.0678    !    30.1160+-0.0719         30.0832+-0.1083       ! definitely 1.0568x slower
   <harmonic>                             7.9791+-0.0471    !     8.5209+-0.0588    ?     8.5580+-0.0820       ! definitely 1.0726x slower
Comment 19 Filip Pizlo 2011-09-02 22:24:40 PDT
> VMs tested:
> "TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
> "DynamicOpt" at /Volumes/Data/pizlo/quartary/OpenSource/WebKitBuild/Release/jsc
> "DynamicOpt2" at /Volumes/Data/pizlo/octonary/OpenSource/WebKitBuild/Release/jsc

Clarification: DynamicOpt is without https://bugs.webkit.org/show_bug.cgi?id=67553, while DynamicOpt2 includes it.
Comment 20 Filip Pizlo 2011-09-03 00:57:34 PDT
This now passes tests on Mac with tiering turned on or off, and is performance neutral:


Benchmark report for SunSpider, V8, and Kraken.

VMs tested:
"TipOfTree" at /Volumes/Data/pizlo/quinary/OpenSource/WebKitBuild/Release/jsc
"DynamicOptOff" at /Volumes/Data/pizlo/quartary/OpenSource/WebKitBuild/Release/jsc

Collected 12 samples per benchmark/VM, with 4 VM invocations per benchmark. Used 1 benchmark iteration per VM
invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting
benchmark execution times with 95% confidence intervals in milliseconds.

                                            TipOfTree             DynamicOptOff                                  
SunSpider:
   3d-cube                                7.8026+-0.1319    ?     8.0115+-0.2923       ? might be 1.0268x slower
   3d-morph                               7.7991+-0.1005          7.7622+-0.1117       
   3d-raytrace                            7.5590+-0.1870    ?     7.7149+-0.1663       ? might be 1.0206x slower
   access-binary-trees                    2.2853+-0.0465    ?     2.3733+-0.0971       ? might be 1.0385x slower
   access-fannkuch                       11.9772+-0.1921    ?    12.0523+-0.2001       ?
   access-nbody                           4.2695+-0.0637    ?     4.3550+-0.0838       ? might be 1.0200x slower
   access-nsieve                          2.5857+-0.0623          2.5259+-0.0335         might be 1.0237x faster
   bitops-3bit-bits-in-byte               1.7484+-0.0610          1.7483+-0.0627       
   bitops-bits-in-byte                    4.5852+-0.2422    ?     4.6009+-0.2260       ?
   bitops-bitwise-and                     3.9057+-0.1456    ^     3.6829+-0.0259       ^ definitely 1.0605x faster
   bitops-nsieve-bits                     5.5786+-0.1629          5.5126+-0.1170         might be 1.0120x faster
   controlflow-recursive                  2.0641+-0.0366          2.0464+-0.0442       
   crypto-aes                             6.6893+-0.1335          6.6408+-0.1453       
   crypto-md5                             2.8474+-0.0735          2.8030+-0.0486         might be 1.0158x faster
   crypto-sha1                            2.2725+-0.0441          2.2244+-0.0288         might be 1.0216x faster
   date-format-tofte                     10.4639+-0.3802    ?    10.5192+-0.3720       ?
   date-format-xparb                      8.7383+-0.2265    ?     8.7740+-0.2847       ?
   math-cordic                            6.4347+-0.0758          6.4201+-0.1139       
   math-partial-sums                      7.7395+-0.1153    ?     7.7764+-0.1624       ?
   math-spectral-norm                     2.5587+-0.0453    ?     2.6333+-0.0696       ? might be 1.0292x slower
   regexp-dna                            10.4553+-0.1850    ?    10.4958+-0.3009       ?
   string-base64                          6.0972+-0.1483          6.0229+-0.0849         might be 1.0123x faster
   string-fasta                           7.6165+-0.1736          7.5390+-0.0975         might be 1.0103x faster
   string-tagcloud                       12.1907+-0.3555    ?    12.2377+-0.3733       ?
   string-unpack-code                    18.9763+-0.3913    ?    18.9901+-0.4418       ?
   string-validate-input                  7.3371+-0.2900    ?     7.4599+-0.3495       ? might be 1.0167x slower

   <arithmetic>                           6.6376+-0.0251    ?     6.6509+-0.0409       ?
   <geometric>                            5.5297+-0.0173    ?     5.5308+-0.0351       ?
   <harmonic>                             4.5317+-0.0182          4.5257+-0.0334       

                                            TipOfTree             DynamicOptOff                                  
V8:
   crypto                                92.2392+-0.3275         91.8243+-0.5196       
   deltablue                            142.5229+-0.8982        142.2090+-1.0506       
   earley-boyer                         100.9745+-0.8094    ?   101.6080+-0.3983       ?
   raytrace                              52.8335+-0.3815    ?    53.4131+-0.4945       ? might be 1.0110x slower
   regexp                               111.2927+-0.2813    ^   110.2772+-0.3147       ^ definitely 1.0092x faster
   richards                             247.2330+-0.5759    ?   249.2783+-2.2003       ?
   splay                                105.6152+-0.3719        104.7256+-0.7142       

   <arithmetic>                         121.8159+-0.2410    ?   121.9051+-0.3111       ?
   <geometric>                          110.7040+-0.2602    ?   110.7184+-0.2196       ?
   <harmonic>                           101.1695+-0.2987    ?   101.2719+-0.2621       ?

                                            TipOfTree             DynamicOptOff                                  
Kraken:
   ai-astar                            1114.1983+-9.0038       1113.4397+-6.8018       
   audio-beat-detection                 482.8746+-3.3869        482.4078+-3.1902       
   audio-dft                            438.5291+-5.3478    ?   443.8677+-8.3113       ? might be 1.0122x slower
   audio-fft                            371.6724+-0.6381    !   378.7265+-0.3997       ! definitely 1.0190x slower
   audio-oscillator                     386.6817+-0.6754    ^   381.2110+-0.8942       ^ definitely 1.0144x faster
   imaging-darkroom                     560.0890+-1.2229    ^   539.1056+-1.3278       ^ definitely 1.0389x faster
   imaging-desaturate                   601.8699+-6.3995        600.7752+-6.6430       
   imaging-gaussian-blur               1768.8532+-6.7428    ^  1754.3062+-2.6500       ^ definitely 1.0083x faster
   json-parse-financial                  48.5644+-0.2174    ?    48.5948+-0.3096       ?
   json-stringify-tinderbox              74.0839+-0.5558    ?    75.0791+-0.4754       ? might be 1.0134x slower
   stanford-crypto-aes                  147.1298+-1.2234    ?   148.3481+-0.5196       ?
   stanford-crypto-ccm                  112.6304+-0.2754    !   113.9235+-0.3058       ! definitely 1.0115x slower
   stanford-crypto-pbkdf2               349.9624+-3.3126        347.6330+-1.4456       
   stanford-crypto-sha256-iterative     133.9685+-0.4310    ^   133.0874+-0.4124       ^ definitely 1.0066x faster

   <arithmetic>                         470.7934+-0.5195    ^   468.6075+-0.9538       ^ definitely 1.0047x faster
   <geometric>                          303.5829+-0.5037        303.3028+-0.6546       
   <harmonic>                           187.5942+-0.5320    ?   188.1997+-0.4789       ?

                                            TipOfTree             DynamicOptOff                                  
All benchmarks:
   <arithmetic>                         162.0510+-0.1704    ^   161.4205+-0.2913       ^ definitely 1.0039x faster
   <geometric>                           28.4908+-0.0604         28.4865+-0.1037       
   <harmonic>                             7.9916+-0.0315          7.9817+-0.0575
Comment 21 Filip Pizlo 2011-09-03 01:12:41 PDT
Created attachment 106254 [details]
the patch

This might be the one, if the bots are happy.
Comment 22 Filip Pizlo 2011-09-03 01:19:52 PDT
Created attachment 106255 [details]
the patch - fix conflicts, style
Comment 23 Filip Pizlo 2011-09-03 01:21:30 PDT
Created attachment 106256 [details]
the patch - removed spurious debug nonsense from v8-crypto
Comment 24 Filip Pizlo 2011-09-03 01:43:14 PDT
Comment on attachment 106256 [details]
the patch - removed spurious debug nonsense from v8-crypto

All tests pass after merge.
Comment 25 Filip Pizlo 2011-09-03 02:01:17 PDT
Created attachment 106257 [details]
the patch - fix windows
Comment 26 Filip Pizlo 2011-09-03 02:53:37 PDT
Created attachment 106259 [details]
the patch - another attempt to fix windows
Comment 27 Collabora GTK+ EWS bot 2011-09-03 04:49:19 PDT
Comment on attachment 106259 [details]
the patch - another attempt to fix windows

Attachment 106259 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/9591065
Comment 28 Filip Pizlo 2011-09-03 21:38:58 PDT
Created attachment 106276 [details]
the patch - fix GTK
Comment 29 Gavin Barraclough 2011-09-05 16:21:54 PDT
Comment on attachment 106276 [details]
the patch - fix GTK

View in context: https://bugs.webkit.org/attachment.cgi?id=106276&action=review

r- for a few comments to look at, basically all looks good though.

> Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h:1006
> +#if CPU(X86_64)

This should be using 'scratchRegister', not 'X86Registers::r11'.

> Source/JavaScriptCore/bytecode/CodeBlock.h:227
> +        PassOwnPtr<CodeBlock> releaseAlternative() { return m_alternative.release(); }

I think the code could probably be refactored quite easily to remove releaseAlternative(), and I think it would be easier to follow if we did so.

The code currently sets the m_fooBarCodeBlock variables on Executables quite aggressively, and then has to reset this value if compilation fails.  Now this involves switching CodeBlocks & switching back - I think it would be conceptually cleaner to just store the new code block in a local RefPtr, and assign across (releasing from the local) to the member only if compilation succeeds.

> Source/JavaScriptCore/dfg/DFGPropagation.cpp:35
> +class Propagator {

Sorry to be a pain, but this .cpp & .h should really probably be DFGPropagator.cpp/.h.

> Source/JavaScriptCore/dfg/DFGPropagation.cpp:138
> +            setPrediction(makePrediction(PredictInt32, DynamicPrediction));

If I understand this correctly, the value 'DynamicPrediction' seems to be being used here to cover two different types of static predictions.
In the case of bitops & valuetoint32, we can statically infer with absolute certainty that the result is an int32.
In the case of ArithMod & UInt32ToNumber I guess you are assuming the results are highly likely to be int32, and as such want to give a string prediction rather than a weak one, despite this being static & not dynamic?

If so, I'd suggest adding a couple of new enum values, say StrongStaticPrediction & StaticInference.  You could make their values equal to DynamicPrediction such that they will currently be handled the same for now, but give us the option to change how these predictions are merged with each other in the future.

> Source/JavaScriptCore/jit/JITCode.h:52
> +        

It might be more helpful to have a separate type indicating the compilation kind.  Using the JITType here artificially makes a tie between a policy (tiered compilation), and a set of particular mechanisms (BaselineJIT, DFGJIT).  Using the same enum to select may make it more difficult to reconfigure this in the future.  E.g. should we want to use the non-spec path of the DFG-JIT with optimization hooks as our low tier JIT & a speculating compile from the DFG-JIT exploiting value profiling.

In such a case, I guess we can just replace DFGJIT with two new enum entries, & deal with working out in all cases whether DFGJIT checks were checking for top tier or the new compiler.  But we may regret not having an explicit type for the tiers.

> Source/JavaScriptCore/wtf/Platform.h:978
> +#endif

Slightly verbose set of config, & requires two switches setting - could just make this:

#if !defined(ENABLE_VALUE_PROFILER)
#define ENABLE_VALUE_PROFILER ENABLE_TIERED_COMPILATION
#endif
#if !defined(ENABLE_DYNAMIC_OPTIMIZATION)
#define ENABLE_DYNAMIC_OPTIMIZATION ENABLE_TIERED_COMPILATION
#endif

So by default both follow the ENABLE_TIERED_COMPILATION setting.
Comment 30 Filip Pizlo 2011-09-05 17:23:45 PDT
> > Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h:1006
> > +#if CPU(X86_64)
> 
> This should be using 'scratchRegister', not 'X86Registers::r11'.

I wanted to do that, but that won't work. :-(  scratchRegister is in MacroAssemblerX86_64.h.  I also can't move this branchAdd32 overload to the subclasses (MacroAssemblerX86_64 and MacroAssemblerX86) because if you have a subclass overloading something in a superclass, the compiler gets confused.  Basically if I have this particular overload of branchAdd32 in MacroAssemblerX86_64 then the branchAdd32's in MacroAssemblerX86Common get hidden.

So we have a number of choices for workarounds:

1) Keep it as is, and accept this as local ugliness.

2) Move the scratchRegister declaration to MacroAssemblerX86Common.

3) Move all branchAdd32 overloads to the MacroAssemblerX86_64 and MacroAssemblerX86.

4) Rename this particular overload of branchAdd32 and put it in MacroAssemblerX86_64.

Which do you prefer?  I'm starting to like (4) best.

> 
> > Source/JavaScriptCore/bytecode/CodeBlock.h:227
> > +        PassOwnPtr<CodeBlock> releaseAlternative() { return m_alternative.release(); }
> 
> I think the code could probably be refactored quite easily to remove releaseAlternative(), and I think it would be easier to follow if we did so.
> 
> The code currently sets the m_fooBarCodeBlock variables on Executables quite aggressively, and then has to reset this value if compilation fails.  Now this involves switching CodeBlocks & switching back - I think it would be conceptually cleaner to just store the new code block in a local RefPtr, and assign across (releasing from the local) to the member only if compilation succeeds.

+1

> 
> > Source/JavaScriptCore/dfg/DFGPropagation.cpp:35
> > +class Propagator {
> 
> Sorry to be a pain, but this .cpp & .h should really probably be DFGPropagator.cpp/.h.

+1

> 
> > Source/JavaScriptCore/dfg/DFGPropagation.cpp:138
> > +            setPrediction(makePrediction(PredictInt32, DynamicPrediction));
> 
> If I understand this correctly, the value 'DynamicPrediction' seems to be being used here to cover two different types of static predictions.
> In the case of bitops & valuetoint32, we can statically infer with absolute certainty that the result is an int32.
> In the case of ArithMod & UInt32ToNumber I guess you are assuming the results are highly likely to be int32, and as such want to give a string prediction rather than a weak one, despite this being static & not dynamic?
> 
> If so, I'd suggest adding a couple of new enum values, say StrongStaticPrediction & StaticInference.  You could make their values equal to DynamicPrediction such that they will currently be handled the same for now, but give us the option to change how these predictions are merged with each other in the future.

What about just renaming DynamicPrediction to StrongPrediction and StaticPrediction to WeakPrediction?  

> 
> > Source/JavaScriptCore/jit/JITCode.h:52
> > +        
> 
> It might be more helpful to have a separate type indicating the compilation kind.  Using the JITType here artificially makes a tie between a policy (tiered compilation), and a set of particular mechanisms (BaselineJIT, DFGJIT).  Using the same enum to select may make it more difficult to reconfigure this in the future.  E.g. should we want to use the non-spec path of the DFG-JIT with optimization hooks as our low tier JIT & a speculating compile from the DFG-JIT exploiting value profiling.
> 
> In such a case, I guess we can just replace DFGJIT with two new enum entries, & deal with working out in all cases whether DFGJIT checks were checking for top tier or the new compiler.  But we may regret not having an explicit type for the tiers.

There isn't a lot of code that relies on this enum.  So do we really need to future-proof this?

Right now we do indeed have a tie between mechanism and policy, because we're using the BaselineJIT as our bottom tier, and the DFGJIT for the top tier.  So to me it makes sense to have one enum for both policy and mechanism, does it not?

> 
> > Source/JavaScriptCore/wtf/Platform.h:978
> > +#endif
> 
> Slightly verbose set of config, & requires two switches setting - could just make this:
> 
> #if !defined(ENABLE_VALUE_PROFILER)
> #define ENABLE_VALUE_PROFILER ENABLE_TIERED_COMPILATION
> #endif
> #if !defined(ENABLE_DYNAMIC_OPTIMIZATION)
> #define ENABLE_DYNAMIC_OPTIMIZATION ENABLE_TIERED_COMPILATION
> #endif
> 
> So by default both follow the ENABLE_TIERED_COMPILATION setting.

+1
Comment 31 Filip Pizlo 2011-09-05 17:31:39 PDT
> > Source/JavaScriptCore/bytecode/CodeBlock.h:227
> > +        PassOwnPtr<CodeBlock> releaseAlternative() { return m_alternative.release(); }
> 
> I think the code could probably be refactored quite easily to remove releaseAlternative(), and I think it would be easier to follow if we did so.
> 
> The code currently sets the m_fooBarCodeBlock variables on Executables quite aggressively, and then has to reset this value if compilation fails.  Now this involves switching CodeBlocks & switching back - I think it would be conceptually cleaner to just store the new code block in a local RefPtr, and assign across (releasing from the local) to the member only if compilation succeeds.

Just remembered why I did it this way: CodeBlock is not ref counted.  But do we really want to make it ref counted just for this case?
Comment 32 Gavin Barraclough 2011-09-05 17:47:20 PDT
> Just remembered why I did it this way: CodeBlock is not ref counted.  But do we really want to make it ref counted just for this case?

Gah, sorry, meant OwnPtr rather than RefPtr.  I think the suggestion still stands? - code block could be a local OwnPtr, released & assigned to the executable only if compilation is successful.
Comment 33 Filip Pizlo 2011-09-05 17:49:31 PDT
Created attachment 106369 [details]
the patch - partial fix review

For the more subtle parts of the review I proceeded as follows.  For the last two bullets, I kept the code as-is, because AFAIK that's the lesser of the various evils right now.

branchAdd32 should use scratchRegister not r11:
    I renamed it to branchAdd32Absolute and moved it to the subclasses.

DynamicPrediction:
    I renamed DynamicPrediction to StrongPrediction and StaticPrediction to WeakPrediction.

setting m_fooBarCodeBlock before we know that optimizing will succeed:
    I kept this as-is, because CodeBlock is not ref-counted, and the CodeBlock::m_alternative pointer is an OwnPtr that needs to be initialized prior to compiling.  So we need a releaseAlternative() method, and we need to call this method in Executable.cpp in a number of places, if we want to back out of optimization.  We can either (1) make CodeBlock ref-counted just to satisfy this case and get rid of releaseAlternative(), or (2) keep it as-is.

JITType enum:
    I kept this as-is, because currently having two enums (policy and mechanism) would just mean having two enums that say the same thing.
Comment 34 Filip Pizlo 2011-09-05 17:51:26 PDT
(In reply to comment #32)
> > Just remembered why I did it this way: CodeBlock is not ref counted.  But do we really want to make it ref counted just for this case?
> 
> Gah, sorry, meant OwnPtr rather than RefPtr.  I think the suggestion still stands? - code block could be a local OwnPtr, released & assigned to the executable only if compilation is successful.

But then we still need releaseAlternative().

We need CodeBlock::m_alternative to be set prior to invoking optimized compilation, because optimized compilation reads this field.

So if optimized compilation fails, we need releaseAlternative().  I know, it's nasty.  But that leaves us with these choices:

1) Leave it as-is.

2) Make CodeBlock ref-counted to satisfy this case.

3) Don't set m_fooBarCodeBlock, but still release() it, prior to optimized compilation.  That might make the code make a bit more sense, but it wouldn't be too different from what I have right now.
Comment 35 Early Warning System Bot 2011-09-05 18:05:38 PDT
Comment on attachment 106369 [details]
the patch - partial fix review

Attachment 106369 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/9592670
Comment 36 Filip Pizlo 2011-09-05 18:07:39 PDT
Created attachment 106370 [details]
the patch - partial fix review, fix Qt build
Comment 37 Gavin Barraclough 2011-09-05 20:51:13 PDT
> branchAdd32 should use scratchRegister not r11:
>     I renamed it to branchAdd32Absolute and moved it to the subclasses.

You forgot to remove branchAdd32 from MacroAssemblerX86Common.h!
Also branchAdd32Absolute should be named branchAdd32, other AbsoluteAddress'ed methods aren't named in this fashion.

I'm fine with the other changes / review push-back, r+ with MacroAssemblerX86Common::branchAdd32 removed, branchAdd32Absolute renamed to branchAdd32.
Comment 38 Filip Pizlo 2011-09-05 21:09:18 PDT
(In reply to comment #37)
> > branchAdd32 should use scratchRegister not r11:
> >     I renamed it to branchAdd32Absolute and moved it to the subclasses.
> 
> You forgot to remove branchAdd32 from MacroAssemblerX86Common.h!
> Also branchAdd32Absolute should be named branchAdd32, other AbsoluteAddress'ed methods aren't named in this fashion.

If I rename it to branchAdd32() then I get overload hiding madness.  If I just change the name to branchAdd32() but leave it in the subclasses, I get the following errors for uses of the other branchAdd32() overloads:

/Volumes/Data/pizlo/quartary/OpenSource/Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp:686:102:{686:34-686:51}{686:102-686:114}: error: too many arguments to function call, expected 3, have 4 [2]
                 speculationCheck(m_jit.branchAdd32(MacroAssembler::Overflow, op2.gpr(), Imm32(imm1), result.gpr()));
                                  ~~~~~~~~~~~~~~~~~                                                   ^~~~~~~~~~~~

Notice that here we're trying to call MacroAssemblerX86Common::branchAdd32(ResultCondition, RegisterID, TrustedImm32, RegisterID).  But the compiler assumes that MacroAssemblerX86_64::branchAdd32(ResultCondition, RegisterID, AbsoluteAddress) is hiding that method, along with the other overloads.  This is because C++ apparently dictates that name resolution is done before overload resolution, so as soon as MacroAssemblerX86_64::branchAdd32 is found, none of the overloads of MacroAssemblerX86Common::branchAdd32are visible.

> I'm fine with the other changes / review push-back, r+ with MacroAssemblerX86Common::branchAdd32 removed, branchAdd32Absolute renamed to branchAdd32.

OK, what would you prefer? :-)  Once again, the options are:

1) Do as I did before.  Move branchAdd32Absolute to MacroAssemblerX86Common, but then I have to use r11 directly instead of via the scratchRegister alias.

2) As with (1) but move the scratchRegister declaration to MacroAssemblerX86Common.

3) Move all branchAdd32 overloads to the MacroAssemblerX86_64 and MacroAssemblerX86.  I don't like this option because that's a lot of code duplication.

4) Rename this particular overload of branchAdd32 to branchAdd32Absolute and put it in MacroAssemblerX86_64, which is what I had done for this patch.

I tend to think that (4) is the lesser of the various evils, though I'm happy with any of the other approaches.  I'm not trying to advocate styles here ... I totally agree that branchAdd32Absolute "should" be called branchAdd32, and that we "should" refer to scratchRegister instead of directly to r11, but we're pushing up against bizarre corner cases of C++ here.
Comment 39 Filip Pizlo 2011-09-05 21:16:27 PDT
Created attachment 106384 [details]
the patch - more fix review, sort of

This removes branchAdd32 from MacroAssembler86Common, but still calls the absolute one branchAdd32Absolute due to C++ overload hiding rules.  I'm happy to use some other strategy to get around the overload hiding ... this was just the one I thought was least evil.
Comment 40 Gavin Barraclough 2011-09-05 23:32:18 PDT
(In reply to comment #39)
> Created an attachment (id=106384) [details]
> the patch - more fix review, sort of
> 
> This removes branchAdd32 from MacroAssembler86Common, but still calls the absolute one branchAdd32Absolute due to C++ overload hiding rules.  I'm happy to use some other strategy to get around the overload hiding ... this was just the one I thought was least evil.

Ah! - I see - you just need to add a using directive in the MacroAssemblerX86/MacroAssemblerX86_64 classes:
    using MacroAssemblerX86Common::branchAdd32;

You should see a bunch of them at these head of these classes already, around lines 45-55 of both files.
Comment 41 Filip Pizlo 2011-09-05 23:40:31 PDT
(In reply to comment #40)
> (In reply to comment #39)
> > Created an attachment (id=106384) [details] [details]
> > the patch - more fix review, sort of
> > 
> > This removes branchAdd32 from MacroAssembler86Common, but still calls the absolute one branchAdd32Absolute due to C++ overload hiding rules.  I'm happy to use some other strategy to get around the overload hiding ... this was just the one I thought was least evil.
> 
> Ah! - I see - you just need to add a using directive in the MacroAssemblerX86/MacroAssemblerX86_64 classes:
>     using MacroAssemblerX86Common::branchAdd32;
> 
> You should see a bunch of them at these head of these classes already, around lines 45-55 of both files.

Ah, that's pretty cool.  I didn't know using was that powerful! :-)  OK, I've fixed this and will commit with this change.
Comment 42 Filip Pizlo 2011-09-05 23:45:00 PDT
Created attachment 106393 [details]
the patch

Will run a final batch of tests and then cq+.
Comment 43 Gavin Barraclough 2011-09-06 00:02:08 PDT
Comment on attachment 106393 [details]
the patch

preemptive r+ pending cq+ on your testing :-)
Comment 44 WebKit Review Bot 2011-09-06 02:23:04 PDT
Comment on attachment 106393 [details]
the patch

Clearing flags on attachment: 106393

Committed r94559: <http://trac.webkit.org/changeset/94559>
Comment 45 WebKit Review Bot 2011-09-06 02:23:13 PDT
All reviewed patches have been landed.  Closing bug.