Bug 70418

Summary: DFG JIT 32_64 - improve double boxing/unboxing
Product: WebKit Reporter: Yuqiang Xian <yuqiang.xian>
Component: JavaScriptCoreAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: barraclough, fpizlo, ggaren, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
proposed patch none

Yuqiang Xian
Reported 2011-10-19 07:17:28 PDT
The double boxing/unboxing in DFG JIT 32_64 is currently not implemented efficiently, which tries to exchange data through memory. On X86 some SSE instructions can help us on such operations with better performance, which is also marked as a FIXME in DFG 32_64.
Attachments
proposed patch (15.61 KB, patch)
2011-10-19 07:32 PDT, Yuqiang Xian
no flags
Yuqiang Xian
Comment 1 2011-10-19 07:32:24 PDT
Created attachment 111614 [details] proposed patch
WebKit Review Bot
Comment 2 2011-10-19 07:36:56 PDT
Attachment 111614 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1 Source/JavaScriptCore/assembler/X86Assembler.h:1421: movd_rr is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4] Source/JavaScriptCore/assembler/X86Assembler.h:1491: psllq_i8r is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4] Source/JavaScriptCore/assembler/X86Assembler.h:1498: psrlq_i8r is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4] Source/JavaScriptCore/assembler/X86Assembler.h:1505: por_rr is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4] Total errors found: 4 in 7 files If any of these errors are false positives, please file a bug against check-webkit-style.
Yuqiang Xian
Comment 3 2011-10-19 07:40:25 PDT
Performance result on Linux IA32 (Core i7 Nehalem) : 29% on Kraken, 7% on SunSpider and 2% on V8. (Against the original 32bit DFG w/o this change) And with this fix the 32bit DFG performance advantage over the NON-DFG is - SunSpider: TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: 1.066x as fast 231.0ms +/- 0.8% 216.6ms +/- 1.0% significant ============================================================================= 3d: - 33.1ms +/- 3.9% 32.3ms +/- 4.8% cube: ?? 12.0ms +/- 10.5% 12.9ms +/- 11.2% not conclusive: might be *1.075x as slow* morph: - 10.1ms +/- 2.2% 10.0ms +/- 0.0% raytrace: 1.170x as fast 11.0ms +/- 0.0% 9.4ms +/- 3.9% significant access: 1.89x as fast 35.3ms +/- 1.4% 18.7ms +/- 3.6% significant binary-trees: 1.50x as fast 3.0ms +/- 0.0% 2.0ms +/- 0.0% significant fannkuch: 1.85x as fast 17.4ms +/- 2.1% 9.4ms +/- 3.9% significant nbody: 2.48x as fast 9.9ms +/- 2.3% 4.0ms +/- 0.0% significant nsieve: 1.52x as fast 5.0ms +/- 0.0% 3.3ms +/- 10.5% significant bitops: 1.26x as fast 19.3ms +/- 1.8% 15.3ms +/- 3.2% significant 3bit-bits-in-byte: 2.00x as fast 2.0ms +/- 0.0% 1.0ms +/- 0.0% significant bits-in-byte: 1.20x as fast 5.3ms +/- 6.5% 4.4ms +/- 8.4% significant bitwise-and: 1.54x as fast 6.0ms +/- 0.0% 3.9ms +/- 5.8% significant nsieve-bits: - 6.0ms +/- 0.0% 6.0ms +/- 0.0% controlflow: ?? 2.0ms +/- 0.0% 2.1ms +/- 10.8% not conclusive: might be *1.050x as slow* recursive: ?? 2.0ms +/- 0.0% 2.1ms +/- 10.8% not conclusive: might be *1.050x as slow* crypto: *1.026x as slow* 15.6ms +/- 2.4% 16.0ms +/- 0.0% significant aes: - 9.0ms +/- 0.0% 9.0ms +/- 0.0% md5: *1.111x as slow* 3.6ms +/- 10.3% 4.0ms +/- 0.0% significant sha1: - 3.0ms +/- 0.0% 3.0ms +/- 0.0% date: *1.109x as slow* 23.0ms +/- 2.5% 25.5ms +/- 1.5% significant format-tofte: *1.186x as slow* 11.3ms +/- 3.1% 13.4ms +/- 2.8% significant format-xparb: *1.034x as slow* 11.7ms +/- 3.0% 12.1ms +/- 1.9% significant math: 1.044x as fast 26.2ms +/- 1.2% 25.1ms +/- 2.1% significant cordic: *1.073x as slow* 8.2ms +/- 3.7% 8.8ms +/- 3.4% significant partial-sums: ?? 13.0ms +/- 0.0% 13.3ms +/- 2.6% not conclusive: might be *1.023x as slow* spectral-norm: 1.67x as fast 5.0ms +/- 0.0% 3.0ms +/- 0.0% significant regexp: ?? 14.6ms +/- 2.5% 14.9ms +/- 1.5% not conclusive: might be *1.021x as slow* dna: ?? 14.6ms +/- 2.5% 14.9ms +/- 1.5% not conclusive: might be *1.021x as slow* string: *1.078x as slow* 61.9ms +/- 0.9% 66.7ms +/- 0.5% significant base64: - 5.0ms +/- 0.0% 5.0ms +/- 0.0% fasta: *1.111x as slow* 9.0ms +/- 0.0% 10.0ms +/- 0.0% significant tagcloud: *1.032x as slow* 15.5ms +/- 2.4% 16.0ms +/- 0.0% significant unpack-code: *1.130x as slow* 25.4ms +/- 1.5% 28.7ms +/- 1.2% significant validate-input: - 7.0ms +/- 0.0% 7.0ms +/- 0.0% Kraken: TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: 1.77x as fast 10101.0ms +/- 0.5% 5712.0ms +/- 0.1% significant ============================================================================= ai: 2.66x as fast 2145.7ms +/- 1.2% 806.0ms +/- 0.6% significant astar: 2.66x as fast 2145.7ms +/- 1.2% 806.0ms +/- 0.6% significant audio: 1.54x as fast 2369.6ms +/- 0.7% 1543.1ms +/- 0.3% significant beat-detection: 1.64x as fast 693.7ms +/- 1.0% 422.2ms +/- 0.5% significant dft: 1.51x as fast 609.4ms +/- 1.6% 403.4ms +/- 0.6% significant fft: 2.12x as fast 533.9ms +/- 0.3% 251.7ms +/- 0.4% significant oscillator: 1.143x as fast 532.6ms +/- 0.2% 465.8ms +/- 0.2% significant imaging: 1.90x as fast 4127.1ms +/- 0.2% 2172.9ms +/- 0.2% significant gaussian-blur: 3.55x as fast 2569.5ms +/- 0.2% 724.2ms +/- 0.2% significant darkroom: 1.179x as fast 640.5ms +/- 0.8% 543.1ms +/- 0.4% significant desaturate: 1.013x as fast 917.1ms +/- 0.2% 905.6ms +/- 0.2% significant json: ?? 194.4ms +/- 0.5% 195.1ms +/- 0.4% not conclusive: might be *1.004x as slow* parse-financial: 1.009x as fast 82.4ms +/- 0.4% 81.7ms +/- 0.6% significant stringify-tinderbox: *1.012x as slow* 112.0ms +/- 0.6% 113.4ms +/- 0.7% significant stanford: 1.27x as fast 1264.2ms +/- 0.6% 994.9ms +/- 0.2% significant crypto-aes: *1.021x as slow* 164.3ms +/- 1.0% 167.8ms +/- 0.4% significant crypto-ccm: *1.027x as slow* 137.7ms +/- 0.6% 141.4ms +/- 0.4% significant crypto-pbkdf2: 1.25x as fast 724.3ms +/- 0.6% 579.9ms +/- 0.4% significant crypto-sha256-iterative: 2.25x as fast 237.9ms +/- 0.4% 105.8ms +/- 0.4% significant V8: w/o DFG - Richards: 3791 DeltaBlue: 3567 Crypto: 4285 RayTrace: 6387 EarleyBoyer: 6394 RegExp: 1827 Splay: 3960 ---- Score (version 6): 4025 w/ DFG - Richards: 4829 DeltaBlue: 4192 Crypto: 10176 RayTrace: 7731 EarleyBoyer: 6336 RegExp: 1756 Splay: 4563 ---- Score (version 6): 5025
Geoffrey Garen
Comment 4 2011-10-19 12:40:54 PDT
This patch is a great improvement. Long term, it seems like it would be even better to teach the DFG JIT not to load into GPRs in the first place when dealing with doubles.
Gavin Barraclough
Comment 5 2011-10-19 12:47:59 PDT
Comment on attachment 111614 [details] proposed patch This is excellent. I think we might be able to do even better in some cases - from fillSpeculateDouble, if a value is currently in memory we should probably be loading directly into fpr rather than going via a register - we could then compare to memory to test the tag, or compare the fpr to itself to test form NaN-encoded values. Still, this improvement is simple awesome!
Yuqiang Xian
Comment 6 2011-10-19 16:05:10 PDT
(In reply to comment #4) > This patch is a great improvement. Long term, it seems like it would be even better to teach the DFG JIT not to load into GPRs in the first place when dealing with doubles. Thanks. Yes, we're also seeking the opportunity to eliminate some of the double boxing/unboxings. We kept some code to let a JSValue able to be represented by a single FPR if we know it's actually a Double at compile time, while it's currently unused as it introduces a bit complex logic in code generation. But we'll revisit this for sure.
Yuqiang Xian
Comment 7 2011-10-19 16:05:57 PDT
(In reply to comment #5) > (From update of attachment 111614 [details]) > This is excellent. I think we might be able to do even better in some cases - from fillSpeculateDouble, if a value is currently in memory we should probably be loading directly into fpr rather than going via a register - we could then compare to memory to test the tag, or compare the fpr to itself to test form NaN-encoded values. > Still, this improvement is simple awesome! Hi Gavin, good point! I think it's something that we can do immediately.
Yuqiang Xian
Comment 8 2011-10-19 17:45:02 PDT
(In reply to comment #7) > (In reply to comment #5) > > (From update of attachment 111614 [details] [details]) > > This is excellent. I think we might be able to do even better in some cases - from fillSpeculateDouble, if a value is currently in memory we should probably be loading directly into fpr rather than going via a register - we could then compare to memory to test the tag, or compare the fpr to itself to test form NaN-encoded values. > > Still, this improvement is simple awesome! > > Hi Gavin, good point! I think it's something that we can do immediately. bug #70460 is created for this.
WebKit Review Bot
Comment 9 2011-10-19 17:47:28 PDT
Comment on attachment 111614 [details] proposed patch Clearing flags on attachment: 111614 Committed r97903: <http://trac.webkit.org/changeset/97903>
WebKit Review Bot
Comment 10 2011-10-19 17:47:33 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.