Bug 153019 - Improved results profiling for div.
Summary: Improved results profiling for div.
Status: ASSIGNED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Local Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Mark Lam
URL:
Keywords: InRadar
Depends on: 153046 153080 153091 153379
Blocks:
  Show dependency treegraph
 
Reported: 2016-01-12 09:41 PST by Mark Lam
Modified: 2016-02-16 14:51 PST (History)
7 users (show)

See Also:


Attachments
work in progress for archiving. (23.05 KB, patch)
2016-01-13 11:20 PST, Mark Lam
no flags Details | Formatted Diff | Diff
rebased work in progress. (13.02 KB, patch)
2016-01-13 15:15 PST, Mark Lam
no flags Details | Formatted Diff | Diff
proposed fix. (32.88 KB, patch)
2016-01-13 21:56 PST, Mark Lam
no flags Details | Formatted Diff | Diff
x86_64 benchmark result. (62.73 KB, text/plain)
2016-01-13 21:56 PST, Mark Lam
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Lam 2016-01-12 09:41:20 PST
The existing system makes implications from how many times a certain div slower path is taken.  We can collect more accurate result profiling data.
Comment 1 Mark Lam 2016-01-13 11:20:40 PST
Created attachment 268882 [details]
work in progress for archiving.
Comment 2 Mark Lam 2016-01-13 15:15:05 PST
Created attachment 268907 [details]
rebased work in progress.
Comment 3 Mark Lam 2016-01-13 21:56:05 PST
Created attachment 268939 [details]
proposed fix.
Comment 4 Mark Lam 2016-01-13 21:56:47 PST
Created attachment 268940 [details]
x86_64 benchmark result.
Comment 5 Mark Lam 2016-01-13 21:58:56 PST
From the attached x86_64 benchmark results, the following appear to be real (i.e. reproducible on retests):
LongSpider
   crypto-aes                                       772.5283+-4.2715     !    879.6557+-5.7639        ! definitely 1.1387x slower
JSRegress
   adapt-to-double-divide                            19.0715+-0.4259     ^     17.5865+-0.2160        ^ definitely 1.0844x faster
   boolean-test                                       4.4102+-0.0386     ^      3.9437+-0.0649        ^ definitely 1.1183x faster
   function-test                                      4.2060+-0.0576     ^      3.6067+-0.0295        ^ definitely 1.1662x faster
   number-test                                        4.3185+-0.0423     ^      3.9034+-0.0380        ^ definitely 1.1064x faster
   object-test                                        4.1588+-0.0930     ^      3.6375+-0.1198        ^ definitely 1.1433x faster
   string-sub                                        53.1008+-1.3066     ^     49.2135+-1.5713        ^ definitely 1.0790x faster
   string-test                                        4.2141+-0.0252     ^      3.7333+-0.1327        ^ definitely 1.1288x faster
   undefined-test                                     4.4363+-0.0612     ^      3.8700+-0.0653        ^ definitely 1.1463x faster

Details on why crypto-aes is slower to follow after I gather a bit more data to make my case.
Comment 6 Mark Lam 2016-01-13 22:58:05 PST
(In reply to comment #5)
> From the attached x86_64 benchmark results, the following appear to be real
> (i.e. reproducible on retests):
> LongSpider
>    crypto-aes                                       772.5283+-4.2715     !   879.6557+-5.7639        ! definitely 1.1387x slower
> 
> Details on why crypto-aes is slower to follow after I gather a bit more data to make my case.

The baseline reference build has 202 OSR exits.
The patched build has 2322 OSR exits.

Of those OSR exits, about 1179 is due to AESEncryptCtr#Dp5nqr, and 1153 is due to AESDecryptCtr#BpghTe.  Note the count of OSR exits is imprecise due to concurrent JIT'ing.  However, they are always in that ballpark.

In contrast, with the baseline build, there were 202 OSR exits due to AESEncryptCtr#Dp5nqr, and 0 due to AESDecryptCtr#BpghTe.

The AESEncryptCtr#Dp5nqr OSR exits are due to a pre-existing bug in how we use the Int52Rep node.  See https://bugs.webkit.org/show_bug.cgi?id=153091.

The AESDecryptCtr#BpghTe looks like it is due to the same issue manifested in a different way.  Here's the IR:

 435:<!3:loc17>	ArithDiv(DoubleRep:@814<Double>, DoubleRep:@815<Double>, Double|MustGen|PureInt|MayHaveNonIntResult|UseAsInt, Int52, NotSet, Exits, bc#546)
 436:<!0:->	MovHint(DoubleRep:@435<Double>, MustGen, loc24, W:SideState, ClobbersExit, bc#546, ExitInvalid)
 438:< 1:loc18>	JSConstant(JS|PureInt|UseAsInt, Boolint32, Int32: 1, bc#551)
 817:< 1:loc20>	Int52Rep(Check:DoubleRepMachineInt:@435<Double>, Int52|PureInt, Boolint32Nonboolint32Int52, Exits, bc#551)
 818:< 1:loc19>	Int52Constant(Int52|PureInt, Boolint32Nonboolint32Int52, Int32: 1, bc#551)
 439:<!3:loc19>	ArithSub(Int52Rep:@817<Int52>, Int52Rep:@818<Int52>, Int52|MustGen|PureInt|UseAsInt, Int52, CheckOverflow, Exits, bc#551)

I haven't dug deep enough on this one yet, but note that there's also an Int52Rep node used to convert the result of an ArithDiv which produces a double.  This is similar to the AESEncryptCtr#Dp5nqr case.

I'm not sure what the right solution is yet:
1. Should the ArithDiv have coerced its result into an int because it was told that the result will be UseAsInt?
2. Should we have a ForceInt52Rep which will do truncation of fraction parts unlike the Int52Rep node?
Comment 7 Mark Lam 2016-01-22 15:45:30 PST
The fix for https://bugs.webkit.org/show_bug.cgi?id=153379 has mitigated the regression due to OSR exits from AESDecryptCtr#BpghTe.  However, I'm still seeing a significant increase in OSR exits from AESEncryptCtr#Dp5nqr.  Continuing to investigate.
Comment 8 Radar WebKit Bug Importer 2016-02-16 14:44:46 PST
<rdar://problem/24685476>
Comment 9 Geoffrey Garen 2016-02-16 14:51:38 PST
The proposed fix for the OSR exit regression is https://bugs.webkit.org/show_bug.cgi?id=153091.