ASSIGNED 153019
Improved results profiling for div.
https://bugs.webkit.org/show_bug.cgi?id=153019
Summary Improved results profiling for div.
Mark Lam
Reported 2016-01-12 09:41:20 PST
The existing system makes implications from how many times a certain div slower path is taken. We can collect more accurate result profiling data.
Attachments
work in progress for archiving. (23.05 KB, patch)
2016-01-13 11:20 PST, Mark Lam
no flags
rebased work in progress. (13.02 KB, patch)
2016-01-13 15:15 PST, Mark Lam
no flags
proposed fix. (32.88 KB, patch)
2016-01-13 21:56 PST, Mark Lam
no flags
x86_64 benchmark result. (62.73 KB, text/plain)
2016-01-13 21:56 PST, Mark Lam
no flags
Mark Lam
Comment 1 2016-01-13 11:20:40 PST
Created attachment 268882 [details] work in progress for archiving.
Mark Lam
Comment 2 2016-01-13 15:15:05 PST
Created attachment 268907 [details] rebased work in progress.
Mark Lam
Comment 3 2016-01-13 21:56:05 PST
Created attachment 268939 [details] proposed fix.
Mark Lam
Comment 4 2016-01-13 21:56:47 PST
Created attachment 268940 [details] x86_64 benchmark result.
Mark Lam
Comment 5 2016-01-13 21:58:56 PST
From the attached x86_64 benchmark results, the following appear to be real (i.e. reproducible on retests): LongSpider crypto-aes 772.5283+-4.2715 ! 879.6557+-5.7639 ! definitely 1.1387x slower JSRegress adapt-to-double-divide 19.0715+-0.4259 ^ 17.5865+-0.2160 ^ definitely 1.0844x faster boolean-test 4.4102+-0.0386 ^ 3.9437+-0.0649 ^ definitely 1.1183x faster function-test 4.2060+-0.0576 ^ 3.6067+-0.0295 ^ definitely 1.1662x faster number-test 4.3185+-0.0423 ^ 3.9034+-0.0380 ^ definitely 1.1064x faster object-test 4.1588+-0.0930 ^ 3.6375+-0.1198 ^ definitely 1.1433x faster string-sub 53.1008+-1.3066 ^ 49.2135+-1.5713 ^ definitely 1.0790x faster string-test 4.2141+-0.0252 ^ 3.7333+-0.1327 ^ definitely 1.1288x faster undefined-test 4.4363+-0.0612 ^ 3.8700+-0.0653 ^ definitely 1.1463x faster Details on why crypto-aes is slower to follow after I gather a bit more data to make my case.
Mark Lam
Comment 6 2016-01-13 22:58:05 PST
(In reply to comment #5) > From the attached x86_64 benchmark results, the following appear to be real > (i.e. reproducible on retests): > LongSpider > crypto-aes 772.5283+-4.2715 ! 879.6557+-5.7639 ! definitely 1.1387x slower > > Details on why crypto-aes is slower to follow after I gather a bit more data to make my case. The baseline reference build has 202 OSR exits. The patched build has 2322 OSR exits. Of those OSR exits, about 1179 is due to AESEncryptCtr#Dp5nqr, and 1153 is due to AESDecryptCtr#BpghTe. Note the count of OSR exits is imprecise due to concurrent JIT'ing. However, they are always in that ballpark. In contrast, with the baseline build, there were 202 OSR exits due to AESEncryptCtr#Dp5nqr, and 0 due to AESDecryptCtr#BpghTe. The AESEncryptCtr#Dp5nqr OSR exits are due to a pre-existing bug in how we use the Int52Rep node. See https://bugs.webkit.org/show_bug.cgi?id=153091. The AESDecryptCtr#BpghTe looks like it is due to the same issue manifested in a different way. Here's the IR: 435:<!3:loc17> ArithDiv(DoubleRep:@814<Double>, DoubleRep:@815<Double>, Double|MustGen|PureInt|MayHaveNonIntResult|UseAsInt, Int52, NotSet, Exits, bc#546) 436:<!0:-> MovHint(DoubleRep:@435<Double>, MustGen, loc24, W:SideState, ClobbersExit, bc#546, ExitInvalid) 438:< 1:loc18> JSConstant(JS|PureInt|UseAsInt, Boolint32, Int32: 1, bc#551) 817:< 1:loc20> Int52Rep(Check:DoubleRepMachineInt:@435<Double>, Int52|PureInt, Boolint32Nonboolint32Int52, Exits, bc#551) 818:< 1:loc19> Int52Constant(Int52|PureInt, Boolint32Nonboolint32Int52, Int32: 1, bc#551) 439:<!3:loc19> ArithSub(Int52Rep:@817<Int52>, Int52Rep:@818<Int52>, Int52|MustGen|PureInt|UseAsInt, Int52, CheckOverflow, Exits, bc#551) I haven't dug deep enough on this one yet, but note that there's also an Int52Rep node used to convert the result of an ArithDiv which produces a double. This is similar to the AESEncryptCtr#Dp5nqr case. I'm not sure what the right solution is yet: 1. Should the ArithDiv have coerced its result into an int because it was told that the result will be UseAsInt? 2. Should we have a ForceInt52Rep which will do truncation of fraction parts unlike the Int52Rep node?
Mark Lam
Comment 7 2016-01-22 15:45:30 PST
The fix for https://bugs.webkit.org/show_bug.cgi?id=153379 has mitigated the regression due to OSR exits from AESDecryptCtr#BpghTe. However, I'm still seeing a significant increase in OSR exits from AESEncryptCtr#Dp5nqr. Continuing to investigate.
Radar WebKit Bug Importer
Comment 8 2016-02-16 14:44:46 PST
Geoffrey Garen
Comment 9 2016-02-16 14:51:38 PST
The proposed fix for the OSR exit regression is https://bugs.webkit.org/show_bug.cgi?id=153091.
Note You need to log in before you can comment on or make changes to this bug.