RESOLVED FIXED Bug 139847
BytecodeGenerator ".call" and ".apply" is exponential in nesting depth
https://bugs.webkit.org/show_bug.cgi?id=139847
Summary BytecodeGenerator ".call" and ".apply" is exponential in nesting depth
Mike Fikes
Reported 2014-12-19 16:51:41 PST
If I define a function inc=function( x ) { return x + 1; } and make a deeply composed invocation of it inc(inc(inc(inc(inc(inc(inc(inc(inc(inc(inc(inc (inc(inc(inc(inc(inc(inc(inc(inc(inc(1))))))))))))))))))))) this will result in the value 22. If I revise the deeply composed expression to instead make use of call, passing in null for this, as inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, 1))))))))))))))))))))) this will also produce the value 22, but on JavaScriptCore this appears to consume O(2^n) memory, where n is the number of nested calls. This can easily exhaust memory; in the case above, evaluating this expression can consume several GB. (This is not the case if I try this JavaScript in Firefox or Chrome.) What have I done so far includes producing a stack trace for when it fails on iOS, and in that trace, I see a deep call stack (80 to 100 frames) in the JavaScriptCore engine with frames like the following 4 repeated down the stack: frame #77: 0x00f324dc JavaScriptCore`JSC::CallFunctionCallDotNode::emitBytecode(JSC::BytecodeGenerator&, JSC::RegisterID*) + 1852 frame #78: 0x00f30661 JavaScriptCore`JSC::ArgumentListNode::emitBytecode(JSC::BytecodeGenerator&, JSC::RegisterID*) + 65 frame #79: 0x00bef410 JavaScriptCore`JSC::BytecodeGenerator::emitCall(JSC::OpcodeID, JSC::RegisterID*, JSC::RegisterID*, JSC::ExpectedFunction, JSC::CallArguments&, JSC::JSTextPosition const&, JSC::JSTextPosition const&, JSC::JSTextPosition const&) + 368 frame #80: 0x00bef28d JavaScriptCore`JSC::BytecodeGenerator::emitCall(JSC::RegisterID*, JSC::RegisterID*, JSC::ExpectedFunction, JSC::CallArguments&, JSC::JSTextPosition const&, JSC::JSTextPosition const&, JSC::JSTextPosition const&) + 77 I've also done some experimentation where refactoring a few of the inner calls to temporaries will "truncate" the memory doubling behavior and allow things to succeed var temp1 = inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, 1))))))); var temp2 = inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, temp1))))))); inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, temp2))))))); This makes me thinks that perhaps this is simply a bug with the way JavaScriptCore evaluates deeply nested call invocations. Steps to Reproduce: 1. Get set up to execute JavaScript in JavaScriptCore. (Go to jsfiddle.net with Safari, for example). 2. Evaluate the "inc" function definition in the Description 3. Evaluate the deeply composed "inc.call" expression in the Description 4. You could wrap the result of the "inc.call" in an alert call. The entire JavaScript is: inc=function( x ) { return x + 1; } alert(inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, inc.call(null, 1)))))))))))))))))))))) Expected Results: This should result in the number 22 being evaluated quickly. Actual Results: The number 22 is produced, but only after several seconds transpire. Additionally, if you watch in Activity Monitor, you will see several GB being consumed. Version: iOS 7 & 8, Yosemite. On Yosemite: 8.0.2 (10600.2.5) Notes: Configuration: This occurs on Safari on iOS 7, and 8, and on Safari on Yosemite. Haven't tested on Mavericks.
Attachments
WIP (20.77 KB, patch)
2017-04-17 17:03 PDT, Saam Barati
no flags
patch (24.22 KB, patch)
2017-04-17 18:27 PDT, Saam Barati
no flags
Jérémy Lal
Comment 1 2014-12-21 13:43:28 PST
Confirmed reproduced on Linux debian jessie using webkit2gtk 2.6.2 on epiphany browser. WebKitWebProcess goes 100% CPU and consumes many gigabytes as well.
Radar WebKit Bug Importer
Comment 2 2014-12-21 14:07:18 PST
Chris Aljoudi
Comment 3 2014-12-21 18:09:04 PST
I've been able to reproduce using run-jsc from the command line (OS X 10.10.2). The "delay" happens BEFORE any of the functions ever gets to be called. To see this, you can modify the inc function: inc = function(x) { debug(x); return x + 1; }; The observed behavior is: long delay with NO OUTPUT (suggesting the function calls aren't actually 'happening'), then all output is emitted quickly as expected. Tried using -p with jsc to get a profile, but jsc kept running and it started eating huge amounts of memory (> 16GBs just for the test case running with -p). I'll investigate the issue some more.
Mike Fikes
Comment 4 2014-12-22 04:50:07 PST
I was able to reproduce this on an older PowerPC Mac running OS X 10.4.11 and Safari 4.1.3 (circa 2010).
Dieter Komendera
Comment 5 2017-02-13 00:51:11 PST
This still reproduces with Safari Technology Preview Release 23 (Safari 10.2, WebKit 12604.1.5)
Saam Barati
Comment 6 2017-02-14 12:03:56 PST
My guess is we're recursively emitting bytecode in a way that has exponential blowup.
Filip Pizlo
Comment 7 2017-02-14 12:06:52 PST
(In reply to comment #6) > My guess is we're recursively emitting bytecode in a way that has > exponential blowup. Yeah that's totally what's happening. That's hilarious. We should just have a back-off on that optimization. Like, maintain a count of how deep in the "doubling" due to .call, .apply, or other jneq_ptr-based opts. If more than K deep then don't do the optimization. Probably the most optimal way to do it would be upside down: don't do the optimization if there are more than K people below you in the tree who want to do it, so that the optimization kicks in for the leaves of that gross call tree.
Saam Barati
Comment 8 2017-02-14 12:08:45 PST
(In reply to comment #6) > My guess is we're recursively emitting bytecode in a way that has > exponential blowup. Yup. This is indeed what's happening.
Saam Barati
Comment 9 2017-02-14 12:09:06 PST
(In reply to comment #8) > (In reply to comment #6) > > My guess is we're recursively emitting bytecode in a way that has > > exponential blowup. > > Yup. This is indeed what's happening. We generated 134217673 bytecodes for this function!
Filip Pizlo
Comment 10 2017-02-14 12:09:41 PST
(In reply to comment #9) > (In reply to comment #8) > > (In reply to comment #6) > > > My guess is we're recursively emitting bytecode in a way that has > > > exponential blowup. > > > > Yup. This is indeed what's happening. > > We generated 134217673 bytecodes for this function! That's epic!
Saam Barati
Comment 11 2017-02-14 12:10:50 PST
(In reply to comment #7) > (In reply to comment #6) > > My guess is we're recursively emitting bytecode in a way that has > > exponential blowup. > > Yeah that's totally what's happening. That's hilarious. > > We should just have a back-off on that optimization. Like, maintain a count > of how deep in the "doubling" due to .call, .apply, or other jneq_ptr-based > opts. If more than K deep then don't do the optimization. > > Probably the most optimal way to do it would be upside down: don't do the > optimization if there are more than K people below you in the tree who want > to do it, so that the optimization kicks in for the leaves of that gross > call tree. Yeah this sounds reasonable to me.
Dieter Komendera
Comment 12 2017-02-15 05:49:33 PST
For some context, this issue was found here: http://dev.clojure.org/jira/browse/CLJS-910
Saam Barati
Comment 13 2017-04-17 15:04:54 PDT
patch forthcoming.
Saam Barati
Comment 14 2017-04-17 17:03:33 PDT
Created attachment 307321 [details] WIP needs a changelog
Saam Barati
Comment 15 2017-04-17 18:27:19 PDT
Saam Barati
Comment 16 2017-04-17 19:11:00 PDT
Hmmm, something looks wrong. I ran tests locally where we always force the "slow" path, and some tests are failing. I think this means that our bytecode is already slightly broken, since we emit the slow path alongside a bytecode branch. I'm gonna try to figure out why tests are failing.
Saam Barati
Comment 17 2017-04-17 19:12:03 PDT
(In reply to Saam Barati from comment #16) > Hmmm, something looks wrong. I ran tests locally where we always force the > "slow" path, and some tests are failing. I think this means that our > bytecode is already slightly broken, since we emit the slow path alongside a > bytecode branch. I'm gonna try to figure out why tests are failing. Actually, it's fine. The slow path just emits a different error message.
Oliver Hunt
Comment 18 2017-04-17 20:10:32 PDT
Comment on attachment 307331 [details] patch lgtm
Mark Lam
Comment 19 2017-04-17 22:01:20 PDT
Comment on attachment 307331 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=307331&action=review LGTM too with some suggestions and comments. > Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:1206 > + if (m_callOrApplyChildDepth > 4 && emitCallCheck) { The ChangeLog said 6, but you're using a limit of 4 here. Also, should this be >= instead of > because distance of the innermost child from itself is 0, and you wanted to allow up to k levels of calls out from the innermost child, right? Or did I misunderstand your intent? Please also use a symbolic constant here instead of a literal? Maybe define the constant at the top of this file? Alternatively, if this you think this may make a difference for performance tuning, use a new JSC option instead. > Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:1286 > + if (m_callOrApplyChildDepth > 4 && emitCallCheck) { Ditto. > Source/JavaScriptCore/parser/Parser.h:890 > + struct CallOrApplyDepth { I suggest calling this CallOrApplyDepthScope since it is used like a scope. It'll also make the above "callOrApplyDepth ? callOrApplyDepth->maxChildDepth() : 0" line read better as "callOrApplyDepthScope ? callOrApplyDepthScope->maxChildDepth() : 0". I was confused at first when I read "callOrApplyDepth->maxChildDepth()". > Source/JavaScriptCore/parser/Parser.h:895 > + , m_childDepth(m_depth) I suggest renaming this to m_depthOfInnermostChild. The name m_childDepth is a bit ambiguous and can be confused with the depth of my immediate child when this value is actually for tracking the depth of the innermost child. > Source/JavaScriptCore/parser/Parser.h:900 > + size_t maxChildDepth() const I suggest naming this distanceFromInnermostChild(). The name maxChildDepth() is a bit confusing to me because "m_childDepth" (or my preferred "m_depthOfInnermostChild") is an absolute value set from the depth value that each CallOrApplyDepthScope is at, while the value returned by this function is actually a relative value of distance from the innermost child. I suggested "distanceFrom..." which it implies a relative value with the innermost child as the reference. Similarly, I suggest renaming the argument / parameter values to match m_depthOfInnermostChild or distanceFromInnermostChild as appropriate.
WebKit Commit Bot
Comment 20 2017-04-17 23:43:57 PDT
Comment on attachment 307331 [details] patch Clearing flags on attachment: 307331 Committed r215453: <http://trac.webkit.org/changeset/215453>
WebKit Commit Bot
Comment 21 2017-04-17 23:43:59 PDT
All reviewed patches have been landed. Closing bug.
Ryan Haddad
Comment 22 2017-04-18 09:49:43 PDT
The test for this change is failing on debug bots: https://build.webkit.org/builders/Apple%20Sierra%20Debug%20JSC%20%28Tests%29/builds/191 stress/call-apply-exponential-bytecode-size.js.default: Exception: RangeError: Maximum call stack size exceeded. stress/call-apply-exponential-bytecode-size.js.default: eval@[native code] stress/call-apply-exponential-bytecode-size.js.default: randomApplyOrCall@call-apply-exponential-bytecode-size.js:40:16 stress/call-apply-exponential-bytecode-size.js.default: global code@call-apply-exponential-bytecode-size.js:44:25 stress/call-apply-exponential-bytecode-size.js.default: ERROR: Unexpected exit code: 3
Saam Barati
Comment 23 2017-04-18 11:39:55 PDT
(In reply to Ryan Haddad from comment #22) > The test for this change is failing on debug bots: > > https://build.webkit.org/builders/Apple%20Sierra%20Debug%20JSC%20%28Tests%29/ > builds/191 > > stress/call-apply-exponential-bytecode-size.js.default: Exception: > RangeError: Maximum call stack size exceeded. > stress/call-apply-exponential-bytecode-size.js.default: eval@[native code] > stress/call-apply-exponential-bytecode-size.js.default: > randomApplyOrCall@call-apply-exponential-bytecode-size.js:40:16 > stress/call-apply-exponential-bytecode-size.js.default: global > code@call-apply-exponential-bytecode-size.js:44:25 > stress/call-apply-exponential-bytecode-size.js.default: ERROR: Unexpected > exit code: 3 Will fix shortly as I address Mark's comments.
Saam Barati
Comment 24 2017-04-18 12:02:55 PDT
Addressed Mark's comments and hopefully the debug stack overflow in: https://trac.webkit.org/changeset/215474/webkit
Note You need to log in before you can comment on or make changes to this bug.