Bug 139847

Summary: BytecodeGenerator ".call" and ".apply" is exponential in nesting depth
Product: WebKit Reporter: Mike Fikes <mike>
Component: JavaScriptCore    Assignee: Saam Barati <sbarati>
Status: RESOLVED FIXED    
Severity: Normal CC: buildbot, chris, commit-queue, dieter, fpizlo, ggaren, jfbastien, kapouer, keith_miller, mark.lam, mcatanzaro, Meanjeanaco, mrowe, msaboff, nekohayo, oliver, ryanhaddad, sbarati, sherbondye, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: Macintosh Intel   
OS: OS X 10.10   
Attachments:
Description    Flags
WIP            none
patch          none

Description Mike Fikes 2014-12-19 16:51:41 PST
If I define a function

inc=function( x ) { return x + 1; }

and make a deeply composed invocation of it

inc(inc(inc(inc(inc(inc(inc(inc(inc(inc(inc(inc
  (inc(inc(inc(inc(inc(inc(inc(inc(inc(1)))))))))))))))))))))

this will result in the value 22. If I revise the deeply composed expression to instead make use of call, passing in null for this, as

inc.call(null, inc.call(null, inc.call(null, inc.call(null,
  inc.call(null, inc.call(null, inc.call(null, inc.call(null,
    inc.call(null, inc.call(null, inc.call(null, inc.call(null,
      inc.call(null, inc.call(null, inc.call(null, inc.call(null,
        inc.call(null, inc.call(null, inc.call(null, inc.call(null,
          inc.call(null, 1)))))))))))))))))))))

this will also produce the value 22, but on JavaScriptCore this appears to consume O(2^n) memory, where n is the number of nested calls. This can easily exhaust memory; in the case above, evaluating this expression can consume several GB. (This is not the case if I try this JavaScript in Firefox or Chrome.)
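The nested expression can also be generated programmatically, which makes the blowup easy to measure at arbitrary depths. This is an illustrative driver, not part of the original report (the `buildNestedCall` helper is a hypothetical name):

```javascript
// Hypothetical repro driver (not from the report): builds the nested
// `inc.call` expression as a string for an arbitrary depth n.
function buildNestedCall(n) {
  let expr = "1";
  for (let i = 0; i < n; i++) {
    expr = "inc.call(null, " + expr + ")";
  }
  return expr;
}

const inc = function (x) { return x + 1; };

// For n nested calls, the result is n + 1. Direct eval sees `inc`
// in the enclosing scope.
const result = eval(buildNestedCall(21)); // → 22
```

On an affected JavaScriptCore build, raising the depth passed to `buildNestedCall` makes the pre-execution delay and memory growth easy to observe.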

What I have done so far includes producing a stack trace for when it fails on iOS. In that trace, I see a deep call stack (80 to 100 frames) inside the JavaScriptCore engine, with frames like the following four repeated down the stack:

frame #77: 0x00f324dc JavaScriptCore`JSC::CallFunctionCallDotNode::emitBytecode(JSC::BytecodeGenerator&, JSC::RegisterID*) + 1852
frame #78: 0x00f30661 JavaScriptCore`JSC::ArgumentListNode::emitBytecode(JSC::BytecodeGenerator&, JSC::RegisterID*) + 65
frame #79: 0x00bef410 JavaScriptCore`JSC::BytecodeGenerator::emitCall(JSC::OpcodeID, JSC::RegisterID*, JSC::RegisterID*, JSC::ExpectedFunction, JSC::CallArguments&, JSC::JSTextPosition const&, JSC::JSTextPosition const&, JSC::JSTextPosition const&) + 368
frame #80: 0x00bef28d JavaScriptCore`JSC::BytecodeGenerator::emitCall(JSC::RegisterID*, JSC::RegisterID*, JSC::ExpectedFunction, JSC::CallArguments&, JSC::JSTextPosition const&, JSC::JSTextPosition const&, JSC::JSTextPosition const&) + 77

I've also experimented with refactoring a few of the inner calls into temporaries, which "truncates" the memory-doubling behavior and allows things to succeed:

var temp1 = inc.call(null, inc.call(null, inc.call(null, inc.call(null,
              inc.call(null, inc.call(null, inc.call(null, 1)))))));

var temp2 = inc.call(null, inc.call(null, inc.call(null, inc.call(null,
              inc.call(null, inc.call(null, inc.call(null, temp1)))))));

inc.call(null, inc.call(null, inc.call(null, inc.call(null,
  inc.call(null, inc.call(null, inc.call(null, temp2)))))));
  
This makes me think that perhaps this is simply a bug in the way JavaScriptCore evaluates deeply nested call invocations.
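As an aside (not from the report), the same chain can also be expressed as a loop, which avoids deep syntactic nesting altogether, so the bytecode generator never sees a deep tree of .call nodes. `applyNTimes` is a hypothetical helper:

```javascript
// Illustrative workaround (not from the report): apply a function n times
// iteratively instead of via a deeply nested expression.
const inc = function (x) { return x + 1; };

function applyNTimes(fn, n, seed) {
  let value = seed;
  for (let i = 0; i < n; i++) {
    value = fn.call(null, value); // still uses .call, but never nested
  }
  return value;
}

applyNTimes(inc, 21, 1); // → 22
```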

Steps to Reproduce:
1. Get set up to execute JavaScript in JavaScriptCore (go to jsfiddle.net in Safari, for example).
2. Evaluate the "inc" function definition from the Description.
3. Evaluate the deeply composed "inc.call" expression from the Description.
4. You can wrap the result of the "inc.call" expression in an alert call. The entire JavaScript is:

inc=function( x ) { return x + 1; }

alert(inc.call(null, inc.call(null, inc.call(null, inc.call(null,
  inc.call(null, inc.call(null, inc.call(null, inc.call(null,
    inc.call(null, inc.call(null, inc.call(null, inc.call(null,
      inc.call(null, inc.call(null, inc.call(null, inc.call(null,
        inc.call(null, inc.call(null, inc.call(null, inc.call(null,
          inc.call(null, 1))))))))))))))))))))))

Expected Results:
This should result in the number 22 being evaluated quickly.

Actual Results:
The number 22 is produced, but only after several seconds pass. Additionally, if you watch in Activity Monitor, you will see several GB of memory being consumed.

Version:
iOS 7 & 8, Yosemite. On Yosemite: 8.0.2 (10600.2.5)

Notes:


Configuration:
This occurs in Safari on iOS 7 and 8, and in Safari on Yosemite. I haven't tested on Mavericks.
Comment 1 Jérémy Lal 2014-12-21 13:43:28 PST
Confirmed: reproduced on Linux (Debian jessie) using webkit2gtk 2.6.2 in the Epiphany browser. WebKitWebProcess goes to 100% CPU and consumes many gigabytes as well.
Comment 2 Radar WebKit Bug Importer 2014-12-21 14:07:18 PST
<rdar://problem/19321122>
Comment 3 Chris Aljoudi 2014-12-21 18:09:04 PST
I've been able to reproduce using run-jsc from the command line (OS X 10.10.2).

The "delay" happens BEFORE any of the functions is ever called. To see this, you can modify the inc function:

inc = function(x) {
    debug(x);
    return x + 1;
};

The observed behavior is: long delay with NO OUTPUT (suggesting the function calls aren't actually 'happening'), then all output is emitted quickly as expected.

I tried using -p with jsc to get a profile, but jsc kept running and started eating huge amounts of memory (> 16 GB just for the test case running with -p).

I'll investigate the issue some more.
Comment 4 Mike Fikes 2014-12-22 04:50:07 PST
I was able to reproduce this on an older PowerPC Mac running OS X 10.4.11 and Safari 4.1.3 (circa 2010).
Comment 5 Dieter Komendera 2017-02-13 00:51:11 PST
This still reproduces with Safari Technology Preview Release 23 (Safari 10.2, WebKit 12604.1.5)
Comment 6 Saam Barati 2017-02-14 12:03:56 PST
My guess is we're recursively emitting bytecode in a way that has exponential blowup.
Comment 7 Filip Pizlo 2017-02-14 12:06:52 PST
(In reply to comment #6)
> My guess is we're recursively emitting bytecode in a way that has
> exponential blowup.

Yeah that's totally what's happening.  That's hilarious.

We should just have a back-off on that optimization.  Like, maintain a count of how deep we are in the "doubling" due to .call, .apply, or other jneq_ptr-based optimizations.  If more than K deep, don't do the optimization.

Probably the best way to do it would be upside down: don't do the optimization if there are more than K nodes below you in the tree that want to do it, so that the optimization kicks in for the leaves of that gross call tree.
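The doubling and the proposed back-off can be modeled with a toy recurrence. This is an illustrative sketch in JavaScript, not the actual JSC implementation; `modeledSize` and the constants are assumptions:

```javascript
// Toy model (not JSC source): each .call node that takes the jneq_ptr fast
// path emits its argument subtree twice (once for the fast path, once for
// the generic slow path), so bytecode size doubles per nesting level. The
// back-off skips the fast path once more than `limit` .call nodes sit below
// the current one, so only the leaves of the chain get optimized.
function modeledSize(depth, limit) {
  if (depth === 0) return 1; // the innermost literal argument
  const inner = modeledSize(depth - 1, limit);
  const takeFastPath = (depth - 1) <= limit; // nodes below us within the limit?
  return takeFastPath
    ? 2 * inner + 1  // fast path duplicates the argument subtree
    : inner + 2;     // generic call emits the subtree once
}

modeledSize(21, Infinity); // → 4194303 (exponential, 2^22 - 1)
modeledSize(21, 6);        // → 283 (linear beyond the first few levels)
```

The model shows why optimizing only near the leaves tames the growth: the doubling is capped at a constant factor and the rest of the chain grows linearly.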
Comment 8 Saam Barati 2017-02-14 12:08:45 PST
(In reply to comment #6)
> My guess is we're recursively emitting bytecode in a way that has
> exponential blowup.

Yup. This is indeed what's happening.
Comment 9 Saam Barati 2017-02-14 12:09:06 PST
(In reply to comment #8)
> (In reply to comment #6)
> > My guess is we're recursively emitting bytecode in a way that has
> > exponential blowup.
> 
> Yup. This is indeed what's happening.

We generated 134217673 bytecodes for this function!
Comment 10 Filip Pizlo 2017-02-14 12:09:41 PST
(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #6)
> > > My guess is we're recursively emitting bytecode in a way that has
> > > exponential blowup.
> > 
> > Yup. This is indeed what's happening.
> 
> We generated 134217673 bytecodes for this function!

That's epic!
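The reported count sits just below a power of two, which is consistent with per-level doubling. A quick back-of-the-envelope check (the base factor of 64 bytecodes per doubled unit is my own illustrative assumption, not a measured value):

```javascript
// Sanity check (illustrative): 21 nesting levels each doubling a base of
// roughly 64 bytecodes gives 2^21 * 64 = 2^27, almost exactly the count
// Saam reports.
const reported = 134217673;
const predicted = Math.pow(2, 21) * 64; // 2^27 = 134217728
predicted - reported; // → 55
```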
Comment 11 Saam Barati 2017-02-14 12:10:50 PST
(In reply to comment #7)
> (In reply to comment #6)
> > My guess is we're recursively emitting bytecode in a way that has
> > exponential blowup.
> 
> Yeah that's totally what's happening.  That's hilarious.
> 
> We should just have a back-off on that optimization.  Like, maintain a count
> of how deep in the "doubling" due to .call, .apply, or other jneq_ptr-based
> opts.  If more than K deep then don't do the optimization.
> 
> Probably the most optimal way to do it would be upside down: don't do the
> optimization if there are more than K people below you in the tree who want
> to do it, so that the optimization kicks in for the leaves of that gross
> call tree.

Yeah this sounds reasonable to me.
Comment 12 Dieter Komendera 2017-02-15 05:49:33 PST
For some context, this issue was found here: http://dev.clojure.org/jira/browse/CLJS-910
Comment 13 Saam Barati 2017-04-17 15:04:54 PDT
patch forthcoming.
Comment 14 Saam Barati 2017-04-17 17:03:33 PDT
Created attachment 307321 [details]
WIP

needs a changelog
Comment 15 Saam Barati 2017-04-17 18:27:19 PDT
Created attachment 307331 [details]
patch
Comment 16 Saam Barati 2017-04-17 19:11:00 PDT
Hmmm, something looks wrong. I ran tests locally where we always force the "slow" path, and some tests are failing. I think this means that our bytecode is already slightly broken, since we emit the slow path alongside a bytecode branch. I'm gonna try to figure out why tests are failing.
Comment 17 Saam Barati 2017-04-17 19:12:03 PDT
(In reply to Saam Barati from comment #16)
> Hmmm, something looks wrong. I ran tests locally where we always force the
> "slow" path, and some tests are failing. I think this means that our
> bytecode is already slightly broken, since we emit the slow path alongside a
> bytecode branch. I'm gonna try to figure out why tests are failing.

Actually, it's fine. The slow path just emits a different error message.
Comment 18 Oliver Hunt 2017-04-17 20:10:32 PDT
Comment on attachment 307331 [details]
patch

lgtm
Comment 19 Mark Lam 2017-04-17 22:01:20 PDT
Comment on attachment 307331 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=307331&action=review

LGTM too with some suggestions and comments.

> Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:1206
> +    if (m_callOrApplyChildDepth > 4 && emitCallCheck) {

The ChangeLog said 6, but you're using a limit of 4 here.
Also, should this be >= instead of >? The distance of the innermost child from itself is 0, and you wanted to allow up to K levels of calls out from the innermost child, right?  Or did I misunderstand your intent?

Please also use a symbolic constant here instead of a literal?  Maybe define the constant at the top of this file?
Alternatively, if you think this may make a difference for performance tuning, use a new JSC option instead.

> Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:1286
> +    if (m_callOrApplyChildDepth > 4 && emitCallCheck) {

Ditto.

> Source/JavaScriptCore/parser/Parser.h:890
> +    struct CallOrApplyDepth {

I suggest calling this CallOrApplyDepthScope since it is used like a scope.  It'll also make the above "callOrApplyDepth ? callOrApplyDepth->maxChildDepth() : 0" line read better as "callOrApplyDepthScope ? callOrApplyDepthScope->maxChildDepth() : 0".  I was confused at first when I read "callOrApplyDepth->maxChildDepth()".

> Source/JavaScriptCore/parser/Parser.h:895
> +            , m_childDepth(m_depth)

I suggest renaming this to m_depthOfInnermostChild.  The name m_childDepth is a bit ambiguous and can be confused with the depth of my immediate child when this value is actually for tracking the depth of the innermost child.

> Source/JavaScriptCore/parser/Parser.h:900
> +        size_t maxChildDepth() const

I suggest naming this distanceFromInnermostChild().  The name maxChildDepth() is a bit confusing to me because "m_childDepth" (or my preferred "m_depthOfInnermostChild") is an absolute value set from the depth value that each CallOrApplyDepthScope is at, while the value returned by this function is actually a relative value: the distance from the innermost child.  I suggested "distanceFrom...", which implies a relative value with the innermost child as the reference.

Similarly, I suggest renaming the argument / parameter values to match m_depthOfInnermostChild or distanceFromInnermostChild as appropriate.
Comment 20 WebKit Commit Bot 2017-04-17 23:43:57 PDT
Comment on attachment 307331 [details]
patch

Clearing flags on attachment: 307331

Committed r215453: <http://trac.webkit.org/changeset/215453>
Comment 21 WebKit Commit Bot 2017-04-17 23:43:59 PDT
All reviewed patches have been landed.  Closing bug.
Comment 22 Ryan Haddad 2017-04-18 09:49:43 PDT
The test for this change is failing on debug bots: 

https://build.webkit.org/builders/Apple%20Sierra%20Debug%20JSC%20%28Tests%29/builds/191

stress/call-apply-exponential-bytecode-size.js.default: Exception: RangeError: Maximum call stack size exceeded.
stress/call-apply-exponential-bytecode-size.js.default: eval@[native code]
stress/call-apply-exponential-bytecode-size.js.default: randomApplyOrCall@call-apply-exponential-bytecode-size.js:40:16
stress/call-apply-exponential-bytecode-size.js.default: global code@call-apply-exponential-bytecode-size.js:44:25
stress/call-apply-exponential-bytecode-size.js.default: ERROR: Unexpected exit code: 3
Comment 23 Saam Barati 2017-04-18 11:39:55 PDT
(In reply to Ryan Haddad from comment #22)
> The test for this change is failing on debug bots: 
> 
> https://build.webkit.org/builders/Apple%20Sierra%20Debug%20JSC%20%28Tests%29/
> builds/191
> 
> stress/call-apply-exponential-bytecode-size.js.default: Exception:
> RangeError: Maximum call stack size exceeded.
> stress/call-apply-exponential-bytecode-size.js.default: eval@[native code]
> stress/call-apply-exponential-bytecode-size.js.default:
> randomApplyOrCall@call-apply-exponential-bytecode-size.js:40:16
> stress/call-apply-exponential-bytecode-size.js.default: global
> code@call-apply-exponential-bytecode-size.js:44:25
> stress/call-apply-exponential-bytecode-size.js.default: ERROR: Unexpected
> exit code: 3

Will fix shortly as I address Mark's comments.
Comment 24 Saam Barati 2017-04-18 12:02:55 PDT
Addressed Mark's comments and hopefully the debug stack overflow in:
https://trac.webkit.org/changeset/215474/webkit