Bug 139847

Summary: BytecodeGenerator ".call" and ".apply" is exponential in nesting depth
Product: WebKit Reporter: Mike Fikes <mike>
Component: JavaScriptCore    Assignee: Saam Barati <sbarati>
Status: RESOLVED FIXED    
Severity: Normal CC: buildbot, chris, commit-queue, dieter, fpizlo, ggaren, jfbastien, kapouer, keith_miller, mark.lam, mcatanzaro, Meanjeanaco, mrowe, msaboff, nekohayo, oliver, ryanhaddad, sbarati, sherbondye, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: Macintosh Intel   
OS: OS X 10.10   
Attachments:
Description    Flags
WIP            none
patch          none

Description Mike Fikes 2014-12-19 16:51:41 PST
If I define a function

inc=function( x ) { return x + 1; }

and make a deeply composed invocation of it

inc(inc(inc(inc(inc(inc(inc(inc(inc(inc(inc(inc
  (inc(inc(inc(inc(inc(inc(inc(inc(inc(1)))))))))))))))))))))

this will result in the value 22. If I revise the deeply composed expression to instead make use of call, passing in null for this, as

inc.call(null, inc.call(null, inc.call(null, inc.call(null,
  inc.call(null, inc.call(null, inc.call(null, inc.call(null,
    inc.call(null, inc.call(null, inc.call(null, inc.call(null,
      inc.call(null, inc.call(null, inc.call(null, inc.call(null,
        inc.call(null, inc.call(null, inc.call(null, inc.call(null,
          inc.call(null, 1)))))))))))))))))))))

this will also produce the value 22, but on JavaScriptCore this appears to consume O(2^n) memory, where n is the number of nested calls. This can easily exhaust memory; in the case above, evaluating this expression can consume several GB. (This is not the case if I try this JavaScript in Firefox or Chrome.)
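The nested expression can also be generated programmatically, which makes the blowup easy to measure at arbitrary depths. This is an illustrative driver, not part of the original report (the `buildNestedCall` helper is a hypothetical name):

```javascript
// Hypothetical repro driver (not from the report): builds the nested
// `inc.call` expression as a string for an arbitrary depth n.
function buildNestedCall(n) {
  let expr = "1";
  for (let i = 0; i < n; i++) {
    expr = "inc.call(null, " + expr + ")";
  }
  return expr;
}

const inc = function (x) { return x + 1; };

// For n nested calls, the result is n + 1. Direct eval sees `inc`
// in the enclosing scope.
const result = eval(buildNestedCall(21)); // → 22
```

On an affected JavaScriptCore build, raising the depth passed to `buildNestedCall` makes the pre-execution delay and memory growth easy to observe.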

What I have done so far includes producing a stack trace for when it fails on iOS. In that trace, I see a deep call stack (80 to 100 frames) inside the JavaScriptCore engine, with frames like the following four repeated down the stack:

frame #77: 0x00f324dc JavaScriptCore`JSC::CallFunctionCallDotNode::emitBytecode(JSC::BytecodeGenerator&, JSC::RegisterID*) + 1852
frame #78: 0x00f30661 JavaScriptCore`JSC::ArgumentListNode::emitBytecode(JSC::BytecodeGenerator&, JSC::RegisterID*) + 65
frame #79: 0x00bef410 JavaScriptCore`JSC::BytecodeGenerator::emitCall(JSC::OpcodeID, JSC::RegisterID*, JSC::RegisterID*, JSC::ExpectedFunction, JSC::CallArguments&, JSC::JSTextPosition const&, JSC::JSTextPosition const&, JSC::JSTextPosition const&) + 368
frame #80: 0x00bef28d JavaScriptCore`JSC::BytecodeGenerator::emitCall(JSC::RegisterID*, JSC::RegisterID*, JSC::ExpectedFunction, JSC::CallArguments&, JSC::JSTextPosition const&, JSC::JSTextPosition const&, JSC::JSTextPosition const&) + 77

I've also experimented with refactoring a few of the inner calls into temporaries, which "truncates" the memory-doubling behavior and allows things to succeed:

var temp1 = inc.call(null, inc.call(null, inc.call(null, inc.call(null,
              inc.call(null, inc.call(null, inc.call(null, 1)))))));

var temp2 = inc.call(null, inc.call(null, inc.call(null, inc.call(null,
              inc.call(null, inc.call(null, inc.call(null, temp1)))))));

inc.call(null, inc.call(null, inc.call(null, inc.call(null,
  inc.call(null, inc.call(null, inc.call(null, temp2)))))));
  
This makes me think that perhaps this is simply a bug in the way JavaScriptCore evaluates deeply nested call invocations.
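As an aside (not from the report), the same chain can also be expressed as a loop, which avoids deep syntactic nesting altogether, so the bytecode generator never sees a deep tree of .call nodes. `applyNTimes` is a hypothetical helper:

```javascript
// Illustrative workaround (not from the report): apply a function n times
// iteratively instead of via a deeply nested expression.
const inc = function (x) { return x + 1; };

function applyNTimes(fn, n, seed) {
  let value = seed;
  for (let i = 0; i < n; i++) {
    value = fn.call(null, value); // still uses .call, but never nested
  }
  return value;
}

applyNTimes(inc, 21, 1); // → 22
```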

Steps to Reproduce:
1. Get set up to execute JavaScript in JavaScriptCore (go to jsfiddle.net in Safari, for example).
2. Evaluate the "inc" function definition from the Description.
3. Evaluate the deeply composed "inc.call" expression from the Description.
4. You can wrap the result of the "inc.call" expression in an alert call. The entire JavaScript is:

inc=function( x ) { return x + 1; }

alert(inc.call(null, inc.call(null, inc.call(null, inc.call(null,
  inc.call(null, inc.call(null, inc.call(null, inc.call(null,
    inc.call(null, inc.call(null, inc.call(null, inc.call(null,
      inc.call(null, inc.call(null, inc.call(null, inc.call(null,
        inc.call(null, inc.call(null, inc.call(null, inc.call(null,
          inc.call(null, 1))))))))))))))))))))))

Expected Results:
This should result in the number 22 being evaluated quickly.

Actual Results:
The number 22 is produced, but only after several seconds pass. Additionally, if you watch in Activity Monitor, you will see several GB of memory being consumed.

Version:
iOS 7 & 8, Yosemite. On Yosemite: 8.0.2 (10600.2.5)

Notes:


Configuration:
This occurs in Safari on iOS 7 and 8, and in Safari on Yosemite. I haven't tested on Mavericks.
Comment 1 Jérémy Lal 2014-12-21 13:43:28 PST
Confirmed: reproduced on Linux (Debian jessie) using webkit2gtk 2.6.2 in the Epiphany browser. WebKitWebProcess goes to 100% CPU and consumes many gigabytes as well.
Comment 2 Radar WebKit Bug Importer 2014-12-21 14:07:18 PST
<rdar://problem/19321122>
Comment 3 Chris Aljoudi 2014-12-21 18:09:04 PST
I've been able to reproduce using run-jsc from the command line (OS X 10.10.2).

The "delay" happens BEFORE any of the functions is ever called. To see this, you can modify the inc function:

inc = function(x) {
    debug(x);
    return x + 1;
};

The observed behavior is: long delay with NO OUTPUT (suggesting the function calls aren't actually 'happening'), then all output is emitted quickly as expected.

I tried using -p with jsc to get a profile, but jsc kept running and started eating huge amounts of memory (> 16 GB just for the test case running with -p).

I'll investigate the issue some more.
Comment 4 Mike Fikes 2014-12-22 04:50:07 PST
I was able to reproduce this on an older PowerPC Mac running OS X 10.4.11 and Safari 4.1.3 (circa 2010).
Comment 5 Dieter Komendera 2017-02-13 00:51:11 PST
This still reproduces with Safari Technology Preview Release 23 (Safari 10.2, WebKit 12604.1.5)
Comment 6 Saam Barati 2017-02-14 12:03:56 PST
My guess is we're recursively emitting bytecode in a way that has exponential blowup.
Comment 7 Filip Pizlo 2017-02-14 12:06:52 PST
(In reply to comment #6)
> My guess is we're recursively emitting bytecode in a way that has
> exponential blowup.

Yeah that's totally what's happening.  That's hilarious.

We should just have a back-off on that optimization.  Like, maintain a count of how deep we are in the "doubling" due to .call, .apply, or other jneq_ptr-based optimizations.  If more than K deep, don't do the optimization.

Probably the best way to do it would be upside down: don't do the optimization if there are more than K nodes below you in the tree that want to do it, so that the optimization kicks in for the leaves of that gross call tree.
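The doubling and the proposed back-off can be modeled with a toy recurrence. This is an illustrative sketch in JavaScript, not the actual JSC implementation; `modeledSize` and the constants are assumptions:

```javascript
// Toy model (not JSC source): each .call node that takes the jneq_ptr fast
// path emits its argument subtree twice (once for the fast path, once for
// the generic slow path), so bytecode size doubles per nesting level. The
// back-off skips the fast path once more than `limit` .call nodes sit below
// the current one, so only the leaves of the chain get optimized.
function modeledSize(depth, limit) {
  if (depth === 0) return 1; // the innermost literal argument
  const inner = modeledSize(depth - 1, limit);
  const takeFastPath = (depth - 1) <= limit; // nodes below us within the limit?
  return takeFastPath
    ? 2 * inner + 1  // fast path duplicates the argument subtree
    : inner + 2;     // generic call emits the subtree once
}

modeledSize(21, Infinity); // → 4194303 (exponential, 2^22 - 1)
modeledSize(21, 6);        // → 283 (linear beyond the first few levels)
```

The model shows why optimizing only near the leaves tames the growth: the doubling is capped at a constant factor and the rest of the chain grows linearly.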
Comment 8 Saam Barati 2017-02-14 12:08:45 PST
(In reply to comment #6)
> My guess is we're recursively emitting bytecode in a way that has
> exponential blowup.

Yup. This is indeed what's happening.
Comment 9 Saam Barati 2017-02-14 12:09:06 PST
(In reply to comment #8)
> (In reply to comment #6)
> > My guess is we're recursively emitting bytecode in a way that has
> > exponential blowup.
> 
> Yup. This is indeed what's happening.

We generated 134217673 bytecodes for this function!
Comment 10 Filip Pizlo 2017-02-14 12:09:41 PST
(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #6)
> > > My guess is we're recursively emitting bytecode in a way that has
> > > exponential blowup.
> > 
> > Yup. This is indeed what's happening.
> 
> We generated 134217673 bytecodes for this function!

That's epic!
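The reported count sits just below a power of two, which is consistent with per-level doubling. A quick back-of-the-envelope check (the base factor of 64 bytecodes per doubled unit is my own illustrative assumption, not a measured value):

```javascript
// Sanity check (illustrative): 21 nesting levels each doubling a base of
// roughly 64 bytecodes gives 2^21 * 64 = 2^27, almost exactly the count
// Saam reports.
const reported = 134217673;
const predicted = Math.pow(2, 21) * 64; // 2^27 = 134217728
predicted - reported; // → 55
```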
Comment 11 Saam Barati 2017-02-14 12:10:50 PST
(In reply to comment #7)
> (In reply to comment #6)
> > My guess is we're recursively emitting bytecode in a way that has
> > exponential blowup.
> 
> Yeah that's totally what's happening.  That's hilarious.
> 
> We should just have a back-off on that optimization.  Like, maintain a count
> of how deep in the "doubling" due to .call, .apply, or other jneq_ptr-based
> opts.  If more than K deep then don't do the optimization.
> 
> Probably the most optimal way to do it would be upside down: don't do the
> optimization if there are more than K people below you in the tree who want
> to do it, so that the optimization kicks in for the leaves of that gross
> call tree.

Yeah this sounds reasonable to me.
Comment 12 Dieter Komendera 2017-02-15 05:49:33 PST
For some context, this issue was found here: http://dev.clojure.org/jira/browse/CLJS-910
Comment 13 Saam Barati 2017-04-17 15:04:54 PDT
patch forthcoming.
Comment 14 Saam Barati 2017-04-17 17:03:33 PDT
Created attachment 307321 [details]
WIP

needs a changelog
Comment 15 Saam Barati 2017-04-17 18:27:19 PDT
Created attachment 307331 [details]
patch
Comment 16 Saam Barati 2017-04-17 19:11:00 PDT
Hmmm, something looks wrong. I ran tests locally where we always force the "slow" path, and some tests are failing. I think this means that our bytecode is already slightly broken, since we emit the slow path alongside a bytecode branch. I'm gonna try to figure out why tests are failing.
Comment 17 Saam Barati 2017-04-17 19:12:03 PDT
(In reply to Saam Barati from comment #16)
> Hmmm, something looks wrong. I ran tests locally where we always force the
> "slow" path, and some tests are failing. I think this means that our
> bytecode is already slightly broken, since we emit the slow path alongside a
> bytecode branch. I'm gonna try to figure out why tests are failing.

Actually, it's fine. The slow path just emits a different error message.
Comment 18 Oliver Hunt 2017-04-17 20:10:32 PDT
Comment on attachment 307331 [details]
patch

lgtm
Comment 19 Mark Lam 2017-04-17 22:01:20 PDT
Comment on attachment 307331 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=307331&action=review

LGTM too with some suggestions and comments.

> Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:1206
> +    if (m_callOrApplyChildDepth > 4 && emitCallCheck) {

The ChangeLog said 6, but you're using a limit of 4 here.
Also, should this be >= instead of >? The distance of the innermost child from itself is 0, and you wanted to allow up to K levels of calls out from the innermost child, right?  Or did I misunderstand your intent?

Please also use a symbolic constant here instead of a literal?  Maybe define the constant at the top of this file?
Alternatively, if you think this may make a difference for performance tuning, use a new JSC option instead.

> Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:1286
> +    if (m_callOrApplyChildDepth > 4 && emitCallCheck) {

Ditto.

> Source/JavaScriptCore/parser/Parser.h:890
> +    struct CallOrApplyDepth {

I suggest calling this CallOrApplyDepthScope since it is used like a scope.  It'll also make the above "callOrApplyDepth ? callOrApplyDepth->maxChildDepth() : 0" line read better as "callOrApplyDepthScope ? callOrApplyDepthScope->maxChildDepth() : 0".  I was confused at first when I read "callOrApplyDepth->maxChildDepth()".

> Source/JavaScriptCore/parser/Parser.h:895
> +            , m_childDepth(m_depth)

I suggest renaming this to m_depthOfInnermostChild.  The name m_childDepth is a bit ambiguous and can be confused with the depth of my immediate child when this value is actually for tracking the depth of the innermost child.

> Source/JavaScriptCore/parser/Parser.h:900
> +        size_t maxChildDepth() const

I suggest naming this distanceFromInnermostChild().  The name maxChildDepth() is a bit confusing to me because "m_childDepth" (or my preferred "m_depthOfInnermostChild") is an absolute value set from the depth value that each CallOrApplyDepthScope is at, while the value returned by this function is actually a relative value: the distance from the innermost child.  I suggested "distanceFrom...", which implies a relative value with the innermost child as the reference.

Similarly, I suggest renaming the argument / parameter values to match m_depthOfInnermostChild or distanceFromInnermostChild as appropriate.
Comment 20 WebKit Commit Bot 2017-04-17 23:43:57 PDT
Comment on attachment 307331 [details]
patch

Clearing flags on attachment: 307331

Committed r215453: <http://trac.webkit.org/changeset/215453>
Comment 21 WebKit Commit Bot 2017-04-17 23:43:59 PDT
All reviewed patches have been landed.  Closing bug.
Comment 22 Ryan Haddad 2017-04-18 09:49:43 PDT
The test for this change is failing on debug bots: 

https://build.webkit.org/builders/Apple%20Sierra%20Debug%20JSC%20%28Tests%29/builds/191

stress/call-apply-exponential-bytecode-size.js.default: Exception: RangeError: Maximum call stack size exceeded.
stress/call-apply-exponential-bytecode-size.js.default: eval@[native code]
stress/call-apply-exponential-bytecode-size.js.default: randomApplyOrCall@call-apply-exponential-bytecode-size.js:40:16
stress/call-apply-exponential-bytecode-size.js.default: global code@call-apply-exponential-bytecode-size.js:44:25
stress/call-apply-exponential-bytecode-size.js.default: ERROR: Unexpected exit code: 3
Comment 23 Saam Barati 2017-04-18 11:39:55 PDT
(In reply to Ryan Haddad from comment #22)
> The test for this change is failing on debug bots: 
> 
> https://build.webkit.org/builders/Apple%20Sierra%20Debug%20JSC%20%28Tests%29/
> builds/191
> 
> stress/call-apply-exponential-bytecode-size.js.default: Exception:
> RangeError: Maximum call stack size exceeded.
> stress/call-apply-exponential-bytecode-size.js.default: eval@[native code]
> stress/call-apply-exponential-bytecode-size.js.default:
> randomApplyOrCall@call-apply-exponential-bytecode-size.js:40:16
> stress/call-apply-exponential-bytecode-size.js.default: global
> code@call-apply-exponential-bytecode-size.js:44:25
> stress/call-apply-exponential-bytecode-size.js.default: ERROR: Unexpected
> exit code: 3

Will fix shortly as I address Mark's comments.
Comment 24 Saam Barati 2017-04-18 12:02:55 PDT
Addressed Mark's comments and hopefully the debug stack overflow in:
https://trac.webkit.org/changeset/215474/webkit