Bug 139847 - JavaScriptCore deeply nested "call" performance issue
Summary: JavaScriptCore deeply nested "call" performance issue
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Macintosh Intel OS X 10.10
: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2014-12-19 16:51 PST by Mike Fikes
Modified: 2017-02-28 17:18 PST (History)
15 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Mike Fikes 2014-12-19 16:51:41 PST
If I define a function

inc=function( x ) { return x + 1; }

and make a deeply composed invocation of it

inc(inc(inc(inc(inc(inc(inc(inc(inc(inc(inc(inc
  (inc(inc(inc(inc(inc(inc(inc(inc(inc(1)))))))))))))))))))))

this will result in the value 22. If I revise the deeply composed expression to instead make use of call, passing in null for this, as

inc.call(null, inc.call(null, inc.call(null, inc.call(null,
  inc.call(null, inc.call(null, inc.call(null, inc.call(null,
    inc.call(null, inc.call(null, inc.call(null, inc.call(null,
      inc.call(null, inc.call(null, inc.call(null, inc.call(null,
        inc.call(null, inc.call(null, inc.call(null, inc.call(null,
          inc.call(null, 1)))))))))))))))))))))

this will also produce the value 22, but on JavaScriptCore this appears to consume O(2^n) memory, where n is the number of nested calls. This can easily exhaust memory; in the case above, evaluating this expression can consume several GB. (This is not the case if I try this JavaScript in Firefox or Chrome.)

What have I done so far includes producing a stack trace for when it fails on iOS, and in that trace, I see a deep call stack (80 to 100 frames) in the JavaScriptCore engine with frames like the following 4 repeated down the stack:

frame #77: 0x00f324dc JavaScriptCore`JSC::CallFunctionCallDotNode::emitBytecode(JSC::BytecodeGenerator&, JSC::RegisterID*) + 1852
frame #78: 0x00f30661 JavaScriptCore`JSC::ArgumentListNode::emitBytecode(JSC::BytecodeGenerator&, JSC::RegisterID*) + 65
frame #79: 0x00bef410 JavaScriptCore`JSC::BytecodeGenerator::emitCall(JSC::OpcodeID, JSC::RegisterID*, JSC::RegisterID*, JSC::ExpectedFunction, JSC::CallArguments&, JSC::JSTextPosition const&, JSC::JSTextPosition const&, JSC::JSTextPosition const&) + 368
frame #80: 0x00bef28d JavaScriptCore`JSC::BytecodeGenerator::emitCall(JSC::RegisterID*, JSC::RegisterID*, JSC::ExpectedFunction, JSC::CallArguments&, JSC::JSTextPosition const&, JSC::JSTextPosition const&, JSC::JSTextPosition const&) + 77

I've also done some experimentation where refactoring a few of the inner calls to temporaries will "truncate" the memory doubling behavior and allow things to succeed

var temp1 = inc.call(null, inc.call(null, inc.call(null, inc.call(null,
              inc.call(null, inc.call(null, inc.call(null, 1)))))));

var temp2 = inc.call(null, inc.call(null, inc.call(null, inc.call(null,
              inc.call(null, inc.call(null, inc.call(null, temp1)))))));

inc.call(null, inc.call(null, inc.call(null, inc.call(null,
  inc.call(null, inc.call(null, inc.call(null, temp2)))))));
  
This makes me thinks that perhaps this is simply a bug with the way JavaScriptCore evaluates deeply nested call invocations.

Steps to Reproduce:
1. Get set up to execute JavaScript in JavaScriptCore. (Go to jsfiddle.net with Safari, for example).
2. Evaluate the "inc" function definition in the Description
3. Evaluate the deeply composed "inc.call" expression in the Description
4. You could wrap the result of the "inc.call" in an alert call. The entire JavaScript is:

inc=function( x ) { return x + 1; }

alert(inc.call(null, inc.call(null, inc.call(null, inc.call(null,
  inc.call(null, inc.call(null, inc.call(null, inc.call(null,
    inc.call(null, inc.call(null, inc.call(null, inc.call(null,
      inc.call(null, inc.call(null, inc.call(null, inc.call(null,
        inc.call(null, inc.call(null, inc.call(null, inc.call(null,
          inc.call(null, 1))))))))))))))))))))))

Expected Results:
This should result in the number 22 being evaluated quickly.

Actual Results:
The number 22 is produced, but only after several seconds transpire. Additionally, if you watch in Activity Monitor, you will see several GB being consumed.

Version:
iOS 7 & 8, Yosemite. On Yosemite: 8.0.2 (10600.2.5)

Notes:


Configuration:
This occurs on Safari on iOS 7, and 8, and on Safari on Yosemite. Haven't tested on Mavericks.
Comment 1 Jérémy Lal 2014-12-21 13:43:28 PST
Confirmed reproduced on Linux debian jessie using webkit2gtk 2.6.2 on epiphany browser. WebKitWebProcess goes 100% CPU and consumes many gigabytes as well.
Comment 2 Radar WebKit Bug Importer 2014-12-21 14:07:18 PST
<rdar://problem/19321122>
Comment 3 Chris Aljoudi 2014-12-21 18:09:04 PST
I've been able to reproduce using run-jsc from the command line (OS X 10.10.2).

The "delay" happens BEFORE any of the functions ever gets to be called. To see this, you can modify the inc function:

inc = function(x) {
    debug(x);
    return x + 1;
};

The observed behavior is: long delay with NO OUTPUT (suggesting the function calls aren't actually 'happening'), then all output is emitted quickly as expected.

Tried using -p with jsc to get a profile, but jsc kept running and it started eating huge amounts of memory (> 16GBs just for the test case running with -p).

I'll investigate the issue some more.
Comment 4 Mike Fikes 2014-12-22 04:50:07 PST
I was able to reproduce this on an older PowerPC Mac running OS X 10.4.11 and Safari 4.1.3 (circa 2010).
Comment 5 Dieter Komendera 2017-02-13 00:51:11 PST
This still reproduces with Safari Technology Preview Release 23 (Safari 10.2, WebKit 12604.1.5)
Comment 6 Saam Barati 2017-02-14 12:03:56 PST
My guess is we're recursively emitting bytecode in a way that has exponential blowup.
Comment 7 Filip Pizlo 2017-02-14 12:06:52 PST
(In reply to comment #6)
> My guess is we're recursively emitting bytecode in a way that has
> exponential blowup.

Yeah that's totally what's happening.  That's hilarious.

We should just have a back-off on that optimization.  Like, maintain a count of how deep in the "doubling" due to .call, .apply, or other jneq_ptr-based opts.  If more than K deep then don't do the optimization.

Probably the most optimal way to do it would be upside down: don't do the optimization if there are more than K people below you in the tree who want to do it, so that the optimization kicks in for the leaves of that gross call tree.
Comment 8 Saam Barati 2017-02-14 12:08:45 PST
(In reply to comment #6)
> My guess is we're recursively emitting bytecode in a way that has
> exponential blowup.

Yup. This is indeed what's happening.
Comment 9 Saam Barati 2017-02-14 12:09:06 PST
(In reply to comment #8)
> (In reply to comment #6)
> > My guess is we're recursively emitting bytecode in a way that has
> > exponential blowup.
> 
> Yup. This is indeed what's happening.

We generated 134217673 bytecodes for this function!
Comment 10 Filip Pizlo 2017-02-14 12:09:41 PST
(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #6)
> > > My guess is we're recursively emitting bytecode in a way that has
> > > exponential blowup.
> > 
> > Yup. This is indeed what's happening.
> 
> We generated 134217673 bytecodes for this function!

That's epic!
Comment 11 Saam Barati 2017-02-14 12:10:50 PST
(In reply to comment #7)
> (In reply to comment #6)
> > My guess is we're recursively emitting bytecode in a way that has
> > exponential blowup.
> 
> Yeah that's totally what's happening.  That's hilarious.
> 
> We should just have a back-off on that optimization.  Like, maintain a count
> of how deep in the "doubling" due to .call, .apply, or other jneq_ptr-based
> opts.  If more than K deep then don't do the optimization.
> 
> Probably the most optimal way to do it would be upside down: don't do the
> optimization if there are more than K people below you in the tree who want
> to do it, so that the optimization kicks in for the leaves of that gross
> call tree.

Yeah this sounds reasonable to me.
Comment 12 Dieter Komendera 2017-02-15 05:49:33 PST
For some context, this issue was found here: http://dev.clojure.org/jira/browse/CLJS-910