Bug 134142

Summary: Refactor our current implementation of for-in
Product: WebKit Reporter: Mark Hahnenberg <mhahnenberg>
Component: JavaScriptCoreAssignee: Mark Hahnenberg <mhahnenberg>
Status: RESOLVED FIXED    
Severity: Normal CC: fpizlo, vaag
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 135033, 135034, 135068, 134141, 135066, 135067    
Attachments:
Description Flags
Patch
none
Patch
none
more
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
benchmark results
none
Patch
none
Patch fpizlo: review+

Description Mark Hahnenberg 2014-06-20 16:06:55 PDT
The baseline JIT and the LLInt currently do some simple static optimizations in the presence of for-in. We can refactor and expand upon this implementation to add support for iterating indexed properties as well as non-indexed properties for a variety of objects (e.g. vanilla JS objects, JS arrays, and typed arrays) in an efficient fashion that will also make it easier for our higher tier compilers to support these opcodes.

for-in iteration currently works something like the following. Assuming I had the following for-in loop:

for (var p in object) {
   <body using object and p>
}

This loop would be roughly equivalent to the following more explicit loop:

var o = object;
while (o != null) {
    for (var i = 0; i < o.length; ++i) {
        <body using object and i (instead of p)>
    }
    var propertyNames = o.getOwnPropertyNames();
    for (var i = 0; i < propertyNames.length; ++i) {
        var p = propertyNames[i];
        <body using object and p>
    }
    o = o.getPrototype();
}

There are a few additional details, like what to do if the object changes as its being iterated and avoiding visiting the same propertyName twice, but this is roughly what should happen for most objects. The current plan is to expand the original loop into something that looks like the second loop by duplicating the loop body and adding some additional bytecodes to support advancing to the next property and making sure the object hasn't changed while we are iterating over it.
Comment 1 Mark Hahnenberg 2014-06-20 16:32:26 PDT
Created attachment 233484 [details]
Patch
Comment 2 Mark Hahnenberg 2014-06-20 16:32:42 PDT
(In reply to comment #1)
> Created an attachment (id=233484) [details]
> Patch

Work in progress.
Comment 3 Mark Hahnenberg 2014-06-20 17:33:28 PDT
Created attachment 233490 [details]
Patch
Comment 4 Mark Hahnenberg 2014-06-20 17:36:21 PDT
(In reply to comment #3)
> Created an attachment (id=233490) [details]
> Patch

More work.
Comment 5 Mark Hahnenberg 2014-06-24 16:46:53 PDT
Created attachment 233765 [details]
more
Comment 6 Mark Hahnenberg 2014-07-02 18:40:27 PDT
Created attachment 234305 [details]
Patch
Comment 7 Mark Hahnenberg 2014-07-02 18:41:19 PDT
(In reply to comment #6)
> Created an attachment (id=234305) [details]
> Patch

More work. I decided to take a simpler approach and just iterate the indexed and Structure properties of the base object, and then leave the remaining properties on the base object and any object in its prototype chain to the fully generic loop.
Comment 8 Mark Hahnenberg 2014-07-07 18:30:31 PDT
Created attachment 234531 [details]
Patch
Comment 9 Mark Hahnenberg 2014-07-07 18:36:57 PDT
(In reply to comment #8)
> Created an attachment (id=234531) [details]
> Patch

More work. Implemented the DFG part of this initial refactoring. Right now it's still mostly calls to C++ except for HasIterableProperty, which does some inline hole checks for the most common ArrayModes. Everything compiles, passes tests, and I'm able to surf the web in a debug build without crashing.

There are still some performance issues. The current bottleneck is ToIndexString, which spends a lot of time converting ints to Identifiers and then allocating JSStrings. We can take a lot of the load off by having get_by_val use the integer local rather than the return value of to_index_string. Then we can do something clever with the to_index_string bytecode to avoid having to do all of the extra conversion and allocation work.
Comment 10 Mark Hahnenberg 2014-07-08 16:06:47 PDT
Created attachment 234604 [details]
Patch
Comment 11 Mark Hahnenberg 2014-07-08 16:11:32 PDT
(In reply to comment #10)
> Created an attachment (id=234604) [details]
> Patch

More work. Modified the BytecodeGenerator's ForInContext stuff so that we can successfully use the integer index for get_by_vals in the indexed property loop. Next is to make Structure property access fast.
Comment 12 Mark Hahnenberg 2014-07-09 20:43:54 PDT
Created attachment 234685 [details]
Patch
Comment 13 Mark Hahnenberg 2014-07-09 20:46:27 PDT
(In reply to comment #12)
> Created an attachment (id=234685) [details]
> Patch

More work. Added support for fast loading of Structure properties via the "get_direct_pname" opcode (and corresponding DFG nodes). There are some potential optimizations we could do in the DFG (e.g. avoid the Structure check if we know the Structure isn't clobbered in the body of the loop). Now getting the next property name is the bottleneck, so I'll rework JSPropertyNameEnumerator so that it already has JSStrings pre-allocated and in a more JIT-friendly layout so that we can quickly load the next property name rather than calling out to C++ code during each loop iteration.
Comment 14 Mark Hahnenberg 2014-07-10 19:24:27 PDT
Created attachment 234738 [details]
Patch
Comment 15 Mark Hahnenberg 2014-07-10 19:31:40 PDT
(In reply to comment #14)
> Created an attachment (id=234738) [details]
> Patch

More work. Refactored JSPropertyNameEnumerator to pre-allocate JSStrings. We now grab the next property from the enumerator in the Structure and generic sub loops. We also now cache generic enumerators when possible. We're ~45% slower than typical idiomatic indexed property iteration (from being 20x slower) and we're faster for Structure property enumeration (exactly how much faster I haven't measured accurately, but a good upper bound would be ~20% faster which is the branch vs. trunk). Keep in mind these are iteration micro benchmarks, and all of this is DFG-only performance. With the FTL we can expect further gains.

I think we're in a good enough state w.r.t. performance and stability to start landing this stuff. The only major thing standing in the way now is the 32-bit side of the JIT code.
Comment 16 Mark Hahnenberg 2014-07-17 17:46:31 PDT
Created attachment 235101 [details]
Patch
Comment 17 Mark Hahnenberg 2014-07-17 17:47:23 PDT
(In reply to comment #16)
> Created an attachment (id=235101) [details]
> Patch

Fixed some erroneous tests, added 32-bit support. All that remains is to clean some stylistic things up for a final review.
Comment 18 Mark Hahnenberg 2014-07-17 18:45:15 PDT
Created attachment 235107 [details]
Patch
Comment 19 Mark Hahnenberg 2014-07-17 19:17:15 PDT
Comment on attachment 235107 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=235107&action=review

> Source/JavaScriptCore/API/JSCallbackObjectFunctions.h:519
> +                if (entry->getProperty && (!(entry->attributes & kJSPropertyAttributeDontEnum) || (shouldIncludeDontEnumProperties(mode))))

extra parens

> Source/JavaScriptCore/API/JSCallbackObjectFunctions.h:530
> +                if (!(entry->attributes & kJSPropertyAttributeDontEnum) || (shouldIncludeDontEnumProperties(mode)))

extra parens

> Source/JavaScriptCore/bytecompiler/BytecodeGenerator.cpp:2683
> +    // Lexically invalidating ForInContexts is kind of weak sauce, but it's sufficiently 
> +    // rare to either (1) re-assign the loop iteration variable in the body of the loop
> +    // or (2) capture the loop the iteration variable. Because of the rarity of either of 
> +    // these situations, it's okay to poison the remainder of the loop body whenever we encounter them.

A better way to state this:

Lexically invalidating ForInContexts is kind of weak sauce, but it only occurs if either of the following conditions is true:
(1) The loop iteration variable is re-assigned within the body of the loop.
(2) The loop iteration variable is captured in the lexical scope of the function.
These two situations occur sufficiently rarely that it's okay to use this style of "analysis" to make iteration faster. If we didn't want to do this, we would either have to perform some flow-sensitive analysis to see if/when the loop iteration variable was reassigned, or we'd have to resort to runtime checks to see if the variable had been reassigned from its original value.

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp:5534
> +

Revert this file.

> Source/JavaScriptCore/jit/JITOperations.cpp:1892
> +JSCell* JIT_OPERATION operationGetStructurePropertyEnumerator(ExecState* exec, JSCell* cell, int32_t length)

Refactor this into a separate function that can be called from both JITOperations.cpp and CommonSlowPaths.cpp

> Source/JavaScriptCore/jit/JITOperations.cpp:1916
> +JSCell* JIT_OPERATION operationGetGenericPropertyEnumerator(ExecState* exec, JSCell* baseCell, int32_t length, JSCell* structureEnumeratorCell)

Ditto.

> Source/JavaScriptCore/jit/JITOperations.cpp:1936
> +    // If we still have the same Structure that we started with, we don't need to revisit the JSObject properties.

If we still have the same Structure that we started with, our Structure allows us to access its properties quickly (i.e. the Structure property loop was able to do things), and we iterated the full length of the object (i.e. there are no more own indexed properties that need to be enumerated), then the generic property iteration can skip any properties it would get from the JSObject base class. This turns out to be important for hot loops because most of our time is then dominated by trying to add the own Structure properties to the new generic PropertyNameArray and failing because we've already visited them.

> Source/JavaScriptCore/runtime/JSCell.h:49
>  enum EnumerationMode {

Break this and associated functions out into a separate file.
Comment 20 Mark Hahnenberg 2014-07-18 12:35:04 PDT
Created attachment 235139 [details]
Patch
Comment 21 Mark Hahnenberg 2014-07-18 14:15:09 PDT
Created attachment 235147 [details]
benchmark results

~1.5% improvement on kraken due to a ~30% win on audio-beat-detection a few smaller wins on some of the other kraken benchmarks. Neutral on the other benchmarks we track.
Comment 22 Mark Hahnenberg 2014-07-18 18:26:29 PDT
Created attachment 235162 [details]
Patch
Comment 23 Filip Pizlo 2014-07-18 18:27:03 PDT
Mmmm, looks juicy.  I will review this weekend.
Comment 24 Mark Hahnenberg 2014-07-18 18:27:41 PDT
(In reply to comment #22)
> Created an attachment (id=235162) [details]
> Patch

Just a couple DFG fixes.
Comment 25 Filip Pizlo 2014-07-20 14:53:28 PDT
Comment on attachment 235162 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=235162&action=review

I think it's almost there, but I'm not sure about the inconsistencies between clobberize() and AI.

> Source/JavaScriptCore/dfg/DFGAbstractInterpreterInlines.h:1799
> +    case GetEnumerableLength: {
> +        forNode(node).setType(SpecInt32);
> +        break;
> +    }
> +    case HasGenericProperty: {
> +        forNode(node).setType(SpecBoolean);
> +        break;
> +    }
> +    case HasStructureProperty: {
> +        forNode(node).setType(SpecBoolean);
> +        break;
> +    }

In clobberize(), these have write(World).  If that's right then you should be calling clobberWorld() here.

> Source/JavaScriptCore/dfg/DFGAbstractInterpreterInlines.h:1803
> +    case HasIndexedProperty: {
> +        forNode(node).setType(SpecBoolean);
> +        break;
> +    }

In clobberize(), this has some cases where you claim to write(World).  If it's true that this cal clobber the world, then there should be a call to clobberWorld() here (predicated on the array modes that actually cause such clobberage, of course).

> Source/JavaScriptCore/dfg/DFGAbstractInterpreterInlines.h:1815
> +    case GetDirectPname: {
> +        forNode(node).makeHeapTop();
> +        break;
> +    }
> +    case GetStructurePropertyEnumerator: {
> +        forNode(node).setType(SpecCell);
> +        break;
> +    }
> +    case GetGenericPropertyEnumerator: {
> +        forNode(node).setType(SpecCell);
> +        break;
> +    }

See other comments about clobberize() and write(World).

> Source/JavaScriptCore/dfg/DFGClobberize.h:158
> +    case HasGenericProperty:
> +    case HasStructureProperty:
> +    case GetEnumerableLength:
> +    case GetStructurePropertyEnumerator:
> +    case GetGenericPropertyEnumerator:
> +    case GetDirectPname: {
> +        read(World);
> +        write(World);
> +        return;
> +    }
> +

If these can write the world, then the AbstractInterpreter needs to know about it.  I'm kind of confused as to why these are so impure, though.  It seems like they're actually either pure, or at least, that they have a fairly confined set of things they can read.  Also, it would be good to have CSE rules for these (i.e. def()'s).

> Source/JavaScriptCore/dfg/DFGClobberize.h:163
> +    case ToIndexString: {
> +        read(HeapObjectCount);
> +        write(HeapObjectCount);
> +        return;
> +    }

Does this really read+write HeapObjectCount?  Strings don't have object identity.  It's never necessary to create a second string with identical contents.  So, this feels like it's a pure operation.

Also, this should probably have def(PureValue(node)) to enable CSE to eliminate redundant index string creations.

> Source/JavaScriptCore/dfg/DFGClobberize.h:168
> +    case NextEnumeratorPname: {
> +        read(JSPropertyNameEnumerator_cachedPropertyNames);
> +        return;
> +    }

Can this ever return something different if called twice with the same arguments?  If it is pure then you don't need to claim to read anything.

Either way, it feels like this should have a def() of either a PureValue (if it's pure and doesn't read() anything) or a HeapLocation (if it's impure).

> Source/JavaScriptCore/dfg/DFGClobberize.h:170
> +    case HasIndexedProperty: {

This feels like it should def() a HeapLocation of some kind.

> Source/JavaScriptCore/dfg/DFGClobberize.h:183
> +            read(World);
> +            write(World);

If this can clobber the world, then it should also be clobbering the world in the corresponding AbstractInterpreter code.  I somewhat suspect that this doesn't *actually* write the world, though.

Note that if you're too lazy to enumerate what an out-of-bounds access does, but you know that it cannot write anything, then it's totally cool to say that it read(World).

> Source/JavaScriptCore/dfg/DFGClobberize.h:195
> +            read(World);
> +            write(World);

Ditto.

> Source/JavaScriptCore/dfg/DFGClobberize.h:207
> +            read(World);
> +            write(World);

Ditto.

> Source/JavaScriptCore/dfg/DFGClobberize.h:219
> +            read(World);
> +            write(World);

Ditto.

> Source/JavaScriptCore/dfg/DFGClobberize.h:225
> +            read(World);
> +            write(World);

Ditto.

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT32_64.cpp:4757
> +    case NextEnumeratorPname: {

General comment about the name of this node: it has "next" in the name which made it seem to me that this increments some enumerator/iterator.  In fact it appears like this is not the case - it only retrieves something from a list.

> Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:4743
> +        m_jit.load64(MacroAssembler::BaseIndex(scratch2GPR, scratch1GPR, MacroAssembler::TimesEight, 
> +            -2 * static_cast<int32_t>(sizeof(EncodedJSValue))), resultGPR);

I wonder if the math here could be abstracted somehow - maybe something from PropertyOffset.h?
Comment 26 Mark Hahnenberg 2014-07-23 18:20:33 PDT
Created attachment 235400 [details]
Patch
Comment 27 Filip Pizlo 2014-07-24 16:25:00 PDT
Comment on attachment 235400 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=235400&action=review

It would be really cool if you could cover some of the corner cases with stress tests - i.e. tests that would go into JavaScriptCore/tests/stress.  I'll let you use your judgement for that.  r=me.

> Source/JavaScriptCore/dfg/DFGClobberize.h:160
> +        read(World);
> +        write(World);

So, this reads+writes world because of the possibility that the structure changed, right?  Can you add a comment to that effect?
Comment 28 Mark Hahnenberg 2014-07-24 18:30:22 PDT
(In reply to comment #27)
> (From update of attachment 235400 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=235400&action=review
> 
> It would be really cool if you could cover some of the corner cases with stress tests - i.e. tests that would go into JavaScriptCore/tests/stress.  I'll let you use your judgement for that.  r=me.

I wrote some corner-case-y tests while I was implementing various pieces of this, so I can just rework those and include them as stress tests.

> 
> > Source/JavaScriptCore/dfg/DFGClobberize.h:160
> > +        read(World);
> > +        write(World);
> 
> So, this reads+writes world because of the possibility that the structure changed, right?  Can you add a comment to that effect?

Will do.
Comment 29 Mark Hahnenberg 2014-07-25 11:47:38 PDT
Fixed in http://trac.webkit.org/changeset/171605
Comment 30 Mark Hahnenberg 2014-07-25 14:28:08 PDT
*** Bug 134141 has been marked as a duplicate of this bug. ***
Comment 31 Joseph Pecoraro 2015-11-12 21:19:08 PST
*** Bug 118127 has been marked as a duplicate of this bug. ***