Bug 134142

Summary: Refactor our current implementation of for-in
Product: WebKit Reporter: Mark Hahnenberg <mhahnenberg>
Component: JavaScriptCoreAssignee: Mark Hahnenberg <mhahnenberg>
Status: RESOLVED FIXED    
Severity: Normal CC: fpizlo, vaag
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 135033, 135034, 135068, 134141, 135066, 135067    
Attachments:
Description Flags
Patch
none
Patch
none
more
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
benchmark results
none
Patch
none
Patch fpizlo: review+

Mark Hahnenberg
Reported 2014-06-20 16:06:55 PDT
The baseline JIT and the LLInt currently do some simple static optimizations in the presence of for-in. We can refactor and expand upon this implementation to add support for iterating indexed properties as well as non-indexed properties for a variety of objects (e.g. vanilla JS objects, JS arrays, and typed arrays) in an efficient fashion that will also make it easier for our higher tier compilers to support these opcodes. for-in iteration currently works something like the following. Assuming I had the following for-in loop: for (var p in object) { <body using object and p> } This loop would be roughly equivalent to the following more explicit loop: var o = object; while (o != null) { for (var i = 0; i < o.length; ++i) { <body using object and i (instead of p)> } var propertyNames = o.getOwnPropertyNames(); for (var i = 0; i < propertyNames.length; ++i) { var p = propertyNames[i]; <body using object and p> } o = o.getPrototype(); } There are a few additional details, like what to do if the object changes as its being iterated and avoiding visiting the same propertyName twice, but this is roughly what should happen for most objects. The current plan is to expand the original loop into something that looks like the second loop by duplicating the loop body and adding some additional bytecodes to support advancing to the next property and making sure the object hasn't changed while we are iterating over it.
Attachments
Patch (187.44 KB, patch)
2014-06-20 16:32 PDT, Mark Hahnenberg
no flags
Patch (72.11 KB, patch)
2014-06-20 17:33 PDT, Mark Hahnenberg
no flags
more (163.47 KB, patch)
2014-06-24 16:46 PDT, Mark Hahnenberg
no flags
Patch (93.67 KB, patch)
2014-07-02 18:40 PDT, Mark Hahnenberg
no flags
Patch (126.36 KB, patch)
2014-07-07 18:30 PDT, Mark Hahnenberg
no flags
Patch (127.48 KB, patch)
2014-07-08 16:06 PDT, Mark Hahnenberg
no flags
Patch (148.50 KB, patch)
2014-07-09 20:43 PDT, Mark Hahnenberg
no flags
Patch (162.26 KB, patch)
2014-07-10 19:24 PDT, Mark Hahnenberg
no flags
Patch (213.21 KB, patch)
2014-07-17 17:46 PDT, Mark Hahnenberg
no flags
Patch (211.20 KB, patch)
2014-07-17 18:45 PDT, Mark Hahnenberg
no flags
Patch (215.60 KB, patch)
2014-07-18 12:35 PDT, Mark Hahnenberg
no flags
benchmark results (44.85 KB, text/plain)
2014-07-18 14:15 PDT, Mark Hahnenberg
no flags
Patch (216.47 KB, patch)
2014-07-18 18:26 PDT, Mark Hahnenberg
no flags
Patch (219.87 KB, patch)
2014-07-23 18:20 PDT, Mark Hahnenberg
fpizlo: review+
Mark Hahnenberg
Comment 1 2014-06-20 16:32:26 PDT
Mark Hahnenberg
Comment 2 2014-06-20 16:32:42 PDT
(In reply to comment #1) > Created an attachment (id=233484) [details] > Patch Work in progress.
Mark Hahnenberg
Comment 3 2014-06-20 17:33:28 PDT
Mark Hahnenberg
Comment 4 2014-06-20 17:36:21 PDT
(In reply to comment #3) > Created an attachment (id=233490) [details] > Patch More work.
Mark Hahnenberg
Comment 5 2014-06-24 16:46:53 PDT
Mark Hahnenberg
Comment 6 2014-07-02 18:40:27 PDT
Mark Hahnenberg
Comment 7 2014-07-02 18:41:19 PDT
(In reply to comment #6) > Created an attachment (id=234305) [details] > Patch More work. I decided to take a simpler approach and just iterate the indexed and Structure properties of the base object, and then leave the remaining properties on the base object and any object in its prototype chain to the fully generic loop.
Mark Hahnenberg
Comment 8 2014-07-07 18:30:31 PDT
Mark Hahnenberg
Comment 9 2014-07-07 18:36:57 PDT
(In reply to comment #8) > Created an attachment (id=234531) [details] > Patch More work. Implemented the DFG part of this initial refactoring. Right now it's still mostly calls to C++ except for HasIterableProperty, which does some inline hole checks for the most common ArrayModes. Everything compiles, passes tests, and I'm able to surf the web in a debug build without crashing. There are still some performance issues. The current bottleneck is ToIndexString, which spends a lot of time converting ints to Identifiers and then allocating JSStrings. We can take a lot of the load off by having get_by_val use the integer local rather than the return value of to_index_string. Then we can do something clever with the to_index_string bytecode to avoid having to do all of the extra conversion and allocation work.
Mark Hahnenberg
Comment 10 2014-07-08 16:06:47 PDT
Mark Hahnenberg
Comment 11 2014-07-08 16:11:32 PDT
(In reply to comment #10) > Created an attachment (id=234604) [details] > Patch More work. Modified the BytecodeGenerator's ForInContext stuff so that we can successfully use the integer index for get_by_vals in the indexed property loop. Next is to make Structure property access fast.
Mark Hahnenberg
Comment 12 2014-07-09 20:43:54 PDT
Mark Hahnenberg
Comment 13 2014-07-09 20:46:27 PDT
(In reply to comment #12) > Created an attachment (id=234685) [details] > Patch More work. Added support for fast loading of Structure properties via the "get_direct_pname" opcode (and corresponding DFG nodes). There are some potential optimizations we could do in the DFG (e.g. avoid the Structure check if we know the Structure isn't clobbered in the body of the loop). Now getting the next property name is the bottleneck, so I'll rework JSPropertyNameEnumerator so that it already has JSStrings pre-allocated and in a more JIT-friendly layout so that we can quickly load the next property name rather than calling out to C++ code during each loop iteration.
Mark Hahnenberg
Comment 14 2014-07-10 19:24:27 PDT
Mark Hahnenberg
Comment 15 2014-07-10 19:31:40 PDT
(In reply to comment #14) > Created an attachment (id=234738) [details] > Patch More work. Refactored JSPropertyNameEnumerator to pre-allocate JSStrings. We now grab the next property from the enumerator in the Structure and generic sub loops. We also now cache generic enumerators when possible. We're ~45% slower than typical idiomatic indexed property iteration (from being 20x slower) and we're faster for Structure property enumeration (exactly how much faster I haven't measured accurately, but a good upper bound would be ~20% faster which is the branch vs. trunk). Keep in mind these are iteration micro benchmarks, and all of this is DFG-only performance. With the FTL we can expect further gains. I think we're in a good enough state w.r.t. performance and stability to start landing this stuff. The only major thing standing in the way now is the 32-bit side of the JIT code.
Mark Hahnenberg
Comment 16 2014-07-17 17:46:31 PDT
Mark Hahnenberg
Comment 17 2014-07-17 17:47:23 PDT
(In reply to comment #16) > Created an attachment (id=235101) [details] > Patch Fixed some erroneous tests, added 32-bit support. All that remains is to clean some stylistic things up for a final review.
Mark Hahnenberg
Comment 18 2014-07-17 18:45:15 PDT
Mark Hahnenberg
Comment 19 2014-07-17 19:17:15 PDT
Comment on attachment 235107 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=235107&action=review > Source/JavaScriptCore/API/JSCallbackObjectFunctions.h:519 > + if (entry->getProperty && (!(entry->attributes & kJSPropertyAttributeDontEnum) || (shouldIncludeDontEnumProperties(mode)))) extra parens > Source/JavaScriptCore/API/JSCallbackObjectFunctions.h:530 > + if (!(entry->attributes & kJSPropertyAttributeDontEnum) || (shouldIncludeDontEnumProperties(mode))) extra parens > Source/JavaScriptCore/bytecompiler/BytecodeGenerator.cpp:2683 > + // Lexically invalidating ForInContexts is kind of weak sauce, but it's sufficiently > + // rare to either (1) re-assign the loop iteration variable in the body of the loop > + // or (2) capture the loop the iteration variable. Because of the rarity of either of > + // these situations, it's okay to poison the remainder of the loop body whenever we encounter them. A better way to state this: Lexically invalidating ForInContexts is kind of weak sauce, but it only occurs if either of the following conditions is true: (1) The loop iteration variable is re-assigned within the body of the loop. (2) The loop iteration variable is captured in the lexical scope of the function. These two situations occur sufficiently rarely that it's okay to use this style of "analysis" to make iteration faster. If we didn't want to do this, we would either have to perform some flow-sensitive analysis to see if/when the loop iteration variable was reassigned, or we'd have to resort to runtime checks to see if the variable had been reassigned from its original value. > Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp:5534 > + Revert this file. > Source/JavaScriptCore/jit/JITOperations.cpp:1892 > +JSCell* JIT_OPERATION operationGetStructurePropertyEnumerator(ExecState* exec, JSCell* cell, int32_t length) Refactor this into a separate function that can be called from both JITOperations.cpp and CommonSlowPaths.cpp > Source/JavaScriptCore/jit/JITOperations.cpp:1916 > +JSCell* JIT_OPERATION operationGetGenericPropertyEnumerator(ExecState* exec, JSCell* baseCell, int32_t length, JSCell* structureEnumeratorCell) Ditto. > Source/JavaScriptCore/jit/JITOperations.cpp:1936 > + // If we still have the same Structure that we started with, we don't need to revisit the JSObject properties. If we still have the same Structure that we started with, our Structure allows us to access its properties quickly (i.e. the Structure property loop was able to do things), and we iterated the full length of the object (i.e. there are no more own indexed properties that need to be enumerated), then the generic property iteration can skip any properties it would get from the JSObject base class. This turns out to be important for hot loops because most of our time is then dominated by trying to add the own Structure properties to the new generic PropertyNameArray and failing because we've already visited them. > Source/JavaScriptCore/runtime/JSCell.h:49 > enum EnumerationMode { Break this and associated functions out into a separate file.
Mark Hahnenberg
Comment 20 2014-07-18 12:35:04 PDT
Mark Hahnenberg
Comment 21 2014-07-18 14:15:09 PDT
Created attachment 235147 [details] benchmark results ~1.5% improvement on kraken due to a ~30% win on audio-beat-detection a few smaller wins on some of the other kraken benchmarks. Neutral on the other benchmarks we track.
Mark Hahnenberg
Comment 22 2014-07-18 18:26:29 PDT
Filip Pizlo
Comment 23 2014-07-18 18:27:03 PDT
Mmmm, looks juicy. I will review this weekend.
Mark Hahnenberg
Comment 24 2014-07-18 18:27:41 PDT
(In reply to comment #22) > Created an attachment (id=235162) [details] > Patch Just a couple DFG fixes.
Filip Pizlo
Comment 25 2014-07-20 14:53:28 PDT
Comment on attachment 235162 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=235162&action=review I think it's almost there, but I'm not sure about the inconsistencies between clobberize() and AI. > Source/JavaScriptCore/dfg/DFGAbstractInterpreterInlines.h:1799 > + case GetEnumerableLength: { > + forNode(node).setType(SpecInt32); > + break; > + } > + case HasGenericProperty: { > + forNode(node).setType(SpecBoolean); > + break; > + } > + case HasStructureProperty: { > + forNode(node).setType(SpecBoolean); > + break; > + } In clobberize(), these have write(World). If that's right then you should be calling clobberWorld() here. > Source/JavaScriptCore/dfg/DFGAbstractInterpreterInlines.h:1803 > + case HasIndexedProperty: { > + forNode(node).setType(SpecBoolean); > + break; > + } In clobberize(), this has some cases where you claim to write(World). If it's true that this cal clobber the world, then there should be a call to clobberWorld() here (predicated on the array modes that actually cause such clobberage, of course). > Source/JavaScriptCore/dfg/DFGAbstractInterpreterInlines.h:1815 > + case GetDirectPname: { > + forNode(node).makeHeapTop(); > + break; > + } > + case GetStructurePropertyEnumerator: { > + forNode(node).setType(SpecCell); > + break; > + } > + case GetGenericPropertyEnumerator: { > + forNode(node).setType(SpecCell); > + break; > + } See other comments about clobberize() and write(World). > Source/JavaScriptCore/dfg/DFGClobberize.h:158 > + case HasGenericProperty: > + case HasStructureProperty: > + case GetEnumerableLength: > + case GetStructurePropertyEnumerator: > + case GetGenericPropertyEnumerator: > + case GetDirectPname: { > + read(World); > + write(World); > + return; > + } > + If these can write the world, then the AbstractInterpreter needs to know about it. I'm kind of confused as to why these are so impure, though. It seems like they're actually either pure, or at least, that they have a fairly confined set of things they can read. Also, it would be good to have CSE rules for these (i.e. def()'s). > Source/JavaScriptCore/dfg/DFGClobberize.h:163 > + case ToIndexString: { > + read(HeapObjectCount); > + write(HeapObjectCount); > + return; > + } Does this really read+write HeapObjectCount? Strings don't have object identity. It's never necessary to create a second string with identical contents. So, this feels like it's a pure operation. Also, this should probably have def(PureValue(node)) to enable CSE to eliminate redundant index string creations. > Source/JavaScriptCore/dfg/DFGClobberize.h:168 > + case NextEnumeratorPname: { > + read(JSPropertyNameEnumerator_cachedPropertyNames); > + return; > + } Can this ever return something different if called twice with the same arguments? If it is pure then you don't need to claim to read anything. Either way, it feels like this should have a def() of either a PureValue (if it's pure and doesn't read() anything) or a HeapLocation (if it's impure). > Source/JavaScriptCore/dfg/DFGClobberize.h:170 > + case HasIndexedProperty: { This feels like it should def() a HeapLocation of some kind. > Source/JavaScriptCore/dfg/DFGClobberize.h:183 > + read(World); > + write(World); If this can clobber the world, then it should also be clobbering the world in the corresponding AbstractInterpreter code. I somewhat suspect that this doesn't *actually* write the world, though. Note that if you're too lazy to enumerate what an out-of-bounds access does, but you know that it cannot write anything, then it's totally cool to say that it read(World). > Source/JavaScriptCore/dfg/DFGClobberize.h:195 > + read(World); > + write(World); Ditto. > Source/JavaScriptCore/dfg/DFGClobberize.h:207 > + read(World); > + write(World); Ditto. > Source/JavaScriptCore/dfg/DFGClobberize.h:219 > + read(World); > + write(World); Ditto. > Source/JavaScriptCore/dfg/DFGClobberize.h:225 > + read(World); > + write(World); Ditto. > Source/JavaScriptCore/dfg/DFGSpeculativeJIT32_64.cpp:4757 > + case NextEnumeratorPname: { General comment about the name of this node: it has "next" in the name which made it seem to me that this increments some enumerator/iterator. In fact it appears like this is not the case - it only retrieves something from a list. > Source/JavaScriptCore/dfg/DFGSpeculativeJIT64.cpp:4743 > + m_jit.load64(MacroAssembler::BaseIndex(scratch2GPR, scratch1GPR, MacroAssembler::TimesEight, > + -2 * static_cast<int32_t>(sizeof(EncodedJSValue))), resultGPR); I wonder if the math here could be abstracted somehow - maybe something from PropertyOffset.h?
Mark Hahnenberg
Comment 26 2014-07-23 18:20:33 PDT
Filip Pizlo
Comment 27 2014-07-24 16:25:00 PDT
Comment on attachment 235400 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=235400&action=review It would be really cool if you could cover some of the corner cases with stress tests - i.e. tests that would go into JavaScriptCore/tests/stress. I'll let you use your judgement for that. r=me. > Source/JavaScriptCore/dfg/DFGClobberize.h:160 > + read(World); > + write(World); So, this reads+writes world because of the possibility that the structure changed, right? Can you add a comment to that effect?
Mark Hahnenberg
Comment 28 2014-07-24 18:30:22 PDT
(In reply to comment #27) > (From update of attachment 235400 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=235400&action=review > > It would be really cool if you could cover some of the corner cases with stress tests - i.e. tests that would go into JavaScriptCore/tests/stress. I'll let you use your judgement for that. r=me. I wrote some corner-case-y tests while I was implementing various pieces of this, so I can just rework those and include them as stress tests. > > > Source/JavaScriptCore/dfg/DFGClobberize.h:160 > > + read(World); > > + write(World); > > So, this reads+writes world because of the possibility that the structure changed, right? Can you add a comment to that effect? Will do.
Mark Hahnenberg
Comment 29 2014-07-25 11:47:38 PDT
Mark Hahnenberg
Comment 30 2014-07-25 14:28:08 PDT
*** Bug 134141 has been marked as a duplicate of this bug. ***
Joseph Pecoraro
Comment 31 2015-11-12 21:19:08 PST
*** Bug 118127 has been marked as a duplicate of this bug. ***
Note You need to log in before you can comment on or make changes to this bug.