Summary: | Move some profiling to UnlinkedCodeBlock | ||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Saam Barati <saam> | ||||||||||||||||||||||||||||||||||
Component: | JavaScriptCore | Assignee: | Saam Barati <saam> | ||||||||||||||||||||||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||||||||||||||||||||||
Severity: | Normal | CC: | commit-queue, ews-watchlist, keith_miller, mark.lam, msaboff, tzagallo, webkit-bug-importer, xan.lopez, ysuzuki | ||||||||||||||||||||||||||||||||||
Priority: | P2 | Keywords: | InRadar | ||||||||||||||||||||||||||||||||||
Version: | WebKit Nightly Build | ||||||||||||||||||||||||||||||||||||
Hardware: | Unspecified | ||||||||||||||||||||||||||||||||||||
OS: | Unspecified | ||||||||||||||||||||||||||||||||||||
Bug Depends on: | 230358 | ||||||||||||||||||||||||||||||||||||
Bug Blocks: | 230316 | ||||||||||||||||||||||||||||||||||||
Attachments: |
|
Description
Saam Barati
2021-09-08 20:20:15 PDT
Created attachment 437766 [details]
WIP
Created attachment 437774 [details]
patch
Created attachment 437778 [details]
patch
Created attachment 437787 [details]
patch
Created attachment 437788 [details]
patch
Created attachment 437790 [details]
patch
Created attachment 437791 [details]
patch
Comment on attachment 437778 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=437778&action=review r=me. I hope we will deploy 32bit pointers so that we do not have bunch of pointer width things in metadata. > Source/JavaScriptCore/bytecode/BytecodeList.rb:1069 > + profile: ValueProfile.*, indentation looks wrong. > Source/JavaScriptCore/bytecode/BytecodeList.rb:1257 > iterationMetadata: IterationModeMetadata, > - iterableProfile: ValueProfile, > + iterableProfile: ValueProfile.*, > callLinkInfo: LLIntCallLinkInfo, > - iteratorProfile: ValueProfile, > + iteratorProfile: ValueProfile.*, > modeMetadata: GetByIdModeMetadata, > - nextProfile: ValueProfile, > + nextProfile: ValueProfile.*, We should order these fields to make metadata small. > Source/JavaScriptCore/bytecode/BytecodeList.rb:1282 > iterationMetadata: IterationModeMetadata, > - iterableProfile: ArrayProfile, > + iterableProfile: ArrayProfile.*, > callLinkInfo: LLIntCallLinkInfo, > - nextResultProfile: ValueProfile, > + nextResultProfile: ValueProfile.*, > doneModeMetadata: GetByIdModeMetadata, > - doneProfile: ValueProfile, > + doneProfile: ValueProfile.*, > valueModeMetadata: GetByIdModeMetadata, > - valueProfile: ValueProfile, > + valueProfile: ValueProfile.*, Ditto. > Source/JavaScriptCore/bytecode/CodeBlock.cpp:782 > + unsigned size = numberOfArgumentValueProfiles(); > + if (m_metadata) { > +#define COUNT(__op) \ > + size += m_metadata->numEntries<__op>(); > + FOR_EACH_OPCODE_WITH_VALUE_PROFILE(COUNT) > +#undef COUNT > + size += m_metadata->numEntries<OpIteratorOpen>() * 3; > + size += m_metadata->numEntries<OpIteratorNext>() * 3; > + } > + > + RELEASE_ASSERT(size); > + > + auto& valueProfiles = unlinkedCodeBlock->ensureValueProfiles(size); Can we allocate it apriori when creating UnlinkedCodeBlock? The size won't be changed based on CodeBlock, so allocating that when UnlinkedCodeBlock is created is simpler. > Source/JavaScriptCore/bytecode/CodeBlock.cpp:834 > + unsigned size = 0; > + > + if (m_metadata) { > +#define COUNT(__op) \ > + size += m_metadata->numEntries<__op>(); > + FOR_EACH_OPCODE_WITH_ARRAY_PROFILE(COUNT) > +#undef COUNT > + size += m_metadata->numEntries<OpIteratorNext>(); > + } > + > + if (size) { > + auto& arrayProfiles = unlinkedCodeBlock->ensureArrayProfiles(size); > + unsigned index = 0; Ditto. Comment on attachment 437778 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=437778&action=review >> Source/JavaScriptCore/bytecode/BytecodeList.rb:1069 >> + profile: ValueProfile.*, > > indentation looks wrong. will fix. >> Source/JavaScriptCore/bytecode/BytecodeList.rb:1257 >> + nextProfile: ValueProfile.*, > > We should order these fields to make metadata small. Will do. >> Source/JavaScriptCore/bytecode/CodeBlock.cpp:782 >> + auto& valueProfiles = unlinkedCodeBlock->ensureValueProfiles(size); > > Can we allocate it apriori when creating UnlinkedCodeBlock? The size won't be changed based on CodeBlock, so allocating that when UnlinkedCodeBlock is created is simpler. yeah, we can put this in `UnlinkedCodeBlockGenerator::finalize` per your suggestion on Slack. Regarding the size of pointers. In the future, we could make these pointer fields an index into the profile vectors. (In reply to Saam Barati from comment #10) > Regarding the size of pointers. In the future, we could make these pointer > fields an index into the profile vectors. That way, it'd live in the bytecode instead of the metadata, and it'd be smaller. Created attachment 437796 [details]
patch for landing
I believe I found the crashers out. The issue is like this: Say we have an UnlinkedCodeBlock UCB and two CodeBlocks that point to it, CB1 and CB2. Let's say CB1 is in old space, and CB2 is in new space. CB1 doesn't run. But CB2 runs, and does some profiling, then dies. Now CB1 points to some things. It's not dead, but it also wasn't visited. Therefore we won't finalize it. So I'll have to adjust our finalization algorithm to finalize all live code blocks, not just all visited in this GC cycle code blocks. Created attachment 437938 [details]
WIP
This version fixes the GC issues, I believe (EWS will tell us). But it seems just a touch slower than the other version (with the GC bugs).
I'm uploading this here while I try out a totally new version of a similar. I have an idea of a different way to share profiling, and see if it'll be a bigger win than we have now.
To resolve the GC issue, I had to teach the code to finalize UCBs that "ran" as CodeBlocks ran. I think this finalization costs us maybe 0.25% or so on Speedometer.
We still have a win overall though, but it's around 0.5-1% instead of a solid 1%. So we'll take this if the other version doesn't work out. But let's see if we can get back to a solid 1%
Created attachment 438051 [details]
WIP (alternate implementation)
Ok, the original implementation seems more performant than the simple implementation, unfortunately. But we'll go with what performs best. Created attachment 438082 [details]
Patch
Comment on attachment 438082 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=438082&action=review > Source/JavaScriptCore/bytecode/ArrayProfile.cpp:170 > + JSGlobalObject* globalObject = lastSeenStructure->globalObject(); > + if (lexicalGlobalObject && globalObject && globalObject != lexicalGlobalObject) > + m_observedDifferentGlobalObject = true; > + if (globalObject > + && !globalObject->isOriginalArrayStructure(lastSeenStructure) Hmm, I have a question. Two CodeBlocks ("A" and "B") from different JSGlobalObject can use the same UnlinkedCodeBlock at the same time. And in that case, it is possible that 1. "A" stores StructureID and JSGlobalObject to ArrayProfile 2. Getting prediction from "B". Then, "B" taints this profile as, m_observedDifferentGlobalObject = true m_usesOriginalArrayStructures = false But this is not correct since "A" and "B" are using its own JSGlobalObject and using their original Arrays, correct? And tainting this value can cause an effect on the result of DFG since it makes CheckStructure to CheckArray. Comment on attachment 438082 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=438082&action=review >> Source/JavaScriptCore/bytecode/ArrayProfile.cpp:170 >> + && !globalObject->isOriginalArrayStructure(lastSeenStructure) > > Hmm, I have a question. > > Two CodeBlocks ("A" and "B") from different JSGlobalObject can use the same UnlinkedCodeBlock at the same time. And in that case, it is possible that > > 1. "A" stores StructureID and JSGlobalObject to ArrayProfile > 2. Getting prediction from "B". > > Then, "B" taints this profile as, > > m_observedDifferentGlobalObject = true > m_usesOriginalArrayStructures = false > > But this is not correct since "A" and "B" are using its own JSGlobalObject and using their original Arrays, correct? > > And tainting this value can cause an effect on the result of DFG since it makes CheckStructure to CheckArray. Discussed with Saam, I missed that we are using globalObject from Structure instead of CodeBlock. So when setting m_usesOriginalArrayStructures, we don't care about Realm. It is cared in DFG side. Comment on attachment 438082 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=438082&action=review r=me with one issue. > Source/JavaScriptCore/runtime/JSScope.cpp:84 > + catchScope.releaseAssertNoException(); Unfortunately, I think we can have TerminatedException. One of the failures EWS claims this introduces is: Starting program: /home/igalia/xlopez/WebKit/WebKitBuild/Debug/bin/jsc --forceMiniVMMode=true -f JSTests/stress/has-own-property-name-cache-symbols-and-strings.js [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1". [New Thread 0xf3fc0440 (LWP 3478)] Thread 1 "jsc" received signal SIGABRT, Aborted. __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:47 47 ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S: No such file or directory. (gdb) bt #0 __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:47 #1 0xf5e75ea0 in __libc_signal_restore_set (set=0xfffed324) at ../sysdeps/unix/sysv/linux/internal-signals.h:86 #2 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:48 #3 0xf5e667a2 in __GI_abort () at abort.c:79 #4 0xf6cc46f8 in JSC::ScratchRegisterAllocator::allocateScratch<JSC::GPRInfo> (this=0xfffed648) at ../../Source/JavaScriptCore/jit/ScratchRegisterAllocator.cpp:97 #5 0xf6cbb558 in JSC::ScratchRegisterAllocator::allocateScratchGPR (this=0xfffed648) at ../../Source/JavaScriptCore/jit/ScratchRegisterAllocator.cpp:102 #6 0xf62d9948 in JSC::AccessCase::generateImpl (this=0xf3699460, state=...) at ../../Source/JavaScriptCore/bytecode/AccessCase.cpp:2276 #7 0xf62d78b2 in JSC::AccessCase::generateWithGuard (this=0xf3699460, state=..., fallThrough=...) at ../../Source/JavaScriptCore/bytecode/AccessCase.cpp:1755 #8 0xf63b9862 in JSC::PolymorphicAccess::regenerate (this=0xf369cf80, locker=..., vm=..., globalObject=0xf20f9038, codeBlock=0xf1eac000, ecmaMode=..., stubInfo=...) at ../../Source/JavaScriptCore/bytecode/PolymorphicAccess.cpp:672 #9 0xf63e1720 in operator() (__closure=0xfffeec9c) at ../../Source/JavaScriptCore/bytecode/StructureStubInfo.cpp:221 #10 0xf63e187a in JSC::StructureStubInfo::addAccessCase (this=0xf36f69a0, locker=..., globalObject=0xf20f9038, codeBlock=0xf1eac000, ecmaMode=..., ident=..., accessCase=...) at ../../Source/JavaScriptCore/bytecode/StructureStubInfo.cpp:245 #11 0xf6cb6b96 in JSC::tryCachePutBy (globalObject=0xf20f9038, codeBlock=0xf1eac000, baseValue=..., oldStructure=0xf1efb840, propertyName=..., slot=..., stubInfo=..., putByKind=JSC::PutByKind::ByVal, putKind=JSC::PutKind::NotDirect) at ../../Source/JavaScriptCore/jit/Repatch.cpp:877 #12 0xf6cb6d56 in JSC::repatchPutBy (globalObject=0xf20f9038, codeBlock=0xf1eac000, baseValue=..., oldStructure=0xf1efb840, propertyName=..., slot=..., stubInfo=..., putByKind=JSC::PutByKind::ByVal, putKind=JSC::PutKind::NotDirect) at ../../Source/JavaScriptCore/jit/Repatch.cpp:897 #13 0xf6c5a820 in JSC::putByValOptimize (globalObject=0xf20f9038, codeBlock=0xf1eac000, baseValue=..., subscript=..., value=..., stubInfo=0xf36f69a0, profile=0xf36e8e30, ecmaMode=...) at ../../Source/JavaScriptCore/jit/JITOperations.cpp:1053 #14 0xf6c5a9d2 in JSC::operationPutByValNonStrictOptimize (globalObject=0xf20f9038, encodedBaseValue=-17416011648, encodedSubscript=-17415806464, encodedValue=-4294967271, stubInfo=0xf36f69a0, profile=0xf36e8e30) at ../../Source/JavaScriptCore/jit/JITOperations.cpp:1085 #15 0xf35fd662 in ?? () Pretty sure this is bug #230215, we are working on a fix. Trying to get a trace for the other failure, but it's also in a for-in test (for-in-cached.js in LayouTests/js), so I suspect something similar is going on. (In reply to Yusuke Suzuki from comment #21) > Comment on attachment 438082 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=438082&action=review > > r=me with one issue. > > > Source/JavaScriptCore/runtime/JSScope.cpp:84 > > + catchScope.releaseAssertNoException(); > > Unfortunately, I think we can have TerminatedException. Right. Spoke to Yusuke on Slack. We're going to use DeferTerminationForAWhile inside abstractAccess. Created attachment 438270 [details]
patch for landing
Comment on attachment 438270 [details] patch for landing View in context: https://bugs.webkit.org/attachment.cgi?id=438270&action=review > Source/JavaScriptCore/bytecode/UnlinkedMetadataTable.cpp:58 > + buffer[i] = offset; // We align when we access this. This fixes a bug that this patch exposes, where we would potentially think we had more metadata table entries than we really had in practice. This is because "forEach" used the next opcode's start pointer. That pointer is aligned to that opcode's metadata alignment. So that might make the previous opcode think it had an extra 1-7 entries (depending on size, alignment, etc). This patch fixes that by having the next opcode's start offset in the table always be the end offset of the previous opcode. This fixes the bug in forEach. Then, when we access the metadata, we round up to alignment. Created attachment 438274 [details]
patch for landing
(In reply to Saam Barati from comment #25) > Comment on attachment 438270 [details] > patch for landing > > View in context: > https://bugs.webkit.org/attachment.cgi?id=438270&action=review > > > Source/JavaScriptCore/bytecode/UnlinkedMetadataTable.cpp:58 > > + buffer[i] = offset; // We align when we access this. > > This fixes a bug that this patch exposes, where we would potentially think > we had more metadata table entries than we really had in practice. This is > because "forEach" used the next opcode's start pointer. That pointer is > aligned to that opcode's metadata alignment. So that might make the previous > opcode think it had an extra 1-7 entries (depending on size, alignment, > etc). > > This patch fixes that by having the next opcode's start offset in the table > always be the end offset of the previous opcode. This fixes the bug in > forEach. > > Then, when we access the metadata, we round up to alignment. Another approach in the future would be to generate the opcodeIDs for byte codes in descending order of alignment. So all things with alignment 8 come first, followed by 4, 2, 1. That would then allow us to never align (and instead assert we're always aligned). We could do that in the ruby code in two ways: 1. figure out the alignment somehow. Perhaps compile some dummy code and return alignof of dummy structs 2. Just specify in the ruby bytecodelist.rb what the alignment is (perhaps of type, or per each metadata). Then we'd just have to keep that updated. Committed r282478 (241726@main): <https://commits.webkit.org/241726@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 438274 [details]. Re-opened since this is blocked by bug 230358 (In reply to WebKit Commit Bot from comment #29) > Re-opened since this is blocked by bug 230358 It was a speedup on Speedo2. But slowdown on JetStream2. Gonna see if the simple version of the patch in prior versions is a speedup or neutral everywhere. Created attachment 438391 [details]
Patch
Comment on attachment 438391 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=438391&action=review r=me > Source/JavaScriptCore/bytecode/ValueProfile.h:220 > +struct UnlinkedValueProfile { I think `class` is better in this case. > Source/JavaScriptCore/runtime/CachedTypes.cpp:2164 > + m_valueProfiles = FixedVector<UnlinkedValueProfile>(cachedCodeBlock.numValueProfiles()); > + m_arrayProfiles = FixedVector<UnlinkedArrayProfile>(cachedCodeBlock.numArrayProfiles()); Why not initializing them in the initializer list? , m_valueProfiles(cachedCodeBlock.numValueProfiles()) (In reply to Yusuke Suzuki from comment #32) > Comment on attachment 438391 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=438391&action=review > > r=me > > > Source/JavaScriptCore/bytecode/ValueProfile.h:220 > > +struct UnlinkedValueProfile { > > I think `class` is better in this case. > > > Source/JavaScriptCore/runtime/CachedTypes.cpp:2164 > > + m_valueProfiles = FixedVector<UnlinkedValueProfile>(cachedCodeBlock.numValueProfiles()); > > + m_arrayProfiles = FixedVector<UnlinkedArrayProfile>(cachedCodeBlock.numArrayProfiles()); > > Why not initializing them in the initializer list? > > , m_valueProfiles(cachedCodeBlock.numValueProfiles()) Will fix. Created attachment 438393 [details]
patch for landing
Created attachment 438403 [details]
[fast-cq] patch for landing
Committed r282621 (241779@main): <https://commits.webkit.org/241779@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 438403 [details]. |