Summary: | Add support for incremental bytecode cache updates | ||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Tadeu Zagallo <tzagallo> | ||||||||||||||||||||||||||||||||||||||||||
Component: | JavaScriptCore | Assignee: | Tadeu Zagallo <tzagallo> | ||||||||||||||||||||||||||||||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||||||||||||||||||||||||||||||
Severity: | Normal | CC: | commit-queue, ews-watchlist, fpizlo, keith_miller, mark.lam, msaboff, rniwa, saam, webkit-bug-importer | ||||||||||||||||||||||||||||||||||||||||||
Priority: | P2 | Keywords: | InRadar | ||||||||||||||||||||||||||||||||||||||||||
Version: | WebKit Nightly Build | ||||||||||||||||||||||||||||||||||||||||||||
Hardware: | Unspecified | ||||||||||||||||||||||||||||||||||||||||||||
OS: | Unspecified | ||||||||||||||||||||||||||||||||||||||||||||
Attachments: |
|
Description
Tadeu Zagallo
2019-02-25 04:17:36 PST
Created attachment 362901 [details]
Patch
Created attachment 362918 [details]
Patch
Rebase
Created attachment 362925 [details]
Patch
Fix mac-32bit and windows builds
Comment on attachment 362925 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=362925&action=review > Source/JavaScriptCore/ChangeLog:8 > + Add CacheUpdates to the bytecode to allow to incrementally build the "CacheUpdates" => "cache updates" > Source/JavaScriptCore/ChangeLog:11 > + In order to that, we keep track of the UnlinkedCodeBlocks missing from "to that" => "to do that" > Source/JavaScriptCore/jsc.cpp:1000 > + if (m_cachedBytecode.size()) > + munmap(const_cast<uint8_t*>(m_cachedBytecode.data()), m_cachedBytecode.size()); > + How is this correct? This is called from writeCodeBlock, which may be called many times. Maybe I'm missing something? If I'm not, we need tests for this. I suspect this just needs to move back to the destructor. But this brings up an interesting question, since your mmap is for only the initial size. Maybe you need to mmap to reload at the end of this call? I'm not sure what mmap semantics for size are when you're mmapping a file descriptor. > Source/JavaScriptCore/jsc.cpp:1005 > + bool success = m_cachedBytecode.forEachUpdate([&](off_t offset, const void* data, size_t size) { style nit: space after "]" > Source/JavaScriptCore/runtime/CachedBytecode.cpp:54 > + other.m_owned = false; Should we just null out it's other fields too? > Source/JavaScriptCore/runtime/CachedBytecode.cpp:157 > + case CacheUpdate::Function: { How are we supposed to know all the fields that need to be updated? Is there just some header struct we can just write out all at once in contiguous memory to make this code easier to digest and keep updated? > Source/JavaScriptCore/runtime/CachedBytecode.cpp:162 > + // update m_unlinkedCodeBlockFor(Call|Construct).m_ptr.m_offset worth a static assert that this field is ptrdiff_t > Source/JavaScriptCore/runtime/CachedBytecode.cpp:190 > + } } style nit: should be on different lines > Source/JavaScriptCore/runtime/CachedBytecode.cpp:200 > + ASSERT(static_cast<size_t>(offset) == m_size); > + Do we not need to update m_payload here? > Source/JavaScriptCore/runtime/CachedBytecode.h:132 > + Type m_type; > + union { > + GlobalUpdate m_globalUpdate; > + FunctionUpdate m_functionUpdate; > + }; WTF::Variant? > Source/JavaScriptCore/runtime/CachedTypes.cpp:141 > + void addLeafCodeBlock(const UnlinkedFunctionExecutable* executable, ptrdiff_t offset) > + { > + m_leafCodeBlocks.add(executable, offset); > + } This naming is weird given that it's an executable. > Source/JavaScriptCore/runtime/CachedTypes.cpp:350 > + style nit: we don't really need a newline here > Source/JavaScriptCore/runtime/CachedTypes.cpp:1860 > + ASSERT(codeBlock->isStrictMode()); Why? > Source/JavaScriptCore/runtime/CachedTypes.cpp:2010 > + remove > Source/JavaScriptCore/runtime/CachedTypes.cpp:2015 > + encoder.addLeafCodeBlock(&executable, encoder.offsetOf(this)); nit: Not this patch, but we tend to use pointers instead of references for GC pointers just as a style policy in JSC. Comment on attachment 362925 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=362925&action=review >> Source/JavaScriptCore/jsc.cpp:1000 >> + > > How is this correct? This is called from writeCodeBlock, which may be called many times. Maybe I'm missing something? If I'm not, we need tests for this. I suspect this just needs to move back to the destructor. > > But this brings up an interesting question, since your mmap is for only the initial size. Maybe you need to mmap to reload at the end of this call? I'm not sure what mmap semantics for size are when you're mmapping a file descriptor. I think I need to clear the CachedBytecode here. I also realized that we probably won't update the cache, since we are checking for SourceCodeValue::written, and we never clear it. I'll move the mmap to cachedBytecode(), that way it will be reloaded if we try to read it again, does that make sense? >> Source/JavaScriptCore/runtime/CachedBytecode.cpp:54 >> + other.m_owned = false; > > Should we just null out it's other fields too? Sure, I can do that. >> Source/JavaScriptCore/runtime/CachedBytecode.cpp:157 >> + case CacheUpdate::Function: { > > How are we supposed to know all the fields that need to be updated? Is there just some header struct we can just write out all at once in contiguous memory to make this code easier to digest and keep updated? Yeah, I'm not happy about this either. I'll try to move all of this into a struct as you suggested. >> Source/JavaScriptCore/runtime/CachedBytecode.cpp:162 >> + // update m_unlinkedCodeBlockFor(Call|Construct).m_ptr.m_offset > > worth a static assert that this field is ptrdiff_t that's a good idea. >> Source/JavaScriptCore/runtime/CachedTypes.cpp:141 >> + } > > This naming is weird given that it's an executable. Yeah, I was recording the code blocks at first. I'll update it. >> Source/JavaScriptCore/runtime/CachedTypes.cpp:1860 >> + ASSERT(codeBlock->isStrictMode()); > > Why? Oops, this is leftover from testing. >> Source/JavaScriptCore/runtime/CachedTypes.cpp:2015 >> + encoder.addLeafCodeBlock(&executable, encoder.offsetOf(this)); > > nit: Not this patch, but we tend to use pointers instead of references for GC pointers just as a style policy in JSC. This is how the cache hierarchy was designed: A UnlinkedFunctionExecutable* is cached as CachedPtr<CachedFunctionExecutable>. The way CachedPtr works is that it takes a T*, and encodes it with `encode(*T)`, so all the encode functions take a reference. Created attachment 363869 [details]
Patch
Created attachment 363887 [details]
Patch
Created attachment 363903 [details]
Patch
Created attachment 363995 [details]
Patch
Comment on attachment 363995 [details] Patch Attachment 363995 [details] did not pass mac-wk2-ews (mac-wk2): Output: https://webkit-queues.webkit.org/results/11424742 New failing tests: accessibility/mac/selection-notification-focus-change.html Created attachment 363998 [details]
Archive of layout-test-results from ews106 for mac-highsierra-wk2
The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews106 Port: mac-highsierra-wk2 Platform: Mac OS X 10.13.6
Comment on attachment 363995 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=363995&action=review > Source/JavaScriptCore/ChangeLog:14 > + bytecode. The CodeBlock to be cached is then encoded on its own buffer nit: CodeBlock => UnlinkedCodeBlock on => in > Source/JavaScriptCore/ChangeLog:15 > + and later appended to the existing cache. I think it's worth hammering this home in another sentence: incremental updates work because we basically purely append to the existing cache + update a few fixed fields in the existing cache > Source/JavaScriptCore/jsc.cpp:1003 > + m_cachedBytecode.clear(); do we ever set up a new m_cachedBytecode? > Source/JavaScriptCore/runtime/CachedBytecode.cpp:121 > + // global code block can we just each of these functions better names? like addGlobalUpdate and addFunctionUpdate? > Source/JavaScriptCore/runtime/CachedBytecode.cpp:145 > +bool CachedBytecode::forEachUpdate(const ForEachUpdateCallback& callback) const This is a weird name for this function since it's about success. Maybe "bool commitUpdates(...)"? > Source/JavaScriptCore/runtime/CachedBytecode.cpp:147 > + off_t offset = m_payload.size(); Do we ever update m_payload? > Source/JavaScriptCore/runtime/CachedBytecode.cpp:155 > + ptrdiff_t kindOffset = functionUpdate.m_kind == CodeForCall ? CachedFunctionExecutableOffsets::codeBlockForCallOffset() : CachedFunctionExecutableOffsets::codeBlockForConstructOffset(); style nit: this should go in the below block. > Source/JavaScriptCore/runtime/CachedBytecode.cpp:157 > + // update m_unlinkedCodeBlockFor(Call|Construct).m_ptr.m_offset nit: I don't think these comments are adding anything, since the code makes it obvious what's going on. > Source/JavaScriptCore/runtime/CachedBytecode.cpp:160 > + if (!callback(codeBlockOffset, &offsetPayload, sizeof(ptrdiff_t))) you should static assert whatever field this is is sizeof(ptrdiff_t) Comment on attachment 363995 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=363995&action=review >> Source/JavaScriptCore/jsc.cpp:1003 >> + m_cachedBytecode.clear(); > > do we ever set up a new m_cachedBytecode? Yes, in `loadBytecode`. >> Source/JavaScriptCore/runtime/CachedBytecode.cpp:121 >> + // global code block > > can we just each of these functions better names? > like addGlobalUpdate and addFunctionUpdate? Sure, I will do that. >> Source/JavaScriptCore/runtime/CachedBytecode.cpp:147 >> + off_t offset = m_payload.size(); > > Do we ever update m_payload? No, `m_payload` is only set in the constructor. Created attachment 364854 [details]
Patch
Created attachment 365013 [details]
Patch
Rebase and update ChangeLog
Comment on attachment 365013 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=365013&action=review > Source/JavaScriptCore/runtime/CachedBytecode.cpp:139 > + auto addResult = m_leafExecutables.add(it.key, it.value + m_size); I'm a little bit lost on seeing why this is the correct offset math. Can you explain why this works? > Source/JavaScriptCore/runtime/CachedBytecode.cpp:145 > +bool CachedBytecode::commitUpdates(const ForEachUpdateCallback& callback) const What happens when one of the later updates fails but an earlier one doesn't? Don't we end up in some weird corrupted state? E.g, what if we successfully update the offset but fail to write the payload the offset points to? Maybe the right API to use is to reserve the final file size. If that fails, we bail. If that succeeds, we can just write everything out to disk assuming success? I think we should find a way to make this update atomic. > Source/JavaScriptCore/runtime/CachedBytecode.cpp:171 > + // write the payload nit: remove comment Comment on attachment 365013 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=365013&action=review >> Source/JavaScriptCore/runtime/CachedBytecode.cpp:139 >> + auto addResult = m_leafExecutables.add(it.key, it.value + m_size); > > I'm a little bit lost on seeing why this is the correct offset math. Can you explain why this works? The leaf executable was recorded while encoding the function in its own buffer, so when we append to the cache we need to offset it by the current size of the cache so that it's relative to the start of the file. >> Source/JavaScriptCore/runtime/CachedBytecode.cpp:145 >> +bool CachedBytecode::commitUpdates(const ForEachUpdateCallback& callback) const > > What happens when one of the later updates fails but an earlier one doesn't? Don't we end up in some weird corrupted state? E.g, what if we successfully update the offset but fail to write the payload the offset points to? > > Maybe the right API to use is to reserve the final file size. If that fails, we bail. If that succeeds, we can just write everything out to disk assuming success? > > I think we should find a way to make this update atomic. Right now we just truncate the file if any of the updates fail, but I agree your solution is better. I'll update it. Created attachment 365181 [details]
Patch
Comment on attachment 365013 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=365013&action=review >>> Source/JavaScriptCore/runtime/CachedBytecode.cpp:139 >>> + auto addResult = m_leafExecutables.add(it.key, it.value + m_size); >> >> I'm a little bit lost on seeing why this is the correct offset math. Can you explain why this works? > > The leaf executable was recorded while encoding the function in its own buffer, so when we append to the cache we need to offset it by the current size of the cache so that it's relative to the start of the file. Just to verify I'm understanding this correctly: this is dependent on the order of m_updates too, right? Comment on attachment 365013 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=365013&action=review >>>> Source/JavaScriptCore/runtime/CachedBytecode.cpp:139 >>>> + auto addResult = m_leafExecutables.add(it.key, it.value + m_size); >>> >>> I'm a little bit lost on seeing why this is the correct offset math. Can you explain why this works? >> >> The leaf executable was recorded while encoding the function in its own buffer, so when we append to the cache we need to offset it by the current size of the cache so that it's relative to the start of the file. > > Just to verify I'm understanding this correctly: this is dependent on the order of m_updates too, right? Yes, that's correct. Comment on attachment 365181 [details] Patch Attachment 365181 [details] did not pass mac-debug-ews (mac): Output: https://webkit-queues.webkit.org/results/11570460 New failing tests: http/tests/inspector/dom/didFireEvent.html Created attachment 365239 [details]
Archive of layout-test-results from ews112 for mac-highsierra
The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews112 Port: mac-highsierra Platform: Mac OS X 10.13.6
Comment on attachment 365181 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=365181&action=review Can you file a follow-up patch to hook the ObjC API into incremental caching? Mostly LGTM, one final question/comment. > Source/JavaScriptCore/runtime/CodeCache.cpp:205 > if (!codeBlock) > return; When is this false? Why not just make this only take key? This function now feels slightly weird since we may call commitCachedBytecode repeatedly on the same provider. Is that expected? It seems inefficient. Comment on attachment 365181 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=365181&action=review >> Source/JavaScriptCore/runtime/CodeCache.cpp:205 >> return; > > When is this false? Why not just make this only take key? > > This function now feels slightly weird since we may call commitCachedBytecode repeatedly on the same provider. Is that expected? It seems inefficient. This is false for the UnlinkedFunctionExecutable entries in the cache. We can't take just the key because the CodeBlock is the value and we need the VM for the jsDynamicCast. (We could get the VM for the cell, but it seemed inefficient in the places that we already have it.) I don't expect us to call it multiple times on the same provider. We only cache UnlinkedGlobalCodeBlocks, is it possible/common to have the same provider for multiple global code blocks? Unless it's very common, it should be ok since the provider will clear the updates after successfully committing them, so it should be a no-op. Comment on attachment 365181 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=365181&action=review > Source/JavaScriptCore/jsc.cpp:1000 > + munmap(const_cast<uint8_t*>(m_cachedBytecode.data()), m_cachedBytecode.size()); As mentioned on IRC, this may be problematic. >>> Source/JavaScriptCore/runtime/CodeCache.cpp:205 >>> return; >> >> When is this false? Why not just make this only take key? >> >> This function now feels slightly weird since we may call commitCachedBytecode repeatedly on the same provider. Is that expected? It seems inefficient. > > This is false for the UnlinkedFunctionExecutable entries in the cache. We can't take just the key because the CodeBlock is the value and we need the VM for the jsDynamicCast. (We could get the VM for the cell, but it seemed inefficient in the places that we already have it.) > > I don't expect us to call it multiple times on the same provider. We only cache UnlinkedGlobalCodeBlocks, is it possible/common to have the same provider for multiple global code blocks? Unless it's very common, it should be ok since the provider will clear the updates after successfully committing them, so it should be a no-op. Got it. Makes sense. I agree we shouldn't have the same provider for different global code blocks. However, your commit doesn't seem like it'll be a nop if called multiple times in a row. Created attachment 366478 [details]
Patch
Created attachment 366533 [details]
Patch
Fix build
Created attachment 366542 [details]
Patch
Fix windows build
Comment on attachment 366542 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=366542&action=review > Source/JavaScriptCore/API/JSScript.mm:53 > + RefPtr<JSC::CachedBytecode> m_cachedBytecode; Why not Ref? When is this ever null? Or why not make it null to represent not having a cache? Why create the empty BytecodeCache instead of just keeping null to signify the same thing? > Source/JavaScriptCore/jsc.cpp:995 > + void commitCachedBytecode() const override feels like a weird function to be marked as const given the name and the desired semantics. Also, I'm still confused on what happens if this gets called twice? If we end up clearing this in the makeScopeExit below, how will the second call succeed? Maybe the way other things are structured says this won't happen, but I feel like we should be able to handle it. Logically, I feel like I should be able to repeatedly call commitCachedBytecode on the same source provider. > Source/JavaScriptCore/jsc.cpp:1002 > + m_cachedBytecode = CachedBytecode::create(); same comment as earlier. Why not just use nullptr to represent this? > Source/JavaScriptCore/jsc.cpp:1021 > + size_t bytesWritten = static_cast<size_t>(write(fd, data, size)); > + ASSERT_UNUSED(bytesWritten, bytesWritten == size); this feels like a dangerous thing to assert. Does POSIX guarantee this succeeds if the file is of sufficient size? > Source/JavaScriptCore/jsc.cpp:1081 > + mutable Ref<CachedBytecode> m_cachedBytecode; if you unmark the commit function as const you can also remove this mutable. > Source/JavaScriptCore/runtime/CachedBytecode.cpp:63 > +#if OS(DARWIN) > + munmap(m_data, m_size); > +#endif Should this be: #else ASSERT_NOT_REACHED() or similar? That said, Darwin feels weird here. Why not just do this for posix? I also think that means the constructor that says mapped should be compiled out for non posix? Or crash for non posix? > Source/JavaScriptCore/runtime/CachedBytecode.cpp:93 > + this->~CacheUpdate(); you should add: ASSERT(&other != this) > Source/JavaScriptCore/runtime/CachedBytecode.h:61 > + CachePayload(void* data, size_t size) > + : m_mapped(true) > + , m_size(size) > + , m_data(static_cast<uint8_t*>(data)) > + { > + } > + > + CachePayload(MallocPtr<uint8_t>&& data, size_t size) > + : m_mapped(false) > + , m_size(size) > + , m_data(data.leakPtr()) > + { > + } nit: I think having static functions with names is a bit better to read at the callsite. e.g, static CachePayload makeMappedPayload(void*, size_t) > Source/JavaScriptCore/runtime/CachedBytecode.h:156 > + size_t sizeForUpdate() { return m_size; } style nit: can be marked as const > Source/JavaScriptCore/runtime/CodeCache.h:170 > + VERBOSE_LOG("Found cached CodeBlock in the SourceProvider"); I know this was here before, but do we really want to keep this? Comment on attachment 366542 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=366542&action=review >> Source/JavaScriptCore/API/JSScript.mm:53 >> + RefPtr<JSC::CachedBytecode> m_cachedBytecode; > > Why not Ref? When is this ever null? > > Or why not make it null to represent not having a cache? Why create the empty BytecodeCache instead of just keeping null to signify the same thing? The only reason this is `RefPtr` instead of `Ref` is because there's no default initializer for `Ref` and I believe there's no way to initialize it from an Obj-C class. I believe we could use `null` to represent not having the cache for now, but it doesn't work as a general solution because we need to append the updates to the empty cache. >> Source/JavaScriptCore/jsc.cpp:995 >> + void commitCachedBytecode() const override > > feels like a weird function to be marked as const given the name and the desired semantics. > > Also, I'm still confused on what happens if this gets called twice? If we end up clearing this in the makeScopeExit below, how will the second call succeed? Maybe the way other things are structured says this won't happen, but I feel like we should be able to handle it. Logically, I feel like I should be able to repeatedly call commitCachedBytecode on the same source provider. All the cache functions had to be marked `const` and `m_cachedBytecode` because we can only have a const reference to the source provider. I'm not happy about it, but couldn't find a better solution. The second call will bail because after clearing `m_cachedBytecode`, hasUpdates will return false in the first check. I added this after your comment in the last patch, and I think it should handle multiple calls. >> Source/JavaScriptCore/jsc.cpp:1002 >> + m_cachedBytecode = CachedBytecode::create(); > > same comment as earlier. Why not just use nullptr to represent this? Here is the case where we can't use null, because here we actually support updates. We could have an `ensureCachedBytecode()` helper, but it felt like overkill. >> Source/JavaScriptCore/jsc.cpp:1021 >> + ASSERT_UNUSED(bytesWritten, bytesWritten == size); > > this feels like a dangerous thing to assert. Does POSIX guarantee this succeeds if the file is of sufficient size? I couldn't find any reason in the man pages why it would fail for a file if there's sufficient space, but I'm honestly not sure. Either way, if this is possible, we're in a bad place, because we're no longer checking for successful writes of every chunk and we don't do any integrity check in the bytecode cache. If you think this is dangerous, I believe we'll have to rollback to my previous version where every write returns where it was successful. >> Source/JavaScriptCore/jsc.cpp:1081 >> + mutable Ref<CachedBytecode> m_cachedBytecode; > > if you unmark the commit function as const you can also remove this mutable. see my comment above that we can only find the SourceProvider through a const reference. >> Source/JavaScriptCore/runtime/CachedBytecode.cpp:63 >> +#endif > > Should this be: > #else > ASSERT_NOT_REACHED() or similar? > > That said, Darwin feels weird here. Why not just do this for posix? > > I also think that means the constructor that says mapped should be compiled out for non posix? Or crash for non posix? Technically you could still use it, but it would be up to the caller to free the data, which as I write I realize wouldn't be very useful. I'll remove the constructor and change it to posix. Created attachment 366923 [details]
Patch
Comment on attachment 366923 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=366923&action=review r=me but it seems like this needs to be rebased and please try to put classes in separate files. > Source/JavaScriptCore/runtime/CachedBytecode.h:42 > +class CachePayload { Ideally this would be a separate header. And the implementation in a separate .cpp file. > Source/JavaScriptCore/runtime/CachedBytecode.h:72 > +class LeafExecutable { Ditto. Created attachment 367111 [details]
Patch for landing
I'll just wait for EWS to confirm that all platforms are green
Created attachment 367114 [details]
Patch
Fix build
Created attachment 367124 [details]
Patch
Try to fix the build again
Created attachment 367128 [details]
Patch
Try to fix the build yet another time
Comment on attachment 367128 [details] Patch Clearing flags on attachment: 367128 Committed r244143: <https://trac.webkit.org/changeset/244143> All reviewed patches have been landed. Closing bug. Committed r244149: <https://trac.webkit.org/changeset/244149> Ugh, webkit-patch automatically commented... r244149 is a fix for the watch builds. |