Bug 194047 - Integrate JSC bytecode cache with WebKit
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit2
Version: WebKit Nightly Build
Hardware: Unspecified, OS: Unspecified
Importance: P2 Normal
Assignee: Tadeu Zagallo
Reported: 2019-01-30 14:13 PST by Tadeu Zagallo
Modified: 2019-06-04 16:17 PDT
CC List: 17 users


Attachments
Patch (71.33 KB, patch), 2019-01-30 14:27 PST, Tadeu Zagallo, no flags
Patch (68.25 KB, patch), 2019-02-05 16:08 PST, Tadeu Zagallo, no flags
Patch (67.32 KB, patch), 2019-02-05 16:19 PST, Tadeu Zagallo, no flags
Patch (69.50 KB, patch), 2019-02-11 15:50 PST, Tadeu Zagallo, no flags
WIP (109.57 KB, patch), 2019-04-17 14:52 PDT, Tadeu Zagallo, no flags
Patch (174.73 KB, patch), 2019-05-11 05:41 PDT, Tadeu Zagallo, no flags
Patch (171.53 KB, patch), 2019-05-13 05:43 PDT, Tadeu Zagallo, tzagallo: review?

Description Tadeu Zagallo 2019-01-30 14:13:15 PST
Add the ability for the network process to generate, cache, and retrieve bytecode for JavaScript files.
Comment 1 Tadeu Zagallo 2019-01-30 14:27:39 PST
Created attachment 360620 [details]
Patch
Comment 2 Alex Christensen 2019-01-30 21:26:43 PST
Comment on attachment 360620 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=360620&action=review

I would expect more SessionIDs in this patch to say which WebsiteDataStore to cache the bytecode in.  I guess we can get it from the pageID or get it the same way we get the pageID.

> Source/JavaScriptCore/runtime/CodeCache.h:87
>      int64_t age;
> +    bool written;

Both of these could use initializer lists, or get rid of the default constructor.

> Source/WebCore/loader/cache/CachedScript.cpp:117
> +        memcpy(memory.get() + it.beginPosition, it.segment->data(), it.segment->size());

This seems like an unnecessary copy.

> Source/WebCore/loader/cache/CachedScript.cpp:118
> +    m_cachedBytecode = JSC::CachedBytecode { WTFMove(memory), size };

This is basically just a Vector<uint8_t>

> Source/WebKit/NetworkProcess/NetworkResourceLoader.cpp:850
> +        Vector<IPC::DerivedData>* derivedData = new Vector<IPC::DerivedData>();

No need to call new here.  Just WTFMove an instance into the lambda capture list.

> Source/WebKit/NetworkProcess/NetworkResourceLoader.cpp:857
> +                if (loader->hasOneRef())
> +                    return;

Why?

> Source/WebKit/NetworkProcess/cache/BytecodeCache.cpp:2
> + * Copyright (C) 2019 Apple Inc.  All rights reserved.

There seems to be an extra space between Inc. and All.

> Source/WebKit/NetworkProcess/cache/BytecodeCache.cpp:75
> +    SHA1::Digest m_bodyHash;

Why SHA1?

> Source/WebKit/NetworkProcess/cache/BytecodeCache.cpp:111
> +        for (RefPtr<Plan> plan : m_plans)

Would "const auto&" work, or do we need to keep a reference to this plan as we use it?

> Source/WebKit/NetworkProcess/cache/BytecodeCache.cpp:135
> +    RefPtr<JSC::VM> m_vm;

Ref<JSC::VM>

> Source/WebKit/NetworkProcess/cache/BytecodeCache.cpp:144
> +    m_thread = new Thread(locker, *this);

This is a memory leak.  It should be adopted.  I'm also not sure you need to create a thread just for this.  I'm also curious why WTF::Thread::create returns a Ref<Thread> but it has a comment saying it can return nullptr.  So many thread questions.

> Source/WebKit/NetworkProcess/cache/BytecodeCache.h:50
> +    using Plans = Vector<RefPtr<Plan>, 8>;

Can this be a Vector<Ref<Plan>>?

> Source/WebKit/NetworkProcess/cache/BytecodeCache.h:55
> +    static RefPtr<BytecodeCache> create(NetworkProcess& networkProcess)

This could return a Ref<BytecodeCache>

> Source/WebKit/NetworkProcess/cache/BytecodeCache.h:71
> +    RefPtr<AutomaticThread> m_thread;

Can this be a Ref<AutomaticThread>?  Also still wondering why you need a thread.
Comment 3 Tadeu Zagallo 2019-02-01 13:41:41 PST
Comment on attachment 360620 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=360620&action=review

>> Source/JavaScriptCore/runtime/CodeCache.h:87
>> +    bool written;
> 
> Both of these could use initializer lists, or get rid of the default constructor.

I believe the default constructor is necessary for using it in the HashMap below, no?

>> Source/WebKit/NetworkProcess/NetworkResourceLoader.cpp:850
>> +        Vector<IPC::DerivedData>* derivedData = new Vector<IPC::DerivedData>();
> 
> No need to call new here.  Just WTFMove an instance into the lambda capture list.

That would move it multiple times.

>> Source/WebKit/NetworkProcess/NetworkResourceLoader.cpp:857
>> +                    return;
> 
> Why?

Not sure actually, this was part of the code that you had deleted and I added back. I assume that if this is the only ref to the loader, then we don't care about the response anymore?

>> Source/WebKit/NetworkProcess/cache/BytecodeCache.cpp:75
>> +    SHA1::Digest m_bodyHash;
> 
> Why SHA1?

That's what the NetworkCache uses.

>> Source/WebKit/NetworkProcess/cache/BytecodeCache.cpp:135
>> +    RefPtr<JSC::VM> m_vm;
> 
> Ref<JSC::VM>

The VM might be null if we don't have an underlying thread.

>> Source/WebKit/NetworkProcess/cache/BytecodeCache.cpp:144
>> +    m_thread = new Thread(locker, *this);
> 
> This is a memory leak.  It should be adopted.  I'm also not sure you need to create a thread just for this.  I'm also curious why WTF::Thread::create returns a Ref<Thread> but it has a comment saying it can return nullptr.  So many thread questions.

Oops, I copied it from the JITWorklist, I guess I should fix it there too...

>> Source/WebKit/NetworkProcess/cache/BytecodeCache.h:71
>> +    RefPtr<AutomaticThread> m_thread;
> 
> Can this be a Ref<AutomaticThread>?  Also still wondering why you need a thread.

I'm using the thread to generate the bytecode for the script. I don't think we should block the main thread for that.
Comment 4 Antti Koivisto 2019-02-04 09:35:21 PST
A patch that applies to trunk would be nice.
Comment 5 Antti Koivisto 2019-02-04 09:42:11 PST
Comment on attachment 360620 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=360620&action=review

Looks generally good, but let's get it to apply before r+ing.

>>> Source/WebKit/NetworkProcess/NetworkResourceLoader.cpp:850
>>> +        Vector<IPC::DerivedData>* derivedData = new Vector<IPC::DerivedData>();
>> 
>> No need to call new here.  Just WTFMove an instance into the lambda capture list.
> 
> That would move it multiple times.

Please use smart pointers for memory management. Calling new (outside create() type functions that return smart pointers) is discouraged. 

auto derivedData = std::make_unique<Vector<IPC::DerivedData>>();

>>> Source/WebKit/NetworkProcess/NetworkResourceLoader.cpp:857
>>> +                    return;
>> 
>> Why?
> 
> Not sure actually, this was part of the code that you had deleted and I added back. I assume that if this is the only ref to the loader, then we don't care about the response anymore?

This lambda is then the only ref keeping the loader alive which implies there are no clients. A more stylish solution would use WeakPtr.
Comment 6 Tadeu Zagallo 2019-02-05 16:08:35 PST
Created attachment 361237 [details]
Patch
Comment 7 Tadeu Zagallo 2019-02-05 16:19:55 PST
Created attachment 361242 [details]
Patch
Comment 8 Tadeu Zagallo 2019-02-11 15:50:43 PST
Created attachment 361724 [details]
Patch
Comment 9 Antti Koivisto 2019-02-13 05:09:30 PST
Comment on attachment 361724 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=361724&action=review

> Source/WebCore/loader/cache/CachedResource.cpp:319
> +        auto* frameLoader = m_loader->frameLoader();
> +        auto* frameLoaderClient = frameLoader ? &frameLoader->client() : nullptr;
> +        if (frameLoaderClient) {
> +            m_webPageID = frameLoaderClient->pageID().valueOr(0);
> +            m_webFrameID = frameLoaderClient->frameID().valueOr(0);
> +        }

This is not correct. A CachedResource can have clients in multiple pages and frames.

> Source/WebKit/NetworkProcess/cache/BytecodeCache.cpp:166
> +    cache->retrieve(request, { webPageID,  webFrameID }, [bytecodeCache = makeRef(*this), request = WebCore::ResourceRequest { request }](auto entry, auto) {

I think you can just pass empty frameId and never pass webPageID/webFrameID around in the first place. It appears to be used for logging purposes only.
Comment 10 Tadeu Zagallo 2019-04-17 14:52:11 PDT
Created attachment 367681 [details]
WIP
Comment 11 Tadeu Zagallo 2019-05-11 05:41:02 PDT
Created attachment 369651 [details]
Patch
Comment 12 Tadeu Zagallo 2019-05-11 05:42:20 PDT
I still need to update the ChangeLog for WebKit, and I'm still running more benchmarks, but this is mostly working, so it might be nice to get some comments already if anyone wants to have a look at it.
Comment 13 Tadeu Zagallo 2019-05-13 05:43:46 PDT
Created attachment 369724 [details]
Patch

Rebase
Comment 14 Geoffrey Garen 2019-05-14 10:53:24 PDT
Comment on attachment 369724 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=369724&action=review

What's your plan for validating the performance benefit of this change?

> Source/JavaScriptCore/runtime/Completion.cpp:148
> +    vm.codeCache()->write(vm);

Will this write out all code in the cache? If so, can we do something more efficient to write out only the new code we just generated?

> Source/WebCore/loader/cache/CachedScript.cpp:121
> +    int fd = open(filename.ascii().data(), O_RDONLY | O_SHLOCK | O_NONBLOCK);

It would be nice to use WebCore::FileHandle instead of int. It's more portable, and more type-safe.

Since this fd is ultimately used in JavaScriptCore, you would need to move the FileHandle class to WTF. Since FileSystem is already in WTF, that might not be too hard.

I'd suggest doing that in a follow-up patch.
Comment 15 Tadeu Zagallo 2019-05-15 09:22:22 PDT
Comment on attachment 369724 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=369724&action=review

I have been testing with PLT and JetStream2, but not much luck there. I also ran some manual tests with some big scripts with short execution, and there I see a ~10% slowdown for the second request (when we generate the bytecode in the network process) and a ~65% speedup for the 3rd request onwards, once the cache has been populated.

>> Source/JavaScriptCore/runtime/Completion.cpp:148
>> +    vm.codeCache()->write(vm);
> 
> Will this write out all code in the cache? If so, can we do something more efficient to write out only the new code we just generated?

This will send a message to the network process for every SourceProvider that has been updated since the last call to write. The message has the metadata with what should be cached.

>> Source/WebCore/loader/cache/CachedScript.cpp:121
>> +    int fd = open(filename.ascii().data(), O_RDONLY | O_SHLOCK | O_NONBLOCK);
> 
> It would be nice to use WebCore::FileHandle instead of int. It's more portable, and more type-safe.
> 
> Since this fd is ultimately used in JavaScriptCore, you would need to move the FileHandle class to WTF. Since FileSystem is already in WTF, that might not be too hard.
> 
> I'd suggest doing that in a follow-up patch.

Sounds good. The reason I haven't been using FileSystem in most places is that it doesn't allow passing custom flags to open when mmapping a file, but I should probably add that functionality and update the bytecode cache code.
Comment 16 Geoffrey Garen 2019-05-28 15:42:58 PDT
> I have been testing with PLT and JetStream2, but not much luck there. I also
> ran some manual tests with some big scripts with short execution, and there
> I see a ~10% slowdown for the second request (when we generate the bytecode
> in the network process) and a ~65% speedup for the 3rd request onwards, once
> the cache has been populated.

We need to figure out a strategy for performance testing that can tell us whether the strategy in this patch is OK, or needs improvement.

I think the ideal test would find some way to run PLT5 where iteration 1 commits entries to the cache and iteration 2 uses those cached entries. One way to do this might be just to run PLT5 with the iteration count explicitly set to 3. That way, the first run is uncached, the second run is cached (but without a bytecode cache), and the third run is fully cached (with a bytecode cache).

> >> Source/JavaScriptCore/runtime/Completion.cpp:148
> >> +    vm.codeCache()->write(vm);
> > 
> > Will this write out all code in the cache? If so, can we do something more efficient to write out only the new code we just generated?
> 
> This will send a message to the network process for every SourceProvider
> that has been updated since the last call to write. The message has the
> metadata with what should be cached.

Seems potentially inefficient to iterate all CodeBlocks in the cache every time you call evaluate(). Can we do better here? Seems like we might be able to keep an explicit set of CodeBlocks with updates, or batch this iteration work with a timer.
Comment 17 Tadeu Zagallo 2019-05-29 02:33:01 PDT
(In reply to Geoffrey Garen from comment #16)
> We need to figure out a strategy for performance testing that can tell us
> whether the strategy in this patch is OK, or needs improvement.
> 
> I think the ideal test would find some way to run PLT5 where iteration 1
> commits entries to the cache and iteration 2 uses those cached entries. One
> way to do this might be just to run PLT5 with the iteration count explicitly
> set to 3. That way, the first run is uncached, the second run is cached (but
> without a bytecode cache), and the third run is fully cached (with a
> bytecode cache).

I tried PLT5 with 3 iterations, and it was not an improvement; even the 3rd run was not faster. I think the next step is to measure how much time we spend in parsing + bytecode generation vs loading from cache + deserialization, to see where we are not fast enough. We can chat about it if you have any ideas/suggestions.

> Seems potentially inefficient to iterate all CodeBlocks in the cache every
> time you call evaluate(). Can we do better here? Seems like we might be able
> to keep an explicit set of CodeBlocks with updates, or batch this iteration
> work with a timer.

Yes, we can definitely do better here.
Comment 18 Geoffrey Garen 2019-06-04 16:17:51 PDT
> I tried PLT5 with 3 iterations, and it was not an improvement; even the 3rd
> run was not faster.

That's really interesting! Seems to indicate that something is too slow about the WebKit version of deserialization. I'd suggest focusing on the 3rd run number for now. We probably shouldn't enable this feature in WebKit until it's at least a speedup on the 3rd run.

> I think the next step is to measure how much time we spend
> in parsing + bytecode generation vs loading from cache + deserialization to
> see where we are not fast enough. We can chat about it if you have any
> ideas/suggestions.

Sounds good. One surprising detail is that we know that deserialization is a speedup in the JSC-only context.

I think the best next step is to profile the 3rd run of the PLT and look for time spent in bytecode stuff.

> > Seems potentially inefficient to iterate all CodeBlocks in the cache every
> > time you call evaluate(). Can we do better here? Seems like we might be able
> > to keep an explicit set of CodeBlocks with updates, or batch this iteration
> > work with a timer.
> 
> Yes, we can definitely do better here.

👍