Bug 194047 - Integrate JSC bytecode cache with WebKit
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit2
Version: WebKit Nightly Build
Hardware: Unspecified, OS: Unspecified
Importance: P2 Normal
Assignee: Tadeu Zagallo
Reported: 2019-01-30 14:13 PST by Tadeu Zagallo
Modified: 2019-06-04 16:17 PDT
CC List: 17 users


Attachments
Patch (71.33 KB, patch), 2019-01-30 14:27 PST, Tadeu Zagallo, no flags
Patch (68.25 KB, patch), 2019-02-05 16:08 PST, Tadeu Zagallo, no flags
Patch (67.32 KB, patch), 2019-02-05 16:19 PST, Tadeu Zagallo, no flags
Patch (69.50 KB, patch), 2019-02-11 15:50 PST, Tadeu Zagallo, no flags
WIP (109.57 KB, patch), 2019-04-17 14:52 PDT, Tadeu Zagallo, no flags
Patch (174.73 KB, patch), 2019-05-11 05:41 PDT, Tadeu Zagallo, no flags
Patch (171.53 KB, patch), 2019-05-13 05:43 PDT, Tadeu Zagallo, tzagallo: review?

Description Tadeu Zagallo 2019-01-30 14:13:15 PST
Add the ability for the network process to generate, cache, and retrieve bytecode for JavaScript files.
Comment 1 Tadeu Zagallo 2019-01-30 14:27:39 PST
Created attachment 360620 [details]
Patch
Comment 2 Alex Christensen 2019-01-30 21:26:43 PST
Comment on attachment 360620 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=360620&action=review

I would expect more SessionIDs in this patch to say which WebsiteDataStore to cache the bytecode in.  I guess we can get it from the pageID or get it the same way we get the pageID.

> Source/JavaScriptCore/runtime/CodeCache.h:87
>      int64_t age;
> +    bool written;

Both of these could use initializer lists, or get rid of the default constructor.

> Source/WebCore/loader/cache/CachedScript.cpp:117
> +        memcpy(memory.get() + it.beginPosition, it.segment->data(), it.segment->size());

This seems like an unnecessary copy.

> Source/WebCore/loader/cache/CachedScript.cpp:118
> +    m_cachedBytecode = JSC::CachedBytecode { WTFMove(memory), size };

This is basically just a Vector<uint8_t>

> Source/WebKit/NetworkProcess/NetworkResourceLoader.cpp:850
> +        Vector<IPC::DerivedData>* derivedData = new Vector<IPC::DerivedData>();

No need to call new here.  Just WTFMove an instance into the lambda capture list.

> Source/WebKit/NetworkProcess/NetworkResourceLoader.cpp:857
> +                if (loader->hasOneRef())
> +                    return;

Why?

> Source/WebKit/NetworkProcess/cache/BytecodeCache.cpp:2
> + * Copyright (C) 2019 Apple Inc.  All rights reserved.

There seems to be an extra space between Inc. and All.

> Source/WebKit/NetworkProcess/cache/BytecodeCache.cpp:75
> +    SHA1::Digest m_bodyHash;

Why SHA1?

> Source/WebKit/NetworkProcess/cache/BytecodeCache.cpp:111
> +        for (RefPtr<Plan> plan : m_plans)

Would "const auto&" work, or do we need to keep a reference to this plan as we use it?

> Source/WebKit/NetworkProcess/cache/BytecodeCache.cpp:135
> +    RefPtr<JSC::VM> m_vm;

Ref<JSC::VM>

> Source/WebKit/NetworkProcess/cache/BytecodeCache.cpp:144
> +    m_thread = new Thread(locker, *this);

This is a memory leak.  It should be adopted.  I'm also not sure you need to create a thread just for this.  I'm also curious why WTF::Thread::create returns a Ref<Thread> but it has a comment saying it can return nullptr.  So many thread questions.

> Source/WebKit/NetworkProcess/cache/BytecodeCache.h:50
> +    using Plans = Vector<RefPtr<Plan>, 8>;

Can this be a Vector<Ref<Plan>>?

> Source/WebKit/NetworkProcess/cache/BytecodeCache.h:55
> +    static RefPtr<BytecodeCache> create(NetworkProcess& networkProcess)

This could return a Ref<BytecodeCache>

> Source/WebKit/NetworkProcess/cache/BytecodeCache.h:71
> +    RefPtr<AutomaticThread> m_thread;

Can this be a Ref<AutomaticThread>?  Also still wondering why you need a thread.
Comment 3 Tadeu Zagallo 2019-02-01 13:41:41 PST
Comment on attachment 360620 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=360620&action=review

>> Source/JavaScriptCore/runtime/CodeCache.h:87
>> +    bool written;
> 
> Both of these could use initializer lists, or get rid of the default constructor.

I believe the default constructor is necessary for using it in the HashMap below, no?

>> Source/WebKit/NetworkProcess/NetworkResourceLoader.cpp:850
>> +        Vector<IPC::DerivedData>* derivedData = new Vector<IPC::DerivedData>();
> 
> No need to call new here.  Just WTFMove an instance into the lambda capture list.

That would move it multiple times.

>> Source/WebKit/NetworkProcess/NetworkResourceLoader.cpp:857
>> +                    return;
> 
> Why?

Not sure actually, this was part of the code that you had deleted and I added back. I assume that if this is the only ref to the loader, then we don't care about the response anymore?

>> Source/WebKit/NetworkProcess/cache/BytecodeCache.cpp:75
>> +    SHA1::Digest m_bodyHash;
> 
> Why SHA1?

That's what the NetworkCache uses.

>> Source/WebKit/NetworkProcess/cache/BytecodeCache.cpp:135
>> +    RefPtr<JSC::VM> m_vm;
> 
> Ref<JSC::VM>

The VM might be null if we don't have an underlying thread.

>> Source/WebKit/NetworkProcess/cache/BytecodeCache.cpp:144
>> +    m_thread = new Thread(locker, *this);
> 
> This is a memory leak.  It should be adopted.  I'm also not sure you need to create a thread just for this.  I'm also curious why WTF::Thread::create returns a Ref<Thread> but it has a comment saying it can return nullptr.  So many thread questions.

Oops, I copied it from the JITWorklist, I guess I should fix it there too...

>> Source/WebKit/NetworkProcess/cache/BytecodeCache.h:71
>> +    RefPtr<AutomaticThread> m_thread;
> 
> Can this be a Ref<AutomaticThread>?  Also still wondering why you need a thread.

I'm using the thread to generate the bytecode for the script. I don't think we should block the main thread for that.
Comment 4 Antti Koivisto 2019-02-04 09:35:21 PST
A patch that applies to trunk would be nice.
Comment 5 Antti Koivisto 2019-02-04 09:42:11 PST
Comment on attachment 360620 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=360620&action=review

Looks generally good, but let's get it to apply before r+ing.

>>> Source/WebKit/NetworkProcess/NetworkResourceLoader.cpp:850
>>> +        Vector<IPC::DerivedData>* derivedData = new Vector<IPC::DerivedData>();
>> 
>> No need to call new here.  Just WTFMove an instance into the lambda capture list.
> 
> That would move it multiple times.

Please use smart pointers for memory management. Calling new (outside create() type functions that return smart pointers) is discouraged. 

auto derivedData = std::make_unique<Vector<IPC::DerivedData>>();

>>> Source/WebKit/NetworkProcess/NetworkResourceLoader.cpp:857
>>> +                    return;
>> 
>> Why?
> 
> Not sure actually, this was part of the code that you had deleted and I added back. I assume that if this is the only ref to the loader, then we don't care about the response anymore?

This lambda is then the only ref keeping the loader alive which implies there are no clients. A more stylish solution would use WeakPtr.
Comment 6 Tadeu Zagallo 2019-02-05 16:08:35 PST
Created attachment 361237 [details]
Patch
Comment 7 Tadeu Zagallo 2019-02-05 16:19:55 PST
Created attachment 361242 [details]
Patch
Comment 8 Tadeu Zagallo 2019-02-11 15:50:43 PST
Created attachment 361724 [details]
Patch
Comment 9 Antti Koivisto 2019-02-13 05:09:30 PST
Comment on attachment 361724 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=361724&action=review

> Source/WebCore/loader/cache/CachedResource.cpp:319
> +        auto* frameLoader = m_loader->frameLoader();
> +        auto* frameLoaderClient = frameLoader ? &frameLoader->client() : nullptr;
> +        if (frameLoaderClient) {
> +            m_webPageID = frameLoaderClient->pageID().valueOr(0);
> +            m_webFrameID = frameLoaderClient->frameID().valueOr(0);
> +        }

This is not correct. A CachedResource can have clients in multiple pages and frames.

> Source/WebKit/NetworkProcess/cache/BytecodeCache.cpp:166
> +    cache->retrieve(request, { webPageID,  webFrameID }, [bytecodeCache = makeRef(*this), request = WebCore::ResourceRequest { request }](auto entry, auto) {

I think you can just pass empty frameId and never pass webPageID/webFrameID around in the first place. It appears to be used for logging purposes only.
Comment 10 Tadeu Zagallo 2019-04-17 14:52:11 PDT
Created attachment 367681 [details]
WIP
Comment 11 Tadeu Zagallo 2019-05-11 05:41:02 PDT
Created attachment 369651 [details]
Patch
Comment 12 Tadeu Zagallo 2019-05-11 05:42:20 PDT
I still need to update the ChangeLog for WebKit, and I'm still running more benchmarks, but this is mostly working, so it might be nice to get some comments already if anyone wants to have a look at it.
Comment 13 Tadeu Zagallo 2019-05-13 05:43:46 PDT
Created attachment 369724 [details]
Patch

Rebase
Comment 14 Geoffrey Garen 2019-05-14 10:53:24 PDT
Comment on attachment 369724 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=369724&action=review

What's your plan for validating the performance benefit of this change?

> Source/JavaScriptCore/runtime/Completion.cpp:148
> +    vm.codeCache()->write(vm);

Will this write out all code in the cache? If so, can we do something more efficient to write out only the new code we just generated?

> Source/WebCore/loader/cache/CachedScript.cpp:121
> +    int fd = open(filename.ascii().data(), O_RDONLY | O_SHLOCK | O_NONBLOCK);

It would be nice to use WebCore::FileHandle instead of int. It's more portable, and more type-safe.

Since this fd is ultimately used in JavaScriptCore, you would need to move the FileHandle class to WTF. Since FileSystem is already in WTF, that might not be too hard.

I'd suggest doing that in a follow-up patch.
Comment 15 Tadeu Zagallo 2019-05-15 09:22:22 PDT
Comment on attachment 369724 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=369724&action=review

I have been testing with PLT and JetStream2, but not much luck there. I also ran some manual tests with some big scripts with short execution, and there I see a ~10% slowdown for the second request (when we generate the bytecode in the network process) and a ~65% speedup for the 3rd request onwards, once the cache has been populated.

>> Source/JavaScriptCore/runtime/Completion.cpp:148
>> +    vm.codeCache()->write(vm);
> 
> Will this write out all code in the cache? If so, can we do something more efficient to write out only the new code we just generated?

This will send a message to the network process for every SourceProvider that has been updated since the last call to write. The message has the metadata with what should be cached.

>> Source/WebCore/loader/cache/CachedScript.cpp:121
>> +    int fd = open(filename.ascii().data(), O_RDONLY | O_SHLOCK | O_NONBLOCK);
> 
> It would be nice to use WebCore::FileHandle instead of int. It's more portable, and more type-safe.
> 
> Since this fd is ultimately used in JavaScriptCore, you would need to move the FileHandle class to WTF. Since FileSystem is already in WTF, that might not be too hard.
> 
> I'd suggest doing that in a follow-up patch.

Sounds good. The reason I haven't been using FileSystem in most places is that it doesn't allow passing custom flags to open when mmapping a file, but I should probably add that functionality and update the bytecode cache code.
Comment 16 Geoffrey Garen 2019-05-28 15:42:58 PDT
> I have been testing with PLT and JetStream2, but not much luck there. I also
> ran some manual tests with some big scripts with short execution, and there
> I see a ~10% slowdown for the second request (when we generate the bytecode
> in the network process) and a ~65% speedup for the 3rd request onwards, once
> the cache has been populated.

We need to figure out a strategy for performance testing that can tell us whether the strategy in this patch is OK, or needs improvement.

I think the ideal test would find some way to run PLT5 where iteration 1 commits entries to the cache and iteration 2 uses those cached entries. One way to do this might be just to run PLT5 with the iteration count explicitly set to 3. That way, the first run is uncached, the second run is cached (but without a bytecode cache), and the third run is fully cached (with a bytecode cache).

> >> Source/JavaScriptCore/runtime/Completion.cpp:148
> >> +    vm.codeCache()->write(vm);
> > 
> > Will this write out all code in the cache? If so, can we do something more efficient to write out only the new code we just generated?
> 
> This will send a message to the network process for every SourceProvider
> that has been updated since the last call to write. The message has the
> metadata with what should be cached.

Seems potentially inefficient to iterate all CodeBlocks in the cache every time you call evaluate(). Can we do better here? Seems like we might be able to keep an explicit set of CodeBlocks with updates, or batch this iteration work with a timer.
Comment 17 Tadeu Zagallo 2019-05-29 02:33:01 PDT
(In reply to Geoffrey Garen from comment #16)
> We need to figure out a strategy for performance testing that can tell us
> whether the strategy in this patch is OK, or needs improvement.
> 
> I think the ideal test would find some way to run PLT5 where iteration 1
> commits entries to the cache and iteration 2 uses those cached entries. One
> way to do this might be just to run PLT5 with the iteration count explicitly
> set to 3. That way, the first run is uncached, the second run is cached (but
> without a bytecode cache), and the third run is fully cached (with a
> bytecode cache).

I tried PLT5 with 3 iterations, and it was not an improvement; even the 3rd run was not faster. I think the next step is to measure how much time we spend in parsing + bytecode generation vs loading from cache + deserialization, to see where we are not fast enough. We can chat about it if you have any ideas/suggestions.

> Seems potentially inefficient to iterate all CodeBlocks in the cache every
> time you call evaluate(). Can we do better here? Seems like we might be able
> to keep an explicit set of CodeBlocks with updates, or batch this iteration
> work with a timer.

Yes, we can definitely do better here.
Comment 18 Geoffrey Garen 2019-06-04 16:17:51 PDT
> I tried PLT5 with 3 iterations, and it was not an improvement; even the 3rd
> run was not faster.

That's really interesting! Seems to indicate that something is too slow about the WebKit version of deserialization. I'd suggest focusing on the 3rd run number for now. We probably shouldn't enable this feature in WebKit until it's at least a speedup on the 3rd run.

> I think the next step is to measure how much time we spend
> in parsing + bytecode generation vs loading from cache + deserialization to
> see where we are not fast enough. We can chat about it if you have any
> ideas/suggestions.

Sounds good. One surprising detail is that we know that deserialization is a speedup in the JSC-only context.

I think the best next step is to profile the 3rd run of the PLT and look for time spent in bytecode stuff.

> > Seems potentially inefficient to iterate all CodeBlocks in the cache every
> > time you call evaluate(). Can we do better here? Seems like we might be able
> > to keep an explicit set of CodeBlocks with updates, or batch this iteration
> > work with a timer.
> 
> Yes, we can definitely do better here.

👍