Bug 164342

Summary: Web Inspector: Associate Worker Resources with the Worker and not the Page
Product: WebKit Reporter: Joseph Pecoraro <joepeck>
Component: Web InspectorAssignee: Joseph Pecoraro <joepeck>
Status: RESOLVED FIXED    
Severity: Normal CC: achristensen, bburg, buildbot, cdumez, commit-queue, dbates, esprehn+autocc, japhet, joepeck, kangil.han, keith_miller, mark.lam, mattbaker, msaboff, nvasilyev, rniwa, saam, timothy, webkit-bug-importer, youennf
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: All   
OS: All   
Attachments:
Description Flags
[PATCH] Proposed Fix
buildbot: commit-queue-
Archive of layout-test-results from ews113 for mac-yosemite
none
Archive of layout-test-results from ews100 for mac-yosemite
none
[PATCH] Proposed Fix timothy: review+, timothy: commit-queue-

Description Joseph Pecoraro 2016-11-02 14:52:56 PDT
Summary:
Associate Worker Resources with the Worker and not the Page

Test:
// worker.js
onmessage = function() {
    var xhr = new XMLHttpRequest;
    xhr.open("GET", "data.json?" + Date.now());
    xhr.send();
}

// test.html
<script>
let worker = new Worker("worker.js");
setInterval(() => {
    worker.postMessage("load");
}, 1000);

Steps to Reproduce:
1. Inspect test page
2. Show Resources sidebar
  => "worker.js" / "data.json" resources show up under the Page, expected them under the Worker

Notes:
- Scripts loaded in the worker (importScripts) are already associated with the Worker because we get that information differently
- Resources (e.g. XHRs, Fetch) are currently associated with the page because WebCore actually loads these through the Page

The problem we have to solve here is to who originally started a request and pass that data to Web Inspector so the frontend knows how to display that resource. There are two scenarios to consider:

    1. Web Inspector is already open when a load happens
      => Network.requestWillBeSent, Network.responseReceived/loadingFinished/loadingFailed
      => having the targetIdentifier here Web Inspector can act on this information.

    2. Web Inspector is opened after the load happened
      => Page.getResourceTree
      => Web Inspector has never shown XHR/Fetch/Media raw resource resources unless inspector is open
      => However we should not show "worker.js" as associated with the Page if it was the main resource of a Worker

Edge cases to consider:

    • new Worker("file-does-not-exist.js") => 404 should appear where?
    • new Worker("worker.js") => fronted doesn't know about this worker yet, so this resource will be pending for a future worker identifier
    • resource load is served from the Memory Cache and we do weird stuff (CachedResourceLoader::shouldContinueAfterNotifyingLoadedFromMemoryCache)

In terms of implementation. The ResourceRequest is the only object that seems to carry through from the person originally starting a request, persist through cached resource requests, and available when notifying resource load observers (e.g. Inspector). We could add an identifier that Inspector could use.

On the frontend, this now means that a Target will need to maintain a list of Resources and Scripts, just like a Page's Frame contains Resources and Scripts. I'll probably end up duplicating a lot of the logic that Frame has here. Ideally we can unify that later, it is already a bit messy. (Frame.extraScripts vs DebuggerManager.scripts).
Comment 1 Radar WebKit Bug Importer 2016-11-02 14:53:21 PDT
<rdar://problem/29075775>
Comment 2 Joseph Pecoraro 2016-11-04 14:46:28 PDT
Created attachment 293933 [details]
[PATCH] Proposed Fix
Comment 3 Build Bot 2016-11-04 16:02:04 PDT
Comment on attachment 293933 [details]
[PATCH] Proposed Fix

Attachment 293933 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/2463814

New failing tests:
inspector/worker/resources-in-worker.html
Comment 4 Build Bot 2016-11-04 16:02:08 PDT
Created attachment 293946 [details]
Archive of layout-test-results from ews113 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews113  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 5 Build Bot 2016-11-04 16:03:55 PDT
Comment on attachment 293933 [details]
[PATCH] Proposed Fix

Attachment 293933 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/2463847

New failing tests:
inspector/worker/resources-in-worker.html
Comment 6 Build Bot 2016-11-04 16:03:59 PDT
Created attachment 293947 [details]
Archive of layout-test-results from ews100 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews100  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 7 Joseph Pecoraro 2016-11-04 16:33:18 PDT
WebKit1 failures. Curious!
Comment 8 Joseph Pecoraro 2016-11-04 19:08:46 PDT
Created attachment 293973 [details]
[PATCH] Proposed Fix
Comment 9 youenn fablet 2016-11-07 05:17:24 PST
Comment on attachment 293973 [details]
[PATCH] Proposed Fix

Here is some feedback related to the loading resource code part of the patch.

The patch seems mostly right to me.
That said, I am still not a big fan of adding the id to ResourceRequest.
The id seems more related to CachedResourceRequest/ResourceLoader.

As it is now, we are adding it to ResourceRequest but accessing it through CachedResource.
If it is difficult to pass it from CachedResourceRequest to ResourceLoader, storing it in CachedResource seems as good to me. 

If we were adding a new method to pass this parameter to InspectorNetworkAgent, this method could be called from CachedResourceLoader based on CachedResourceRequest information.
A bit similar to how ResourceTiming is tied to CachedResourceLoader.

View in context: https://bugs.webkit.org/attachment.cgi?id=293973&action=review

> Source/WebCore/dom/ScriptExecutionContext.h:95
> +    virtual String resourceRequestIdentifier() const { return String(); };

Would "return { };" work?

> Source/WebCore/inspector/InspectorNetworkAgent.cpp:314
> +    m_frontendDispatcher->requestWillBeSent(requestId, m_pageAgent->frameId(loader.frame()), m_pageAgent->loaderId(&loader), loader.url().string(), buildObjectForResourceRequest(request), timestamp(), initiatorObject, buildObjectForResourceResponse(redirectResponse, nullptr), type != InspectorPageAgent::OtherResource ? &resourceType : nullptr, targetId.isEmpty() ? nullptr : &targetId);

Is requestId specific to that particular request or is it global to the requests needed to convey a single CachedResource load?
If that is the latter, we are passing several times the same targetId associated with the same requestId.
It seems more appropriate to add a new call (if none is existing) to say that a new resource loading will start, something like:
void InspectorNetworkAgent::willStartResourceLoading(unsigned long identifier, const String* targetId /* plus whatever other parameters that might be good to pass like the type of resource?*/);

I am not familiar with InspectorNetworkAgent so maybe I am missing something.

> Source/WebCore/inspector/InspectorPageAgent.cpp:964
> +        String targetId = cachedResource->resourceRequest().initiatorIdentifier();

As can be seen here, even though the id is owned by resource request, we end up storing it in the CachedResource.
It might be just better to have it in CachedResource.
Ideally, it should be in ResourceLoader.

> Source/WebCore/platform/network/ResourceRequestBase.h:169
> +    void setInitiatorIdentifier(const String& identifier) { m_initiatorIdentifier = identifier; }

Should take a String&&

> Source/WebCore/platform/network/cf/ResourceRequestCFNet.cpp:341
> +    auto oldInitiatorIdentifier = initiatorIdentifier();

This is that kind of code that leads me to think we should not put it at ResourceRequest level, even if that is somehow convenient.

> Source/WebCore/workers/WorkerGlobalScope.cpp:55
> +WorkerGlobalScope::WorkerGlobalScope(const URL& url, const String& identifier, const String& userAgent, WorkerThread& thread, bool shouldBypassMainWorldContentSecurityPolicy, RefPtr<SecurityOrigin>&& topOrigin, IDBClient::IDBConnectionProxy* connectionProxy, SocketProvider* socketProvider)

identifier should probable take a String&& here.
Probably true also for other parameters.

> Source/WebCore/workers/WorkerScriptLoader.h:55
> +        void loadAsynchronously(ScriptExecutionContext*, const URL&, FetchOptions::Mode, ContentSecurityPolicyEnforcement, const String& initiatorIdentifier, WorkerScriptLoaderClient*);

Can first and last parameters be null?
If that is not possible, we should try to make them references.

> Source/WebCore/workers/WorkerThread.cpp:164
> +        m_workerGlobalScope = createWorkerGlobalScope(m_startupData->m_scriptURL, m_startupData->m_identifier, m_startupData->m_userAgent, m_startupData->m_contentSecurityPolicyResponseHeaders, m_startupData->m_shouldBypassMainWorldContentSecurityPolicy, WTFMove(m_startupData->m_topOrigin));

If we move m_topOrigin, we should try to move all strings as well. At least m_identifier since it is the newly added.

> Source/WebCore/xml/XMLHttpRequest.cpp:652
> +    request.setInitiatorIdentifier(scriptExecutionContext()->resourceRequestIdentifier());

It would probably be better to not do that at each ThreadableLoaderClient, but within ThreadableLoader code.
That way, we will handle all clients: fetch, xhr, event source...
If the resource request identifier is only useful for workers, I guess that can be handled solely in WorkerThreadableLoader.
Comment 10 Joseph Pecoraro 2016-11-07 11:11:02 PST
Comment on attachment 293973 [details]
[PATCH] Proposed Fix

View in context: https://bugs.webkit.org/attachment.cgi?id=293973&action=review

>> Source/WebCore/dom/ScriptExecutionContext.h:95
>> +    virtual String resourceRequestIdentifier() const { return String(); };
> 
> Would "return { };" work?

I have never seen us use this syntax for Strings. Seems that would make it really really hard to find cases of null String.

>> Source/WebCore/inspector/InspectorNetworkAgent.cpp:314
>> +    m_frontendDispatcher->requestWillBeSent(requestId, m_pageAgent->frameId(loader.frame()), m_pageAgent->loaderId(&loader), loader.url().string(), buildObjectForResourceRequest(request), timestamp(), initiatorObject, buildObjectForResourceResponse(redirectResponse, nullptr), type != InspectorPageAgent::OtherResource ? &resourceType : nullptr, targetId.isEmpty() ? nullptr : &targetId);
> 
> Is requestId specific to that particular request or is it global to the requests needed to convey a single CachedResource load?
> If that is the latter, we are passing several times the same targetId associated with the same requestId.
> It seems more appropriate to add a new call (if none is existing) to say that a new resource loading will start, something like:
> void InspectorNetworkAgent::willStartResourceLoading(unsigned long identifier, const String* targetId /* plus whatever other parameters that might be good to pass like the type of resource?*/);
> 
> I am not familiar with InspectorNetworkAgent so maybe I am missing something.

ResourceLoader::willSendRequestInternal initializes the requestIdentifier. So it seems that is a ResourceLoader level. I am not sure if that is 1-to-1 with a ResourceRequest or not.

Either way, we only ever pass the targetId once per request, right here. So I think this willSendRequest is identical to the willStartResourceLoading you are describing.

>> Source/WebCore/inspector/InspectorPageAgent.cpp:964
>> +        String targetId = cachedResource->resourceRequest().initiatorIdentifier();
> 
> As can be seen here, even though the id is owned by resource request, we end up storing it in the CachedResource.
> It might be just better to have it in CachedResource.
> Ideally, it should be in ResourceLoader.

I don't understand this comment. ResourceLoader goes away, right? We need this information even after the loader goes away, such as now when opening Web Inspector after a page has completed loading, and knowing that a CachedResource was loaded outside of the page in a Worker.

You may be pointing out an issue? Maybe there is only a single CachedResource where there may have been multiple ResourceRequests so we don't get a complete picture.

>> Source/WebCore/workers/WorkerThread.cpp:164
>> +        m_workerGlobalScope = createWorkerGlobalScope(m_startupData->m_scriptURL, m_startupData->m_identifier, m_startupData->m_userAgent, m_startupData->m_contentSecurityPolicyResponseHeaders, m_startupData->m_shouldBypassMainWorldContentSecurityPolicy, WTFMove(m_startupData->m_topOrigin));
> 
> If we move m_topOrigin, we should try to move all strings as well. At least m_identifier since it is the newly added.

TopOrigin is a PassRefPtr<SecurityOrigin>, so there is cleanup that can/should be done here anyways.
Comment 11 Timothy Hatcher 2016-11-07 11:42:50 PST
Comment on attachment 293973 [details]
[PATCH] Proposed Fix

The front-end changes look good.
Comment 12 youenn fablet 2016-11-07 12:21:08 PST
> >> Source/WebCore/dom/ScriptExecutionContext.h:95
> >> +    virtual String resourceRequestIdentifier() const { return String(); };
> > 
> > Would "return { };" work?

I guess it is a matter of taste. It is used more and more.
Coding style might want to say something about it.

> I have never seen us use this syntax for Strings. Seems that would make it
> really really hard to find cases of null String.
> 
> >> Source/WebCore/inspector/InspectorNetworkAgent.cpp:314
> >> +    m_frontendDispatcher->requestWillBeSent(requestId, m_pageAgent->frameId(loader.frame()), m_pageAgent->loaderId(&loader), loader.url().string(), buildObjectForResourceRequest(request), timestamp(), initiatorObject, buildObjectForResourceResponse(redirectResponse, nullptr), type != InspectorPageAgent::OtherResource ? &resourceType : nullptr, targetId.isEmpty() ? nullptr : &targetId);
> > 
> > Is requestId specific to that particular request or is it global to the requests needed to convey a single CachedResource load?
> > If that is the latter, we are passing several times the same targetId associated with the same requestId.
> > It seems more appropriate to add a new call (if none is existing) to say that a new resource loading will start, something like:
> > void InspectorNetworkAgent::willStartResourceLoading(unsigned long identifier, const String* targetId /* plus whatever other parameters that might be good to pass like the type of resource?*/);
> > 
> > I am not familiar with InspectorNetworkAgent so maybe I am missing something.
> 
> ResourceLoader::willSendRequestInternal initializes the requestIdentifier.
> So it seems that is a ResourceLoader level. I am not sure if that is 1-to-1
> with a ResourceRequest or not.

So what is happening precisely in case of redirections?

It would be convenient to display in the network pane each individual request/response (if there are N redirections, display a group of N+1 entries) when debugging.
Maybe also being able to group preflight request/response.
The group would be related to the single CachedResourceRequest/CachedResource.
To add to the complexity of grouping, DocumentThreadableLoader is sometimes stopping a ResourceLoader to restart the load after doing preflight with a new CachedResource/ResourceLoader.

> >> Source/WebCore/inspector/InspectorPageAgent.cpp:964
> >> +        String targetId = cachedResource->resourceRequest().initiatorIdentifier();
> > 
> > As can be seen here, even though the id is owned by resource request, we end up storing it in the CachedResource.
> > It might be just better to have it in CachedResource.
> > Ideally, it should be in ResourceLoader.
> 
> I don't understand this comment. ResourceLoader goes away, right? We need
> this information even after the loader goes away, such as now when opening
> Web Inspector after a page has completed loading, and knowing that a
> CachedResource was loaded outside of the page in a Worker.

Are you referring to the resource pane?
I am not sure we can expect web inspector to reconstruct network pane entries correctly if it was launched after the loads completed.
Probably only cached resources will be displayed in that case and with a reduced set of information.

> You may be pointing out an issue? Maybe there is only a single
> CachedResource where there may have been multiple ResourceRequests so we
> don't get a complete picture.

Right, a single CachedResource may be reused and loaded several times from different users.
Comment 13 Joseph Pecoraro 2016-11-09 22:10:43 PST
<https://trac.webkit.org/changeset/208520>