Bug 172519 - [WK2] Address thread safety issues with ResourceLoadStatistics
Summary: [WK2] Address thread safety issues with ResourceLoadStatistics
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit2 (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Brent Fulgham
URL:
Keywords: InRadar
Depends on:
Blocks: 172653 172665
  Show dependency treegraph
 
Reported: 2017-05-23 12:07 PDT by Brent Fulgham
Modified: 2017-05-27 10:59 PDT (History)
12 users (show)

See Also:


Attachments
Patch (14.47 KB, patch)
2017-05-23 13:36 PDT, Brent Fulgham
no flags Details | Formatted Diff | Diff
Patch (44.55 KB, patch)
2017-05-23 18:03 PDT, Brent Fulgham
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews107 for mac-elcapitan-wk2 (429.97 KB, application/zip)
2017-05-23 18:54 PDT, Build Bot
no flags Details
Archive of layout-test-results from ews103 for mac-elcapitan (1.77 MB, application/zip)
2017-05-23 19:04 PDT, Build Bot
no flags Details
Archive of layout-test-results from ews124 for ios-simulator-wk2 (11.00 MB, application/zip)
2017-05-23 19:18 PDT, Build Bot
no flags Details
Archive of layout-test-results from ews112 for mac-elcapitan (2.47 MB, application/zip)
2017-05-23 19:22 PDT, Build Bot
no flags Details
Patch (47.87 KB, patch)
2017-05-23 21:58 PDT, Brent Fulgham
no flags Details | Formatted Diff | Diff
Patch (51.51 KB, patch)
2017-05-24 18:04 PDT, Brent Fulgham
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews125 for ios-simulator-wk2 (12.12 MB, application/zip)
2017-05-24 19:29 PDT, Build Bot
no flags Details
Patch (53.04 KB, patch)
2017-05-25 10:58 PDT, Brent Fulgham
no flags Details | Formatted Diff | Diff
Patch (60.74 KB, patch)
2017-05-26 14:37 PDT, Brent Fulgham
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews105 for mac-elcapitan-wk2 (922.62 KB, application/zip)
2017-05-26 15:50 PDT, Build Bot
no flags Details
Archive of layout-test-results from ews125 for ios-simulator-wk2 (916.53 KB, application/zip)
2017-05-26 16:00 PDT, Build Bot
no flags Details
Archive of layout-test-results from ews115 for mac-elcapitan (1.55 MB, application/zip)
2017-05-26 16:06 PDT, Build Bot
no flags Details
Patch for landing. (62.33 KB, patch)
2017-05-26 16:27 PDT, Brent Fulgham
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Brent Fulgham 2017-05-23 12:07:19 PDT
The ResourceLoadStatistics object can get called from multiple threads, which is bad because HashTable does not support concurrent access.

This patch cleans up a few incorrect accesses, and adds assertions to help avoid this in the future.
Comment 1 Brent Fulgham 2017-05-23 12:07:44 PDT
<rdar://problem/31707642>
Comment 2 Brent Fulgham 2017-05-23 13:36:02 PDT
Created attachment 311044 [details]
Patch
Comment 3 Chris Dumez 2017-05-23 14:03:21 PDT
Comment on attachment 311044 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=311044&action=review

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:124
> +            m_statisticsQueue->dispatch([this, protectedThis = makeRef(*this), topDomains = WTFMove(domainsWithDeletedWebsiteData)] () mutable {

Just moving a Vector<String> from one thread to another is not necessarily safe. Since someone else on this thread my hold references to the same strings.
I believe to need to create isolatedCopy() of the strings before you pass them to another thread.

I think you can use CrossThreadCopier<Vector<String>>::copy().

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:212
> +            m_statisticsQueue->dispatch([this, protectedThis = makeRef(*this), topDomains = WTFMove(topPrivatelyControlledDomainsWithWebsiteData)] () mutable {

ditto here.
Comment 4 Chris Dumez 2017-05-23 14:04:08 PDT
Comment on attachment 311044 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=311044&action=review

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:121
>      RunLoop::main().dispatch([prevalentResourceDomains = WTFMove(prevalentResourceDomains), this] () mutable {

This does not protect |this| :(
Comment 5 Brent Fulgham 2017-05-23 18:03:01 PDT
Created attachment 311085 [details]
Patch
Comment 6 Build Bot 2017-05-23 18:54:39 PDT
Comment on attachment 311085 [details]
Patch

Attachment 311085 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/3803816

Number of test failures exceeded the failure limit.
Comment 7 Build Bot 2017-05-23 18:54:40 PDT
Created attachment 311086 [details]
Archive of layout-test-results from ews107 for mac-elcapitan-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews107  Port: mac-elcapitan-wk2  Platform: Mac OS X 10.11.6
Comment 8 Chris Dumez 2017-05-23 18:54:51 PDT
Comment on attachment 311085 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=311085&action=review

> Source/WebCore/loader/ResourceLoadObserver.cpp:145
> +    m_queue->dispatch([this, isMainFrame, isRedirect, sourcePrimaryDomain = WTFMove(sourcePrimaryDomain), mainFramePrimaryDomain = WTFMove(mainFramePrimaryDomain), targetURL = CrossThreadCopier<URL>::copy(targetURL), mainFrameURL = CrossThreadCopier<URL>::copy(mainFrameURL)] () {

sourcePrimaryDomain = sourcePrimaryDomain.isolatedString(), it is not safe otherwise to pass the string to another thread.
Ditto for mainFramePrimaryDomain.

For the URLs, I personally think it would looks simpler to use URL::isolatedCopy() rather than CrossThreadCopier.

> Source/WebCore/loader/ResourceLoadObserver.cpp:148
> +        if (targetPrimaryDomain == mainFramePrimaryDomain || targetPrimaryDomain == sourcePrimaryDomain)

Seems like this check could be moved before the dispatch and done on the main thread?

> Source/WebCore/loader/ResourceLoadObserver.cpp:234
> +    m_queue->dispatch([this, isRedirect, sourcePrimaryDomain = WTFMove(sourcePrimaryDomain), mainFramePrimaryDomain = WTFMove(mainFramePrimaryDomain), targetURL = CrossThreadCopier<URL>::copy(targetURL), mainFrameURL = CrossThreadCopier<URL>::copy(mainFrameURL)] () {

Same comment about using isolatedCopy() for both Strings and URLs.

> Source/WebCore/loader/ResourceLoadObserver.cpp:238
> +        if (targetPrimaryDomain == mainFramePrimaryDomain || (isRedirect && targetPrimaryDomain == sourcePrimaryDomain))

Seems like this could be moved before the dispatch.

> Source/WebCore/loader/ResourceLoadObserver.cpp:312
> +    m_queue->dispatch([this, targetPrimaryDomain = WTFMove(targetPrimaryDomain), mainFramePrimaryDomain = WTFMove(mainFramePrimaryDomain), mainFrameURL = CrossThreadCopier<URL>::copy(mainFrameURL)] () {

Same comment about isolatedCopy() for Strings and URLs.

> Source/WebCore/loader/ResourceLoadObserver.cpp:350
>      auto primaryDomainStr = primaryDomain(url);

primaryDomainStr is not a good name.

> Source/WebCore/loader/ResourceLoadObserver.cpp:353
> +    m_queue->dispatch([this, primaryDomainStr = WTFMove(primaryDomainStr)] () {

Need to isolatedCopy() that string.

> Source/WebCore/loader/ResourceLoadObserver.cpp:374
> +    m_queue->dispatch([this, primaryDomainStr = WTFMove(primaryDomainStr)] () {

Need to isolatedCopy() that string.

> Source/WebCore/loader/ResourceLoadObserver.cpp:379
> +        m_store->fireShouldPartitionCookiesHandler({primaryDomainStr}, { }, false);

I think we usually have spaces inside the curly brackets, like so { primaryDomain }

> Source/WebCore/loader/ResourceLoadObserver.cpp:391
> +    m_queue->dispatch([this, primaryDomainStr = WTFMove(primaryDomainStr)] () {

Need isolatedCopy().

> Source/WebCore/loader/ResourceLoadObserver.cpp:404
> +    BinarySemaphore semaphore;

Should confirm with Brady or Phil or Geoff that this is the best way to do this.

> Source/WebCore/loader/ResourceLoadObserver.cpp:410
> +    m_queue->dispatch([this, &semaphore, &hadUserInteraction, primaryDomainStr = WTFMove(primaryDomainStr)] () {

Needs isolatedCopy() of the string.

> Source/WebCore/loader/ResourceLoadObserver.cpp:425
> +    auto primaryDomainStr = primaryDomain(url);

Bad name.

> Source/WebCore/loader/ResourceLoadObserver.cpp:428
> +    m_queue->dispatch([this, primaryDomainStr = WTFMove(primaryDomainStr)] () {

Need isolatedCopy().

> Source/WebCore/loader/ResourceLoadObserver.cpp:445
> +    m_queue->dispatch([this, &semaphore, &isPrevalentResource, primaryDomainStr = WTFMove(primaryDomainStr)] () {

Need isolatedCopy() for the string.

> Source/WebCore/loader/ResourceLoadObserver.cpp:460
> +    auto primaryDomainStr = primaryDomain(url);

Bad name.

> Source/WebCore/loader/ResourceLoadObserver.cpp:463
> +    m_queue->dispatch([this, primaryDomainStr = WTFMove(primaryDomainStr)] () {

Need isolatedCopy().

> Source/WebCore/loader/ResourceLoadObserver.cpp:474
> +    auto primaryDomainStr = primaryDomain(url);

Bad name

> Source/WebCore/loader/ResourceLoadObserver.cpp:477
> +    m_queue->dispatch([this, value, primaryDomainStr = WTFMove(primaryDomainStr)] () {

Need isolatedCopy().

> Source/WebCore/loader/ResourceLoadObserver.cpp:491
> +    auto primaryDomainStr = primaryDomain(url);

Bad name

> Source/WebCore/loader/ResourceLoadObserver.cpp:494
> +    m_queue->dispatch([this, &semaphore, &grandfathered, primaryDomainStr = WTFMove(primaryDomainStr)] () {

Need isolatedCopy() for the string.

> Source/WebCore/loader/ResourceLoadObserver.cpp:509
> +    auto primaryTopFrameDomainStr = primaryDomain(topFrame);

Bad name

> Source/WebCore/loader/ResourceLoadObserver.cpp:513
> +    m_queue->dispatch([this, primaryTopFrameDomainStr = WTFMove(primaryTopFrameDomainStr), primarySubFrameDomainStr = WTFMove(primarySubFrameDomainStr)] () {

Need isolatedCopy() for the strings.

> Source/WebCore/loader/ResourceLoadObserver.cpp:524
> +    auto primaryTopFrameDomainStr = primaryDomain(topFrame);

bad name

> Source/WebCore/loader/ResourceLoadObserver.cpp:528
> +    m_queue->dispatch([this, primaryTopFrameDomainStr = WTFMove(primaryTopFrameDomainStr), primarySubresourceDomainStr = WTFMove(primarySubresourceDomainStr)] () {

Need isolatedCopy() for the strings.

> Source/WebCore/loader/ResourceLoadObserver.cpp:539
> +    auto primarySubresourceDomainStr = primaryDomain(subresource);

bad name

> Source/WebCore/loader/ResourceLoadObserver.cpp:543
> +    m_queue->dispatch([this, primaryRedirectDomainStr = WTFMove(primaryRedirectDomainStr), primarySubresourceDomainStr = WTFMove(primarySubresourceDomainStr)] () {

Need isolatedCopy() for the strings.

> Source/WebCore/loader/ResourceLoadObserver.cpp:624
> +    m_queue->dispatch([this, &semaphore, &statistics, origin = origin.isolatedCopy()] () {

statistics is not used inside the lambda.

> Source/WebCore/loader/ResourceLoadObserver.cpp:625
> +        m_store->statisticsForOrigin(origin);

I am assuming you meant to assign the return value to statistics?

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:122
> +    RunLoop::main().dispatch([prevalentResourceDomains = WTFMove(prevalentResourceDomains), this, protectedThis = makeRef(*this)] () mutable {

Passing the Vector of Strings to another thread without isolatedCopying is not safe here. Should probably use CrossThreadCopier.

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:125
> +            m_statisticsQueue->dispatch([this, protectedThis = makeRef(*this), topDomains = WTFMove(domainsWithDeletedWebsiteData)] () mutable {

protectedThis = WTFMove(protectedThis) ?
domainsWithDeletedWebsiteData needs cross-thread copying I believe.

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:214
> +            m_statisticsQueue->dispatch([this, protectedThis = makeRef(*this), topDomains = WTFMove(topPrivatelyControlledDomainsWithWebsiteData)] () mutable {

protectedThis = WTFMove(protectedThis) ?

I believe topPrivatelyControlledDomainsWithWebsiteData needs to be cross-thread copied.
Comment 9 Build Bot 2017-05-23 19:04:10 PDT
Comment on attachment 311085 [details]
Patch

Attachment 311085 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/3803837

New failing tests:
http/tests/navigation/statistics.html
Comment 10 Build Bot 2017-05-23 19:04:12 PDT
Created attachment 311087 [details]
Archive of layout-test-results from ews103 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews103  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 11 Build Bot 2017-05-23 19:18:34 PDT
Comment on attachment 311085 [details]
Patch

Attachment 311085 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/3803841

Number of test failures exceeded the failure limit.
Comment 12 Build Bot 2017-05-23 19:18:36 PDT
Created attachment 311089 [details]
Archive of layout-test-results from ews124 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews124  Port: ios-simulator-wk2  Platform: Mac OS X 10.11.6
Comment 13 Build Bot 2017-05-23 19:22:33 PDT
Comment on attachment 311085 [details]
Patch

Attachment 311085 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/3803855

New failing tests:
http/tests/navigation/statistics.html
Comment 14 Build Bot 2017-05-23 19:22:34 PDT
Created attachment 311090 [details]
Archive of layout-test-results from ews112 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews112  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 15 Brent Fulgham 2017-05-23 21:58:18 PDT
Created attachment 311099 [details]
Patch
Comment 16 Brady Eidson 2017-05-23 22:01:28 PDT
FWIW:

IndexedDB code does a lot of thread hopping. To make it significantly harder to make threading mistakes I made use of createCrossThreadTask in WTF/CrossThreadTask.h

Right now that class is set up to always call a member function on an object, but we can easily change it to allow arbitrary dispatching.
Comment 17 Chris Dumez 2017-05-24 08:48:12 PDT
Does not seem to build anywhere:
normal/armv7s/ResourceLoadObserver.o
/Volumes/Data/EWS/WebKit/Source/WebCore/loader/ResourceLoadObserver.cpp:529:56: error: variable 'primaryTopFrameDomainString' cannot be implicitly captured in a lambda with no capture-default specified
        statistics.subresourceUnderTopFrameOrigins.add(primaryTopFrameDomainString);
                                                       ^
/Volumes/Data/EWS/WebKit/Source/WebCore/loader/ResourceLoadObserver.cpp:523:10: note: 'primaryTopFrameDomainString' declared here
    auto primaryTopFrameDomainString = primaryDomain(topFrame);
         ^
/Volumes/Data/EWS/WebKit/Source/WebCore/loader/ResourceLoadObserver.cpp:527:23: note: lambda expression begins here
    m_queue->dispatch([this, primaryTopFrameDomainStr = primaryTopFrameDomainString.isolatedCopy(), primarySubresourceDomainString = primarySubresourceDomainString.isolatedCopy()] () {
                      ^
1 error generated.
Comment 18 Brent Fulgham 2017-05-24 08:50:22 PDT
(In reply to Chris Dumez from comment #17)
> Does not seem to build anywhere:
> normal/armv7s/ResourceLoadObserver.o
> /Volumes/Data/EWS/WebKit/Source/WebCore/loader/ResourceLoadObserver.cpp:529:
> 56: error: variable 'primaryTopFrameDomainString' cannot be implicitly
> captured in a lambda with no capture-default specified
>        
> statistics.subresourceUnderTopFrameOrigins.add(primaryTopFrameDomainString);
>                                                        ^
> /Volumes/Data/EWS/WebKit/Source/WebCore/loader/ResourceLoadObserver.cpp:523:
> 10: note: 'primaryTopFrameDomainString' declared here
>     auto primaryTopFrameDomainString = primaryDomain(topFrame);
>          ^
> /Volumes/Data/EWS/WebKit/Source/WebCore/loader/ResourceLoadObserver.cpp:527:
> 23: note: lambda expression begins here
>     m_queue->dispatch([this, primaryTopFrameDomainStr =
> primaryTopFrameDomainString.isolatedCopy(), primarySubresourceDomainString =
> primarySubresourceDomainString.isolatedCopy()] () {
>                       ^
> 1 error generated.

Sorry -- I'm working on this now. I was using a slow laptop at home and using EWS as my builder. :-\
Comment 19 Brent Fulgham 2017-05-24 08:53:31 PDT
(In reply to Brady Eidson from comment #16)
> FWIW:
> 
> IndexedDB code does a lot of thread hopping. To make it significantly harder
> to make threading mistakes I made use of createCrossThreadTask in
> WTF/CrossThreadTask.h
> 
> Right now that class is set up to always call a member function on an
> object, but we can easily change it to allow arbitrary dispatching.

I was also interested on your take on the use of the BinarySemaphore to coordinate work for methods that needed to return a value. That might be a stupidly inefficient approach, but adding a mutex and manually checking it all over the place seemed like a pretty big (and error prone) project.

Does the approach I took in this patch seem rational? Or would you recommend manually coordinating work by adding a mutex?
Comment 20 Brady Eidson 2017-05-24 09:46:49 PDT
(In reply to Brent Fulgham from comment #19)
> (In reply to Brady Eidson from comment #16)
> > FWIW:
> > 
> > IndexedDB code does a lot of thread hopping. To make it significantly harder
> > to make threading mistakes I made use of createCrossThreadTask in
> > WTF/CrossThreadTask.h
> > 
> > Right now that class is set up to always call a member function on an
> > object, but we can easily change it to allow arbitrary dispatching.
> 
> I was also interested on your take on the use of the BinarySemaphore to
> coordinate work for methods that needed to return a value. That might be a
> stupidly inefficient approach, but adding a mutex and manually checking it
> all over the place seemed like a pretty big (and error prone) project.
> 
> Does the approach I took in this patch seem rational? Or would you recommend
> manually coordinating work by adding a mutex?

I'm completely ignorant on this code and what needs it is conforming to.

What methods need to return a value?

Are you suggesting there are methods that are called on the main thread that need to synchronously return a value, but they require dispatching expensive work to a background thread?

If so, that is a fundamentally broken design - Moving i/o and other slow operations to a background thread is meant to free up the main thread to remain responsive, yet if the main thread blocks on the background thread then...
Comment 21 Brent Fulgham 2017-05-24 10:10:45 PDT
(In reply to Brady Eidson from comment #20)
> What methods need to return a value?
> 
> Are you suggesting there are methods that are called on the main thread that
> need to synchronously return a value, but they require dispatching expensive
> work to a background thread?

No. The general pattern is dispatching to a background thread to do expensive work and returning immediately. These all involve touching a HashTable that holds various statistics we care about.

There are, however, a handful of places where we want to ask the HashTable for some information ("is this key in the table", or whatever). These were being called on multiple threads, leading to badness.

I wanted to avoid this, and in this patch I tried making the queries on the same queue that the other work is done.

I'm wondering if it would be better to add a mutex for HashTable access, used by all threads trying to interact with the HashTable.

In this patch, we pop the request onto the work queue and wait for it to return, which means it will wait for a potentially long task to complete.

I guess that answers my own question.
Comment 22 Chris Dumez 2017-05-24 10:24:31 PDT
(In reply to Brent Fulgham from comment #21)
> (In reply to Brady Eidson from comment #20)
> > What methods need to return a value?
> > 
> > Are you suggesting there are methods that are called on the main thread that
> > need to synchronously return a value, but they require dispatching expensive
> > work to a background thread?
> 
> No. The general pattern is dispatching to a background thread to do
> expensive work and returning immediately. These all involve touching a
> HashTable that holds various statistics we care about.
> 
> There are, however, a handful of places where we want to ask the HashTable
> for some information ("is this key in the table", or whatever). These were
> being called on multiple threads, leading to badness.
> 
> I wanted to avoid this, and in this patch I tried making the queries on the
> same queue that the other work is done.
> 
> I'm wondering if it would be better to add a mutex for HashTable access,
> used by all threads trying to interact with the HashTable.
> 
> In this patch, we pop the request onto the work queue and wait for it to
> return, which means it will wait for a potentially long task to complete.
> 
> I guess that answers my own question.

Using a Mutex would have the same synchronization effect though, no? I think the way you went looks OK given the current design. Although it'd be nice if we would make those hash table query functions asynchronous (passing a lambda in parameter).
Comment 23 Brady Eidson 2017-05-24 11:19:07 PDT
(In reply to Brent Fulgham from comment #21)
> (In reply to Brady Eidson from comment #20)
> > What methods need to return a value?
> > 
> > Are you suggesting there are methods that are called on the main thread that
> > need to synchronously return a value, but they require dispatching expensive
> > work to a background thread?
> 
> No. The general pattern is dispatching to a background thread to do
> expensive work and returning immediately. These all involve touching a
> HashTable that holds various statistics we care about.
> 
> There are, however, a handful of places where we want to ask the HashTable
> for some information ("is this key in the table", or whatever). These were
> being called on multiple threads, leading to badness.
> 
> I wanted to avoid this, and in this patch I tried making the queries on the
> same queue that the other work is done.
> 
> I'm wondering if it would be better to add a mutex for HashTable access,
> used by all threads trying to interact with the HashTable.
> 
> In this patch, we pop the request onto the work queue and wait for it to
> return, which means it will wait for a potentially long task to complete.
> 
> I guess that answers my own question.

Got it.

Yes, protect hash table access. But not with a semaphore or mutex - With a WTF::Lock
Comment 24 Brady Eidson 2017-05-24 11:20:53 PDT
(In reply to Chris Dumez from comment #22)
> (In reply to Brent Fulgham from comment #21)
> > (In reply to Brady Eidson from comment #20)
> > > What methods need to return a value?
> > > 
> > > Are you suggesting there are methods that are called on the main thread that
> > > need to synchronously return a value, but they require dispatching expensive
> > > work to a background thread?
> > 
> > No. The general pattern is dispatching to a background thread to do
> > expensive work and returning immediately. These all involve touching a
> > HashTable that holds various statistics we care about.
> > 
> > There are, however, a handful of places where we want to ask the HashTable
> > for some information ("is this key in the table", or whatever). These were
> > being called on multiple threads, leading to badness.
> > 
> > I wanted to avoid this, and in this patch I tried making the queries on the
> > same queue that the other work is done.
> > 
> > I'm wondering if it would be better to add a mutex for HashTable access,
> > used by all threads trying to interact with the HashTable.
> > 
> > In this patch, we pop the request onto the work queue and wait for it to
> > return, which means it will wait for a potentially long task to complete.
> > 
> > I guess that answers my own question.
> 
> Using a Mutex would have the same synchronization effect though, no? I think
> the way you went looks OK given the current design. Although it'd be nice if
> we would make those hash table query functions asynchronous (passing a
> lambda in parameter).

Not necessarily - If multiple long running tasks have been dispatched, and then "simply access the hashtable" is dispatched behind them, it might take "a long time" for the simple access to happen.

Vs protecting with a lock, the "simple access to happen" will happen after the *current* long running task completes.
Comment 25 Brent Fulgham 2017-05-24 18:04:22 PDT
Created attachment 311173 [details]
Patch
Comment 26 Build Bot 2017-05-24 19:29:37 PDT
Comment on attachment 311173 [details]
Patch

Attachment 311173 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/3809969

New failing tests:
webrtc/peer-connection-audio-mute.html
Comment 27 Build Bot 2017-05-24 19:29:39 PDT
Created attachment 311181 [details]
Archive of layout-test-results from ews125 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews125  Port: ios-simulator-wk2  Platform: Mac OS X 10.12.5
Comment 28 Brent Fulgham 2017-05-25 08:50:49 PDT
Comment on attachment 311173 [details]
Patch

The iso-simulator failure doesn't seem related to this patch. It also occurs without the patch.
Comment 29 Chris Dumez 2017-05-25 09:34:23 PDT
Comment on attachment 311173 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=311173&action=review

Here are some comments but I think it'd be good if someone who's done more multithreading (like Brady) to review this since it is very easy to miss something here.

> Source/WebCore/loader/ResourceLoadObserver.cpp:78
> +    m_queue->dispatch([this] () {

If the queue can outlive the ResourceLoadObserver, then |this| needs to be protected in this lambda (It seems possible since the queue is set by someone else via ResourceLoadObserver::setStatisticsQueue().

> Source/WebCore/loader/ResourceLoadObserver.cpp:89
> +    m_queue->dispatch([this] () {

If the queue can outlive the ResourceLoadObserver, then |this| needs to be protected in this lambda (It seems possible since the queue is set by someone else via ResourceLoadObserver::setStatisticsQueue().

> Source/WebCore/loader/ResourceLoadObserver.cpp:148
> +    m_queue->dispatch([this, isMainFrame, isRedirect, sourcePrimaryDomain = sourcePrimaryDomain.isolatedCopy(), mainFramePrimaryDomain = mainFramePrimaryDomain.isolatedCopy(), targetURL = CrossThreadCopier<URL>::copy(targetURL), mainFrameURL = CrossThreadCopier<URL>::copy(mainFrameURL), targetPrimaryDomain = targetPrimaryDomain.isolatedCopy()] () {

ditto, |this| may need to protected here depending on the lifetime of m_queue.

> Source/WebCore/loader/ResourceLoadObserver.cpp:204
> +            RunLoop::main().dispatch([this] () {

|this| is not protected here, this looks definitely unsafe.

> Source/WebCore/loader/ResourceLoadObserver.cpp:240
> +    m_queue->dispatch([this, isRedirect, sourcePrimaryDomain = sourcePrimaryDomain.isolatedCopy(), mainFramePrimaryDomain = mainFramePrimaryDomain.isolatedCopy(), targetPrimaryDomain = targetPrimaryDomain.isolatedCopy(), mainFrameURL = mainFrameURL.isolatedCopy()] () {

Ditto about potentially protecting this.

> Source/WebCore/loader/ResourceLoadObserver.cpp:282
> +            RunLoop::main().dispatch([this] () {

I believe |this| needs protecting here.

> Source/WebCore/loader/ResourceLoadObserver.cpp:316
> +    m_queue->dispatch([this, targetPrimaryDomain = targetPrimaryDomain.isolatedCopy(), mainFramePrimaryDomain = mainFramePrimaryDomain.isolatedCopy(), mainFrameURL = mainFrameURL.isolatedCopy()] () {

Ditto about potentially protecting this.

> Source/WebCore/loader/ResourceLoadObserver.cpp:334
> +            RunLoop::main().dispatch([this] () {

I believe |this| needs protecting.

> Source/WebCore/loader/ResourceLoadObserver.cpp:360
> +    m_queue->dispatch([this, primaryDomainString = primaryDomainString.isolatedCopy()] () {

Ditto about potentially protecting this.

> Source/WebCore/loader/ResourceLoadObserver.cpp:369
> +        RunLoop::main().dispatch([this] () {

I believe |this| needs protecting.

> Source/WebCore/loader/ResourceLoadObserver.cpp:383
> +    m_queue->dispatch([this, primaryDomainString = primaryDomainString.isolatedCopy()] () {

Ditto about potentially protecting this.

> Source/WebCore/loader/ResourceLoadObserver.cpp:388
> +        RunLoop::main().dispatch([this, primaryDomainString = primaryDomainString.isolatedCopy()] () {

I believe |this| needs protecting.

> Source/WebCore/loader/ResourceLoadObserver.cpp:402
> +    m_queue->dispatch([this, primaryDomainString = primaryDomainString.isolatedCopy()] () {

Ditto about potentially protecting this.

> Source/WebCore/loader/ResourceLoadObserver.cpp:429
> +    m_queue->dispatch([this, primaryDomainString = primaryDomainString.isolatedCopy()] () {

Ditto about potentially protecting this.

> Source/WebCore/loader/ResourceLoadObserver.cpp:454
> +    m_queue->dispatch([this, primaryDomainString = primaryDomainString.isolatedCopy()] () {

Ditto about potentially protecting this.

> Source/WebCore/loader/ResourceLoadObserver.cpp:468
> +    m_queue->dispatch([this, value, primaryDomainString = primaryDomainString.isolatedCopy()] () {

Ditto about potentially protecting this.

> Source/WebCore/loader/ResourceLoadObserver.cpp:494
> +    m_queue->dispatch([this, primaryTopFrameDomainString = primaryTopFrameDomainString.isolatedCopy(), primarySubFrameDomainString = primarySubFrameDomainString.isolatedCopy()] () {

Ditto about potentially protecting this.

> Source/WebCore/loader/ResourceLoadObserver.cpp:524
> +    m_queue->dispatch([this, primaryRedirectDomainString = primaryRedirectDomainString.isolatedCopy(), primarySubresourceDomainString = primarySubresourceDomainString.isolatedCopy()] () {

Ditto about potentially protecting this.

> Source/WebCore/loader/ResourceLoadStatisticsStore.cpp:179
> +    auto locker = holdLock(m_statisticsLock);

This looks unsafe, you query m_resourceStatisticsMap.size() before locking, then lock, then use statistics.uncheckedAppend() after you lock. Sounds like a source of security bugs.

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:48
> +template<typename T> struct CrossThreadCopierBase<false, false, HashSet<T> > {

Can we move this to CrossThreadCopier.h next to the specialization for Vector? This is reusable. Also, you don't need the space in "HashSet<T> >" anymore.

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:136
> +            // But always touch the ResourceLoadStatistics store on the worker queue

Missing period at the end.

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:137
> +            m_statisticsQueue->dispatch([this, protectedThis = makeRef(*this), topDomains = CrossThreadCopier<Vector<String>>::copy(domainsWithDeletedWebsiteData)] () mutable {

protectedThis = WTFMove(protectedThis) ?

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:225
> +            m_statisticsQueue->dispatch([this, protectedThis = makeRef(*this), topDomains = CrossThreadCopier<HashSet<String>>::copy(topPrivatelyControlledDomainsWithWebsiteData)] () mutable {

protectedThis = WTFMove(protectedThis) ?
Comment 30 Brent Fulgham 2017-05-25 10:57:56 PDT
Comment on attachment 311173 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=311173&action=review

>> Source/WebCore/loader/ResourceLoadObserver.cpp:78
>> +    m_queue->dispatch([this] () {
> 
> If the queue can outlive the ResourceLoadObserver, then |this| needs to be protected in this lambda (It seems possible since the queue is set by someone else via ResourceLoadObserver::setStatisticsQueue().

ResourceLoadObserver is NeverDestroyed<>, so I don't think the queue will outlive it.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:89
>> +    m_queue->dispatch([this] () {
> 
> If the queue can outlive the ResourceLoadObserver, then |this| needs to be protected in this lambda (It seems possible since the queue is set by someone else via ResourceLoadObserver::setStatisticsQueue().

Ditto.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:148
>> +    m_queue->dispatch([this, isMainFrame, isRedirect, sourcePrimaryDomain = sourcePrimaryDomain.isolatedCopy(), mainFramePrimaryDomain = mainFramePrimaryDomain.isolatedCopy(), targetURL = CrossThreadCopier<URL>::copy(targetURL), mainFrameURL = CrossThreadCopier<URL>::copy(mainFrameURL), targetPrimaryDomain = targetPrimaryDomain.isolatedCopy()] () {
> 
> ditto, |this| may need to protected here depending on the lifetime of m_queue.

Ditto! :-)

>> Source/WebCore/loader/ResourceLoadObserver.cpp:204
>> +            RunLoop::main().dispatch([this] () {
> 
> |this| is not protected here, this looks definitely unsafe.

|this| is the ResourceLoadObserver, which is NeverDestroyed<>

>> Source/WebCore/loader/ResourceLoadObserver.cpp:240
>> +    m_queue->dispatch([this, isRedirect, sourcePrimaryDomain = sourcePrimaryDomain.isolatedCopy(), mainFramePrimaryDomain = mainFramePrimaryDomain.isolatedCopy(), targetPrimaryDomain = targetPrimaryDomain.isolatedCopy(), mainFrameURL = mainFrameURL.isolatedCopy()] () {
> 
> Ditto about potentially protecting this.

Ditto.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:282
>> +            RunLoop::main().dispatch([this] () {
> 
> I believe |this| needs protecting here.

|this| outlasts the executing code since it is NeverDestroyed<>

>> Source/WebCore/loader/ResourceLoadObserver.cpp:316
>> +    m_queue->dispatch([this, targetPrimaryDomain = targetPrimaryDomain.isolatedCopy(), mainFramePrimaryDomain = mainFramePrimaryDomain.isolatedCopy(), mainFrameURL = mainFrameURL.isolatedCopy()] () {
> 
> Ditto about potentially protecting this.

Ditto here, and for the remaining comments of this nature.

>> Source/WebCore/loader/ResourceLoadStatisticsStore.cpp:179
>> +    auto locker = holdLock(m_statisticsLock);
> 
> This looks unsafe, you query m_resourceStatisticsMap.size() before locking, then lock, then use statistics.uncheckedAppend() after you lock. Sounds like a source of security bugs.

Yikes! Thank you for catching this!

>> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:48
>> +template<typename T> struct CrossThreadCopierBase<false, false, HashSet<T> > {
> 
> Can we move this to CrossThreadCopier.h next to the specialization for Vector? This is reusable. Also, you don't need the space in "HashSet<T> >" anymore.

Yes -- I was considering that earlier, but wasn't whether it would be worth sharing this implementation.

>> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:136
>> +            // But always touch the ResourceLoadStatistics store on the worker queue
> 
> Missing period at the end.

Doh!

>> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:137
>> +            m_statisticsQueue->dispatch([this, protectedThis = makeRef(*this), topDomains = CrossThreadCopier<Vector<String>>::copy(domainsWithDeletedWebsiteData)] () mutable {
> 
> protectedThis = WTFMove(protectedThis) ?

That doesn't work. It complains that the ref() method has been deleted, and it won't work. Not sure why.
Comment 31 Brent Fulgham 2017-05-25 10:58:12 PDT
Created attachment 311246 [details]
Patch
Comment 32 Brady Eidson 2017-05-25 15:14:50 PDT
Comment on attachment 311246 [details]
Patch

Seems fine to me, but I (admittedly) did not look as closely as Chris.

So... r=Chris :)
Comment 33 Chris Dumez 2017-05-25 15:18:19 PDT
(In reply to Brady Eidson from comment #32)
> Comment on attachment 311246 [details]
> Patch
> 
> Seems fine to me, but I (admittedly) did not look as closely as Chris.
> 
> So... r=Chris :)

Before r=me, please let me take another look. I did not look too deeply because I thought Brady would. I'll try and find some time later today.
Comment 34 Chris Dumez 2017-05-26 08:35:24 PDT
(In reply to Chris Dumez from comment #33)
> (In reply to Brady Eidson from comment #32)
> > Comment on attachment 311246 [details]
> > Patch
> > 
> > Seems fine to me, but I (admittedly) did not look as closely as Chris.
> > 
> > So... r=Chris :)
> 
> Before r=me, please let me take another look. I did not look too deeply
> because I thought Brady would. I'll try and find some time later today.

Ok, I'll do a proper review now. Sorry about the delay.
Comment 35 Chris Dumez 2017-05-26 09:30:28 PDT
Comment on attachment 311246 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=311246&action=review

> Source/WebCore/loader/ResourceLoadObserver.cpp:69
> +    m_queue = WTFMove(queue);

Can we assert that  ASSERT(!m_queue); before doing the assignment?

> Source/WebCore/loader/ResourceLoadObserver.cpp:74
> +    if (!m_store)

Can you explain why we assert that we have a queue but we have an if check to see if we have a store? It seems we always set the store first, then the queue. So I would think an assertion would suffice.

> Source/WebCore/loader/ResourceLoadObserver.cpp:85
> +    if (!m_store)

ditto

> Source/WebCore/loader/ResourceLoadObserver.cpp:151
> +        auto targetStatistics = m_store->ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain);

Are we making a copy of the targetStatistics here? It is not 100% clear to me due to the use of auto, but I guess we are? If we are not, then it is not thread safe.

> Source/WebCore/loader/ResourceLoadObserver.cpp:168
> +            auto& redirectingOriginResourceStatistics = m_store->ensureResourceStatisticsForPrimaryDomain(sourcePrimaryDomain);

Nothing prevents the main thread from modifying the store's hash map after you've retrieved a reference to one of the hash table's values. This means that your reference (targetStatistics) may get invalidated at any point later in this function.

> Source/WebCore/loader/ResourceLoadObserver.cpp:190
> +                auto& sourceOriginResourceStatistics = m_store->ensureResourceStatisticsForPrimaryDomain(sourcePrimaryDomain);

ditto here.

> Source/WebCore/loader/ResourceLoadObserver.cpp:205
> +                m_store->fireDataModificationHandler();

Could we add an assertion in fireDataModificationHandler() to make sure it is always called on the main thread? I don't see one.

> Source/WebCore/loader/ResourceLoadObserver.cpp:242
> +        auto& targetStatistics = m_store->ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain);

ditto here about your reference potentially getting invalidated.

> Source/WebCore/loader/ResourceLoadObserver.cpp:317
> +        auto& targetStatistics = m_store->ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain);

Ditto about your reference getting invalidated.

> Source/WebCore/loader/ResourceLoadObserver.cpp:361
> +        auto& statistics = m_store->ensureResourceStatisticsForPrimaryDomain(primaryDomainString);

ditto about your reference getting invalidated.

> Source/WebCore/loader/ResourceLoadObserver.cpp:384
> +        auto& statistics = m_store->ensureResourceStatisticsForPrimaryDomain(primaryDomainString);

Ditto about your reference getting invalidated.

> Source/WebCore/loader/ResourceLoadObserver.cpp:389
> +            m_store->fireShouldPartitionCookiesHandler({ primaryDomainString }, { }, false);

Could we add an assert in fireShouldPartitionCookiesHandler() to make sure it is always called on the main thread? I do not see one.

> Source/WebCore/loader/ResourceLoadObserver.cpp:403
> +        auto& statistics = m_store->ensureResourceStatisticsForPrimaryDomain(primaryDomainString);

Ditto about your reference getting invalidated.

> Source/WebCore/loader/ResourceLoadObserver.cpp:416
>      auto& statistics = m_store->ensureResourceStatisticsForPrimaryDomain(primaryDomain(url));

See here an example of the main thread modifying the store's hash table while the statistic's thread is potentially holding a reference to one of the hash table's values. In case of re-hashing for e.g., that reference will be invalidated.

> Source/WebCore/loader/ResourceLoadObserver.cpp:430
> +        auto& statistics = m_store->ensureResourceStatisticsForPrimaryDomain(primaryDomainString);

ditto here about the reference potentially getting invalidated before the next line of code.

> Source/WebCore/loader/ResourceLoadObserver.cpp:455
> +        auto& statistics = m_store->ensureResourceStatisticsForPrimaryDomain(primaryDomainString);

ditto about your reference getting invalidated.

> Source/WebCore/loader/ResourceLoadObserver.cpp:469
> +        auto& statistics = m_store->ensureResourceStatisticsForPrimaryDomain(primaryDomainString);

ditto about your reference getting invalidated.

So we the current design (dispatch for setting, lock for retrieving), if you do:
setGrandfathered(url, true);
v = isGrandfathered(url); // v may still be false I believe

Is this an issue?

> Source/WebCore/loader/ResourceLoadObserver.cpp:495
> +        auto& statistics = m_store->ensureResourceStatisticsForPrimaryDomain(primarySubFrameDomainString);

ditto about your reference potentially getting invalidated.

> Source/WebCore/loader/ResourceLoadObserver.cpp:510
> +        auto& statistics = m_store->ensureResourceStatisticsForPrimaryDomain(primarySubresourceDomainString);

ditto about your reference potentially getting invalidated.

> Source/WebCore/loader/ResourceLoadObserver.cpp:525
> +        auto& statistics = m_store->ensureResourceStatisticsForPrimaryDomain(primarySubresourceDomainString);

ditto about your reference potentially getting invalidated.

> Source/WebCore/loader/ResourceLoadObserver.cpp:599
> +    if (!m_store)

Can m_store ever be null?

> Source/WebCore/loader/ResourceLoadObserver.cpp:602
> +    auto locker = holdLock(m_store->statisticsLock());

So here you lock and then store::statisticsForOrigin() will also lock. Is this OK? Do we really need this lock?

> Source/WebCore/loader/ResourceLoadStatisticsStore.cpp:67
> +    auto locker = holdLock(m_statisticsLock);

Even though you lock here before accessing the hashMap, you return an iterator to that hash map and then we unlock. This does not look safe given that the iterator may be invalidated at any point after you drop the lock.

> Source/WebCore/loader/ResourceLoadStatisticsStore.cpp:90
>      encoder->encodeDouble("endOfGrandfatheringTimestamp", m_endOfGrandfatheringTimestamp);

Is it really OK to use m_endOfGrandfatheringTimestamp before locking?

> Source/WebCore/loader/ResourceLoadStatisticsStore.cpp:141
>      fireShouldPartitionCookiesHandler({ }, prevalentResourceDomainsWithoutUserInteraction, true);

fireShouldPartitionCookiesHandler is called from a background thread here but you explicitly called it on the main thread earlier in this patch. Is this OK?

> Source/WebCore/loader/ResourceLoadStatisticsStore.cpp:151
>      fireShouldPartitionCookiesHandler({ }, { }, true);

fireShouldPartitionCookiesHandler is called from a background thread here but you explicitly called it on the main thread earlier in this patch. Is this OK?

> Source/WebCore/loader/ResourceLoadStatisticsStore.cpp:190
>  void ResourceLoadStatisticsStore::mergeStatistics(const Vector<ResourceLoadStatistics>& statistics)

FYI, I see that ResourceLoadStatisticsStore::setNotificationCallback() below is using std::function below. It's be good to make sure we use WTF::Function everywhere, especially considering we're doing multithreading. std::function does implicit copies on the captured variables so calling isolatedCopy() when capturing is not enough. We have been caught before and this is why we have WTF::Function now.

> Source/WebCore/loader/ResourceLoadStatisticsStore.cpp:303
>  void ResourceLoadStatisticsStore::processStatistics(std::function<void(ResourceLoadStatistics&)>&& processFunction)

Should probably use WTF::Function.

> Source/WebCore/loader/ResourceLoadStatisticsStore.cpp:373
>      if (m_dataRecordsRemovalPending)

Can we add an assertion here to make sure this is always called on a non-main thread since m_dataRecordsRemovalPending is not protected by locks and is always updated on a background thread?

> Source/WebCore/loader/ResourceLoadStatisticsStore.cpp:-361
> -

Can we add assertions above in dataRecordsBeingRemoved() / dataRecordsWereRemoved() to make sure they're always called on a background thread?
Comment 36 Brent Fulgham 2017-05-26 10:17:43 PDT
Comment on attachment 311246 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=311246&action=review

>> Source/WebCore/loader/ResourceLoadObserver.cpp:69
>> +    m_queue = WTFMove(queue);
> 
> Can we assert that  ASSERT(!m_queue); before doing the assignment?

Will do.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:151
>> +        auto targetStatistics = m_store->ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain);
> 
> Are we making a copy of the targetStatistics here? It is not 100% clear to me due to the use of auto, but I guess we are? If we are not, then it is not thread safe.

You are right! The function is returning a reference, so auto is mapping to auto&. I'll change auto -> auto& to make this clear, and protect access to m_store.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:168
>> +            auto& redirectingOriginResourceStatistics = m_store->ensureResourceStatisticsForPrimaryDomain(sourcePrimaryDomain);
> 
> Nothing prevents the main thread from modifying the store's hash map after you've retrieved a reference to one of the hash table's values. This means that your reference (targetStatistics) may get invalidated at any point later in this function.

Yes -- that's clearly wrong. It should be fixed by locking earlier as you pointed out for the other call to 'ensureResourceStatisticsForPrimaryDomain'.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:190
>> +                auto& sourceOriginResourceStatistics = m_store->ensureResourceStatisticsForPrimaryDomain(sourcePrimaryDomain);
> 
> ditto here.

Will fix.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:205
>> +                m_store->fireDataModificationHandler();
> 
> Could we add an assertion in fireDataModificationHandler() to make sure it is always called on the main thread? I don't see one.

Will do.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:242
>> +        auto& targetStatistics = m_store->ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain);
> 
> ditto here about your reference potentially getting invalidated.

Will fix.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:317
>> +        auto& targetStatistics = m_store->ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain);
> 
> Ditto about your reference getting invalidated.

Will fix.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:361
>> +        auto& statistics = m_store->ensureResourceStatisticsForPrimaryDomain(primaryDomainString);
> 
> ditto about your reference getting invalidated.

Ugh! So many mistakes. :-(

>> Source/WebCore/loader/ResourceLoadObserver.cpp:384
>> +        auto& statistics = m_store->ensureResourceStatisticsForPrimaryDomain(primaryDomainString);
> 
> Ditto about your reference getting invalidated.

Will fix.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:389
>> +            m_store->fireShouldPartitionCookiesHandler({ primaryDomainString }, { }, false);
> 
> Could we add an assert in fireShouldPartitionCookiesHandler() to make sure it is always called on the main thread? I do not see one.

Will do.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:403
>> +        auto& statistics = m_store->ensureResourceStatisticsForPrimaryDomain(primaryDomainString);
> 
> Ditto about your reference getting invalidated.

Will fix.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:430
>> +        auto& statistics = m_store->ensureResourceStatisticsForPrimaryDomain(primaryDomainString);
> 
> ditto here about the reference potentially getting invalidated before the next line of code.

Will fix.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:455
>> +        auto& statistics = m_store->ensureResourceStatisticsForPrimaryDomain(primaryDomainString);
> 
> ditto about your reference getting invalidated.

Will fix.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:469
>> +        auto& statistics = m_store->ensureResourceStatisticsForPrimaryDomain(primaryDomainString);
> 
> ditto about your reference getting invalidated.
> 
> So we the current design (dispatch for setting, lock for retrieving), if you do:
> setGrandfathered(url, true);
> v = isGrandfathered(url); // v may still be false I believe
> 
> Is this an issue?

Probably not in practice. The grandfathering setting is done once at startup when upgrading the browser, or after deleting all state. It's not set again, though it is read fairly frequently.

Still, maybe these various set functions should just happen on the main thread, since they are not significantly more expensive than the read version of these operations.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:495
>> +        auto& statistics = m_store->ensureResourceStatisticsForPrimaryDomain(primarySubFrameDomainString);
> 
> ditto about your reference potentially getting invalidated.

Will fix.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:510
>> +        auto& statistics = m_store->ensureResourceStatisticsForPrimaryDomain(primarySubresourceDomainString);
> 
> ditto about your reference potentially getting invalidated.

Will fix.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:525
>> +        auto& statistics = m_store->ensureResourceStatisticsForPrimaryDomain(primarySubresourceDomainString);
> 
> ditto about your reference potentially getting invalidated.

Will fix.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:599
>> +    if (!m_store)
> 
> Can m_store ever be null?

In earlier testing this did happen from time to time. It may not be a valid state anymore.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:602
>> +    auto locker = holdLock(m_store->statisticsLock());
> 
> So here you lock and then store::statisticsForOrigin() will also lock. Is this OK? Do we really need this lock?

It's okay, since the lock is recursive, but is wasteful. So I'll remove it!

>> Source/WebCore/loader/ResourceLoadStatisticsStore.cpp:67
>> +    auto locker = holdLock(m_statisticsLock);
> 
> Even though you lock here before accessing the hashMap, you return an iterator to that hash map and then we unlock. This does not look safe given that the iterator may be invalidated at any point after you drop the lock.

Good point! I'll assert that we are locked here, to make sure all calling sites are properly protected.

>> Source/WebCore/loader/ResourceLoadStatisticsStore.cpp:90
>>      encoder->encodeDouble("endOfGrandfatheringTimestamp", m_endOfGrandfatheringTimestamp);
> 
> Is it really OK to use m_endOfGrandfatheringTimestamp before locking?

Hmm. I'll err on the side of safety and lock earlier.

>> Source/WebCore/loader/ResourceLoadStatisticsStore.cpp:141
>>      fireShouldPartitionCookiesHandler({ }, prevalentResourceDomainsWithoutUserInteraction, true);
> 
> fireShouldPartitionCookiesHandler is called from a background thread here but you explicitly called it on the main thread earlier in this patch. Is this OK?

Oh! I don't think so. I'll fix that, too.

>> Source/WebCore/loader/ResourceLoadStatisticsStore.cpp:151
>>      fireShouldPartitionCookiesHandler({ }, { }, true);
> 
> fireShouldPartitionCookiesHandler is called from a background thread here but you explicitly called it on the main thread earlier in this patch. Is this OK?

:-(

>> Source/WebCore/loader/ResourceLoadStatisticsStore.cpp:190
>>  void ResourceLoadStatisticsStore::mergeStatistics(const Vector<ResourceLoadStatistics>& statistics)
> 
> FYI, I see that ResourceLoadStatisticsStore::setNotificationCallback() below is using std::function below. It's be good to make sure we use WTF::Function everywhere, especially considering we're doing multithreading. std::function does implicit copies on the captured variables so calling isolatedCopy() when capturing is not enough. We have been caught before and this is why we have WTF::Function now.

Ah! I didn't know that. I will make sure this is done consistently.

>> Source/WebCore/loader/ResourceLoadStatisticsStore.cpp:303
>>  void ResourceLoadStatisticsStore::processStatistics(std::function<void(ResourceLoadStatistics&)>&& processFunction)
> 
> Should probably use WTF::Function.

Will do.

>> Source/WebCore/loader/ResourceLoadStatisticsStore.cpp:373
>>      if (m_dataRecordsRemovalPending)
> 
> Can we add an assertion here to make sure this is always called on a non-main thread since m_dataRecordsRemovalPending is not protected by locks and is always updated on a background thread?

Of course!

>> Source/WebCore/loader/ResourceLoadStatisticsStore.cpp:-361
>> -
> 
> Can we add assertions above in dataRecordsBeingRemoved() / dataRecordsWereRemoved() to make sure they're always called on a background thread?

Will do.
Comment 37 Brent Fulgham 2017-05-26 11:57:40 PDT
Comment on attachment 311246 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=311246&action=review

>>> Source/WebCore/loader/ResourceLoadObserver.cpp:69
>>> +    m_queue = WTFMove(queue);
>> 
>> Can we assert that  ASSERT(!m_queue); before doing the assignment?
> 
> Will do.

There seems to be an issue here at startup of some WebKit clients. I filed Bug 172653 to fix that properly.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:74
>> +    if (!m_store)
> 
> Can you explain why we assert that we have a queue but we have an if check to see if we have a store? It seems we always set the store first, then the queue. So I would think an assertion would suffice.

Agreed, but I would like to tackle that under Bug 172653. I know John ran into some crashes early on due to m_store not being set for some WebKit uses and I'd like to figure that out and deal with it in a separate issue.
Comment 38 Brent Fulgham 2017-05-26 14:14:13 PDT
Comment on attachment 311246 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=311246&action=review

>>> Source/WebCore/loader/ResourceLoadObserver.cpp:205
>>> +                m_store->fireDataModificationHandler();
>> 
>> Could we add an assertion in fireDataModificationHandler() to make sure it is always called on the main thread? I don't see one.
> 
> Will do.

Actually, this stuff isn't quite right. Some of these m_store->fireXYZ things do a bunch of work before calling into the main-thread-allocated timers and other stuff. Instead, we should dispatch to the main runloop in the implementation at the point it matters. This will allow us to do the expensive work on the queue.

>>> Source/WebCore/loader/ResourceLoadObserver.cpp:389
>>> +            m_store->fireShouldPartitionCookiesHandler({ primaryDomainString }, { }, false);
>> 
>> Could we add an assert in fireShouldPartitionCookiesHandler() to make sure it is always called on the main thread? I do not see one.
> 
> Will do.

Again, we actually don't want this on RunLoop::main() yet. It should stay on the worker queue, and dispatch to main when it's time to interact with UI or other main-thread allocated items.
Comment 39 Brent Fulgham 2017-05-26 14:37:13 PDT
Created attachment 311375 [details]
Patch
Comment 40 Chris Dumez 2017-05-26 14:56:05 PDT
Comment on attachment 311375 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=311375&action=review

r=me if the bots are happy. But let's file a bug about simplifying the design. This is too error prone.

> Source/WebCore/loader/ResourceLoadObserver.cpp:209
> +        m_store->setResourceStatisticsForPrimaryDomain(targetPrimaryDomain, WTFMove(targetStatistics));

I think this setResourceStatisticsForPrimaryDomain() is no longer needed now that you are modifying a reference to the value in the hash map.

> Source/WebCore/loader/ResourceLoadStatisticsStore.cpp:206
> +void ResourceLoadStatisticsStore::setNotificationCallback(WTF::Function<void()> handler)

WTF::Function<void()>&& would be clearer since a WTF::Function has to be moved around.
Comment 41 Brent Fulgham 2017-05-26 15:10:36 PDT
Comment on attachment 311375 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=311375&action=review

>> Source/WebCore/loader/ResourceLoadObserver.cpp:209
>> +        m_store->setResourceStatisticsForPrimaryDomain(targetPrimaryDomain, WTFMove(targetStatistics));
> 
> I think this setResourceStatisticsForPrimaryDomain() is no longer needed now that you are modifying a reference to the value in the hash map.

You are right!

>> Source/WebCore/loader/ResourceLoadStatisticsStore.cpp:206
>> +void ResourceLoadStatisticsStore::setNotificationCallback(WTF::Function<void()> handler)
> 
> WTF::Function<void()>&& would be clearer since a WTF::Function has to be moved around.

Will do.
Comment 42 Build Bot 2017-05-26 15:50:19 PDT
Comment on attachment 311375 [details]
Patch

Attachment 311375 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/3823913

New failing tests:
http/tests/misc/write-while-waiting.html
Comment 43 Build Bot 2017-05-26 15:50:21 PDT
Created attachment 311387 [details]
Archive of layout-test-results from ews105 for mac-elcapitan-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews105  Port: mac-elcapitan-wk2  Platform: Mac OS X 10.11.6
Comment 44 Chris Dumez 2017-05-26 15:53:27 PDT
(In reply to Build Bot from comment #42)
> Comment on attachment 311375 [details]
> Patch
> 
> Attachment 311375 [details] did not pass mac-wk2-ews (mac-wk2):
> Output: http://webkit-queues.webkit.org/results/3823913
> 
> New failing tests:
> http/tests/misc/write-while-waiting.html

It is crashing :(

Thread 7 Crashed:: Dispatch queue: WebResourceLoadStatisticsStore Process Data Queue
0   com.apple.WebCore             	0x0000000109913ea4 WebCore::ResourceLoadStatisticsStore::mergeStatistics(WTF::Vector<WebCore::ResourceLoadStatistics, 0ul, WTF::CrashOnOverflow, 16ul> const&) + 244 (StringImpl.h:505)
1   com.apple.WebKit              	0x0000000107aa5866 WebKit::WebResourceLoadStatisticsStore::resourceLoadStatisticsUpdated(WTF::Vector<WebCore::ResourceLoadStatistics, 0ul, WTF::CrashOnOverflow, 16ul> const&) + 18 (Ref.h:138)
2   com.apple.WebKit              	0x0000000107aa6c29 void IPC::handleMessage<Messages::WebResourceLoadStatisticsStore::ResourceLoadStatisticsUpdated, WebKit::WebResourceLoadStatisticsStore, void (WebKit::WebResourceLoadStatisticsStore::*)(WTF::Vector<WebCore::ResourceLoadStatistics, 0ul, WTF::CrashOnOverflow, 16ul> const&)>(IPC::Decoder&, WebKit::WebResourceLoadStatisticsStore*, void (WebKit::WebResourceLoadStatisticsStore::*)(WTF::Vector<WebCore::ResourceLoadStatistics, 0ul, WTF::CrashOnOverflow, 16ul> const&)) + 69 (HandleMessage.h:40)
3   libdispatch.dylib             	0x00007fff8e26793d _dispatch_call_block_and_release + 12
4   libdispatch.dylib             	0x00007fff8e25c40b _dispatch_client_callout + 8
5   libdispatch.dylib             	0x00007fff8e26103b _dispatch_queue_drain + 754
6   libdispatch.dylib             	0x00007fff8e267707 _dispatch_queue_invoke + 549
7   libdispatch.dylib             	0x00007fff8e25fd53 _dispatch_root_queue_drain + 538
8   libdispatch.dylib             	0x00007fff8e25fb00 _dispatch_worker_thread3 + 91
9   libsystem_pthread.dylib       	0x00007fff901124de _pthread_wqthread + 1129
10  libsystem_pthread.dylib       	0x00007fff90110341 start_wqthread + 13
Comment 45 Build Bot 2017-05-26 15:59:59 PDT
Comment on attachment 311375 [details]
Patch

Attachment 311375 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/3823886

New failing tests:
fast/events/before-unload-returnValue.html
http/tests/cache/cancel-during-revalidation-succeeded.html
webrtc/peer-connection-audio-mute.html
http/tests/misc/write-while-waiting.html
Comment 46 Build Bot 2017-05-26 16:00:01 PDT
Created attachment 311389 [details]
Archive of layout-test-results from ews125 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews125  Port: ios-simulator-wk2  Platform: Mac OS X 10.12.5
Comment 47 Build Bot 2017-05-26 16:06:50 PDT
Comment on attachment 311375 [details]
Patch

Attachment 311375 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/3823921

New failing tests:
imported/w3c/web-platform-tests/streams/readable-streams/general.html
Comment 48 Build Bot 2017-05-26 16:06:52 PDT
Created attachment 311390 [details]
Archive of layout-test-results from ews115 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews115  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 49 Brent Fulgham 2017-05-26 16:27:09 PDT
Created attachment 311394 [details]
Patch for landing.
Comment 50 Brent Fulgham 2017-05-26 16:28:13 PDT
Comment on attachment 311394 [details]
Patch for landing.

I missed three functions used by the test harness that get called from the main thread. I make them dispatch to the work queue to match the regular engine uses.
Comment 51 Brent Fulgham 2017-05-26 16:29:16 PDT
Waiting for EWS to land.
Comment 52 Brent Fulgham 2017-05-26 17:38:39 PDT
Committed r217515: <http://trac.webkit.org/changeset/217515>