Bug 202372 - Updated resource load statistics are never merged into the SQLite Database backend
Summary: Updated resource load statistics are never merged into the SQLite Database ba...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit Misc. (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Kate Cheney
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2019-09-30 14:23 PDT by Kate Cheney
Modified: 2019-10-02 12:22 PDT (History)
5 users (show)

See Also:


Attachments
Patch (66.03 KB, patch)
2019-09-30 16:00 PDT, Kate Cheney
no flags Details | Formatted Diff | Diff
Patch (67.82 KB, patch)
2019-09-30 17:43 PDT, Kate Cheney
no flags Details | Formatted Diff | Diff
Patch (105.13 KB, patch)
2019-10-01 12:48 PDT, Kate Cheney
no flags Details | Formatted Diff | Diff
Patch (112.63 KB, patch)
2019-10-01 17:46 PDT, Kate Cheney
no flags Details | Formatted Diff | Diff
Patch (110.35 KB, patch)
2019-10-02 10:46 PDT, Kate Cheney
no flags Details | Formatted Diff | Diff
Patch (109.58 KB, patch)
2019-10-02 11:27 PDT, Kate Cheney
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Kate Cheney 2019-09-30 14:23:56 PDT
Updated resource load statistics are never merged into the SQLite Database backend.
<rdar://problem/55854542>
Comment 1 Kate Cheney 2019-09-30 16:00:43 PDT
Created attachment 379850 [details]
Patch
Comment 2 Kate Cheney 2019-09-30 17:43:05 PDT
Created attachment 379866 [details]
Patch
Comment 3 John Wilander 2019-09-30 18:07:18 PDT
Comment on attachment 379866 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=379866&action=review

Does anything need to be cleared up between tests as part of resetToConsistentState()?

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:360
> +        RELEASE_LOG_ERROR_IF_ALLOWED(m_sessionID, "ResourceLoadStatisticsDatabaseStore::insertObservedDomain can only be called on domains not in the database");

Missing period.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:547
> +    return other.lastSeen <= mostRecentUserInteractionTime + 24_h;

Where do these 24 hours come from? I don't understand their meaning here. Is something missing in the function name to explain the 24 hours?

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:562
> +    unsigned currentTimesAccessedDueToFirstPartyInteraction = current.getColumnInt(9);

I'm wondering if these integers could be constants so that we don't mess it up down the road, changing them in one place but forgetting about another.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:597
> +        RELEASE_LOG_ERROR_IF_ALLOWED(m_sessionID, "%p - ResourceLoadStatisticsDatabaseStore::mergeStatistic. Statement failed to bind or domain was not found, error message: %{public}s", this, m_database.lastErrorMsg());

What will go into this error message? Since we're making it public, we need to make sure nothing sensitive can end up there.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:617
> +    // can refer to the ObservedDomain table entries

Missing period.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:1303
> +        || m_updateGrandfatheredStatement.bindText(2, domain.string()) != SQLITE_OK

Could this 2 be made a constant?

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:1776
> +        || m_updateTimesAccessedDueToFirstPartyInteractionStatement.bindText(2, domain.string()) != SQLITE_OK

This is the same 2 used here, right?

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:1778
> +        RELEASE_LOG_ERROR_IF_ALLOWED(m_sessionID, "%p - ResourceLoadStatisticsDatabaseStore::updateTimesAccessedAsFirstPartyDueToUserInteraction failed to bind, error message: %{public}s", this, m_database.lastErrorMsg());

Again, we need to make sure nothing sensitive can go in the public log.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:1792
> +        || m_updateDataRecordsRemovedStatement.bindText(2, domain.string()) != SQLITE_OK

Same 2 again?

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:1794
> +        RELEASE_LOG_ERROR_IF_ALLOWED(m_sessionID, "%p - ResourceLoadStatisticsDatabaseStore::updateDataRecordsRemoved failed to bind, error message: %{public}s", this, m_database.lastErrorMsg());

Public error message.

> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.cpp:267
> +        if (!m_statisticsStore)

Why is the !is<ResourceLoadStatisticsMemoryStore>(*m_statisticsStore) check no longer needed?

> Source/WebKit/NetworkProcess/NetworkProcess.h:245
> +    void mergeStatisticForTesting(PAL::SessionID, const RegistrableDomain&, const TopFrameDomain&, Seconds, bool, Seconds, bool isGrandfathered, bool, bool, unsigned, CompletionHandler<void()>&&);

Three of the bool parameters don't have names. I think they should since the header declares the interface. Same thing for the two Seconds and the unsigned.

Or could we use a named struct here?

> Source/WebKit/UIProcess/Network/NetworkProcessProxy.h:116
> +    void mergeStatisticForTesting(PAL::SessionID, const RegistrableDomain&, const TopFrameDomain&, Seconds, bool, Seconds, bool, bool, bool, unsigned, CompletionHandler<void()>&&);

The bools and Seconds parameters, and the unsigned parameter should be named.

Or could we use a named struct here?

> Source/WebKit/UIProcess/WebsiteData/WebsiteDataStore.h:165
> +    void mergeStatisticForTesting(const URL&, const URL& topFrameUrl, Seconds, bool, Seconds, bool, bool, bool, unsigned, CompletionHandler<void()>&&);

The bools and Seconds parameters, and the unsigned parameter should be named.

Or could we use a named struct here?

> LayoutTests/http/tests/resourceLoadStatistics/merge-statistic-does-not-overwrite-database-expected.txt:1
> +PASS successfullyParsed is true

Could we get one or more non-default PASS statements in here to explain what passed?

> LayoutTests/http/tests/resourceLoadStatistics/merge-statistic-does-overwrite-database-expected.txt:1
> +PASS successfullyParsed is true

Ditto.
Comment 4 Chris Dumez 2019-10-01 08:39:14 PDT
Comment on attachment 379866 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=379866&action=review

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:432
>  bool ResourceLoadStatisticsDatabaseStore::confirmDomainDoesNotExist(const RegistrableDomain& domain) const

This method is badly named. Since it returns a boolean. I'd suggest domainExists() (reversing the meaning for clarity since a method name)

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:540
> +static bool wasAccessedAsFirstPartyDueToUserInteraction(bool currentHadUserInteraction, double currentMostRecentUserInteraction, ResourceLoadStatistics& other)

other should be const.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:550
> +void ResourceLoadStatisticsDatabaseStore::merge(WebCore::SQLiteStatement& current, ResourceLoadStatistics& other)

other should be const. current should be too if possible.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h:57
> +    void mergeStatistics(Vector<ResourceLoadStatistics>&&) override;

Could this be final?

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.h:64
> +    void mergeStatistics(Vector<ResourceLoadStatistics>&&) override;

Could this be final?

>> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.cpp:267
>> +        if (!m_statisticsStore)
> 
> Why is the !is<ResourceLoadStatisticsMemoryStore>(*m_statisticsStore) check no longer needed?

This is correct, mergeStatistics() is now a pure virtual function on the interface so we don't care which subclass it is.
Looks good to me.

> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.cpp:585
> +void WebResourceLoadStatisticsStore::mergeStatisticForTesting(const RegistrableDomain& domain, const RegistrableDomain& topFrameDomain, Seconds lastSeen, bool hadUserInteraction, Seconds mostRecentUserInteraction, bool isGrandfathered, bool isPrevalent, bool isVeryPrevalent, unsigned dataRecordsRemoved, CompletionHandler<void()>&& completionHandler)

Using Seconds for lastSeen / mostRecentUserInteraction is confusing. More likely they should be WallTime.

> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.cpp:591
> +            auto statistic = ResourceLoadStatistics(domain);

I think this would read better:
ResourceLoadStatistics statistic(domain);

> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.cpp:606
> +            Vector<ResourceLoadStatistics> statistics;

Would this build?
Vector<ResourceLoadStatistics> statistics({ WTFMove(statistic) });

Leveraging this constructor:
Vector(std::initializer_list<T> initializerList);

> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.cpp:608
> +            m_statisticsStore->mergeStatistics(WTFMove(statistics));

Or maybe even:
m_statisticsStore->mergeStatistics(Vector<ResourceLoadStatistics>({ WTFMove(statistic) }));

> Source/WebKit/UIProcess/Network/NetworkProcessProxy.cpp:540
> +    if (!canSendMessage()) {

Not needed, sendWithAsyncReply() will do the right thing for you.

> LayoutTests/http/tests/resourceLoadStatistics/merge-statistic-does-not-overwrite-database.html:11
> +    if (testRunner)

Please add a call to description("Tests that merged statistic does not overwrite old statistic"); since you're using js-test. The output will look much better.

> LayoutTests/http/tests/resourceLoadStatistics/merge-statistic-does-not-overwrite-database.html:12
> +        testRunner.waitUntilDone();

Since you're using js-test, you can simply use jsTestIsAsync = true;

> LayoutTests/http/tests/resourceLoadStatistics/merge-statistic-does-not-overwrite-database.html:33
> +          testRunner.notifyDone();

You're supposed to call finishJSTest() when using js-test.

> LayoutTests/http/tests/resourceLoadStatistics/merge-statistic-does-not-overwrite-database.html:63
> +            testRunner.notifyDone();

finishJSTest();

> LayoutTests/http/tests/resourceLoadStatistics/merge-statistic-does-overwrite-database.html:11
> +    if (testRunner)

Same comments as for the other test.
Comment 5 Brent Fulgham 2019-10-01 09:12:41 PDT
Comment on attachment 379866 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=379866&action=review

This is really close, but needs a bit of work before it's ready to land. I'm glad to see all the tests pass!

Please change the %{public} to %{private} in the log statements

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:487
> +    for (auto& topFrameDomain : loadStatistics.storageAccessUnderTopFrameDomains) {

We should probably get rid of these loops of SQLite queries and replace with a single query that does all of the updates at once. To do so, you'll need to revise the insert statements to ignore instances where the rows in the table already exist.

Note: Let's do this improvement under a separate bug!

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:489
> +            insertDomainRelationship(m_storageAccessUnderTopFrameDomainsStatement, registrableDomainID.value(), topFrameDomain);

Does attempting to insert an already-existing relationship cause an error? On other SQL systems it used to be possible to indicate that you don't care if something fails (because it already exists). That would remove the need to pay the cost of this check when inserting.

This would involve changing the existing INSERT statements into "INSERT OR IGNORE", which should be less closely than running a Query to see if a relationship exists, then a second query to insert (if it doesn't exist).

>> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:547
>> +    return other.lastSeen <= mostRecentUserInteractionTime + 24_h;
> 
> Where do these 24 hours come from? I don't understand their meaning here. Is something missing in the function name to explain the 24 hours?

This looks like some hold-over from the 24-hour window for user interaction (which is no longer a thing). Kate: Check what this code does in the in-memory store. We should match that behavior.

>> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:562
>> +    unsigned currentTimesAccessedDueToFirstPartyInteraction = current.getColumnInt(9);
> 
> I'm wondering if these integers could be constants so that we don't mess it up down the road, changing them in one place but forgetting about another.

I'm not sure how easy this would be, since it's controlled by the form of the SQL statement and the columns in the database. Maybe you could define them at the same point as the 'constexpr auto createObservedDomain = "CREATE TABLE...' declarations, which would at least help keep them in sync. Ditto for the parameters to the queries.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:571
> +        // Else, do nothing.

I don't think the Else comment is useful here.

>> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:597
>> +        RELEASE_LOG_ERROR_IF_ALLOWED(m_sessionID, "%p - ResourceLoadStatisticsDatabaseStore::mergeStatistic. Statement failed to bind or domain was not found, error message: %{public}s", this, m_database.lastErrorMsg());
> 
> What will go into this error message? Since we're making it public, we need to make sure nothing sensitive can end up there.

We should switch all of these "%{public}" log entries that relate to registrable domain strings to "%{private}" so they don't go into the permanent log. The ID's are fine, since they are machine-specific and don't reveal anything useful (unless you obtain the ITP database from the machine generating the log).

>> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:1303
>> +        || m_updateGrandfatheredStatement.bindText(2, domain.string()) != SQLITE_OK
> 
> Could this 2 be made a constant?

Whoops! I'm surprised we didn't see errors from binding text to the integer valued parameter. I guess we don't hit this code often.

John: I don't think creating labels for these is super-useful. These are the arguments in the Query (parameter 1, 2, etc.) Our general practice hasn't been to create labels for these in our other database code, since each SQL statement would then expand into a declaration for the statement, plus a declaration for each parameter.

Kate: Maybe check with Brady to see if we have a SQL statement class that might wrap all of this stuff up in some package that would help us maintain integrity between statements and arguments. Might be a good idea for a future refinement.

>> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:1792
>> +        || m_updateDataRecordsRemovedStatement.bindText(2, domain.string()) != SQLITE_OK
> 
> Same 2 again?

John: It's the second parameter of the SQL statement. Which is likely not the same for 'm_updateDataRecordsRemovedStatement' as it was for 'm_updateTimesAccessedDueToFirstPartyInteraction'

>> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.cpp:267
>> +        if (!m_statisticsStore)
> 
> Why is the !is<ResourceLoadStatisticsMemoryStore>(*m_statisticsStore) check no longer needed?

That is the point of this entire patch -- to get the updates into the database! :-)

>> Source/WebKit/NetworkProcess/NetworkProcess.h:245
>> +    void mergeStatisticForTesting(PAL::SessionID, const RegistrableDomain&, const TopFrameDomain&, Seconds, bool, Seconds, bool isGrandfathered, bool, bool, unsigned, CompletionHandler<void()>&&);
> 
> Three of the bool parameters don't have names. I think they should since the header declares the interface. Same thing for the two Seconds and the unsigned.
> 
> Or could we use a named struct here?

Agree with both of these points.

>> Source/WebKit/UIProcess/Network/NetworkProcessProxy.h:116
>> +    void mergeStatisticForTesting(PAL::SessionID, const RegistrableDomain&, const TopFrameDomain&, Seconds, bool, Seconds, bool, bool, bool, unsigned, CompletionHandler<void()>&&);
> 
> The bools and Seconds parameters, and the unsigned parameter should be named.
> 
> Or could we use a named struct here?

+1
Comment 6 Brent Fulgham 2019-10-01 09:16:30 PDT
Comment on attachment 379866 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=379866&action=review

>>> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:1303
>>> +        || m_updateGrandfatheredStatement.bindText(2, domain.string()) != SQLITE_OK
>> 
>> Could this 2 be made a constant?
> 
> Whoops! I'm surprised we didn't see errors from binding text to the integer valued parameter. I guess we don't hit this code often.
> 
> John: I don't think creating labels for these is super-useful. These are the arguments in the Query (parameter 1, 2, etc.) Our general practice hasn't been to create labels for these in our other database code, since each SQL statement would then expand into a declaration for the statement, plus a declaration for each parameter.
> 
> Kate: Maybe check with Brady to see if we have a SQL statement class that might wrap all of this stuff up in some package that would help us maintain integrity between statements and arguments. Might be a good idea for a future refinement.

We do have 'WebCore/platform/sql/SQLiteStatement.h", but it doesn't seem like it offers much in the way of type checking or parameter/query matching.

It's probably a good idea (for future) to create a SQLiteStatement-derived class template that lets you define the statement and parameters, and ensures you always bind the right type to the right argument when writing the code.
Comment 7 Kate Cheney 2019-10-01 09:25:37 PDT
> Comment on attachment 379866 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=379866&action=review

Thanks for all the feedback everyone!


Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:547
> >> +    return other.lastSeen <= mostRecentUserInteractionTime + 24_h;
> > 
> > Where do these 24 hours come from? I don't understand their meaning here. Is something missing in the function name to explain the 24 hours?
> 
> This looks like some hold-over from the 24-hour window for user interaction
> (which is no longer a thing). Kate: Check what this code does in the
> in-memory store. We should match that behavior.
> 

I copied this functionality from the following function in the ResourceLoadStatisticsMemoryStore: 

bool ResourceLoadStatisticsMemoryStore::wasAccessedAsFirstPartyDueToUserInteraction(const ResourceLoadStatistics& current, const ResourceLoadStatistics& updated) const
{
    if (!current.hadUserInteraction && !updated.hadUserInteraction)
        return false;

    auto mostRecentUserInteractionTime = std::max(current.mostRecentUserInteractionTime, updated.mostRecentUserInteractionTime);

    return updated.lastSeen <= mostRecentUserInteractionTime + 24_h;
}


From my understanding, this was checking if the "last seen" time happened within the last 24hrs since the user interaction. If so, the resource was accessed as a first party. John, let me know if this seems incorrect.
Comment 8 Brent Fulgham 2019-10-01 09:30:10 PDT
(In reply to Katherine_cheney from comment #7)
> I copied this functionality from the following function in the
> ResourceLoadStatisticsMemoryStore: 
... 
>     return updated.lastSeen <= mostRecentUserInteractionTime + 24_h;
> }
> 
> 
> From my understanding, this was checking if the "last seen" time happened
> within the last 24hrs since the user interaction. If so, the resource was
> accessed as a first party. John, let me know if this seems incorrect.

I think this is old stuff related to the original 24-hour rule. We can probably get rid of it (in both places) now that we block immediately, rather than granting a 24-hour exemption for first parties.

This was part of some telemetry to see how frequently this was being encountered, but I don't think we need that telemetry anymore.
Comment 9 Kate Cheney 2019-10-01 09:39:27 PDT
(In reply to Brent Fulgham from comment #8)
> (In reply to Katherine_cheney from comment #7)
> > I copied this functionality from the following function in the
> > ResourceLoadStatisticsMemoryStore: 
> ... 
> >     return updated.lastSeen <= mostRecentUserInteractionTime + 24_h;
> > }
> > 
> > 
> > From my understanding, this was checking if the "last seen" time happened
> > within the last 24hrs since the user interaction. If so, the resource was
> > accessed as a first party. John, let me know if this seems incorrect.
> 
> I think this is old stuff related to the original 24-hour rule. We can
> probably get rid of it (in both places) now that we block immediately,
> rather than granting a 24-hour exemption for first parties.
> 
> This was part of some telemetry to see how frequently this was being
> encountered, but I don't think we need that telemetry anymore.

Okay, so to clarify, the timesAccessedAsFirstPartyDueToUserInteraction data is no longer relevant, and does not need to be updated in mergeStatistic?
Comment 10 John Wilander 2019-10-01 11:03:57 PDT
(In reply to Katherine_cheney from comment #7)
> > Comment on attachment 379866 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=379866&action=review
> 
> Thanks for all the feedback everyone!
> 
> 
> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.
> cpp:547
> > >> +    return other.lastSeen <= mostRecentUserInteractionTime + 24_h;
> > > 
> > > Where do these 24 hours come from? I don't understand their meaning here. Is something missing in the function name to explain the 24 hours?
> > 
> > This looks like some hold-over from the 24-hour window for user interaction
> > (which is no longer a thing). Kate: Check what this code does in the
> > in-memory store. We should match that behavior.
> > 
> 
> I copied this functionality from the following function in the
> ResourceLoadStatisticsMemoryStore: 
> 
> bool
> ResourceLoadStatisticsMemoryStore::
> wasAccessedAsFirstPartyDueToUserInteraction(const ResourceLoadStatistics&
> current, const ResourceLoadStatistics& updated) const
> {
>     if (!current.hadUserInteraction && !updated.hadUserInteraction)
>         return false;
> 
>     auto mostRecentUserInteractionTime =
> std::max(current.mostRecentUserInteractionTime,
> updated.mostRecentUserInteractionTime);
> 
>     return updated.lastSeen <= mostRecentUserInteractionTime + 24_h;
> }
> 
> 
> From my understanding, this was checking if the "last seen" time happened
> within the last 24hrs since the user interaction. If so, the resource was
> accessed as a first party. John, let me know if this seems incorrect.

I believe Brent implemented wasAccessedAsFirstPartyDueToUserInteraction() which is why I don't recognize this logic. I think it's fine as-is if it matches what we have in the memory store.
Comment 11 Kate Cheney 2019-10-01 12:48:52 PDT
Created attachment 379932 [details]
Patch
Comment 12 Kate Cheney 2019-10-01 12:52:54 PDT
(In reply to John Wilander from comment #10)
> (In reply to Katherine_cheney from comment #7)
> > > Comment on attachment 379866 [details]
> > > Patch
> > > 
> > > View in context:
> > > https://bugs.webkit.org/attachment.cgi?id=379866&action=review
> > 
> > Thanks for all the feedback everyone!
> > 
> > 
> > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.
> > cpp:547
> > > >> +    return other.lastSeen <= mostRecentUserInteractionTime + 24_h;
> > > > 
> > > > Where do these 24 hours come from? I don't understand their meaning here. Is something missing in the function name to explain the 24 hours?
> > > 
> > > This looks like some hold-over from the 24-hour window for user interaction
> > > (which is no longer a thing). Kate: Check what this code does in the
> > > in-memory store. We should match that behavior.
> > > 
> > 
> > I copied this functionality from the following function in the
> > ResourceLoadStatisticsMemoryStore: 
> > 
> > bool
> > ResourceLoadStatisticsMemoryStore::
> > wasAccessedAsFirstPartyDueToUserInteraction(const ResourceLoadStatistics&
> > current, const ResourceLoadStatistics& updated) const
> > {
> >     if (!current.hadUserInteraction && !updated.hadUserInteraction)
> >         return false;
> > 
> >     auto mostRecentUserInteractionTime =
> > std::max(current.mostRecentUserInteractionTime,
> > updated.mostRecentUserInteractionTime);
> > 
> >     return updated.lastSeen <= mostRecentUserInteractionTime + 24_h;
> > }
> > 
> > 
> > From my understanding, this was checking if the "last seen" time happened
> > within the last 24hrs since the user interaction. If so, the resource was
> > accessed as a first party. John, let me know if this seems incorrect.
> 
> I believe Brent implemented wasAccessedAsFirstPartyDueToUserInteraction()
> which is why I don't recognize this logic. I think it's fine as-is if it
> matches what we have in the memory store.

I interpreted Brent's last comment to mean that I should remove the updating of wasAccessedAsFirstPartyDueToUserInteraction in both the memory and database store, so I took them out in this patch. But I can add the checks back in if they are still applicable to ITP.
Comment 13 Brent Fulgham 2019-10-01 13:25:28 PDT
(In reply to Katherine_cheney from comment #12)

> I interpreted Brent's last comment to mean that I should remove the updating
> of wasAccessedAsFirstPartyDueToUserInteraction in both the memory and
> database store, so I took them out in this patch. But I can add the checks
> back in if they are still applicable to ITP.

Correct. This is old junk we should get rid of (in both places).
Comment 14 Kate Cheney 2019-10-01 17:46:59 PDT
Created attachment 379975 [details]
Patch
Comment 15 Kate Cheney 2019-10-01 17:53:05 PDT
found a bug in WKWebsiteDataStoreRef, and wrote another test to be safe.
Comment 16 Brent Fulgham 2019-10-02 10:16:58 PDT
Comment on attachment 379975 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=379975&action=review

Looks good. Please flip the public/private on that handful of statements and I think we're good here.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:220
> +        RELEASE_LOG_ERROR(Network, "%p - ResourceLoadStatisticsDatabaseStore::open failed, error message: %{private}s, database path: %{private}s", this, m_database.lastErrorMsg(), m_storageDirectoryPath.utf8().data());

This can be %public. The location of the database file is not private information.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:230
> +            RELEASE_LOG_ERROR(Network, "%p - ResourceLoadStatisticsDatabaseStore::createSchema failed, error message: %{private}s, database path: %{private}s", this, m_database.lastErrorMsg(), m_storageDirectoryPath.utf8().data());

This can be %public. This just creates empty tables, and error messages about that is not private information.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:240
> +        RELEASE_LOG_ERROR(Network, "%p - ResourceLoadStatisticsDatabaseStore::prepareStatements failed, error message: %{private}s, database path: %{private}s", this, m_database.lastErrorMsg(), m_storageDirectoryPath.utf8().data());

This can be %public. This just prepares parameterized SQL statements, which does not contain private information.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:358
> +        RELEASE_LOG_ERROR(Network, "%p - ResourceLoadStatisticsDatabaseStore::prepareStatements failed to prepare, error message: %{private}s", this, m_database.lastErrorMsg());

The prepare-statement error messages can go in %public. There is not private information in this query (just creating empty tables and preparing SQL calls).
Comment 17 Kate Cheney 2019-10-02 10:21:22 PDT
Thanks Brent! I am mildly concerned about the failing mac-debug-wk1 tests because of what happened on the testing patch, I'm looking into it now to make sure they are unrelated.
Comment 18 Brent Fulgham 2019-10-02 10:22:57 PDT
(In reply to Katherine_cheney from comment #17)
> Thanks Brent! I am mildly concerned about the failing mac-debug-wk1 tests
> because of what happened on the testing patch, I'm looking into it now to
> make sure they are unrelated.

I looked at them, and don't think they are related:

1. Other patches I have reviewed today show that Mac-debug-wk1 failure, too. Not just your patch.
2. The crash signature is in JSC::JSLock::DropAllLocks::~DropAllLock, and seems to be related to some JIT change.
Comment 19 Kate Cheney 2019-10-02 10:46:36 PDT
Created attachment 380034 [details]
Patch
Comment 20 Brent Fulgham 2019-10-02 11:23:03 PDT
Comment on attachment 380034 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=380034&action=review

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:237
> +        RELEASE_LOG_ERROR(Network, "%p - ResourceLoadStatisticsDatabaseStore::turnOnIncrementalAutoVacuum failed, error message: %{private}s", this, m_database.lastErrorMsg());

Is this the right patch? It looks like this is still private (I suggested public in my review).
Comment 21 Kate Cheney 2019-10-02 11:27:13 PDT
Created attachment 380041 [details]
Patch
Comment 22 Kate Cheney 2019-10-02 11:27:36 PDT
(In reply to Brent Fulgham from comment #20)
> Comment on attachment 380034 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=380034&action=review
> 
> > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:237
> > +        RELEASE_LOG_ERROR(Network, "%p - ResourceLoadStatisticsDatabaseStore::turnOnIncrementalAutoVacuum failed, error message: %{private}s", this, m_database.lastErrorMsg());
> 
> Is this the right patch? It looks like this is still private (I suggested
> public in my review).

Sorry, missed that one. Should be ok now.
Comment 23 WebKit Commit Bot 2019-10-02 12:22:02 PDT
Comment on attachment 380041 [details]
Patch

Clearing flags on attachment: 380041

Committed r250621: <https://trac.webkit.org/changeset/250621>
Comment 24 WebKit Commit Bot 2019-10-02 12:22:04 PDT
All reviewed patches have been landed.  Closing bug.