Bug 195088 - Implement Telemetry and Dumping Routines for SQLite backend
Summary: Implement Telemetry and Dumping Routines for SQLite backend
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit Misc. (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Katherine_cheney
URL:
Keywords: InRadar
Depends on: 194867
Blocks: 195419
  Show dependency treegraph
 
Reported: 2019-02-26 19:19 PST by Brent Fulgham
Modified: 2019-10-08 15:13 PDT (History)
9 users (show)

See Also:


Attachments
Patch (41.46 KB, patch)
2019-08-29 11:03 PDT, Katherine_cheney
no flags Details | Formatted Diff | Diff
Patch (39.27 KB, patch)
2019-08-29 13:49 PDT, Katherine_cheney
no flags Details | Formatted Diff | Diff
sketch1.cpp (7.03 KB, text/plain)
2019-08-30 08:30 PDT, Sam Weinig
no flags Details
sketch2.cpp (5.84 KB, text/plain)
2019-08-30 08:31 PDT, Sam Weinig
no flags Details
sketch3.cpp (5.37 KB, text/plain)
2019-08-30 08:32 PDT, Sam Weinig
no flags Details
sketch2.cpp (5.60 KB, text/plain)
2019-08-30 08:47 PDT, Sam Weinig
no flags Details
sketch3.cpp (5.16 KB, text/plain)
2019-08-30 08:47 PDT, Sam Weinig
no flags Details
Patch (28.91 KB, patch)
2019-08-30 18:32 PDT, Katherine_cheney
no flags Details | Formatted Diff | Diff
Patch (28.91 KB, patch)
2019-08-30 20:08 PDT, Katherine_cheney
no flags Details | Formatted Diff | Diff
Patch (30.41 KB, patch)
2019-09-03 15:14 PDT, Katherine_cheney
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews212 for win-future (13.58 MB, application/zip)
2019-09-03 16:41 PDT, Build Bot
no flags Details
Patch (86.46 KB, patch)
2019-10-03 10:53 PDT, Katherine_cheney
no flags Details | Formatted Diff | Diff
Patch (87.28 KB, patch)
2019-10-08 14:47 PDT, Katherine_cheney
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Brent Fulgham 2019-02-26 19:19:42 PST
Implement the telemetry and record dumping routines for the SQLite version of the ITP infrastructure once the core features are in place.
Comment 1 Radar WebKit Bug Importer 2019-08-12 09:58:25 PDT
<rdar://problem/54213407>
Comment 2 Katherine_cheney 2019-08-29 11:03:43 PDT
Created attachment 377597 [details]
Patch
Comment 3 Brent Fulgham 2019-08-29 11:36:07 PDT
Comment on attachment 377597 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=377597&action=review

I think this looks good overall, but could be made better by factoring out some of the common SQL logic into 'make' functions that would allow for easier maintenance in the future. r- to fix that.

> Source/WebKit/ChangeLog:4
> +        https://bugs.webkit.org/show_bug.cgi?id=195088

Please always include the radar number here underneath the Bugzilla line:
<rdar://problem/54213407>

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:518
> +    if ((prevalentResourcesWithUserInteractionCount & 1)) {

What is the significance of the bitwise & here? Are you checking for an odd number?

If this is just a check for non-zero we would typically write this as "if (prevalentResourcesWIthUserInteractionCount)"

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:531
> +        SQLiteStatement upperMedianDaysSinceUIStatement(m_database, makeString("SELECT mostRecentUserInteractionTime FROM ObservedDomains INNER JOIN (SELECT ", pre, " AND hadUserInteraction = 1 ", post, ") as q ON ObservedDomains.domainID = q.domainID LIMIT 1 OFFSET ", String::number((prevalentResourcesWithUserInteractionCount+1) / 2)));

The three SQL statements in this section are almost identical. I suggest you write a generator function that returns the appropriate SQLLiteStatement based on inputs:

static SQLiteStatement makeMedianQuery(SQLDatabase& database, int count, const String& pre, const String& post) {
    return SQLiteStatement(database, makeString("SELECT mostRecentUserInteractionTime FROM ObservedDomains INNER JOIN (SELECT ", pre, " AND hadUserInteraction = 1 ", post, ") as q ON ObservedDomains.domainID = q.domainID LIMIT 1 OFFSET ", String::number(count)));
}

Then you can just say:

SQLiteStatement medianDaysSinceUIStatement = makeMedianQuery(prevalentResourcesWithUserInteractionCount / 2, pre, post);

and

SQLiteStatement lowerMedianDaysSinceUIStatement = makeMedianQuery((prevalentResourcesWithUserInteractionCount - 1) / 2 pre, post);

and

SQLiteStatement upperMedianDaysSinceUIStatement = makeMedianQuery((prevalentResourcesWithUserInteractionCount + 1) / 2 pre, post);

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:538
> +                double rawSecondsUpper = upperMedianDaysSinceUIStatement.getColumnDouble(0);

Very nice!

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:624
> +        SQLiteStatement upperMedianStatistic(m_database, makeString(preamble, pre, " AND hadUserInteraction = 0 ", post, end, String::number((range+1) / 2)));

Again, I think these three median statements could be converted into. a 'make' function.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:680
> +    data.medianDaysSinceUserInteractionPrevalentResourceWithUserInteraction = getMedianOfPrevalentResourcesWithUserInteraction(data.numberOfPrevalentResourcesWithUserInteraction, joinPrevalentResourcesQueryPre, joinPrevalentResourcesQueryPost);

Someday we need to rename these to shorter labels.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:-508
> -    // FIXME(195088): Implement for Database version.

Great!
Comment 4 Katherine_cheney 2019-08-29 11:51:39 PDT
(In reply to Brent Fulgham from comment #3)
> Comment on attachment 377597 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=377597&action=review
> 
> I think this looks good overall, but could be made better by factoring out
> some of the common SQL logic into 'make' functions that would allow for
> easier maintenance in the future. r- to fix that.
> 

On it! Thanks for all the feedback.

> > Source/WebKit/ChangeLog:4
> > +        https://bugs.webkit.org/show_bug.cgi?id=195088
> 
> Please always include the radar number here underneath the Bugzilla line:
> <rdar://problem/54213407>
> 
> > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:518
> > +    if ((prevalentResourcesWithUserInteractionCount & 1)) {
> 
> What is the significance of the bitwise & here? Are you checking for an odd
> number?
> 
> If this is just a check for non-zero we would typically write this as "if
> (prevalentResourcesWIthUserInteractionCount)"
> 

It checks for an odd number to calculate the median. If the total number is odd, it can just take the middle entry. Otherwise it averages the "lower" and "upper" middle entry. I copied the logic from the Memory Store median function: 

"
 if (size % 2)
     return v[middle];
 return (v[middle - 1] + v[middle]) / 2;
"

But I actually found a mistake on my part -- I add [middle + 1] instead of [middle] as the second argument, so I will update that.

> > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:531
> > +        SQLiteStatement upperMedianDaysSinceUIStatement(m_database, makeString("SELECT mostRecentUserInteractionTime FROM ObservedDomains INNER JOIN (SELECT ", pre, " AND hadUserInteraction = 1 ", post, ") as q ON ObservedDomains.domainID = q.domainID LIMIT 1 OFFSET ", String::number((prevalentResourcesWithUserInteractionCount+1) / 2)));
> 
> The three SQL statements in this section are almost identical. I suggest you
> write a generator function that returns the appropriate SQLLiteStatement
> based on inputs:
> 
> static SQLiteStatement makeMedianQuery(SQLDatabase& database, int count,
> const String& pre, const String& post) {
>     return SQLiteStatement(database, makeString("SELECT
> mostRecentUserInteractionTime FROM ObservedDomains INNER JOIN (SELECT ",
> pre, " AND hadUserInteraction = 1 ", post, ") as q ON
> ObservedDomains.domainID = q.domainID LIMIT 1 OFFSET ",
> String::number(count)));
> }
> 
> Then you can just say:
> 
> SQLiteStatement medianDaysSinceUIStatement =
> makeMedianQuery(prevalentResourcesWithUserInteractionCount / 2, pre, post);
> 
> and
> 
> SQLiteStatement lowerMedianDaysSinceUIStatement =
> makeMedianQuery((prevalentResourcesWithUserInteractionCount - 1) / 2 pre,
> post);
> 
> and
> 
> SQLiteStatement upperMedianDaysSinceUIStatement =
> makeMedianQuery((prevalentResourcesWithUserInteractionCount + 1) / 2 pre,
> post);
> 
> > 

Good idea! I'll implement it in the next upload.

Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:680
> > +    data.medianDaysSinceUserInteractionPrevalentResourceWithUserInteraction = getMedianOfPrevalentResourcesWithUserInteraction(data.numberOfPrevalentResourcesWithUserInteraction, joinPrevalentResourcesQueryPre, joinPrevalentResourcesQueryPost);
> 
> Someday we need to rename these to shorter labels.
> 

I will also change this.

Thanks again!
Comment 5 Katherine_cheney 2019-08-29 13:49:56 PDT
Created attachment 377623 [details]
Patch
Comment 6 Sam Weinig 2019-08-29 14:42:00 PDT
Comment on attachment 377623 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=377623&action=review

Thanks for working on this. 

I'm a little concerned with the amount of string concatenation going on around the preparation of the SQLiteStatements. This is traditionally something to be avoided in order to reduce the chance for sql injection attacks.

> Source/WebKit/ChangeLog:10
> +        Implemented database telemetry calculating for ITP. Mimicked ResourceLoadStatisticsMemoryStore
> +        telemetry logging behavior using SQLite Queries as opposed to vector sorting/manipulation.

It would be good to explain the motivation for this.

Also, it looks like this doesn't have any new tests associated with it. Is this code path already tested?

> Source/WebKit/ChangeLog:27
> +        * NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:
> +        (WebKit::ResourceLoadStatisticsDatabaseStore::ResourceLoadStatisticsDatabaseStore):
> +        (WebKit::ResourceLoadStatisticsDatabaseStore::prepareStatements):
> +        (WebKit::ResourceLoadStatisticsDatabaseStore::getMedianOfPrevalentResourcesWithUserInteraction const):
> +        (WebKit::ResourceLoadStatisticsDatabaseStore::getNumberOfPrevalentResources const):
> +        (WebKit::ResourceLoadStatisticsDatabaseStore::getNumberOfPrevalentResourcesWithUI const):
> +        (WebKit::ResourceLoadStatisticsDatabaseStore::getTopPrevelentResourceDaysSinceUI const):
> +        (WebKit::ResourceLoadStatisticsDatabaseStore::getMedianStatisticOfPrevalentResourceWithoutUserInteraction const):
> +        (WebKit::ResourceLoadStatisticsDatabaseStore::getNumberOfPrevalentResourcesInTopResources const):
> +        (WebKit::ResourceLoadStatisticsDatabaseStore::calculateTelemetryData const):
> +        (WebKit::ResourceLoadStatisticsDatabaseStore::calculateAndSubmitTelemetry const):
> +        * NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h:
> +        * NetworkProcess/Classifier/WebResourceLoadStatisticsTelemetry.cpp:
> +        (WebKit::databaseSubmitTopLists):
> +        (WebKit::WebResourceLoadStatisticsTelemetry::submitTelemetry):
> +        * NetworkProcess/Classifier/WebResourceLoadStatisticsTelemetry.h:

Please fill in the per-function/per-file parts of the ChangeLog as well. Please include both what is changing and why the change is being made.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:518
> +    return SQLiteStatement(database, makeString("SELECT mostRecentUserInteractionTime FROM ObservedDomains INNER JOIN (SELECT ", pre, " AND hadUserInteraction = 1 ", post, ") as q ON ObservedDomains.domainID = q.domainID LIMIT 1 OFFSET ", String::number(count)));

The String::number() shouldn't be needed. You can pass count directly to makeString().

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:527
> +// 1: subframe
> +// 2: subresource
> +// 3: unique redirects
> +// 4: number of times data records were removed
> +// 5: number of times accessed as first party due to user interaction
> +// 6: number of times accessed as first party due to storage access api
> +static SQLiteStatement makeMedianWithoutUIQuery(SQLiteDatabase& database, int offset, int statistic, const String& pre, const String& post)

int statistic should probably be an enum. Then, this comment would not be necessary.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:530
> +    String preamble;
> +    String end;

These can be StringViews or const char*.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:534
> +    case 1: preamble = "SELECT cnt1 FROM ObservedDomains o INNER JOIN(SELECT cnt1, ";
> +        end = ") as q ON o.domainID = q.domainID LIMIT 1 OFFSET ";

preamble = ... should go on the line after the case statement.

I think you should also consider pull out this switch statement into it's own static function that returns a std::pair<StringView, StringView> or std::pair<const char*, const char*>. Then you can call that function and use structured binding to do something like:

auto& [preamble, end] = statementPartsForStatistic(statistic);

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:552
> +    default:
> +        LOG_ERROR("ResourceLoadStatisticsDatabaseStore::getMedianStatisticOfPrevalentResourceWithoutUserInteraction only supports statistics 1-6. Undetermined Query behavior will result.");

This should be a RELEASE_ASSERT_NOT_REACHED(); Or, if you turn statistic into a enum, you can remove this entirely.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:555
> +    return SQLiteStatement(database, makeString(preamble, pre, " AND hadUserInteraction = 0 ", post, end, String::number(offset)));

String::number() should not be needed.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:588
> +    return (m_countPrevalentResourcesStatement.step() == SQLITE_ROW) ? m_countPrevalentResourcesStatement.getColumnInt(0) : -1;

Returning -1 in a function that returns an unsigned seems wrong. I am guessing getColumnInt also returns an int, not an unsigned. This should probably return an int. You should also consider using Optional<> or WTF::notFound (which is just a named constant for -1) instead of -1.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:593
> +    return (m_countPrevalentResourcesWithUserInteractionStatement.step() == SQLITE_ROW) ? m_countPrevalentResourcesWithUserInteractionStatement.getColumnInt(0) : -1;

Same comment as above.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:596
> +unsigned ResourceLoadStatisticsDatabaseStore::getTopPrevelentResourceDaysSinceUI(const String& pre, const String& post) const

pre and post are always the same here since there is only one caller. Do you imagine other callers in the future with different pre and post? If not, probably best to remove the parameters and either inline pre and post, or if they need to be in a shared location, move them into their own static functions. It would also be nice to give them slightly clearer names than pre and post, since while that does tell me where in the statement they go, it doesn't tell me what they do.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:611
> +unsigned ResourceLoadStatisticsDatabaseStore::getMedianStatisticOfPrevalentResourceWithoutUserInteraction(int range, int statistic, int numberOfPrevalentResourcesWithoutUI, const String& pre, const String& post) const

Same points about pre/post as above.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:639
> +    SQLiteStatement prevalentResourceCountInTop(m_database, makeString("SELECT COUNT(*) FROM (SELECT * FROM ObservedDomains o INNER JOIN(SELECT ", pre, " ", post, ") as q on q.domainID = o.domainID LIMIT ", String::number(range), ") as p WHERE p.hadUserInteraction = 1;"));

I think the idiomatic way of doing something like this to create one cached SQLiteStatement and then binding the changing variable using a bind function (e.g. SQLiteStatement::bindInt()). In this case, that would be range. It looks like prevalentResourceCount is unused.

Same points about pre/post as above.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h:59
> +    unsigned numberOfPrevalentResourcesWithUITop1;
> +    unsigned numberOfPrevalentResourcesWithUITop3;
> +    unsigned numberOfPrevalentResourcesWithUITop10;
> +    unsigned numberOfPrevalentResourcesWithUITop50;
> +    unsigned numberOfPrevalentResourcesWithUITop100;

If these were std::tuples, or a new struct (e.g. struct Foo { unsigned UITop1; ...unsigned UITop100; };) I think this would be easier to work with. A bunch of the duplicated code in ResourceLoadStatisticsDatabaseStore::calculateTelemetryData and databaseSubmitTopLists could be turned into loops.

> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsTelemetry.h:30
> +#include "ResourceLoadStatisticsDatabaseStore.h"

This shouldn't be needed if you forward declare PrevalentResourceDatabaseTelemetry.
Comment 7 Sam Weinig 2019-08-29 14:42:42 PDT
Brady, can you take a look over the SQLLite parts and provide feedback on best practices.
Comment 8 Katherine_cheney 2019-08-29 17:18:10 PDT
(In reply to Sam Weinig from comment #6)
> Comment on attachment 377623 [details]
> Patch
> 
> If these were std::tuples, or a new struct (e.g. struct Foo { unsigned
> UITop1; ...unsigned UITop100; };) I think this would be easier to work with.
> A bunch of the duplicated code in
> ResourceLoadStatisticsDatabaseStore::calculateTelemetryData and
> databaseSubmitTopLists could be turned into loops.


Thanks for all the feedback! I'm working on changes now. Quick followup: for the std::tuples within the struct, did you envision 5 tuples categorized by top1, top3, top10..., or tuples categorized by subframe, subresource, uniqueRedirects, etc? The cons I see for the topX tuples would be confusion over which index indicates what statistic. The cons for the statistic tuples would be that if there were only, say 4 resources, the top1 and top3 statistics to be logged are split up across all tuples. I wasn't sure which would work best and I wanted your thoughts.
Comment 9 Sam Weinig 2019-08-30 08:30:32 PDT
Created attachment 377709 [details]
sketch1.cpp
Comment 10 Sam Weinig 2019-08-30 08:31:21 PDT Comment hidden (obsolete)
Comment 11 Sam Weinig 2019-08-30 08:32:00 PDT Comment hidden (obsolete)
Comment 12 Sam Weinig 2019-08-30 08:47:06 PDT
Created attachment 377714 [details]
sketch2.cpp
Comment 13 Sam Weinig 2019-08-30 08:47:46 PDT
Created attachment 377715 [details]
sketch3.cpp
Comment 14 Sam Weinig 2019-08-30 08:50:18 PDT
(In reply to Katherine_cheney from comment #8)
> (In reply to Sam Weinig from comment #6)
> > Comment on attachment 377623 [details]
> > Patch
> > 
> > If these were std::tuples, or a new struct (e.g. struct Foo { unsigned
> > UITop1; ...unsigned UITop100; };) I think this would be easier to work with.
> > A bunch of the duplicated code in
> > ResourceLoadStatisticsDatabaseStore::calculateTelemetryData and
> > databaseSubmitTopLists could be turned into loops.
> 
> 
> Thanks for all the feedback! I'm working on changes now. Quick followup: for
> the std::tuples within the struct, did you envision 5 tuples categorized by
> top1, top3, top10..., or tuples categorized by subframe, subresource,
> uniqueRedirects, etc? The cons I see for the topX tuples would be confusion
> over which index indicates what statistic. The cons for the statistic tuples
> would be that if there were only, say 4 resources, the top1 and top3
> statistics to be logged are split up across all tuples. I wasn't sure which
> would work best and I wanted your thoughts.

I'm not sure which is the best, but I sketched out a few options and attached them:

sketch1.cpp:

This first one just switches to using a std::array<unsigned, 5> for each statistic kind allowing us to get rid of much of the repetition, and uses an enum + lookup function bucketSize() to represent the bucketing.


sketch2.cpp

This second one expands on the first, removing the enum of bucketing sizes, and replacing it with a lookup table (which is just a static std::array) called bucketSizes. This allows us to remove the explicit if-statements and replace them with a loop. The benefit here is that if we wanted to add more bucket sizes in the future, we would only have to change numberOfBucketsPerStatistic and bucketSizes.


sketch3.cpp


This third one expands on the second, replacing the explicit listing of all the statistics in the PrevalentResourceDatabaseTelemetry, with a an enum and a std::array. So now, all the statistics are stored in an 2d-array. This requires adding lookup functions for the diagnostic description (makeDescription) and the statistic generation (calculateStatistic), but has the benefit of making it a compile error if a new type of Statistic is added to the enum, but not updated everywhere.



At this point, it's kind of a complexity vs. length question. sketch2 is the shortest, but makes adding new statistics a little tricky. I'd suggest going with something akin to sketch3, to future proof it as much as possible.
Comment 15 Katherine_cheney 2019-08-30 18:32:17 PDT
Created attachment 377777 [details]
Patch
Comment 16 Katherine_cheney 2019-08-30 18:32:57 PDT
(In reply to Sam Weinig from comment #14)
> (In reply to Katherine_cheney from comment #8)
> > (In reply to Sam Weinig from comment #6)
> > > Comment on attachment 377623 [details]
> > > Patch
> > > 
> > > If these were std::tuples, or a new struct (e.g. struct Foo { unsigned
> > > UITop1; ...unsigned UITop100; };) I think this would be easier to work with.
> > > A bunch of the duplicated code in
> > > ResourceLoadStatisticsDatabaseStore::calculateTelemetryData and
> > > databaseSubmitTopLists could be turned into loops.
> > 
> > 
> > Thanks for all the feedback! I'm working on changes now. Quick followup: for
> > the std::tuples within the struct, did you envision 5 tuples categorized by
> > top1, top3, top10..., or tuples categorized by subframe, subresource,
> > uniqueRedirects, etc? The cons I see for the topX tuples would be confusion
> > over which index indicates what statistic. The cons for the statistic tuples
> > would be that if there were only, say 4 resources, the top1 and top3
> > statistics to be logged are split up across all tuples. I wasn't sure which
> > would work best and I wanted your thoughts.
> 
> I'm not sure which is the best, but I sketched out a few options and
> attached them:
> 
> sketch1.cpp:
> 
> This first one just switches to using a std::array<unsigned, 5> for each
> statistic kind allowing us to get rid of much of the repetition, and uses an
> enum + lookup function bucketSize() to represent the bucketing.
> 
> 
> sketch2.cpp
> 
> This second one expands on the first, removing the enum of bucketing sizes,
> and replacing it with a lookup table (which is just a static std::array)
> called bucketSizes. This allows us to remove the explicit if-statements and
> replace them with a loop. The benefit here is that if we wanted to add more
> bucket sizes in the future, we would only have to change
> numberOfBucketsPerStatistic and bucketSizes.
> 
> 
> sketch3.cpp
> 
> 
> This third one expands on the second, replacing the explicit listing of all
> the statistics in the PrevalentResourceDatabaseTelemetry, with a an enum and
> a std::array. So now, all the statistics are stored in an 2d-array. This
> requires adding lookup functions for the diagnostic description
> (makeDescription) and the statistic generation (calculateStatistic), but has
> the benefit of making it a compile error if a new type of Statistic is added
> to the enum, but not updated everywhere.
> 
> 
> 
> At this point, it's kind of a complexity vs. length question. sketch2 is the
> shortest, but makes adding new statistics a little tricky. I'd suggest going
> with something akin to sketch3, to future proof it as much as possible.


I liked option #3 as well, thanks for helping out so much with this.

I converted all of the PrevelentResourceDatabaseTelemetry struct values to Optional<unsigned> so that WTF::nullopt will be returned if any SQL queries fail (instead of -1). I thought logging a crash on a Optional<unsigned>.value() call would be more useful than seeing a large unsigned integer value logged as a statistic. 

I debated using bindInt() for the queries with numbers, but since most of the queries have to be concatenated with other strings at runtime anyways, I thought it was less verbose to concatenate the numbers with the strings vs preparing the statement then binding integers. Does this have security implications?

Implementing the test infrastructure is a whole other patch I’ll be working on next. I manually enabled the ITP database and ran tests locally and they all ran as expected. I also clicked around a bunch and manually checked the database telemetry results against the memory store results. Since this only copies exactly what the memory store already does, I think its ok to integrate without full testing until the next patch.
Comment 17 Katherine_cheney 2019-08-30 20:08:18 PDT
Created attachment 377780 [details]
Patch
Comment 18 Sam Weinig 2019-08-31 15:45:33 PDT
Comment on attachment 377780 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=377780&action=review

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:567
> +    auto[queryStart, queryEnd] = buildQueryStartAndEnd(statistic);

please add a space after auto. (the style checker may yell at you, but it is wrong).

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:578
> +    if (medianDaysSinceUIStatement.prepare() == SQLITE_OK) {
> +        if (medianDaysSinceUIStatement.step() == SQLITE_ROW) {

You should be able to combine these by using SQLiteStatement::prepareAndStep().  I think it would probably also read clearer if you returned early with WTF::nullopt, if the sql statement fails.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:586
> +    if (prevalentResourcesWithUserInteractionCount & 1)
> +        return median;

It seems like this could return an uninitialized value medianDaysSinceUIStatement.prepare() or medianDaysSinceUIStatement.step() somehow fail.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:637
> +    SQLiteStatement Statistic = makeMedianWithoutUIQuery(database, (range / 2), statistic);

Statistic should have a lowercase first letter, but since you are already using the variable name statistic, perhaps statisticStatement would work.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:645
> +    if (range & 1)
> +        return median;

Same issue with a potentially uninitialized value. Also, this median is an int being returned as an unsigned.

> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsTelemetry.cpp:268
> +static String makeDescription(PrevalentResourceDatabaseTelemetry::Statistic statistic)

This should return a const char* or an ASCIILiteral.
Comment 19 Sam Weinig 2019-08-31 15:50:21 PDT
(In reply to Katherine_cheney from comment #16)
> (In reply to Sam Weinig from comment #14)
> > (In reply to Katherine_cheney from comment #8)
> > > (In reply to Sam Weinig from comment #6)
> > > > Comment on attachment 377623 [details]
> > > > Patch
> > > > 
> > > > If these were std::tuples, or a new struct (e.g. struct Foo { unsigned
> > > > UITop1; ...unsigned UITop100; };) I think this would be easier to work with.
> > > > A bunch of the duplicated code in
> > > > ResourceLoadStatisticsDatabaseStore::calculateTelemetryData and
> > > > databaseSubmitTopLists could be turned into loops.
> > > 
> > > 
> > > Thanks for all the feedback! I'm working on changes now. Quick followup: for
> > > the std::tuples within the struct, did you envision 5 tuples categorized by
> > > top1, top3, top10..., or tuples categorized by subframe, subresource,
> > > uniqueRedirects, etc? The cons I see for the topX tuples would be confusion
> > > over which index indicates what statistic. The cons for the statistic tuples
> > > would be that if there were only, say 4 resources, the top1 and top3
> > > statistics to be logged are split up across all tuples. I wasn't sure which
> > > would work best and I wanted your thoughts.
> > 
> > I'm not sure which is the best, but I sketched out a few options and
> > attached them:
> > 
> > sketch1.cpp:
> > 
> > This first one just switches to using a std::array<unsigned, 5> for each
> > statistic kind allowing us to get rid of much of the repetition, and uses an
> > enum + lookup function bucketSize() to represent the bucketing.
> > 
> > 
> > sketch2.cpp
> > 
> > This second one expands on the first, removing the enum of bucketing sizes,
> > and replacing it with a lookup table (which is just a static std::array)
> > called bucketSizes. This allows us to remove the explicit if-statements and
> > replace them with a loop. The benefit here is that if we wanted to add more
> > bucket sizes in the future, we would only have to change
> > numberOfBucketsPerStatistic and bucketSizes.
> > 
> > 
> > sketch3.cpp
> > 
> > 
> > This third one expands on the second, replacing the explicit listing of all
> > the statistics in the PrevalentResourceDatabaseTelemetry, with a an enum and
> > a std::array. So now, all the statistics are stored in an 2d-array. This
> > requires adding lookup functions for the diagnostic description
> > (makeDescription) and the statistic generation (calculateStatistic), but has
> > the benefit of making it a compile error if a new type of Statistic is added
> > to the enum, but not updated everywhere.
> > 
> > 
> > 
> > At this point, it's kind of a complexity vs. length question. sketch2 is the
> > shortest, but makes adding new statistics a little tricky. I'd suggest going
> > with something akin to sketch3, to future proof it as much as possible.
> 
> 
> I liked option #3 as well, thanks for helping out so much with this.
> 
> I converted all of the PrevelentResourceDatabaseTelemetry struct values to
> Optional<unsigned> so that WTF::nullopt will be returned if any SQL queries
> fail (instead of -1). I thought logging a crash on a
> Optional<unsigned>.value() call would be more useful than seeing a large
> unsigned integer value logged as a statistic. 
> 
> I debated using bindInt() for the queries with numbers, but since most of
> the queries have to be concatenated with other strings at runtime anyways, I
> thought it was less verbose to concatenate the numbers with the strings vs
> preparing the statement then binding integers. Does this have security
> implications?

It's been a long time since I did had to write any sql code, so I'd prefer if someone with more recent knowledge would weigh in. I cc'd Brady Eidson, who wrote a lot of the SQLite in WebCore. While I don't think any of the variables used in concatenation are directly user controllable, SQL Injection attacks (https://en.wikipedia.org/wiki/SQL_injection) always loom large in my head when I see code like this.

> 
> Implementing the test infrastructure is a whole other patch I’ll be working
> on next. I manually enabled the ITP database and ran tests locally and they
> all ran as expected. I also clicked around a bunch and manually checked the
> database telemetry results against the memory store results. Since this only
> copies exactly what the memory store already does, I think its ok to
> integrate without full testing until the next patch.

I think it would probably make sense to hold off on landing this fix until you have some infrastructure for testing in place, unless there is a pressing need to get this landed quickly.
Comment 20 Brent Fulgham 2019-09-03 09:06:49 PDT
(In reply to Sam Weinig from comment #19)
> 
> I think it would probably make sense to hold off on landing this fix until
> you have some infrastructure for testing in place, unless there is a
> pressing need to get this landed quickly.

I think Sam is right, that having testing infrastructure in place would be ideal. However, we are already in a state where the SQL code exists and has to be tested manually. I'm going to have Kate complete this patch, then focus on building out test infrastructure, which was another task I had assigned to her as part of this overall project.
Comment 21 Brady Eidson 2019-09-03 10:12:57 PDT
Comment on attachment 377780 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=377780&action=review

There's a *lot* of string concat making these statements.

I understand the uses are complicated. But I think it both hurts future maintainability and makes it hard to reason about injections. I wonder if there's a better way to construct all of them.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:531
> +    return "domainID, "
> +        "(cnt1 + cnt2 + cnt3) as sum "
> +        "FROM ("
> +        "SELECT "
> +            "domainID, "
> +            "COUNT(DISTINCT f.topFrameDomainID) as cnt1, "
> +            "COUNT(DISTINCT r.topFrameDomainID) as cnt2, "
> +            "COUNT(DISTINCT toDomainID) as cnt3 "
> +        "FROM "
> +        "ObservedDomains o "
> +        "LEFT JOIN SubframeUnderTopFrameDomains f ON o.domainID = f.subFrameDomainID "
> +        "LEFT JOIN SubresourceUnderTopFrameDomains r ON o.domainID = r.subresourceDomainID "
> +        "LEFT JOIN SubresourceUniqueRedirectsTo u ON o.domainID = u.subresourceDomainID "
> +        "WHERE isPrevalent = 1";

I agree this string is long enough and structured enough to warrant a multi-line literal like this.
I think all of the quotes get in the way of readability. And with SQL code especially, they actually might hide a bug with the query itself.
I would prefer to see a C++ literal to be more readable and avoid the possible quote bug issue.
https://en.cppreference.com/w/cpp/language/string_literal

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:659
> +    SQLiteStatement prevalentResourceCountInTop(database, makeString("SELECT COUNT(*) FROM (SELECT * FROM ObservedDomains o INNER JOIN(SELECT ", joinSubStatisticsForSortingPreamble(), " ", joinSubStatisticsForSortingEnding(), ") as q on q.domainID = o.domainID LIMIT ", range, ") as p WHERE p.hadUserInteraction = 1;"));

Constructing the query from static compiled strings is fine, but `range` should be a bound parameter.

I agree with Sam that there's no obvious way an attacker controls it, but as code changes over time that observation can also change.
Comment 22 Katherine_cheney 2019-09-03 15:14:22 PDT
Created attachment 377926 [details]
Patch
Comment 23 Katherine_cheney 2019-09-03 15:16:14 PDT
I talked with Brady and reduced concatenation by a good amount through binding and re-writing the queries. Also, the style bots will be mad because they do not like the multi-line comment.
Comment 24 Katherine_cheney 2019-09-03 15:27:25 PDT
(In reply to Katherine_cheney from comment #23)
> I talked with Brady and reduced concatenation by a good amount through
> binding and re-writing the queries. Also, the style bots will be mad because
> they do not like the multi-line comment.

multi-line **string**, not comment
Comment 25 Sam Weinig 2019-09-03 15:59:40 PDT
(In reply to Brent Fulgham from comment #20)
> (In reply to Sam Weinig from comment #19)
> > 
> > I think it would probably make sense to hold off on landing this fix until
> > you have some infrastructure for testing in place, unless there is a
> > pressing need to get this landed quickly.
> 
> I think Sam is right, that having testing infrastructure in place would be
> ideal. However, we are already in a state where the SQL code exists and has
> to be tested manually. I'm going to have Kate complete this patch, then
> focus on building out test infrastructure, which was another task I had
> assigned to her as part of this overall project.

Can the tests land first? Or is there some urgency here?
Comment 26 Brent Fulgham 2019-09-03 16:08:46 PDT
(In reply to Sam Weinig from comment #25)
> (In reply to Brent Fulgham from comment #20)
> > (In reply to Sam Weinig from comment #19)
> > > 
> > > I think it would probably make sense to hold off on landing this fix until
> > > you have some infrastructure for testing in place, unless there is a
> > > pressing need to get this landed quickly.
> > 
> > I think Sam is right, that having testing infrastructure in place would be
> > ideal. However, we are already in a state where the SQL code exists and has
> > to be tested manually. I'm going to have Kate complete this patch, then
> > focus on building out test infrastructure, which was another task I had
> > assigned to her as part of this overall project.
> 
> Can the tests land first? Or is there some urgency here?

I would like to get this landed as-is. As I said in an earlier comment, I have tasked Kate with addressing the full test story in a follow-up.
Comment 27 Sam Weinig 2019-09-03 16:20:20 PDT
(In reply to Brent Fulgham from comment #26)
> (In reply to Sam Weinig from comment #25)
> > (In reply to Brent Fulgham from comment #20)
> > > (In reply to Sam Weinig from comment #19)
> > > > 
> > > > I think it would probably make sense to hold off on landing this fix until
> > > > you have some infrastructure for testing in place, unless there is a
> > > > pressing need to get this landed quickly.
> > > 
> > > I think Sam is right, that having testing infrastructure in place would be
> > > ideal. However, we are already in a state where the SQL code exists and has
> > > to be tested manually. I'm going to have Kate complete this patch, then
> > > focus on building out test infrastructure, which was another task I had
> > > assigned to her as part of this overall project.
> > 
> > Can the tests land first? Or is there some urgency here?
> 
> I would like to get this landed as-is. As I said in an earlier comment, I
> have tasked Kate with addressing the full test story in a follow-up.

Sorry to a stickler here, but why land this without tests?
Comment 28 Brent Fulgham 2019-09-03 16:26:46 PDT
(In reply to Sam Weinig from comment #27)
> > > Can the tests land first? Or is there some urgency here?
> > 
> > I would like to get this landed as-is. As I said in an earlier comment, I
> > have tasked Kate with addressing the full test story in a follow-up.
> 
> Sorry to a stickler here, but why land this without tests?

I guess you are concerned that this experimental code will be broken by someone else in the time between this landing, and later in the week when Kate builds infrastructure to test these database routines (along with all of the other experimental db code)?

The order of operations doesn't seem important for this.
Comment 29 Build Bot 2019-09-03 16:41:26 PDT
Comment on attachment 377926 [details]
Patch

Attachment 377926 [details] did not pass win-ews (win):
Output: https://webkit-queues.webkit.org/results/12994333

New failing tests:
js/dom/customAccessor-defineProperty.html
Comment 30 Build Bot 2019-09-03 16:41:28 PDT
Created attachment 377933 [details]
Archive of layout-test-results from ews212 for win-future

The attached test failures were seen while running run-webkit-tests on the win-ews.
Bot: ews212  Port: win-future  Platform: CYGWIN_NT-10.0-17763-3.0.5-338.x86_64-x86_64-64bit
Comment 31 Katherine_cheney 2019-09-03 17:26:05 PDT
I will try to have the testing-infrastructure patch up as soon as possible! Also, I am perplexed by the test failures in win... I will look into it.
Comment 32 Chris Dumez 2019-09-03 18:55:09 PDT
(In reply to Katherine_cheney from comment #31)
> I will try to have the testing-infrastructure patch up as soon as possible!
> Also, I am perplexed by the test failures in win... I will look into it.

The win failure is very likely just a flaky test, I would not worry about it.
Comment 33 Sam Weinig 2019-09-06 08:18:11 PDT
Comment on attachment 377926 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=377926&action=review

Getting real close.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:516
> +static const String joinSubStatisticsForSorting()

This should return either a StringView or const char*.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:566
> +    auto[queryStart, queryEnd] = buildQueryStartAndEnd(statistic);

space after auto / before the [.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:609
> +static Optional<unsigned> getMedianOfPrevalentResourcesWithUserInteraction(SQLiteDatabase& database, unsigned prevalentResourcesWithUserInteractionCount)
> +{
> +    SQLiteStatement medianDaysSinceUIStatement = makeMedianWithUIQuery(database);
> +    unsigned median;
> +
> +    if (medianDaysSinceUIStatement.prepare() == SQLITE_OK) {
> +        if (medianDaysSinceUIStatement.bindInt(1, 1) != SQLITE_OK
> +            || medianDaysSinceUIStatement.bindInt(2, (prevalentResourcesWithUserInteractionCount / 2) != SQLITE_OK)) {
> +            RELEASE_LOG_ERROR(Network, "ResourceLoadStatisticsDatabaseStore::getMedianOfPrevalentResourcesWithUserInteraction, error message: %{public}s", database.lastErrorMsg());
> +            ASSERT_NOT_REACHED();
> +            return WTF::nullopt;
> +        }
> +        if (medianDaysSinceUIStatement.step() == SQLITE_ROW) {
> +            double rawSeconds = medianDaysSinceUIStatement.getColumnDouble(0);
> +            WallTime wallTime = WallTime::fromRawSeconds(rawSeconds);
> +            median = wallTime <= WallTime() ? 0 : std::floor((WallTime::now() - wallTime) / 24_h);
> +        }
> +    }
> +
> +    if (prevalentResourcesWithUserInteractionCount & 1)
> +        return median;
> +
> +    SQLiteStatement lowerMedianDaysSinceUIStatement = makeMedianWithUIQuery(database);
> +    if (lowerMedianDaysSinceUIStatement.prepare() == SQLITE_OK) {
> +        if (lowerMedianDaysSinceUIStatement.bindInt(1, 1) != SQLITE_OK
> +            || lowerMedianDaysSinceUIStatement.bindInt(2, ((prevalentResourcesWithUserInteractionCount - 1) / 2)) != SQLITE_OK) {
> +            RELEASE_LOG_ERROR(Network, "ResourceLoadStatisticsDatabaseStore::getMedianOfPrevalentResourcesWithUserInteraction, error message: %{public}s", database.lastErrorMsg());
> +            ASSERT_NOT_REACHED();
> +            return WTF::nullopt;
> +        }
> +        if (lowerMedianDaysSinceUIStatement.step() == SQLITE_ROW) {
> +            double rawSecondsLower = lowerMedianDaysSinceUIStatement.getColumnDouble(0);
> +            WallTime wallTimeLower = WallTime::fromRawSeconds(rawSecondsLower);
> +            return ((wallTimeLower <= WallTime() ? 0 : std::floor((WallTime::now() - wallTimeLower) / 24_h)) + median) / 2;
> +        }
> +    }
> +
> +    return WTF::nullopt;
> +}

I think this would read cleaner and avoid the uninitialized unsigned would be something like this:

static Optional<unsigned> getMedianOfPrevalentResourcesWithUserInteraction(SQLiteDatabase& database, unsigned prevalentResourcesWithUserInteractionCount)
{
    SQLiteStatement medianDaysSinceUIStatement = makeMedianWithUIQuery(database);

    // Prepare
    if (medianDaysSinceUIStatement.prepare() != SQLITE_OK) {
        RELEASE_LOG_ERROR(Network, "ResourceLoadStatisticsDatabaseStore::getMedianOfPrevalentResourcesWithUserInteraction, error message: %{public}s", database.lastErrorMsg());
        ASSERT_NOT_REACHED();
        return WTF::nullopt;
    }

    // Bind
    if (medianDaysSinceUIStatement.bindInt(1, 1) != SQLITE_OK || medianDaysSinceUIStatement.bindInt(2, (prevalentResourcesWithUserInteractionCount / 2) != SQLITE_OK)) {
        RELEASE_LOG_ERROR(Network, "ResourceLoadStatisticsDatabaseStore::getMedianOfPrevalentResourcesWithUserInteraction, error message: %{public}s", database.lastErrorMsg());
        ASSERT_NOT_REACHED();
        return WTF::nullopt;
    }

    // Step
    if (medianDaysSinceUIStatement.step() != SQLITE_ROW) {
        RELEASE_LOG_ERROR(Network, "ResourceLoadStatisticsDatabaseStore::getMedianOfPrevalentResourcesWithUserInteraction, error message: %{public}s", database.lastErrorMsg());
        ASSERT_NOT_REACHED();
        return WTF::nullopt;
    }

    double rawSeconds = medianDaysSinceUIStatement.getColumnDouble(0);
    WallTime wallTime = WallTime::fromRawSeconds(rawSeconds);
    unsigned median = wallTime <= WallTime() ? 0 : std::floor((WallTime::now() - wallTime) / 24_h);

    if (prevalentResourcesWithUserInteractionCount & 1)
        return median;

    SQLiteStatement lowerMedianDaysSinceUIStatement = makeMedianWithUIQuery(database);

    // Prepare
    if (lowerMedianDaysSinceUIStatement.prepare() != SQLITE_OK) {
        RELEASE_LOG_ERROR(Network, "ResourceLoadStatisticsDatabaseStore::getMedianOfPrevalentResourcesWithUserInteraction, error message: %{public}s", database.lastErrorMsg());
        ASSERT_NOT_REACHED();
        return WTF::nullopt;
    }

    // Bind
    if (lowerMedianDaysSinceUIStatement.bindInt(1, 1) != SQLITE_OK || lowerMedianDaysSinceUIStatement.bindInt(2, ((prevalentResourcesWithUserInteractionCount - 1) / 2)) != SQLITE_OK) {
        RELEASE_LOG_ERROR(Network, "ResourceLoadStatisticsDatabaseStore::getMedianOfPrevalentResourcesWithUserInteraction, error message: %{public}s", database.lastErrorMsg());
        ASSERT_NOT_REACHED();
        return WTF::nullopt;
    }

    // Step
    if (lowerMedianDaysSinceUIStatement.step() != SQLITE_ROW) {
        RELEASE_LOG_ERROR(Network, "ResourceLoadStatisticsDatabaseStore::getMedianOfPrevalentResourcesWithUserInteraction, error message: %{public}s", database.lastErrorMsg());
        ASSERT_NOT_REACHED();
        return WTF::nullopt;
    }

    double rawSecondsLower = lowerMedianDaysSinceUIStatement.getColumnDouble(0);
    WallTime wallTimeLower = WallTime::fromRawSeconds(rawSecondsLower);
    return ((wallTimeLower <= WallTime() ? 0 : std::floor((WallTime::now() - wallTimeLower) / 24_h)) + median) / 2;
}

What do you think? You would want to use the same pattern in getTopPrevelentResourceDaysSinceUI and getMedianOfPrevalentResourceWithoutUserInteraction.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:642
> +static Optional<unsigned> getMedianOfPrevalentResourceWithoutUserInteraction(SQLiteDatabase& database, int range, PrevalentResourceDatabaseTelemetry::Statistic statistic, unsigned numberOfPrevalentResourcesWithoutUI)

I think range and median below should both be unsigned.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:648
> +    SQLiteStatement Statistic = makeMedianWithoutUIQuery(database, statistic);

Statistic should start with a lowercase s.

> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsTelemetry.cpp:268
> +static String makeDescription(PrevalentResourceDatabaseTelemetry::Statistic statistic)

This should return either a StringView or const char*.
Comment 34 Katherine_cheney 2019-09-10 14:56:01 PDT
(In reply to Sam Weinig from comment #33)
> Comment on attachment 377926 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=377926&action=review
> 
> Getting real close.
> 
Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:609
> > +static Optional<unsigned> getMedianOfPrevalentResourcesWithUserInteraction(SQLiteDatabase& database, unsigned prevalentResourcesWithUserInteractionCount)
> > +{
> > +    SQLiteStatement medianDaysSinceUIStatement = makeMedianWithUIQuery(database);
> > +    unsigned median;
> > +
> > +    if (medianDaysSinceUIStatement.prepare() == SQLITE_OK) {
> > +        if (medianDaysSinceUIStatement.bindInt(1, 1) != SQLITE_OK
> > +            || medianDaysSinceUIStatement.bindInt(2, (prevalentResourcesWithUserInteractionCount / 2) != SQLITE_OK)) {
> > +            RELEASE_LOG_ERROR(Network, "ResourceLoadStatisticsDatabaseStore::getMedianOfPrevalentResourcesWithUserInteraction, error message: %{public}s", database.lastErrorMsg());
> > +            ASSERT_NOT_REACHED();
> > +            return WTF::nullopt;
> > +        }
> > +        if (medianDaysSinceUIStatement.step() == SQLITE_ROW) {
> > +            double rawSeconds = medianDaysSinceUIStatement.getColumnDouble(0);
> > +            WallTime wallTime = WallTime::fromRawSeconds(rawSeconds);
> > +            median = wallTime <= WallTime() ? 0 : std::floor((WallTime::now() - wallTime) / 24_h);
> > +        }
> > +    }
> > +
> > +    if (prevalentResourcesWithUserInteractionCount & 1)
> > +        return median;
> > +
> > +    SQLiteStatement lowerMedianDaysSinceUIStatement = makeMedianWithUIQuery(database);
> > +    if (lowerMedianDaysSinceUIStatement.prepare() == SQLITE_OK) {
> > +        if (lowerMedianDaysSinceUIStatement.bindInt(1, 1) != SQLITE_OK
> > +            || lowerMedianDaysSinceUIStatement.bindInt(2, ((prevalentResourcesWithUserInteractionCount - 1) / 2)) != SQLITE_OK) {
> > +            RELEASE_LOG_ERROR(Network, "ResourceLoadStatisticsDatabaseStore::getMedianOfPrevalentResourcesWithUserInteraction, error message: %{public}s", database.lastErrorMsg());
> > +            ASSERT_NOT_REACHED();
> > +            return WTF::nullopt;
> > +        }
> > +        if (lowerMedianDaysSinceUIStatement.step() == SQLITE_ROW) {
> > +            double rawSecondsLower = lowerMedianDaysSinceUIStatement.getColumnDouble(0);
> > +            WallTime wallTimeLower = WallTime::fromRawSeconds(rawSecondsLower);
> > +            return ((wallTimeLower <= WallTime() ? 0 : std::floor((WallTime::now() - wallTimeLower) / 24_h)) + median) / 2;
> > +        }
> > +    }
> > +
> > +    return WTF::nullopt;
> > +}
> 
> I think this would read cleaner and avoid the uninitialized unsigned would
> be something like this:
> 
> static Optional<unsigned>
> getMedianOfPrevalentResourcesWithUserInteraction(SQLiteDatabase& database,
> unsigned prevalentResourcesWithUserInteractionCount)
> {
>     SQLiteStatement medianDaysSinceUIStatement =
> makeMedianWithUIQuery(database);
> 
>     // Prepare
>     if (medianDaysSinceUIStatement.prepare() != SQLITE_OK) {
>         RELEASE_LOG_ERROR(Network,
> "ResourceLoadStatisticsDatabaseStore::
> getMedianOfPrevalentResourcesWithUserInteraction, error message:
> %{public}s", database.lastErrorMsg());
>         ASSERT_NOT_REACHED();
>         return WTF::nullopt;
>     }
> 
>     // Bind
>     if (medianDaysSinceUIStatement.bindInt(1, 1) != SQLITE_OK ||
> medianDaysSinceUIStatement.bindInt(2,
> (prevalentResourcesWithUserInteractionCount / 2) != SQLITE_OK)) {
>         RELEASE_LOG_ERROR(Network,
> "ResourceLoadStatisticsDatabaseStore::
> getMedianOfPrevalentResourcesWithUserInteraction, error message:
> %{public}s", database.lastErrorMsg());
>         ASSERT_NOT_REACHED();
>         return WTF::nullopt;
>     }
> 
>     // Step
>     if (medianDaysSinceUIStatement.step() != SQLITE_ROW) {
>         RELEASE_LOG_ERROR(Network,
> "ResourceLoadStatisticsDatabaseStore::
> getMedianOfPrevalentResourcesWithUserInteraction, error message:
> %{public}s", database.lastErrorMsg());
>         ASSERT_NOT_REACHED();
>         return WTF::nullopt;
>     }
> 
>     double rawSeconds = medianDaysSinceUIStatement.getColumnDouble(0);
>     WallTime wallTime = WallTime::fromRawSeconds(rawSeconds);
>     unsigned median = wallTime <= WallTime() ? 0 :
> std::floor((WallTime::now() - wallTime) / 24_h);
> 
>     if (prevalentResourcesWithUserInteractionCount & 1)
>         return median;
> 
>     SQLiteStatement lowerMedianDaysSinceUIStatement =
> makeMedianWithUIQuery(database);
> 
>     // Prepare
>     if (lowerMedianDaysSinceUIStatement.prepare() != SQLITE_OK) {
>         RELEASE_LOG_ERROR(Network,
> "ResourceLoadStatisticsDatabaseStore::
> getMedianOfPrevalentResourcesWithUserInteraction, error message:
> %{public}s", database.lastErrorMsg());
>         ASSERT_NOT_REACHED();
>         return WTF::nullopt;
>     }
> 
>     // Bind
>     if (lowerMedianDaysSinceUIStatement.bindInt(1, 1) != SQLITE_OK ||
> lowerMedianDaysSinceUIStatement.bindInt(2,
> ((prevalentResourcesWithUserInteractionCount - 1) / 2)) != SQLITE_OK) {
>         RELEASE_LOG_ERROR(Network,
> "ResourceLoadStatisticsDatabaseStore::
> getMedianOfPrevalentResourcesWithUserInteraction, error message:
> %{public}s", database.lastErrorMsg());
>         ASSERT_NOT_REACHED();
>         return WTF::nullopt;
>     }
> 
>     // Step
>     if (lowerMedianDaysSinceUIStatement.step() != SQLITE_ROW) {
>         RELEASE_LOG_ERROR(Network,
> "ResourceLoadStatisticsDatabaseStore::
> getMedianOfPrevalentResourcesWithUserInteraction, error message:
> %{public}s", database.lastErrorMsg());
>         ASSERT_NOT_REACHED();
>         return WTF::nullopt;
>     }
> 
>     double rawSecondsLower =
> lowerMedianDaysSinceUIStatement.getColumnDouble(0);
>     WallTime wallTimeLower = WallTime::fromRawSeconds(rawSecondsLower);
>     return ((wallTimeLower <= WallTime() ? 0 : std::floor((WallTime::now() -
> wallTimeLower) / 24_h)) + median) / 2;
> }
> 
> What do you think? You would want to use the same pattern in
> getTopPrevelentResourceDaysSinceUI and
> getMedianOfPrevalentResourceWithoutUserInteraction.
> 

Yes agreed! This looks much cleaner and is easier to understand.

Also, I uploaded a patch for ITP Database testing (https://bugs.webkit.org/show_bug.cgi?id=195420).
Comment 35 Katherine_cheney 2019-10-03 10:53:51 PDT
Created attachment 380138 [details]
Patch
Comment 36 Katherine_cheney 2019-10-03 10:54:40 PDT
This patch has testing!! Yay
Comment 37 John Wilander 2019-10-04 15:25:29 PDT
Comment on attachment 380138 [details]
Patch

LGTM. Thanks Sam and Brandy for all the comments and suggestions here! I'll r+ and maybe you can get Sam or Brady to cq+ it for you?
Comment 38 Katherine_cheney 2019-10-04 16:15:24 PDT
(In reply to John Wilander from comment #37)
> Comment on attachment 380138 [details]
> Patch
> 
> LGTM. Thanks Sam and Brandy for all the comments and suggestions here! I'll
> r+ and maybe you can get Sam or Brady to cq+ it for you?

Thanks John!
Comment 39 Brent Fulgham 2019-10-08 13:10:27 PDT
Comment on attachment 380138 [details]
Patch

I think this looks fine. r=me.
Comment 40 WebKit Commit Bot 2019-10-08 13:50:40 PDT
Comment on attachment 380138 [details]
Patch

Rejecting attachment 380138 [details] from commit-queue.

Failed to run "['/Volumes/Data/EWS/WebKit/Tools/Scripts/webkit-patch', '--status-host=webkit-queues.webkit.org', '--bot-id=webkit-cq-01', 'apply-attachment', '--no-update', '--non-interactive', 380138, '--port=mac']" exit_code: 2 cwd: /Volumes/Data/EWS/WebKit

Logging in as commit-queue@webkit.org...
Fetching: https://bugs.webkit.org/attachment.cgi?id=380138&action=edit
Fetching: https://bugs.webkit.org/show_bug.cgi?id=195088&ctype=xml&excludefield=attachmentdata
Processing 1 patch from 1 bug.
Processing patch 380138 from bug 195088.
Fetching: https://bugs.webkit.org/attachment.cgi?id=380138
Failed to run "[u'/Volumes/Data/EWS/WebKit/Tools/Scripts/svn-apply', '--force', '--reviewer', u'John Wilander']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit

Parsed 21 diffs from patch file(s).
patching file Source/WebKit/ChangeLog
Hunk #1 succeeded at 1 with fuzz 3.
patching file Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp
Hunk #1 FAILED at 176.
Hunk #2 succeeded at 218 (offset -2 lines).
Hunk #3 succeeded at 257 (offset -2 lines).
Hunk #4 FAILED at 369.
Hunk #5 succeeded at 623 (offset 9 lines).
2 out of 5 hunks FAILED -- saving rejects to file Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp.rej
patching file Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h
Hunk #1 succeeded at 47 (offset 1 line).
Hunk #2 succeeded at 136 with fuzz 1 (offset 2 lines).
Hunk #3 succeeded at 230 (offset -6 lines).
patching file Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.cpp
Hunk #2 succeeded at 1081 (offset 22 lines).
patching file Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h
Hunk #1 succeeded at 188 (offset 1 line).
patching file Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsTelemetry.cpp
patching file Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsTelemetry.h
patching file Source/WebKit/NetworkProcess/NetworkSession.cpp
patching file Source/WebKit/NetworkProcess/NetworkSession.h
patching file Source/WebKit/UIProcess/Network/NetworkProcessProxy.cpp
Hunk #1 succeeded at 956 (offset 6 lines).
patching file Source/WebKit/UIProcess/Network/NetworkProcessProxy.h
Hunk #1 succeeded at 232 (offset 1 line).
patching file Source/WebKit/UIProcess/Network/NetworkProcessProxy.messages.in
patching file Tools/ChangeLog
Hunk #1 succeeded at 1 with fuzz 3.
patching file Tools/WebKitTestRunner/InjectedBundle/InjectedBundle.cpp
patching file Tools/WebKitTestRunner/InjectedBundle/TestRunner.cpp
Hunk #1 succeeded at 2008 (offset 31 lines).
patching file Tools/WebKitTestRunner/InjectedBundle/TestRunner.h
patching file LayoutTests/ChangeLog
Hunk #1 succeeded at 1 with fuzz 3.
patching file LayoutTests/http/tests/resourceLoadStatistics/telemetry-generation-advanced-functionality-database-expected.txt
patching file LayoutTests/http/tests/resourceLoadStatistics/telemetry-generation-advanced-functionality-database.html
patching file LayoutTests/http/tests/resourceLoadStatistics/telemetry-generation-basic-functionality-database-expected.txt
patching file LayoutTests/http/tests/resourceLoadStatistics/telemetry-generation-basic-functionality-database.html

Failed to run "[u'/Volumes/Data/EWS/WebKit/Tools/Scripts/svn-apply', '--force', '--reviewer', u'John Wilander']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit

Full output: https://webkit-queues.webkit.org/results/13107476
Comment 41 Brent Fulgham 2019-10-08 13:53:22 PDT
Bummer. Bitrot! Can you upload a new version and I'll cq+ it asap?
Comment 42 Katherine_cheney 2019-10-08 14:47:49 PDT
Created attachment 380468 [details]
Patch
Comment 43 WebKit Commit Bot 2019-10-08 15:13:46 PDT
Comment on attachment 380468 [details]
Patch

Clearing flags on attachment: 380468

Committed r250866: <https://trac.webkit.org/changeset/250866>
Comment 44 WebKit Commit Bot 2019-10-08 15:13:48 PDT
All reviewed patches have been landed.  Closing bug.