Full Text Bug Listing

Brent Fulgham

Reported 2019-02-26 19:19:42 PST

Implement the telemetry and record dumping routines for the SQLite version of the ITP infrastructure once the core features are in place.

Radar WebKit Bug Importer

Comment 1 2019-08-12 09:58:25 PDT

<rdar://problem/54213407>

Kate Cheney

Comment 2 2019-08-29 11:03:43 PDT

Created attachment 377597 [details] Patch

Brent Fulgham

Comment 3 2019-08-29 11:36:07 PDT

Comment on attachment 377597 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=377597&action=review I think this looks good overall, but could be made better by factoring out some of the common SQL logic into 'make' functions that would allow for easier maintenance in the future. r- to fix that. > Source/WebKit/ChangeLog:4 > + https://bugs.webkit.org/show_bug.cgi?id=195088 Please always include the radar number here underneath the Bugzilla line: <rdar://problem/54213407> > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:518 > + if ((prevalentResourcesWithUserInteractionCount & 1)) { What is the significance of the bitwise & here? Are you checking for an odd number? If this is just a check for non-zero we would typically write this as "if (prevalentResourcesWIthUserInteractionCount)" > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:531 > + SQLiteStatement upperMedianDaysSinceUIStatement(m_database, makeString("SELECT mostRecentUserInteractionTime FROM ObservedDomains INNER JOIN (SELECT ", pre, " AND hadUserInteraction = 1 ", post, ") as q ON ObservedDomains.domainID = q.domainID LIMIT 1 OFFSET ", String::number((prevalentResourcesWithUserInteractionCount+1) / 2))); The three SQL statements in this section are almost identical. I suggest you write a generator function that returns the appropriate SQLLiteStatement based on inputs: static SQLiteStatement makeMedianQuery(SQLDatabase& database, int count, const String& pre, const String& post) { return SQLiteStatement(database, makeString("SELECT mostRecentUserInteractionTime FROM ObservedDomains INNER JOIN (SELECT ", pre, " AND hadUserInteraction = 1 ", post, ") as q ON ObservedDomains.domainID = q.domainID LIMIT 1 OFFSET ", String::number(count))); } Then you can just say: SQLiteStatement medianDaysSinceUIStatement = makeMedianQuery(prevalentResourcesWithUserInteractionCount / 2, pre, post); and SQLiteStatement lowerMedianDaysSinceUIStatement = makeMedianQuery((prevalentResourcesWithUserInteractionCount - 1) / 2 pre, post); and SQLiteStatement upperMedianDaysSinceUIStatement = makeMedianQuery((prevalentResourcesWithUserInteractionCount + 1) / 2 pre, post); > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:538 > + double rawSecondsUpper = upperMedianDaysSinceUIStatement.getColumnDouble(0); Very nice! > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:624 > + SQLiteStatement upperMedianStatistic(m_database, makeString(preamble, pre, " AND hadUserInteraction = 0 ", post, end, String::number((range+1) / 2))); Again, I think these three median statements could be converted into. a 'make' function. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:680 > + data.medianDaysSinceUserInteractionPrevalentResourceWithUserInteraction = getMedianOfPrevalentResourcesWithUserInteraction(data.numberOfPrevalentResourcesWithUserInteraction, joinPrevalentResourcesQueryPre, joinPrevalentResourcesQueryPost); Someday we need to rename these to shorter labels. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:-508 > - // FIXME(195088): Implement for Database version. Great!

Kate Cheney

Comment 4 2019-08-29 11:51:39 PDT

(In reply to Brent Fulgham from comment #3) > Comment on attachment 377597 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=377597&action=review > > I think this looks good overall, but could be made better by factoring out > some of the common SQL logic into 'make' functions that would allow for > easier maintenance in the future. r- to fix that. > On it! Thanks for all the feedback. > > Source/WebKit/ChangeLog:4 > > + https://bugs.webkit.org/show_bug.cgi?id=195088 > > Please always include the radar number here underneath the Bugzilla line: > <rdar://problem/54213407> > > > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:518 > > + if ((prevalentResourcesWithUserInteractionCount & 1)) { > > What is the significance of the bitwise & here? Are you checking for an odd > number? > > If this is just a check for non-zero we would typically write this as "if > (prevalentResourcesWIthUserInteractionCount)" > It checks for an odd number to calculate the median. If the total number is odd, it can just take the middle entry. Otherwise it averages the "lower" and "upper" middle entry. I copied the logic from the Memory Store median function: " if (size % 2) return v[middle]; return (v[middle - 1] + v[middle]) / 2; " But I actually found a mistake on my part -- I add [middle + 1] instead of [middle] as the second argument, so I will update that. > > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:531 > > + SQLiteStatement upperMedianDaysSinceUIStatement(m_database, makeString("SELECT mostRecentUserInteractionTime FROM ObservedDomains INNER JOIN (SELECT ", pre, " AND hadUserInteraction = 1 ", post, ") as q ON ObservedDomains.domainID = q.domainID LIMIT 1 OFFSET ", String::number((prevalentResourcesWithUserInteractionCount+1) / 2))); > > The three SQL statements in this section are almost identical. I suggest you > write a generator function that returns the appropriate SQLLiteStatement > based on inputs: > > static SQLiteStatement makeMedianQuery(SQLDatabase& database, int count, > const String& pre, const String& post) { > return SQLiteStatement(database, makeString("SELECT > mostRecentUserInteractionTime FROM ObservedDomains INNER JOIN (SELECT ", > pre, " AND hadUserInteraction = 1 ", post, ") as q ON > ObservedDomains.domainID = q.domainID LIMIT 1 OFFSET ", > String::number(count))); > } > > Then you can just say: > > SQLiteStatement medianDaysSinceUIStatement = > makeMedianQuery(prevalentResourcesWithUserInteractionCount / 2, pre, post); > > and > > SQLiteStatement lowerMedianDaysSinceUIStatement = > makeMedianQuery((prevalentResourcesWithUserInteractionCount - 1) / 2 pre, > post); > > and > > SQLiteStatement upperMedianDaysSinceUIStatement = > makeMedianQuery((prevalentResourcesWithUserInteractionCount + 1) / 2 pre, > post); > > > Good idea! I'll implement it in the next upload. Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:680 > > + data.medianDaysSinceUserInteractionPrevalentResourceWithUserInteraction = getMedianOfPrevalentResourcesWithUserInteraction(data.numberOfPrevalentResourcesWithUserInteraction, joinPrevalentResourcesQueryPre, joinPrevalentResourcesQueryPost); > > Someday we need to rename these to shorter labels. > I will also change this. Thanks again!

Kate Cheney

Comment 5 2019-08-29 13:49:56 PDT

Created attachment 377623 [details] Patch

Sam Weinig

Comment 6 2019-08-29 14:42:00 PDT

Comment on attachment 377623 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=377623&action=review Thanks for working on this. I'm a little concerned with the amount of string concatenation going on around the preparation of the SQLiteStatements. This is traditionally something to be avoided in order to reduce the chance for sql injection attacks. > Source/WebKit/ChangeLog:10 > + Implemented database telemetry calculating for ITP. Mimicked ResourceLoadStatisticsMemoryStore > + telemetry logging behavior using SQLite Queries as opposed to vector sorting/manipulation. It would be good to explain the motivation for this. Also, it looks like this doesn't have any new tests associated with it. Is this code path already tested? > Source/WebKit/ChangeLog:27 > + * NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp: > + (WebKit::ResourceLoadStatisticsDatabaseStore::ResourceLoadStatisticsDatabaseStore): > + (WebKit::ResourceLoadStatisticsDatabaseStore::prepareStatements): > + (WebKit::ResourceLoadStatisticsDatabaseStore::getMedianOfPrevalentResourcesWithUserInteraction const): > + (WebKit::ResourceLoadStatisticsDatabaseStore::getNumberOfPrevalentResources const): > + (WebKit::ResourceLoadStatisticsDatabaseStore::getNumberOfPrevalentResourcesWithUI const): > + (WebKit::ResourceLoadStatisticsDatabaseStore::getTopPrevelentResourceDaysSinceUI const): > + (WebKit::ResourceLoadStatisticsDatabaseStore::getMedianStatisticOfPrevalentResourceWithoutUserInteraction const): > + (WebKit::ResourceLoadStatisticsDatabaseStore::getNumberOfPrevalentResourcesInTopResources const): > + (WebKit::ResourceLoadStatisticsDatabaseStore::calculateTelemetryData const): > + (WebKit::ResourceLoadStatisticsDatabaseStore::calculateAndSubmitTelemetry const): > + * NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h: > + * NetworkProcess/Classifier/WebResourceLoadStatisticsTelemetry.cpp: > + (WebKit::databaseSubmitTopLists): > + (WebKit::WebResourceLoadStatisticsTelemetry::submitTelemetry): > + * NetworkProcess/Classifier/WebResourceLoadStatisticsTelemetry.h: Please fill in the per-function/per-file parts of the ChangeLog as well. Please include both what is changing and why the change is being made. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:518 > + return SQLiteStatement(database, makeString("SELECT mostRecentUserInteractionTime FROM ObservedDomains INNER JOIN (SELECT ", pre, " AND hadUserInteraction = 1 ", post, ") as q ON ObservedDomains.domainID = q.domainID LIMIT 1 OFFSET ", String::number(count))); The String::number() shouldn't be needed. You can pass count directly to makeString(). > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:527 > +// 1: subframe > +// 2: subresource > +// 3: unique redirects > +// 4: number of times data records were removed > +// 5: number of times accessed as first party due to user interaction > +// 6: number of times accessed as first party due to storage access api > +static SQLiteStatement makeMedianWithoutUIQuery(SQLiteDatabase& database, int offset, int statistic, const String& pre, const String& post) int statistic should probably be an enum. Then, this comment would not be necessary. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:530 > + String preamble; > + String end; These can be StringViews or const char*. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:534 > + case 1: preamble = "SELECT cnt1 FROM ObservedDomains o INNER JOIN(SELECT cnt1, "; > + end = ") as q ON o.domainID = q.domainID LIMIT 1 OFFSET "; preamble = ... should go on the line after the case statement. I think you should also consider pull out this switch statement into it's own static function that returns a std::pair<StringView, StringView> or std::pair<const char*, const char*>. Then you can call that function and use structured binding to do something like: auto& [preamble, end] = statementPartsForStatistic(statistic); > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:552 > + default: > + LOG_ERROR("ResourceLoadStatisticsDatabaseStore::getMedianStatisticOfPrevalentResourceWithoutUserInteraction only supports statistics 1-6. Undetermined Query behavior will result."); This should be a RELEASE_ASSERT_NOT_REACHED(); Or, if you turn statistic into a enum, you can remove this entirely. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:555 > + return SQLiteStatement(database, makeString(preamble, pre, " AND hadUserInteraction = 0 ", post, end, String::number(offset))); String::number() should not be needed. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:588 > + return (m_countPrevalentResourcesStatement.step() == SQLITE_ROW) ? m_countPrevalentResourcesStatement.getColumnInt(0) : -1; Returning -1 in a function that returns an unsigned seems wrong. I am guessing getColumnInt also returns an int, not an unsigned. This should probably return an int. You should also consider using Optional<> or WTF::notFound (which is just a named constant for -1) instead of -1. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:593 > + return (m_countPrevalentResourcesWithUserInteractionStatement.step() == SQLITE_ROW) ? m_countPrevalentResourcesWithUserInteractionStatement.getColumnInt(0) : -1; Same comment as above. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:596 > +unsigned ResourceLoadStatisticsDatabaseStore::getTopPrevelentResourceDaysSinceUI(const String& pre, const String& post) const pre and post are always the same here since there is only one caller. Do you imagine other callers in the future with different pre and post? If not, probably best to remove the parameters and either inline pre and post, or if they need to be in a shared location, move them into their own static functions. It would also be nice to give them slightly clearer names than pre and post, since while that does tell me where in the statement they go, it doesn't tell me what they do. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:611 > +unsigned ResourceLoadStatisticsDatabaseStore::getMedianStatisticOfPrevalentResourceWithoutUserInteraction(int range, int statistic, int numberOfPrevalentResourcesWithoutUI, const String& pre, const String& post) const Same points about pre/post as above. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:639 > + SQLiteStatement prevalentResourceCountInTop(m_database, makeString("SELECT COUNT(*) FROM (SELECT * FROM ObservedDomains o INNER JOIN(SELECT ", pre, " ", post, ") as q on q.domainID = o.domainID LIMIT ", String::number(range), ") as p WHERE p.hadUserInteraction = 1;")); I think the idiomatic way of doing something like this to create one cached SQLiteStatement and then binding the changing variable using a bind function (e.g. SQLiteStatement::bindInt()). In this case, that would be range. It looks like prevalentResourceCount is unused. Same points about pre/post as above. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h:59 > + unsigned numberOfPrevalentResourcesWithUITop1; > + unsigned numberOfPrevalentResourcesWithUITop3; > + unsigned numberOfPrevalentResourcesWithUITop10; > + unsigned numberOfPrevalentResourcesWithUITop50; > + unsigned numberOfPrevalentResourcesWithUITop100; If these were std::tuples, or a new struct (e.g. struct Foo { unsigned UITop1; ...unsigned UITop100; };) I think this would be easier to work with. A bunch of the duplicated code in ResourceLoadStatisticsDatabaseStore::calculateTelemetryData and databaseSubmitTopLists could be turned into loops. > Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsTelemetry.h:30 > +#include "ResourceLoadStatisticsDatabaseStore.h" This shouldn't be needed if you forward declare PrevalentResourceDatabaseTelemetry.

Sam Weinig

Comment 7 2019-08-29 14:42:42 PDT

Brady, can you take a look over the SQLLite parts and provide feedback on best practices.

Kate Cheney

Comment 8 2019-08-29 17:18:10 PDT

(In reply to Sam Weinig from comment #6) > Comment on attachment 377623 [details] > Patch > > If these were std::tuples, or a new struct (e.g. struct Foo { unsigned > UITop1; ...unsigned UITop100; };) I think this would be easier to work with. > A bunch of the duplicated code in > ResourceLoadStatisticsDatabaseStore::calculateTelemetryData and > databaseSubmitTopLists could be turned into loops. Thanks for all the feedback! I'm working on changes now. Quick followup: for the std::tuples within the struct, did you envision 5 tuples categorized by top1, top3, top10..., or tuples categorized by subframe, subresource, uniqueRedirects, etc? The cons I see for the topX tuples would be confusion over which index indicates what statistic. The cons for the statistic tuples would be that if there were only, say 4 resources, the top1 and top3 statistics to be logged are split up across all tuples. I wasn't sure which would work best and I wanted your thoughts.

Sam Weinig

Comment 9 2019-08-30 08:30:32 PDT

Created attachment 377709 [details] sketch1.cpp

Sam Weinig

Comment 10 2019-08-30 08:31:21 PDT Comment hidden (obsolete)

Created attachment 377710 [details] sketch2.cpp

Sam Weinig

Comment 11 2019-08-30 08:32:00 PDT Comment hidden (obsolete)

Created attachment 377712 [details] sketch3.cpp

Sam Weinig

Comment 12 2019-08-30 08:47:06 PDT

Created attachment 377714 [details] sketch2.cpp

Sam Weinig

Comment 13 2019-08-30 08:47:46 PDT

Created attachment 377715 [details] sketch3.cpp

Sam Weinig

Comment 14 2019-08-30 08:50:18 PDT

(In reply to Katherine_cheney from comment #8) > (In reply to Sam Weinig from comment #6) > > Comment on attachment 377623 [details] > > Patch > > > > If these were std::tuples, or a new struct (e.g. struct Foo { unsigned > > UITop1; ...unsigned UITop100; };) I think this would be easier to work with. > > A bunch of the duplicated code in > > ResourceLoadStatisticsDatabaseStore::calculateTelemetryData and > > databaseSubmitTopLists could be turned into loops. > > > Thanks for all the feedback! I'm working on changes now. Quick followup: for > the std::tuples within the struct, did you envision 5 tuples categorized by > top1, top3, top10..., or tuples categorized by subframe, subresource, > uniqueRedirects, etc? The cons I see for the topX tuples would be confusion > over which index indicates what statistic. The cons for the statistic tuples > would be that if there were only, say 4 resources, the top1 and top3 > statistics to be logged are split up across all tuples. I wasn't sure which > would work best and I wanted your thoughts. I'm not sure which is the best, but I sketched out a few options and attached them: sketch1.cpp: This first one just switches to using a std::array<unsigned, 5> for each statistic kind allowing us to get rid of much of the repetition, and uses an enum + lookup function bucketSize() to represent the bucketing. sketch2.cpp This second one expands on the first, removing the enum of bucketing sizes, and replacing it with a lookup table (which is just a static std::array) called bucketSizes. This allows us to remove the explicit if-statements and replace them with a loop. The benefit here is that if we wanted to add more bucket sizes in the future, we would only have to change numberOfBucketsPerStatistic and bucketSizes. sketch3.cpp This third one expands on the second, replacing the explicit listing of all the statistics in the PrevalentResourceDatabaseTelemetry, with a an enum and a std::array. So now, all the statistics are stored in an 2d-array. This requires adding lookup functions for the diagnostic description (makeDescription) and the statistic generation (calculateStatistic), but has the benefit of making it a compile error if a new type of Statistic is added to the enum, but not updated everywhere. At this point, it's kind of a complexity vs. length question. sketch2 is the shortest, but makes adding new statistics a little tricky. I'd suggest going with something akin to sketch3, to future proof it as much as possible.

Kate Cheney

Comment 15 2019-08-30 18:32:17 PDT

Created attachment 377777 [details] Patch

Kate Cheney

Comment 16 2019-08-30 18:32:57 PDT

(In reply to Sam Weinig from comment #14) > (In reply to Katherine_cheney from comment #8) > > (In reply to Sam Weinig from comment #6) > > > Comment on attachment 377623 [details] > > > Patch > > > > > > If these were std::tuples, or a new struct (e.g. struct Foo { unsigned > > > UITop1; ...unsigned UITop100; };) I think this would be easier to work with. > > > A bunch of the duplicated code in > > > ResourceLoadStatisticsDatabaseStore::calculateTelemetryData and > > > databaseSubmitTopLists could be turned into loops. > > > > > > Thanks for all the feedback! I'm working on changes now. Quick followup: for > > the std::tuples within the struct, did you envision 5 tuples categorized by > > top1, top3, top10..., or tuples categorized by subframe, subresource, > > uniqueRedirects, etc? The cons I see for the topX tuples would be confusion > > over which index indicates what statistic. The cons for the statistic tuples > > would be that if there were only, say 4 resources, the top1 and top3 > > statistics to be logged are split up across all tuples. I wasn't sure which > > would work best and I wanted your thoughts. > > I'm not sure which is the best, but I sketched out a few options and > attached them: > > sketch1.cpp: > > This first one just switches to using a std::array<unsigned, 5> for each > statistic kind allowing us to get rid of much of the repetition, and uses an > enum + lookup function bucketSize() to represent the bucketing. > > > sketch2.cpp > > This second one expands on the first, removing the enum of bucketing sizes, > and replacing it with a lookup table (which is just a static std::array) > called bucketSizes. This allows us to remove the explicit if-statements and > replace them with a loop. The benefit here is that if we wanted to add more > bucket sizes in the future, we would only have to change > numberOfBucketsPerStatistic and bucketSizes. > > > sketch3.cpp > > > This third one expands on the second, replacing the explicit listing of all > the statistics in the PrevalentResourceDatabaseTelemetry, with a an enum and > a std::array. So now, all the statistics are stored in an 2d-array. This > requires adding lookup functions for the diagnostic description > (makeDescription) and the statistic generation (calculateStatistic), but has > the benefit of making it a compile error if a new type of Statistic is added > to the enum, but not updated everywhere. > > > > At this point, it's kind of a complexity vs. length question. sketch2 is the > shortest, but makes adding new statistics a little tricky. I'd suggest going > with something akin to sketch3, to future proof it as much as possible. I liked option #3 as well, thanks for helping out so much with this. I converted all of the PrevelentResourceDatabaseTelemetry struct values to Optional<unsigned> so that WTF::nullopt will be returned if any SQL queries fail (instead of -1). I thought logging a crash on a Optional<unsigned>.value() call would be more useful than seeing a large unsigned integer value logged as a statistic. I debated using bindInt() for the queries with numbers, but since most of the queries have to be concatenated with other strings at runtime anyways, I thought it was less verbose to concatenate the numbers with the strings vs preparing the statement then binding integers. Does this have security implications? Implementing the test infrastructure is a whole other patch I’ll be working on next. I manually enabled the ITP database and ran tests locally and they all ran as expected. I also clicked around a bunch and manually checked the database telemetry results against the memory store results. Since this only copies exactly what the memory store already does, I think its ok to integrate without full testing until the next patch.

Kate Cheney

Comment 17 2019-08-30 20:08:18 PDT

Created attachment 377780 [details] Patch

Sam Weinig

Comment 18 2019-08-31 15:45:33 PDT

Comment on attachment 377780 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=377780&action=review > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:567 > + auto[queryStart, queryEnd] = buildQueryStartAndEnd(statistic); please add a space after auto. (the style checker may yell at you, but it is wrong). > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:578 > + if (medianDaysSinceUIStatement.prepare() == SQLITE_OK) { > + if (medianDaysSinceUIStatement.step() == SQLITE_ROW) { You should be able to combine these by using SQLiteStatement::prepareAndStep(). I think it would probably also read clearer if you returned early with WTF::nullopt, if the sql statement fails. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:586 > + if (prevalentResourcesWithUserInteractionCount & 1) > + return median; It seems like this could return an uninitialized value medianDaysSinceUIStatement.prepare() or medianDaysSinceUIStatement.step() somehow fail. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:637 > + SQLiteStatement Statistic = makeMedianWithoutUIQuery(database, (range / 2), statistic); Statistic should have a lowercase first letter, but since you are already using the variable name statistic, perhaps statisticStatement would work. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:645 > + if (range & 1) > + return median; Same issue with a potentially uninitialized value. Also, this median is an int being returned as an unsigned. > Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsTelemetry.cpp:268 > +static String makeDescription(PrevalentResourceDatabaseTelemetry::Statistic statistic) This should return a const char* or an ASCIILiteral.

Sam Weinig

Comment 19 2019-08-31 15:50:21 PDT

(In reply to Katherine_cheney from comment #16) > (In reply to Sam Weinig from comment #14) > > (In reply to Katherine_cheney from comment #8) > > > (In reply to Sam Weinig from comment #6) > > > > Comment on attachment 377623 [details] > > > > Patch > > > > > > > > If these were std::tuples, or a new struct (e.g. struct Foo { unsigned > > > > UITop1; ...unsigned UITop100; };) I think this would be easier to work with. > > > > A bunch of the duplicated code in > > > > ResourceLoadStatisticsDatabaseStore::calculateTelemetryData and > > > > databaseSubmitTopLists could be turned into loops. > > > > > > > > > Thanks for all the feedback! I'm working on changes now. Quick followup: for > > > the std::tuples within the struct, did you envision 5 tuples categorized by > > > top1, top3, top10..., or tuples categorized by subframe, subresource, > > > uniqueRedirects, etc? The cons I see for the topX tuples would be confusion > > > over which index indicates what statistic. The cons for the statistic tuples > > > would be that if there were only, say 4 resources, the top1 and top3 > > > statistics to be logged are split up across all tuples. I wasn't sure which > > > would work best and I wanted your thoughts. > > > > I'm not sure which is the best, but I sketched out a few options and > > attached them: > > > > sketch1.cpp: > > > > This first one just switches to using a std::array<unsigned, 5> for each > > statistic kind allowing us to get rid of much of the repetition, and uses an > > enum + lookup function bucketSize() to represent the bucketing. > > > > > > sketch2.cpp > > > > This second one expands on the first, removing the enum of bucketing sizes, > > and replacing it with a lookup table (which is just a static std::array) > > called bucketSizes. This allows us to remove the explicit if-statements and > > replace them with a loop. The benefit here is that if we wanted to add more > > bucket sizes in the future, we would only have to change > > numberOfBucketsPerStatistic and bucketSizes. > > > > > > sketch3.cpp > > > > > > This third one expands on the second, replacing the explicit listing of all > > the statistics in the PrevalentResourceDatabaseTelemetry, with a an enum and > > a std::array. So now, all the statistics are stored in an 2d-array. This > > requires adding lookup functions for the diagnostic description > > (makeDescription) and the statistic generation (calculateStatistic), but has > > the benefit of making it a compile error if a new type of Statistic is added > > to the enum, but not updated everywhere. > > > > > > > > At this point, it's kind of a complexity vs. length question. sketch2 is the > > shortest, but makes adding new statistics a little tricky. I'd suggest going > > with something akin to sketch3, to future proof it as much as possible. > > > I liked option #3 as well, thanks for helping out so much with this. > > I converted all of the PrevelentResourceDatabaseTelemetry struct values to > Optional<unsigned> so that WTF::nullopt will be returned if any SQL queries > fail (instead of -1). I thought logging a crash on a > Optional<unsigned>.value() call would be more useful than seeing a large > unsigned integer value logged as a statistic. > > I debated using bindInt() for the queries with numbers, but since most of > the queries have to be concatenated with other strings at runtime anyways, I > thought it was less verbose to concatenate the numbers with the strings vs > preparing the statement then binding integers. Does this have security > implications? It's been a long time since I did had to write any sql code, so I'd prefer if someone with more recent knowledge would weigh in. I cc'd Brady Eidson, who wrote a lot of the SQLite in WebCore. While I don't think any of the variables used in concatenation are directly user controllable, SQL Injection attacks (https://en.wikipedia.org/wiki/SQL_injection) always loom large in my head when I see code like this. > > Implementing the test infrastructure is a whole other patch I’ll be working > on next. I manually enabled the ITP database and ran tests locally and they > all ran as expected. I also clicked around a bunch and manually checked the > database telemetry results against the memory store results. Since this only > copies exactly what the memory store already does, I think its ok to > integrate without full testing until the next patch. I think it would probably make sense to hold off on landing this fix until you have some infrastructure for testing in place, unless there is a pressing need to get this landed quickly.

Brent Fulgham

Comment 20 2019-09-03 09:06:49 PDT

(In reply to Sam Weinig from comment #19) > > I think it would probably make sense to hold off on landing this fix until > you have some infrastructure for testing in place, unless there is a > pressing need to get this landed quickly. I think Sam is right, that having testing infrastructure in place would be ideal. However, we are already in a state where the SQL code exists and has to be tested manually. I'm going to have Kate complete this patch, then focus on building out test infrastructure, which was another task I had assigned to her as part of this overall project.

Brady Eidson

Comment 21 2019-09-03 10:12:57 PDT

Comment on attachment 377780 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=377780&action=review There's a *lot* of string concat making these statements. I understand the uses are complicated. But I think it both hurts future maintainability and makes it hard to reason about injections. I wonder if there's a better way to construct all of them. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:531 > + return "domainID, " > + "(cnt1 + cnt2 + cnt3) as sum " > + "FROM (" > + "SELECT " > + "domainID, " > + "COUNT(DISTINCT f.topFrameDomainID) as cnt1, " > + "COUNT(DISTINCT r.topFrameDomainID) as cnt2, " > + "COUNT(DISTINCT toDomainID) as cnt3 " > + "FROM " > + "ObservedDomains o " > + "LEFT JOIN SubframeUnderTopFrameDomains f ON o.domainID = f.subFrameDomainID " > + "LEFT JOIN SubresourceUnderTopFrameDomains r ON o.domainID = r.subresourceDomainID " > + "LEFT JOIN SubresourceUniqueRedirectsTo u ON o.domainID = u.subresourceDomainID " > + "WHERE isPrevalent = 1"; I agree this string is long enough and structured enough to warrant a multi-line literal like this. I think all of the quotes get in the way of readability. And with SQL code especially, they actually might hide a bug with the query itself. I would prefer to see a C++ literal to be more readable and avoid the possible quote bug issue. https://en.cppreference.com/w/cpp/language/string_literal > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:659 > + SQLiteStatement prevalentResourceCountInTop(database, makeString("SELECT COUNT(*) FROM (SELECT * FROM ObservedDomains o INNER JOIN(SELECT ", joinSubStatisticsForSortingPreamble(), " ", joinSubStatisticsForSortingEnding(), ") as q on q.domainID = o.domainID LIMIT ", range, ") as p WHERE p.hadUserInteraction = 1;")); Constructing the query from static compiled strings is fine, but `range` should be a bound parameter. I agree with Sam that there's no obvious way an attacker controls it, but as code changes over time that observation can also change.

Kate Cheney

Comment 22 2019-09-03 15:14:22 PDT

Created attachment 377926 [details] Patch

Kate Cheney

Comment 23 2019-09-03 15:16:14 PDT

I talked with Brady and reduced concatenation by a good amount through binding and re-writing the queries. Also, the style bots will be mad because they do not like the multi-line comment.

Kate Cheney

Comment 24 2019-09-03 15:27:25 PDT

(In reply to Katherine_cheney from comment #23) > I talked with Brady and reduced concatenation by a good amount through > binding and re-writing the queries. Also, the style bots will be mad because > they do not like the multi-line comment. multi-line **string**, not comment

Sam Weinig

Comment 25 2019-09-03 15:59:40 PDT

(In reply to Brent Fulgham from comment #20) > (In reply to Sam Weinig from comment #19) > > > > I think it would probably make sense to hold off on landing this fix until > > you have some infrastructure for testing in place, unless there is a > > pressing need to get this landed quickly. > > I think Sam is right, that having testing infrastructure in place would be > ideal. However, we are already in a state where the SQL code exists and has > to be tested manually. I'm going to have Kate complete this patch, then > focus on building out test infrastructure, which was another task I had > assigned to her as part of this overall project. Can the tests land first? Or is there some urgency here?

Brent Fulgham

Comment 26 2019-09-03 16:08:46 PDT

(In reply to Sam Weinig from comment #25) > (In reply to Brent Fulgham from comment #20) > > (In reply to Sam Weinig from comment #19) > > > > > > I think it would probably make sense to hold off on landing this fix until > > > you have some infrastructure for testing in place, unless there is a > > > pressing need to get this landed quickly. > > > > I think Sam is right, that having testing infrastructure in place would be > > ideal. However, we are already in a state where the SQL code exists and has > > to be tested manually. I'm going to have Kate complete this patch, then > > focus on building out test infrastructure, which was another task I had > > assigned to her as part of this overall project. > > Can the tests land first? Or is there some urgency here? I would like to get this landed as-is. As I said in an earlier comment, I have tasked Kate with addressing the full test story in a follow-up.

Sam Weinig

Comment 27 2019-09-03 16:20:20 PDT

(In reply to Brent Fulgham from comment #26) > (In reply to Sam Weinig from comment #25) > > (In reply to Brent Fulgham from comment #20) > > > (In reply to Sam Weinig from comment #19) > > > > > > > > I think it would probably make sense to hold off on landing this fix until > > > > you have some infrastructure for testing in place, unless there is a > > > > pressing need to get this landed quickly. > > > > > > I think Sam is right, that having testing infrastructure in place would be > > > ideal. However, we are already in a state where the SQL code exists and has > > > to be tested manually. I'm going to have Kate complete this patch, then > > > focus on building out test infrastructure, which was another task I had > > > assigned to her as part of this overall project. > > > > Can the tests land first? Or is there some urgency here? > > I would like to get this landed as-is. As I said in an earlier comment, I > have tasked Kate with addressing the full test story in a follow-up. Sorry to a stickler here, but why land this without tests?

Brent Fulgham

Comment 28 2019-09-03 16:26:46 PDT

(In reply to Sam Weinig from comment #27) > > > Can the tests land first? Or is there some urgency here? > > > > I would like to get this landed as-is. As I said in an earlier comment, I > > have tasked Kate with addressing the full test story in a follow-up. > > Sorry to a stickler here, but why land this without tests? I guess you are concerned that this experimental code will be broken by someone else in the time between this landing, and later in the week when Kate builds infrastructure to test these database routines (along with all of the other experimental db code)? The order of operations doesn't seem important for this.

EWS Watchlist

Comment 29 2019-09-03 16:41:26 PDT

Comment on attachment 377926 [details] Patch Attachment 377926 [details] did not pass win-ews (win): Output: https://webkit-queues.webkit.org/results/12994333 New failing tests: js/dom/customAccessor-defineProperty.html

EWS Watchlist

Comment 30 2019-09-03 16:41:28 PDT

Created attachment 377933 [details] Archive of layout-test-results from ews212 for win-future The attached test failures were seen while running run-webkit-tests on the win-ews. Bot: ews212 Port: win-future Platform: CYGWIN_NT-10.0-17763-3.0.5-338.x86_64-x86_64-64bit

Kate Cheney

Comment 31 2019-09-03 17:26:05 PDT

I will try to have the testing-infrastructure patch up as soon as possible! Also, I am perplexed by the test failures in win... I will look into it.

Chris Dumez

Comment 32 2019-09-03 18:55:09 PDT

(In reply to Katherine_cheney from comment #31) > I will try to have the testing-infrastructure patch up as soon as possible! > Also, I am perplexed by the test failures in win... I will look into it. The win failure is very likely just a flaky test, I would not worry about it.

Sam Weinig

Comment 33 2019-09-06 08:18:11 PDT

Comment on attachment 377926 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=377926&action=review Getting real close. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:516 > +static const String joinSubStatisticsForSorting() This should return either a StringView or const char*. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:566 > + auto[queryStart, queryEnd] = buildQueryStartAndEnd(statistic); space after auto / before the [. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:609 > +static Optional<unsigned> getMedianOfPrevalentResourcesWithUserInteraction(SQLiteDatabase& database, unsigned prevalentResourcesWithUserInteractionCount) > +{ > + SQLiteStatement medianDaysSinceUIStatement = makeMedianWithUIQuery(database); > + unsigned median; > + > + if (medianDaysSinceUIStatement.prepare() == SQLITE_OK) { > + if (medianDaysSinceUIStatement.bindInt(1, 1) != SQLITE_OK > + || medianDaysSinceUIStatement.bindInt(2, (prevalentResourcesWithUserInteractionCount / 2) != SQLITE_OK)) { > + RELEASE_LOG_ERROR(Network, "ResourceLoadStatisticsDatabaseStore::getMedianOfPrevalentResourcesWithUserInteraction, error message: %{public}s", database.lastErrorMsg()); > + ASSERT_NOT_REACHED(); > + return WTF::nullopt; > + } > + if (medianDaysSinceUIStatement.step() == SQLITE_ROW) { > + double rawSeconds = medianDaysSinceUIStatement.getColumnDouble(0); > + WallTime wallTime = WallTime::fromRawSeconds(rawSeconds); > + median = wallTime <= WallTime() ? 0 : std::floor((WallTime::now() - wallTime) / 24_h); > + } > + } > + > + if (prevalentResourcesWithUserInteractionCount & 1) > + return median; > + > + SQLiteStatement lowerMedianDaysSinceUIStatement = makeMedianWithUIQuery(database); > + if (lowerMedianDaysSinceUIStatement.prepare() == SQLITE_OK) { > + if (lowerMedianDaysSinceUIStatement.bindInt(1, 1) != SQLITE_OK > + || lowerMedianDaysSinceUIStatement.bindInt(2, ((prevalentResourcesWithUserInteractionCount - 1) / 2)) != SQLITE_OK) { > + RELEASE_LOG_ERROR(Network, "ResourceLoadStatisticsDatabaseStore::getMedianOfPrevalentResourcesWithUserInteraction, error message: %{public}s", database.lastErrorMsg()); > + ASSERT_NOT_REACHED(); > + return WTF::nullopt; > + } > + if (lowerMedianDaysSinceUIStatement.step() == SQLITE_ROW) { > + double rawSecondsLower = lowerMedianDaysSinceUIStatement.getColumnDouble(0); > + WallTime wallTimeLower = WallTime::fromRawSeconds(rawSecondsLower); > + return ((wallTimeLower <= WallTime() ? 0 : std::floor((WallTime::now() - wallTimeLower) / 24_h)) + median) / 2; > + } > + } > + > + return WTF::nullopt; > +} I think this would read cleaner and avoid the uninitialized unsigned would be something like this: static Optional<unsigned> getMedianOfPrevalentResourcesWithUserInteraction(SQLiteDatabase& database, unsigned prevalentResourcesWithUserInteractionCount) { SQLiteStatement medianDaysSinceUIStatement = makeMedianWithUIQuery(database); // Prepare if (medianDaysSinceUIStatement.prepare() != SQLITE_OK) { RELEASE_LOG_ERROR(Network, "ResourceLoadStatisticsDatabaseStore::getMedianOfPrevalentResourcesWithUserInteraction, error message: %{public}s", database.lastErrorMsg()); ASSERT_NOT_REACHED(); return WTF::nullopt; } // Bind if (medianDaysSinceUIStatement.bindInt(1, 1) != SQLITE_OK || medianDaysSinceUIStatement.bindInt(2, (prevalentResourcesWithUserInteractionCount / 2) != SQLITE_OK)) { RELEASE_LOG_ERROR(Network, "ResourceLoadStatisticsDatabaseStore::getMedianOfPrevalentResourcesWithUserInteraction, error message: %{public}s", database.lastErrorMsg()); ASSERT_NOT_REACHED(); return WTF::nullopt; } // Step if (medianDaysSinceUIStatement.step() != SQLITE_ROW) { RELEASE_LOG_ERROR(Network, "ResourceLoadStatisticsDatabaseStore::getMedianOfPrevalentResourcesWithUserInteraction, error message: %{public}s", database.lastErrorMsg()); ASSERT_NOT_REACHED(); return WTF::nullopt; } double rawSeconds = medianDaysSinceUIStatement.getColumnDouble(0); WallTime wallTime = WallTime::fromRawSeconds(rawSeconds); unsigned median = wallTime <= WallTime() ? 0 : std::floor((WallTime::now() - wallTime) / 24_h); if (prevalentResourcesWithUserInteractionCount & 1) return median; SQLiteStatement lowerMedianDaysSinceUIStatement = makeMedianWithUIQuery(database); // Prepare if (lowerMedianDaysSinceUIStatement.prepare() != SQLITE_OK) { RELEASE_LOG_ERROR(Network, "ResourceLoadStatisticsDatabaseStore::getMedianOfPrevalentResourcesWithUserInteraction, error message: %{public}s", database.lastErrorMsg()); ASSERT_NOT_REACHED(); return WTF::nullopt; } // Bind if (lowerMedianDaysSinceUIStatement.bindInt(1, 1) != SQLITE_OK || lowerMedianDaysSinceUIStatement.bindInt(2, ((prevalentResourcesWithUserInteractionCount - 1) / 2)) != SQLITE_OK) { RELEASE_LOG_ERROR(Network, "ResourceLoadStatisticsDatabaseStore::getMedianOfPrevalentResourcesWithUserInteraction, error message: %{public}s", database.lastErrorMsg()); ASSERT_NOT_REACHED(); return WTF::nullopt; } // Step if (lowerMedianDaysSinceUIStatement.step() != SQLITE_ROW) { RELEASE_LOG_ERROR(Network, "ResourceLoadStatisticsDatabaseStore::getMedianOfPrevalentResourcesWithUserInteraction, error message: %{public}s", database.lastErrorMsg()); ASSERT_NOT_REACHED(); return WTF::nullopt; } double rawSecondsLower = lowerMedianDaysSinceUIStatement.getColumnDouble(0); WallTime wallTimeLower = WallTime::fromRawSeconds(rawSecondsLower); return ((wallTimeLower <= WallTime() ? 0 : std::floor((WallTime::now() - wallTimeLower) / 24_h)) + median) / 2; } What do you think? You would want to use the same pattern in getTopPrevelentResourceDaysSinceUI and getMedianOfPrevalentResourceWithoutUserInteraction. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:642 > +static Optional<unsigned> getMedianOfPrevalentResourceWithoutUserInteraction(SQLiteDatabase& database, int range, PrevalentResourceDatabaseTelemetry::Statistic statistic, unsigned numberOfPrevalentResourcesWithoutUI) I think range and median below should both be unsigned. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:648 > + SQLiteStatement Statistic = makeMedianWithoutUIQuery(database, statistic); Statistic should start with a lowercase s. > Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsTelemetry.cpp:268 > +static String makeDescription(PrevalentResourceDatabaseTelemetry::Statistic statistic) This should return either a StringView or const char*.

Kate Cheney

Comment 34 2019-09-10 14:56:01 PDT

(In reply to Sam Weinig from comment #33) > Comment on attachment 377926 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=377926&action=review > > Getting real close. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:609 > > +static Optional<unsigned> getMedianOfPrevalentResourcesWithUserInteraction(SQLiteDatabase& database, unsigned prevalentResourcesWithUserInteractionCount) > > +{ > > + SQLiteStatement medianDaysSinceUIStatement = makeMedianWithUIQuery(database); > > + unsigned median; > > + > > + if (medianDaysSinceUIStatement.prepare() == SQLITE_OK) { > > + if (medianDaysSinceUIStatement.bindInt(1, 1) != SQLITE_OK > > + || medianDaysSinceUIStatement.bindInt(2, (prevalentResourcesWithUserInteractionCount / 2) != SQLITE_OK)) { > > + RELEASE_LOG_ERROR(Network, "ResourceLoadStatisticsDatabaseStore::getMedianOfPrevalentResourcesWithUserInteraction, error message: %{public}s", database.lastErrorMsg()); > > + ASSERT_NOT_REACHED(); > > + return WTF::nullopt; > > + } > > + if (medianDaysSinceUIStatement.step() == SQLITE_ROW) { > > + double rawSeconds = medianDaysSinceUIStatement.getColumnDouble(0); > > + WallTime wallTime = WallTime::fromRawSeconds(rawSeconds); > > + median = wallTime <= WallTime() ? 0 : std::floor((WallTime::now() - wallTime) / 24_h); > > + } > > + } > > + > > + if (prevalentResourcesWithUserInteractionCount & 1) > > + return median; > > + > > + SQLiteStatement lowerMedianDaysSinceUIStatement = makeMedianWithUIQuery(database); > > + if (lowerMedianDaysSinceUIStatement.prepare() == SQLITE_OK) { > > + if (lowerMedianDaysSinceUIStatement.bindInt(1, 1) != SQLITE_OK > > + || lowerMedianDaysSinceUIStatement.bindInt(2, ((prevalentResourcesWithUserInteractionCount - 1) / 2)) != SQLITE_OK) { > > + RELEASE_LOG_ERROR(Network, "ResourceLoadStatisticsDatabaseStore::getMedianOfPrevalentResourcesWithUserInteraction, error message: %{public}s", database.lastErrorMsg()); > > + ASSERT_NOT_REACHED(); > > + return WTF::nullopt; > > + } > > + if (lowerMedianDaysSinceUIStatement.step() == SQLITE_ROW) { > > + double rawSecondsLower = lowerMedianDaysSinceUIStatement.getColumnDouble(0); > > + WallTime wallTimeLower = WallTime::fromRawSeconds(rawSecondsLower); > > + return ((wallTimeLower <= WallTime() ? 0 : std::floor((WallTime::now() - wallTimeLower) / 24_h)) + median) / 2; > > + } > > + } > > + > > + return WTF::nullopt; > > +} > > I think this would read cleaner and avoid the uninitialized unsigned would > be something like this: > > static Optional<unsigned> > getMedianOfPrevalentResourcesWithUserInteraction(SQLiteDatabase& database, > unsigned prevalentResourcesWithUserInteractionCount) > { > SQLiteStatement medianDaysSinceUIStatement = > makeMedianWithUIQuery(database); > > // Prepare > if (medianDaysSinceUIStatement.prepare() != SQLITE_OK) { > RELEASE_LOG_ERROR(Network, > "ResourceLoadStatisticsDatabaseStore:: > getMedianOfPrevalentResourcesWithUserInteraction, error message: > %{public}s", database.lastErrorMsg()); > ASSERT_NOT_REACHED(); > return WTF::nullopt; > } > > // Bind > if (medianDaysSinceUIStatement.bindInt(1, 1) != SQLITE_OK || > medianDaysSinceUIStatement.bindInt(2, > (prevalentResourcesWithUserInteractionCount / 2) != SQLITE_OK)) { > RELEASE_LOG_ERROR(Network, > "ResourceLoadStatisticsDatabaseStore:: > getMedianOfPrevalentResourcesWithUserInteraction, error message: > %{public}s", database.lastErrorMsg()); > ASSERT_NOT_REACHED(); > return WTF::nullopt; > } > > // Step > if (medianDaysSinceUIStatement.step() != SQLITE_ROW) { > RELEASE_LOG_ERROR(Network, > "ResourceLoadStatisticsDatabaseStore:: > getMedianOfPrevalentResourcesWithUserInteraction, error message: > %{public}s", database.lastErrorMsg()); > ASSERT_NOT_REACHED(); > return WTF::nullopt; > } > > double rawSeconds = medianDaysSinceUIStatement.getColumnDouble(0); > WallTime wallTime = WallTime::fromRawSeconds(rawSeconds); > unsigned median = wallTime <= WallTime() ? 0 : > std::floor((WallTime::now() - wallTime) / 24_h); > > if (prevalentResourcesWithUserInteractionCount & 1) > return median; > > SQLiteStatement lowerMedianDaysSinceUIStatement = > makeMedianWithUIQuery(database); > > // Prepare > if (lowerMedianDaysSinceUIStatement.prepare() != SQLITE_OK) { > RELEASE_LOG_ERROR(Network, > "ResourceLoadStatisticsDatabaseStore:: > getMedianOfPrevalentResourcesWithUserInteraction, error message: > %{public}s", database.lastErrorMsg()); > ASSERT_NOT_REACHED(); > return WTF::nullopt; > } > > // Bind > if (lowerMedianDaysSinceUIStatement.bindInt(1, 1) != SQLITE_OK || > lowerMedianDaysSinceUIStatement.bindInt(2, > ((prevalentResourcesWithUserInteractionCount - 1) / 2)) != SQLITE_OK) { > RELEASE_LOG_ERROR(Network, > "ResourceLoadStatisticsDatabaseStore:: > getMedianOfPrevalentResourcesWithUserInteraction, error message: > %{public}s", database.lastErrorMsg()); > ASSERT_NOT_REACHED(); > return WTF::nullopt; > } > > // Step > if (lowerMedianDaysSinceUIStatement.step() != SQLITE_ROW) { > RELEASE_LOG_ERROR(Network, > "ResourceLoadStatisticsDatabaseStore:: > getMedianOfPrevalentResourcesWithUserInteraction, error message: > %{public}s", database.lastErrorMsg()); > ASSERT_NOT_REACHED(); > return WTF::nullopt; > } > > double rawSecondsLower = > lowerMedianDaysSinceUIStatement.getColumnDouble(0); > WallTime wallTimeLower = WallTime::fromRawSeconds(rawSecondsLower); > return ((wallTimeLower <= WallTime() ? 0 : std::floor((WallTime::now() - > wallTimeLower) / 24_h)) + median) / 2; > } > > What do you think? You would want to use the same pattern in > getTopPrevelentResourceDaysSinceUI and > getMedianOfPrevalentResourceWithoutUserInteraction. > Yes agreed! This looks much cleaner and is easier to understand. Also, I uploaded a patch for ITP Database testing (https://bugs.webkit.org/show_bug.cgi?id=195420).

Kate Cheney

Comment 35 2019-10-03 10:53:51 PDT

Created attachment 380138 [details] Patch

Kate Cheney

Comment 36 2019-10-03 10:54:40 PDT

This patch has testing!! Yay

John Wilander

Comment 37 2019-10-04 15:25:29 PDT

Comment on attachment 380138 [details] Patch LGTM. Thanks Sam and Brandy for all the comments and suggestions here! I'll r+ and maybe you can get Sam or Brady to cq+ it for you?

Kate Cheney

Comment 38 2019-10-04 16:15:24 PDT

(In reply to John Wilander from comment #37) > Comment on attachment 380138 [details] > Patch > > LGTM. Thanks Sam and Brandy for all the comments and suggestions here! I'll > r+ and maybe you can get Sam or Brady to cq+ it for you? Thanks John!

Brent Fulgham

Comment 39 2019-10-08 13:10:27 PDT

Comment on attachment 380138 [details] Patch I think this looks fine. r=me.

WebKit Commit Bot

Comment 40 2019-10-08 13:50:40 PDT

Comment on attachment 380138 [details] Patch Rejecting attachment 380138 [details] from commit-queue. Failed to run "['/Volumes/Data/EWS/WebKit/Tools/Scripts/webkit-patch', '--status-host=webkit-queues.webkit.org', '--bot-id=webkit-cq-01', 'apply-attachment', '--no-update', '--non-interactive', 380138, '--port=mac']" exit_code: 2 cwd: /Volumes/Data/EWS/WebKit Logging in as commit-queue@webkit.org... Fetching: https://bugs.webkit.org/attachment.cgi?id=380138&action=edit Fetching: https://bugs.webkit.org/show_bug.cgi?id=195088&ctype=xml&excludefield=attachmentdata Processing 1 patch from 1 bug. Processing patch 380138 from bug 195088. Fetching: https://bugs.webkit.org/attachment.cgi?id=380138 Failed to run "[u'/Volumes/Data/EWS/WebKit/Tools/Scripts/svn-apply', '--force', '--reviewer', u'John Wilander']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit Parsed 21 diffs from patch file(s). patching file Source/WebKit/ChangeLog Hunk #1 succeeded at 1 with fuzz 3. patching file Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp Hunk #1 FAILED at 176. Hunk #2 succeeded at 218 (offset -2 lines). Hunk #3 succeeded at 257 (offset -2 lines). Hunk #4 FAILED at 369. Hunk #5 succeeded at 623 (offset 9 lines). 2 out of 5 hunks FAILED -- saving rejects to file Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp.rej patching file Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h Hunk #1 succeeded at 47 (offset 1 line). Hunk #2 succeeded at 136 with fuzz 1 (offset 2 lines). Hunk #3 succeeded at 230 (offset -6 lines). patching file Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.cpp Hunk #2 succeeded at 1081 (offset 22 lines). patching file Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h Hunk #1 succeeded at 188 (offset 1 line). patching file Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsTelemetry.cpp patching file Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsTelemetry.h patching file Source/WebKit/NetworkProcess/NetworkSession.cpp patching file Source/WebKit/NetworkProcess/NetworkSession.h patching file Source/WebKit/UIProcess/Network/NetworkProcessProxy.cpp Hunk #1 succeeded at 956 (offset 6 lines). patching file Source/WebKit/UIProcess/Network/NetworkProcessProxy.h Hunk #1 succeeded at 232 (offset 1 line). patching file Source/WebKit/UIProcess/Network/NetworkProcessProxy.messages.in patching file Tools/ChangeLog Hunk #1 succeeded at 1 with fuzz 3. patching file Tools/WebKitTestRunner/InjectedBundle/InjectedBundle.cpp patching file Tools/WebKitTestRunner/InjectedBundle/TestRunner.cpp Hunk #1 succeeded at 2008 (offset 31 lines). patching file Tools/WebKitTestRunner/InjectedBundle/TestRunner.h patching file LayoutTests/ChangeLog Hunk #1 succeeded at 1 with fuzz 3. patching file LayoutTests/http/tests/resourceLoadStatistics/telemetry-generation-advanced-functionality-database-expected.txt patching file LayoutTests/http/tests/resourceLoadStatistics/telemetry-generation-advanced-functionality-database.html patching file LayoutTests/http/tests/resourceLoadStatistics/telemetry-generation-basic-functionality-database-expected.txt patching file LayoutTests/http/tests/resourceLoadStatistics/telemetry-generation-basic-functionality-database.html Failed to run "[u'/Volumes/Data/EWS/WebKit/Tools/Scripts/svn-apply', '--force', '--reviewer', u'John Wilander']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit Full output: https://webkit-queues.webkit.org/results/13107476

Brent Fulgham

Comment 41 2019-10-08 13:53:22 PDT

Bummer. Bitrot! Can you upload a new version and I'll cq+ it asap?

Kate Cheney

Comment 42 2019-10-08 14:47:49 PDT

Created attachment 380468 [details] Patch

WebKit Commit Bot

Comment 43 2019-10-08 15:13:46 PDT

Comment on attachment 380468 [details] Patch Clearing flags on attachment: 380468 Committed r250866: <https://trac.webkit.org/changeset/250866>

WebKit Commit Bot

Comment 44 2019-10-08 15:13:48 PDT

All reviewed patches have been landed. Closing bug.

Andy Estes

Comment 45 2020-06-16 17:05:55 PDT

Looks like the R"" literal added in r250866 is causing update-webkit-localizable-strings to fail. I filed <https://webkit.org/b/213273> REGRESSION (r250866): update-webkit-localizable-strings fails with "mismatched quotes" errors.

Attachments
Patch (41.46 KB, patch) 2019-08-29 11:03 PDT, Kate Cheney	no flags	Details Formatted Diff Diff
Patch (39.27 KB, patch) 2019-08-29 13:49 PDT, Kate Cheney	no flags	Details Formatted Diff Diff
sketch1.cpp (7.03 KB, text/plain) 2019-08-30 08:30 PDT, Sam Weinig	no flags	Details
sketch2.cpp (5.84 KB, text/plain) 2019-08-30 08:31 PDT, Sam Weinig	no flags	Details
sketch3.cpp (5.37 KB, text/plain) 2019-08-30 08:32 PDT, Sam Weinig	no flags	Details
sketch2.cpp (5.60 KB, text/plain) 2019-08-30 08:47 PDT, Sam Weinig	no flags	Details
sketch3.cpp (5.16 KB, text/plain) 2019-08-30 08:47 PDT, Sam Weinig	no flags	Details
Patch (28.91 KB, patch) 2019-08-30 18:32 PDT, Kate Cheney	no flags	Details Formatted Diff Diff
Patch (28.91 KB, patch) 2019-08-30 20:08 PDT, Kate Cheney	no flags	Details Formatted Diff Diff
Patch (30.41 KB, patch) 2019-09-03 15:14 PDT, Kate Cheney	no flags	Details Formatted Diff Diff
Archive of layout-test-results from ews212 for win-future (13.58 MB, application/zip) 2019-09-03 16:41 PDT, EWS Watchlist	no flags	Details
Patch (86.46 KB, patch) 2019-10-03 10:53 PDT, Kate Cheney	no flags	Details Formatted Diff Diff
Patch (87.28 KB, patch) 2019-10-08 14:47 PDT, Kate Cheney	no flags	Details Formatted Diff Diff
Show Obsolete (8) View All Add attachment proposed patch, testcase, etc.

Bug 195088