Bug 204458

Summary: ITP Database crashes if table schema is not correct
Product: WebKit Reporter: Kate Cheney <katherine_cheney>
Component: WebKit Misc.Assignee: Kate Cheney <katherine_cheney>
Status: RESOLVED FIXED    
Severity: Normal CC: bfulgham, cdumez, commit-queue, jbedard, webkit-bug-importer, wilander
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch none

Description Kate Cheney 2019-11-21 09:50:38 PST
Started seeing this after TopFrameLoadedThirdPartyScripts was added. I had been using the ITP database before the update, so the schema wasn't up to date. Trying to update/check the new table caused crashes.
Comment 1 Radar WebKit Bug Importer 2019-11-21 10:20:39 PST
<rdar://problem/57399084>
Comment 2 Kate Cheney 2019-11-21 10:36:51 PST
Created attachment 384068 [details]
Patch
Comment 3 Brent Fulgham 2019-11-21 11:26:27 PST
Comment on attachment 384068 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=384068&action=review

> Source/WebKit/ChangeLog:12
> +        any manual changes to the database file.

How does this relate to the "migration" task (see Bug 195294)

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:305
> +    };

It's a shame we need this array of strings in addition to the schema creation string (createObservedDomain) and the enum of column ID's for each of these. I wonder if there is a way to consolidate this information somehow.

At the very least, I feel like this array of strings should live very close to the table creation string to make it harder to add something in one place and not the other.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:308
> +        SQLiteStatement statement(m_database, "SELECT 1 from sqlite_master WHERE type='table' and tbl_name=?");

Can this be done more efficiently?

For example, could we do something like "SELECT COUNT(*) FROM sqlite_master WHERE type='table' and tbl_name IN (... the list of tables)" or something?

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:326
> +        FileSystem::deleteFile(m_storageDirectoryPath);

We should add a FIXME(195294): Perform migration here.
Comment 4 Kate Cheney 2019-11-21 13:11:39 PST
Comment on attachment 384068 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=384068&action=review

>> Source/WebKit/ChangeLog:12
>> +        any manual changes to the database file.
> 
> How does this relate to the "migration" task (see Bug 195294)

Maybe they should be one and the same, but I'm wondering if there might be more to do for the entire versioning/upgrade path radar.

>> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:305
>> +    };
> 
> It's a shame we need this array of strings in addition to the schema creation string (createObservedDomain) and the enum of column ID's for each of these. I wonder if there is a way to consolidate this information somehow.
> 
> At the very least, I feel like this array of strings should live very close to the table creation string to make it harder to add something in one place and not the other.

We could store the table names as keys in a map, and the values could be the CREATE TABLE queries. Then both isCorrectTableSchema() and createSchema() could iterate through the map to get the tableNames/create table queries. And if someone added a new table, it would only have to be added once to the map.

>> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:308
>> +        SQLiteStatement statement(m_database, "SELECT 1 from sqlite_master WHERE type='table' and tbl_name=?");
> 
> Can this be done more efficiently?
> 
> For example, could we do something like "SELECT COUNT(*) FROM sqlite_master WHERE type='table' and tbl_name IN (... the list of tables)" or something?

agreed, I think this will work!

>> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:326
>> +        FileSystem::deleteFile(m_storageDirectoryPath);
> 
> We should add a FIXME(195294): Perform migration here.

By migration, I am assuming you mean migrating existing data over instead of deleting the whole file. How important is that/should it be implemented now instead of in the future? I was basing this off what we did for the change in the schema of ObservedDomains where we deleted the entire file.
Comment 5 Kate Cheney 2019-11-21 13:23:32 PST
(In reply to katherine_cheney from comment #4)
> Comment on attachment 384068 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=384068&action=review
> 
> >> Source/WebKit/ChangeLog:12
> >> +        any manual changes to the database file.
> > 
> > How does this relate to the "migration" task (see Bug 195294)
> 
> Maybe they should be one and the same, but I'm wondering if there might be
> more to do for the entire versioning/upgrade path radar.
> 

I talked with Brady about this a bit. His take was that we should check for schema compatibility as needed when we make upgrades, and not worry about protecting against users changing the schema. Is there anything specific you had in mind for addressing bug 195294?
Comment 6 Kate Cheney 2019-11-21 17:25:28 PST
Created attachment 384108 [details]
Patch
Comment 7 Kate Cheney 2019-11-22 08:02:00 PST
Per what we talked about Brent, I kept the iteration through the table list as opposed to efficient SQLite list lookup in order to future-proof this for when we merge data when the schema is updated. The iteration tells us which table(s) are missing, so we can add the new table(s) and migrate the data instead of deleting the whole file.
Comment 8 Brent Fulgham 2019-11-22 08:51:49 PST
Comment on attachment 384108 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=384108&action=review

I think this is close, but not quite correct.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:316
> +            || statement.bindText(1, table) != SQLITE_OK) {

I think this would be more efficient if you created the SQLiteStatement outside the loop, did the prepare outside the loop, and did the 'bindText' inside the loop for each table name.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:321
> +            return false;

I'm not sure we should return if we don't receive a result here. Don't we want to log the missing table name, and then keep going to see what other tables are missing? This would be helpful when trying to debug a problem.

It would also give us the ability to perform fix-up (in the future) if we want to add tables to the Schema.

So maybe have a boolean that you set to 'true' if you find any missing tables, then return that value after completing the loop?

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:323
> +        resetStatement(statement);

If you leave the object construction inside the loop, you don't need this reset because the statement is fully reconstructed at each loop iteration.

However, I think you should move construction outside the loop (as I suggest above).
Comment 9 Kate Cheney 2019-11-22 09:24:17 PST
Created attachment 384159 [details]
Patch
Comment 10 Brent Fulgham 2019-11-22 09:26:46 PST
Comment on attachment 384159 [details]
Patch

Looks great! r=me
Comment 11 Kate Cheney 2019-11-22 09:27:20 PST
thanks for feedback!
Comment 12 WebKit Commit Bot 2019-11-22 10:41:50 PST
Comment on attachment 384159 [details]
Patch

Rejecting attachment 384159 [details] from commit-queue.

Failed to run "['/Volumes/Data/EWS/WebKit/Tools/Scripts/webkit-patch', '--status-host=webkit-queues.webkit.org', '--bot-id=webkit-cq-03', 'land-attachment', '--force-clean', '--non-interactive', '--parent-command=commit-queue', 384159, '--port=mac']" exit_code: 2 cwd: /Volumes/Data/EWS/WebKit

Logging in as commit-queue@webkit.org...
Fetching: https://bugs.webkit.org/attachment.cgi?id=384159&action=edit
Fetching: https://bugs.webkit.org/show_bug.cgi?id=204458&ctype=xml&excludefield=attachmentdata
Processing 1 patch from 1 bug.
Updating working directory
Processing patch 384159 from bug 204458.
Fetching: https://bugs.webkit.org/attachment.cgi?id=384159
Failed to run "['git', 'svn', 'dcommit', '--rmdir']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit

Committing to http://svn.webkit.org/repository/webkit/trunk ...
	M	Source/WebKit/ChangeLog

ERROR from SVN:
Item is out of date: File '/trunk/Source/WebKit/ChangeLog' is out of date
W: 4d4568ee3653af4484cc8f1056bc35ba3fa3baa6 and refs/remotes/origin/master differ, using rebase:
:040000 040000 9bd957a0828fca3030d43ba6ed811003e8a814b3 5cdd1523b9652842641aa3fd0a384ef01e2728e9 M	Source
Current branch master is up to date.
ERROR: Not all changes have been committed into SVN, however the committed
ones (if any) seem to be successfully integrated into the working tree.
Please see the above messages for details.


Failed to run "['git', 'svn', 'dcommit', '--rmdir']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit

Committing to http://svn.webkit.org/repository/webkit/trunk ...
	M	Source/WebKit/ChangeLog

ERROR from SVN:
Item is out of date: File '/trunk/Source/WebKit/ChangeLog' is out of date
W: 4d4568ee3653af4484cc8f1056bc35ba3fa3baa6 and refs/remotes/origin/master differ, using rebase:
:040000 040000 9bd957a0828fca3030d43ba6ed811003e8a814b3 5cdd1523b9652842641aa3fd0a384ef01e2728e9 M	Source
Current branch master is up to date.
ERROR: Not all changes have been committed into SVN, however the committed
ones (if any) seem to be successfully integrated into the working tree.
Please see the above messages for details.


Failed to run "['git', 'svn', 'dcommit', '--rmdir']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit
Updating OpenSource
Current branch master is up to date.

Full output: https://webkit-queues.webkit.org/results/13273426
Comment 13 Kate Cheney 2019-11-22 14:13:51 PST
Created attachment 384194 [details]
Patch
Comment 14 Kate Cheney 2019-11-22 14:15:20 PST
did a rebase and uploaded again, hopefully won't get rejected this time
Comment 15 WebKit Commit Bot 2019-11-22 15:50:32 PST
Comment on attachment 384194 [details]
Patch

Rejecting attachment 384194 [details] from commit-queue.

Failed to run "['/Volumes/Data/EWS/WebKit/Tools/Scripts/webkit-patch', '--status-host=webkit-queues.webkit.org', '--bot-id=webkit-cq-03', 'land-attachment', '--force-clean', '--non-interactive', '--parent-command=commit-queue', 384194, '--port=mac']" exit_code: 2 cwd: /Volumes/Data/EWS/WebKit

Logging in as commit-queue@webkit.org...
Fetching: https://bugs.webkit.org/attachment.cgi?id=384194&action=edit
Fetching: https://bugs.webkit.org/show_bug.cgi?id=204458&ctype=xml&excludefield=attachmentdata
Processing 1 patch from 1 bug.
Updating working directory
Processing patch 384194 from bug 204458.
Fetching: https://bugs.webkit.org/attachment.cgi?id=384194
Failed to run "['git', 'svn', 'dcommit', '--rmdir']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit

Committing to http://svn.webkit.org/repository/webkit/trunk ...
	M	Source/WebKit/ChangeLog

ERROR from SVN:
Item is out of date: File '/trunk/Source/WebKit/ChangeLog' is out of date
W: e2beed0d693954c84229efc494ae93b83b9f76e2 and refs/remotes/origin/master differ, using rebase:
:040000 040000 c1ecb8dd1ba531471983bf49dd21d7046e9032e9 7e1e0dbc7e9860f75ee0932dacd3880b3869e235 M	Source
Current branch master is up to date.
ERROR: Not all changes have been committed into SVN, however the committed
ones (if any) seem to be successfully integrated into the working tree.
Please see the above messages for details.


Failed to run "['git', 'svn', 'dcommit', '--rmdir']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit

Committing to http://svn.webkit.org/repository/webkit/trunk ...
	M	Source/WebKit/ChangeLog

ERROR from SVN:
Item is out of date: File '/trunk/Source/WebKit/ChangeLog' is out of date
W: e2beed0d693954c84229efc494ae93b83b9f76e2 and refs/remotes/origin/master differ, using rebase:
:040000 040000 c1ecb8dd1ba531471983bf49dd21d7046e9032e9 7e1e0dbc7e9860f75ee0932dacd3880b3869e235 M	Source
Current branch master is up to date.
ERROR: Not all changes have been committed into SVN, however the committed
ones (if any) seem to be successfully integrated into the working tree.
Please see the above messages for details.


Failed to run "['git', 'svn', 'dcommit', '--rmdir']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit
Updating OpenSource
From https://git.webkit.org/git/WebKit
   d38e64630e4..30cbb8ab8be  master     -> origin/master
Partial-rebuilding .git/svn/refs/remotes/origin/master/.rev_map.268f45cc-cd09-0410-ab3c-d52691b4dbfc ...
Currently at 252807 = d38e64630e4fe9923769b6bf2662775f5a6ddc8f
r252808 = 283deeb7fc56f6430f60dc3a067cbb44ec92901a
r252809 = 0df8c4888359f35f63e5c215004d8f15247be6a4
r252810 = 30cbb8ab8be602f24ef56dd047a591165e0797e0
Done rebuilding .git/svn/refs/remotes/origin/master/.rev_map.268f45cc-cd09-0410-ab3c-d52691b4dbfc
First, rewinding head to replay your work on top of it...
Fast-forwarded master to refs/remotes/origin/master.

Full output: https://webkit-queues.webkit.org/results/13273966
Comment 16 WebKit Commit Bot 2019-11-22 16:42:35 PST
Comment on attachment 384194 [details]
Patch

Clearing flags on attachment: 384194

Committed r252817: <https://trac.webkit.org/changeset/252817>
Comment 17 WebKit Commit Bot 2019-11-22 16:42:37 PST
All reviewed patches have been landed.  Closing bug.