NEW 239857
ITP data not deleted when cleared via WebsiteDataStore
https://bugs.webkit.org/show_bug.cgi?id=239857
Summary ITP data not deleted when cleared via WebsiteDataStore
bxl75521@anonaddy.me
Reported 2022-04-28 09:38:00 PDT
Created attachment 458530 [details] Bug happening STEPS TO REPRODUCE 1. Visit different websites 2. Go to Settings > Privacy > Clear Personal Data and clear everything 3. Reopen the Clear Personal Data window. SEEN BEHAVIOR All of the entries shown prior to the deletion come back after reopening the window. EXPECTED BEHAVIOR Data being cleared/no entries in the Clear Personal Data window. SOFTWARE VERSION GNOME 41.5 Wayland Fedora 35 Workstation GNOME Web 41.3 ADDITIONAL INFORMATION The only way I found to fix this is clearing the epiphany folder in ~/.local/share and restarting the browser.
Attachments
Bug happening (1.24 MB, video/webm)
2022-04-28 09:38 PDT, bxl75521@anonaddy.me
no flags
John Wilander
Comment 1 2022-04-28 09:58:09 PDT
Thanks for filing! In Apple's view, ITP data is not *website* data. It's web engine data about sites in order to apply privacy protections as the user agent. This makes it more akin to browser history than website data.
bxl75521@anonaddy.me
Comment 2 2022-04-28 10:05:31 PDT
> Thanks for filing! o/ > It's web engine data about sites in order to apply privacy protections as the user agent. This makes it more akin to browser history than website data. Weird. I reported the bug as "Clearing personal data still leaves residual stuff", but the title was changed.
Michael Catanzaro
Comment 3 2022-04-28 11:50:38 PDT
The problem is webkit_website_data_manager_clear() and/or webkit_website_data_manager_remove() is broken for WEBKIT_WEBSITE_DATA_ITP. This is public API.
Radar WebKit Bug Importer
Comment 4 2022-05-05 09:38:15 PDT
Michael Catanzaro
Comment 5 2022-05-13 16:02:49 PDT
(In reply to Michael Catanzaro from comment #3) > The problem is webkit_website_data_manager_clear() and/or > webkit_website_data_manager_remove() is broken for WEBKIT_WEBSITE_DATA_ITP. > This is public API. Oddly, we actually have a test for this, /webkit/WebKitWebsiteData/itp in TestWebsiteData.cpp, and it is passing. The test does check to ensure that webkit_website_data_manager_fetch() returns no results after the deletion. Further investigation required.
bxl75521@anonaddy.me
Comment 6 2022-05-13 22:14:23 PDT
(In reply to Michael Catanzaro from comment #5) > (In reply to Michael Catanzaro from comment #3) > > The problem is webkit_website_data_manager_clear() and/or > > webkit_website_data_manager_remove() is broken for WEBKIT_WEBSITE_DATA_ITP. > > This is public API. > > Oddly, we actually have a test for this, /webkit/WebKitWebsiteData/itp in > TestWebsiteData.cpp, and it is passing. The test does check to ensure that > webkit_website_data_manager_fetch() returns no results after the deletion. > Further investigation required. Weird! I still see the URLs that I visited after clearing cache. Even after restarting the app. The "speed dial" of Web isn't cleared also.
Michael Catanzaro
Comment 7 2022-05-14 11:08:57 PDT
(In reply to bxl75521@anonaddy.me from comment #6) > Weird! I still see the URLs that I visited after clearing cache. Even after > restarting the app. What exactly are you talking about here? Are you still talking about ITP data? I can see that clearing disk cache data works correctly for me. Clearing the ITP data does not work: it just returns when I reopen the preferences dialog. I don't know why the test passes when the dialog does not work in practice. > The "speed dial" of Web isn't cleared also. That's controlled by Epiphany browser history. It's not a WebKit feature and it has nothing to do with your website data or with ITP.
bxl75521@anonaddy.me
Comment 8 2022-05-15 12:34:33 PDT
(In reply to Michael Catanzaro from comment #7) > (In reply to bxl75521@anonaddy.me from comment #6) > > Weird! I still see the URLs that I visited after clearing cache. Even after > > restarting the app. > > What exactly are you talking about here? Are you still talking about ITP > data? I can see that clearing disk cache data works correctly for me. > Clearing the ITP data does not work: it just returns when I reopen the > preferences dialog. I don't know why the test passes when the dialog does > not work in practice. > > > The "speed dial" of Web isn't cleared also. > > That's controlled by Epiphany browser history. It's not a WebKit feature and > it has nothing to do with your website data or with ITP. I posted this issue in the Epiphany repo, but Catanzaro told me to post the issue here. The title of this bug was changed, it wasn't be who wrote "ITP data not deleted when cleared via WebsiteDataStore".
bxl75521@anonaddy.me
Comment 9 2022-05-15 12:34:57 PDT
However, the title of this bug was changed*
Michael Catanzaro
Comment 10 2022-05-15 12:44:07 PDT
I changed the title of this bug to match your initial complaint. You were complaining about a broken user interface feature in Epiphany, and I changed the title to indicate the public API that is broken. But now in comment #6 you seem to be additionally complaining about additional stuff unrelated to the original complaint about ITP, leaving me confused. Epiphany's history feature has nothing to do with clearing ITP data, disk cache, or any other WebKit website data.
bxl75521@anonaddy.me
Comment 11 2022-05-15 14:03:15 PDT
The issue is that history isn't being cleared. However, I'm not sure if this is specific to some users. Does the issue happen for you?
Michael Catanzaro
Comment 12 2022-05-15 16:27:27 PDT
If you have an issue with Epiphany history, please go back to Epiphany's issue tracker. WebKit does not manage your browser history. I can see from the video you attached on the Epiphany issue tracker that you uncovered a bug with WebKit's Intelligent Tracking Prevention feature not properly removing ITP data when the webkit_website_data_manager_clear() function is called with WebKitWebsiteDataType WEBKIT_WEBSITE_DATA_ITP. That's what we'll track here in this issue.
Michael Catanzaro
Comment 13 2022-05-15 19:10:41 PDT
If I do remove operations (remove ITP data for a particular domain, implemented in ResourceLoadStatisticsDatabaseStore::removeDataForDomain), the data does appear to get deleted properly. Only the clear operations (remove all ITP data for all domains, implemented in ResourceLoadStatisticsDatabaseStore::clear) seem to fail. But: if I do a remove operation followed by a clear operation, then it seems the clear operation somehow actually restores the ITP data that was previously removed! That's wild. Taking a very quick look at ResourceLoadStatisticsDatabaseStore::clear, it winds up calling SQLiteDatabase::clearAllTables, which seems unlikely to fail. Perhaps the database is somehow getting repopulated using cached data? Or maybe the fetch operations are returning data that is no longer in the database? Requires further investigation. Jon, since this is your area of expertise, I wonder if you can reproduce any of this behavior?
John Wilander
Comment 14 2022-05-15 21:22:03 PDT
Cc’ing Kate who knows the DB code well.
Kate Cheney
Comment 15 2022-05-16 10:31:30 PDT
I can't repro on macOS with Safari. Preferences > Privacy > Manage Website Data > Remove All seems to work as expected (the ITP database is emptied out properly). If a page is open and dynamically loading new 3rd parties, it is possible that something you deleted previously would appear immediately again in the database because a new resource is loaded. However this would not explain a large scale occurrence of deleted website data re-appearing in the database. I'm curious to know if this bug is caused because: (1) We explicitly add domains back to the ITP database after clearing (2) The clearing fails and the domains never were deleted (3) We add the domains back in-memory somewhere when returning the ITP data, but they aren't re-inserted in the database. Knowing more about how you're confirming that the data re-appears would probably be helpful. We can at least eliminate (3) if re-starting the session still shows the previously-cleared data appearing. (1) and (2) will be harder to test without some logging or deeper inspection of the database file itself.
Michael Catanzaro
Comment 16 2022-05-16 11:06:36 PDT
(In reply to Kate Cheney from comment #15) > Knowing more about how you're confirming that the data re-appears would > probably be helpful. We can at least eliminate (3) if re-starting the > session still shows the previously-cleared data appearing. This dialog displays the result of webkit_website_data_manager_fetch(). When describing the behavior above, I didn't even restart anything. Epiphany would have just done a new fetch() operation on the existing WebKitWebsiteDataManager (the WPE/GTK API wrapper around WebsiteDataStore). > (1) and (2) will be harder to test without some logging or deeper inspection > of the database file itself. I'll try to look a little deeper....
Kate Cheney
Comment 17 2022-05-16 11:36:45 PDT
(In reply to Michael Catanzaro from comment #16) > (In reply to Kate Cheney from comment #15) > > Knowing more about how you're confirming that the data re-appears would > > probably be helpful. We can at least eliminate (3) if re-starting the > > session still shows the previously-cleared data appearing. > > This dialog displays the result of webkit_website_data_manager_fetch(). > > When describing the behavior above, I didn't even restart anything. Epiphany > would have just done a new fetch() operation on the existing > WebKitWebsiteDataManager (the WPE/GTK API wrapper around WebsiteDataStore). > My thought here was that any in-memory storage would be cleared after a session restart. If we restart and still see the cleared data appearing, then we know that domains are in the ITP database (either never were cleared or were re-inserted) and not just hanging around in memory somewhere. Might give us some additional info. > > (1) and (2) will be harder to test without some logging or deeper inspection > > of the database file itself. > > I'll try to look a little deeper.... Looking for calls to ResourceLoadStatisticsDatabaseStore::insertObservedDomain might be useful here. That's how we enter new domains into the database. If we see that being called for domains previously deleted, we know that ITP is re-inserting some deleted domains.
Michael Catanzaro
Comment 18 2022-05-16 14:28:11 PDT
(In reply to Kate Cheney from comment #17) > My thought here was that any in-memory storage would be cleared after a > session restart. If we restart and still see the cleared data appearing, > then we know that domains are in the ITP database (either never were cleared > or were re-inserted) and not just hanging around in memory somewhere. Might > give us some additional info. It persists across restarts. > Looking for calls to > ResourceLoadStatisticsDatabaseStore::insertObservedDomain might be useful > here. That's how we enter new domains into the database. If we see that > being called for domains previously deleted, we know that ITP is > re-inserting some deleted domains. Good guess. I see lots of calls to this ResourceLoadStatisticsDatabaseStore::insertObservedDomain when doing a clear (webkit_website_data_manager_clear()/ResourceLoadStatisticsDatabaseStore::clear) operation. If I just do individual removes (webkit_website_data_manager_remove()/ResourceLoadStatisticsDatabaseStore::removeDataForDomain), and then restart without doing a clear, then they do stay permanently removed. So we're dealing with theory (1) We explicitly add domains back to the ITP database after clearing (but not after removeDataForDomain). Next step: I'll try to figure out why ResourceLoadStatisticsDatabaseStore::insertObservedDomain is being called.
Michael Catanzaro
Comment 19 2022-05-27 12:13:05 PDT
(In reply to Michael Catanzaro from comment #18) > Next step: I'll try to figure out why > ResourceLoadStatisticsDatabaseStore::insertObservedDomain is being called. This is unexpectedly difficult. The problem is if I try to attach to the network process with gdb, it takes so long to attach that WebKit decides the network process is lost and spawns a new one.
Chris Dumez
Comment 20 2022-05-27 12:19:51 PDT
(In reply to Michael Catanzaro from comment #19) > (In reply to Michael Catanzaro from comment #18) > > Next step: I'll try to figure out why > > ResourceLoadStatisticsDatabaseStore::insertObservedDomain is being called. > > This is unexpectedly difficult. The problem is if I try to attach to the > network process with gdb, it takes so long to attach that WebKit decides the > network process is lost and spawns a new one. You may need to update `AuxiliaryProcessProxy::mayBecomeUnresponsive()` to return false and then again. AuxiliaryProcessProxy::mayBecomeUnresponsive() may not be properly detecting that the process is being debugged.
Michael Catanzaro
Comment 21 2022-05-27 14:47:44 PDT
Very interesting. I'm not quite sure how to detect whether another process is being debugged on Linux, though. Attaching with ptrace to see whether that fails would be a pretty heavy hammer, and even that won't work nowadays because it requires root. (Detecting whether the *current* process is being debugged is easy, but that's not what you're doing in AuxiliaryProcessProxy::platformIsBeingDebugged.) Anyway, I'll just sabotage that locally for now... thanks for the tip.
Michael Catanzaro
Comment 22 2022-05-27 15:29:08 PDT
(In reply to Michael Catanzaro from comment #21) > Anyway, I'll just sabotage > that locally for now... thanks for the tip. Sabotaging AuxiliaryProcessProxy::mayBecomeUnresponsive to always return true is not enough. Evidently the auxiliary process gets killed from somewhere else. Maybe I can figure this out using WTFLogAlways instead.
Kate Cheney
Comment 23 2022-05-27 15:30:59 PDT
(In reply to Michael Catanzaro from comment #22) > (In reply to Michael Catanzaro from comment #21) > > Anyway, I'll just sabotage > > that locally for now... thanks for the tip. > > Sabotaging AuxiliaryProcessProxy::mayBecomeUnresponsive to always return > true is not enough. Evidently the auxiliary process gets killed from > somewhere else. Maybe I can figure this out using WTFLogAlways instead. Seems like a good place for WTFReportBacktrace() to figure out where insertObservedDomain is being initiated from.
Michael Catanzaro
Comment 24 2022-05-27 16:16:02 PDT
(In reply to Kate Cheney from comment #23) > Seems like a good place for WTFReportBacktrace() to figure out where > insertObservedDomain is being initiated from. That doesn't work anymore since bug #181916, but perhaps I should just sabotage that too. Another option: add a counter variable to crash after the function is called n times, or better, only after ResourceLoadStatisticsDatabaseStore::insertObservedDomain is called first. Will try these next week.
Michael Catanzaro
Comment 25 2022-06-03 07:19:23 PDT
I wound up changing ResourceLoadStatisticsDatabaseStore::insertObservedDomain to crash if called after ResourceLoadStatisticsDatabaseStore::clear has been called. The backtrace looks like this: #0 0x00007f177966d88e in WTFCrash () at /home/mcatanzaro/Projects/WebKit/Source/WTF/wtf/Assertions.cpp:322 #1 0x00007f177b2d2ee5 in WTFCrashWithInfo () at /home/mcatanzaro/Projects/WebKit/WebKitBuild/GNOME-gtk3/WTF/Headers/wtf/Assertions.h:748 #2 WebKit::ResourceLoadStatisticsDatabaseStore::insertObservedDomain (this=this@entry=0x7f17312f8000, loadStatistics=...) at /home/mcatanzaro/Projects/WebKit/Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:705 #3 0x00007f177b2d9b42 in WebKit::ResourceLoadStatisticsDatabaseStore::ensureResourceStatisticsForRegistrableDomain ( this=0x7f17312f8000, domain=...) at /home/mcatanzaro/Projects/WebKit/Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:1978 #4 0x00007f177b2dfe7a in WebKit::ResourceLoadStatisticsDatabaseStore::grandfatherDataForDomains ( this=this@entry=0x7f17312f8000, domains=...) at /home/mcatanzaro/Projects/WebKit/Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:1368 #5 0x00007f177b2e9228 in WebKit::ResourceLoadStatisticsDatabaseStore::grandfatherDataForDomains (domains=..., this=0x7f17312f8000) at /home/mcatanzaro/Projects/WebKit/Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsStore.cpp:240 #6 operator() (__closure=0x7f17180e8cc8) at /home/mcatanzaro/Projects/WebKit/Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsStore.cpp:244 #7 WTF::Detail::CallableWrapper<WebKit::ResourceLoadStatisticsStore::grandfatherExistingWebsiteData(WTF::CompletionHandler<void()>&&)::<lambda()> mutable::<lambda(WTF::HashSet<WebCore::RegistrableDomain>&&)> mutable::<lambda()>, void>::call(void) (this=0x7f17180e8cc0) at /home/mcatanzaro/Projects/WebKit/WebKitBuild/GNOME-gtk3/WTF/Headers/wtf/Function.h:53 #8 0x00007f177969430e in WTF::Function<void ()>::operator()() const (this=<synthetic pointer>) at /home/mcatanzaro/Projects/WebKit/Source/WTF/wtf/Function.h:79 #9 WTF::RunLoop::performWork (this=0x7f1772f99000) at /home/mcatanzaro/Projects/WebKit/Source/WTF/wtf/RunLoop.cpp:133 It's coming from the lambdas in WebResourceLoadStatisticsStore::scheduleClearInMemoryAndPersistent. I don't know how exactly grandfathering is supposed to work, but it seems to be grandfathering the data that is supposed to be cleared.
Kate Cheney
Comment 26 2022-06-03 09:59:09 PDT
(In reply to Michael Catanzaro from comment #25) > I wound up changing > ResourceLoadStatisticsDatabaseStore::insertObservedDomain to crash if called > after ResourceLoadStatisticsDatabaseStore::clear has been called. The > backtrace looks like this: > Nice! > #0 0x00007f177966d88e in WTFCrash () at > /home/mcatanzaro/Projects/WebKit/Source/WTF/wtf/Assertions.cpp:322 > #1 0x00007f177b2d2ee5 in WTFCrashWithInfo () > at > /home/mcatanzaro/Projects/WebKit/WebKitBuild/GNOME-gtk3/WTF/Headers/wtf/ > Assertions.h:748 > #2 WebKit::ResourceLoadStatisticsDatabaseStore::insertObservedDomain > (this=this@entry=0x7f17312f8000, > loadStatistics=...) > at > /home/mcatanzaro/Projects/WebKit/Source/WebKit/NetworkProcess/Classifier/ > ResourceLoadStatisticsDatabaseStore.cpp:705 > #3 0x00007f177b2d9b42 in > WebKit::ResourceLoadStatisticsDatabaseStore:: > ensureResourceStatisticsForRegistrableDomain ( > this=0x7f17312f8000, domain=...) > at > /home/mcatanzaro/Projects/WebKit/Source/WebKit/NetworkProcess/Classifier/ > ResourceLoadStatisticsDatabaseStore.cpp:1978 > #4 0x00007f177b2dfe7a in > WebKit::ResourceLoadStatisticsDatabaseStore::grandfatherDataForDomains ( > this=this@entry=0x7f17312f8000, domains=...) > at > /home/mcatanzaro/Projects/WebKit/Source/WebKit/NetworkProcess/Classifier/ > ResourceLoadStatisticsDatabaseStore.cpp:1368 > #5 0x00007f177b2e9228 in > WebKit::ResourceLoadStatisticsDatabaseStore::grandfatherDataForDomains > (domains=..., > this=0x7f17312f8000) > at > /home/mcatanzaro/Projects/WebKit/Source/WebKit/NetworkProcess/Classifier/ > ResourceLoadStatisticsStore.cpp:240 > #6 operator() (__closure=0x7f17180e8cc8) > at > /home/mcatanzaro/Projects/WebKit/Source/WebKit/NetworkProcess/Classifier/ > ResourceLoadStatisticsStore.cpp:244 > #7 > WTF::Detail::CallableWrapper<WebKit::ResourceLoadStatisticsStore:: > grandfatherExistingWebsiteData(WTF::CompletionHandler<void()>&&)::<lambda()> > mutable::<lambda(WTF::HashSet<WebCore::RegistrableDomain>&&)> > mutable::<lambda()>, void>::call(void) (this=0x7f17180e8cc0) > at > /home/mcatanzaro/Projects/WebKit/WebKitBuild/GNOME-gtk3/WTF/Headers/wtf/ > Function.h:53 > #8 0x00007f177969430e in WTF::Function<void ()>::operator()() const > (this=<synthetic pointer>) > at /home/mcatanzaro/Projects/WebKit/Source/WTF/wtf/Function.h:79 > #9 WTF::RunLoop::performWork (this=0x7f1772f99000) at > /home/mcatanzaro/Projects/WebKit/Source/WTF/wtf/RunLoop.cpp:133 > > It's coming from the lambdas in > WebResourceLoadStatisticsStore::scheduleClearInMemoryAndPersistent. I don't > know how exactly grandfathering is supposed to work, but it seems to be > grandfathering the data that is supposed to be cleared. It looks like we grandfather data in NetworkProcess::deleteWebsiteData if the specified website data types to delete do not contain all of ITP's monitored data types. I'm guessing that WEBKIT_WEBSITE_DATA_ITP is not translating to all the data types tracked in WebResourceLoadStatistics::monitoredDataTypes(). If there are things in that list that are port/platform specific we could add some flags, or adjust the types in webkit_website_data_manager_remove() to contain all ITP monitored types.
Michael Catanzaro
Comment 27 2022-06-03 13:33:36 PDT
(In reply to Kate Cheney from comment #26) > It looks like we grandfather data in NetworkProcess::deleteWebsiteData if > the specified website data types to delete do not contain all of ITP's > monitored data types. I'm guessing that WEBKIT_WEBSITE_DATA_ITP is not > translating to all the data types tracked in > WebResourceLoadStatistics::monitoredDataTypes(). WEBKIT_WEBSITE_DATA_ITP is not a list of data types to delete. It's just WebsiteDataType::ResourceLoadStatistics. > If there are things in that > list that are port/platform specific we could add some flags, or adjust the > types in webkit_website_data_manager_remove() to contain all ITP monitored > types. I don't think anything is platform-specific here. WebsiteDataType is a flags enum that specifies which website data types to delete. WebsiteDataStore allows you to delete all types at once, or only one type of data, or any combination of data types. In this case, we're deleting only the WebsiteDataType::ResourceLoadStatistics. WebsiteDataStore's WebsiteDataTypes are expected to be independent.
Note You need to log in before you can comment on or make changes to this bug.