Bug 239857 - ITP data not deleted when cleared via WebsiteDataStore
Summary: ITP data not deleted when cleared via WebsiteDataStore
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit2 (show other bugs)
Version: Other
Hardware: Unspecified Linux
: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2022-04-28 09:38 PDT by bxl75521@anonaddy.me
Modified: 2022-06-03 13:33 PDT (History)
7 users (show)

See Also:


Attachments
Bug happening (1.24 MB, video/webm)
2022-04-28 09:38 PDT, bxl75521@anonaddy.me
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description bxl75521@anonaddy.me 2022-04-28 09:38:00 PDT
Created attachment 458530 [details]
Bug happening

STEPS TO REPRODUCE
1. Visit different websites
2. Go to Settings > Privacy > Clear Personal Data and clear everything
3. Reopen the Clear Personal Data window.

SEEN BEHAVIOR
All of the entries shown prior to the deletion come back after reopening the window.

EXPECTED BEHAVIOR
Data being cleared/no entries in the Clear Personal Data window.

SOFTWARE VERSION
GNOME 41.5 Wayland
Fedora 35 Workstation
GNOME Web 41.3

ADDITIONAL INFORMATION
The only way I found to fix this is clearing the epiphany folder in ~/.local/share and restarting the browser.
Comment 1 John Wilander 2022-04-28 09:58:09 PDT
Thanks for filing! In Apple's view, ITP data is not *website* data. It's web engine data about sites in order to apply privacy protections as the user agent. This makes it more akin to browser history than website data.
Comment 2 bxl75521@anonaddy.me 2022-04-28 10:05:31 PDT
> Thanks for filing!

o/

> It's web engine data about sites in order to apply privacy protections as the user agent. This makes it more akin to browser history than website data.

Weird. I reported the bug as "Clearing personal data still leaves residual stuff", but the title was changed.
Comment 3 Michael Catanzaro 2022-04-28 11:50:38 PDT
The problem is webkit_website_data_manager_clear() and/or webkit_website_data_manager_remove() is broken for WEBKIT_WEBSITE_DATA_ITP. This is public API.
Comment 4 Radar WebKit Bug Importer 2022-05-05 09:38:15 PDT
<rdar://problem/92801888>
Comment 5 Michael Catanzaro 2022-05-13 16:02:49 PDT
(In reply to Michael Catanzaro from comment #3)
> The problem is webkit_website_data_manager_clear() and/or
> webkit_website_data_manager_remove() is broken for WEBKIT_WEBSITE_DATA_ITP.
> This is public API.

Oddly, we actually have a test for this, /webkit/WebKitWebsiteData/itp in TestWebsiteData.cpp, and it is passing. The test does check to ensure that webkit_website_data_manager_fetch() returns no results after the deletion. Further investigation required.
Comment 6 bxl75521@anonaddy.me 2022-05-13 22:14:23 PDT
(In reply to Michael Catanzaro from comment #5)
> (In reply to Michael Catanzaro from comment #3)
> > The problem is webkit_website_data_manager_clear() and/or
> > webkit_website_data_manager_remove() is broken for WEBKIT_WEBSITE_DATA_ITP.
> > This is public API.
> 
> Oddly, we actually have a test for this, /webkit/WebKitWebsiteData/itp in
> TestWebsiteData.cpp, and it is passing. The test does check to ensure that
> webkit_website_data_manager_fetch() returns no results after the deletion.
> Further investigation required.

Weird! I still see the URLs that I visited after clearing cache. Even after restarting the app. The "speed dial" of Web isn't cleared also.
Comment 7 Michael Catanzaro 2022-05-14 11:08:57 PDT
(In reply to bxl75521@anonaddy.me from comment #6)
> Weird! I still see the URLs that I visited after clearing cache. Even after
> restarting the app.

What exactly are you talking about here? Are you still talking about ITP data? I can see that clearing disk cache data works correctly for me. Clearing the ITP data does not work: it just returns when I reopen the preferences dialog. I don't know why the test passes when the dialog does not work in practice.

> The "speed dial" of Web isn't cleared also.

That's controlled by Epiphany browser history. It's not a WebKit feature and it has nothing to do with your website data or with ITP.
Comment 8 bxl75521@anonaddy.me 2022-05-15 12:34:33 PDT
(In reply to Michael Catanzaro from comment #7)
> (In reply to bxl75521@anonaddy.me from comment #6)
> > Weird! I still see the URLs that I visited after clearing cache. Even after
> > restarting the app.
> 
> What exactly are you talking about here? Are you still talking about ITP
> data? I can see that clearing disk cache data works correctly for me.
> Clearing the ITP data does not work: it just returns when I reopen the
> preferences dialog. I don't know why the test passes when the dialog does
> not work in practice.
> 
> > The "speed dial" of Web isn't cleared also.
> 
> That's controlled by Epiphany browser history. It's not a WebKit feature and
> it has nothing to do with your website data or with ITP.

I posted this issue in the Epiphany repo, but Catanzaro told me to post the issue here. The title of this bug was changed, it wasn't be who wrote "ITP data not deleted when cleared via WebsiteDataStore".
Comment 9 bxl75521@anonaddy.me 2022-05-15 12:34:57 PDT
However, the title of this bug was changed*
Comment 10 Michael Catanzaro 2022-05-15 12:44:07 PDT
I changed the title of this bug to match your initial complaint. You were complaining about a broken user interface feature in Epiphany, and I changed the title to indicate the public API that is broken.

But now in comment #6 you seem to be additionally complaining about additional stuff unrelated to the original complaint about ITP, leaving me confused. Epiphany's history feature has nothing to do with clearing ITP data, disk cache, or any other WebKit website data.
Comment 11 bxl75521@anonaddy.me 2022-05-15 14:03:15 PDT
The issue is that history isn't being cleared. However, I'm not sure if this is specific to some users. Does the issue happen for you?
Comment 12 Michael Catanzaro 2022-05-15 16:27:27 PDT
If you have an issue with Epiphany history, please go back to Epiphany's issue tracker. WebKit does not manage your browser history.

I can see from the video you attached on the Epiphany issue tracker that you uncovered a bug with WebKit's Intelligent Tracking Prevention feature not properly removing ITP data when the webkit_website_data_manager_clear() function is called with WebKitWebsiteDataType WEBKIT_WEBSITE_DATA_ITP. That's what we'll track here in this issue.
Comment 13 Michael Catanzaro 2022-05-15 19:10:41 PDT
If I do remove operations (remove ITP data for a particular domain, implemented in ResourceLoadStatisticsDatabaseStore::removeDataForDomain), the data does appear to get deleted properly. Only the clear operations (remove all ITP data for all domains, implemented in ResourceLoadStatisticsDatabaseStore::clear) seem to fail. But: if I do a remove operation followed by a clear operation, then it seems the clear operation somehow actually restores the ITP data that was previously removed! That's wild.

Taking a very quick look at ResourceLoadStatisticsDatabaseStore::clear, it winds up calling SQLiteDatabase::clearAllTables, which seems unlikely to fail. Perhaps the database is somehow getting repopulated using cached data? Or maybe the fetch operations are returning data that is no longer in the database? Requires further investigation.

Jon, since this is your area of expertise, I wonder if you can reproduce any of this behavior?
Comment 14 John Wilander 2022-05-15 21:22:03 PDT
Cc’ing Kate who knows the DB code well.
Comment 15 Kate Cheney 2022-05-16 10:31:30 PDT
I can't repro on macOS with Safari. Preferences > Privacy > Manage Website Data > Remove All seems to work as expected (the ITP database is emptied out properly).

If a page is open and dynamically loading new 3rd parties, it is possible that something you deleted previously would appear immediately again in the database because a new resource is loaded. However this would not explain a large scale occurrence of deleted website data re-appearing in the database.

I'm curious to know if this bug is caused because:

(1) We explicitly add domains back to the ITP database after clearing
(2) The clearing fails and the domains never were deleted
(3) We add the domains back in-memory somewhere when returning the ITP data, but they aren't re-inserted in the database.

Knowing more about how you're confirming that the data re-appears would probably be helpful. We can at least eliminate (3) if re-starting the session still shows the previously-cleared data appearing.

(1) and (2) will be harder to test without some logging or deeper inspection of the database file itself.
Comment 16 Michael Catanzaro 2022-05-16 11:06:36 PDT
(In reply to Kate Cheney from comment #15)
> Knowing more about how you're confirming that the data re-appears would
> probably be helpful. We can at least eliminate (3) if re-starting the
> session still shows the previously-cleared data appearing.

This dialog displays the result of webkit_website_data_manager_fetch().

When describing the behavior above, I didn't even restart anything. Epiphany would have just done a new fetch() operation on the existing WebKitWebsiteDataManager (the WPE/GTK API wrapper around WebsiteDataStore).

> (1) and (2) will be harder to test without some logging or deeper inspection
> of the database file itself.

I'll try to look a little deeper....
Comment 17 Kate Cheney 2022-05-16 11:36:45 PDT
(In reply to Michael Catanzaro from comment #16)
> (In reply to Kate Cheney from comment #15)
> > Knowing more about how you're confirming that the data re-appears would
> > probably be helpful. We can at least eliminate (3) if re-starting the
> > session still shows the previously-cleared data appearing.
> 
> This dialog displays the result of webkit_website_data_manager_fetch().
> 
> When describing the behavior above, I didn't even restart anything. Epiphany
> would have just done a new fetch() operation on the existing
> WebKitWebsiteDataManager (the WPE/GTK API wrapper around WebsiteDataStore).
> 

My thought here was that any in-memory storage would be cleared after a session restart. If we restart and still see the cleared data appearing, then we know that domains are in the ITP database (either never were cleared or were re-inserted) and not just hanging around in memory somewhere. Might give us some additional info.

> > (1) and (2) will be harder to test without some logging or deeper inspection
> > of the database file itself.
> 
> I'll try to look a little deeper....

Looking for calls to ResourceLoadStatisticsDatabaseStore::insertObservedDomain might be useful here. That's how we enter new domains into the database. If we see that being called for domains previously deleted, we know that ITP is re-inserting some deleted domains.
Comment 18 Michael Catanzaro 2022-05-16 14:28:11 PDT
(In reply to Kate Cheney from comment #17)
> My thought here was that any in-memory storage would be cleared after a
> session restart. If we restart and still see the cleared data appearing,
> then we know that domains are in the ITP database (either never were cleared
> or were re-inserted) and not just hanging around in memory somewhere. Might
> give us some additional info.

It persists across restarts.

> Looking for calls to
> ResourceLoadStatisticsDatabaseStore::insertObservedDomain might be useful
> here. That's how we enter new domains into the database. If we see that
> being called for domains previously deleted, we know that ITP is
> re-inserting some deleted domains.

Good guess. I see lots of calls to this ResourceLoadStatisticsDatabaseStore::insertObservedDomain when doing a clear (webkit_website_data_manager_clear()/ResourceLoadStatisticsDatabaseStore::clear) operation. If I just do individual removes (webkit_website_data_manager_remove()/ResourceLoadStatisticsDatabaseStore::removeDataForDomain), and then restart without doing a clear, then they do stay permanently removed. So we're dealing with theory (1) We explicitly add domains back to the ITP database after clearing (but not after removeDataForDomain).

Next step: I'll try to figure out why ResourceLoadStatisticsDatabaseStore::insertObservedDomain is being called.
Comment 19 Michael Catanzaro 2022-05-27 12:13:05 PDT
(In reply to Michael Catanzaro from comment #18)
> Next step: I'll try to figure out why
> ResourceLoadStatisticsDatabaseStore::insertObservedDomain is being called.

This is unexpectedly difficult. The problem is if I try to attach to the network process with gdb, it takes so long to attach that WebKit decides the network process is lost and spawns a new one.
Comment 20 Chris Dumez 2022-05-27 12:19:51 PDT
(In reply to Michael Catanzaro from comment #19)
> (In reply to Michael Catanzaro from comment #18)
> > Next step: I'll try to figure out why
> > ResourceLoadStatisticsDatabaseStore::insertObservedDomain is being called.
> 
> This is unexpectedly difficult. The problem is if I try to attach to the
> network process with gdb, it takes so long to attach that WebKit decides the
> network process is lost and spawns a new one.

You may need to update `AuxiliaryProcessProxy::mayBecomeUnresponsive()` to return false and then again. AuxiliaryProcessProxy::mayBecomeUnresponsive() may not be properly detecting that the process is being debugged.
Comment 21 Michael Catanzaro 2022-05-27 14:47:44 PDT
Very interesting. I'm not quite sure how to detect whether another process is being debugged on Linux, though. Attaching with ptrace to see whether that fails would be a pretty heavy hammer, and even that won't work nowadays because it requires root. (Detecting whether the *current* process is being debugged is easy, but that's not what you're doing in AuxiliaryProcessProxy::platformIsBeingDebugged.) Anyway, I'll just sabotage that locally for now... thanks for the tip.
Comment 22 Michael Catanzaro 2022-05-27 15:29:08 PDT
(In reply to Michael Catanzaro from comment #21)
> Anyway, I'll just sabotage
> that locally for now... thanks for the tip.

Sabotaging AuxiliaryProcessProxy::mayBecomeUnresponsive to always return true is not enough. Evidently the auxiliary process gets killed from somewhere else. Maybe I can figure this out using WTFLogAlways instead.
Comment 23 Kate Cheney 2022-05-27 15:30:59 PDT
(In reply to Michael Catanzaro from comment #22)
> (In reply to Michael Catanzaro from comment #21)
> > Anyway, I'll just sabotage
> > that locally for now... thanks for the tip.
> 
> Sabotaging AuxiliaryProcessProxy::mayBecomeUnresponsive to always return
> true is not enough. Evidently the auxiliary process gets killed from
> somewhere else. Maybe I can figure this out using WTFLogAlways instead.

Seems like a good place for WTFReportBacktrace() to figure out where insertObservedDomain is being initiated from.
Comment 24 Michael Catanzaro 2022-05-27 16:16:02 PDT
(In reply to Kate Cheney from comment #23)
> Seems like a good place for WTFReportBacktrace() to figure out where
> insertObservedDomain is being initiated from.

That doesn't work anymore since bug #181916, but perhaps I should just sabotage that too. Another option: add a counter variable to crash after the function is called n times, or better, only after ResourceLoadStatisticsDatabaseStore::insertObservedDomain is called first. Will try these next week.
Comment 25 Michael Catanzaro 2022-06-03 07:19:23 PDT
I wound up changing ResourceLoadStatisticsDatabaseStore::insertObservedDomain to crash if called after ResourceLoadStatisticsDatabaseStore::clear has been called. The backtrace looks like this:

#0  0x00007f177966d88e in WTFCrash () at /home/mcatanzaro/Projects/WebKit/Source/WTF/wtf/Assertions.cpp:322
#1  0x00007f177b2d2ee5 in WTFCrashWithInfo ()
    at /home/mcatanzaro/Projects/WebKit/WebKitBuild/GNOME-gtk3/WTF/Headers/wtf/Assertions.h:748
#2  WebKit::ResourceLoadStatisticsDatabaseStore::insertObservedDomain (this=this@entry=0x7f17312f8000, 
    loadStatistics=...)
    at /home/mcatanzaro/Projects/WebKit/Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:705
#3  0x00007f177b2d9b42 in WebKit::ResourceLoadStatisticsDatabaseStore::ensureResourceStatisticsForRegistrableDomain (
    this=0x7f17312f8000, domain=...)
    at /home/mcatanzaro/Projects/WebKit/Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:1978
#4  0x00007f177b2dfe7a in WebKit::ResourceLoadStatisticsDatabaseStore::grandfatherDataForDomains (
    this=this@entry=0x7f17312f8000, domains=...)
    at /home/mcatanzaro/Projects/WebKit/Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:1368
#5  0x00007f177b2e9228 in WebKit::ResourceLoadStatisticsDatabaseStore::grandfatherDataForDomains (domains=..., 
    this=0x7f17312f8000)
    at /home/mcatanzaro/Projects/WebKit/Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsStore.cpp:240
#6  operator() (__closure=0x7f17180e8cc8)
    at /home/mcatanzaro/Projects/WebKit/Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsStore.cpp:244
#7  WTF::Detail::CallableWrapper<WebKit::ResourceLoadStatisticsStore::grandfatherExistingWebsiteData(WTF::CompletionHandler<void()>&&)::<lambda()> mutable::<lambda(WTF::HashSet<WebCore::RegistrableDomain>&&)> mutable::<lambda()>, void>::call(void) (this=0x7f17180e8cc0)
    at /home/mcatanzaro/Projects/WebKit/WebKitBuild/GNOME-gtk3/WTF/Headers/wtf/Function.h:53
#8  0x00007f177969430e in WTF::Function<void ()>::operator()() const (this=<synthetic pointer>)
    at /home/mcatanzaro/Projects/WebKit/Source/WTF/wtf/Function.h:79
#9  WTF::RunLoop::performWork (this=0x7f1772f99000) at /home/mcatanzaro/Projects/WebKit/Source/WTF/wtf/RunLoop.cpp:133

It's coming from the lambdas in WebResourceLoadStatisticsStore::scheduleClearInMemoryAndPersistent. I don't know how exactly grandfathering is supposed to work, but it seems to be grandfathering the data that is supposed to be cleared.
Comment 26 Kate Cheney 2022-06-03 09:59:09 PDT
(In reply to Michael Catanzaro from comment #25)
> I wound up changing
> ResourceLoadStatisticsDatabaseStore::insertObservedDomain to crash if called
> after ResourceLoadStatisticsDatabaseStore::clear has been called. The
> backtrace looks like this:
> 

Nice!

> #0  0x00007f177966d88e in WTFCrash () at
> /home/mcatanzaro/Projects/WebKit/Source/WTF/wtf/Assertions.cpp:322
> #1  0x00007f177b2d2ee5 in WTFCrashWithInfo ()
>     at
> /home/mcatanzaro/Projects/WebKit/WebKitBuild/GNOME-gtk3/WTF/Headers/wtf/
> Assertions.h:748
> #2  WebKit::ResourceLoadStatisticsDatabaseStore::insertObservedDomain
> (this=this@entry=0x7f17312f8000, 
>     loadStatistics=...)
>     at
> /home/mcatanzaro/Projects/WebKit/Source/WebKit/NetworkProcess/Classifier/
> ResourceLoadStatisticsDatabaseStore.cpp:705
> #3  0x00007f177b2d9b42 in
> WebKit::ResourceLoadStatisticsDatabaseStore::
> ensureResourceStatisticsForRegistrableDomain (
>     this=0x7f17312f8000, domain=...)
>     at
> /home/mcatanzaro/Projects/WebKit/Source/WebKit/NetworkProcess/Classifier/
> ResourceLoadStatisticsDatabaseStore.cpp:1978
> #4  0x00007f177b2dfe7a in
> WebKit::ResourceLoadStatisticsDatabaseStore::grandfatherDataForDomains (
>     this=this@entry=0x7f17312f8000, domains=...)
>     at
> /home/mcatanzaro/Projects/WebKit/Source/WebKit/NetworkProcess/Classifier/
> ResourceLoadStatisticsDatabaseStore.cpp:1368
> #5  0x00007f177b2e9228 in
> WebKit::ResourceLoadStatisticsDatabaseStore::grandfatherDataForDomains
> (domains=..., 
>     this=0x7f17312f8000)
>     at
> /home/mcatanzaro/Projects/WebKit/Source/WebKit/NetworkProcess/Classifier/
> ResourceLoadStatisticsStore.cpp:240
> #6  operator() (__closure=0x7f17180e8cc8)
>     at
> /home/mcatanzaro/Projects/WebKit/Source/WebKit/NetworkProcess/Classifier/
> ResourceLoadStatisticsStore.cpp:244
> #7 
> WTF::Detail::CallableWrapper<WebKit::ResourceLoadStatisticsStore::
> grandfatherExistingWebsiteData(WTF::CompletionHandler<void()>&&)::<lambda()>
> mutable::<lambda(WTF::HashSet<WebCore::RegistrableDomain>&&)>
> mutable::<lambda()>, void>::call(void) (this=0x7f17180e8cc0)
>     at
> /home/mcatanzaro/Projects/WebKit/WebKitBuild/GNOME-gtk3/WTF/Headers/wtf/
> Function.h:53
> #8  0x00007f177969430e in WTF::Function<void ()>::operator()() const
> (this=<synthetic pointer>)
>     at /home/mcatanzaro/Projects/WebKit/Source/WTF/wtf/Function.h:79
> #9  WTF::RunLoop::performWork (this=0x7f1772f99000) at
> /home/mcatanzaro/Projects/WebKit/Source/WTF/wtf/RunLoop.cpp:133
> 
> It's coming from the lambdas in
> WebResourceLoadStatisticsStore::scheduleClearInMemoryAndPersistent. I don't
> know how exactly grandfathering is supposed to work, but it seems to be
> grandfathering the data that is supposed to be cleared.

It looks like we grandfather data in NetworkProcess::deleteWebsiteData if the specified website data types to delete do not contain all of ITP's monitored data types. I'm guessing that WEBKIT_WEBSITE_DATA_ITP is not translating to all the data types tracked in WebResourceLoadStatistics::monitoredDataTypes(). If there are things in that list that are port/platform specific we could add some flags, or adjust the types in webkit_website_data_manager_remove() to contain all ITP monitored types.
Comment 27 Michael Catanzaro 2022-06-03 13:33:36 PDT
(In reply to Kate Cheney from comment #26)
> It looks like we grandfather data in NetworkProcess::deleteWebsiteData if
> the specified website data types to delete do not contain all of ITP's
> monitored data types. I'm guessing that WEBKIT_WEBSITE_DATA_ITP is not
> translating to all the data types tracked in
> WebResourceLoadStatistics::monitoredDataTypes().

WEBKIT_WEBSITE_DATA_ITP is not a list of data types to delete. It's just WebsiteDataType::ResourceLoadStatistics.

> If there are things in that
> list that are port/platform specific we could add some flags, or adjust the
> types in webkit_website_data_manager_remove() to contain all ITP monitored
> types.

I don't think anything is platform-specific here. WebsiteDataType is a flags enum that specifies which website data types to delete. WebsiteDataStore allows you to delete all types at once, or only one type of data, or any combination of data types. In this case, we're deleting only the WebsiteDataType::ResourceLoadStatistics.

WebsiteDataStore's WebsiteDataTypes are expected to be independent.