Bug 210184

Summary: [GTK][WPE] Enable resource load statistics
Product: WebKit Reporter: Carlos Garcia Campos <cgarcia>
Component: WebKitGTKAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: achristensen, annulen, berto, bfulgham, bugs-noreply, ews-watchlist, gustavo, gyuyoung.kim, mcatanzaro, ryuan.choi, sergio, wilander, youennf, zan
Priority: P2 Keywords: Gtk
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=210739
https://bugs.webkit.org/show_bug.cgi?id=210796
https://bugs.webkit.org/show_bug.cgi?id=210487
Bug Depends on: 210483, 213493, 213502    
Bug Blocks: 177943, 210422    
Attachments:
Description Flags
WIP patch
none
WIP patch
none
Patch zan: review+

Description Carlos Garcia Campos 2020-04-08 05:38:48 PDT
.
Comment 1 Carlos Garcia Campos 2020-04-08 05:44:50 PDT
Created attachment 395795 [details]
WIP patch

This is a WIP patch. Most of the tests are passing, I'm getting just 3 failures and 2 flaky timeouts:

153 tests ran as expected, 5 didn't:


Unexpected flakiness: timeouts (2)
  http/tests/resourceLoadStatistics/add-blocking-to-redirect-database.html [ Timeout Pass ]
  http/tests/resourceLoadStatistics/clear-in-memory-and-persistent-store-one-hour-database.html [ Timeout Pass ]

I think these are related to a race condition in FileMonitor GLib implementation, I'll investigate.

Regressions: Unexpected text-only failures (3)
  http/tests/resourceLoadStatistics/do-not-remove-blocking-in-redirect-database.html [ Failure ]

This looks like an actual bug that I need to investigate too.

  http/tests/resourceLoadStatistics/switch-session-on-navigation-to-prevalent-with-interaction-database.php [ Failure ]
  http/tests/resourceLoadStatistics/switch-session-on-navigation-to-prevalent-with-interaction.php [ Failure ]

And these are because we are not handling isolated sessions, mainly because I'm ot sure what the expected behavior is there.
Comment 2 Carlos Garcia Campos 2020-04-08 05:50:41 PDT
So, I have a few questions for John/Youenn/Alex:

 - What's the expected behavior of isolated sessions? They start with an empty cookie storage and only allow first-party? Why is there a limit of isolated sessions?

 - There isn't HSTS tests, so I don't know what the expected behavior is there either. Should we downgrade requests upgraded by HSTS when cokies should be blocked?

 - What API should we expose for ITP? is it enough to expose WebsiteData API to set stats dir, fetch/delete website data and enable/disabled ITP? Anything else?
Comment 3 Carlos Garcia Campos 2020-04-08 06:54:12 PDT
Created attachment 395803 [details]
WIP patch

Found the problem with http/tests/resourceLoadStatistics/do-not-remove-blocking-in-redirect-database.html so remaining issues are:

 - isolated sessions support
 - flaky timeouts due to FileMonitor
 - HSTS requests
Comment 4 Carlos Garcia Campos 2020-04-13 02:18:00 PDT
(In reply to Carlos Garcia Campos from comment #2)
> So, I have a few questions for John/Youenn/Alex:
> 
>  - What's the expected behavior of isolated sessions? They start with an
> empty cookie storage and only allow first-party? Why is there a limit of
> isolated sessions?
> 
>  - There isn't HSTS tests, so I don't know what the expected behavior is
> there either. Should we downgrade requests upgraded by HSTS when cokies
> should be blocked?
> 
>  - What API should we expose for ITP? is it enough to expose WebsiteData API
> to set stats dir, fetch/delete website data and enable/disabled ITP?
> Anything else?

Patrick told me we have to also expose API to implement UIClient::requestStorageAccessConfirm() 

Anything else?
Comment 5 Carlos Garcia Campos 2020-04-14 02:46:07 PDT
Since I'm not getting any help here, I think we can just land this patch and file a bug report for the remaining test failing (isolated sessions). I'll submit a new patch with changelogs and expectations for failures.
Comment 6 Carlos Garcia Campos 2020-04-14 05:08:33 PDT
Created attachment 396401 [details]
Patch
Comment 7 youenn fablet 2020-04-14 07:40:49 PDT
Comment on attachment 396401 [details]
Patch

LGTM, gtk bot is failing.

View in context: https://bugs.webkit.org/attachment.cgi?id=396401&action=review

> Source/WebCore/platform/network/soup/NetworkStorageSessionSoup.cpp:275
> +        return;

This code is similar to the cocoa version.
Not a pre-requisite but it might be interesting to see whether we could move this to cross-platform code.
Ditto below.

> Source/WebKit/UIProcess/API/C/WKPage.cpp:2978
> +        Vector<RefPtr<API::Object>> apiDomains = WTF::map(domains, [](auto& domain) {

auto
Comment 8 John Wilander 2020-04-14 08:03:43 PDT
Sorry for dropping the ball on this. Excited to see you’re enabling ITP!

(In reply to Carlos Garcia Campos from comment #2)
> So, I have a few questions for John/Youenn/Alex:
> 
>  - What's the expected behavior of isolated sessions? They start with an
> empty cookie storage and only allow first-party? Why is there a limit of
> isolated sessions?

Isolated sessions are about the network layer below HTTP. A new session gets a new TLS connection for instance. But you can hang all kinds of things on the session to isolate it such as an individual DNS cache.

>  - There isn't HSTS tests, so I don't know what the expected behavior is
> there either. Should we downgrade requests upgraded by HSTS when cokies
> should be blocked?

Yes, when the original request is HTTP, the request will have its cookies blocked, and has been upgraded by the HSTS mechanism, downgrade back to HTTP, apply all other rules in WebKit that might again upgrade it such as Upgrade Insecure Requests or potential/future auto-upgrade of mixed content, and send out.

The reason for lack of tests is that HSTS requires real, trusted certificates and self-signed ones like the one in the test runner will not do.

>  - What API should we expose for ITP? is it enough to expose WebsiteData API
> to set stats dir, fetch/delete website data and enable/disabled ITP?
> Anything else?

If you want fine grained controls, you can offer more. I believe we offer the ability to exempt localhost for cases where a localhost server is used to create an stand-alone application with a UI rendered with web technologies.
Comment 9 Michael Catanzaro 2020-04-14 08:11:51 PDT
(In reply to John Wilander from comment #8) 
> Isolated sessions are about the network layer below HTTP. A new session gets
> a new TLS connection for instance. 

Is the goal to avoid session resumption across separate isolated sessions?

(That's not currently possible with the libsoup backend, but adding new APIs might be possible.)

> But you can hang all kinds of things on
> the session to isolate it such as an individual DNS cache.

What's the value of isolating DNS cache? Is it to try to get different results from load balancers? (It won't be possible to do on Linux regardless. I'm just curious.)
Comment 10 Carlos Garcia Campos 2020-04-15 01:32:03 PDT
(In reply to youenn fablet from comment #7)
> Comment on attachment 396401 [details]
> Patch
> 
> LGTM, gtk bot is failing.

Not related to this patch:

DerivedSources/WebCore/JSCanvasTextBaseline.cpp:80:1: fatal error: error writing to /tmp/cc8UZFn2.s: No space left on device

> View in context:
> https://bugs.webkit.org/attachment.cgi?id=396401&action=review
> 
> > Source/WebCore/platform/network/soup/NetworkStorageSessionSoup.cpp:275
> > +        return;
> 
> This code is similar to the cocoa version.
> Not a pre-requisite but it might be interesting to see whether we could move
> this to cross-platform code.
> Ditto below.
> 
> > Source/WebKit/UIProcess/API/C/WKPage.cpp:2978
> > +        Vector<RefPtr<API::Object>> apiDomains = WTF::map(domains, [](auto& domain) {
> 
> auto
Comment 11 Carlos Garcia Campos 2020-04-15 01:37:46 PDT
(In reply to John Wilander from comment #8)
> Sorry for dropping the ball on this. Excited to see you’re enabling ITP!
> 
> (In reply to Carlos Garcia Campos from comment #2)
> > So, I have a few questions for John/Youenn/Alex:
> > 
> >  - What's the expected behavior of isolated sessions? They start with an
> > empty cookie storage and only allow first-party? Why is there a limit of
> > isolated sessions?
> 
> Isolated sessions are about the network layer below HTTP. A new session gets
> a new TLS connection for instance. But you can hang all kinds of things on
> the session to isolate it such as an individual DNS cache.

But when is an isolated session used? why is there a limit? I need to understand the expected behavior to see how to implement it in libsoup. For example, in libsoup we can ensure a new connection is always used for a particular request without having to use a different session, the same way we can disable cookies or other features.

> >  - There isn't HSTS tests, so I don't know what the expected behavior is
> > there either. Should we downgrade requests upgraded by HSTS when cokies
> > should be blocked?
> 
> Yes, when the original request is HTTP, the request will have its cookies
> blocked, and has been upgraded by the HSTS mechanism, downgrade back to
> HTTP, apply all other rules in WebKit that might again upgrade it such as
> Upgrade Insecure Requests or potential/future auto-upgrade of mixed content,
> and send out.

I'll open a new bug to do this.

> The reason for lack of tests is that HSTS requires real, trusted
> certificates and self-signed ones like the one in the test runner will not
> do.

Yes, I know, I just meant I couldn't figure out the expected behavior just by looking at cocoa implementation.

> >  - What API should we expose for ITP? is it enough to expose WebsiteData API
> > to set stats dir, fetch/delete website data and enable/disabled ITP?
> > Anything else?
> 
> If you want fine grained controls, you can offer more. I believe we offer
> the ability to exempt localhost for cases where a localhost server is used
> to create an stand-alone application with a UI rendered with web
> technologies.

How does itp afect the cookies acept policy? is it just ignored? what policy should we use in libsoup when itp is enabled?
Comment 12 Michael Catanzaro 2020-04-15 09:37:01 PDT
(In reply to Carlos Garcia Campos from comment #11)
> But when is an isolated session used? why is there a limit? I need to
> understand the expected behavior to see how to implement it in libsoup. For
> example, in libsoup we can ensure a new connection is always used for a
> particular request without having to use a different session, the same way
> we can disable cookies or other features.

GLib doesn't have any way to prevent session resumption, so it would be impossible to do in libsoup alone without changes in GLib. I could add GLib API for you if desired. Would be easy to do, and would allow removing a hack we use to disable session resumption when running tests. (Currently calling g_test_init() will disable session resumption, because there's no API tests can use to do that, so this behavior has to be tested manually.)

> How does itp afect the cookies acept policy? is it just ignored? what policy
> should we use in libsoup when itp is enabled?

John can correct me if I get something wrong, but ITP's behavior changes over time and will likely change again in the future. Currently, ITP blocks all third-party cookies for all domains, unless websites use the storage access API. So it overrides WEBKIT_COOKIE_POLICY_ACCEPT_ALWAYS. And it sort of overrides WEBKIT_COOKIE_POLICY_ACCEPT_NO_THIRD_PARTY, in that websites can use the storage access API to request to store third-party cookies. libsoup must not break the storage access API by blocking third-party cookies at the libsoup level, so I think WebKit should always use SOUP_COOKIE_JAR_ACCEPT_ALWAYS, and manually reject third-party cookies at the WebKit level, except when permitted by storage access API. I guess if the application selects WEBKIT_COOKIE_POLICY_ACCEPT_NEVER, then that can take precedence and we can use SOUP_COOKIE_JAR_ACCEPT_NEVER, but otherwise ITP policy should be used.
Comment 13 Michael Catanzaro 2020-04-15 09:40:06 PDT
(In reply to Carlos Garcia Campos from comment #11)
> But when is an isolated session used? why is there a limit?

BTW I'm also curious about these, and also *why*. I can't think of any privacy benefit besides limiting session resumption, and that's easy to limit without ITP by using aggressive timeouts.
Comment 14 Carlos Garcia Campos 2020-04-20 01:34:05 PDT
Committed r260356: <https://trac.webkit.org/changeset/260356>
Comment 15 Carlos Garcia Campos 2020-04-20 01:35:03 PDT
(In reply to Michael Catanzaro from comment #13)
> (In reply to Carlos Garcia Campos from comment #11)
> > But when is an isolated session used? why is there a limit?
> 
> BTW I'm also curious about these, and also *why*. I can't think of any
> privacy benefit besides limiting session resumption, and that's easy to
> limit without ITP by using aggressive timeouts.

This patch landed, so let's move the discussion about isolated sessions to bug #210487.