Bug 149551 - [SOUP] Slack.com is not working, new messages do not load due to WebSocket authentication failure issue
Summary: [SOUP] Slack.com is not working, new messages do not load due to WebSocket au...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit2 (show other bugs)
Version: Other
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2015-09-24 23:54 PDT by javiermon
Modified: 2020-09-26 06:52 PDT (History)
17 users (show)

See Also:


Attachments
Output in the WebKit console when loading a Slack channel (21.41 KB, text/plain)
2016-07-18 03:47 PDT, Mathieu Bridon
no flags Details
Patch (1.96 KB, patch)
2020-09-25 09:00 PDT, Carlos Garcia Campos
mcatanzaro: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description javiermon 2015-09-24 23:54:54 PDT
Hi

This was originally reported here:
https://bugzilla.gnome.org/show_bug.cgi?id=755508

Slack.com is not working anymore with gnome's web browser (epiphany). I've tested with the MiniBrowser and the bug is still there. Here's the relevant information of the webkitgtk4-devel package I used to test with minibrowser:
Name        : webkitgtk4-devel
Arch        : x86_64
Epoch       : 0
Version     : 2.10.0
Release     : 1.fc23


Thanks,
Comment 1 Carlos Garcia Campos 2015-09-25 00:02:02 PDT
It works for me, veeeeeery slow, though.
Comment 2 javiermon 2015-09-25 00:07:01 PDT
Hi

Have your tried logging in and loading channels, etc? 

In 2 of my machines the website shows a loading page and then it shows some parts but it doesn't load the channels/users/etc. Also, a yellow banner says reconnecting... but nothing else happens.

Thanks,
Comment 3 Carlos Garcia Campos 2015-09-25 00:53:04 PDT
No I haven't, all the information I had here is that slack.com was not working. I don't have an account to try it out.
Comment 4 Štefan Gurský 2016-01-07 02:26:17 PST
It works now in epiphany 3.18.2-1.fc23 with webkitgtk4 2.10.4
Comment 5 Michael Catanzaro 2016-01-07 06:17:01 PST
(In reply to comment #4)
> It works now in epiphany 3.18.2-1.fc23 with webkitgtk4 2.10.4

Nice when these things fix themselves....
Comment 6 Mathieu Bridon 2016-07-17 23:57:18 PDT
This still happens for me with webkitgtk4-2.12.3-1.fc24.x86_64

(I tried with the MiniBrowser)

Carlos, since you don't have an account, what information could I give you to help fix this? :)
Comment 7 Carlos Garcia Campos 2016-07-18 00:44:21 PDT
(In reply to comment #6)
> This still happens for me with webkitgtk4-2.12.3-1.fc24.x86_64
> 
> (I tried with the MiniBrowser)
> 
> Carlos, since you don't have an account, what information could I give you
> to help fix this? :)

Check if there are errors in the console with the inspector, for example.
Comment 8 Mathieu Bridon 2016-07-18 03:47:38 PDT
Created attachment 283897 [details]
Output in the WebKit console when loading a Slack channel

Here's a paste of what I see in the WebKit console when loading a slack team discussion page.

Sorry for the formatting, I just selected all text and then copy-pasted it. Let me know if there is a way to acquire this trace in a nicer manner.
Comment 9 Mathieu Bridon 2016-12-14 09:08:12 PST
Anything else I can do to help get this fixed?
Comment 10 Michael Catanzaro 2016-12-14 10:17:17 PST
Asking the Slack developers to look at it would help. We're not going to debug their website for them.

This looks bad:

[Warning] Unhandled rejection _addBinding@https://h2.slack-edge.com/3e1f/js/rollup-core_required_libs.js:11:15607 (rollup-core_required_libs.js, line 24)
_registerListener@https://h2.slack-edge.com/3e1f/js/rollup-core_required_libs.js:11:15424
add@https://h2.slack-edge.com/3e1f/js/rollup-core_required_libs.js:11:16071
onStart@https://h2.slack-edge.com/33e0d/js/rollup-client.js:4:30182
_callOnStarts@https://h2.slack-edge.com/4f1f/js/rollup-core_required_ts.js:2:7715
_onTemplatesLoaded@https://h2.slack-edge.com/4f1f/js/rollup-core_required_ts.js:2:7184
https://h2.slack-edge.com/4f1f/js/rollup-core_required_ts.js:2:4710
cancellationExecute@https://h2.slack-edge.com/3e1f/js/rollup-core_required_libs.js:23:24553
_resolveFromExecutor@https://h2.slack-edge.com/3e1f/js/rollup-core_required_libs.js:25:10810
Promise@https://h2.slack-edge.com/3e1f/js/rollup-core_required_libs.js:25:2167
loadTemplates@https://h2.slack-edge.com/4f1f/js/rollup-core_required_ts.js:2:4538
_onDOMReady@https://h2.slack-edge.com/4f1f/js/rollup-core_required_ts.js:2:5511
fire@https://h2.slack-edge.com/3e1f/js/rollup-core_required_libs.js:2:10822
fireWith@https://h2.slack-edge.com/3e1f/js/rollup-core_required_libs.js:2:12106
ready@https://h2.slack-edge.com/3e1f/js/rollup-core_required_libs.js:2:15007
completed@https://h2.slack-edge.com/3e1f/js/rollup-core_required_libs.js:2:15280

The log also shows some serious performance issues:

[Error] 2016/7/18 11:47:17.478  view.onStart() took 96.81ms, and that's WAY too long! Please make lazy any expensive DOM work etc you're doing
	_console (rollup-core_required_ts.js:1:9133)
	error (rollup-core_required_ts.js:1:3156)
	_callOnStarts (rollup-core_required_ts.js:2:7969)
	_onTemplatesLoaded (rollup-core_required_ts.js:2:7185)
	(anonymous function) (rollup-core_required_ts.js:2:4711)
	cancellationExecute (rollup-core_required_libs.js:23:24554)
	_resolveFromExecutor (rollup-core_required_libs.js:25:10811)
	Promise (rollup-core_required_libs.js:25:2168)
	loadTemplates (rollup-core_required_ts.js:2:4539)
	_onDOMReady (rollup-core_required_ts.js:2:5512)
	fire (rollup-core_required_libs.js:2:10823)
	fireWith (rollup-core_required_libs.js:2:12107)
	ready (rollup-core_required_libs.js:2:15008)
	completed (rollup-core_required_libs.js:2:15281)
Comment 11 Guilaume Ayoub 2020-05-14 02:12:00 PDT
Current version of Slack is still broken with recent versions of WebKitGTK: new messages are not automatically fetched and user statuses are not updated at all.

The JS error has changed and looks actually better:

"fetchAuthAndReturnToUrl: too many auth fetches: 1589447178329,1589447180256,1589447182433; clearing auth fetch history and showing error"

Is there anything I can do to help?
Comment 12 Michael Catanzaro 2020-05-14 06:31:06 PDT
Ahhh, this has been really annoying since WebKit switched to Slack for communication. I was wondering why no other WebKitGTK devs seemed to have problems using Slack. Now I know... almost nobody else actually uses WebKit. ;)
Comment 13 Jérémy Lal 2020-05-14 06:44:51 PDT
I've been using slack on epiphany for several years now.
Sometimes it needs a restart (hangs or high cpu load). Most of the time it's usable.
Right now on debian with epiphany 3.36.1, webkitgtk 2.28 (but it's been all right at least since 2.20).
Comment 14 Guilaume Ayoub 2020-05-14 06:55:37 PDT
(In reply to Jérémy Lal from comment #13)
> I've been using slack on epiphany for several years now.
> Sometimes it needs a restart (hangs or high cpu load). Most of the time it's
> usable.

It somehow works for me, but I wouldn’t say "usable".

I’m on Epiphany 3.37.1 with WebKitGTK 2.28.2. I’ve only been trying to use Slack a month ago, and I had (and still have) these problems:

- New messages never arrive. The popup "Last update x minutes ago" is always displayed, I suppose that it should only appear when disconnected. The only way to get new messages is to reload the page.
- All the users (in the Direct Messages section) are offline, even when they appear as connected in other browsers.

Everything works fine on Firefox and Chrome on the same computer.

I can provide more information about my Epiphany and WebKitGTK configurations if needed.
Comment 15 Michael Catanzaro 2020-05-14 08:15:43 PDT
(In reply to Guilaume Ayoub from comment #14)
> - New messages never arrive. The popup "Last update x minutes ago" is always
> displayed, I suppose that it should only appear when disconnected. The only
> way to get new messages is to reload the page.

In my case, I can get new messages by clicking "Last updated x minutes ago... Load new messages." Sometimes it actually seems to randomly work properly. But usually I have to click this button.
Comment 16 Jérémy Lal 2020-05-14 08:53:51 PDT
Using slack installed as a separate epiphany "application" it works all right here.
However the same url opened in a regular epiphany instance reproduces the "fetchAuthAndReturnToUrl" error.
Comment 17 Guilaume Ayoub 2020-05-14 09:06:59 PDT
(In reply to Michael Catanzaro from comment #15)
> In my case, I can get new messages by clicking "Last updated x minutes
> ago... Load new messages." Sometimes it actually seems to randomly work
> properly. But usually I have to click this button.

Clicking works for me too, but it’s kind of removing the "instant" part of "instant messaging" :).

I’ve never seen it work more than a dozen of seconds before getting the popup again.

(In reply to Jérémy Lal from comment #16)
> Using slack installed as a separate epiphany "application" it works all
> right here.
> However the same url opened in a regular epiphany instance reproduces the
> "fetchAuthAndReturnToUrl" error.

It doesn’t work for me in a separate Epiphany application :/. I’ve tried to change the application properties to be very permissive, it doesn’t fix the problem.
Comment 18 Jérémy Lal 2020-05-14 09:14:51 PDT
Odd, this is how the .desktop file is starting it:

epiphany --application-mode --profile="/home/dev/.local/share/epiphany-app.slack.com-xxxxxxxxxxx" https://app.slack.com/client/aaaaa/bbbbbb
Comment 19 Link Dupont 2020-06-15 10:18:56 PDT
This recently started happening to me with Epiphany 3.36.2 and WebKitGTK 2.28.2. I noticed a couple suspicious lines in my console.

[Info] Jun-15 12:36:01.247 [SOCKET-MANAGER] (T024NJK6Z) received an error of type -1337: invalid_auth (gantry-shared.ade4760.min.js, line 1)
12:38 PM [Info] Jun-15 12:36:01.249 [SOCKET-MANAGER] (T024NJK6Z) Closing socket because #4106: received an "error" message from socket (gantry-shared.ade4760.min.js, line 1)
Comment 20 Link Dupont 2020-06-16 12:12:11 PDT
So I could not rest knowing that this used to work, and decided to test it on another computer (my laptop). In browser mode, with an existing session, it works (the web socket maintains a connection and I do not see the "Load new messages" bubble). I decided to copy the epiphany settings from my laptop to my desktop. I created a tarball containing: ~/.config/epiphany, ~/.local/share/epiphany, and ~/.cache/epiphany. I copied the tarball over to my desktop, removed those directories from my desktop, and then unarchived the tarball. And that worked. Using browser mode, my slack workspace was able to connect.

I then went and created a web-app for app.slack.com. And it didn't work in that web-app. I concluded (thanks to bouncing ideas off of mcatanzaro), that the problem must exist in cookies, localstorage or IndexedDB. I opened up the cookies.sqlite on ~/.local/share/epiphany and in the slack web-app. They each contained 4 cookies with host = '.slack.com', but had different values. I dumped the table from the browser mode, grabbed the 4 "INSERT" statements, deleted the cookies from the web-app, and ran the 4 INSERT statements on the web-app cookies.sqlite. Lo and behold, the web-app could now maintain a websocket!

WHAT this means, I have no idea, but it seems to point to one of the 4 cookies. I didn't test further to identify which of the 4 cookies might be causing it.

sqlite> SELECT name, host, path FROM moz_cookies ;
d|.slack.com|/
lc|.slack.com|/
b|.slack.com|/

I appear to now have only 3 cookies. Originally, there was a fourth, named "x".
Comment 21 Michael Catanzaro 2020-06-16 14:06:38 PDT
Did you test cookie accept policy to see if, for example, allowing all cookies (instead of only first-party cookies) would work?
Comment 22 Link Dupont 2020-06-18 02:36:52 PDT
(In reply to Michael Catanzaro from comment #21)
> Did you test cookie accept policy to see if, for example, allowing all
> cookies (instead of only first-party cookies) would work?

I just tried this. I removed my cookies.sqlite, then set the setting to allow all cookies. I was required to log in again, and when I did, the "Load new messages" bubble was back.

Restoring my old cookies restored websocket functionality.
Comment 23 Philippe Normand 2020-09-24 06:16:35 PDT
When this happens the wssbackup.slack.com and wssprimary.slack.com keep returning the same error:

Data	Time
WebSocket Connection Established	1600953198.645637
{"type":"error","error":{"code":-1337,"msg":"invalid_auth"}} 1600953198.885279
	

Perhaps the involved auth cookies silently expire and are not renewed until the user clicks on the "load new messages" link though? Just wild-guessing here...
Comment 24 Link Dupont 2020-09-24 06:42:25 PDT
(In reply to Philippe Normand from comment #23)
> 
> Perhaps the involved auth cookies silently expire and are not renewed until
> the user clicks on the "load new messages" link though? Just wild-guessing
> here...

There's something else happening though. Clicking on the "load new messages" link doesn't bring back the old behaviour of receiving new messages automatically though. Clicking it loads new messages, and then you get no new messages until you click it again.
Comment 25 Philippe Normand 2020-09-24 06:44:29 PDT
Exactly, one you're into this issue, you're in forever :)
Comment 26 Carlos Garcia Campos 2020-09-25 06:59:13 PDT
I've found the problem, but not the solution yet, maybe Patrick can help. The problem is that we are not including all the cookies in the web socket request. There are two cookies with the Lax same-site policy set, that are not included for the websocket request because it's not considered a top site navigation and the hosts don't match (app.slack.com - wss-primary.slack.com). I don't know how same-site cookies work, so I don't know if we need to make an exception for websockets or consider them a top site navigation. Maybe it's a WebKit bug that should mark the websocket request as same site? Patrick?
Comment 27 Michael Catanzaro 2020-09-25 08:39:39 PDT
I don't think WebSockets should be treated as a top site navigation. That doesn't sound right.

(In reply to Carlos Garcia Campos from comment #26)
> Maybe it's a WebKit bug that should mark the websocket request as same site?

app.slack.com and wss-primary.slack.com have the same registrable domain (top privately-controlled domain), slack.com, so they should be treated as the same site for purposes of cookie policy. That is, wss-primary.slack.com should be allowed to receive cookies even if the request is SameSite=Lax or SameSite=Strict.

The cookie's Domain attribute of course has to match for the cookie to be sent in the request. In this case, Domain is set to .slack.com, so it does.

libsoup used to be stricter than any browser in using the full domain rather than registrable domain, but that was not web-compatible, so I changed it: https://gitlab.gnome.org/GNOME/libsoup/-/commit/d5952f2ff3a89e8ed19826ca3ace72078b9b1ed6. WebKit should match that behavior.

There's a complicated draft spec for this here: https://tools.ietf.org/html/draft-ietf-httpbis-rfc6265bis-03#section-5.2
Comment 28 Michael Catanzaro 2020-09-25 08:42:11 PDT
(In reply to Michael Catanzaro from comment #27)
> In this case, Domain is set to .slack.com, so it does.

(That means the cookie should be sent to every subdomain of slack.com. The leading dot is ignored.)

And Path has to match too, but here the Path is /, so it should be sent regardless of path.

So all signs point to this being our bug.
Comment 29 Michael Catanzaro 2020-09-25 08:50:28 PDT
I suspect cookie_is_valid_for_same_site_policy() in soup-cookie-jar.c, it does this:

return soup_host_matches_host (soup_uri_get_host (cookie_uri ? cookie_uri : top_level), soup_uri_get_host (uri));

with no calls to soup_tld_get_base_domain(). That looks wrong. In contrast, incoming_cookie_is_third_party() uses soup_tld_get_base_domain() before comparing domains.

That could return too many cookies, but the purpose of that function is only to check Same-Site policy, and there is other logic in get_cookies() to ensure Domain matches, so I think it should be safe to change.
Comment 30 Carlos Garcia Campos 2020-09-25 09:00:19 PDT
Created attachment 409698 [details]
Patch
Comment 31 Michael Catanzaro 2020-09-25 09:20:17 PDT
Comment on attachment 409698 [details]
Patch

Well... surprise. Any idea why this isn't broken for Safari?

Looking for other suspicious cases,there are a lot of places where we call setFirstPartyForCookies but not addSameSiteInfoToRequestIfNeeded: WebKitWebSourceGStreamer.cpp, SWServer.cpp, XSLTProcessor.cpp, NetworkCacheSpeculativeLoadManager.cpp, NetworkDataTaskCocoa.mm, NetworkDataTaskSoup.cpp, and WebProcessPool.cpp. I wonder how many of these are similar bugs and how many are covered by FrameLoader?

I also wonder why cookie_is_valid_for_same_site_policy() isn't causing this problem? We probably still need to look closer at that....
Comment 32 Carlos Garcia Campos 2020-09-26 00:39:25 PDT
(In reply to Michael Catanzaro from comment #31)
> Comment on attachment 409698 [details]
> Patch
> 
> Well... surprise. Any idea why this isn't broken for Safari?

Yes, I guess safari doesn't use the new websockets yet. In WebKit websockets implementation the request already includes the right cookies. See WebSocketHandshake::clientHandshakeRequest(). That includes the cookies using the given cookieRequestHeaderFieldValue function that ends up in CookieJar::cookieRequestHeaderFieldValue(Document& document, const URL& url) that does the right thing for same site cookies.

> Looking for other suspicious cases,there are a lot of places where we call
> setFirstPartyForCookies but not addSameSiteInfoToRequestIfNeeded:
> WebKitWebSourceGStreamer.cpp, SWServer.cpp, XSLTProcessor.cpp,
> NetworkCacheSpeculativeLoadManager.cpp, NetworkDataTaskCocoa.mm,
> NetworkDataTaskSoup.cpp, and WebProcessPool.cpp. I wonder how many of these
> are similar bugs and how many are covered by FrameLoader?

I haven't looked in detail, but I think we are sometimes calling setFirstPartyForCookies more than once for the same request.

> I also wonder why cookie_is_valid_for_same_site_policy() isn't causing this
> problem? We probably still need to look closer at that....

cookie_is_valid_for_same_site_policy does the right thing when it receives a site for cookies, in this case it receives the document url instead of the websocket one, and the hosts match.
Comment 33 Carlos Garcia Campos 2020-09-26 06:51:43 PDT
Committed r267620: <https://trac.webkit.org/changeset/267620>
Comment 34 Radar WebKit Bug Importer 2020-09-26 06:52:17 PDT
<rdar://problem/69623648>