Bug 201031 - [GTK][WPE] Some pages won't load and network process chews 100% of CPU
Status: RESOLVED INVALID
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading
Version: WebKit Nightly Build
Hardware: PC Linux
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-08-22 01:35 PDT by Adrian Perez
Modified: 2019-08-29 01:05 PDT

Description Adrian Perez 2019-08-22 01:35:10 PDT
This can be reproduced reliably with MiniBrowser (also Epiphany)
with any recent build from “trunk”, opening the following URL:

  https://travis-ci.com/Igalia/cog

If you have the Web Inspector open while trying to load the page,
you will see that the main resource and most “normal” resources
(CSS, JS, etc.) load fine, then at some point two XHRs (one to
“*.statuspage.io” and another to “api.travis-ci.com”) seem to be
stalled and not making any progress; it is unclear to me at the
moment whether these are the root cause or not, though. Sometimes
other resources may show as stalled in the inspector as well, if
their load has started *afterwards* (not necessarily XHRs).

Once this point is reached, the network process spins up to 100%
CPU time usage, and it does not seem to handle requests anymore:
this can be easily noticed by opening a second tab and trying to
load some other page (spoiler: it won't load). Killing the network
process will force spawning of a new one, which makes the browser
usable again — but of course the Travis-CI page is not usable,
because it failed to load some resources earlier and its network
process (the killed one) is gone.

In general, any Travis-CI project overview page triggers it, and
as far as I can remember this has been happening for at least
two or three weeks (maybe a bit more). I have only just found out
how to reliably trigger the problem.
Comment 1 Adrian Perez 2019-08-22 01:38:48 PDT
I have a mild suspicion that this could be related somehow to the
usage of the new libsoup WebSockets API (bug #199151), so I have
added Carlos García in CC. I might have some time to bisect this
in a few days, in case nobody has any idea what the root cause
might be =)
Comment 2 Carlos Garcia Campos 2019-08-27 05:19:30 PDT
The problem is related to the GLib source priorities. It seems we are reading a lot of data from WebSockets, and the pollable source used by libsoup is taking precedence over the other IO sources in the network process. Note that we use the AsyncIONetwork priority for IO in the network process (100 for GTK and 10 for WPE), but the libsoup WebSocket source is created with the default priority (0).
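
To illustrate the priority point, here is a minimal sketch in Python (not WebKit or libsoup code) of a GLib-style dispatch rule: in each main loop iteration, only the ready sources with the numerically lowest priority value are dispatched. The source names and the always-ready callables are assumptions made for illustration; the priority values 0 and 100 are the ones mentioned in this comment.

```python
def run_iterations(sources, iterations):
    """Simulate a priority-based main loop.

    sources: list of (name, priority, is_ready) tuples, where a lower
    priority number means higher precedence, as in GLib.
    Returns a dict mapping each source name to its dispatch count.
    """
    counts = {name: 0 for name, _, _ in sources}
    for _ in range(iterations):
        ready = [(prio, name) for name, prio, is_ready in sources if is_ready()]
        if not ready:
            continue
        # Only sources at the best (lowest) ready priority are dispatched.
        best = min(prio for prio, _ in ready)
        for prio, name in ready:
            if prio == best:
                counts[name] += 1
    return counts


# Hypothetical sources: a WebSocket source that is always ready at the
# default priority (0), and regular network IO at priority 100.
sources = [
    ("websocket", 0, lambda: True),
    ("network-io", 100, lambda: True),
]
counts = run_iterations(sources, 1000)
print(counts)  # {'websocket': 1000, 'network-io': 0}
```

In this model, a priority 0 source that is ready on every iteration means the priority 100 source is never dispatched, which would match the symptom of the network process no longer handling other requests.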
Comment 3 Carlos Garcia Campos 2019-08-27 06:04:11 PDT
OK, it's not actually reading a lot of data; it's that we always get G_IO_ERROR_WOULD_BLOCK and the source callback is executed again and again.
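
A rough sketch of that pattern, in Python rather than GLib or libsoup code: a non-blocking read that reports "would block" (BlockingIOError here, standing in for G_IO_ERROR_WOULD_BLOCK) while the callback keeps being dispatched anyway. Why the source keeps firing is not stated in this comment, so the sketch simply assumes it does; the function name and the bounded loop are hypothetical, added so the example terminates.

```python
import socket


def spin_on_would_block(max_dispatches):
    """Count how many dispatches end in 'would block' with no data read."""
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    would_block = 0
    try:
        # Assumption: the source is dispatched on every iteration even
        # though nothing was written to the peer socket.
        for _ in range(max_dispatches):
            try:
                reader.recv(4096)
            except BlockingIOError:
                # No data available; the callback returns without having
                # made progress and is simply dispatched again.
                would_block += 1
    finally:
        reader.close()
        writer.close()
    return would_block


spins = spin_on_would_block(10000)
print(spins)
```

Without the artificial bound, a loop like this never sleeps in poll and never reads anything, which is consistent with one core sitting at 100% CPU.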
Comment 4 Carlos Garcia Campos 2019-08-28 03:08:43 PDT
This is a libsoup issue in the end, it will be fixed by MR https://gitlab.gnome.org/GNOME/libsoup/merge_requests/91
Comment 5 Adrian Perez 2019-08-29 01:05:28 PDT
(In reply to Carlos Garcia Campos from comment #4)
> This is a libsoup issue in the end, it will be fixed by MR
> https://gitlab.gnome.org/GNOME/libsoup/merge_requests/91

Thanks for taking a look at this, yay! \o/