Bug 141508 - [SOUP] Synchronous XMLHttpRequests can time out when we reach the max connections limit
Summary: [SOUP] Synchronous XMLHttpRequests can time out when we reach the max connect...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Platform (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords: Gtk
Depends on:
Blocks: 142164
  Show dependency treegraph
 
Reported: 2015-02-12 06:29 PST by Carlos Garcia Campos
Modified: 2015-03-03 11:05 PST (History)
6 users (show)

See Also:


Attachments
Patch (11.68 KB, patch)
2015-03-02 05:31 PST, Carlos Garcia Campos
no flags Details | Formatted Diff | Diff
Updated patch (12.30 KB, patch)
2015-03-02 23:52 PST, Carlos Garcia Campos
svillar: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Carlos Garcia Campos 2015-02-12 06:29:04 PST
This is the same problem that bug #84560 tried to fix, but nowadays the browser is not frozen, but the sync requests end up timing out after the 10 seconds timeout used by default for sync loads. The patch that landed in bug #84560 was not enough. This is what is happening now:

 1. We are the max connections limit (normally the connections per host).
 2. A new sync load is started, we increase the max conns limit by 1. Let's say max conns per host was 6, now is 7. We run the nested main loop with a custom context to ensure next request is the only one processed.
 3. The request is correctly processed creating the new connection needed, now current num connection wold be 7.
 4. The syn request finishes, so we finish the nested main loop and the limits are decreased by one again. The mac conns per host is now 6, but current num conns is still 7. At this point the connection is still IN_USE, because the message is still in the queue.
 5. A new sync request is started, we increase the limits again, now max conns per host would be 7 again
 6. The new load is not processed because current num conns is still 7 and limit is also 7. The new connection we created for the previous request hasn't been dropped.
 7. Nothing is processed, because the nested main loop and the connections limit, so we keep doing nothing for 10 seconds until the timeout source is scheduled.

And this keeps happening all the time while there are more sync loads. This not only makes many things not working, but everything very slow, since we do nothing for 10 seconds every time this happen.

The first solution that I thought was dropping idle connections in libsoup after the conns limits is set when the new limit is less than the current number of connections. But when we decrease the limits (in step 4) the message is still running, it switches to finishing right after step 4, so the connection is still in IN_USE state. After step 4, the message is processed again and it switches to FINISHED, then it's unqueued and its connection is set to IDLE.

We might try to keep that approach, but decreasing the limits in an idle, but I'm not sure if it would be possible that before the idle, the connection is reused by another request waiting for a connection free, and set to IN_USE again. If this approach would be possible, we could even cancel the idle if a new sync request is started immediately and reuse the connection. But i'm not sure this is possible or reliable.

Any other solution that I can think of, would require new libsoup API, something like a message flag (IGNORE_CONNECTION_LIMITS, for example), that forces the message to create a new connection but only if we have reached the limit (otherwise we would reuse an idle connection unless there's another message flag or whatever preventing that). If a new connection is created for this particular message, we would drop that connection automatically when the message is unqueued (again we could do that after a short delay, just in case another message with that flag is queued).

What do you think? any other solution?
Comment 1 Dan Winship 2015-02-12 06:31:49 PST
might be able to fix it by running the inner loop until you get SoupSession::request-unqueued rather than just waiting for the message to finish?
Comment 2 Carlos Garcia Campos 2015-02-12 06:36:19 PST
(In reply to comment #1)
> might be able to fix it by running the inner loop until you get
> SoupSession::request-unqueued rather than just waiting for the message to
> finish?

Yes, I thought about a similar solution but using SoupMessage::finished, and then I realized that it's unqueued right after SoupMessage::finished. I didn't see the request-unqueued signal. That Would still require changes in libsoup, though, to drop idle connections when limits are changed.
Comment 3 Carlos Garcia Campos 2015-02-15 23:10:48 PST
(In reply to comment #2)
> (In reply to comment #1)
> > might be able to fix it by running the inner loop until you get
> > SoupSession::request-unqueued rather than just waiting for the message to
> > finish?
> 
> Yes, I thought about a similar solution but using SoupMessage::finished, and
> then I realized that it's unqueued right after SoupMessage::finished. I
> didn't see the request-unqueued signal. That Would still require changes in
> libsoup, though, to drop idle connections when limits are changed.

The problem is that the queue is not kicked when a message loaded from the cache switches to FINISHING state, so when using the nested main loop, since no other requests are processed (because of the different main context), the ::unqueue signal is never emitted and the main loop doesn't finish. I guess we should fix that in libsoup, but even fixing that would produce inconsistencies between resources loaded from the network and from the cache, because cached resources switch to FINISHING too early, so the message can be unqueued before we have finished reading the data stream, and the main loop could finish too early. 

The thing is that it doesn't seem to be possible to fix this without doing any change in libsoup, so I think a specific solution (something like the message flag I proposed) would work better, and it will be more reliable. This approach of increasing the limits and decreasing them after words looks very weak to me (and a bit hacky, TBH)
Comment 4 Carlos Garcia Campos 2015-02-17 00:21:19 PST
I've finally managed to make my test case work reliably by patching libsoup to kick the queue for cached resources after switching to FINISHING state and cleanup idle connections when limit properties change and are less than the current number of connections. In WebKit I changed the synchronous loader to finish the loop when the message is unqueued or when we finish loading the resource, if we are still reading the stream when the message is unqueued we wait for the didFinishLoading, and if the message hasn't been unqueued when we finish loading the data, then we wait for unqueued. 
I'm not happy with this solution, it looks tricky, hacky and weak to me, and it's very risky if something goes wrong, because the nested main loop might never end blocking the networking process forever (in multiprocess model this affects all the web processes). This is because the timeout for sync resources is cleared when the request finishes loading, but now we wait for the message to be unqueued to finish the loop.
Comment 5 Carlos Garcia Campos 2015-03-02 05:31:22 PST
Created attachment 247661 [details]
Patch

I think we should also bump the libsoup version in jhbuild when 2.50 is released.
Comment 6 WebKit Commit Bot 2015-03-02 05:33:33 PST
Attachment 247661 [details] did not pass style-queue:


ERROR: Tools/TestWebKitAPI/Tests/WebKit2Gtk/TestResources.cpp:709:  More than one command on the same line  [whitespace/newline] [4]
ERROR: Tools/TestWebKitAPI/Tests/WebKit2Gtk/TestResources.cpp:711:  More than one command on the same line  [whitespace/newline] [4]
ERROR: Tools/TestWebKitAPI/Tests/WebKit2Gtk/TestResources.cpp:820:  This { should be at the end of the previous line  [whitespace/braces] [4]
Total errors found: 3 in 4 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 7 Carlos Garcia Campos 2015-03-02 23:52:33 PST
Created attachment 247751 [details]
Updated patch

Check for 2.49.91 version instead of 2.50 and bump the libsoup version in jhbuild.
Comment 8 WebKit Commit Bot 2015-03-02 23:54:01 PST
Attachment 247751 [details] did not pass style-queue:


ERROR: Tools/TestWebKitAPI/Tests/WebKit2Gtk/TestResources.cpp:709:  More than one command on the same line  [whitespace/newline] [4]
ERROR: Tools/TestWebKitAPI/Tests/WebKit2Gtk/TestResources.cpp:711:  More than one command on the same line  [whitespace/newline] [4]
ERROR: Tools/TestWebKitAPI/Tests/WebKit2Gtk/TestResources.cpp:820:  This { should be at the end of the previous line  [whitespace/braces] [4]
Total errors found: 3 in 5 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 9 Sergio Villar Senin 2015-03-03 01:00:35 PST
Comment on attachment 247751 [details]
Updated patch

View in context: https://bugs.webkit.org/attachment.cgi?id=247751&action=review

Nice!

> Source/WebCore/ChangeLog:10
> +        connections allowed if the soup version is recent enough.

Perhaps we should explain here why the increase/decrease was not a good idea sometimes.

> Source/WebCore/platform/network/soup/ResourceHandleSoup.cpp:935
> +    // Ignore the connection limits in synchronous loads to avoid freezing the networking process.

This comment doesn't add much as the code bellow is really explicit. I'd better include a link to the bug or a more detailed description of the issue. I'd prefer the former.

> Source/WebCore/platform/network/soup/ResourceHandleSoup.cpp:942
>      g_signal_connect(d->m_soupMessage.get(), "network-event", G_CALLBACK(networkEventCallback), handle);

As you're fixing some other stuff like override,final, etc... I think we could use this bug to also store the SoupMessage in a local variable and use it all along this method which is full of the ugly d->m_soupMessage.get().

> Tools/TestWebKitAPI/Tests/WebKit2Gtk/TestResources.cpp:704
> +        GUniquePtr<char> xhr(g_strdup_printf("xhr = new XMLHttpRequest; xhr.open('GET', '/sync-request-on-max-conns-xhr%u', false); xhr.send();", i));

If it were me I'd have stored this string in a static variable putting each instruction in a different line for better readability. But since this is a unit test I could live with it.
Comment 10 Carlos Garcia Campos 2015-03-03 01:17:53 PST
Committed r180927: <http://trac.webkit.org/changeset/180927>
Comment 11 Carlos Garcia Campos 2015-03-03 01:19:24 PST
(In reply to comment #9)
> Comment on attachment 247751 [details]
> Updated patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=247751&action=review
> 
> Nice!

Thanks for the review.
 
> > Source/WebCore/ChangeLog:10
> > +        connections allowed if the soup version is recent enough.
> 
> Perhaps we should explain here why the increase/decrease was not a good idea
> sometimes.

Added more explanation there.

> > Source/WebCore/platform/network/soup/ResourceHandleSoup.cpp:935
> > +    // Ignore the connection limits in synchronous loads to avoid freezing the networking process.
> 
> This comment doesn't add much as the code bellow is really explicit. I'd
> better include a link to the bug or a more detailed description of the
> issue. I'd prefer the former.

Added the bug URL.

> > Source/WebCore/platform/network/soup/ResourceHandleSoup.cpp:942
> >      g_signal_connect(d->m_soupMessage.get(), "network-event", G_CALLBACK(networkEventCallback), handle);
> 
> As you're fixing some other stuff like override,final, etc... I think we
> could use this bug to also store the SoupMessage in a local variable and use
> it all along this method which is full of the ugly d->m_soupMessage.get().

Fixed the sync loader because I was changing that, but for more general cleanups I prefer to use a separate bug.

> > Tools/TestWebKitAPI/Tests/WebKit2Gtk/TestResources.cpp:704
> > +        GUniquePtr<char> xhr(g_strdup_printf("xhr = new XMLHttpRequest; xhr.open('GET', '/sync-request-on-max-conns-xhr%u', false); xhr.send();", i));
> 
> If it were me I'd have stored this string in a static variable putting each
> instruction in a different line for better readability. But since this is a
> unit test I could live with it.

:-)
Comment 12 Carlos Garcia Campos 2015-03-03 02:15:46 PST
It seems libsoup 2.49.91 doesn't build in 32 bit, so the 32 bit bot is currently broken. I've filed a bug https://bugzilla.gnome.org/show_bug.cgi?id=745509. Once it's fixed we can just land a follow up patch to use the new revision in our jhbuild.