Bug 28687 - should not pass URI fragments to libsoup
Summary: should not pass URI fragments to libsoup
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC Linux
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-08-24 14:49 PDT by Dan Winship
Modified: 2009-09-08 13:07 PDT (History)
2 users (show)

See Also:


Attachments
dontpassfragment.patch (2.86 KB, patch)
2009-09-08 08:12 PDT, Xan Lopez
gustavo: review+
xan.lopez: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dan Winship 2009-08-24 14:49:59 PDT
(pointed out by reinouts on irc)

If you click on a link like "http://example.com/#comments" in WebKitGTK, it passes the whole URI (including the fragment) to libsoup. Normally libsoup doesn't pass the "#comments" to the remote server, but if you're using a proxy, then it does. And some servers will return a 404 or 403 in that case because they interpret the "#comments" as being part of the path.

The fact that libsoup's behavior changes depending on whether you use a proxy is a libsoup bug, but you could argue about which of the two behaviors is the buggy one. I think WebKit should not be passing fragments to libsoup; WebKit wants the whole page back, and so it should be requesting the whole page from libsoup, not just a portion of it. Also, it could be confusing if messages with URIs that are not soup_uri_equal() would result in identical requests, and this could end up tripping up things like caching, etc, as well. It seems wrong for "soup_message_get_uri()" to return a URI with a fragment in it, since the response body actually contains all fragments. But it would be wrong for SoupMessage to clear the fragment from the URI itself too, since that could trip up the app if it expected the URI to match what it had originally set it to.

So anyway, I think ResourceHandleSoup should strip the URI fragment from the URI as part of its process of canonicalizing it before passing it to libsoup.
Comment 1 Xan Lopez 2009-08-26 03:59:27 PDT
So, I've tried something like...

diff --git a/WebCore/platform/network/soup/ResourceRequestSoup.cpp b/WebCore/platform/network/soup/ResourceRequestSoup.cpp
index f2011bb..eec75e0 100644
--- a/WebCore/platform/network/soup/ResourceRequestSoup.cpp
+++ b/WebCore/platform/network/soup/ResourceRequestSoup.cpp
@@ -30,9 +30,20 @@ using namespace std;

 namespace WebCore {

+static char* stripFragmentFromURI(char* uri)
+{
+    char* ptr = g_strrstr(uri, "#");
+    if (ptr && *(ptr-1) != '&')
+        *ptr = '\0';
+
+    return uri;
+}
+
 SoupMessage* ResourceRequest::toSoupMessage() const
 {
-    SoupMessage* soupMessage = soup_message_new(httpMethod().utf8().data(), url().string().utf8().data());
+    GOwnPtr<char> uri(stripFragmentFromURI(g_strdup(url().string().utf8().data())));
+    SoupMessage* soupMessage = soup_message_new(httpMethod().utf8().data(),
+                                                const_cast<char*>(uri.get()));
     if (!soupMessage)
         return 0;

@@ -53,7 +64,7 @@ SoupMessage* ResourceRequest::toSoupMessage() const
 void ResourceRequest::updateFromSoupMessage(SoupMessage* soupMessage)
 {
     SoupURI* soupURI = soup_message_get_uri(soupMessage);
-    GOwnPtr<gchar> uri(soup_uri_to_string(soupURI, FALSE));
+    GOwnPtr<gchar> uri(stripFragmentFromURI(soup_uri_to_string(soupURI, FALSE)));
     m_url = KURL(KURL(), String::fromUTF8(uri.get()));

And all the http tests seem to still pass. Looks reasonable Dan? I first didn't check that the previous character is a & and failed one test that dealt with espaced hex stuff in the URL. Is there any function in soup/glib I should be using to strip the fragment from a string? (Other than going through SoupURI somehow, I suppose).
Comment 2 Dan Winship 2009-08-26 05:24:47 PDT
You probably want to operate on the parsed URL, not the string form.

In ResourceHandle::start(), you have:

    KURL url = request().url();
    String urlString = url.string();

and you could just add a call to url.removeFragmentIdentifier() in between. Likewise in updateFromSoupMessage. (Though I'm not sure if you want m_url to contain the fragment or not.)
Comment 3 Xan Lopez 2009-09-08 08:12:40 PDT
Created attachment 39187 [details]
dontpassfragment.patch

Don't pass fragments in URIs to libsoup.
Comment 4 Gustavo Noronha (kov) 2009-09-08 12:18:24 PDT
Comment on attachment 39187 [details]
dontpassfragment.patch

Looks good, thanks!
Comment 5 Xan Lopez 2009-09-08 13:07:43 PDT
Landed in r48178, closing.