Bug 54090 - KURL should remove default port numbers when canonicalizing urls
Summary: KURL should remove default port numbers when canonicalizing urls
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Platform (show other bugs)
Version: 528+ (Nightly build)
Hardware: Other OS X 10.5
: P2 Normal
Assignee: Eric Seidel (no email)
URL:
Keywords: InRadar
: 21979 (view as bug list)
Depends on:
Blocks: 66017
  Show dependency treegraph
 
Reported: 2011-02-09 04:35 PST by Eric Seidel (no email)
Modified: 2012-05-18 11:23 PDT (History)
4 users (show)

See Also:


Attachments
Patch (9.30 KB, patch)
2011-02-09 04:49 PST, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
Fixed ap and abarth's comments (10.05 KB, patch)
2011-02-09 15:43 PST, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
Patch for landing (9.96 KB, patch)
2011-02-09 16:49 PST, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
Patch for landing (10.02 KB, patch)
2011-02-09 17:33 PST, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
Patch for landing (16.42 KB, patch)
2011-02-10 23:21 PST, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
Patch for landing (16.39 KB, patch)
2011-02-11 04:35 PST, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Eric Seidel (no email) 2011-02-09 04:35:41 PST
KURL should remove default port numbers when cannonicalizing urls (to match every other browser)
Comment 1 Eric Seidel (no email) 2011-02-09 04:49:47 PST
Created attachment 81790 [details]
Patch
Comment 2 Adam Barth 2011-02-09 11:12:20 PST
Comment on attachment 81790 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=81790&action=review

> Source/WebCore/platform/KURL.cpp:1105
> +static inline bool isDefaultPortForScheme(const char* port, size_t portLength, const char* scheme, size_t schemeLength)

Where did this list of schemes/ports come from?  I think Firefox has some sort of extensibility model where each URL provider annonces a scheme and a default port.  This is probably a good step though.

> Source/WebCore/platform/KURL.cpp:1122
> +    switch (schemeLength) {
> +    case 4:
> +        return !strncmp(scheme, "http", schemeLength) && !strncmp(port, "80", portLength);
> +    case 5:
> +        return !strncmp(scheme, "https", schemeLength) && !strncmp(port, "443", portLength);
> +    case 3:
> +        if (!strncmp(scheme, "ftp", schemeLength))
> +            return !strncmp(port, "43", portLength);
> +        if (!strncmp(scheme, "wss", schemeLength))
> +            return !strncmp(port, "443", portLength);
> +        break;
> +    case 6:
> +        return !strncmp(scheme, "gopher", schemeLength) && !strncmp(port, "70", portLength);
> +    case 2:
> +        return !strncmp(scheme, "ws", schemeLength) && !strncmp(port, "80", portLength);
> +    }

Can you put these in numerical origin?  Also, consider adding an explicit default case.
Comment 3 Alexey Proskuryakov 2011-02-09 11:43:56 PST
Let me find a dupe, I think there were some interesting comments there...
Comment 4 Alexey Proskuryakov 2011-02-09 12:11:02 PST
The duplicate is in Radar, <rdar://problem/7908207>. An interesting thing is that we should care about what's sent over HTTP more that we should care about DOM access to components - and the most common practical case where this occurs is slightly misconfigured servers that redirect to :80.

The code in ResourceHandleMac.mm that (I think) makes redirects work correctly is super subtle.

Specifically, here is how one broken site behaved:
1. Open http://www.rambler.ru/dict

Expected results: redirected to http://www.rambler.ru/dict/
Actual results: redirected to http://www.rambler.ru:80/dict/

Here, the server was adding :80 at the same time it was adding a trailing slash, so we ended up sending www.rambler.ru:80 as Host, as well as in Safari address bar. The server has been since fixed.

-FAIL canonicalize('//c:\\foo') should be http://c/foo. Was http://c:/foo.
+PASS canonicalize('//c:\\foo') is 'http://c/foo'

I think that you have an unrelated change in your patch that's affecting this test.

+// FIXME: This function is a horrible mess of copy/paste, but I don't know
+// of a better way to write it.
+static inline bool isDefaultPortForScheme(const char* port, size_t portLength, const char* scheme, size_t schemeLength)

Please don't wrap the comment. I personally don't think that code quality deserves a FIXME comment - a person capable of improving the code won't need a comment to see that. FIXMEs are more useful for subtle behavior bugs or questions.

+                const char *portEndPtr = url + portEnd;

Misplaced star.

I strongly encourage you to add HTTP test cases for this change. Not strongly enough to formally request that and mark the patch r-, but close to that :)
Comment 5 Adam Barth 2011-02-09 12:18:23 PST
> -FAIL canonicalize('//c:\\foo') should be http://c/foo. Was http://c:/foo.
> +PASS canonicalize('//c:\\foo') is 'http://c/foo'
> 
> I think that you have an unrelated change in your patch that's affecting this test.

You should also add a test for

//example.com:/foo
//example.com:
http://example.com:/foo
http://example.com:

I suspect you're canonicalizing away empty port number too, which seems beneficial (but should be verified).
Comment 6 Alexey Proskuryakov 2011-02-09 12:32:07 PST
When I put "//c:\\foo" in Firefox address bar (yeah, on Mac), it doesn't remove the colon. Perhaps this change should be discussed on its own merits in a separate bug.
Comment 7 Adam Barth 2011-02-09 12:36:04 PST
(In reply to comment #6)
> When I put "//c:\\foo" in Firefox address bar (yeah, on Mac), it doesn't remove the colon. Perhaps this change should be discussed on its own merits in a separate bug.

The address bar isn't a reliable way to test this behavior.  Also, note that because we're using shouldBe, the \\ gets unescaped to \.

I've collated a bunch of data points on the compatibility of URL parsing here:

https://github.com/abarth/url-spec/blob/master/tests/gurl-results/by-browser.txt

Although, I don't seem to have that particular test on that list.
Comment 8 Eric Seidel (no email) 2011-02-09 14:58:43 PST
(In reply to comment #2)
> (From update of attachment 81790 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=81790&action=review
> 
> > Source/WebCore/platform/KURL.cpp:1105
> > +static inline bool isDefaultPortForScheme(const char* port, size_t portLength, const char* scheme, size_t schemeLength)
> 
> Where did this list of schemes/ports come from?  I think Firefox has some sort of extensibility model where each URL provider annonces a scheme and a default port.  This is probably a good step though.

This came from GURL.  Since GURL is BSD and Google copyright, the copyright line didn't end up changing.

> Can you put these in numerical origin?  Also, consider adding an explicit default case.

I assumed Brett did that for perf.  Not sure.
Comment 9 Eric Seidel (no email) 2011-02-09 15:37:20 PST
(In reply to comment #4)
> -FAIL canonicalize('//c:\\foo') should be http://c/foo. Was http://c:/foo.
> +PASS canonicalize('//c:\\foo') is 'http://c/foo'
> 
> I think that you have an unrelated change in your patch that's affecting this test.

Nah.  That's just that our parser sees (correctly) "c" as the host name and the ":" as the : for port.  Its just like if someone wrote:
http://www.apple.com:/  I think we'd expect to remove the trailing colon there too.

> +// FIXME: This function is a horrible mess of copy/paste, but I don't know
> +// of a better way to write it.
> +static inline bool isDefaultPortForScheme(const char* port, size_t portLength, const char* scheme, size_t schemeLength)
> 
> Please don't wrap the comment. I personally don't think that code quality deserves a FIXME comment - a person capable of improving the code won't need a comment to see that. FIXMEs are more useful for subtle behavior bugs or questions.

I removed it and replace it with one which pointed out where I got the list from.

> +                const char *portEndPtr = url + portEnd;
> 
> Misplaced star.

Fixed.

> I strongly encourage you to add HTTP test cases for this change. Not strongly enough to formally request that and mark the patch r-, but close to that :)

What specific tests do you have in mind?  I'm not sure what your asking for, but happy to entertain the request.
Comment 10 Eric Seidel (no email) 2011-02-09 15:43:56 PST
Created attachment 81888 [details]
Fixed ap and abarth's comments
Comment 11 Alexey Proskuryakov 2011-02-09 15:46:37 PST
> What specific tests do you have in mind?  I'm not sure what your asking for, but happy to entertain the request.

Sorry, I now realize that we can't have http tests on port 80.
Comment 12 Eric Seidel (no email) 2011-02-09 16:01:20 PST
Btw, I checked FF, Chrome and Opera and they all do the colon-removal thing as tested in my most recent patch.  I don't have IE.
Comment 13 Adam Barth 2011-02-09 16:03:14 PST
Comment on attachment 81888 [details]
Fixed ap and abarth's comments

Ok.  Please check the HTTP redirect case manually before landing.
Comment 14 Eric Seidel (no email) 2011-02-09 16:04:25 PST
Happy to.  Can you explain how I do that?
Comment 15 Adam Barth 2011-02-09 16:14:59 PST
I don't fully understand, but consider an HTTP redirect with the following Location header:

Location: http://example.com:80/

I think he's worried that Safari will show the ":80" in the address bar or in the Host header when following the redirect.
Comment 16 Eric Seidel (no email) 2011-02-09 16:16:49 PST
(In reply to comment #15)
> I don't fully understand, but consider an HTTP redirect with the following Location header:
> 
> Location: http://example.com:80/
> 
> I think he's worried that Safari will show the ":80" in the address bar or in the Host header when following the redirect.

That seems the opposite of this change, since it's designed to remove :80 from urls.

This would be done with some sort of Apache settings or some custom php?
Comment 17 Adam Barth 2011-02-09 16:19:51 PST
I would just use netcat, but that's how I roll.
Comment 18 Alexey Proskuryakov 2011-02-09 16:22:47 PST
1. Add main.php to LayoutTests/http/tests:
<?php
  header('Location: http://localhost:80/foobar.html');
  header('HTTP/1.0 302 Found');
?>

2. run-webkit-httpd
3. sudo nc -l 80
4. Open http://127.0.0.1:8000/main.php in Safari.
5. Observe that a request that you get in nc has "localhost" in Host header field (not localhost:80).
6. Type back:
HTTP/1.1 200 OK

Hello world!

7. Observe that address in Safari address bar doesn't have the :80.

Repeat with and without your patch, in Safari, Chrome, and Firefox :)
Comment 19 Eric Seidel (no email) 2011-02-09 16:47:34 PST
(In reply to comment #18)
> 1. Add main.php to LayoutTests/http/tests:
> <?php
>   header('Location: http://localhost:80/foobar.html');
>   header('HTTP/1.0 302 Found');
> ?>
> 
> 2. run-webkit-httpd
> 3. sudo nc -l 80
> 4. Open http://127.0.0.1:8000/main.php in Safari.
> 5. Observe that a request that you get in nc has "localhost" in Host header field (not localhost:80).
> 6. Type back:
> HTTP/1.1 200 OK
> 
> Hello world!
> 
> 7. Observe that address in Safari address bar doesn't have the :80.
> 
> Repeat with and without your patch, in Safari, Chrome, and Firefox :)

Done.  Seems to work fine.  I never saw a :80.
Comment 20 Eric Seidel (no email) 2011-02-09 16:49:01 PST
Created attachment 81894 [details]
Patch for landing
Comment 21 Eric Seidel (no email) 2011-02-09 17:33:04 PST
Created attachment 81899 [details]
Patch for landing
Comment 22 WebKit Commit Bot 2011-02-10 04:26:16 PST
Comment on attachment 81899 [details]
Patch for landing

Rejecting attachment 81899 [details] from commit-queue.

Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=cr-jail-7', 'build-..." exit_code: 2

Last 500 characters of output:
..
fast/dom/Document ...................
fast/dom/Document/CaretRangeFromPoint ....
fast/dom/Element .................................
fast/dom/EntityReference .
fast/dom/Geolocation ...................................
fast/dom/HTMLAnchorElement ..
fast/dom/HTMLAnchorElement/set-href-attribute-host.html -> failed

Exiting early after 1 failures. 7245 tests run.
154.29s total testing time

7244 test cases (99%) succeeded
1 test case (<1%) had incorrect layout
4 test cases (<1%) had stderr output

Full output: http://queues.webkit.org/results/7884258
Comment 23 Eric Seidel (no email) 2011-02-10 04:45:02 PST
Two real failures.  I'll investigate after some sleep.
Comment 24 Eric Seidel (no email) 2011-02-10 23:21:21 PST
Created attachment 82107 [details]
Patch for landing
Comment 25 WebKit Commit Bot 2011-02-11 01:12:42 PST
Comment on attachment 82107 [details]
Patch for landing

Rejecting attachment 82107 [details] from commit-queue.

Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=cr-jail-8', 'land-a..." exit_code: 1

Last 500 characters of output:
Headers/wtf/CryptographicallyRandomNumber.h
r78321 = ba712d32d0ea52b479cd2b68230b941090e5296f (refs/remotes/trunk)
	M	Source/WebCore/dom/Document.h
	M	Source/WebCore/dom/Element.cpp
	M	Source/WebCore/dom/Element.h
	M	Source/WebCore/ChangeLog
	M	Source/WebCore/css/CSSStyleSelector.cpp
	M	Source/WebCore/css/CSSStyleSelector.h
r78322 = 61ac637b95e13bf64bf5aba239900de89a6ece25 (refs/remotes/trunk)
First, rewinding head to replay your work on top of it...
Fast-forwarded master to refs/remotes/trunk.

Full output: http://queues.webkit.org/results/7867605
Comment 26 WebKit Commit Bot 2011-02-11 01:33:04 PST
Comment on attachment 82107 [details]
Patch for landing

Rejecting attachment 82107 [details] from commit-queue.

Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=eseidel-cq-sf', 'la..." exit_code: 1

Last 500 characters of output:
cfd4132944da0f11ed4b684f3403b2e9e3737719
r78320 = 3043e530d84fb64e5a4521811d9c344b9d122625
r78321 = ba712d32d0ea52b479cd2b68230b941090e5296f
r78322 = 61ac637b95e13bf64bf5aba239900de89a6ece25
r78323 = aa5a9e22db2a1749c225bbc65d2ae02ed526ba8d
r78324 = 4d78e0888b2e0e58a821f1cb10cfd1b52b9080a6
Done rebuilding .git/svn/refs/remotes/origin/master/.rev_map.268f45cc-cd09-0410-ab3c-d52691b4dbfc
First, rewinding head to replay your work on top of it...
Fast-forwarded master to refs/remotes/origin/master.

Full output: http://queues.webkit.org/results/7870604
Comment 27 Eric Seidel (no email) 2011-02-11 04:35:11 PST
Created attachment 82122 [details]
Patch for landing
Comment 28 WebKit Commit Bot 2011-02-11 05:23:42 PST
Comment on attachment 82122 [details]
Patch for landing

Rejecting attachment 82122 [details] from commit-queue.

Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=cr-jail-8', 'land-a..." exit_code: 1

Last 500 characters of output:
CoreSupport/FrameLoaderClientGtk.cpp
	M	Source/WebKit/gtk/ChangeLog
	M	Source/WebCore/ChangeLog
	M	Source/WebCore/accessibility/gtk/AccessibilityObjectWrapperAtk.cpp
r78331 = f47dc69db31dd8a1237a2f20563fb23636968430 (refs/remotes/trunk)
	M	Source/WebCore/ChangeLog
	M	Source/WebCore/platform/graphics/qt/GraphicsContextQt.cpp
r78332 = 1f1d3b758799641fac7067bfe7013e9a837282a2 (refs/remotes/trunk)
First, rewinding head to replay your work on top of it...
Fast-forwarded master to refs/remotes/trunk.

Full output: http://queues.webkit.org/results/7867647
Comment 29 Eric Seidel (no email) 2011-02-11 13:52:50 PST
Comment on attachment 82122 [details]
Patch for landing

Ha.  It's missing a reviewer.  What a dumb message from the cq.  I guess it got lost along the way.

@abarth: would you r+ again?
Comment 30 WebKit Commit Bot 2011-02-11 15:47:23 PST
Comment on attachment 82122 [details]
Patch for landing

Clearing flags on attachment: 82122

Committed r78383: <http://trac.webkit.org/changeset/78383>
Comment 31 WebKit Commit Bot 2011-02-11 15:47:29 PST
All reviewed patches have been landed.  Closing bug.
Comment 32 WebKit Commit Bot 2011-02-11 16:47:25 PST
The commit-queue encountered the following flaky tests while processing attachment 82122 [details]:

http/tests/security/xssAuditor/script-tag-post-control-char.html bug 54324 (author: dbates@webkit.org)
The commit-queue is continuing to process your patch.
Comment 33 Alexey Proskuryakov 2011-08-10 15:27:30 PDT
This patch broke mobile GMail authentication in some cases. I've filed bug 66017  about that.
Comment 34 Alexey Proskuryakov 2012-02-10 10:24:36 PST
Also likely broke visited link coloring when port number is present, see bug 78358.
Comment 35 Alexey Proskuryakov 2012-05-18 11:23:02 PDT
*** Bug 21979 has been marked as a duplicate of this bug. ***