Bug 54090

Summary: KURL should remove default port numbers when canonicalizing urls
Product: WebKit Reporter: Eric Seidel (no email) <eric>
Component: PlatformAssignee: Eric Seidel (no email) <eric>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, ap, brettw, commit-queue
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: Other   
OS: OS X 10.5   
Bug Depends on:    
Bug Blocks: 66017    
Attachments:
Description Flags
Patch
none
Fixed ap and abarth's comments
none
Patch for landing
none
Patch for landing
none
Patch for landing
none
Patch for landing none

Description Eric Seidel (no email) 2011-02-09 04:35:41 PST
KURL should remove default port numbers when cannonicalizing urls (to match every other browser)
Comment 1 Eric Seidel (no email) 2011-02-09 04:49:47 PST
Created attachment 81790 [details]
Patch
Comment 2 Adam Barth 2011-02-09 11:12:20 PST
Comment on attachment 81790 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=81790&action=review

> Source/WebCore/platform/KURL.cpp:1105
> +static inline bool isDefaultPortForScheme(const char* port, size_t portLength, const char* scheme, size_t schemeLength)

Where did this list of schemes/ports come from?  I think Firefox has some sort of extensibility model where each URL provider annonces a scheme and a default port.  This is probably a good step though.

> Source/WebCore/platform/KURL.cpp:1122
> +    switch (schemeLength) {
> +    case 4:
> +        return !strncmp(scheme, "http", schemeLength) && !strncmp(port, "80", portLength);
> +    case 5:
> +        return !strncmp(scheme, "https", schemeLength) && !strncmp(port, "443", portLength);
> +    case 3:
> +        if (!strncmp(scheme, "ftp", schemeLength))
> +            return !strncmp(port, "43", portLength);
> +        if (!strncmp(scheme, "wss", schemeLength))
> +            return !strncmp(port, "443", portLength);
> +        break;
> +    case 6:
> +        return !strncmp(scheme, "gopher", schemeLength) && !strncmp(port, "70", portLength);
> +    case 2:
> +        return !strncmp(scheme, "ws", schemeLength) && !strncmp(port, "80", portLength);
> +    }

Can you put these in numerical origin?  Also, consider adding an explicit default case.
Comment 3 Alexey Proskuryakov 2011-02-09 11:43:56 PST
Let me find a dupe, I think there were some interesting comments there...
Comment 4 Alexey Proskuryakov 2011-02-09 12:11:02 PST
The duplicate is in Radar, <rdar://problem/7908207>. An interesting thing is that we should care about what's sent over HTTP more that we should care about DOM access to components - and the most common practical case where this occurs is slightly misconfigured servers that redirect to :80.

The code in ResourceHandleMac.mm that (I think) makes redirects work correctly is super subtle.

Specifically, here is how one broken site behaved:
1. Open http://www.rambler.ru/dict

Expected results: redirected to http://www.rambler.ru/dict/
Actual results: redirected to http://www.rambler.ru:80/dict/

Here, the server was adding :80 at the same time it was adding a trailing slash, so we ended up sending www.rambler.ru:80 as Host, as well as in Safari address bar. The server has been since fixed.

-FAIL canonicalize('//c:\\foo') should be http://c/foo. Was http://c:/foo.
+PASS canonicalize('//c:\\foo') is 'http://c/foo'

I think that you have an unrelated change in your patch that's affecting this test.

+// FIXME: This function is a horrible mess of copy/paste, but I don't know
+// of a better way to write it.
+static inline bool isDefaultPortForScheme(const char* port, size_t portLength, const char* scheme, size_t schemeLength)

Please don't wrap the comment. I personally don't think that code quality deserves a FIXME comment - a person capable of improving the code won't need a comment to see that. FIXMEs are more useful for subtle behavior bugs or questions.

+                const char *portEndPtr = url + portEnd;

Misplaced star.

I strongly encourage you to add HTTP test cases for this change. Not strongly enough to formally request that and mark the patch r-, but close to that :)
Comment 5 Adam Barth 2011-02-09 12:18:23 PST
> -FAIL canonicalize('//c:\\foo') should be http://c/foo. Was http://c:/foo.
> +PASS canonicalize('//c:\\foo') is 'http://c/foo'
> 
> I think that you have an unrelated change in your patch that's affecting this test.

You should also add a test for

//example.com:/foo
//example.com:
http://example.com:/foo
http://example.com:

I suspect you're canonicalizing away empty port number too, which seems beneficial (but should be verified).
Comment 6 Alexey Proskuryakov 2011-02-09 12:32:07 PST
When I put "//c:\\foo" in Firefox address bar (yeah, on Mac), it doesn't remove the colon. Perhaps this change should be discussed on its own merits in a separate bug.
Comment 7 Adam Barth 2011-02-09 12:36:04 PST
(In reply to comment #6)
> When I put "//c:\\foo" in Firefox address bar (yeah, on Mac), it doesn't remove the colon. Perhaps this change should be discussed on its own merits in a separate bug.

The address bar isn't a reliable way to test this behavior.  Also, note that because we're using shouldBe, the \\ gets unescaped to \.

I've collated a bunch of data points on the compatibility of URL parsing here:

https://github.com/abarth/url-spec/blob/master/tests/gurl-results/by-browser.txt

Although, I don't seem to have that particular test on that list.
Comment 8 Eric Seidel (no email) 2011-02-09 14:58:43 PST
(In reply to comment #2)
> (From update of attachment 81790 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=81790&action=review
> 
> > Source/WebCore/platform/KURL.cpp:1105
> > +static inline bool isDefaultPortForScheme(const char* port, size_t portLength, const char* scheme, size_t schemeLength)
> 
> Where did this list of schemes/ports come from?  I think Firefox has some sort of extensibility model where each URL provider annonces a scheme and a default port.  This is probably a good step though.

This came from GURL.  Since GURL is BSD and Google copyright, the copyright line didn't end up changing.

> Can you put these in numerical origin?  Also, consider adding an explicit default case.

I assumed Brett did that for perf.  Not sure.
Comment 9 Eric Seidel (no email) 2011-02-09 15:37:20 PST
(In reply to comment #4)
> -FAIL canonicalize('//c:\\foo') should be http://c/foo. Was http://c:/foo.
> +PASS canonicalize('//c:\\foo') is 'http://c/foo'
> 
> I think that you have an unrelated change in your patch that's affecting this test.

Nah.  That's just that our parser sees (correctly) "c" as the host name and the ":" as the : for port.  Its just like if someone wrote:
http://www.apple.com:/  I think we'd expect to remove the trailing colon there too.

> +// FIXME: This function is a horrible mess of copy/paste, but I don't know
> +// of a better way to write it.
> +static inline bool isDefaultPortForScheme(const char* port, size_t portLength, const char* scheme, size_t schemeLength)
> 
> Please don't wrap the comment. I personally don't think that code quality deserves a FIXME comment - a person capable of improving the code won't need a comment to see that. FIXMEs are more useful for subtle behavior bugs or questions.

I removed it and replace it with one which pointed out where I got the list from.

> +                const char *portEndPtr = url + portEnd;
> 
> Misplaced star.

Fixed.

> I strongly encourage you to add HTTP test cases for this change. Not strongly enough to formally request that and mark the patch r-, but close to that :)

What specific tests do you have in mind?  I'm not sure what your asking for, but happy to entertain the request.
Comment 10 Eric Seidel (no email) 2011-02-09 15:43:56 PST
Created attachment 81888 [details]
Fixed ap and abarth's comments
Comment 11 Alexey Proskuryakov 2011-02-09 15:46:37 PST
> What specific tests do you have in mind?  I'm not sure what your asking for, but happy to entertain the request.

Sorry, I now realize that we can't have http tests on port 80.
Comment 12 Eric Seidel (no email) 2011-02-09 16:01:20 PST
Btw, I checked FF, Chrome and Opera and they all do the colon-removal thing as tested in my most recent patch.  I don't have IE.
Comment 13 Adam Barth 2011-02-09 16:03:14 PST
Comment on attachment 81888 [details]
Fixed ap and abarth's comments

Ok.  Please check the HTTP redirect case manually before landing.
Comment 14 Eric Seidel (no email) 2011-02-09 16:04:25 PST
Happy to.  Can you explain how I do that?
Comment 15 Adam Barth 2011-02-09 16:14:59 PST
I don't fully understand, but consider an HTTP redirect with the following Location header:

Location: http://example.com:80/

I think he's worried that Safari will show the ":80" in the address bar or in the Host header when following the redirect.
Comment 16 Eric Seidel (no email) 2011-02-09 16:16:49 PST
(In reply to comment #15)
> I don't fully understand, but consider an HTTP redirect with the following Location header:
> 
> Location: http://example.com:80/
> 
> I think he's worried that Safari will show the ":80" in the address bar or in the Host header when following the redirect.

That seems the opposite of this change, since it's designed to remove :80 from urls.

This would be done with some sort of Apache settings or some custom php?
Comment 17 Adam Barth 2011-02-09 16:19:51 PST
I would just use netcat, but that's how I roll.
Comment 18 Alexey Proskuryakov 2011-02-09 16:22:47 PST
1. Add main.php to LayoutTests/http/tests:
<?php
  header('Location: http://localhost:80/foobar.html');
  header('HTTP/1.0 302 Found');
?>

2. run-webkit-httpd
3. sudo nc -l 80
4. Open http://127.0.0.1:8000/main.php in Safari.
5. Observe that a request that you get in nc has "localhost" in Host header field (not localhost:80).
6. Type back:
HTTP/1.1 200 OK

Hello world!

7. Observe that address in Safari address bar doesn't have the :80.

Repeat with and without your patch, in Safari, Chrome, and Firefox :)
Comment 19 Eric Seidel (no email) 2011-02-09 16:47:34 PST
(In reply to comment #18)
> 1. Add main.php to LayoutTests/http/tests:
> <?php
>   header('Location: http://localhost:80/foobar.html');
>   header('HTTP/1.0 302 Found');
> ?>
> 
> 2. run-webkit-httpd
> 3. sudo nc -l 80
> 4. Open http://127.0.0.1:8000/main.php in Safari.
> 5. Observe that a request that you get in nc has "localhost" in Host header field (not localhost:80).
> 6. Type back:
> HTTP/1.1 200 OK
> 
> Hello world!
> 
> 7. Observe that address in Safari address bar doesn't have the :80.
> 
> Repeat with and without your patch, in Safari, Chrome, and Firefox :)

Done.  Seems to work fine.  I never saw a :80.
Comment 20 Eric Seidel (no email) 2011-02-09 16:49:01 PST
Created attachment 81894 [details]
Patch for landing
Comment 21 Eric Seidel (no email) 2011-02-09 17:33:04 PST
Created attachment 81899 [details]
Patch for landing
Comment 22 WebKit Commit Bot 2011-02-10 04:26:16 PST
Comment on attachment 81899 [details]
Patch for landing

Rejecting attachment 81899 [details] from commit-queue.

Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=cr-jail-7', 'build-..." exit_code: 2

Last 500 characters of output:
..
fast/dom/Document ...................
fast/dom/Document/CaretRangeFromPoint ....
fast/dom/Element .................................
fast/dom/EntityReference .
fast/dom/Geolocation ...................................
fast/dom/HTMLAnchorElement ..
fast/dom/HTMLAnchorElement/set-href-attribute-host.html -> failed

Exiting early after 1 failures. 7245 tests run.
154.29s total testing time

7244 test cases (99%) succeeded
1 test case (<1%) had incorrect layout
4 test cases (<1%) had stderr output

Full output: http://queues.webkit.org/results/7884258
Comment 23 Eric Seidel (no email) 2011-02-10 04:45:02 PST
Two real failures.  I'll investigate after some sleep.
Comment 24 Eric Seidel (no email) 2011-02-10 23:21:21 PST
Created attachment 82107 [details]
Patch for landing
Comment 25 WebKit Commit Bot 2011-02-11 01:12:42 PST
Comment on attachment 82107 [details]
Patch for landing

Rejecting attachment 82107 [details] from commit-queue.

Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=cr-jail-8', 'land-a..." exit_code: 1

Last 500 characters of output:
Headers/wtf/CryptographicallyRandomNumber.h
r78321 = ba712d32d0ea52b479cd2b68230b941090e5296f (refs/remotes/trunk)
	M	Source/WebCore/dom/Document.h
	M	Source/WebCore/dom/Element.cpp
	M	Source/WebCore/dom/Element.h
	M	Source/WebCore/ChangeLog
	M	Source/WebCore/css/CSSStyleSelector.cpp
	M	Source/WebCore/css/CSSStyleSelector.h
r78322 = 61ac637b95e13bf64bf5aba239900de89a6ece25 (refs/remotes/trunk)
First, rewinding head to replay your work on top of it...
Fast-forwarded master to refs/remotes/trunk.

Full output: http://queues.webkit.org/results/7867605
Comment 26 WebKit Commit Bot 2011-02-11 01:33:04 PST
Comment on attachment 82107 [details]
Patch for landing

Rejecting attachment 82107 [details] from commit-queue.

Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=eseidel-cq-sf', 'la..." exit_code: 1

Last 500 characters of output:
cfd4132944da0f11ed4b684f3403b2e9e3737719
r78320 = 3043e530d84fb64e5a4521811d9c344b9d122625
r78321 = ba712d32d0ea52b479cd2b68230b941090e5296f
r78322 = 61ac637b95e13bf64bf5aba239900de89a6ece25
r78323 = aa5a9e22db2a1749c225bbc65d2ae02ed526ba8d
r78324 = 4d78e0888b2e0e58a821f1cb10cfd1b52b9080a6
Done rebuilding .git/svn/refs/remotes/origin/master/.rev_map.268f45cc-cd09-0410-ab3c-d52691b4dbfc
First, rewinding head to replay your work on top of it...
Fast-forwarded master to refs/remotes/origin/master.

Full output: http://queues.webkit.org/results/7870604
Comment 27 Eric Seidel (no email) 2011-02-11 04:35:11 PST
Created attachment 82122 [details]
Patch for landing
Comment 28 WebKit Commit Bot 2011-02-11 05:23:42 PST
Comment on attachment 82122 [details]
Patch for landing

Rejecting attachment 82122 [details] from commit-queue.

Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=cr-jail-8', 'land-a..." exit_code: 1

Last 500 characters of output:
CoreSupport/FrameLoaderClientGtk.cpp
	M	Source/WebKit/gtk/ChangeLog
	M	Source/WebCore/ChangeLog
	M	Source/WebCore/accessibility/gtk/AccessibilityObjectWrapperAtk.cpp
r78331 = f47dc69db31dd8a1237a2f20563fb23636968430 (refs/remotes/trunk)
	M	Source/WebCore/ChangeLog
	M	Source/WebCore/platform/graphics/qt/GraphicsContextQt.cpp
r78332 = 1f1d3b758799641fac7067bfe7013e9a837282a2 (refs/remotes/trunk)
First, rewinding head to replay your work on top of it...
Fast-forwarded master to refs/remotes/trunk.

Full output: http://queues.webkit.org/results/7867647
Comment 29 Eric Seidel (no email) 2011-02-11 13:52:50 PST
Comment on attachment 82122 [details]
Patch for landing

Ha.  It's missing a reviewer.  What a dumb message from the cq.  I guess it got lost along the way.

@abarth: would you r+ again?
Comment 30 WebKit Commit Bot 2011-02-11 15:47:23 PST
Comment on attachment 82122 [details]
Patch for landing

Clearing flags on attachment: 82122

Committed r78383: <http://trac.webkit.org/changeset/78383>
Comment 31 WebKit Commit Bot 2011-02-11 15:47:29 PST
All reviewed patches have been landed.  Closing bug.
Comment 32 WebKit Commit Bot 2011-02-11 16:47:25 PST
The commit-queue encountered the following flaky tests while processing attachment 82122 [details]:

http/tests/security/xssAuditor/script-tag-post-control-char.html bug 54324 (author: dbates@webkit.org)
The commit-queue is continuing to process your patch.
Comment 33 Alexey Proskuryakov 2011-08-10 15:27:30 PDT
This patch broke mobile GMail authentication in some cases. I've filed bug 66017  about that.
Comment 34 Alexey Proskuryakov 2012-02-10 10:24:36 PST
Also likely broke visited link coloring when port number is present, see bug 78358.
Comment 35 Alexey Proskuryakov 2012-05-18 11:23:02 PDT
*** Bug 21979 has been marked as a duplicate of this bug. ***