53850 – Make WebKit's fragment canonicalization match other browsers

RESOLVED FIXED53850

Make WebKit's fragment canonicalization match other browsers

https://bugs.webkit.org/show_bug.cgi?id=53850

Summary Make WebKit's fragment canonicalization match other browsers

Eric Seidel (no email)

Reported 2011-02-05 04:18:08 PST

Make WebKit's fragment cannonicalization match other browsers

Attachments
Patch (12.08 KB, patch) 2011-02-05 04:44 PST, Eric Seidel (no email)	no flags	Details Formatted Diff Diff
updated patch (12.82 KB, patch) 2011-02-06 11:33 PST, Eric Seidel (no email)	no flags	Details Formatted Diff Diff
Patch for landing (12.00 KB, patch) 2011-02-06 23:40 PST, Adam Barth	no flags	Details Formatted Diff Diff
a more conservative patch, not quite ready (10.98 KB, patch) 2011-02-08 16:47 PST, Eric Seidel (no email)	no flags	Details Formatted Diff Diff
Ready for re-review and commit (12.66 KB, patch) 2011-02-08 21:40 PST, Eric Seidel (no email)	no flags	Details Formatted Diff Diff
Patch for landing (13.21 KB, patch) 2011-02-09 01:57 PST, Eric Seidel (no email)	no flags	Details Formatted Diff Diff
Show Obsolete (5) View All Add attachment proposed patch, testcase, etc.

Eric Seidel (no email)

Comment 1 2011-02-05 04:44:55 PST

Created attachment 81357 [details] Patch

Eric Seidel (no email)

Comment 2 2011-02-05 04:46:06 PST

I'm not 100% convinced that Chrome's results (which we now match) are the best here. I explained things mostly in teh ChangeLOg. https://github.com/abarth/url-spec/blob/master/tests/gurl-results/by-browser.txt unfortunately has kinda old data and doesn't cover all of these tests for all browsers.

Adam Barth

Comment 3 2011-02-05 10:24:43 PST

Comment on attachment 81357 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=81357&action=review Some nits below. w.r.t. diverging from Firefox. I discussed this topic (in general) with some Firefox folks. They said they're willing to converge to match other browsers URL handling as long as we write a spec for how we think things ought to be. If we follow this trajectory, everyone should end up with interoperable behavior. > Source/WebCore/platform/KURL.cpp:997 > + if (c == '\0') > + continue; This part isn't correct. It's an artifact of how Brett did his testing in IE (which strips nulls at the HTML parser layer). It's hard to observe directly, but we should be able to observe via the DOM (not via the HTML parser). > Source/WebCore/platform/KURL.cpp:1001 > + *p++ = '%'; > + *p++ = hexDigits[c >> 4]; > + *p++ = hexDigits[c & 0xF]; We should add an inline helper function to do this work instead of copy/pasting it. Really what we should do is a templated free function that takes a predicate for what to escape, if you want to get fancy.

Eric Seidel (no email)

Comment 4 2011-02-06 11:33:04 PST

Created attachment 81418 [details] updated patch

Adam Barth

Comment 5 2011-02-06 11:37:05 PST

Comment on attachment 81418 [details] updated patch View in context: https://bugs.webkit.org/attachment.cgi?id=81418&action=review > Source/WebCore/platform/KURL.cpp:1012 > + // FIXME: We should be decoding any non-ascii characters (above 0x80), > + // however at this point we don't have the full UChar anymore, just a char. This FIXME isn't correct. The decoding has already happened at this point.

Adam Barth

Comment 6 2011-02-06 23:40:08 PST

Created attachment 81452 [details] Patch for landing

WebKit Commit Bot

Comment 7 2011-02-07 02:55:21 PST

Comment on attachment 81452 [details] Patch for landing Rejecting attachment 81452 [details] from commit-queue. Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=cr-jail-3', 'build-..." exit_code: 2 Last 500 characters of output: .......................................................................... fast/css/content .. fast/css/counters ....................... fast/css/getComputedStyle ........................ fast/css/namespaces ........... fast/doctypes ............ fast/dom .............. fast/dom/anchor-getParameter.html -> failed Exiting early after 1 failures. 6834 tests run. 145.29s total testing time 6833 test cases (99%) succeeded 1 test case (<1%) had incorrect layout 4 test cases (<1%) had stderr output Full output: http://queues.webkit.org/results/7699951

Eric Seidel (no email)

Comment 8 2011-02-08 00:48:55 PST

The patch must be wrong. We hit ASSERTs with this. fast/loader/data-url-encoding-html.html is one such test which asserts. I'll have to look at it tomrorow.

Eric Seidel (no email)

Comment 9 2011-02-08 02:07:38 PST

We're hitting: static void checkEncodedString(const String& url) { for (unsigned i = 0; i < url.length(); ++i) ASSERT(!(url[i] & ~0x7F)); // THIS ASSERT ASSERT(!url.length() || isSchemeFirstChar(url[0])); } Which is expected since we're now allowing characters over 127 in the URL. Instead of changing the ASSERT I think I'd rather just encode everything over 127 for now.

Eric Seidel (no email)

Comment 10 2011-02-08 13:15:12 PST

There is actually a bunch more to do here. About 10 tests fail with this applied.

Adam Barth

Comment 11 2011-02-08 14:06:07 PST

> Instead of changing the ASSERT I think I'd rather just encode everything over 127 for now. Why? We're supposed to have unicode characters over 127 in the fragment.

Eric Seidel (no email)

Comment 12 2011-02-08 14:14:57 PST

(In reply to comment #11) > > Instead of changing the ASSERT I think I'd rather just encode everything over 127 for now. > > Why? We're supposed to have unicode characters over 127 in the fragment. By the time this fragment handling code ever runs we have chars instead of UChars. And checkEncodedString clearly assumes the url will only ever include ascii. :) Still working through the tests.

Alexey Proskuryakov

Comment 13 2011-02-08 14:26:20 PST

The idea behind checkEncodedString() was to make as stringent test for the fact that the string is indeed a parsed KURL as possible, catching cases where arbitrary URLs are passed in. But yes, KURL works with bytes here, not with UTF-16 strings.

Adam Barth

Comment 14 2011-02-08 14:36:13 PST

I'm not sure I understand. By bytes, do you mean UTF-8 strings? Canonicalized URLs can have non-ASCII characters in their fragment.

Alexey Proskuryakov

Comment 15 2011-02-08 15:16:26 PST

I think that we can get byte sequences that aren't UTF-8 in some KURL code paths as we decode percent escapes. I'm not very clear on what goes on, but I doubt we can assume that "char*" is UTF-8 in KURL code.

Eric Seidel (no email)

Comment 16 2011-02-08 16:47:39 PST

Created attachment 81716 [details] a more conservative patch, not quite ready

Adam Barth

Comment 17 2011-02-08 17:57:31 PST

Comment on attachment 81716 [details] a more conservative patch, not quite ready Ok. This is certainly a step in the right direction. We'll need to revisit this issue at some point.

Eric Seidel (no email)

Comment 18 2011-02-08 21:40:16 PST

Created attachment 81744 [details] Ready for re-review and commit

Eric Seidel (no email)

Comment 19 2011-02-08 21:42:21 PST

I believe this passes all the tests now. However its hard to tell, since it seems SL Debug builds still crash a bunch of tests in TOT. :(

Adam Barth

Comment 20 2011-02-08 22:05:31 PST

Comment on attachment 81744 [details] Ready for re-review and commit View in context: https://bugs.webkit.org/attachment.cgi?id=81744&action=review > Source/WebCore/platform/KURL.cpp:1008 > + // Strip CR, LF and Tab from fragments, per: > + // https://bugs.webkit.org/show_bug.cgi?id=8770 How does GURL handle this case?

Eric Seidel (no email)

Comment 21 2011-02-08 23:05:18 PST

(In reply to comment #20) > (From update of attachment 81744 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=81744&action=review > > > Source/WebCore/platform/KURL.cpp:1008 > > + // Strip CR, LF and Tab from fragments, per: > > + // https://bugs.webkit.org/show_bug.cgi?id=8770 > > How does GURL handle this case? I'm not sure. I did test to see that both Chrome and FF pass the test in question (which is about stripping these chars). I'm not sure how GURL does the actual stripping however.

Eric Seidel (no email)

Comment 22 2011-02-09 01:20:10 PST

(In reply to comment #21) > (In reply to comment #20) > > (From update of attachment 81744 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=81744&action=review > > > > > Source/WebCore/platform/KURL.cpp:1008 > > > + // Strip CR, LF and Tab from fragments, per: > > > + // https://bugs.webkit.org/show_bug.cgi?id=8770 > > > > How does GURL handle this case? > > I'm not sure. I did test to see that both Chrome and FF pass the test in question (which is about stripping these chars). I'm not sure how GURL does the actual stripping however. GURL documents this here: http://www.google.com/codesearch/p?hl=en#R_oU-jn3RFk/trunk/src/url_canon_query.cc&q=tab%20package:http://google-url%5C.googlecode%5C.com&l=40 But I haven't found the code which actually makes that happen yet. I'm not very familiar with GURL.

Eric Seidel (no email)

Comment 23 2011-02-09 01:22:31 PST

Here is the code in question: http://www.google.com/codesearch/p?hl=en#R_oU-jn3RFk/trunk/src/url_canon_etc.cc&q=package:http://google-url%5C.googlecode%5C.com%200x20&l=275

Adam Barth

Comment 24 2011-02-09 01:26:25 PST

Comment on attachment 81744 [details] Ready for re-review and commit Ok, but we should add tests for these cases to fast/url/fragment.html

Eric Seidel (no email)

Comment 25 2011-02-09 01:46:25 PST

(In reply to comment #24) > (From update of attachment 81744 [details]) > Ok, but we should add tests for these cases to fast/url/fragment.html Ok. It's already covered by fast/loader/url-strip-cr-lf-tab.html, but I'm happy to add a test to fast/url/anchor.html.

Eric Seidel (no email)

Comment 26 2011-02-09 01:57:53 PST

Created attachment 81774 [details] Patch for landing

WebKit Commit Bot

Comment 27 2011-02-09 03:53:38 PST

Comment on attachment 81774 [details] Patch for landing Clearing flags on attachment: 81774 Committed r78040: <http://trac.webkit.org/changeset/78040>

WebKit Commit Bot

Comment 28 2011-02-09 03:53:45 PST

All reviewed patches have been landed. Closing bug.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Other

OS OS X 10.5

Product WebKit

Component New Bugs

Assignee

Eric Seidel (no email)

Reported

2011-02-05 04:18 PST

Modified

2011-02-09 03:53 PST History

CC List

3 users Show

URL

Keywords

Depends on

Blocks