RESOLVED FIXED53850
Make WebKit's fragment canonicalization match other browsers
https://bugs.webkit.org/show_bug.cgi?id=53850
Summary Make WebKit's fragment canonicalization match other browsers
Eric Seidel (no email)
Reported 2011-02-05 04:18:08 PST
Make WebKit's fragment cannonicalization match other browsers
Attachments
Patch (12.08 KB, patch)
2011-02-05 04:44 PST, Eric Seidel (no email)
no flags
updated patch (12.82 KB, patch)
2011-02-06 11:33 PST, Eric Seidel (no email)
no flags
Patch for landing (12.00 KB, patch)
2011-02-06 23:40 PST, Adam Barth
no flags
a more conservative patch, not quite ready (10.98 KB, patch)
2011-02-08 16:47 PST, Eric Seidel (no email)
no flags
Ready for re-review and commit (12.66 KB, patch)
2011-02-08 21:40 PST, Eric Seidel (no email)
no flags
Patch for landing (13.21 KB, patch)
2011-02-09 01:57 PST, Eric Seidel (no email)
no flags
Eric Seidel (no email)
Comment 1 2011-02-05 04:44:55 PST
Eric Seidel (no email)
Comment 2 2011-02-05 04:46:06 PST
I'm not 100% convinced that Chrome's results (which we now match) are the best here. I explained things mostly in teh ChangeLOg. https://github.com/abarth/url-spec/blob/master/tests/gurl-results/by-browser.txt unfortunately has kinda old data and doesn't cover all of these tests for all browsers.
Adam Barth
Comment 3 2011-02-05 10:24:43 PST
Comment on attachment 81357 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=81357&action=review Some nits below. w.r.t. diverging from Firefox. I discussed this topic (in general) with some Firefox folks. They said they're willing to converge to match other browsers URL handling as long as we write a spec for how we think things ought to be. If we follow this trajectory, everyone should end up with interoperable behavior. > Source/WebCore/platform/KURL.cpp:997 > + if (c == '\0') > + continue; This part isn't correct. It's an artifact of how Brett did his testing in IE (which strips nulls at the HTML parser layer). It's hard to observe directly, but we should be able to observe via the DOM (not via the HTML parser). > Source/WebCore/platform/KURL.cpp:1001 > + *p++ = '%'; > + *p++ = hexDigits[c >> 4]; > + *p++ = hexDigits[c & 0xF]; We should add an inline helper function to do this work instead of copy/pasting it. Really what we should do is a templated free function that takes a predicate for what to escape, if you want to get fancy.
Eric Seidel (no email)
Comment 4 2011-02-06 11:33:04 PST
Created attachment 81418 [details] updated patch
Adam Barth
Comment 5 2011-02-06 11:37:05 PST
Comment on attachment 81418 [details] updated patch View in context: https://bugs.webkit.org/attachment.cgi?id=81418&action=review > Source/WebCore/platform/KURL.cpp:1012 > + // FIXME: We should be decoding any non-ascii characters (above 0x80), > + // however at this point we don't have the full UChar anymore, just a char. This FIXME isn't correct. The decoding has already happened at this point.
Adam Barth
Comment 6 2011-02-06 23:40:08 PST
Created attachment 81452 [details] Patch for landing
WebKit Commit Bot
Comment 7 2011-02-07 02:55:21 PST
Comment on attachment 81452 [details] Patch for landing Rejecting attachment 81452 [details] from commit-queue. Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=cr-jail-3', 'build-..." exit_code: 2 Last 500 characters of output: .......................................................................... fast/css/content .. fast/css/counters ....................... fast/css/getComputedStyle ........................ fast/css/namespaces ........... fast/doctypes ............ fast/dom .............. fast/dom/anchor-getParameter.html -> failed Exiting early after 1 failures. 6834 tests run. 145.29s total testing time 6833 test cases (99%) succeeded 1 test case (<1%) had incorrect layout 4 test cases (<1%) had stderr output Full output: http://queues.webkit.org/results/7699951
Eric Seidel (no email)
Comment 8 2011-02-08 00:48:55 PST
The patch must be wrong. We hit ASSERTs with this. fast/loader/data-url-encoding-html.html is one such test which asserts. I'll have to look at it tomrorow.
Eric Seidel (no email)
Comment 9 2011-02-08 02:07:38 PST
We're hitting: static void checkEncodedString(const String& url) { for (unsigned i = 0; i < url.length(); ++i) ASSERT(!(url[i] & ~0x7F)); // THIS ASSERT ASSERT(!url.length() || isSchemeFirstChar(url[0])); } Which is expected since we're now allowing characters over 127 in the URL. Instead of changing the ASSERT I think I'd rather just encode everything over 127 for now.
Eric Seidel (no email)
Comment 10 2011-02-08 13:15:12 PST
There is actually a bunch more to do here. About 10 tests fail with this applied.
Adam Barth
Comment 11 2011-02-08 14:06:07 PST
> Instead of changing the ASSERT I think I'd rather just encode everything over 127 for now. Why? We're supposed to have unicode characters over 127 in the fragment.
Eric Seidel (no email)
Comment 12 2011-02-08 14:14:57 PST
(In reply to comment #11) > > Instead of changing the ASSERT I think I'd rather just encode everything over 127 for now. > > Why? We're supposed to have unicode characters over 127 in the fragment. By the time this fragment handling code ever runs we have chars instead of UChars. And checkEncodedString clearly assumes the url will only ever include ascii. :) Still working through the tests.
Alexey Proskuryakov
Comment 13 2011-02-08 14:26:20 PST
The idea behind checkEncodedString() was to make as stringent test for the fact that the string is indeed a parsed KURL as possible, catching cases where arbitrary URLs are passed in. But yes, KURL works with bytes here, not with UTF-16 strings.
Adam Barth
Comment 14 2011-02-08 14:36:13 PST
I'm not sure I understand. By bytes, do you mean UTF-8 strings? Canonicalized URLs can have non-ASCII characters in their fragment.
Alexey Proskuryakov
Comment 15 2011-02-08 15:16:26 PST
I think that we can get byte sequences that aren't UTF-8 in some KURL code paths as we decode percent escapes. I'm not very clear on what goes on, but I doubt we can assume that "char*" is UTF-8 in KURL code.
Eric Seidel (no email)
Comment 16 2011-02-08 16:47:39 PST
Created attachment 81716 [details] a more conservative patch, not quite ready
Adam Barth
Comment 17 2011-02-08 17:57:31 PST
Comment on attachment 81716 [details] a more conservative patch, not quite ready Ok. This is certainly a step in the right direction. We'll need to revisit this issue at some point.
Eric Seidel (no email)
Comment 18 2011-02-08 21:40:16 PST
Created attachment 81744 [details] Ready for re-review and commit
Eric Seidel (no email)
Comment 19 2011-02-08 21:42:21 PST
I believe this passes all the tests now. However its hard to tell, since it seems SL Debug builds still crash a bunch of tests in TOT. :(
Adam Barth
Comment 20 2011-02-08 22:05:31 PST
Comment on attachment 81744 [details] Ready for re-review and commit View in context: https://bugs.webkit.org/attachment.cgi?id=81744&action=review > Source/WebCore/platform/KURL.cpp:1008 > + // Strip CR, LF and Tab from fragments, per: > + // https://bugs.webkit.org/show_bug.cgi?id=8770 How does GURL handle this case?
Eric Seidel (no email)
Comment 21 2011-02-08 23:05:18 PST
(In reply to comment #20) > (From update of attachment 81744 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=81744&action=review > > > Source/WebCore/platform/KURL.cpp:1008 > > + // Strip CR, LF and Tab from fragments, per: > > + // https://bugs.webkit.org/show_bug.cgi?id=8770 > > How does GURL handle this case? I'm not sure. I did test to see that both Chrome and FF pass the test in question (which is about stripping these chars). I'm not sure how GURL does the actual stripping however.
Eric Seidel (no email)
Comment 22 2011-02-09 01:20:10 PST
(In reply to comment #21) > (In reply to comment #20) > > (From update of attachment 81744 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=81744&action=review > > > > > Source/WebCore/platform/KURL.cpp:1008 > > > + // Strip CR, LF and Tab from fragments, per: > > > + // https://bugs.webkit.org/show_bug.cgi?id=8770 > > > > How does GURL handle this case? > > I'm not sure. I did test to see that both Chrome and FF pass the test in question (which is about stripping these chars). I'm not sure how GURL does the actual stripping however. GURL documents this here: http://www.google.com/codesearch/p?hl=en#R_oU-jn3RFk/trunk/src/url_canon_query.cc&q=tab%20package:http://google-url%5C.googlecode%5C.com&l=40 But I haven't found the code which actually makes that happen yet. I'm not very familiar with GURL.
Adam Barth
Comment 24 2011-02-09 01:26:25 PST
Comment on attachment 81744 [details] Ready for re-review and commit Ok, but we should add tests for these cases to fast/url/fragment.html
Eric Seidel (no email)
Comment 25 2011-02-09 01:46:25 PST
(In reply to comment #24) > (From update of attachment 81744 [details]) > Ok, but we should add tests for these cases to fast/url/fragment.html Ok. It's already covered by fast/loader/url-strip-cr-lf-tab.html, but I'm happy to add a test to fast/url/anchor.html.
Eric Seidel (no email)
Comment 26 2011-02-09 01:57:53 PST
Created attachment 81774 [details] Patch for landing
WebKit Commit Bot
Comment 27 2011-02-09 03:53:38 PST
Comment on attachment 81774 [details] Patch for landing Clearing flags on attachment: 81774 Committed r78040: <http://trac.webkit.org/changeset/78040>
WebKit Commit Bot
Comment 28 2011-02-09 03:53:45 PST
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.