WebKit Bugzilla
New
Browse
Search+
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
53850
Make WebKit's fragment canonicalization match other browsers
https://bugs.webkit.org/show_bug.cgi?id=53850
Summary
Make WebKit's fragment canonicalization match other browsers
Eric Seidel (no email)
Reported
2011-02-05 04:18:08 PST
Make WebKit's fragment cannonicalization match other browsers
Attachments
Patch
(12.08 KB, patch)
2011-02-05 04:44 PST
,
Eric Seidel (no email)
no flags
Details
Formatted Diff
Diff
updated patch
(12.82 KB, patch)
2011-02-06 11:33 PST
,
Eric Seidel (no email)
no flags
Details
Formatted Diff
Diff
Patch for landing
(12.00 KB, patch)
2011-02-06 23:40 PST
,
Adam Barth
no flags
Details
Formatted Diff
Diff
a more conservative patch, not quite ready
(10.98 KB, patch)
2011-02-08 16:47 PST
,
Eric Seidel (no email)
no flags
Details
Formatted Diff
Diff
Ready for re-review and commit
(12.66 KB, patch)
2011-02-08 21:40 PST
,
Eric Seidel (no email)
no flags
Details
Formatted Diff
Diff
Patch for landing
(13.21 KB, patch)
2011-02-09 01:57 PST
,
Eric Seidel (no email)
no flags
Details
Formatted Diff
Diff
Show Obsolete
(5)
View All
Add attachment
proposed patch, testcase, etc.
Eric Seidel (no email)
Comment 1
2011-02-05 04:44:55 PST
Created
attachment 81357
[details]
Patch
Eric Seidel (no email)
Comment 2
2011-02-05 04:46:06 PST
I'm not 100% convinced that Chrome's results (which we now match) are the best here. I explained things mostly in teh ChangeLOg.
https://github.com/abarth/url-spec/blob/master/tests/gurl-results/by-browser.txt
unfortunately has kinda old data and doesn't cover all of these tests for all browsers.
Adam Barth
Comment 3
2011-02-05 10:24:43 PST
Comment on
attachment 81357
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=81357&action=review
Some nits below. w.r.t. diverging from Firefox. I discussed this topic (in general) with some Firefox folks. They said they're willing to converge to match other browsers URL handling as long as we write a spec for how we think things ought to be. If we follow this trajectory, everyone should end up with interoperable behavior.
> Source/WebCore/platform/KURL.cpp:997 > + if (c == '\0') > + continue;
This part isn't correct. It's an artifact of how Brett did his testing in IE (which strips nulls at the HTML parser layer). It's hard to observe directly, but we should be able to observe via the DOM (not via the HTML parser).
> Source/WebCore/platform/KURL.cpp:1001 > + *p++ = '%'; > + *p++ = hexDigits[c >> 4]; > + *p++ = hexDigits[c & 0xF];
We should add an inline helper function to do this work instead of copy/pasting it. Really what we should do is a templated free function that takes a predicate for what to escape, if you want to get fancy.
Eric Seidel (no email)
Comment 4
2011-02-06 11:33:04 PST
Created
attachment 81418
[details]
updated patch
Adam Barth
Comment 5
2011-02-06 11:37:05 PST
Comment on
attachment 81418
[details]
updated patch View in context:
https://bugs.webkit.org/attachment.cgi?id=81418&action=review
> Source/WebCore/platform/KURL.cpp:1012 > + // FIXME: We should be decoding any non-ascii characters (above 0x80), > + // however at this point we don't have the full UChar anymore, just a char.
This FIXME isn't correct. The decoding has already happened at this point.
Adam Barth
Comment 6
2011-02-06 23:40:08 PST
Created
attachment 81452
[details]
Patch for landing
WebKit Commit Bot
Comment 7
2011-02-07 02:55:21 PST
Comment on
attachment 81452
[details]
Patch for landing Rejecting
attachment 81452
[details]
from commit-queue. Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=cr-jail-3', 'build-..." exit_code: 2 Last 500 characters of output: .......................................................................... fast/css/content .. fast/css/counters ....................... fast/css/getComputedStyle ........................ fast/css/namespaces ........... fast/doctypes ............ fast/dom .............. fast/dom/anchor-getParameter.html -> failed Exiting early after 1 failures. 6834 tests run. 145.29s total testing time 6833 test cases (99%) succeeded 1 test case (<1%) had incorrect layout 4 test cases (<1%) had stderr output Full output:
http://queues.webkit.org/results/7699951
Eric Seidel (no email)
Comment 8
2011-02-08 00:48:55 PST
The patch must be wrong. We hit ASSERTs with this. fast/loader/data-url-encoding-html.html is one such test which asserts. I'll have to look at it tomrorow.
Eric Seidel (no email)
Comment 9
2011-02-08 02:07:38 PST
We're hitting: static void checkEncodedString(const String& url) { for (unsigned i = 0; i < url.length(); ++i) ASSERT(!(url[i] & ~0x7F)); // THIS ASSERT ASSERT(!url.length() || isSchemeFirstChar(url[0])); } Which is expected since we're now allowing characters over 127 in the URL. Instead of changing the ASSERT I think I'd rather just encode everything over 127 for now.
Eric Seidel (no email)
Comment 10
2011-02-08 13:15:12 PST
There is actually a bunch more to do here. About 10 tests fail with this applied.
Adam Barth
Comment 11
2011-02-08 14:06:07 PST
> Instead of changing the ASSERT I think I'd rather just encode everything over 127 for now.
Why? We're supposed to have unicode characters over 127 in the fragment.
Eric Seidel (no email)
Comment 12
2011-02-08 14:14:57 PST
(In reply to
comment #11
)
> > Instead of changing the ASSERT I think I'd rather just encode everything over 127 for now. > > Why? We're supposed to have unicode characters over 127 in the fragment.
By the time this fragment handling code ever runs we have chars instead of UChars. And checkEncodedString clearly assumes the url will only ever include ascii. :) Still working through the tests.
Alexey Proskuryakov
Comment 13
2011-02-08 14:26:20 PST
The idea behind checkEncodedString() was to make as stringent test for the fact that the string is indeed a parsed KURL as possible, catching cases where arbitrary URLs are passed in. But yes, KURL works with bytes here, not with UTF-16 strings.
Adam Barth
Comment 14
2011-02-08 14:36:13 PST
I'm not sure I understand. By bytes, do you mean UTF-8 strings? Canonicalized URLs can have non-ASCII characters in their fragment.
Alexey Proskuryakov
Comment 15
2011-02-08 15:16:26 PST
I think that we can get byte sequences that aren't UTF-8 in some KURL code paths as we decode percent escapes. I'm not very clear on what goes on, but I doubt we can assume that "char*" is UTF-8 in KURL code.
Eric Seidel (no email)
Comment 16
2011-02-08 16:47:39 PST
Created
attachment 81716
[details]
a more conservative patch, not quite ready
Adam Barth
Comment 17
2011-02-08 17:57:31 PST
Comment on
attachment 81716
[details]
a more conservative patch, not quite ready Ok. This is certainly a step in the right direction. We'll need to revisit this issue at some point.
Eric Seidel (no email)
Comment 18
2011-02-08 21:40:16 PST
Created
attachment 81744
[details]
Ready for re-review and commit
Eric Seidel (no email)
Comment 19
2011-02-08 21:42:21 PST
I believe this passes all the tests now. However its hard to tell, since it seems SL Debug builds still crash a bunch of tests in TOT. :(
Adam Barth
Comment 20
2011-02-08 22:05:31 PST
Comment on
attachment 81744
[details]
Ready for re-review and commit View in context:
https://bugs.webkit.org/attachment.cgi?id=81744&action=review
> Source/WebCore/platform/KURL.cpp:1008 > + // Strip CR, LF and Tab from fragments, per: > + //
https://bugs.webkit.org/show_bug.cgi?id=8770
How does GURL handle this case?
Eric Seidel (no email)
Comment 21
2011-02-08 23:05:18 PST
(In reply to
comment #20
)
> (From update of
attachment 81744
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=81744&action=review
> > > Source/WebCore/platform/KURL.cpp:1008 > > + // Strip CR, LF and Tab from fragments, per: > > + //
https://bugs.webkit.org/show_bug.cgi?id=8770
> > How does GURL handle this case?
I'm not sure. I did test to see that both Chrome and FF pass the test in question (which is about stripping these chars). I'm not sure how GURL does the actual stripping however.
Eric Seidel (no email)
Comment 22
2011-02-09 01:20:10 PST
(In reply to
comment #21
)
> (In reply to
comment #20
) > > (From update of
attachment 81744
[details]
[details]) > > View in context:
https://bugs.webkit.org/attachment.cgi?id=81744&action=review
> > > > > Source/WebCore/platform/KURL.cpp:1008 > > > + // Strip CR, LF and Tab from fragments, per: > > > + //
https://bugs.webkit.org/show_bug.cgi?id=8770
> > > > How does GURL handle this case? > > I'm not sure. I did test to see that both Chrome and FF pass the test in question (which is about stripping these chars). I'm not sure how GURL does the actual stripping however.
GURL documents this here:
http://www.google.com/codesearch/p?hl=en#R_oU-jn3RFk/trunk/src/url_canon_query.cc&q=tab%20package:http://google-url%5C.googlecode%5C.com&l=40
But I haven't found the code which actually makes that happen yet. I'm not very familiar with GURL.
Eric Seidel (no email)
Comment 23
2011-02-09 01:22:31 PST
Here is the code in question:
http://www.google.com/codesearch/p?hl=en#R_oU-jn3RFk/trunk/src/url_canon_etc.cc&q=package:http://google-url%5C.googlecode%5C.com%200x20&l=275
Adam Barth
Comment 24
2011-02-09 01:26:25 PST
Comment on
attachment 81744
[details]
Ready for re-review and commit Ok, but we should add tests for these cases to fast/url/fragment.html
Eric Seidel (no email)
Comment 25
2011-02-09 01:46:25 PST
(In reply to
comment #24
)
> (From update of
attachment 81744
[details]
) > Ok, but we should add tests for these cases to fast/url/fragment.html
Ok. It's already covered by fast/loader/url-strip-cr-lf-tab.html, but I'm happy to add a test to fast/url/anchor.html.
Eric Seidel (no email)
Comment 26
2011-02-09 01:57:53 PST
Created
attachment 81774
[details]
Patch for landing
WebKit Commit Bot
Comment 27
2011-02-09 03:53:38 PST
Comment on
attachment 81774
[details]
Patch for landing Clearing flags on attachment: 81774 Committed
r78040
: <
http://trac.webkit.org/changeset/78040
>
WebKit Commit Bot
Comment 28
2011-02-09 03:53:45 PST
All reviewed patches have been landed. Closing bug.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug