Summary: | window.location.href and others needlessly decodes URI-encoded characters | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Andrei Popescu <andreip> | ||||||||||||||||
Component: | DOM | Assignee: | Erik Arvidsson <arv> | ||||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||||
Severity: | Normal | CC: | abarth, ap, arv, ayao, benm, darin, dglazkov, ericu, erlingalf, evn, fishd, hudsonr, jashkenas, jeffcz, michiel, ossy, rniwa, ryan, sam, steveblock, tom, webkit.review.bot | ||||||||||||||||
Priority: | P2 | ||||||||||||||||||
Version: | 528+ (Nightly build) | ||||||||||||||||||
Hardware: | All | ||||||||||||||||||
OS: | All | ||||||||||||||||||
Bug Depends on: | 69511 | ||||||||||||||||||
Bug Blocks: | |||||||||||||||||||
Attachments: |
|
Description
Andrei Popescu
2009-10-08 11:20:05 PDT
Created attachment 40895 [details]
Make KURL::prettyUrl (used by window.location.href) not URI-decode the URI path.
Created attachment 40896 [details]
Make KURL::prettyUrl (used by window.location.href) not URI-decode the URI path.
Forgot to mark the previous upload as 'patch'.
Comment on attachment 40896 [details]
Make KURL::prettyUrl (used by window.location.href) not URI-decode the URI path.
You ChangeLog is duplicated.
I don't really like pathRaw(), maybe encodedPath()? Or should path be decodedPath()?
fast/dom/Window/Location/location-path-has-is-unmodified.htm is sorta a js test, except you didn't generate the tempate using make-script-test-wrappers.
Why does this need to be run from onload? It needs to wait for the iframe to load? Will body onload do that?
(If this needs to wait for onload, then it won't work quite right as a js test, so maybe that's why you ddid it this way.)
+ correctValue = "path%2Dwith-escape%2Dcharacters.html";
+ shouldBe("result", "correctValue");
shouldBeEqualToString("normalizeURL(String(window.frames[0].location.href))", "path%2Dwith-escape%2Dcharacters.html");
would have been better.
Does this actually need to load a valid resource to work?
This should not change the behavior of prettyUrl. You can add a new function with the desired behavior and possibly remove prettyUrl if no one uses it. It doesn't make sense to have a “pretty” URL function that does not decode the path! You should look at what window.location.href needs and think about what the right name is for it. (In reply to comment #3) > (From update of attachment 40896 [details]) > You ChangeLog is duplicated. > > I don't really like pathRaw(), maybe encodedPath()? Or should path be > decodedPath()? > Actually, I think that using the conversion to String operator is enough. This returns the string that was built in the KURL::parse() method. > fast/dom/Window/Location/location-path-has-is-unmodified.htm is sorta a js > test, except you didn't generate the tempate using make-script-test-wrappers. > > Why does this need to be run from onload? It needs to wait for the iframe to > load? Will body onload do that? > Yes. > (If this needs to wait for onload, then it won't work quite right as a js test, > so maybe that's why you ddid it this way.) > That's right. In fact, I just used the pattern I found in the fast/dom/HTMLObjectElement/object-as-frame.html test that I had to modify. Note that that test also uses an onload handler to test the window.location.href property of an iframe. > + correctValue = "path%2Dwith-escape%2Dcharacters.html"; > + shouldBe("result", "correctValue"); > > shouldBeEqualToString("normalizeURL(String(window.frames[0].location.href))", > "path%2Dwith-escape%2Dcharacters.html"); > would have been better. > Done. > Does this actually need to load a valid resource to work? Actually it doesn't if I use a data URL. Done. Hi Darin, (In reply to comment #4) > This should not change the behavior of prettyUrl. You can add a new function > with the desired behavior and possibly remove prettyUrl if no one uses it. It > doesn't make sense to have a “pretty” URL function that does not decode the > path! > You're right, it would not be "pretty" anymore. prettyURL() is used by the Console so I cannot remove it. > You should look at what window.location.href needs and think about what the > right name is for it. I looked and, if I am right, then the String conversion operator of KURL should be enough, so I do not need to add anything new. This operator calls KURL::string(), which just returns the string that was the output of the parse method. So simply removing the call to prettyURL in Location.cpp seems to do the job and all the layout tests pass. Thanks, Andrei (In reply to comment #6) > I looked and, if I am right, then the String conversion operator of KURL should > be enough, so I do not need to add anything new. This operator calls > KURL::string(), which just returns the string that was the output of the parse > method. So simply removing the call to prettyURL in Location.cpp seems to do > the job and all the layout tests pass. Sounds good. The way to prove this is with tests. Think about cases where you think this might fail, and write a new test or tests that covers as many cases as possible. At least one test for each distinct thing prettyURL used to do, and also tests based on guesses you might have about what could go wrong. And then make sure you run the tests in other browsers too. (In reply to comment #7) > (In reply to comment #6) > > I looked and, if I am right, then the String conversion operator of KURL should > > be enough, so I do not need to add anything new. This operator calls > > KURL::string(), which just returns the string that was the output of the parse > > method. So simply removing the call to prettyURL in Location.cpp seems to do > > the job and all the layout tests pass. > > Sounds good. > > The way to prove this is with tests. Think about cases where you think this > might fail, and write a new test or tests that covers as many cases as > possible. At least one test for each distinct thing prettyURL used to do, and > also tests based on guesses you might have about what could go wrong. And then > make sure you run the tests in other browsers too. I was just about to upload a new patch but I will hold off for now and add more tests. Thanks for the comments! Andrei Please fix this bug as it makes it impossible to distinguish URLs like /%25 and /%2525 in a cross-browser manner. Adam, have you looked into this issue while planning your URL work? (In reply to comment #10) > Adam, have you looked into this issue while planning your URL work? I haven't. I've been studying the behavior of <a href>. In general, it makes sense to canonicalize URLs before returning them to web content. For example, returning % unescaped is clearly problematic because % is a control character in URLs. For what it is worth HTMLAnchorElement href and other URL DOM bindings use KURL::string which seems to do the right thing. Changing Location::href to use KURL::string as well seems to solve this particular bug without having to update KURL. Chiming in with another request to please fix this issue -- it has the side effect of effectively making the new pushState() API unusable in Safari. If your URL happens to have encoded data in it, like say, the name of a user-entered field -- it's impossible to distinguish their "/" in the title from a true "/', because the auto-decoding makes them appear the same to window.location.pathname. Thanks. I'll look at this in connection with the prettyURL audit. My investigation in Bug 61201 seems to indicate that all callers of prettyURL are wrong. I'm going to wrap that up first and then continue on to this bug. Many thanks, Adam. -- I very much appreciate you taking a look at this issue. Taking... My intention is to make Safari match the other browsers here Created attachment 104257 [details]
Patch
This makes Safari a whole lot more compatible with IE, Firefox and Opera which all return the encoded form for these things. Sorry about the incomplete ChangeLog files, I forgot that webkit-patch does not pick up changes to these things. Comment on attachment 104257 [details]
Patch
This patch is great.
> Sorry about the incomplete ChangeLog files, I forgot that webkit-patch does not pick up changes to these things. (I should also say that I'm assuming you wrote a nice ChangeLog entry that you'll include when landing the patch.) Comment on attachment 104257 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=104257&action=review > Source/WebCore/page/Location.cpp:68 > - const KURL& url = this->url(); > - // FIXME: Stop using deprecatedString(): https://bugs.webkit.org/show_bug.cgi?id=30225 > - return url.hasPath() ? url.deprecatedString() : url.deprecatedString() + "/"; > + return this->url().string(); I don’t see any coverage of "http://foo.com", which is a case that I’d expect to be affected by the hasPath check. Is that covered and I just missed it? The old code also might do things like omit a "@" or a ":" where the new code would keep it. Is that covered? > Source/WebCore/platform/KURL.cpp:604 > - return decodeURLEscapeSequences(m_string.substring(start, m_hostEnd - start)); > + return m_string.substring(start, m_hostEnd - start); This change seems entirely separate from the Location.href/toString one. Is there sufficient test coverage? I don’t see tests where hostnames are seen URL-encoded above. > Source/WebCore/platform/KURL.cpp:729 > - return decodeURLEscapeSequences(m_string.substring(m_portEnd, m_pathEnd - m_portEnd)); > + return m_string.substring(m_portEnd, m_pathEnd - m_portEnd); Are you sure that all callers of path want the URL-escaped form? For example, if I was going to take a file URL’s path, encode as UTF-8 and use it to access a file, I would not want it URL-escaped. Is there any code that does that? Comment on attachment 104257 [details] Patch Attachment 104257 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/9419246 New failing tests: fast/dom/javascript-backslash.html fast/dom/HTMLObjectElement/object-as-frame.html Comment on attachment 104257 [details]
Patch
Looks like this will break DOMFileSystemBase::crackFileSystemURL
Fixing the location object seems fine, but the KURL changes are likely to have effects elsewhere. Please thoroughly investigate this before landing the change. Looks like this will also break createGlobalHDropContent in ClipboardWin.cpp Looks like this breaks KURL::fileSystemPath in KURLQt.cpp This might break FrameLoader::defaultObjectContentType when the path extension has characters that are URL-encoded in it. Comment on attachment 104257 [details]
Patch
Darin's comment make me less enthusiastic about this patch. :)
Comment on attachment 104257 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=104257&action=review >> Source/WebCore/page/Location.cpp:68 >> + return this->url().string(); > > I don’t see any coverage of "http://foo.com", which is a case that I’d expect to be affected by the hasPath check. Is that covered and I just missed it? > > The old code also might do things like omit a "@" or a ":" where the new code would keep it. Is that covered? http://trac.webkit.org/browser/trunk/LayoutTests/fast/url/script-tests/standard-url.js is a good place to add this test. It is covered there, but in an oblique way. A direct test would be better. Thanks Darin. I did expect this to need more work. I'll look into these issues. Another option is to introduce KURL::encodedPath() and KURL::encodedHost() Darin, for the record Chromium has used encoded host() and path() for KURL since 1.0 without any currently known issues so I'm hoping I can get this to be the case for KURL in non Chromium code as well. ClipboardWin.cpp and KURLQt.cpp aren't used by Chromium, so it's important to make sure we don't break these. I'm less worried about FrameLoader::defaultObjectContentType, although it would be nice to have a test. DOMFileSystemBase::crackFileSystemURL is used by Chromium, so if this change breaks that function, there must be something going on that I don't fully understand. (In reply to comment #33) > DOMFileSystemBase::crackFileSystemURL is used by Chromium, so if this change breaks that function, there must be something going on that I don't fully understand. I believe this patch changes the behavior of that function so that file extensions that are percent-escaped are be handled differently from the same file extensions when not percent-escaped. I don’t know whether file extensions that are percent-escaped are legal or not. So, for example if the extension was ".%6Apeg" instead of ".jpeg". (In reply to comment #34) > (In reply to comment #33) > > DOMFileSystemBase::crackFileSystemURL is used by Chromium, so if this change breaks that function, there must be something going on that I don't fully understand. > > I believe this patch changes the behavior of that function so that file extensions that are percent-escaped are be handled differently from the same file extensions when not percent-escaped. I don’t know whether file extensions that are percent-escaped are legal or not. > > So, for example if the extension was ".%6Apeg" instead of ".jpeg". The filesystem should support files named .%6Apeg and .jpeg, and should see them as distinct. It currently has some bugs in the area, which I'm working on with https://bugs.webkit.org/show_bug.cgi?id=62811. If you call toURL() on a FileEntry called ".%6Apeg", you should get "filesystem:http://your.origin.com/temporary/.%256Apeg". My comment is not about whether the old or new behavior is correct. My comment is about the fact that this patch changes behavior, and is not fixing a bug with a test case. I think a patch targeted at window.location should fix specifically window.location. Fixing the underlying function without testing all the call sites is unnecessarily dangerous. Yes, definitely. I asked EricU to look at this bug to help us understand what the correct behavior is and to see how it related to the other bug he's working on. (In reply to comment #37) > I asked EricU to look at this bug to help us understand what the correct behavior is and to see how it related to the other bug he's working on. Thanks EricU. I appreciate having the information! Created attachment 108568 [details]
Patch
Sorry for not getting back to this sooner. I ended up decoding the path in the places where it was expecting decoded values like before. I only tested this on Safari/Mac so I expect some Chromium failures. I'll look at the bots when they come in. Comment on attachment 108568 [details] Patch Attachment 108568 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/9842119 New failing tests: fast/loader/subframe-navigate-during-main-frame-load.html fast/history/history-back-initial-vs-final-url.html fast/url/standard-url.html Created attachment 109710 [details]
Patch
(In reply to comment #42) > Created an attachment (id=109710) [details] > Patch There are still KURL issues that this patch does not fix but (fragments encodes non ascii characters for example) but it does fix the path issue and cleans up the non hierarchical URLs too. Comment on attachment 109710 [details] Patch Attachment 109710 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/9958092 New failing tests: fast/url/standard-url.html Adam, the bot output does not include "standard-url" and it passing locally (built and tested with --chromium). Any advice? You're probably testing on a Mac. I suspect there are different expectations for standard URL on Windows and then Linux needs to further override the windows expectations. Notice that the fallback path for chromium-linux goes through chromium-win: https://docs.google.com/drawings/d/1z65SKkWrD4Slm6jobIphHwwRADyUtjOAxwGBVKBY8Kc/edit?hl=en_US Comment on attachment 109710 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=109710&action=review > LayoutTests/fast/dom/anchor-origin-expected.txt:12 > +data:text/html,foo => null This shows the test is now broken. The URL has <b> tags in it, but now they disappear. We should fix the test so the <b> tags don’t disappear. I understand they should not be %-escaped, but it’s bad to leave this test in a broken state. > LayoutTests/fast/url/script-tests/file-http-base.js:39 > - ["file:///C:/asdf#\\xc2", "file:///C:/asdf#\\xef\\xbf\\xbd"], > + ["file:///C:/asdf#\xc2", "file:///C:/asdf#\xc2"], I don’t understand why you changed what’s being tested here. Changing the result is one thing, but you changed the test input URL as well. Don’t we still want to test that case? Maybe we should test both? > LayoutTests/fast/url/script-tests/standard-url.js:64 > + ["javascript:alert(\\t 1 \\n\\r)", "javascript:alert( 1 )"], Is this really an expected, good test result? Why are the spaces being collapsed here? > LayoutTests/fast/url/segments.html:4 > -<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> > +<!DOCTYPE html> > <html> > <head> > +<meta charset="utf-8"> Why this change? Change log doesn’t mention it. > Source/WebCore/fileapi/DOMFileSystemBase.cpp:46 > +#include "KURL.h" This file already uses KURL. No need to add a new include. > Source/WebCore/loader/FrameLoader.cpp:75 > +#include "KURL.h" This file already uses KURL. No need to add a new include. > Source/WebCore/page/Location.cpp:154 > String Location::toString() const > { > - if (!m_frame) > - return String(); > - > - const KURL& url = this->url(); > - // FIXME: Stop using deprecatedString(): https://bugs.webkit.org/show_bug.cgi?id=30225 > - return url.hasPath() ? url.deprecatedString() : url.deprecatedString() + "/"; > + return href(); > } If this is just turning around and calling another function, then lets inline this in the header. > Source/WebCore/platform/win/ClipboardWin.cpp:43 > +#include "KURL.h" No need to add this include. The file already used KURL member functions so it must already have been including this. > Source/WebCore/workers/WorkerLocation.cpp:80 > String WorkerLocation::toString() const > { > - // FIXME: Stop using deprecatedString(): https://bugs.webkit.org/show_bug.cgi?id=30225 > - return m_url.hasPath() ? m_url.deprecatedString() : m_url.deprecatedString() + "/"; > + return href(); > } Same here. Comment on attachment 109710 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=109710&action=review >> LayoutTests/fast/dom/anchor-origin-expected.txt:12 >> +data:text/html,foo => null > > This shows the test is now broken. The URL has <b> tags in it, but now they disappear. We should fix the test so the <b> tags don’t disappear. I understand they should not be %-escaped, but it’s bad to leave this test in a broken state. Fixed. >> LayoutTests/fast/url/script-tests/file-http-base.js:39 >> + ["file:///C:/asdf#\xc2", "file:///C:/asdf#\xc2"], > > I don’t understand why you changed what’s being tested here. Changing the result is one thing, but you changed the test input URL as well. Don’t we still want to test that case? Maybe we should test both? Fixed. I added a new test line here too. >> LayoutTests/fast/url/script-tests/standard-url.js:64 >> + ["javascript:alert(\\t 1 \\n\\r)", "javascript:alert( 1 )"], > > Is this really an expected, good test result? Why are the spaces being collapsed here? This is how all other browsers behave. >> LayoutTests/fast/url/segments.html:4 >> +<meta charset="utf-8"> > > Why this change? Change log doesn’t mention it. The test script is now UTF-8. I added a comment to the ChangeLog. >> Source/WebCore/page/Location.cpp:154 >> } > > If this is just turning around and calling another function, then lets inline this in the header. Fixed. Created attachment 109887 [details]
Patch
Comment on attachment 109887 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=109887&action=review > LayoutTests/ChangeLog:23 > + * fast/url/segments.html: Use UT-8 Typo Created attachment 109895 [details]
Patch for landing
Comment on attachment 109895 [details] Patch for landing Clearing flags on attachment: 109895 Committed r96779: <http://trac.webkit.org/changeset/96779> All reviewed patches have been landed. Closing bug. It seems like this patch caused fast/history/history-back-initial-vs-final-url.html to fail on Qt: http://build.webkit.org/results/Qt%20Linux%20Release/r96779%20(38208)/results.html (In reply to comment #55) > It seems like this patch caused fast/history/history-back-initial-vs-final-url.html to fail on Qt: > http://build.webkit.org/results/Qt%20Linux%20Release/r96779%20(38208)/results.html http://trac.webkit.org/browser/trunk/Tools/DumpRenderTree/qt/DumpRenderTreeQt.cpp#L834 Needs to be changed to QString url = item.url().toString(); since the extra encoding is no longer needed to match KURL. I can make this change tomorrow when I get in to the office. Fix landed in http://trac.webkit.org/changeset/96789. But unfortunately there is one more bug: https://bugs.webkit.org/show_bug.cgi?id=69511 *** Bug 22899 has been marked as a duplicate of this bug. *** *** Bug 76320 has been marked as a duplicate of this bug. *** Has this fix made its way into Safari 5.1.5? Using that version of Safari, if I go to this page: http://limpet.net/mbrubeck/temp/chromium-location-test.html/foo%20bar?foo%20bar The non-query part of the location is still being decoded. *** Bug 83485 has been marked as a duplicate of this bug. *** *** Bug 66836 has been marked as a duplicate of this bug. *** |