| Summary: | decodeURLEscapeSequences will unescape NULLs and will mangle non-encodable characters | | |
| --- | --- | --- | --- |
| Product: | WebKit | Reporter: | Brett Wilson (Google) <brettw> |
| Component: | Platform | Assignee: | Nobody <webkit-unassigned> |
| Status: | RESOLVED CONFIGURATION CHANGED | | |
| Severity: | Normal | CC: | abarth, annevk, darin, mjs, sam |
| Priority: | P2 | Keywords: | InRadar |
| Version: | 528+ (Nightly build) | | |
| Hardware: | All | | |
| OS: | All | | |
Description
Brett Wilson (Google)
2008-08-28 10:54:16 PDT
I can do a patch for this, but it would be nice to get some comments on my approach first. CCing folks who would know/feel-responsible-for URL handling in WebKit.

---

Can we create a test that demonstrates these problems? Do these affect real websites?
We should make fixes based on effects rather than on critique of the code. I'd like to see us start with a test case. And it would be really great to have an example of at least one website that will work better if we make the code change.
I'm not fully convinced by the "embedded null characters are dangerous" argument. We don't have code that treats null characters as a special case, and I don't find the fact that other browsers had code like that to be a compelling argument. I'm more convinced by the "be consistent with IE" argument, though, so it may not be important what I think about the other argument. But also, you can change my mind if I'm wrong.
---

> This is actually a pretty big problem. If I'm on a Japanese page with a path
> encoded as ShiftJIS (escaped), if that page requests
> document.location.pathname, it will be wrong.
I'd like to understand exactly what "wrong" means here. We do need to match the behavior of other browsers.
But as you probably know, in general URLs don't necessarily correspond to a Unicode string. They are sent, byte for byte, to the server, and the server responds, so even URLs with invalid %-escape sequences might work on some sites and servers.
So I'm not sure exactly what the right behavior is for JavaScript functions that return pieces of URLs, since JavaScript strings are UTF-16, and not a stream of bytes. I'd need to see some evidence of what behavior is on the web and what websites expect in addition to compelling arguments about how things should work.
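The byte-vs-string mismatch described above can be sketched with standard JavaScript (an illustration only, not WebKit's decodeURLEscapeSequences): `decodeURIComponent()` succeeds only when the %-escapes form valid UTF-8, so a path that is perfectly legal byte-for-byte on the wire may have no faithful UTF-16 representation at all.

```javascript
// URLs travel as bytes; JavaScript strings are UTF-16. decodeURIComponent()
// can only produce a string when the %-escapes form valid UTF-8, so a path
// that works fine byte-for-byte on a server may have no Unicode equivalent.
function tryDecode(path) {
  try {
    return decodeURIComponent(path);
  } catch (e) {
    return null; // URIError: the escape sequences are not valid UTF-8
  }
}

console.log(tryDecode("/caf%C3%A9")); // valid UTF-8 escapes decode cleanly
console.log(tryDecode("/asdf%F0"));   // 0xF0 is a lone UTF-8 lead byte -> null
```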
---

> We don't have code that treats null characters as a special case
I'm not arguing for or against this change, but Windows APIs do special-case NULLs, so routines that validate strings for use in Windows system calls often must understand the magical, string-terminating semantics of NULL.
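A small JavaScript sketch of that hazard (standard APIs only, not WebKit code): a decoded NUL is an ordinary character to JavaScript, but anything downstream that treats NUL as a terminator, as many system calls do, silently truncates the string.

```javascript
// "%00" decodes to U+0000. JavaScript keeps the NUL as an ordinary character,
// but a consumer that treats NUL as a string terminator sees a shorter value.
const decoded = decodeURIComponent("/secret%00.html");
console.log(decoded.length); // 13: the NUL counts as one character

// Simulate what a NUL-terminated C-style consumer would see:
const asSeenByC = decoded.slice(0, decoded.indexOf("\0"));
console.log(JSON.stringify(asSeenByC)); // "/secret": the ".html" tail is lost

// Conversely, re-escaping (analogous to what KURL would do) turns a raw NUL
// back into a harmless "%00":
console.log(encodeURIComponent("\u0000")); // "%00"
```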
---

(In reply to comment #5)
> I'm not arguing for or against this change, but Windows APIs do special-case
> NULLs, so routines that validate strings for using in Windows system calls
> often must understand the magical, string-terminating semantics of NULL.

Makes sense. Any code path that calls String::charactersWithNullTermination could run afoul of an embedded null character, and although I can't find any examples of that being done with a URL or a piece of a URL, it's possible there are some.

Another thought is that if we have a code path that has a problem with URLs with null characters, it can possibly be triggered by making a URL with JavaScript's String.fromCharCode(0).

---

(In reply to comment #7)
> Another thought is that if we have a code path that has a problem with URLs
> with null characters, it can possibly be triggered by making a URL with
> JavaScript's String.fromCharCode(0).

I believe that will get escaped by KURL automatically.

---

Sorry, I forgot to add the effect of the encoding problem: I have part of a layout test that demonstrates the path being mangled. If you have a page at "foo.com/asdf%F0" and you ask for document.location.pathname, it will give you "/asdf\uFFFD" (the %F0 having been replaced by the Unicode replacement character, U+FFFD). Although I don't know of any sites that break because of this, it's hard to argue that this behavior is not wrong.
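The replacement-character mangling described in that last comment can be reproduced with the standard TextDecoder API (a modern illustration, not the KURL code path itself): lenient UTF-8 decoding substitutes U+FFFD for invalid byte sequences, which is exactly how %F0 becomes "\uFFFD".

```javascript
// Lenient UTF-8 decoding (the TextDecoder default) substitutes U+FFFD for
// byte sequences that are not valid UTF-8 -- the mangling described above.
const bytes = new Uint8Array([0x2F, 0x61, 0x73, 0x64, 0x66, 0xF0]); // "/asdf" + 0xF0
const lenient = new TextDecoder("utf-8").decode(bytes);
console.log(lenient === "/asdf\uFFFD"); // true: the original 0xF0 byte is gone

// A fatal decoder refuses to mangle and throws instead:
let threw = false;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(bytes);
} catch (e) {
  threw = true;
}
console.log(threw); // true
```

Once the byte has been replaced by U+FFFD, the original %F0 is unrecoverable, which is why round-tripping the decoded pathname back into a URL cannot work.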