Bug 30241

Summary: Inconsistent URL encoding/decoding of JavaScript URLs.
Product: WebKit
Reporter: Daniel Bates <dbates>
Component: WebCore Misc.
Assignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED
Severity: Normal
CC: abarth, ap, sam
Priority: P2
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 37641    
Attachments:
Description Flags
Example none

Description Daniel Bates 2009-10-08 17:12:32 PDT
Created attachment 40919
Example

JavaScript URLs that are URL encoded via FrameLoader::completeURL are not properly decoded before eventually being passed to both the XSSAuditor and ScriptController::evaluate, because the method KURL::decodeURLEscapeSequences is NOT the inverse function of KURL::parse().

In particular, this occurs in FrameLoader::requestFrame:
http://trac.webkit.org/browser/trunk/WebCore/loader/FrameLoader.cpp#L348
where completeURL() is called on |scriptURL| before it is passed to frame->loader()->executeIfJavaScriptURL().

Remarks:
The call flow of FrameLoader::completeURL is:
FrameLoader::completeURL -> Document::completeURL -> KURL::KURL(const KURL& base, const String& relative, ...) -> KURL::init -> KURL::parse

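The issue is that KURL::parse uses the method KURL::appendEscapingBadChars, which, as its name implies, escapes only bad characters. Everything else, including any percent-escape sequence already present in the input, is copied through unchanged.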

One such bad character is the space character. Consider the JavaScript URL "javascript: '%0A'" (*). Calling KURL::parse on this (directly, or implicitly via one of the functions in the above call chain) results in a KURL object that represents the URL "javascript:%20'%46'" (**). Notice that this differs from the fully URL-encoded result, "javascript:%20%27%2546%27". Decoding the string form of (**) using KURL::decodeURLEscapeSequences produces "javascript: 'F'". Clearly, this is not the inverse of (**): decoding does not recover the original URL (*).
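
The mismatch can be reproduced outside WebKit with a rough sketch like the one below. None of this is the actual KURL code: escapeBadCharsOnly, escapeFully and decodeAllEscapes are hypothetical stand-ins, and the set of "bad" characters (space, control characters, non-ASCII) is an assumption made only for illustration. The example reads (*) as "javascript: '%46'", which is the input that the outputs quoted above correspond to.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Append c to out as a %XX escape sequence (uppercase hex).
static void appendEscaped(std::string& out, unsigned char c)
{
    static const char hexDigits[] = "0123456789ABCDEF";
    out += '%';
    out += hexDigits[c >> 4];
    out += hexDigits[c & 0xF];
}

// Value of a hex digit, or -1 if c is not a hex digit.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical stand-in for the escaping done during KURL::parse.
// Assumption: "bad" means space, control characters and non-ASCII bytes.
// A literal '%' is NOT escaped, so existing %XX sequences pass through.
std::string escapeBadCharsOnly(const std::string& in)
{
    std::string out;
    for (unsigned char c : in) {
        if (c <= 0x20 || c >= 0x7F)
            appendEscaped(out, c);
        else
            out += static_cast<char>(c);
    }
    return out;
}

// Hypothetical "full" encoder: escapes everything except ASCII letters,
// digits and ':'. In particular '%' becomes "%25".
std::string escapeFully(const std::string& in)
{
    std::string out;
    for (unsigned char c : in) {
        if ((c < 0x80 && std::isalnum(c)) || c == ':')
            out += static_cast<char>(c);
        else
            appendEscaped(out, c);
    }
    return out;
}

// Hypothetical stand-in for KURL::decodeURLEscapeSequences: a single pass
// that replaces every well-formed %XX with the byte it encodes.
std::string decodeAllEscapes(const std::string& in)
{
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hexValue(in[i + 1]);
            int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;
                continue;
            }
        }
        out += in[i];
    }
    return out;
}

// Walks through the URLs quoted in the description.
void checkBugExample()
{
    const std::string original = "javascript: '%46'";

    // Partial escaping followed by full decoding does not round-trip.
    const std::string parsed = escapeBadCharsOnly(original);
    assert(parsed == "javascript:%20'%46'");
    assert(decodeAllEscapes(parsed) == "javascript: 'F'");
    assert(decodeAllEscapes(parsed) != original);

    // Full escaping (including '%') does round-trip.
    const std::string full = escapeFully(original);
    assert(full == "javascript:%20%27%2546%27");
    assert(decodeAllEscapes(full) == original);
}
```

Calling checkBugExample() from a test harness exercises both paths. The point of the sketch is only that a decoder which unescapes every %XX is the inverse of an encoder that also escapes '%', not of one that escapes bad characters alone.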
Comment 1 Daniel Bates 2009-10-08 17:21:22 PDT
(*) should be "javascript: '%46'"

Comment 2 Adam Barth 2011-08-19 13:39:15 PDT
This bug is fixed in the new architecture.