Bug 68092

Summary: xssauditor - truncate inline snippets at a reasonable length before comparison
Product: WebKit Reporter: Thomas Sepez <tsepez>
Component: WebCore Misc.Assignee: Thomas Sepez <tsepez>
Status: RESOLVED FIXED    
Severity: Minor CC: abarth, dbates, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
Proposed patch plus new test case
abarth: review+, tsepez: commit-queue-
Patch, testcase, plus don't treat non-dangerous attributes as JS. none

Description Thomas Sepez 2011-09-14 11:09:45 PDT
Work postponed from https://bugs.webkit.org/show_bug.cgi?id=27895.

Truncating the snipped will cause a problem if we truncate in the middle of a %-escape sequence. Maybe we should canonicalize before trimming the snippet.
Comment 1 Thomas Sepez 2011-09-14 17:30:53 PDT
alterative approach is to walk backwards from the end while the character is a hex digit, then check if we're preceded by % or %u.
Comment 2 Adam Barth 2011-09-14 17:32:12 PDT
What about the multiple encoding case?
Comment 3 Thomas Sepez 2011-09-15 15:35:06 PDT
Yup.  Let's try re-ordering, then.
Comment 4 Thomas Sepez 2011-09-15 17:33:20 PDT
Created attachment 107570 [details]
Proposed patch plus new test case

Proposed patch plus test case.  I manually tuned the length and confirmed the test case fails if we move the truncation prior to the decoding.  The test case is kind of devious in that I wanted the alert to fire if the xss auditor didn't catch the issue, but without the ability to introduce strings or regexps via punctuation, I settled for a numeric expression, exploiting the dual nature of the %-sign -- an escape for URL characters versus a modulo operation in JS.

Full tests still running on my box, hence no commit-queue "?" just yet.  But please review.  Thanks heaps.
Comment 5 Adam Barth 2011-09-15 17:39:52 PDT
Comment on attachment 107570 [details]
Proposed patch plus new test case

View in context: https://bugs.webkit.org/attachment.cgi?id=107570&action=review

> Source/WebCore/html/parser/XSSAuditor.cpp:512
> -    if ((position = snippet.find("=")) != notFound
> -        && (position = snippet.find(isNotHTMLSpace, position + 1)) != notFound
> -        && (position = snippet.find(isTerminatingCharacter, isHTMLQuote(snippet[position]) ? position + 1 : position)) != notFound) {
> -        snippet.truncate(position);
> +    if ((position = decodedSnippet.find("=")) != notFound
> +        && (position = decodedSnippet.find(isNotHTMLSpace, position + 1)) != notFound
> +        && (position = decodedSnippet.find(isTerminatingCharacter, isHTMLQuote(decodedSnippet[position]) ? position + 1 : position)) != notFound) {
> +        decodedSnippet.truncate(position);

I'm slightly confused.  Are URLs going through this path too?

> Source/WebCore/html/parser/XSSAuditor.cpp:-516
> -    ASSERT(!snippet.isEmpty());

Maybe we should assert the fullyDecoding the decodedSnippet doesn't change it?
Comment 6 Thomas Sepez 2011-09-16 09:40:47 PDT
Comment on attachment 107570 [details]
Proposed patch plus new test case

View in context: https://bugs.webkit.org/attachment.cgi?id=107570&action=review

>> Source/WebCore/html/parser/XSSAuditor.cpp:512
>> +        decodedSnippet.truncate(position);
> 
> I'm slightly confused.  Are URLs going through this path too?

URLs (as in the request paramaters and/post body) being matched against don't go through this path.  They go through fullyDecodeString as part of xssauditor::init().
URLs (as in <object src="URL"> go through this path, but the check for == "javascript:" is made against the parsed form of the attribute, not the snippet.  Snippet is just used to see if injected.

>> Source/WebCore/html/parser/XSSAuditor.cpp:-516

> 
> Maybe we should assert the fullyDecoding the decodedSnippet doesn't change it?

The refactoring first moved the responsibility for decoding to each of the three callers of isContainedInrequest().  They could each assert this.
The refactoring then removed the responsibility from two of the three callers by putting it into the middle of (now renamed) decodedSnippetForAttribute.  It could assert this, too.
There isn't any decoding going on in this function any longer, the sole remaining check is isEmpty(), so all the decoding transformations take place outside this.
Comment 7 Thomas Sepez 2011-09-16 09:41:24 PDT
Tests finished locally without any new noise.
Comment 8 Adam Barth 2011-09-16 10:43:05 PDT
Right, I'm saying that it could have:

ASSERT(decodedSnippet == fullyDecodeString(decodedSnippet));

to make sure all the callers have already decoded the string.

The reason I asked about whether URLs go through this path is whether we're going to get too many false positives from something like the following:

<object data="http://example.com/foo/bar.swf"></object>

which will generate the snippet:

data="http:/

which doesn't seem specific enough to avoid false positives.  We probably want this to be the snippet:

data="http://example.com
Comment 9 Thomas Sepez 2011-09-16 10:49:03 PDT
(In reply to comment #8)
> Right, I'm saying that it could have:
> 
> ASSERT(decodedSnippet == fullyDecodeString(decodedSnippet));
> 
> to make sure all the callers have already decoded the string.
> 

Ah, gotcha.  Seems reasonable.  Will do.


> The reason I asked about whether URLs go through this path is whether we're going to get too many false positives from something like the following:
> 
> <object data="http://example.com/foo/bar.swf"></object>
> 
> which will generate the snippet:
> 
> data="http:/
> 
> which doesn't seem specific enough to avoid false positives.  We probably want this to be the snippet:
> 
> data="http://example.com

think you're right.  that would have broken on previous commit.  Need to avoid the comment-truncating path on the non-dangerous attributes.  Will fix.
Comment 10 Adam Barth 2011-09-16 10:53:16 PDT
Thanks Tom.
Comment 11 Thomas Sepez 2011-09-16 10:54:18 PDT
(In reply to comment #9)
> (In reply to comment #8)
> > Right, I'm saying that it could have:
> > 
> > ASSERT(decodedSnippet == fullyDecodeString(decodedSnippet));
> > 
> > to make sure all the callers have already decoded the string.
> > 
> 
> Ah, gotcha.  Seems reasonable.  Will do.
> 
Actually, don't think we can make this assertion.  FullyDecoceString decodes in a loop then canonicalizes.  So if your string was %[high char]23, FDS (despite the name) returns %23.  No matter, what we want is %23 in this case because the intervening high char has meaning.  Just not consistent meaning across charsets.
Comment 12 Adam Barth 2011-09-16 11:04:01 PDT
Comment on attachment 107570 [details]
Proposed patch plus new test case

You're totally right.  We could grep for %, but that's probably not worth it.
Comment 13 Thomas Sepez 2011-09-16 11:12:10 PDT
Created attachment 107686 [details]
Patch, testcase, plus don't treat non-dangerous attributes as JS.
Comment 14 Thomas Sepez 2011-09-16 11:35:40 PDT
Comment on attachment 107686 [details]
Patch, testcase, plus don't treat non-dangerous attributes as JS.

Please review revised.
Comment 15 Adam Barth 2011-09-16 11:43:02 PDT
Comment on attachment 107686 [details]
Patch, testcase, plus don't treat non-dangerous attributes as JS.

This looks great.  I think this fixes another subtle bug when the characters we were looking to truncate at were URL-encoded.
Comment 16 WebKit Review Bot 2011-09-16 22:15:13 PDT
Comment on attachment 107686 [details]
Patch, testcase, plus don't treat non-dangerous attributes as JS.

Clearing flags on attachment: 107686

Committed r95366: <http://trac.webkit.org/changeset/95366>
Comment 17 WebKit Review Bot 2011-09-16 22:15:18 PDT
All reviewed patches have been landed.  Closing bug.