Second part of https://bugs.webkit.org/show_bug.cgi?id=27895. This is for the situation where the page contains a naturally-occurring close script tag on the same line as the injecton, e.g. <img src="$q"><script>foo()</script> where $q is the injected vector from the URL, typically ?q="><script>alert(0)//, yeilding <img src=""><script>alert(0)//"><script>foo()</script> and the comment removes the unparseable fragment from consideration by JS. xssauditor looking at the entire script block will fail to match because the "><script>foo() fragment isn't in the URL. Unlike the inline case, entities aren't an issue here, but because of false positives, stopping on the first comment isn't likely to be acceptable. Many script blocks will begin with a comment, eg. <script> // Copyright (c) 2001 by some guy.
Copy some of dbates's additional details from the original bug: We may want to file a separate bug to fix comments within script tags such as: <script>alert(1);/*<!-- Example: http://good.webblaze.org/dbates/xsstest-script-comment.php?q=%3Cscript%3Ealert%28/XSS/%29%3B/*%3C%21-- <script>alert(1);/* Example: http://good.webblaze.org/dbates/xsstest-script-comment.php?q=%3Cscript%3Ealert%28/XSS/%29%3B/* <script>alert(1);// Example: http://good.webblaze.org/dbates/xsstest-script-comment.php?q=%3Cscript%3Ealert%28/XSS/%29%3B// <script>alert(1);<!-- Example: http://good.webblaze.org/dbates/xsstest-script-comment.php?q=%3Cscript%3Ealert%28/XSS/%29%3B%3C%21--
Even worse case is when attacker controls two injections on the page, and can escape the intervening page content with /* in one injection and */ in the other. Not sure if we want to tackle this case [chromium bug http://code.google.com/p/chromium/issues/detail?id=96616 closed as wontfix for this case]. Consider a page print '<img src="' . $q1 . '"><foo>bar<baz/><img src="' . $q2 . '">'; ?q1="><script>/*&other=clutter&q2=bam*//**/alert(0)//foo%0a</script> One rule I can think of that might catch this is something like this is: - verify a <script> tag in the URL, else do nothing. - without further decoding, parse the script block in the page, skipping over leading comments, and find the first actual code fragment, stopping at the next place a comment is introduced. - reduce/canonicalize the fragment - check if that fragment is present anywhere in the reduced/canonicalized URL. so: <img src=""><script>/*"><foo>bar<baz/><img src="bam*//**/alert(0)//foo </script> - verify <script> in URL. - skip first comment /*"><foo>bar<baz/><img src="bam*/ - skip second comment /**/ - extract code fragment alert(0) up to next comment. - decode and then find in URL.
Multiple injections are explicitly out of scope.
Created attachment 108052 [details] Patch + test cases. Rough cut at code to lex out the relevant portions of the script block for comparison.
Created attachment 108054 [details] Patch - unrelated files picked up by mistake.
Comment on attachment 108054 [details] Patch - unrelated files picked up by mistake. Woah there. I don't think we should be lexing JavaScript. That's way too complicated. Isn't there something simpler we can do?
I still think we can't stop at the first comment as we do for inline handlers, since it is so common for legitimate script blocks to start with comments; doing so would cause any <script> in the URL to match, making it too noisy. And having found the code, comparing it and the script tag in two comparisons against the URL seems to do a good job here (as it turns out, it happens to catch the case in the "wontfix bug" but that was not the intent). I'm thinking that it is OK to eliminate the quote balancing when looking for the end of the code fragment. I wanted to avoid stopping on something like "http://", but since legitimate code will have something before the string literal, we probably have enough signal even if we do stop there. If you throw that requirement out, then it becomes more practical to do this via a series of finds() since you're not worried about the context. Simpler, in some sense, but less elegant and slower. I'd expect that most of the code in here would reduce to state machines if we wanted to grease it ...
For the "starts with a comment" case, we could notice that there's only whitespace in the snippet and keep going. Having a whole JavaScript lexer here is way more complexity that we want.
Well, as you can see, its not a full lexer, it only understands quotes and comment. But point taken.
The main reason to grab more of the snippet is to reduce false positives. In the case where you have a string, we've probably already got a good enough snippet to avoid false positives: <script> var x = "http:// It seems fine to terminate the snippet there. It's not super important that it's not really the beginning of a comment.
Created attachment 108232 [details] Simpler patch
Comment on attachment 108232 [details] Simpler patch View in context: https://bugs.webkit.org/attachment.cgi?id=108232&action=review > Source/WebCore/html/parser/XSSAuditor.cpp:311 > + int start = 0; > + int end = token.endIndex() - token.startIndex(); snippetForRange doesn't use the same indicies as token.startIndex? I guess this is the same way the code used to work. (Also, these indicies should probably all be size_t, but that's a yak for another day.) > Source/WebCore/html/parser/XSSAuditor.cpp:313 > + if (isContainedInRequest(fullyDecodeString(snippet, decoder))) { I see, we're not requiring the two pieces to be contiguous in the request. Is that intentional? I guess the code fragment isn't necessarily at the beginning of the script. > Source/WebCore/html/parser/XSSAuditor.cpp:542 > +String XSSAuditor::extractCodeFragment(const String& string) If this is specific to JavaScript, maybe we should call it snippetForJavaScript ? > Source/WebCore/html/parser/XSSAuditor.cpp:548 > + size_t foundPosition; I would initialize this scalar with notFound, but that's just because I'm paranoid. > Source/WebCore/html/parser/XSSAuditor.cpp:561 > + if (string.substring(startPosition, 4) == "<!--") { > + while (startPosition < endPosition && string[startPosition] != '\r' && string[startPosition] != '\n') > + startPosition++; > + while (startPosition < endPosition && isHTMLSpace(string[startPosition])) > + startPosition++; > + } I'm not sure why this is a special case. It seems like we could just fold this into the following loop. <!-- is really the same thing as //. > Source/WebCore/html/parser/XSSAuditor.cpp:566 > + if (string.substring(startPosition, 2) == "//") { Here too we're doing a malloc for each character we scan. This might be worth adding to WTFString. > Source/WebCore/html/parser/XSSAuditor.cpp:568 > + while (startPosition < endPosition && string[startPosition] != '\r' && string[startPosition] != '\n') > + startPosition++; Maybe have an inline function to test for isLinebreak ? > Source/WebCore/html/parser/XSSAuditor.cpp:584 > + foundPosition = startPosition + kMaximumFragmentLength; So, kMaximumFragmentLength isn't really the maximum. It's some sort of target for the maximum. > Source/WebCore/html/parser/XSSAuditor.cpp:595 > + // Stop at next comment. > + if ((foundPosition = string.find("//", startPosition, endPosition - startPosition)) != notFound) > + endPosition = foundPosition; > + if ((foundPosition = string.find("/*", startPosition, endPosition - startPosition)) != notFound) > + endPosition = foundPosition; We should probably stop at the first <!-- too. In JavaScript <!-- is just like //.
Comment on attachment 108232 [details] Simpler patch This looks great. I'd like to see another iteration of extractCodeFragment.
Comment on attachment 108232 [details] Simpler patch View in context: https://bugs.webkit.org/attachment.cgi?id=108232&action=review >> Source/WebCore/html/parser/XSSAuditor.cpp:311 >> + int end = token.endIndex() - token.startIndex(); > > snippetForRange doesn't use the same indicies as token.startIndex? I guess this is the same way the code used to work. (Also, these indicies should probably all be size_t, but that's a yak for another day.) Will leave as is. >> Source/WebCore/html/parser/XSSAuditor.cpp:313 >> + if (isContainedInRequest(fullyDecodeString(snippet, decoder))) { > > I see, we're not requiring the two pieces to be contiguous in the request. Is that intentional? I guess the code fragment isn't necessarily at the beginning of the script. Yes, we need to deal with an injection like <script>// comment to trick you %0a alert(0) </script> now. We won't include the comment in the snippet we extract. >> Source/WebCore/html/parser/XSSAuditor.cpp:542 >> +String XSSAuditor::extractCodeFragment(const String& string) > > If this is specific to JavaScript, maybe we should call it snippetForJavaScript ? Sure. Will rename. >> Source/WebCore/html/parser/XSSAuditor.cpp:548 >> + size_t foundPosition; > > I would initialize this scalar with notFound, but that's just because I'm paranoid. Sure. would like an uninit reference to lead to string[notFound] taking an exception. >> Source/WebCore/html/parser/XSSAuditor.cpp:561 >> + } > > I'm not sure why this is a special case. It seems like we could just fold this into the following loop. <!-- is really the same thing as //. Ok. My recollection of the spec was that the first instance of <!-- was a comment, but that subsequent ones were <!-- , as in if (a<!--b) {} but I may be wrong here. >> Source/WebCore/html/parser/XSSAuditor.cpp:566 >> + if (string.substring(startPosition, 2) == "//") { > > Here too we're doing a malloc for each character we scan. This might be worth adding to WTFString. Or just sp + 1 < ep && string[sp] == '/' && string[sp+1] == '/' >> Source/WebCore/html/parser/XSSAuditor.cpp:568 >> + startPosition++; > > Maybe have an inline function to test for isLinebreak ? Sure. >> Source/WebCore/html/parser/XSSAuditor.cpp:584 >> + foundPosition = startPosition + kMaximumFragmentLength; > > So, kMaximumFragmentLength isn't really the maximum. It's some sort of target for the maximum. Ok, I'll call it kMaximumFragmentLengthTarget. >> Source/WebCore/html/parser/XSSAuditor.cpp:595 >> + endPosition = foundPosition; > > We should probably stop at the first <!-- too. In JavaScript <!-- is just like //. Yes, see above.
You're probably right w.r.t. <!-- only being a comment the first time. Maybe run a quick test?
<script> <!-- this is a comment alert(0) <!-- this may not be a comment // --> </script> <script>a=1;alert("case 1: " + (false<!--a));</script> <script>a=1;alert("case 2: " + (false<! --a));</script> <script>a=1;alert("case 3: " + (false< !--a));</script> gives an alert: 0, syntax error, alert: case2:true, and alert: case3 true. This implies that <!-- is always a comment. At least on webkit and FF (6.0.2). So will fold all those into the loop.
> So will fold all those into the loop. Great. Thanks.
Comment on attachment 108232 [details] Simpler patch View in context: https://bugs.webkit.org/attachment.cgi?id=108232&action=review >>> Source/WebCore/html/parser/XSSAuditor.cpp:595 >>> + endPosition = foundPosition; >> >> We should probably stop at the first <!-- too. In JavaScript <!-- is just like //. > > Yes, see above. Argh, as it turns out the 3-arg versions of these calls don't do what I expected them to do. Will fix this too.
Created attachment 108261 [details] Patch + changes for previous comments.
Comment on attachment 108261 [details] Patch + changes for previous comments. View in context: https://bugs.webkit.org/attachment.cgi?id=108261&action=review Fantastic. > Source/WebCore/html/parser/XSSAuditor.cpp:561 > + Looks like there's an extra line here.
Comment on attachment 108261 [details] Patch + changes for previous comments. It dawns on me :There's a subtle hole I introduced when I merged the two loops together in the end. I'll retry tomorrow.
Created attachment 108353 [details] Revised patch closing hole with truncation case.
Comment on attachment 108261 [details] Patch + changes for previous comments. View in context: https://bugs.webkit.org/attachment.cgi?id=108261&action=review > Source/WebCore/html/parser/XSSAuditor.cpp:600 > + while (foundPosition < endPosition && !isHTMLSpace(string[foundPosition])) > + foundPosition++; I see. This could walk over a carefully-placed comment that was just after position kMaximumFragmentLengthTarget. Good catch sir.
Comment on attachment 108353 [details] Revised patch closing hole with truncation case. Clearing flags on attachment: 108353 Committed r95774: <http://trac.webkit.org/changeset/95774>
All reviewed patches have been landed. Closing bug.