66588 – XSS filter bypass via non-standard URL encoding

RESOLVED FIXED66588

XSS filter bypass via non-standard URL encoding

https://bugs.webkit.org/show_bug.cgi?id=66588

Summary XSS filter bypass via non-standard URL encoding

Adam Barth

Reported 2011-08-19 14:07:01 PDT

(I thought this bug was filed somewhere already, but I couldn't find it.) Quoting Wikipedia: [[ There exists a non-standard encoding for Unicode characters: %uxxxx, where xxxx is a Unicode value represented as four hexadecimal digits. This behavior is not specified by any RFC and has been rejected by the W3C. The third edition of ECMA-262 still includes an escape(string) function that uses this syntax, but also an encodeURI(uri) function that converts to UTF-8 and percent-encodes each octet. ]] -- http://en.wikipedia.org/wiki/Percent-encoding#Non-standard_implementations It turns out ASP (or possibly ASP.NET) servers decode these sequences, which leads to an XSS filter bypass because we don't understand that transformation. ASP servers are a large enough population that it's probably worth teaching the XSS auditor to understand this case.

Attachments
Work-in-progress patch (18.74 KB, patch) 2011-09-01 22:35 PDT, Daniel Bates	no flags	Details Formatted Diff Diff
Patch and layout tests (24.08 KB, patch) 2011-09-03 22:32 PDT, Daniel Bates	no flags	Details Formatted Diff Diff
Patch and layout tests (30.95 KB, patch) 2011-09-04 16:37 PDT, Daniel Bates	abarth: review-	Details Formatted Diff Diff
Patch and layout tests (38.53 KB, patch) 2011-09-08 17:09 PDT, Daniel Bates	abarth: review+	Details Formatted Diff Diff
Show Obsolete (3) View All Add attachment proposed patch, testcase, etc.

Daniel Bates

Comment 1 2011-09-01 22:35:49 PDT

Created attachment 106094 [details] Work-in-progress patch

Adam Barth

Comment 2 2011-09-01 22:43:09 PDT

Comment on attachment 106094 [details] Work-in-progress patch View in context: https://bugs.webkit.org/attachment.cgi?id=106094&action=review > Source/WebCore/html/parser/XSSAuditor.cpp:119 > +static inline String decodeFancyUnicodeEscapeSequences(const String& string) Love the name. > Source/WebCore/platform/text/DecodeEscapeSequences.h:38 > +static inline int hexDigitValue(UChar c) Supposedly including static functions in a header makes the linker sad. I think this function is in ASCIIType anyway.

Thomas Sepez

Comment 3 2011-09-02 11:20:15 PDT

Comment on attachment 106094 [details] Work-in-progress patch View in context: https://bugs.webkit.org/attachment.cgi?id=106094&action=review >> Source/WebCore/html/parser/XSSAuditor.cpp:119 >> +static inline String decodeFancyUnicodeEscapeSequences(const String& string) > > Love the name. Might call it decode16BitUnicodeEscapeSequences. > Source/WebCore/html/parser/XSSAuditor.cpp:135 > String decodedString = decoder->encoding().decode(workingStringUTF8.data(), workingStringUTF8.length()); We know this can't work. OK for now, but the decoding has to happen inside decodeURLEscapeSequences so maybe pass it the decoder. > Source/WebCore/platform/text/DecodeEscapeSequences.h:62 > + *p++ = (hexDigitValue(run[2]) << 12) | (hexDigitValue(run[3]) << 8) | (hexDigitValue(run[4]) << 4) | hexDigitValue(run[5]); *p is a char, but you're assigning 16 bit value to it. > Source/WebCore/platform/text/DecodeEscapeSequences.h:112 > + String decoded = (encoding.isValid() ? encoding : UTF8Encoding()).decode(buffer.data(), numBytesWritten); There's a difference between the 8bit version where the %-escapes are interepreted relative to an encoding vs. the %u-style escapes, which are expected to represent a unicode code point no matter what the encoding and shouldn't get decoded. I think your palceDecodedRunInBuffer needs to be passed the encoding so it can decide whether to do this or not, with the output buffer being a String (or UChar equivalent).

Thomas Sepez

Comment 4 2011-09-02 11:23:08 PDT

Comment on attachment 106094 [details] Work-in-progress patch View in context: https://bugs.webkit.org/attachment.cgi?id=106094&action=review > LayoutTests/http/tests/security/xssAuditor/script-tag-with-fancy-unicode2.html:13 > +</iframe> Really need to stick some high-valued codepoints in here -- that may show the *p bug above. Also %252525u0061 should be tried as well to test interaction between the two decoders.

Thomas Sepez

Comment 5 2011-09-02 12:00:53 PDT

Comment on attachment 106094 [details] Work-in-progress patch View in context: https://bugs.webkit.org/attachment.cgi?id=106094&action=review > Source/WebCore/html/parser/XSSAuditor.cpp:130 > workingString = decodeURLEscapeSequences(workingString); Might want to circumvent the path through KURL.cpp to avoid schizophrenic issues about whether KURLGoogle.cpp is used on a given build. You could just call the templated function directly.

Daniel Bates

Comment 6 2011-09-03 21:59:44 PDT

(In reply to comment #2) > (From update of attachment 106094 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=106094&action=review > > > Source/WebCore/html/parser/XSSAuditor.cpp:119 > > +static inline String decodeFancyUnicodeEscapeSequences(const String& string) > > Love the name. Although we understand the meaning of "fancy", I suggest we rename this function to decode16BitUnicodeEscapeSequences() as suggested by Thomas Sepez because this name better conveys the fanciness of these Unicode escape sequences. > > > Source/WebCore/platform/text/DecodeEscapeSequences.h:38 > > +static inline int hexDigitValue(UChar c) > > Supposedly including static functions in a header makes the linker sad. I think this function is in ASCIIType anyway. Will remove.

Adam Barth

Comment 7 2011-09-03 22:01:36 PDT

Okiedokes.

Daniel Bates

Comment 8 2011-09-03 22:06:46 PDT

(In reply to comment #3) > (From update of attachment 106094 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=106094&action=review > > >> Source/WebCore/html/parser/XSSAuditor.cpp:119 > >> +static inline String decodeFancyUnicodeEscapeSequences(const String& string) > > > > Love the name. > > Might call it decode16BitUnicodeEscapeSequences. Will rename. > > > Source/WebCore/html/parser/XSSAuditor.cpp:135 > > String decodedString = decoder->encoding().decode(workingStringUTF8.data(), workingStringUTF8.length()); > > We know this can't work. OK for now, but the decoding has to happen inside decodeURLEscapeSequences so maybe pass it the decoder. Will remove and instead pass text encoding to decodeURLEscapeSequences(). > > > Source/WebCore/platform/text/DecodeEscapeSequences.h:62 > > + *p++ = (hexDigitValue(run[2]) << 12) | (hexDigitValue(run[3]) << 8) | (hexDigitValue(run[4]) << 4) | hexDigitValue(run[5]); > > *p is a char, but you're assigning 16 bit value to it. > Will fix. > > Source/WebCore/platform/text/DecodeEscapeSequences.h:112 > > + String decoded = (encoding.isValid() ? encoding : UTF8Encoding()).decode(buffer.data(), numBytesWritten); > > There's a difference between the 8bit version where the %-escapes are interepreted relative to an encoding vs. the %u-style escapes, which are expected to represent a unicode code point no matter what the encoding and shouldn't get decoded. I think your palceDecodedRunInBuffer needs to be passed the encoding so it can decide whether to do this or not, with the output buffer being a String (or UChar equivalent). Will rename placeDecodedRunInBuffer() to decodeRun(), have it take as input a text encoding, and have it return a String object.

Daniel Bates

Comment 9 2011-09-03 22:08:04 PDT

(In reply to comment #4) > (From update of attachment 106094 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=106094&action=review > > > LayoutTests/http/tests/security/xssAuditor/script-tag-with-fancy-unicode2.html:13 > > +</iframe> > > Really need to stick some high-valued codepoints in here -- that may show the *p bug above. Also %252525u0061 should be tried as well to test interaction between the two decoders. Will add test cases that include high-valued Unicode code points.

Daniel Bates

Comment 10 2011-09-03 22:21:06 PDT

(In reply to comment #5) > (From update of attachment 106094 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=106094&action=review > > > Source/WebCore/html/parser/XSSAuditor.cpp:130 > > workingString = decodeURLEscapeSequences(workingString); > > Might want to circumvent the path through KURL.cpp to avoid schizophrenic issues about whether KURLGoogle.cpp is used on a given build. You could just call the templated function directly. Will call templated function decodeEscapeSequences<URLEscapeSequence>() instead of KURL's decodeURLEscapeSequences().

Daniel Bates

Comment 11 2011-09-03 22:32:54 PDT

Created attachment 106279 [details] Patch and layout tests Needs change log. We could also look to bolster the test case script-tag-with-fancy-unicode5.html with more high-valued Unicode code points. Additionally, added an early return in XSSAuditor::init() when the document's URL is null or the empty string (i.e. url.isEmpty() evaluates to true). This can happen when opening a new browser window or calling window.open("") (i.e. not specifying a URL to window.open()).

Daniel Bates

Comment 12 2011-09-04 16:37:33 PDT

Created attachment 106296 [details] Patch and layout tests Add more high-valued Unicode code points to test case script-tag-with-16bit-unicode5.html (formerly named script-tag-with-fancy-unicode5.html). I renamed the test cases such that they have the suffix 16bit-unicode in their name so as to more closely match the references to 16-bit Unicode escape sequences in the code. I am open to suggestions. Should we decide to use this suffix then we should rename the existing test case script-tag-with-fancy-unicode.html for consistency. Added an additional test case, window-open-without-url-should-not-assert.html, to ensure we don't assert when opening a browser window without a URL using window.open("") (verbatim).

Adam Barth

Comment 13 2011-09-05 11:38:47 PDT

Comment on attachment 106296 [details] Patch and layout tests View in context: https://bugs.webkit.org/attachment.cgi?id=106296&action=review This looks good. Below are a bunch of minor comments. I'd like to look at it one more time to make sure we get the multi-byte unicode stuff right. > Source/WebCore/html/parser/XSSAuditor.cpp:126 > +static inline String commonDecodeURLEscapeSequences(const String& string, const TextEncoding& encoding) decodeCommonURLEscapeSequences ? As written, the name doesn't seem to read quite right. Maybe decodeStandardURLEscapSequences ? > Source/WebCore/html/parser/XSSAuditor.cpp:142 > + workingString = decode16BitUnicodeEscapeSequences(workingString); Don't we need to do this inside the loop too? > Source/WebCore/platform/text/DecodeEscapeSequences.h:46 > + && WTF::isASCIIHexDigit(string[start + 2]) && WTF::isASCIIHexDigit(string[start + 3]) > + && WTF::isASCIIHexDigit(string[start + 4]) && WTF::isASCIIHexDigit(string[start + 5]); You probably don't need the WTF:: parts. > Source/WebCore/platform/text/DecodeEscapeSequences.h:52 > + size_t numSequences = runLength / size; numSequences => numberOfSequences > Source/WebCore/platform/text/DecodeEscapeSequences.h:53 > + Vector<UChar, 512> buffer; Given that we need to return a String anyway, we should just use StringBuilder and reserveCapacity. We can't avoid the malloc in any case. > Source/WebCore/platform/text/DecodeEscapeSequences.h:57 > + *p++ = (toASCIIHexValue(run[2]) << 12) | (toASCIIHexValue(run[3]) << 8) | (toASCIIHexValue(run[4]) << 4) | toASCIIHexValue(run[5]); Is this correct even with surrogates? I would have expected you to do something like what we do here: http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLEntityParser.cpp#L73 > Source/WebCore/platform/text/DecodeEscapeSequences.h:76 > + DEFINE_STATIC_LOCAL(CharBuffer, buffer, ()); Why are we declaring it static? That can cause problems if this code is run on multiple threads or re-entered. Just allocating it on the stack should be find (and essentially free). > Source/WebCore/platform/text/DecodeEscapeSequences.h:78 > + size_t numSequences = runLength / size; numSequences => numberOfSequences > Source/WebCore/platform/text/DecodeEscapeSequences.h:84 > + } I'd add an ASSERT about buffer.size. > Source/WebCore/platform/text/DecodeEscapeSequences.h:92 > + Vector<UChar> result; StringBuilder is probably better here. > Source/WebCore/platform/text/DecodeEscapeSequences.h:93 > + unsigned length = string.length(); size_t <-- amounts of memory are measured in size_t so they work properly on 64-bit machines. > Source/WebCore/platform/text/DecodeEscapeSequences.h:95 > + unsigned decodedPosition = 0; > + unsigned searchPosition = 0; Same here.

Daniel Bates

Comment 14 2011-09-08 16:17:35 PDT

(In reply to comment #13) > [...] > > Source/WebCore/html/parser/XSSAuditor.cpp:126 > > +static inline String commonDecodeURLEscapeSequences(const String& string, const TextEncoding& encoding) > > decodeCommonURLEscapeSequences ? As written, the name doesn't seem to read quite right. Maybe decodeStandardURLEscapSequences ? > I'll use decodeStandardURLEscapeSequences(). Both decodeCommonURLEscapeSequences() and decodeStandardURLEscapeSequences() seem to imply that there exists some uncommon/non-standard URL escape sequences that this function doesn't decode. I wanted to convey that this function doesn't use the platform-specific KURL URL decoding machinery. So, I chose the prefix "common" as a synonym for cross-platform. Maybe prefixing with "crossPlatform" would be more descriptive? Another idea is to rename decodeURLEscapeSequences() to platformDecodeURLEscapeSequences() in KURL.{cpp, h} and KURLGoogle.{cpp, h}. Then we can reuse the name decodeURLEscapeSequences for a function that calls decodeEscapeSequences<URLEscapeSequence>(). > > Source/WebCore/html/parser/XSSAuditor.cpp:142 > > + workingString = decode16BitUnicodeEscapeSequences(workingString); > > Don't we need to do this inside the loop too? Yes, will fix. Then we'll catch the following attacks that intermix the %u- and URL- encodings, such as: http://localhost:8000/security/xssAuditor/resources/echo-intertag-decode-16bit-unicode.pl?q=<script>alert(/XS%u002525u0053/)</script>. > > > Source/WebCore/platform/text/DecodeEscapeSequences.h:46 > > + && WTF::isASCIIHexDigit(string[start + 2]) && WTF::isASCIIHexDigit(string[start + 3]) > > + && WTF::isASCIIHexDigit(string[start + 4]) && WTF::isASCIIHexDigit(string[start + 5]); > > You probably don't need the WTF:: parts. Will remove. > > > Source/WebCore/platform/text/DecodeEscapeSequences.h:52 > > + size_t numSequences = runLength / size; > > numSequences => numberOfSequences Will rename. > > > Source/WebCore/platform/text/DecodeEscapeSequences.h:53 > > + Vector<UChar, 512> buffer; > > Given that we need to return a String anyway, we should just use StringBuilder and reserveCapacity. We can't avoid the malloc in any case. Will use StringBuilder. > > > Source/WebCore/platform/text/DecodeEscapeSequences.h:57 > > + *p++ = (toASCIIHexValue(run[2]) << 12) | (toASCIIHexValue(run[3]) << 8) | (toASCIIHexValue(run[4]) << 4) | toASCIIHexValue(run[5]); > > Is this correct even with surrogates? I would have expected you to do something like what we do here: > > http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLEntityParser.cpp#L73 Yes, this works with surrogates. We discussed this over IRC on 09/06/2011. Will follow this comment with a transcript from that discussion. > > > Source/WebCore/platform/text/DecodeEscapeSequences.h:76 > > + DEFINE_STATIC_LOCAL(CharBuffer, buffer, ()); > > Why are we declaring it static? That can cause problems if this code is run on multiple threads or re-entered. Just allocating it on the stack should be find (and essentially free). Will remove. Instead, will stack allocate a buffer. > > > Source/WebCore/platform/text/DecodeEscapeSequences.h:78 > > + size_t numSequences = runLength / size; > > numSequences => numberOfSequences Will rename. > > > Source/WebCore/platform/text/DecodeEscapeSequences.h:84 > > + } > > I'd add an ASSERT about buffer.size. Will add ASSERT(buffer.size() == static_cast<size_t>(p - buffer.data())). > > > Source/WebCore/platform/text/DecodeEscapeSequences.h:92 > > + Vector<UChar> result; > > StringBuilder is probably better here. Will use StringBuilder. > > > Source/WebCore/platform/text/DecodeEscapeSequences.h:93 > > + unsigned length = string.length(); > > size_t <-- amounts of memory are measured in size_t so they work properly on 64-bit machines. > > > Source/WebCore/platform/text/DecodeEscapeSequences.h:95 > > + unsigned decodedPosition = 0; > > + unsigned searchPosition = 0; > > Same here. Will use size_t.

Daniel Bates

Comment 15 2011-09-08 16:24:49 PDT

For completeness, I discussed with Adam over IRC on 09/06/2011 about the correctness of the decoding routine for %u-escape sequences that he asked about in comment 13: "> Source/WebCore/platform/text/DecodeEscapeSequences.h:57 > + *p++ = (toASCIIHexValue(run[2]) << 12) | (toASCIIHexValue(run[3]) << 8) | (toASCIIHexValue(run[4]) << 4) | toASCIIHexValue(run[5]); Is this correct even with surrogates? I would have expected you to do something like what we do here: http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLEntityParser.cpp#L73" The paraphrased IRC transcript follows: [7:56pm] abarth: dydz: were you looking for me earlier? [7:57pm] dydz: abarth: Hi abarth. I was. I wanted to talk about <https://bugs.webkit.org/show_bug.cgi?id=66588#c13> [7:57pm] dydz: abarth: In particular, about the surrogates. [7:57pm] abarth: yessir [7:58pm] dydz: abarth: From my understanding a String obejct represents UTF-16 string [7:58pm] dydz: And %u-escape sequences represent UTF-16 code points. [7:58pm] dydz: err [7:58pm] dydz: UTF-16 code unit [7:58pm] dydz: That is, the escape sequence doesn't represent a Unicode code point. [7:59pm] abarth: dydz: why do you think that? [7:59pm] abarth: maybe ECMA-262 says? [8:00pm] abarth: "%uxxxx, where xxxx is a Unicode value represented as four hexadecimal digits" [8:00pm] abarth: is what wikipedia says [8:00pm] abarth: which isn't all that precise [8:00pm] abarth: but you're right that 262 usually deals in code units [8:01pm] abarth: http://www.w3.org/International/iri-edit/draft-duerst-iri.html [8:01pm] abarth: might be a good source [8:02pm] abarth: "Instead of using the existing percent-encoding convention of URIs, which is based on octets, the idea was to create a new encoding convention, for example to use '%u' to introduce UCS code points." [8:02pm] dydz: abarth: I haven't read ECMA-262 yet, but for your reference the definition of Unicode code units seems consistent with the implementation of unescape <http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp#L568> [8:03pm] pizlo joined the chat room. [8:03pm] abarth: Lexer::convertUnicode [8:03pm] abarth: ok [8:03pm] abarth: you're probably right [8:03pm] abarth: maybe add a comment explaining? [8:04pm] dydz: abarth: sure [8:04pm] dydz: abarth: Can you save me some time, is a UCS-2 code point the same as a UTF-16 code unit? [8:05pm] abarth: http://en.wikipedia.org/wiki/UTF-16/UCS-2 [8:05pm] abarth: "The older UCS-2 (2-byte Universal Character Set) is a similar character encoding that was superseded by UTF-16 in version 2.0 of the Unicode standard in July 1996.[2] It produces a fixed-length format by simply using the code point as the 16-bit code unit and produces exactly the same result as UTF-16 for 96.9% of all the code points in the range 0-0xFFFF, including all characters that had been assigned a value at that time." [8:05pm] abarth: sounds like UTF-16 without the surroages [8:06pm] abarth: and unable to represent characters that UTF-16 uses surrogates for [8:10pm] dydz: abarth: ok. So a UCS-2 code point is analogous to a UTF-16 code unit.

Daniel Bates

Comment 16 2011-09-08 17:09:33 PDT

Created attachment 106817 [details] Patch and layout tests Updated patch based on Adam Barth's comments. Added additional tests, including script-tag-with-16bit-unicode-surrogate-pair.html.

Adam Barth

Comment 17 2011-09-08 17:13:28 PDT

Comment on attachment 106817 [details] Patch and layout tests Thanks Dan.

Daniel Bates

Comment 18 2011-09-08 19:39:16 PDT

Committed r94828: <http://trac.webkit.org/changeset/94828>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware All

OS All

Product WebKit

Component WebKit Misc.

Assignee

Daniel Bates

Reported

2011-08-19 14:07 PDT

Modified

2011-09-09 05:18 PDT History

CC List

1 user Show

URL

Keywords XSSAuditor

Depends on

Blocks

66579 67134 67842

Dependencies

tree graph