Summary: | XSS filter bypass via non-standard URL encoding | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Adam Barth <abarth> | ||||||||||
Component: | WebKit Misc. | Assignee: | Daniel Bates <dbates> | ||||||||||
Status: | RESOLVED FIXED | ||||||||||||
Severity: | Normal | CC: | dbates | ||||||||||
Priority: | P2 | Keywords: | XSSAuditor | ||||||||||
Version: | 528+ (Nightly build) | ||||||||||||
Hardware: | All | ||||||||||||
OS: | All | ||||||||||||
Bug Depends on: | |||||||||||||
Bug Blocks: | 66579, 67134, 67842 | ||||||||||||
Attachments: |
|
Description
Adam Barth
2011-08-19 14:07:01 PDT
Created attachment 106094 [details]
Work-in-progress patch
Comment on attachment 106094 [details] Work-in-progress patch View in context: https://bugs.webkit.org/attachment.cgi?id=106094&action=review > Source/WebCore/html/parser/XSSAuditor.cpp:119 > +static inline String decodeFancyUnicodeEscapeSequences(const String& string) Love the name. > Source/WebCore/platform/text/DecodeEscapeSequences.h:38 > +static inline int hexDigitValue(UChar c) Supposedly including static functions in a header makes the linker sad. I think this function is in ASCIIType anyway. Comment on attachment 106094 [details] Work-in-progress patch View in context: https://bugs.webkit.org/attachment.cgi?id=106094&action=review >> Source/WebCore/html/parser/XSSAuditor.cpp:119 >> +static inline String decodeFancyUnicodeEscapeSequences(const String& string) > > Love the name. Might call it decode16BitUnicodeEscapeSequences. > Source/WebCore/html/parser/XSSAuditor.cpp:135 > String decodedString = decoder->encoding().decode(workingStringUTF8.data(), workingStringUTF8.length()); We know this can't work. OK for now, but the decoding has to happen inside decodeURLEscapeSequences so maybe pass it the decoder. > Source/WebCore/platform/text/DecodeEscapeSequences.h:62 > + *p++ = (hexDigitValue(run[2]) << 12) | (hexDigitValue(run[3]) << 8) | (hexDigitValue(run[4]) << 4) | hexDigitValue(run[5]); *p is a char, but you're assigning 16 bit value to it. > Source/WebCore/platform/text/DecodeEscapeSequences.h:112 > + String decoded = (encoding.isValid() ? encoding : UTF8Encoding()).decode(buffer.data(), numBytesWritten); There's a difference between the 8bit version where the %-escapes are interepreted relative to an encoding vs. the %u-style escapes, which are expected to represent a unicode code point no matter what the encoding and shouldn't get decoded. I think your palceDecodedRunInBuffer needs to be passed the encoding so it can decide whether to do this or not, with the output buffer being a String (or UChar equivalent). Comment on attachment 106094 [details] Work-in-progress patch View in context: https://bugs.webkit.org/attachment.cgi?id=106094&action=review > LayoutTests/http/tests/security/xssAuditor/script-tag-with-fancy-unicode2.html:13 > +</iframe> Really need to stick some high-valued codepoints in here -- that may show the *p bug above. Also %252525u0061 should be tried as well to test interaction between the two decoders. Comment on attachment 106094 [details] Work-in-progress patch View in context: https://bugs.webkit.org/attachment.cgi?id=106094&action=review > Source/WebCore/html/parser/XSSAuditor.cpp:130 > workingString = decodeURLEscapeSequences(workingString); Might want to circumvent the path through KURL.cpp to avoid schizophrenic issues about whether KURLGoogle.cpp is used on a given build. You could just call the templated function directly. (In reply to comment #2) > (From update of attachment 106094 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=106094&action=review > > > Source/WebCore/html/parser/XSSAuditor.cpp:119 > > +static inline String decodeFancyUnicodeEscapeSequences(const String& string) > > Love the name. Although we understand the meaning of "fancy", I suggest we rename this function to decode16BitUnicodeEscapeSequences() as suggested by Thomas Sepez because this name better conveys the fanciness of these Unicode escape sequences. > > > Source/WebCore/platform/text/DecodeEscapeSequences.h:38 > > +static inline int hexDigitValue(UChar c) > > Supposedly including static functions in a header makes the linker sad. I think this function is in ASCIIType anyway. Will remove. Okiedokes. (In reply to comment #3) > (From update of attachment 106094 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=106094&action=review > > >> Source/WebCore/html/parser/XSSAuditor.cpp:119 > >> +static inline String decodeFancyUnicodeEscapeSequences(const String& string) > > > > Love the name. > > Might call it decode16BitUnicodeEscapeSequences. Will rename. > > > Source/WebCore/html/parser/XSSAuditor.cpp:135 > > String decodedString = decoder->encoding().decode(workingStringUTF8.data(), workingStringUTF8.length()); > > We know this can't work. OK for now, but the decoding has to happen inside decodeURLEscapeSequences so maybe pass it the decoder. Will remove and instead pass text encoding to decodeURLEscapeSequences(). > > > Source/WebCore/platform/text/DecodeEscapeSequences.h:62 > > + *p++ = (hexDigitValue(run[2]) << 12) | (hexDigitValue(run[3]) << 8) | (hexDigitValue(run[4]) << 4) | hexDigitValue(run[5]); > > *p is a char, but you're assigning 16 bit value to it. > Will fix. > > Source/WebCore/platform/text/DecodeEscapeSequences.h:112 > > + String decoded = (encoding.isValid() ? encoding : UTF8Encoding()).decode(buffer.data(), numBytesWritten); > > There's a difference between the 8bit version where the %-escapes are interepreted relative to an encoding vs. the %u-style escapes, which are expected to represent a unicode code point no matter what the encoding and shouldn't get decoded. I think your palceDecodedRunInBuffer needs to be passed the encoding so it can decide whether to do this or not, with the output buffer being a String (or UChar equivalent). Will rename placeDecodedRunInBuffer() to decodeRun(), have it take as input a text encoding, and have it return a String object. (In reply to comment #4) > (From update of attachment 106094 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=106094&action=review > > > LayoutTests/http/tests/security/xssAuditor/script-tag-with-fancy-unicode2.html:13 > > +</iframe> > > Really need to stick some high-valued codepoints in here -- that may show the *p bug above. Also %252525u0061 should be tried as well to test interaction between the two decoders. Will add test cases that include high-valued Unicode code points. (In reply to comment #5) > (From update of attachment 106094 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=106094&action=review > > > Source/WebCore/html/parser/XSSAuditor.cpp:130 > > workingString = decodeURLEscapeSequences(workingString); > > Might want to circumvent the path through KURL.cpp to avoid schizophrenic issues about whether KURLGoogle.cpp is used on a given build. You could just call the templated function directly. Will call templated function decodeEscapeSequences<URLEscapeSequence>() instead of KURL's decodeURLEscapeSequences(). Created attachment 106279 [details]
Patch and layout tests
Needs change log. We could also look to bolster the test case script-tag-with-fancy-unicode5.html with more high-valued Unicode code points.
Additionally, added an early return in XSSAuditor::init() when the document's URL is null or the empty string (i.e. url.isEmpty() evaluates to true). This can happen when opening a new browser window or calling window.open("") (i.e. not specifying a URL to window.open()).
Created attachment 106296 [details]
Patch and layout tests
Add more high-valued Unicode code points to test case script-tag-with-16bit-unicode5.html (formerly named script-tag-with-fancy-unicode5.html).
I renamed the test cases such that they have the suffix 16bit-unicode in their name so as to more closely match the references to 16-bit Unicode escape sequences in the code. I am open to suggestions. Should we decide to use this suffix then we should rename the existing test case script-tag-with-fancy-unicode.html for consistency.
Added an additional test case, window-open-without-url-should-not-assert.html, to ensure we don't assert when opening a browser window without a URL using window.open("") (verbatim).
Comment on attachment 106296 [details] Patch and layout tests View in context: https://bugs.webkit.org/attachment.cgi?id=106296&action=review This looks good. Below are a bunch of minor comments. I'd like to look at it one more time to make sure we get the multi-byte unicode stuff right. > Source/WebCore/html/parser/XSSAuditor.cpp:126 > +static inline String commonDecodeURLEscapeSequences(const String& string, const TextEncoding& encoding) decodeCommonURLEscapeSequences ? As written, the name doesn't seem to read quite right. Maybe decodeStandardURLEscapSequences ? > Source/WebCore/html/parser/XSSAuditor.cpp:142 > + workingString = decode16BitUnicodeEscapeSequences(workingString); Don't we need to do this inside the loop too? > Source/WebCore/platform/text/DecodeEscapeSequences.h:46 > + && WTF::isASCIIHexDigit(string[start + 2]) && WTF::isASCIIHexDigit(string[start + 3]) > + && WTF::isASCIIHexDigit(string[start + 4]) && WTF::isASCIIHexDigit(string[start + 5]); You probably don't need the WTF:: parts. > Source/WebCore/platform/text/DecodeEscapeSequences.h:52 > + size_t numSequences = runLength / size; numSequences => numberOfSequences > Source/WebCore/platform/text/DecodeEscapeSequences.h:53 > + Vector<UChar, 512> buffer; Given that we need to return a String anyway, we should just use StringBuilder and reserveCapacity. We can't avoid the malloc in any case. > Source/WebCore/platform/text/DecodeEscapeSequences.h:57 > + *p++ = (toASCIIHexValue(run[2]) << 12) | (toASCIIHexValue(run[3]) << 8) | (toASCIIHexValue(run[4]) << 4) | toASCIIHexValue(run[5]); Is this correct even with surrogates? I would have expected you to do something like what we do here: http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLEntityParser.cpp#L73 > Source/WebCore/platform/text/DecodeEscapeSequences.h:76 > + DEFINE_STATIC_LOCAL(CharBuffer, buffer, ()); Why are we declaring it static? That can cause problems if this code is run on multiple threads or re-entered. Just allocating it on the stack should be find (and essentially free). > Source/WebCore/platform/text/DecodeEscapeSequences.h:78 > + size_t numSequences = runLength / size; numSequences => numberOfSequences > Source/WebCore/platform/text/DecodeEscapeSequences.h:84 > + } I'd add an ASSERT about buffer.size. > Source/WebCore/platform/text/DecodeEscapeSequences.h:92 > + Vector<UChar> result; StringBuilder is probably better here. > Source/WebCore/platform/text/DecodeEscapeSequences.h:93 > + unsigned length = string.length(); size_t <-- amounts of memory are measured in size_t so they work properly on 64-bit machines. > Source/WebCore/platform/text/DecodeEscapeSequences.h:95 > + unsigned decodedPosition = 0; > + unsigned searchPosition = 0; Same here. (In reply to comment #13) > [...] > > Source/WebCore/html/parser/XSSAuditor.cpp:126 > > +static inline String commonDecodeURLEscapeSequences(const String& string, const TextEncoding& encoding) > > decodeCommonURLEscapeSequences ? As written, the name doesn't seem to read quite right. Maybe decodeStandardURLEscapSequences ? > I'll use decodeStandardURLEscapeSequences(). Both decodeCommonURLEscapeSequences() and decodeStandardURLEscapeSequences() seem to imply that there exists some uncommon/non-standard URL escape sequences that this function doesn't decode. I wanted to convey that this function doesn't use the platform-specific KURL URL decoding machinery. So, I chose the prefix "common" as a synonym for cross-platform. Maybe prefixing with "crossPlatform" would be more descriptive? Another idea is to rename decodeURLEscapeSequences() to platformDecodeURLEscapeSequences() in KURL.{cpp, h} and KURLGoogle.{cpp, h}. Then we can reuse the name decodeURLEscapeSequences for a function that calls decodeEscapeSequences<URLEscapeSequence>(). > > Source/WebCore/html/parser/XSSAuditor.cpp:142 > > + workingString = decode16BitUnicodeEscapeSequences(workingString); > > Don't we need to do this inside the loop too? Yes, will fix. Then we'll catch the following attacks that intermix the %u- and URL- encodings, such as: http://localhost:8000/security/xssAuditor/resources/echo-intertag-decode-16bit-unicode.pl?q=<script>alert(/XS%u002525u0053/)</script>. > > > Source/WebCore/platform/text/DecodeEscapeSequences.h:46 > > + && WTF::isASCIIHexDigit(string[start + 2]) && WTF::isASCIIHexDigit(string[start + 3]) > > + && WTF::isASCIIHexDigit(string[start + 4]) && WTF::isASCIIHexDigit(string[start + 5]); > > You probably don't need the WTF:: parts. Will remove. > > > Source/WebCore/platform/text/DecodeEscapeSequences.h:52 > > + size_t numSequences = runLength / size; > > numSequences => numberOfSequences Will rename. > > > Source/WebCore/platform/text/DecodeEscapeSequences.h:53 > > + Vector<UChar, 512> buffer; > > Given that we need to return a String anyway, we should just use StringBuilder and reserveCapacity. We can't avoid the malloc in any case. Will use StringBuilder. > > > Source/WebCore/platform/text/DecodeEscapeSequences.h:57 > > + *p++ = (toASCIIHexValue(run[2]) << 12) | (toASCIIHexValue(run[3]) << 8) | (toASCIIHexValue(run[4]) << 4) | toASCIIHexValue(run[5]); > > Is this correct even with surrogates? I would have expected you to do something like what we do here: > > http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLEntityParser.cpp#L73 Yes, this works with surrogates. We discussed this over IRC on 09/06/2011. Will follow this comment with a transcript from that discussion. > > > Source/WebCore/platform/text/DecodeEscapeSequences.h:76 > > + DEFINE_STATIC_LOCAL(CharBuffer, buffer, ()); > > Why are we declaring it static? That can cause problems if this code is run on multiple threads or re-entered. Just allocating it on the stack should be find (and essentially free). Will remove. Instead, will stack allocate a buffer. > > > Source/WebCore/platform/text/DecodeEscapeSequences.h:78 > > + size_t numSequences = runLength / size; > > numSequences => numberOfSequences Will rename. > > > Source/WebCore/platform/text/DecodeEscapeSequences.h:84 > > + } > > I'd add an ASSERT about buffer.size. Will add ASSERT(buffer.size() == static_cast<size_t>(p - buffer.data())). > > > Source/WebCore/platform/text/DecodeEscapeSequences.h:92 > > + Vector<UChar> result; > > StringBuilder is probably better here. Will use StringBuilder. > > > Source/WebCore/platform/text/DecodeEscapeSequences.h:93 > > + unsigned length = string.length(); > > size_t <-- amounts of memory are measured in size_t so they work properly on 64-bit machines. > > > Source/WebCore/platform/text/DecodeEscapeSequences.h:95 > > + unsigned decodedPosition = 0; > > + unsigned searchPosition = 0; > > Same here. Will use size_t. For completeness, I discussed with Adam over IRC on 09/06/2011 about the correctness of the decoding routine for %u-escape sequences that he asked about in comment 13: "> Source/WebCore/platform/text/DecodeEscapeSequences.h:57 > + *p++ = (toASCIIHexValue(run[2]) << 12) | (toASCIIHexValue(run[3]) << 8) | (toASCIIHexValue(run[4]) << 4) | toASCIIHexValue(run[5]); Is this correct even with surrogates? I would have expected you to do something like what we do here: http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLEntityParser.cpp#L73" The paraphrased IRC transcript follows: [7:56pm] abarth: dydz: were you looking for me earlier? [7:57pm] dydz: abarth: Hi abarth. I was. I wanted to talk about <https://bugs.webkit.org/show_bug.cgi?id=66588#c13> [7:57pm] dydz: abarth: In particular, about the surrogates. [7:57pm] abarth: yessir [7:58pm] dydz: abarth: From my understanding a String obejct represents UTF-16 string [7:58pm] dydz: And %u-escape sequences represent UTF-16 code points. [7:58pm] dydz: err [7:58pm] dydz: UTF-16 code unit [7:58pm] dydz: That is, the escape sequence doesn't represent a Unicode code point. [7:59pm] abarth: dydz: why do you think that? [7:59pm] abarth: maybe ECMA-262 says? [8:00pm] abarth: "%uxxxx, where xxxx is a Unicode value represented as four hexadecimal digits" [8:00pm] abarth: is what wikipedia says [8:00pm] abarth: which isn't all that precise [8:00pm] abarth: but you're right that 262 usually deals in code units [8:01pm] abarth: http://www.w3.org/International/iri-edit/draft-duerst-iri.html [8:01pm] abarth: might be a good source [8:02pm] abarth: "Instead of using the existing percent-encoding convention of URIs, which is based on octets, the idea was to create a new encoding convention, for example to use '%u' to introduce UCS code points." [8:02pm] dydz: abarth: I haven't read ECMA-262 yet, but for your reference the definition of Unicode code units seems consistent with the implementation of unescape <http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp#L568> [8:03pm] pizlo joined the chat room. [8:03pm] abarth: Lexer::convertUnicode [8:03pm] abarth: ok [8:03pm] abarth: you're probably right [8:03pm] abarth: maybe add a comment explaining? [8:04pm] dydz: abarth: sure [8:04pm] dydz: abarth: Can you save me some time, is a UCS-2 code point the same as a UTF-16 code unit? [8:05pm] abarth: http://en.wikipedia.org/wiki/UTF-16/UCS-2 [8:05pm] abarth: "The older UCS-2 (2-byte Universal Character Set) is a similar character encoding that was superseded by UTF-16 in version 2.0 of the Unicode standard in July 1996.[2] It produces a fixed-length format by simply using the code point as the 16-bit code unit and produces exactly the same result as UTF-16 for 96.9% of all the code points in the range 0-0xFFFF, including all characters that had been assigned a value at that time." [8:05pm] abarth: sounds like UTF-16 without the surroages [8:06pm] abarth: and unable to represent characters that UTF-16 uses surrogates for [8:10pm] dydz: abarth: ok. So a UCS-2 code point is analogous to a UTF-16 code unit. Created attachment 106817 [details]
Patch and layout tests
Updated patch based on Adam Barth's comments.
Added additional tests, including script-tag-with-16bit-unicode-surrogate-pair.html.
Comment on attachment 106817 [details]
Patch and layout tests
Thanks Dan.
Committed r94828: <http://trac.webkit.org/changeset/94828> |