WebKit Bugzilla
New
Browse
Search+
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
66588
XSS filter bypass via non-standard URL encoding
https://bugs.webkit.org/show_bug.cgi?id=66588
Summary
XSS filter bypass via non-standard URL encoding
Adam Barth
Reported
2011-08-19 14:07:01 PDT
(I thought this bug was filed somewhere already, but I couldn't find it.) Quoting Wikipedia: [[ There exists a non-standard encoding for Unicode characters: %uxxxx, where xxxx is a Unicode value represented as four hexadecimal digits. This behavior is not specified by any RFC and has been rejected by the W3C. The third edition of ECMA-262 still includes an escape(string) function that uses this syntax, but also an encodeURI(uri) function that converts to UTF-8 and percent-encodes each octet. ]] --
http://en.wikipedia.org/wiki/Percent-encoding#Non-standard_implementations
It turns out ASP (or possibly ASP.NET) servers decode these sequences, which leads to an XSS filter bypass because we don't understand that transformation. ASP servers are a large enough population that it's probably worth teaching the XSS auditor to understand this case.
Attachments
Work-in-progress patch
(18.74 KB, patch)
2011-09-01 22:35 PDT
,
Daniel Bates
no flags
Details
Formatted Diff
Diff
Patch and layout tests
(24.08 KB, patch)
2011-09-03 22:32 PDT
,
Daniel Bates
no flags
Details
Formatted Diff
Diff
Patch and layout tests
(30.95 KB, patch)
2011-09-04 16:37 PDT
,
Daniel Bates
abarth
: review-
Details
Formatted Diff
Diff
Patch and layout tests
(38.53 KB, patch)
2011-09-08 17:09 PDT
,
Daniel Bates
abarth
: review+
Details
Formatted Diff
Diff
Show Obsolete
(3)
View All
Add attachment
proposed patch, testcase, etc.
Daniel Bates
Comment 1
2011-09-01 22:35:49 PDT
Created
attachment 106094
[details]
Work-in-progress patch
Adam Barth
Comment 2
2011-09-01 22:43:09 PDT
Comment on
attachment 106094
[details]
Work-in-progress patch View in context:
https://bugs.webkit.org/attachment.cgi?id=106094&action=review
> Source/WebCore/html/parser/XSSAuditor.cpp:119 > +static inline String decodeFancyUnicodeEscapeSequences(const String& string)
Love the name.
> Source/WebCore/platform/text/DecodeEscapeSequences.h:38 > +static inline int hexDigitValue(UChar c)
Supposedly including static functions in a header makes the linker sad. I think this function is in ASCIIType anyway.
Thomas Sepez
Comment 3
2011-09-02 11:20:15 PDT
Comment on
attachment 106094
[details]
Work-in-progress patch View in context:
https://bugs.webkit.org/attachment.cgi?id=106094&action=review
>> Source/WebCore/html/parser/XSSAuditor.cpp:119 >> +static inline String decodeFancyUnicodeEscapeSequences(const String& string) > > Love the name.
Might call it decode16BitUnicodeEscapeSequences.
> Source/WebCore/html/parser/XSSAuditor.cpp:135 > String decodedString = decoder->encoding().decode(workingStringUTF8.data(), workingStringUTF8.length());
We know this can't work. OK for now, but the decoding has to happen inside decodeURLEscapeSequences so maybe pass it the decoder.
> Source/WebCore/platform/text/DecodeEscapeSequences.h:62 > + *p++ = (hexDigitValue(run[2]) << 12) | (hexDigitValue(run[3]) << 8) | (hexDigitValue(run[4]) << 4) | hexDigitValue(run[5]);
*p is a char, but you're assigning 16 bit value to it.
> Source/WebCore/platform/text/DecodeEscapeSequences.h:112 > + String decoded = (encoding.isValid() ? encoding : UTF8Encoding()).decode(buffer.data(), numBytesWritten);
There's a difference between the 8bit version where the %-escapes are interepreted relative to an encoding vs. the %u-style escapes, which are expected to represent a unicode code point no matter what the encoding and shouldn't get decoded. I think your palceDecodedRunInBuffer needs to be passed the encoding so it can decide whether to do this or not, with the output buffer being a String (or UChar equivalent).
Thomas Sepez
Comment 4
2011-09-02 11:23:08 PDT
Comment on
attachment 106094
[details]
Work-in-progress patch View in context:
https://bugs.webkit.org/attachment.cgi?id=106094&action=review
> LayoutTests/http/tests/security/xssAuditor/script-tag-with-fancy-unicode2.html:13 > +</iframe>
Really need to stick some high-valued codepoints in here -- that may show the *p bug above. Also %252525u0061 should be tried as well to test interaction between the two decoders.
Thomas Sepez
Comment 5
2011-09-02 12:00:53 PDT
Comment on
attachment 106094
[details]
Work-in-progress patch View in context:
https://bugs.webkit.org/attachment.cgi?id=106094&action=review
> Source/WebCore/html/parser/XSSAuditor.cpp:130 > workingString = decodeURLEscapeSequences(workingString);
Might want to circumvent the path through KURL.cpp to avoid schizophrenic issues about whether KURLGoogle.cpp is used on a given build. You could just call the templated function directly.
Daniel Bates
Comment 6
2011-09-03 21:59:44 PDT
(In reply to
comment #2
)
> (From update of
attachment 106094
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=106094&action=review
> > > Source/WebCore/html/parser/XSSAuditor.cpp:119 > > +static inline String decodeFancyUnicodeEscapeSequences(const String& string) > > Love the name.
Although we understand the meaning of "fancy", I suggest we rename this function to decode16BitUnicodeEscapeSequences() as suggested by Thomas Sepez because this name better conveys the fanciness of these Unicode escape sequences.
> > > Source/WebCore/platform/text/DecodeEscapeSequences.h:38 > > +static inline int hexDigitValue(UChar c) > > Supposedly including static functions in a header makes the linker sad. I think this function is in ASCIIType anyway.
Will remove.
Adam Barth
Comment 7
2011-09-03 22:01:36 PDT
Okiedokes.
Daniel Bates
Comment 8
2011-09-03 22:06:46 PDT
(In reply to
comment #3
)
> (From update of
attachment 106094
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=106094&action=review
> > >> Source/WebCore/html/parser/XSSAuditor.cpp:119 > >> +static inline String decodeFancyUnicodeEscapeSequences(const String& string) > > > > Love the name. > > Might call it decode16BitUnicodeEscapeSequences.
Will rename.
> > > Source/WebCore/html/parser/XSSAuditor.cpp:135 > > String decodedString = decoder->encoding().decode(workingStringUTF8.data(), workingStringUTF8.length()); > > We know this can't work. OK for now, but the decoding has to happen inside decodeURLEscapeSequences so maybe pass it the decoder.
Will remove and instead pass text encoding to decodeURLEscapeSequences().
> > > Source/WebCore/platform/text/DecodeEscapeSequences.h:62 > > + *p++ = (hexDigitValue(run[2]) << 12) | (hexDigitValue(run[3]) << 8) | (hexDigitValue(run[4]) << 4) | hexDigitValue(run[5]); > > *p is a char, but you're assigning 16 bit value to it. >
Will fix.
> > Source/WebCore/platform/text/DecodeEscapeSequences.h:112 > > + String decoded = (encoding.isValid() ? encoding : UTF8Encoding()).decode(buffer.data(), numBytesWritten); > > There's a difference between the 8bit version where the %-escapes are interepreted relative to an encoding vs. the %u-style escapes, which are expected to represent a unicode code point no matter what the encoding and shouldn't get decoded. I think your palceDecodedRunInBuffer needs to be passed the encoding so it can decide whether to do this or not, with the output buffer being a String (or UChar equivalent).
Will rename placeDecodedRunInBuffer() to decodeRun(), have it take as input a text encoding, and have it return a String object.
Daniel Bates
Comment 9
2011-09-03 22:08:04 PDT
(In reply to
comment #4
)
> (From update of
attachment 106094
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=106094&action=review
> > > LayoutTests/http/tests/security/xssAuditor/script-tag-with-fancy-unicode2.html:13 > > +</iframe> > > Really need to stick some high-valued codepoints in here -- that may show the *p bug above. Also %252525u0061 should be tried as well to test interaction between the two decoders.
Will add test cases that include high-valued Unicode code points.
Daniel Bates
Comment 10
2011-09-03 22:21:06 PDT
(In reply to
comment #5
)
> (From update of
attachment 106094
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=106094&action=review
> > > Source/WebCore/html/parser/XSSAuditor.cpp:130 > > workingString = decodeURLEscapeSequences(workingString); > > Might want to circumvent the path through KURL.cpp to avoid schizophrenic issues about whether KURLGoogle.cpp is used on a given build. You could just call the templated function directly.
Will call templated function decodeEscapeSequences<URLEscapeSequence>() instead of KURL's decodeURLEscapeSequences().
Daniel Bates
Comment 11
2011-09-03 22:32:54 PDT
Created
attachment 106279
[details]
Patch and layout tests Needs change log. We could also look to bolster the test case script-tag-with-fancy-unicode5.html with more high-valued Unicode code points. Additionally, added an early return in XSSAuditor::init() when the document's URL is null or the empty string (i.e. url.isEmpty() evaluates to true). This can happen when opening a new browser window or calling window.open("") (i.e. not specifying a URL to window.open()).
Daniel Bates
Comment 12
2011-09-04 16:37:33 PDT
Created
attachment 106296
[details]
Patch and layout tests Add more high-valued Unicode code points to test case script-tag-with-16bit-unicode5.html (formerly named script-tag-with-fancy-unicode5.html). I renamed the test cases such that they have the suffix 16bit-unicode in their name so as to more closely match the references to 16-bit Unicode escape sequences in the code. I am open to suggestions. Should we decide to use this suffix then we should rename the existing test case script-tag-with-fancy-unicode.html for consistency. Added an additional test case, window-open-without-url-should-not-assert.html, to ensure we don't assert when opening a browser window without a URL using window.open("") (verbatim).
Adam Barth
Comment 13
2011-09-05 11:38:47 PDT
Comment on
attachment 106296
[details]
Patch and layout tests View in context:
https://bugs.webkit.org/attachment.cgi?id=106296&action=review
This looks good. Below are a bunch of minor comments. I'd like to look at it one more time to make sure we get the multi-byte unicode stuff right.
> Source/WebCore/html/parser/XSSAuditor.cpp:126 > +static inline String commonDecodeURLEscapeSequences(const String& string, const TextEncoding& encoding)
decodeCommonURLEscapeSequences ? As written, the name doesn't seem to read quite right. Maybe decodeStandardURLEscapSequences ?
> Source/WebCore/html/parser/XSSAuditor.cpp:142 > + workingString = decode16BitUnicodeEscapeSequences(workingString);
Don't we need to do this inside the loop too?
> Source/WebCore/platform/text/DecodeEscapeSequences.h:46 > + && WTF::isASCIIHexDigit(string[start + 2]) && WTF::isASCIIHexDigit(string[start + 3]) > + && WTF::isASCIIHexDigit(string[start + 4]) && WTF::isASCIIHexDigit(string[start + 5]);
You probably don't need the WTF:: parts.
> Source/WebCore/platform/text/DecodeEscapeSequences.h:52 > + size_t numSequences = runLength / size;
numSequences => numberOfSequences
> Source/WebCore/platform/text/DecodeEscapeSequences.h:53 > + Vector<UChar, 512> buffer;
Given that we need to return a String anyway, we should just use StringBuilder and reserveCapacity. We can't avoid the malloc in any case.
> Source/WebCore/platform/text/DecodeEscapeSequences.h:57 > + *p++ = (toASCIIHexValue(run[2]) << 12) | (toASCIIHexValue(run[3]) << 8) | (toASCIIHexValue(run[4]) << 4) | toASCIIHexValue(run[5]);
Is this correct even with surrogates? I would have expected you to do something like what we do here:
http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLEntityParser.cpp#L73
> Source/WebCore/platform/text/DecodeEscapeSequences.h:76 > + DEFINE_STATIC_LOCAL(CharBuffer, buffer, ());
Why are we declaring it static? That can cause problems if this code is run on multiple threads or re-entered. Just allocating it on the stack should be find (and essentially free).
> Source/WebCore/platform/text/DecodeEscapeSequences.h:78 > + size_t numSequences = runLength / size;
numSequences => numberOfSequences
> Source/WebCore/platform/text/DecodeEscapeSequences.h:84 > + }
I'd add an ASSERT about buffer.size.
> Source/WebCore/platform/text/DecodeEscapeSequences.h:92 > + Vector<UChar> result;
StringBuilder is probably better here.
> Source/WebCore/platform/text/DecodeEscapeSequences.h:93 > + unsigned length = string.length();
size_t <-- amounts of memory are measured in size_t so they work properly on 64-bit machines.
> Source/WebCore/platform/text/DecodeEscapeSequences.h:95 > + unsigned decodedPosition = 0; > + unsigned searchPosition = 0;
Same here.
Daniel Bates
Comment 14
2011-09-08 16:17:35 PDT
(In reply to
comment #13
)
> [...] > > Source/WebCore/html/parser/XSSAuditor.cpp:126 > > +static inline String commonDecodeURLEscapeSequences(const String& string, const TextEncoding& encoding) > > decodeCommonURLEscapeSequences ? As written, the name doesn't seem to read quite right. Maybe decodeStandardURLEscapSequences ? >
I'll use decodeStandardURLEscapeSequences(). Both decodeCommonURLEscapeSequences() and decodeStandardURLEscapeSequences() seem to imply that there exists some uncommon/non-standard URL escape sequences that this function doesn't decode. I wanted to convey that this function doesn't use the platform-specific KURL URL decoding machinery. So, I chose the prefix "common" as a synonym for cross-platform. Maybe prefixing with "crossPlatform" would be more descriptive? Another idea is to rename decodeURLEscapeSequences() to platformDecodeURLEscapeSequences() in KURL.{cpp, h} and KURLGoogle.{cpp, h}. Then we can reuse the name decodeURLEscapeSequences for a function that calls decodeEscapeSequences<URLEscapeSequence>().
> > Source/WebCore/html/parser/XSSAuditor.cpp:142 > > + workingString = decode16BitUnicodeEscapeSequences(workingString); > > Don't we need to do this inside the loop too?
Yes, will fix. Then we'll catch the following attacks that intermix the %u- and URL- encodings, such as:
http://localhost:8000/security/xssAuditor/resources/echo-intertag-decode-16bit-unicode.pl?q
=<script>alert(/XS%u002525u0053/)</script>.
> > > Source/WebCore/platform/text/DecodeEscapeSequences.h:46 > > + && WTF::isASCIIHexDigit(string[start + 2]) && WTF::isASCIIHexDigit(string[start + 3]) > > + && WTF::isASCIIHexDigit(string[start + 4]) && WTF::isASCIIHexDigit(string[start + 5]); > > You probably don't need the WTF:: parts.
Will remove.
> > > Source/WebCore/platform/text/DecodeEscapeSequences.h:52 > > + size_t numSequences = runLength / size; > > numSequences => numberOfSequences
Will rename.
> > > Source/WebCore/platform/text/DecodeEscapeSequences.h:53 > > + Vector<UChar, 512> buffer; > > Given that we need to return a String anyway, we should just use StringBuilder and reserveCapacity. We can't avoid the malloc in any case.
Will use StringBuilder.
> > > Source/WebCore/platform/text/DecodeEscapeSequences.h:57 > > + *p++ = (toASCIIHexValue(run[2]) << 12) | (toASCIIHexValue(run[3]) << 8) | (toASCIIHexValue(run[4]) << 4) | toASCIIHexValue(run[5]); > > Is this correct even with surrogates? I would have expected you to do something like what we do here: > >
http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLEntityParser.cpp#L73
Yes, this works with surrogates. We discussed this over IRC on 09/06/2011. Will follow this comment with a transcript from that discussion.
> > > Source/WebCore/platform/text/DecodeEscapeSequences.h:76 > > + DEFINE_STATIC_LOCAL(CharBuffer, buffer, ()); > > Why are we declaring it static? That can cause problems if this code is run on multiple threads or re-entered. Just allocating it on the stack should be find (and essentially free).
Will remove. Instead, will stack allocate a buffer.
> > > Source/WebCore/platform/text/DecodeEscapeSequences.h:78 > > + size_t numSequences = runLength / size; > > numSequences => numberOfSequences
Will rename.
> > > Source/WebCore/platform/text/DecodeEscapeSequences.h:84 > > + } > > I'd add an ASSERT about buffer.size.
Will add ASSERT(buffer.size() == static_cast<size_t>(p - buffer.data())).
> > > Source/WebCore/platform/text/DecodeEscapeSequences.h:92 > > + Vector<UChar> result; > > StringBuilder is probably better here.
Will use StringBuilder.
> > > Source/WebCore/platform/text/DecodeEscapeSequences.h:93 > > + unsigned length = string.length(); > > size_t <-- amounts of memory are measured in size_t so they work properly on 64-bit machines. > > > Source/WebCore/platform/text/DecodeEscapeSequences.h:95 > > + unsigned decodedPosition = 0; > > + unsigned searchPosition = 0; > > Same here.
Will use size_t.
Daniel Bates
Comment 15
2011-09-08 16:24:49 PDT
For completeness, I discussed with Adam over IRC on 09/06/2011 about the correctness of the decoding routine for %u-escape sequences that he asked about in
comment 13
: "> Source/WebCore/platform/text/DecodeEscapeSequences.h:57
> + *p++ = (toASCIIHexValue(run[2]) << 12) | (toASCIIHexValue(run[3]) << 8) | (toASCIIHexValue(run[4]) << 4) | toASCIIHexValue(run[5]);
Is this correct even with surrogates? I would have expected you to do something like what we do here:
http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLEntityParser.cpp#L73
" The paraphrased IRC transcript follows: [7:56pm] abarth: dydz: were you looking for me earlier? [7:57pm] dydz: abarth: Hi abarth. I was. I wanted to talk about <
https://bugs.webkit.org/show_bug.cgi?id=66588#c13
> [7:57pm] dydz: abarth: In particular, about the surrogates. [7:57pm] abarth: yessir [7:58pm] dydz: abarth: From my understanding a String obejct represents UTF-16 string [7:58pm] dydz: And %u-escape sequences represent UTF-16 code points. [7:58pm] dydz: err [7:58pm] dydz: UTF-16 code unit [7:58pm] dydz: That is, the escape sequence doesn't represent a Unicode code point. [7:59pm] abarth: dydz: why do you think that? [7:59pm] abarth: maybe ECMA-262 says? [8:00pm] abarth: "%uxxxx, where xxxx is a Unicode value represented as four hexadecimal digits" [8:00pm] abarth: is what wikipedia says [8:00pm] abarth: which isn't all that precise [8:00pm] abarth: but you're right that 262 usually deals in code units [8:01pm] abarth:
http://www.w3.org/International/iri-edit/draft-duerst-iri.html
[8:01pm] abarth: might be a good source [8:02pm] abarth: "Instead of using the existing percent-encoding convention of URIs, which is based on octets, the idea was to create a new encoding convention, for example to use '%u' to introduce UCS code points." [8:02pm] dydz: abarth: I haven't read ECMA-262 yet, but for your reference the definition of Unicode code units seems consistent with the implementation of unescape <
http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp#L568
> [8:03pm] pizlo joined the chat room. [8:03pm] abarth: Lexer::convertUnicode [8:03pm] abarth: ok [8:03pm] abarth: you're probably right [8:03pm] abarth: maybe add a comment explaining? [8:04pm] dydz: abarth: sure [8:04pm] dydz: abarth: Can you save me some time, is a UCS-2 code point the same as a UTF-16 code unit? [8:05pm] abarth:
http://en.wikipedia.org/wiki/UTF-16/UCS-2
[8:05pm] abarth: "The older UCS-2 (2-byte Universal Character Set) is a similar character encoding that was superseded by UTF-16 in version 2.0 of the Unicode standard in July 1996.[2] It produces a fixed-length format by simply using the code point as the 16-bit code unit and produces exactly the same result as UTF-16 for 96.9% of all the code points in the range 0-0xFFFF, including all characters that had been assigned a value at that time." [8:05pm] abarth: sounds like UTF-16 without the surroages [8:06pm] abarth: and unable to represent characters that UTF-16 uses surrogates for [8:10pm] dydz: abarth: ok. So a UCS-2 code point is analogous to a UTF-16 code unit.
Daniel Bates
Comment 16
2011-09-08 17:09:33 PDT
Created
attachment 106817
[details]
Patch and layout tests Updated patch based on Adam Barth's comments. Added additional tests, including script-tag-with-16bit-unicode-surrogate-pair.html.
Adam Barth
Comment 17
2011-09-08 17:13:28 PDT
Comment on
attachment 106817
[details]
Patch and layout tests Thanks Dan.
Daniel Bates
Comment 18
2011-09-08 19:39:16 PDT
Committed
r94828
: <
http://trac.webkit.org/changeset/94828
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug