Bug 107603 - Call XSSAuditor.filterToken() from threaded HTML parser
Summary: Call XSSAuditor.filterToken() from threaded HTML parser
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Tony Gentilcore
URL:
Keywords:
Depends on: 108394 108531 108557 108655 108666 108698 108726 108880
Blocks: 106127
  Show dependency treegraph
 
Reported: 2013-01-22 17:29 PST by Tony Gentilcore
Modified: 2013-02-07 05:02 PST (History)
6 users (show)

See Also:


Attachments
Patch (16.86 KB, patch)
2013-02-06 15:02 PST, Tony Gentilcore
no flags Details | Formatted Diff | Diff
Patch (21.56 KB, patch)
2013-02-06 16:48 PST, Tony Gentilcore
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Tony Gentilcore 2013-01-22 17:29:57 PST
Call XSSAuditor.filterToken() from threaded HTML parser
Comment 1 Eric Seidel (no email) 2013-01-22 18:14:51 PST
omg.
Comment 2 Adam Barth 2013-01-22 18:27:14 PST
Using the atomic token probably isn't going to work.  The XSS filter needs to be able to call sourceForToken, which isn't going to work for atomic tokens.

I think that means we're going to need to run the XSS filter on the background thread.
Comment 3 Adam Barth 2013-01-22 18:29:11 PST
We should be able to deal with the HTMLNames dependency.
Comment 4 Tony Gentilcore 2013-01-23 11:12:26 PST
This is going to be pretty tricky. Adam and I discussed it offline and came up with the following sketch:
- HTMLDocumentParser creates the XSSAuditor on the main thread
- HTMLDocumentParser calls XSSAuditor::init() on the main thread
- Transfer ownership of the XSSAuditor to the parser thread when the parser thread is created
- Call filterToken() on the parser thread
- XSSAuditor's actions in filterToken()'s didBlockScript block are queued up in the token steam and if the HTMLDocumentParser uses the token, it executes the action on the main thread.

The tricky part is sharing string data, particularly for POST data which may be large. One option is to parse on the main thread for POST data or just for large POST data.

The other question is whether we can break this into multiple patches so it isn't one monolithic switch.
Comment 5 Eric Seidel (no email) 2013-01-23 11:22:02 PST
I think we should consider splitting the filterToken bits out into a separate object which does not depend on document, etc.  Or maybe the document, etc. parts are in an XSSFilterDelegate object.

In either case, I think we should make it easy to tell what parts are OK on the background thread vs. not. Obviously all of XSSFilter isn't, and I don't think per-function separation is going to work as well as whole-object separation would.
Comment 6 Tony Gentilcore 2013-01-31 17:57:33 PST
After the blocking patches, there are 3 non-thread safe dependencies left:
1. sourceForToken()/Document's TextResourceDecoder
2. Tokenizer's shouldAllowCDATA()
3. Documents's url()

#3 is easy, but I'm trying to figure out the cleanest way to handle #1 and #2. Any thoughts?
Comment 7 Adam Barth 2013-02-01 00:28:48 PST
(In reply to comment #6)
> After the blocking patches, there are 3 non-thread safe dependencies left:
> 1. sourceForToken()/Document's TextResourceDecoder

^^^ We should be able to get sourceForToken from the BackgroundHTMLParser.  As for TextResourceDecoder, we can get the name of the decoder and create a new one on the background thread.

> 2. Tokenizer's shouldAllowCDATA()

^^^ This will also be available from the BackgroundHTMLParser's tokenizer.

> 3. Documents's url()

^^^ As you say, this one is easy.  :)
Comment 8 Tony Gentilcore 2013-02-06 15:02:04 PST
Created attachment 186929 [details]
Patch
Comment 9 Adam Barth 2013-02-06 15:56:13 PST
Comment on attachment 186929 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=186929&action=review

> Source/WebCore/html/parser/BackgroundHTMLParser.cpp:165
> +        {

Why do we need these { } brackets?

> Source/WebCore/html/parser/HTMLDocumentParser.cpp:531
> +    ASSERT(decoder->hasOneRef());

Is the decoder itself safe to send to another thread?  Perhaps we should add a isSafeToSendToAnotherThread function to it.
Comment 10 Adam Barth 2013-02-06 15:57:38 PST
Comment on attachment 186929 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=186929&action=review

> Source/WebCore/ChangeLog:8
> +        With this patch we now pass 178 of 182 tests in http/tests/security/xssAuditor.

That's great!

> Source/WebCore/html/parser/HTMLDocumentParser.cpp:528
> +    Settings* settings = document()->settings();
> +    bool usesEncodingDetector = settings && settings->usesEncodingDetector();
> +    String mimeType;
> +    if (DocumentLoader* loader = document()->loader())
> +        mimeType = loader->responseMIMEType();
> +    String defaultTextEncodingName;
> +    if (settings)
> +        defaultTextEncodingName = settings->defaultTextEncodingName();
> +    RefPtr<TextResourceDecoder> decoder = TextResourceDecoder::create(mimeType, defaultTextEncodingName, usesEncodingDetector);

Can we break this logic into a separate function that returns a PassRefPtr<TextResourceDecoder> ?
Comment 11 Tony Gentilcore 2013-02-06 16:10:56 PST
(In reply to comment #9)
> (From update of attachment 186929 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=186929&action=review
> 
> > Source/WebCore/html/parser/BackgroundHTMLParser.cpp:165
> > +        {
> 
> Why do we need these { } brackets?

To delete the CompactHTMLToken off the stack so that the token passes the isSafeToSendToAnotherThread() test.

> 
> > Source/WebCore/html/parser/HTMLDocumentParser.cpp:531
> > +    ASSERT(decoder->hasOneRef());
> 
> Is the decoder itself safe to send to another thread?  Perhaps we should add a isSafeToSendToAnotherThread function to it.

It looks safe to me. I don't see anything that we can validate (like Strings, RefPtrs, KURLs, etc). Would you mind double checking in case I missed something?
Comment 12 Adam Barth 2013-02-06 16:15:44 PST
Comment on attachment 186929 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=186929&action=review

>>> Source/WebCore/html/parser/HTMLDocumentParser.cpp:531
>>> +    ASSERT(decoder->hasOneRef());
>> 
>> Is the decoder itself safe to send to another thread?  Perhaps we should add a isSafeToSendToAnotherThread function to it.
> 
> It looks safe to me. I don't see anything that we can validate (like Strings, RefPtrs, KURLs, etc). Would you mind double checking in case I missed something?

TextResourceDecoder has a HTMLMetaCharsetParser, which uses HTMLNames.  TextCodec has a bunch of subclasses, which I haven't studied in detail.  We might need to look at this more carefully.
Comment 13 Adam Barth 2013-02-06 16:17:58 PST
Looks like we need to make sure we're not going to use the HTMLMetaCharsetParser.
Comment 14 Adam Barth 2013-02-06 16:18:51 PST
Comment on attachment 186929 [details]
Patch

The bulk of this change looks good, but we need to be more careful about moving the TextResourceDecoder between threads.
Comment 15 Eric Seidel (no email) 2013-02-06 16:19:13 PST
(In reply to comment #13)
> Looks like we need to make sure we're not going to use the HTMLMetaCharsetParser.

I think we're just gonna ned up taking the eventNames() per-thread AtomicString approach and not caring in the short-term.  We need to see how much of a perf-hit that would be, but it would dramatically simplify moving code between threads like this.
Comment 16 Adam Barth 2013-02-06 16:20:21 PST
Maybe, but HTMLMetaCharsetParser calls

    , m_assumedCodec(newTextCodec(Latin1Encoding()))

which might not be threadsafe either.
Comment 17 Adam Barth 2013-02-06 16:22:36 PST
static String fullyDecodeString(const String& string, const TextResourceDecoder* decoder)

in XSSAuditor.cpp calls:

    const TextEncoding& encoding = decoder ? decoder->encoding() : UTF8Encoding();

It looks like we need only the TextEncoding, not the TextResourceDecoder.
Comment 18 Adam Barth 2013-02-06 16:26:20 PST
newTextCodec itself looks ok:

PassOwnPtr<TextCodec> newTextCodec(const TextEncoding& encoding)
{
    MutexLocker lock(encodingRegistryMutex());

    ASSERT(textCodecMap);
    TextCodecFactory factory = textCodecMap->get(encoding.name());
    ASSERT(factory.function);
    return factory.function(encoding, factory.additionalData);
}
Comment 19 Tony Gentilcore 2013-02-06 16:48:57 PST
Created attachment 186948 [details]
Patch
Comment 20 Tony Gentilcore 2013-02-06 16:49:35 PST
(In reply to comment #17)
> static String fullyDecodeString(const String& string, const TextResourceDecoder* decoder)
> 
> in XSSAuditor.cpp calls:
> 
>     const TextEncoding& encoding = decoder ? decoder->encoding() : UTF8Encoding();
> 
> It looks like we need only the TextEncoding, not the TextResourceDecoder.

I'm really glad you noticed that! The patch is so much cleaner and safer now.
Comment 21 Tony Gentilcore 2013-02-06 16:50:29 PST
Now we pass 180 of 182 tests. Two were failing in the previous patch due to creating the wrong decoder type. Now we use the proper one from the Document.
Comment 22 Adam Barth 2013-02-06 18:04:01 PST
Comment on attachment 186948 [details]
Patch

Ah, much nicer.
Comment 23 WebKit Review Bot 2013-02-07 05:02:39 PST
Comment on attachment 186948 [details]
Patch

Clearing flags on attachment: 186948

Committed r142099: <http://trac.webkit.org/changeset/142099>
Comment 24 WebKit Review Bot 2013-02-07 05:02:46 PST
All reviewed patches have been landed.  Closing bug.