107603 – Call XSSAuditor.filterToken() from threaded HTML parser

RESOLVED FIXED 107603

Call XSSAuditor.filterToken() from threaded HTML parser

https://bugs.webkit.org/show_bug.cgi?id=107603

Summary Call XSSAuditor.filterToken() from threaded HTML parser

Tony Gentilcore

Reported 2013-01-22 17:29:57 PST

Call XSSAuditor.filterToken() from threaded HTML parser

Attachments
Patch (16.86 KB, patch) 2013-02-06 15:02 PST, Tony Gentilcore	no flags	Details Formatted Diff Diff
Patch (21.56 KB, patch) 2013-02-06 16:48 PST, Tony Gentilcore	no flags	Details Formatted Diff Diff
Show Obsolete (1) View All Add attachment proposed patch, testcase, etc.

Eric Seidel (no email)

Comment 1 2013-01-22 18:14:51 PST

omg.

Adam Barth

Comment 2 2013-01-22 18:27:14 PST

Using the atomic token probably isn't going to work. The XSS filter needs to be able to call sourceForToken, which isn't going to work for atomic tokens. I think that means we're going to need to run the XSS filter on the background thread.

Adam Barth

Comment 3 2013-01-22 18:29:11 PST

We should be able to deal with the HTMLNames dependency.

Tony Gentilcore

Comment 4 2013-01-23 11:12:26 PST

This is going to be pretty tricky. Adam and I discussed it offline and came up with the following sketch: - HTMLDocumentParser creates the XSSAuditor on the main thread - HTMLDocumentParser calls XSSAuditor::init() on the main thread - Transfer ownership of the XSSAuditor to the parser thread when the parser thread is created - Call filterToken() on the parser thread - XSSAuditor's actions in filterToken()'s didBlockScript block are queued up in the token steam and if the HTMLDocumentParser uses the token, it executes the action on the main thread. The tricky part is sharing string data, particularly for POST data which may be large. One option is to parse on the main thread for POST data or just for large POST data. The other question is whether we can break this into multiple patches so it isn't one monolithic switch.

Eric Seidel (no email)

Comment 5 2013-01-23 11:22:02 PST

I think we should consider splitting the filterToken bits out into a separate object which does not depend on document, etc. Or maybe the document, etc. parts are in an XSSFilterDelegate object. In either case, I think we should make it easy to tell what parts are OK on the background thread vs. not. Obviously all of XSSFilter isn't, and I don't think per-function separation is going to work as well as whole-object separation would.

Tony Gentilcore

Comment 6 2013-01-31 17:57:33 PST

After the blocking patches, there are 3 non-thread safe dependencies left: 1. sourceForToken()/Document's TextResourceDecoder 2. Tokenizer's shouldAllowCDATA() 3. Documents's url() #3 is easy, but I'm trying to figure out the cleanest way to handle #1 and #2. Any thoughts?

Adam Barth

Comment 7 2013-02-01 00:28:48 PST

(In reply to comment #6) > After the blocking patches, there are 3 non-thread safe dependencies left: > 1. sourceForToken()/Document's TextResourceDecoder ^^^ We should be able to get sourceForToken from the BackgroundHTMLParser. As for TextResourceDecoder, we can get the name of the decoder and create a new one on the background thread. > 2. Tokenizer's shouldAllowCDATA() ^^^ This will also be available from the BackgroundHTMLParser's tokenizer. > 3. Documents's url() ^^^ As you say, this one is easy. :)

Tony Gentilcore

Comment 8 2013-02-06 15:02:04 PST

Created attachment 186929 [details] Patch

Adam Barth

Comment 9 2013-02-06 15:56:13 PST

Comment on attachment 186929 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=186929&action=review > Source/WebCore/html/parser/BackgroundHTMLParser.cpp:165 > + { Why do we need these { } brackets? > Source/WebCore/html/parser/HTMLDocumentParser.cpp:531 > + ASSERT(decoder->hasOneRef()); Is the decoder itself safe to send to another thread? Perhaps we should add a isSafeToSendToAnotherThread function to it.

Adam Barth

Comment 10 2013-02-06 15:57:38 PST

Comment on attachment 186929 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=186929&action=review > Source/WebCore/ChangeLog:8 > + With this patch we now pass 178 of 182 tests in http/tests/security/xssAuditor. That's great! > Source/WebCore/html/parser/HTMLDocumentParser.cpp:528 > + Settings* settings = document()->settings(); > + bool usesEncodingDetector = settings && settings->usesEncodingDetector(); > + String mimeType; > + if (DocumentLoader* loader = document()->loader()) > + mimeType = loader->responseMIMEType(); > + String defaultTextEncodingName; > + if (settings) > + defaultTextEncodingName = settings->defaultTextEncodingName(); > + RefPtr<TextResourceDecoder> decoder = TextResourceDecoder::create(mimeType, defaultTextEncodingName, usesEncodingDetector); Can we break this logic into a separate function that returns a PassRefPtr<TextResourceDecoder> ?

Tony Gentilcore

Comment 11 2013-02-06 16:10:56 PST

(In reply to comment #9) > (From update of attachment 186929 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=186929&action=review > > > Source/WebCore/html/parser/BackgroundHTMLParser.cpp:165 > > + { > > Why do we need these { } brackets? To delete the CompactHTMLToken off the stack so that the token passes the isSafeToSendToAnotherThread() test. > > > Source/WebCore/html/parser/HTMLDocumentParser.cpp:531 > > + ASSERT(decoder->hasOneRef()); > > Is the decoder itself safe to send to another thread? Perhaps we should add a isSafeToSendToAnotherThread function to it. It looks safe to me. I don't see anything that we can validate (like Strings, RefPtrs, KURLs, etc). Would you mind double checking in case I missed something?

Adam Barth

Comment 12 2013-02-06 16:15:44 PST

Comment on attachment 186929 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=186929&action=review >>> Source/WebCore/html/parser/HTMLDocumentParser.cpp:531 >>> + ASSERT(decoder->hasOneRef()); >> >> Is the decoder itself safe to send to another thread? Perhaps we should add a isSafeToSendToAnotherThread function to it. > > It looks safe to me. I don't see anything that we can validate (like Strings, RefPtrs, KURLs, etc). Would you mind double checking in case I missed something? TextResourceDecoder has a HTMLMetaCharsetParser, which uses HTMLNames. TextCodec has a bunch of subclasses, which I haven't studied in detail. We might need to look at this more carefully.

Adam Barth

Comment 13 2013-02-06 16:17:58 PST

Looks like we need to make sure we're not going to use the HTMLMetaCharsetParser.

Adam Barth

Comment 14 2013-02-06 16:18:51 PST

Comment on attachment 186929 [details] Patch The bulk of this change looks good, but we need to be more careful about moving the TextResourceDecoder between threads.

Eric Seidel (no email)

Comment 15 2013-02-06 16:19:13 PST

(In reply to comment #13) > Looks like we need to make sure we're not going to use the HTMLMetaCharsetParser. I think we're just gonna ned up taking the eventNames() per-thread AtomicString approach and not caring in the short-term. We need to see how much of a perf-hit that would be, but it would dramatically simplify moving code between threads like this.

Adam Barth

Comment 16 2013-02-06 16:20:21 PST

Maybe, but HTMLMetaCharsetParser calls , m_assumedCodec(newTextCodec(Latin1Encoding())) which might not be threadsafe either.

Adam Barth

Comment 17 2013-02-06 16:22:36 PST

static String fullyDecodeString(const String& string, const TextResourceDecoder* decoder) in XSSAuditor.cpp calls: const TextEncoding& encoding = decoder ? decoder->encoding() : UTF8Encoding(); It looks like we need only the TextEncoding, not the TextResourceDecoder.

Adam Barth

Comment 18 2013-02-06 16:26:20 PST

newTextCodec itself looks ok: PassOwnPtr<TextCodec> newTextCodec(const TextEncoding& encoding) { MutexLocker lock(encodingRegistryMutex()); ASSERT(textCodecMap); TextCodecFactory factory = textCodecMap->get(encoding.name()); ASSERT(factory.function); return factory.function(encoding, factory.additionalData); }

Tony Gentilcore

Comment 19 2013-02-06 16:48:57 PST

Created attachment 186948 [details] Patch

Tony Gentilcore

Comment 20 2013-02-06 16:49:35 PST

(In reply to comment #17) > static String fullyDecodeString(const String& string, const TextResourceDecoder* decoder) > > in XSSAuditor.cpp calls: > > const TextEncoding& encoding = decoder ? decoder->encoding() : UTF8Encoding(); > > It looks like we need only the TextEncoding, not the TextResourceDecoder. I'm really glad you noticed that! The patch is so much cleaner and safer now.

Tony Gentilcore

Comment 21 2013-02-06 16:50:29 PST

Now we pass 180 of 182 tests. Two were failing in the previous patch due to creating the wrong decoder type. Now we use the proper one from the Document.

Adam Barth

Comment 22 2013-02-06 18:04:01 PST

Comment on attachment 186948 [details] Patch Ah, much nicer.

WebKit Review Bot

Comment 23 2013-02-07 05:02:39 PST

Comment on attachment 186948 [details] Patch Clearing flags on attachment: 186948 Committed r142099: <http://trac.webkit.org/changeset/142099>

WebKit Review Bot

Comment 24 2013-02-07 05:02:46 PST

All reviewed patches have been landed. Closing bug.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Unspecified

OS Unspecified

Product WebKit

Component WebCore Misc.

Assignee

Tony Gentilcore

Reported

2013-01-22 17:29 PST

Modified

2013-02-07 05:02 PST History

CC List

6 users Show

URL

Keywords

Depends on

108394 108531 108557 108655 108666 108698 108726 108880

Blocks

106127

Dependencies

tree graph