Call XSSAuditor.filterToken() from threaded HTML parser
omg.
Using the atomic token probably isn't going to work. The XSS filter needs to be able to call sourceForToken, which isn't going to work for atomic tokens. I think that means we're going to need to run the XSS filter on the background thread.
We should be able to deal with the HTMLNames dependency.
This is going to be pretty tricky. Adam and I discussed it offline and came up with the following sketch: - HTMLDocumentParser creates the XSSAuditor on the main thread - HTMLDocumentParser calls XSSAuditor::init() on the main thread - Transfer ownership of the XSSAuditor to the parser thread when the parser thread is created - Call filterToken() on the parser thread - XSSAuditor's actions in filterToken()'s didBlockScript block are queued up in the token steam and if the HTMLDocumentParser uses the token, it executes the action on the main thread. The tricky part is sharing string data, particularly for POST data which may be large. One option is to parse on the main thread for POST data or just for large POST data. The other question is whether we can break this into multiple patches so it isn't one monolithic switch.
I think we should consider splitting the filterToken bits out into a separate object which does not depend on document, etc. Or maybe the document, etc. parts are in an XSSFilterDelegate object. In either case, I think we should make it easy to tell what parts are OK on the background thread vs. not. Obviously all of XSSFilter isn't, and I don't think per-function separation is going to work as well as whole-object separation would.
After the blocking patches, there are 3 non-thread safe dependencies left: 1. sourceForToken()/Document's TextResourceDecoder 2. Tokenizer's shouldAllowCDATA() 3. Documents's url() #3 is easy, but I'm trying to figure out the cleanest way to handle #1 and #2. Any thoughts?
(In reply to comment #6) > After the blocking patches, there are 3 non-thread safe dependencies left: > 1. sourceForToken()/Document's TextResourceDecoder ^^^ We should be able to get sourceForToken from the BackgroundHTMLParser. As for TextResourceDecoder, we can get the name of the decoder and create a new one on the background thread. > 2. Tokenizer's shouldAllowCDATA() ^^^ This will also be available from the BackgroundHTMLParser's tokenizer. > 3. Documents's url() ^^^ As you say, this one is easy. :)
Created attachment 186929 [details] Patch
Comment on attachment 186929 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=186929&action=review > Source/WebCore/html/parser/BackgroundHTMLParser.cpp:165 > + { Why do we need these { } brackets? > Source/WebCore/html/parser/HTMLDocumentParser.cpp:531 > + ASSERT(decoder->hasOneRef()); Is the decoder itself safe to send to another thread? Perhaps we should add a isSafeToSendToAnotherThread function to it.
Comment on attachment 186929 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=186929&action=review > Source/WebCore/ChangeLog:8 > + With this patch we now pass 178 of 182 tests in http/tests/security/xssAuditor. That's great! > Source/WebCore/html/parser/HTMLDocumentParser.cpp:528 > + Settings* settings = document()->settings(); > + bool usesEncodingDetector = settings && settings->usesEncodingDetector(); > + String mimeType; > + if (DocumentLoader* loader = document()->loader()) > + mimeType = loader->responseMIMEType(); > + String defaultTextEncodingName; > + if (settings) > + defaultTextEncodingName = settings->defaultTextEncodingName(); > + RefPtr<TextResourceDecoder> decoder = TextResourceDecoder::create(mimeType, defaultTextEncodingName, usesEncodingDetector); Can we break this logic into a separate function that returns a PassRefPtr<TextResourceDecoder> ?
(In reply to comment #9) > (From update of attachment 186929 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=186929&action=review > > > Source/WebCore/html/parser/BackgroundHTMLParser.cpp:165 > > + { > > Why do we need these { } brackets? To delete the CompactHTMLToken off the stack so that the token passes the isSafeToSendToAnotherThread() test. > > > Source/WebCore/html/parser/HTMLDocumentParser.cpp:531 > > + ASSERT(decoder->hasOneRef()); > > Is the decoder itself safe to send to another thread? Perhaps we should add a isSafeToSendToAnotherThread function to it. It looks safe to me. I don't see anything that we can validate (like Strings, RefPtrs, KURLs, etc). Would you mind double checking in case I missed something?
Comment on attachment 186929 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=186929&action=review >>> Source/WebCore/html/parser/HTMLDocumentParser.cpp:531 >>> + ASSERT(decoder->hasOneRef()); >> >> Is the decoder itself safe to send to another thread? Perhaps we should add a isSafeToSendToAnotherThread function to it. > > It looks safe to me. I don't see anything that we can validate (like Strings, RefPtrs, KURLs, etc). Would you mind double checking in case I missed something? TextResourceDecoder has a HTMLMetaCharsetParser, which uses HTMLNames. TextCodec has a bunch of subclasses, which I haven't studied in detail. We might need to look at this more carefully.
Looks like we need to make sure we're not going to use the HTMLMetaCharsetParser.
Comment on attachment 186929 [details] Patch The bulk of this change looks good, but we need to be more careful about moving the TextResourceDecoder between threads.
(In reply to comment #13) > Looks like we need to make sure we're not going to use the HTMLMetaCharsetParser. I think we're just gonna ned up taking the eventNames() per-thread AtomicString approach and not caring in the short-term. We need to see how much of a perf-hit that would be, but it would dramatically simplify moving code between threads like this.
Maybe, but HTMLMetaCharsetParser calls , m_assumedCodec(newTextCodec(Latin1Encoding())) which might not be threadsafe either.
static String fullyDecodeString(const String& string, const TextResourceDecoder* decoder) in XSSAuditor.cpp calls: const TextEncoding& encoding = decoder ? decoder->encoding() : UTF8Encoding(); It looks like we need only the TextEncoding, not the TextResourceDecoder.
newTextCodec itself looks ok: PassOwnPtr<TextCodec> newTextCodec(const TextEncoding& encoding) { MutexLocker lock(encodingRegistryMutex()); ASSERT(textCodecMap); TextCodecFactory factory = textCodecMap->get(encoding.name()); ASSERT(factory.function); return factory.function(encoding, factory.additionalData); }
Created attachment 186948 [details] Patch
(In reply to comment #17) > static String fullyDecodeString(const String& string, const TextResourceDecoder* decoder) > > in XSSAuditor.cpp calls: > > const TextEncoding& encoding = decoder ? decoder->encoding() : UTF8Encoding(); > > It looks like we need only the TextEncoding, not the TextResourceDecoder. I'm really glad you noticed that! The patch is so much cleaner and safer now.
Now we pass 180 of 182 tests. Two were failing in the previous patch due to creating the wrong decoder type. Now we use the proper one from the Document.
Comment on attachment 186948 [details] Patch Ah, much nicer.
Comment on attachment 186948 [details] Patch Clearing flags on attachment: 186948 Committed r142099: <http://trac.webkit.org/changeset/142099>
All reviewed patches have been landed. Closing bug.