Bug 107603

Summary: Call XSSAuditor.filterToken() from threaded HTML parser
Product: WebKit Reporter: Tony Gentilcore <tonyg>
Component: WebCore Misc.Assignee: Tony Gentilcore <tonyg>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, dbates, eric, ojan.autocc, tsepez, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 108394, 108531, 108557, 108655, 108666, 108698, 108726, 108880    
Bug Blocks: 106127    
Attachments:
Description Flags
Patch
none
Patch none

Tony Gentilcore
Reported 2013-01-22 17:29:57 PST
Call XSSAuditor.filterToken() from threaded HTML parser
Attachments
Patch (16.86 KB, patch)
2013-02-06 15:02 PST, Tony Gentilcore
no flags
Patch (21.56 KB, patch)
2013-02-06 16:48 PST, Tony Gentilcore
no flags
Eric Seidel (no email)
Comment 1 2013-01-22 18:14:51 PST
omg.
Adam Barth
Comment 2 2013-01-22 18:27:14 PST
Using the atomic token probably isn't going to work. The XSS filter needs to be able to call sourceForToken, which isn't going to work for atomic tokens. I think that means we're going to need to run the XSS filter on the background thread.
Adam Barth
Comment 3 2013-01-22 18:29:11 PST
We should be able to deal with the HTMLNames dependency.
Tony Gentilcore
Comment 4 2013-01-23 11:12:26 PST
This is going to be pretty tricky. Adam and I discussed it offline and came up with the following sketch: - HTMLDocumentParser creates the XSSAuditor on the main thread - HTMLDocumentParser calls XSSAuditor::init() on the main thread - Transfer ownership of the XSSAuditor to the parser thread when the parser thread is created - Call filterToken() on the parser thread - XSSAuditor's actions in filterToken()'s didBlockScript block are queued up in the token steam and if the HTMLDocumentParser uses the token, it executes the action on the main thread. The tricky part is sharing string data, particularly for POST data which may be large. One option is to parse on the main thread for POST data or just for large POST data. The other question is whether we can break this into multiple patches so it isn't one monolithic switch.
Eric Seidel (no email)
Comment 5 2013-01-23 11:22:02 PST
I think we should consider splitting the filterToken bits out into a separate object which does not depend on document, etc. Or maybe the document, etc. parts are in an XSSFilterDelegate object. In either case, I think we should make it easy to tell what parts are OK on the background thread vs. not. Obviously all of XSSFilter isn't, and I don't think per-function separation is going to work as well as whole-object separation would.
Tony Gentilcore
Comment 6 2013-01-31 17:57:33 PST
After the blocking patches, there are 3 non-thread safe dependencies left: 1. sourceForToken()/Document's TextResourceDecoder 2. Tokenizer's shouldAllowCDATA() 3. Documents's url() #3 is easy, but I'm trying to figure out the cleanest way to handle #1 and #2. Any thoughts?
Adam Barth
Comment 7 2013-02-01 00:28:48 PST
(In reply to comment #6) > After the blocking patches, there are 3 non-thread safe dependencies left: > 1. sourceForToken()/Document's TextResourceDecoder ^^^ We should be able to get sourceForToken from the BackgroundHTMLParser. As for TextResourceDecoder, we can get the name of the decoder and create a new one on the background thread. > 2. Tokenizer's shouldAllowCDATA() ^^^ This will also be available from the BackgroundHTMLParser's tokenizer. > 3. Documents's url() ^^^ As you say, this one is easy. :)
Tony Gentilcore
Comment 8 2013-02-06 15:02:04 PST
Adam Barth
Comment 9 2013-02-06 15:56:13 PST
Comment on attachment 186929 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=186929&action=review > Source/WebCore/html/parser/BackgroundHTMLParser.cpp:165 > + { Why do we need these { } brackets? > Source/WebCore/html/parser/HTMLDocumentParser.cpp:531 > + ASSERT(decoder->hasOneRef()); Is the decoder itself safe to send to another thread? Perhaps we should add a isSafeToSendToAnotherThread function to it.
Adam Barth
Comment 10 2013-02-06 15:57:38 PST
Comment on attachment 186929 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=186929&action=review > Source/WebCore/ChangeLog:8 > + With this patch we now pass 178 of 182 tests in http/tests/security/xssAuditor. That's great! > Source/WebCore/html/parser/HTMLDocumentParser.cpp:528 > + Settings* settings = document()->settings(); > + bool usesEncodingDetector = settings && settings->usesEncodingDetector(); > + String mimeType; > + if (DocumentLoader* loader = document()->loader()) > + mimeType = loader->responseMIMEType(); > + String defaultTextEncodingName; > + if (settings) > + defaultTextEncodingName = settings->defaultTextEncodingName(); > + RefPtr<TextResourceDecoder> decoder = TextResourceDecoder::create(mimeType, defaultTextEncodingName, usesEncodingDetector); Can we break this logic into a separate function that returns a PassRefPtr<TextResourceDecoder> ?
Tony Gentilcore
Comment 11 2013-02-06 16:10:56 PST
(In reply to comment #9) > (From update of attachment 186929 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=186929&action=review > > > Source/WebCore/html/parser/BackgroundHTMLParser.cpp:165 > > + { > > Why do we need these { } brackets? To delete the CompactHTMLToken off the stack so that the token passes the isSafeToSendToAnotherThread() test. > > > Source/WebCore/html/parser/HTMLDocumentParser.cpp:531 > > + ASSERT(decoder->hasOneRef()); > > Is the decoder itself safe to send to another thread? Perhaps we should add a isSafeToSendToAnotherThread function to it. It looks safe to me. I don't see anything that we can validate (like Strings, RefPtrs, KURLs, etc). Would you mind double checking in case I missed something?
Adam Barth
Comment 12 2013-02-06 16:15:44 PST
Comment on attachment 186929 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=186929&action=review >>> Source/WebCore/html/parser/HTMLDocumentParser.cpp:531 >>> + ASSERT(decoder->hasOneRef()); >> >> Is the decoder itself safe to send to another thread? Perhaps we should add a isSafeToSendToAnotherThread function to it. > > It looks safe to me. I don't see anything that we can validate (like Strings, RefPtrs, KURLs, etc). Would you mind double checking in case I missed something? TextResourceDecoder has a HTMLMetaCharsetParser, which uses HTMLNames. TextCodec has a bunch of subclasses, which I haven't studied in detail. We might need to look at this more carefully.
Adam Barth
Comment 13 2013-02-06 16:17:58 PST
Looks like we need to make sure we're not going to use the HTMLMetaCharsetParser.
Adam Barth
Comment 14 2013-02-06 16:18:51 PST
Comment on attachment 186929 [details] Patch The bulk of this change looks good, but we need to be more careful about moving the TextResourceDecoder between threads.
Eric Seidel (no email)
Comment 15 2013-02-06 16:19:13 PST
(In reply to comment #13) > Looks like we need to make sure we're not going to use the HTMLMetaCharsetParser. I think we're just gonna ned up taking the eventNames() per-thread AtomicString approach and not caring in the short-term. We need to see how much of a perf-hit that would be, but it would dramatically simplify moving code between threads like this.
Adam Barth
Comment 16 2013-02-06 16:20:21 PST
Maybe, but HTMLMetaCharsetParser calls , m_assumedCodec(newTextCodec(Latin1Encoding())) which might not be threadsafe either.
Adam Barth
Comment 17 2013-02-06 16:22:36 PST
static String fullyDecodeString(const String& string, const TextResourceDecoder* decoder) in XSSAuditor.cpp calls: const TextEncoding& encoding = decoder ? decoder->encoding() : UTF8Encoding(); It looks like we need only the TextEncoding, not the TextResourceDecoder.
Adam Barth
Comment 18 2013-02-06 16:26:20 PST
newTextCodec itself looks ok: PassOwnPtr<TextCodec> newTextCodec(const TextEncoding& encoding) { MutexLocker lock(encodingRegistryMutex()); ASSERT(textCodecMap); TextCodecFactory factory = textCodecMap->get(encoding.name()); ASSERT(factory.function); return factory.function(encoding, factory.additionalData); }
Tony Gentilcore
Comment 19 2013-02-06 16:48:57 PST
Tony Gentilcore
Comment 20 2013-02-06 16:49:35 PST
(In reply to comment #17) > static String fullyDecodeString(const String& string, const TextResourceDecoder* decoder) > > in XSSAuditor.cpp calls: > > const TextEncoding& encoding = decoder ? decoder->encoding() : UTF8Encoding(); > > It looks like we need only the TextEncoding, not the TextResourceDecoder. I'm really glad you noticed that! The patch is so much cleaner and safer now.
Tony Gentilcore
Comment 21 2013-02-06 16:50:29 PST
Now we pass 180 of 182 tests. Two were failing in the previous patch due to creating the wrong decoder type. Now we use the proper one from the Document.
Adam Barth
Comment 22 2013-02-06 18:04:01 PST
Comment on attachment 186948 [details] Patch Ah, much nicer.
WebKit Review Bot
Comment 23 2013-02-07 05:02:39 PST
Comment on attachment 186948 [details] Patch Clearing flags on attachment: 186948 Committed r142099: <http://trac.webkit.org/changeset/142099>
WebKit Review Bot
Comment 24 2013-02-07 05:02:46 PST
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.