Bug 35373 - XSSAuditor is super super super slow
: XSSAuditor is super super super slow
Status: RESOLVED FIXED
Product: WebKit
Classification: Unclassified
Component: New Bugs
: 528+ (Nightly build)
: PC Mac OS X 10.5
: P2 Normal
Assigned To: Nobody
http://tersk.intudata.com/~dbates/web...
: XSSAuditor
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2010-02-24 18:12 PST by James Robinson
Modified: 2010-02-26 10:13 PST (History)
4 users (show)

See Also:


Attachments
shark profile of load (1.22 MB, application/octet-stream)
2010-02-24 18:14 PST, James Robinson
no flags Details
test page (PHP) (5.43 KB, text/php)
2010-02-24 18:15 PST, James Robinson
no flags Details
Patch (1.51 KB, patch)
2010-02-25 11:56 PST, James Robinson
no flags Details | Formatted Diff | Diff
Patch (2.98 KB, patch)
2010-02-26 09:33 PST, Adam Barth
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description James Robinson 2010-02-24 18:12:19 PST
A page that contained a large number of <input> elements with inline onblur handlers and that POSTs a lot of data is super super super slow with the XSSAuditor enabled.  This is a really huge perf regression.
Comment 1 James Robinson 2010-02-24 18:14:07 PST
Created attachment 49456 [details]
shark profile of load
Comment 2 James Robinson 2010-02-24 18:15:43 PST
Created attachment 49457 [details]
test page (PHP)

Test page, from the original Chromium bug.  It has directions for running it in the page.  A bit messy but shows the problem clearly.
Comment 3 James Robinson 2010-02-24 18:25:05 PST
Here are the two really bad callstacks on the second load:

	0.0%	91.8%	WebCore	                WebCore::HTMLTokenizer::write(WebCore::SegmentedString const&, bool)
	0.1%	91.7%	WebCore	                 WebCore::HTMLTokenizer::parseTag(WebCore::SegmentedString&, WebCore::HTMLTokenizer::State)
	0.0%	91.5%	WebCore	                  WebCore::HTMLTokenizer::processToken()
	0.0%	91.5%	WebCore	                   WebCore::HTMLParser::parseToken(WebCore::Token*)
	0.0%	90.3%	WebCore	                    WebCore::Element::setAttributeMap(WTF::PassRefPtr<WebCore::NamedNodeMap>, WebCore::FragmentScriptingPermission)
	0.0%	90.2%	WebCore	                     WebCore::StyledElement::attributeChanged(WebCore::Attribute*, bool)
	0.0%	58.2%	WebCore	                      WebCore::HTMLInputElement::parseMappedAttribute(WebCore::MappedAttribute*)
	0.0%	58.1%	WebCore	                       WebCore::HTMLTextFormControlElement::parseMappedAttribute(WebCore::MappedAttribute*)
	0.0%	51.5%	WebCore	                        WebCore::createAttributeEventListener(WebCore::Node*, WebCore::Attribute*)
	0.0%	51.5%	WebCore	                         WebCore::XSSAuditor::canCreateInlineEventListener(WebCore::String const&, WebCore::String const&) const

and

	0.0%	91.8%	WebCore	                WebCore::HTMLTokenizer::write(WebCore::SegmentedString const&, bool)
	0.1%	91.7%	WebCore	                 WebCore::HTMLTokenizer::parseTag(WebCore::SegmentedString&, WebCore::HTMLTokenizer::State)
	0.0%	91.5%	WebCore	                  WebCore::HTMLTokenizer::processToken()
	0.0%	91.5%	WebCore	                   WebCore::HTMLParser::parseToken(WebCore::Token*)
	0.0%	90.3%	WebCore	                    WebCore::Element::setAttributeMap(WTF::PassRefPtr<WebCore::NamedNodeMap>, WebCore::FragmentScriptingPermission)
	0.0%	90.2%	WebCore	                     WebCore::StyledElement::attributeChanged(WebCore::Attribute*, bool)
	0.0%	58.2%	WebCore	                      WebCore::HTMLInputElement::parseMappedAttribute(WebCore::MappedAttribute*)
	0.0%	25.5%	WebCore	                      WebCore::HTMLSelectElement::parseMappedAttribute(WebCore::MappedAttribute*)
	0.0%	25.5%	WebCore	                       WebCore::createAttributeEventListener(WebCore::Node*, WebCore::Attribute*)
	0.0%	25.4%	WebCore	                        WebCore::XSSAuditor::canCreateInlineEventListener(WebCore::String const&, WebCore::String const&) const

They don't make sense to me - why would the XSSAuditor have to check anything coming out of the HTMLTokenizer?  This is HTML served up by the server, not coming from any user-controlled data.
Comment 4 Dimitri Glazkov (Google) 2010-02-24 18:27:49 PST
Are you sure this warrants three supers? It looks only super super slow.
Comment 5 Adam Barth 2010-02-25 00:24:34 PST
> They don't make sense to me - why would the XSSAuditor have to check anything
> coming out of the HTMLTokenizer?  This is HTML served up by the server, not
> coming from any user-controlled data.

The XSS auditor is checking whether the event handlers are present in the post request.  If they are, it gets suspicious that there might be a reflected XSS attack.

I need to spend some time with this in the debugger to see how we can speed this up.  In the worse case, we might need to build some sort of index over the post data to make substring queries faster.
Comment 6 James Robinson 2010-02-25 11:56:28 PST
Created attachment 49513 [details]
Patch
Comment 7 James Robinson 2010-02-25 11:58:02 PST
While you're thinking, could you please review this patch?
Comment 8 Adam Barth 2010-02-25 12:20:24 PST
Comment on attachment 49513 [details]
Patch

If you want to disable the auditor, we can do that via the Settings object.  In particular, you can look in Chromium at how the --disable-xss-auditor command line switch is implemented.
Comment 9 Adam Barth 2010-02-26 09:22:42 PST
Patch in hand, benchmarking.
Comment 10 Adam Barth 2010-02-26 09:33:56 PST
Created attachment 49593 [details]
Patch
Comment 11 Darin Adler 2010-02-26 09:35:09 PST
Comment on attachment 49593 [details]
Patch

This patch needs a "why" comment explaining why it's important to use two caches.
Comment 12 Darin Adler 2010-02-26 09:35:25 PST
Comment on attachment 49593 [details]
Patch

The code, not the change log.
Comment 13 Adam Barth 2010-02-26 09:41:19 PST
I've added the following comment to the header:

        // A state store to help us avoid canonicalizing the same URL repeated.
        // When a page has form data, we need two caches: one to store the
        // canonicalized URL and another to store the cannonicalized form
        // data. If we only had one cache, we'd always generate a cache miss
        // and load some pages extremely slowly.
        // https://bugs.webkit.org/show_bug.cgi?id=35373

We're still slower on loading this page by a factor of four, but that's better than the factor of infinity slower we were before this patch.  (Ok, it's not really infinity, I just got bored waiting for the timer to finish.)
Comment 14 Adam Barth 2010-02-26 10:13:02 PST
http://trac.webkit.org/changeset/55290