There are some simple optimizations that make things a lot faster:
Not interpreting data URLs. These can contain many kilobytes of base64-encoded data, such as images.
Not re-adding all of the root's actions when jumping to the DFA root. We already handle the root's actions in special, optimized ways.
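A minimal sketch of the data URL optimization, written as standalone C++ rather than WebKit code. `protocolIsData()` here is a hypothetical stand-in for WebKit's `URL::protocolIsData()`, and `actionsForResourceLoad()` is a hypothetical toy interpreter that reports how many characters it reads. A data URL returns before a single character is interpreted.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical stand-in for WebKit's URL::protocolIsData(): a case-insensitive
// check for the "data:" scheme on an already-parsed URL string.
bool protocolIsData(std::string_view url)
{
    constexpr std::string_view scheme = "data:";
    if (url.size() < scheme.size())
        return false;
    for (size_t i = 0; i < scheme.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(url[i])) != scheme[i])
            return false;
    }
    return true;
}

// Hypothetical toy interpreter. The real cost is running the DFA over every
// character of the URL, so data URLs (which can be many kilobytes of base64)
// are rejected before any interpretation happens.
std::vector<uint64_t> actionsForResourceLoad(std::string_view url, size_t& charactersInterpreted)
{
    charactersInterpreted = 0;
    if (protocolIsData(url))
        return { };

    // Toy stand-in for the DFA: it "reads" every character and reports
    // hypothetical action 1 when the URL contains "/ads/".
    charactersInterpreted = url.size();
    if (url.find("/ads/") != std::string_view::npos)
        return { 1 };
    return { };
}
```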
Created attachment 250280
Created attachment 250281
Review of attachment 250281: https://bugs.webkit.org/attachment.cgi?id=250281&action=review
> + if (resourceLoadInfo.resourceURL.protocolIsData()
This could be on a single line.
> + WTFLogAlways("Time added: %f microseconds %s", (addedTimeEnd - addedTimeStart) * 1.0e6, resourceLoadInfo.resourceURL.string().utf8().data());
Isn't that a bit too verbose? Maybe printing the average after every 100 URLs would be easier to work with?
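One way the batching suggestion could look, sketched as standalone C++ under stated assumptions: `BatchedTimingLogger` is a hypothetical helper, each sample is an elapsed time in seconds (matching the `* 1.0e6` conversion in the quoted line), and `std::fprintf` to stderr stands in for `WTFLogAlways`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <optional>

// Hypothetical helper: accumulates per-URL timings (in seconds) and logs one
// average per batch instead of one line per URL.
class BatchedTimingLogger {
public:
    explicit BatchedTimingLogger(size_t batchSize = 100)
        : m_batchSize(batchSize)
    {
        assert(batchSize > 0);
    }

    // Returns the batch average in microseconds when a batch completes,
    // otherwise std::nullopt. The running totals reset after each batch.
    std::optional<double> addSample(double seconds)
    {
        m_totalSeconds += seconds;
        if (++m_count < m_batchSize)
            return std::nullopt;

        double averageMicroseconds = m_totalSeconds / m_count * 1.0e6;
        std::fprintf(stderr, "Average time added over %zu URLs: %f microseconds\n", m_count, averageMicroseconds);
        m_totalSeconds = 0;
        m_count = 0;
        return averageMicroseconds;
    }

private:
    size_t m_batchSize;
    size_t m_count { 0 };
    double m_totalSeconds { 0 };
};
```

Returning the average as well as printing it keeps the helper easy to check without scraping log output.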
> + // If we jump to the root, we don't want to re-add its actions to a HashSet.
> + // We know we have already added them because the root is always compiled first and we always start interpreting at the beginning.
That's a good point!
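A toy illustration of the root-action optimization, assuming a hypothetical DFA where node 0 is the root and each node has explicit per-character transitions (this is not WebKit's DFA bytecode format). Because interpretation always starts at the root, its actions are inserted once up front; any later transition back to the root skips re-inserting them into the hash set (a `std::unordered_set` here).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string_view>
#include <unordered_set>
#include <vector>

// Toy DFA node (hypothetical layout, not WebKit's bytecode format).
struct DFANode {
    std::vector<uint64_t> actions;
    std::map<char, size_t> transitions;
};

constexpr size_t rootIndex = 0;

// Collects the actions reached while reading `input`. The root's actions are
// added exactly once, before the loop, so later jumps back to the root skip
// them. `hashInsertions` counts insert() calls so the saving is visible.
std::unordered_set<uint64_t> interpret(const std::vector<DFANode>& dfa, std::string_view input, size_t& hashInsertions)
{
    assert(!dfa.empty());
    std::unordered_set<uint64_t> actions;
    hashInsertions = 0;

    auto addActions = [&](size_t nodeIndex) {
        for (uint64_t action : dfa[nodeIndex].actions) {
            actions.insert(action);
            ++hashInsertions;
        }
    };

    addActions(rootIndex);
    size_t current = rootIndex;
    for (char c : input) {
        auto it = dfa[current].transitions.find(c);
        if (it == dfa[current].transitions.end())
            break; // No transition for this character: stop interpreting.
        current = it->second;
        assert(current < dfa.size());
        if (current == rootIndex)
            continue; // Root actions were already added before the loop.
        addActions(current);
    }
    return actions;
}
```

With a root carrying actions {1, 2} and a loop root, 'a' to node 1, 'b' back to root, interpreting "abab" performs 4 set insertions instead of the 8 it would take if the root's actions were re-added on every jump.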
Touching URL.h broke things; I reverted the URL.h change in http://trac.webkit.org/changeset/182500