Bug 143010 - Content extensions need multi-dfa compiler
Summary: Content extensions need multi-dfa compiler
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Alex Christensen
Depends on: 143041
  Show dependency treegraph
Reported: 2015-03-24 12:22 PDT by Alex Christensen
Modified: 2015-03-25 10:33 PDT (History)
2 users (show)

See Also:

Patch (14.23 KB, patch)
2015-03-24 12:30 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (21.92 KB, patch)
2015-03-24 18:55 PDT, Alex Christensen
benjamin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alex Christensen 2015-03-24 12:22:54 PDT
In order to balance compiling time, byte code size, and interpreting time, we need to be able to compile multiple DFAs.  Here's a patch that does that, but I'm not sure about the conditions to add another NFA or how to decide which NFA to put this regex in.
Comment 1 Alex Christensen 2015-03-24 12:30:05 PDT
Created attachment 249339 [details]
Comment 2 Alex Christensen 2015-03-24 13:31:41 PDT
This could also cause problems with locality of memory accesses within the byte code.  If we do things right, we would have as many hot spots as DFAs.
Comment 3 Benjamin Poulain 2015-03-24 15:36:11 PDT
Comment on attachment 249339 [details]

View in context: https://bugs.webkit.org/attachment.cgi?id=249339&action=review

I was thinking about this feature very differently:
-Add a global header chain to the binary.
-Make all jumps relative.
-Stream the DFAs one by one to the output.

You patch is much simpler :)

The patch looks good, but can you please add at least one test? Generate a filter of 2k patterns programmatically and test that patterns matching works with both half.
It may be good to test actions from the second half can be overridden by actions from the first half.

> Source/WebCore/contentextensions/ContentExtensionCompiler.cpp:118
> +        if (nfas[nfas.size() - 1].graphSize() > 1000) // Need a better condition here. Not sure what.

That's fine, we'll study this closely.

> Source/WebCore/contentextensions/ContentExtensionCompiler.cpp:120
> +        NFA& lastNFA = nfas[nfas.size() - 1]; // Need to find "best" nfa here. Not sure how.

You can remove the comment.

> Source/WebCore/contentextensions/ContentExtensionCompiler.cpp:157
> +    for (size_t i = 0; i < nfas.size(); i++) {

WebKit style: ++i

> Source/WebCore/contentextensions/DFABytecodeCompiler.cpp:187
> +    // Set size header.
> +    set32Bits(m_bytecode, startLocation, m_bytecode.size() - startLocation);

I was thinking of implementing this using a global header, like on real binaries. This is so much simpler :)

> Source/WebCore/contentextensions/DFABytecodeInterpreter.cpp:64
> +    for (unsigned programCounter = 0; programCounter < m_bytecodeLength;) {

This is a bit cryptic. I think the condition would be clearer if it was handled at "nextDFA:".

I am afraid this for() loop make it look like the program counter is incremented sequentially in the loop.

> Source/WebCore/contentextensions/DFABytecodeInterpreter.cpp:142
> +        programCounter = dfaStart + dfaBytecodeLength;

Let's assert dfaBytecodeLength is non-zero.
Comment 4 Benjamin Poulain 2015-03-24 15:50:10 PDT
(In reply to comment #2)
> This could also cause problems with locality of memory accesses within the
> byte code.  If we do things right, we would have as many hot spots as DFAs.

I think that would be fine. Pages are small, it does not matter how they are distributed.

Are you gonna look into locality optimization next? :)
Comment 5 Alex Christensen 2015-03-24 18:55:02 PDT
Created attachment 249375 [details]
Comment 6 Alex Christensen 2015-03-24 22:20:42 PDT
Comment 7 Alexey Proskuryakov 2015-03-25 00:19:19 PDT
This test fails most of the time on bots. Will roll out.

TIMEOUT ContentExtensionTest.MultiDFA
Comment 8 WebKit Commit Bot 2015-03-25 00:37:53 PDT
Re-opened since this is blocked by bug 143041
Comment 9 Alex Christensen 2015-03-25 10:33:57 PDT
Tweaked the test and max NFA size slightly to not timeout on debug builds while still testing the multi-DFA functionality.
Recommitted to http://trac.webkit.org/changeset/181932