Bug 143010

Summary: Content extensions need multi-dfa compiler
Product: WebKit Reporter: Alex Christensen <achristensen>
Component: WebCore Misc.Assignee: Alex Christensen <achristensen>
Severity: Normal CC: benjamin, commit-queue
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 143041    
Bug Blocks:    
Description Flags
Patch benjamin: review+

Description Alex Christensen 2015-03-24 12:22:54 PDT
In order to balance compiling time, byte code size, and interpreting time, we need to be able to compile multiple DFAs.  Here's a patch that does that, but I'm not sure about the conditions to add another NFA or how to decide which NFA to put this regex in.
Comment 1 Alex Christensen 2015-03-24 12:30:05 PDT
Created attachment 249339 [details]
Comment 2 Alex Christensen 2015-03-24 13:31:41 PDT
This could also cause problems with locality of memory accesses within the byte code.  If we do things right, we would have as many hot spots as DFAs.
Comment 3 Benjamin Poulain 2015-03-24 15:36:11 PDT
Comment on attachment 249339 [details]

View in context: https://bugs.webkit.org/attachment.cgi?id=249339&action=review

I was thinking about this feature very differently:
-Add a global header chain to the binary.
-Make all jumps relative.
-Stream the DFAs one by one to the output.

You patch is much simpler :)

The patch looks good, but can you please add at least one test? Generate a filter of 2k patterns programmatically and test that patterns matching works with both half.
It may be good to test actions from the second half can be overridden by actions from the first half.

> Source/WebCore/contentextensions/ContentExtensionCompiler.cpp:118
> +        if (nfas[nfas.size() - 1].graphSize() > 1000) // Need a better condition here. Not sure what.

That's fine, we'll study this closely.

> Source/WebCore/contentextensions/ContentExtensionCompiler.cpp:120
> +        NFA& lastNFA = nfas[nfas.size() - 1]; // Need to find "best" nfa here. Not sure how.

You can remove the comment.

> Source/WebCore/contentextensions/ContentExtensionCompiler.cpp:157
> +    for (size_t i = 0; i < nfas.size(); i++) {

WebKit style: ++i

> Source/WebCore/contentextensions/DFABytecodeCompiler.cpp:187
> +    // Set size header.
> +    set32Bits(m_bytecode, startLocation, m_bytecode.size() - startLocation);

I was thinking of implementing this using a global header, like on real binaries. This is so much simpler :)

> Source/WebCore/contentextensions/DFABytecodeInterpreter.cpp:64
> +    for (unsigned programCounter = 0; programCounter < m_bytecodeLength;) {

This is a bit cryptic. I think the condition would be clearer if it was handled at "nextDFA:".

I am afraid this for() loop make it look like the program counter is incremented sequentially in the loop.

> Source/WebCore/contentextensions/DFABytecodeInterpreter.cpp:142
> +        programCounter = dfaStart + dfaBytecodeLength;

Let's assert dfaBytecodeLength is non-zero.
Comment 4 Benjamin Poulain 2015-03-24 15:50:10 PDT
(In reply to comment #2)
> This could also cause problems with locality of memory accesses within the
> byte code.  If we do things right, we would have as many hot spots as DFAs.

I think that would be fine. Pages are small, it does not matter how they are distributed.

Are you gonna look into locality optimization next? :)
Comment 5 Alex Christensen 2015-03-24 18:55:02 PDT
Created attachment 249375 [details]
Comment 6 Alex Christensen 2015-03-24 22:20:42 PDT
Comment 7 Alexey Proskuryakov 2015-03-25 00:19:19 PDT
This test fails most of the time on bots. Will roll out.

TIMEOUT ContentExtensionTest.MultiDFA
Comment 8 WebKit Commit Bot 2015-03-25 00:37:53 PDT
Re-opened since this is blocked by bug 143041
Comment 9 Alex Christensen 2015-03-25 10:33:57 PDT
Tweaked the test and max NFA size slightly to not timeout on debug builds while still testing the multi-DFA functionality.
Recommitted to http://trac.webkit.org/changeset/181932