RESOLVED FIXED 143010
Content extensions need multi-dfa compiler
https://bugs.webkit.org/show_bug.cgi?id=143010
Summary Content extensions need multi-dfa compiler
Alex Christensen
Reported 2015-03-24 12:22:54 PDT
In order to balance compiling time, byte code size, and interpreting time, we need to be able to compile multiple DFAs. Here's a patch that does that, but I'm not sure about the conditions to add another NFA or how to decide which NFA to put this regex in.
Attachments
Patch (14.23 KB, patch)
2015-03-24 12:30 PDT, Alex Christensen
no flags
Patch (21.92 KB, patch)
2015-03-24 18:55 PDT, Alex Christensen
benjamin: review+
Alex Christensen
Comment 1 2015-03-24 12:30:05 PDT
Alex Christensen
Comment 2 2015-03-24 13:31:41 PDT
This could also cause problems with locality of memory accesses within the byte code. If we do things right, we would have as many hot spots as DFAs.
Benjamin Poulain
Comment 3 2015-03-24 15:36:11 PDT
Comment on attachment 249339 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=249339&action=review I was thinking about this feature very differently: -Add a global header chain to the binary. -Make all jumps relative. -Stream the DFAs one by one to the output. You patch is much simpler :) The patch looks good, but can you please add at least one test? Generate a filter of 2k patterns programmatically and test that patterns matching works with both half. It may be good to test actions from the second half can be overridden by actions from the first half. > Source/WebCore/contentextensions/ContentExtensionCompiler.cpp:118 > + if (nfas[nfas.size() - 1].graphSize() > 1000) // Need a better condition here. Not sure what. That's fine, we'll study this closely. > Source/WebCore/contentextensions/ContentExtensionCompiler.cpp:120 > + NFA& lastNFA = nfas[nfas.size() - 1]; // Need to find "best" nfa here. Not sure how. You can remove the comment. > Source/WebCore/contentextensions/ContentExtensionCompiler.cpp:157 > + for (size_t i = 0; i < nfas.size(); i++) { WebKit style: ++i > Source/WebCore/contentextensions/DFABytecodeCompiler.cpp:187 > + // Set size header. > + set32Bits(m_bytecode, startLocation, m_bytecode.size() - startLocation); I was thinking of implementing this using a global header, like on real binaries. This is so much simpler :) > Source/WebCore/contentextensions/DFABytecodeInterpreter.cpp:64 > + for (unsigned programCounter = 0; programCounter < m_bytecodeLength;) { This is a bit cryptic. I think the condition would be clearer if it was handled at "nextDFA:". I am afraid this for() loop make it look like the program counter is incremented sequentially in the loop. > Source/WebCore/contentextensions/DFABytecodeInterpreter.cpp:142 > + programCounter = dfaStart + dfaBytecodeLength; Let's assert dfaBytecodeLength is non-zero.
Benjamin Poulain
Comment 4 2015-03-24 15:50:10 PDT
(In reply to comment #2) > This could also cause problems with locality of memory accesses within the > byte code. If we do things right, we would have as many hot spots as DFAs. I think that would be fine. Pages are small, it does not matter how they are distributed. Are you gonna look into locality optimization next? :)
Alex Christensen
Comment 5 2015-03-24 18:55:02 PDT
Alex Christensen
Comment 6 2015-03-24 22:20:42 PDT
Alexey Proskuryakov
Comment 7 2015-03-25 00:19:19 PDT
This test fails most of the time on bots. Will roll out. TIMEOUT ContentExtensionTest.MultiDFA
WebKit Commit Bot
Comment 8 2015-03-25 00:37:53 PDT
Re-opened since this is blocked by bug 143041
Alex Christensen
Comment 9 2015-03-25 10:33:57 PDT
Tweaked the test and max NFA size slightly to not timeout on debug builds while still testing the multi-DFA functionality. Recommitted to http://trac.webkit.org/changeset/181932
Note You need to log in before you can comment on or make changes to this bug.