The DFA nodes are currently compiled in the order in which they are added, which gives reasonably good memory locality compared to a random ordering, but we could do better.
Created attachment 250107 [details] Patch
Comment on attachment 250107 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=250107&action=review

> Source/WebCore/contentextensions/DFABytecodeCompiler.cpp:202
> +static Vector<unsigned> zeroes(size_t count)
> +{
> +    Vector<unsigned> vector;
> +    vector.reserveInitialCapacity(count);
> +    for (unsigned i = 0; i < count; ++i)
> +        vector.append(0);
> +    return vector;
> +}

Wat. Plz use Vector::fill, which uses memset, which uses SIMD, which makes me happy.

> Source/WebCore/contentextensions/DFABytecodeCompiler.cpp:240
> +static Vector<unsigned> incomingTransitions(const DFA& dfa)
> +{
> +    Vector<unsigned> transitionsIn = zeroes(dfa.size());
> +    for (unsigned i = 0; i < dfa.size(); ++i) {
> +        for (unsigned transition : dfa.nodeAt(i).transitions.values())
> +            transitionsIn[transition]++;
> +    }
> +    return transitionsIn;
> +}
> +
> +static Vector<unsigned> outgoingTransitions(const DFA& dfa)
> +{
> +    Vector<unsigned> transitionsOut;
> +    transitionsOut.reserveInitialCapacity(dfa.size());
> +    for (unsigned i = 0; i < dfa.size(); ++i)
> +        transitionsOut.append(dfa.nodeAt(i).transitions.size());
> +    return transitionsOut;
> +}

Can we compute these in the same loop, to avoid looping twice? Or is this loop really inexpensive?

> Source/WebCore/contentextensions/DFABytecodeCompiler.cpp:279
> +    // We should tune this objective function before landing, but this gives a slight improvement.

How much of an improvement is this?
Comment on attachment 250107 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=250107&action=review

We should look into this and measure different objective functions' effects on memory usage after minimizing the size of the DFA bytecode.

>> Source/WebCore/contentextensions/DFABytecodeCompiler.cpp:202
>> +}
>
> Wat. Plz use Vector::fill, which uses memset, which uses SIMD, which makes me happy.

I knew there should be something in Vector to do this.

>> Source/WebCore/contentextensions/DFABytecodeCompiler.cpp:240
>> +}
>
> Can we compute these in the same loop, to avoid looping twice? Or is this loop really inexpensive?

Unless we do things terribly wrong, most of the time compiling is spent in NFAToDFA::convert. This is really inexpensive, but we could do both in the same loop and organize the functions differently.

>> Source/WebCore/contentextensions/DFABytecodeCompiler.cpp:279
>> +    // We should tune this objective function before landing, but this gives a slight improvement.
>
> How much of an improvement is this?

Not as much as I want.
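For reference, the single-loop variant suggested in review could look roughly like the following. This is only a sketch: it uses std::vector and a simplified, hypothetical DFA/Node shape (the real WebKit DFA, DFANode, and WTF::Vector types differ), and the function name transitionCounts is an assumption, not the patch's actual code.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for WebKit's DFA: each node just stores the
// indices of the nodes its transitions point to.
struct Node {
    std::vector<unsigned> transitionTargets;
};

struct DFA {
    std::vector<Node> nodes;
    size_t size() const { return nodes.size(); }
};

// Count incoming and outgoing transitions for every node in a single pass
// over the DFA, instead of the two separate loops in incomingTransitions()
// and outgoingTransitions().
static std::pair<std::vector<unsigned>, std::vector<unsigned>>
transitionCounts(const DFA& dfa)
{
    // Zero-initialized up front, analogous to Vector::fill(0).
    std::vector<unsigned> transitionsIn(dfa.size(), 0);
    std::vector<unsigned> transitionsOut(dfa.size(), 0);
    for (size_t i = 0; i < dfa.size(); ++i) {
        const Node& node = dfa.nodes[i];
        transitionsOut[i] = static_cast<unsigned>(node.transitionTargets.size());
        for (unsigned target : node.transitionTargets)
            transitionsIn[target]++;
    }
    return { std::move(transitionsIn), std::move(transitionsOut) };
}
```

Since both counts come from the same walk over each node's transition table, fusing the loops costs nothing in clarity and touches each node's transitions only once; as noted in the reply, though, this pass is cheap relative to NFAToDFA::convert either way.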