RESOLVED FIXED 143501
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Summary Add a conservative DFA minimizer for the content extension matcher
Benjamin Poulain
Reported 2015-04-07 15:22:32 PDT
Add a conservative DFA minimizer for the content extension matcher
Attachments
Patch (57.45 KB, patch)
2015-04-07 16:00 PDT, Benjamin Poulain
no flags
Patch (57.66 KB, patch)
2015-04-07 16:06 PDT, Benjamin Poulain
no flags
Patch (71.83 KB, patch)
2015-04-13 15:48 PDT, Benjamin Poulain
achristensen: review+
Benjamin Poulain
Comment 1 2015-04-07 16:00:07 PDT
Benjamin Poulain
Comment 2 2015-04-07 16:06:10 PDT
Alex Christensen
Comment 3 2015-04-08 12:42:09 PDT
Comment on attachment 250312 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=250312&action=review I still need to look at DFAMinimizer.cpp, but there are some other things that need to be addressed. > Source/WebCore/ChangeLog:110 > + Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect > + work on this patch. Thanks for your wonderful papers about DFA minimization. It might be nice to put links to the free articles here for reference. http://arxiv.org/pdf/0802.2826v1.pdf etc. > Source/WebCore/contentextensions/ContentExtensionCompiler.cpp:-192 > - double dfaBuildTimeStart = monotonicallyIncreasingTime(); Also remove line 228 which still uses this. And line 227, which declares dfaBuildTimeEnd, which would also become unused. > Source/WebCore/contentextensions/DFABytecodeCompiler.cpp:100 > + m_nodeStartOffsets[index] = std::numeric_limits<unsigned>::max(); You could add an assertion when linking that this value is never used. > Source/WebCore/contentextensions/DFAMinimizer.h:37 > +namespace DFAMinimizer { It seems a little excessive to make a namespace just for this. This is WebCore, where we don't believe in namespaces ;) What about a class with static methods like NFAToDFA? > Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:732 > +TEST_F(ContentExtensionTest, MinimizingWithMoreFinalStatesThanNonFinalStates) It would be nice to add an EXPECT_EQ with the number of non-killed DFA nodes. I don't think these tests accurately test minimization.
Gavin Barraclough
Comment 4 2015-04-08 12:57:02 PDT
Ben, do you have any performance results? – say, a percent reduction on nodes in the graph for some representative input?
Benjamin Poulain
Comment 5 2015-04-08 13:07:42 PDT
(In reply to comment #4) > Ben, do you have any performance results? – say, a percent reduction on > nodes in the graph for some representative input? I was working with one of the two lists. The bytecode size shrinks by >15% IIRC.
Alex Christensen
Comment 6 2015-04-08 13:24:24 PDT
Here is the output of a large content extension with CONTENT_EXTENSIONS_PERFORMANCE_REPORTING set to 1: Time spent loading extension 0.120485 Time spent partitioning the rules into groups: 0.307346 Time spent building the NFAs: 0.184733 Time spent building and compiling the DFAs: 8.248510 Bytecode size 26787531 DFA count 2 Here's the output of the same content extension with this patch applied: Time spent loading extension 0.112486 Time spent partitioning the rules into groups: 0.298134 Time spent building the NFAs: 0.203010 Time spent building the DFA 0: 1.145915 Time spent miniminizing the DFA 0: 0.889264 Time spent building the DFA 1: 5.517612 Time spent miniminizing the DFA 1: 3.532788 Bytecode size 18704055 DFA count 2
Alex Christensen
Comment 7 2015-04-09 14:50:52 PDT
Comment on attachment 250312 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=250312&action=review > Source/WebCore/contentextensions/DFAMinimizer.cpp:2 > + * Copyright (C) 2014, 2015 Apple Inc. All rights reserved. 2014? > Source/WebCore/contentextensions/DFAMinimizer.cpp:326 > + m_transitionPartition.refineGeneration([](unsigned) { }); This seems a little strange. > Source/WebCore/contentextensions/DFAMinimizer.cpp:435 > + hasher.addCharactersAssumingAligned(reinterpret_cast<const UChar*>(actions.data()), sizeof(uint64_t) / sizeof(UChar)); Don't you want actions.size() in here?
Benjamin Poulain
Comment 8 2015-04-13 15:48:46 PDT
Benjamin Poulain
Comment 9 2015-04-13 15:49:45 PDT
(In reply to comment #3) > > Source/WebCore/ChangeLog:110 > > + Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect > > + work on this patch. Thanks for your wonderful papers about DFA minimization. > > It might be nice to put links to the free articles here for reference. > http://arxiv.org/pdf/0802.2826v1.pdf etc. Both teams published several papers, I did not bother linking to every one of them.
Alex Christensen
Comment 10 2015-04-13 17:42:35 PDT
Comment on attachment 250680 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=250680&action=review r=me, with some comments: > Source/WebCore/contentextensions/DFABytecodeCompiler.cpp:224 > + set32Bits(m_bytecode, offset, m_nodeStartOffsets[linkRecord.second]); Add this assertion here: ASSERT(m_nodeStartOffsets[linkRecord.second] != std::numeric_limits<unsigned>::max()); > Source/WebCore/contentextensions/DFAMinimizer.cpp:37 > +namespace { Does this add any useful organization? > Source/WebCore/contentextensions/DFAMinimizer.cpp:102 > + m_sets.append(SetDescriptor({ 0, size, 0 })); m_sets.append({ 0, size, 0 }); > Source/WebCore/contentextensions/DFAMinimizer.cpp:171 > + void iterateSets(const Function& function) I believe this is unused. Remove! > Source/WebCore/contentextensions/DFAMinimizer.cpp:188 > + size_t findInSet(unsigned setIndex, const TestFunction& testFunction) Also unused! Remove! > Source/WebCore/contentextensions/DFAMinimizer.cpp:206 > + unsigned firstElementInSet(unsigned setIndex) const This is not mentioned in the ChangeLog entry. > Source/WebCore/contentextensions/DFAMinimizer.cpp:303 > + m_flattenedTransitions[start + offset] = Transition({ i, targetNodeIndex, transition.key }); I think this also unnecessarily calls the copy constructor. > Source/WebCore/contentextensions/DFAMinimizer.cpp:356 > + m_transitionPartition.refineGeneration([](unsigned) { }); > + m_fallbackTransitionPartition.refineGeneration([](unsigned) { }); These need a comment explaining why it is a good idea to use a lambda that does nothing. > Source/WebCore/contentextensions/DFAMinimizer.cpp:418 > +struct Transition { This is defined as a private struct on line 397. Remove this unused definition. > Source/WebCore/contentextensions/DFAMinimizer.cpp:469 > +} Is this the end of the unnamed namespace? If we keep that, there should be a comment here. > Tools/TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:70 > + EXPECT_EQ(static_cast<size_t>(10), countLiveNodes(dfa)); 10u should work here instead of static_cast<size_t>(10). Similar elsewhere in this file.
Benjamin Poulain
Comment 11 2015-04-14 14:09:02 PDT
Note You need to log in before you can comment on or make changes to this bug.