WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
143501
Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501
Summary
Add a conservative DFA minimizer for the content extension matcher
Benjamin Poulain
Reported
2015-04-07 15:22:32 PDT
Add a conservative DFA minimizer for the content extension matcher
Attachments
Patch
(57.45 KB, patch)
2015-04-07 16:00 PDT
,
Benjamin Poulain
no flags
Details
Formatted Diff
Diff
Patch
(57.66 KB, patch)
2015-04-07 16:06 PDT
,
Benjamin Poulain
no flags
Details
Formatted Diff
Diff
Patch
(71.83 KB, patch)
2015-04-13 15:48 PDT
,
Benjamin Poulain
achristensen
: review+
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Benjamin Poulain
Comment 1
2015-04-07 16:00:07 PDT
Created
attachment 250311
[details]
Patch
Benjamin Poulain
Comment 2
2015-04-07 16:06:10 PDT
Created
attachment 250312
[details]
Patch
Alex Christensen
Comment 3
2015-04-08 12:42:09 PDT
Comment on
attachment 250312
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=250312&action=review
I still need to look at DFAMinimizer.cpp, but there are some other things that need to be addressed.
> Source/WebCore/ChangeLog:110 > + Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect > + work on this patch. Thanks for your wonderful papers about DFA minimization.
It might be nice to put links to the free articles here for reference.
http://arxiv.org/pdf/0802.2826v1.pdf
etc.
> Source/WebCore/contentextensions/ContentExtensionCompiler.cpp:-192 > - double dfaBuildTimeStart = monotonicallyIncreasingTime();
Also remove line 228 which still uses this. And line 227, which declares dfaBuildTimeEnd, which would also become unused.
> Source/WebCore/contentextensions/DFABytecodeCompiler.cpp:100 > + m_nodeStartOffsets[index] = std::numeric_limits<unsigned>::max();
You could add an assertion when linking that this value is never used.
> Source/WebCore/contentextensions/DFAMinimizer.h:37 > +namespace DFAMinimizer {
It seems a little excessive to make a namespace just for this. This is WebCore, where we don't believe in namespaces ;) What about a class with static methods like NFAToDFA?
> Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:732 > +TEST_F(ContentExtensionTest, MinimizingWithMoreFinalStatesThanNonFinalStates)
It would be nice to add an EXPECT_EQ with the number of non-killed DFA nodes. I don't think these tests accurately test minimization.
Gavin Barraclough
Comment 4
2015-04-08 12:57:02 PDT
Ben, do you have any performance results? – say, a percent reduction on nodes in the graph for some representative input?
Benjamin Poulain
Comment 5
2015-04-08 13:07:42 PDT
(In reply to
comment #4
)
> Ben, do you have any performance results? – say, a percent reduction on > nodes in the graph for some representative input?
I was working with one of the two lists. The bytecode size shrinks by >15% IIRC.
Alex Christensen
Comment 6
2015-04-08 13:24:24 PDT
Here is the output of a large content extension with CONTENT_EXTENSIONS_PERFORMANCE_REPORTING set to 1: Time spent loading extension 0.120485 Time spent partitioning the rules into groups: 0.307346 Time spent building the NFAs: 0.184733 Time spent building and compiling the DFAs: 8.248510 Bytecode size 26787531 DFA count 2 Here's the output of the same content extension with this patch applied: Time spent loading extension 0.112486 Time spent partitioning the rules into groups: 0.298134 Time spent building the NFAs: 0.203010 Time spent building the DFA 0: 1.145915 Time spent miniminizing the DFA 0: 0.889264 Time spent building the DFA 1: 5.517612 Time spent miniminizing the DFA 1: 3.532788 Bytecode size 18704055 DFA count 2
Alex Christensen
Comment 7
2015-04-09 14:50:52 PDT
Comment on
attachment 250312
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=250312&action=review
> Source/WebCore/contentextensions/DFAMinimizer.cpp:2 > + * Copyright (C) 2014, 2015 Apple Inc. All rights reserved.
2014?
> Source/WebCore/contentextensions/DFAMinimizer.cpp:326 > + m_transitionPartition.refineGeneration([](unsigned) { });
This seems a little strange.
> Source/WebCore/contentextensions/DFAMinimizer.cpp:435 > + hasher.addCharactersAssumingAligned(reinterpret_cast<const UChar*>(actions.data()), sizeof(uint64_t) / sizeof(UChar));
Don't you want actions.size() in here?
Benjamin Poulain
Comment 8
2015-04-13 15:48:46 PDT
Created
attachment 250680
[details]
Patch
Benjamin Poulain
Comment 9
2015-04-13 15:49:45 PDT
(In reply to
comment #3
)
> > Source/WebCore/ChangeLog:110 > > + Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect > > + work on this patch. Thanks for your wonderful papers about DFA minimization. > > It might be nice to put links to the free articles here for reference. >
http://arxiv.org/pdf/0802.2826v1.pdf
etc.
Both teams published several papers, I did not bother linking to every one of them.
Alex Christensen
Comment 10
2015-04-13 17:42:35 PDT
Comment on
attachment 250680
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=250680&action=review
r=me, with some comments:
> Source/WebCore/contentextensions/DFABytecodeCompiler.cpp:224 > + set32Bits(m_bytecode, offset, m_nodeStartOffsets[linkRecord.second]);
Add this assertion here: ASSERT(m_nodeStartOffsets[linkRecord.second] != std::numeric_limits<unsigned>::max());
> Source/WebCore/contentextensions/DFAMinimizer.cpp:37 > +namespace {
Does this add any useful organization?
> Source/WebCore/contentextensions/DFAMinimizer.cpp:102 > + m_sets.append(SetDescriptor({ 0, size, 0 }));
m_sets.append({ 0, size, 0 });
> Source/WebCore/contentextensions/DFAMinimizer.cpp:171 > + void iterateSets(const Function& function)
I believe this is unused. Remove!
> Source/WebCore/contentextensions/DFAMinimizer.cpp:188 > + size_t findInSet(unsigned setIndex, const TestFunction& testFunction)
Also unused! Remove!
> Source/WebCore/contentextensions/DFAMinimizer.cpp:206 > + unsigned firstElementInSet(unsigned setIndex) const
This is not mentioned in the ChangeLog entry.
> Source/WebCore/contentextensions/DFAMinimizer.cpp:303 > + m_flattenedTransitions[start + offset] = Transition({ i, targetNodeIndex, transition.key });
I think this also unnecessarily calls the copy constructor.
> Source/WebCore/contentextensions/DFAMinimizer.cpp:356 > + m_transitionPartition.refineGeneration([](unsigned) { }); > + m_fallbackTransitionPartition.refineGeneration([](unsigned) { });
These need a comment explaining why it is a good idea to use a lambda that does nothing.
> Source/WebCore/contentextensions/DFAMinimizer.cpp:418 > +struct Transition {
This is defined as a private struct on line 397. Remove this unused definition.
> Source/WebCore/contentextensions/DFAMinimizer.cpp:469 > +}
Is this the end of the unnamed namespace? If we keep that, there should be a comment here.
> Tools/TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:70 > + EXPECT_EQ(static_cast<size_t>(10), countLiveNodes(dfa));
10u should work here instead of static_cast<size_t>(10). Similar elsewhere in this file.
Benjamin Poulain
Comment 11
2015-04-14 14:09:02 PDT
Committed
r182808
: <
http://trac.webkit.org/changeset/182808
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug