Bug 143501 - Add a conservative DFA minimizer for the content extension matcher
Summary: Add a conservative DFA minimizer for the content extension matcher
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Benjamin Poulain
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-04-07 15:22 PDT by Benjamin Poulain
Modified: 2015-04-14 14:09 PDT (History)
2 users (show)

See Also:


Attachments
Patch (57.45 KB, patch)
2015-04-07 16:00 PDT, Benjamin Poulain
no flags Details | Formatted Diff | Diff
Patch (57.66 KB, patch)
2015-04-07 16:06 PDT, Benjamin Poulain
no flags Details | Formatted Diff | Diff
Patch (71.83 KB, patch)
2015-04-13 15:48 PDT, Benjamin Poulain
achristensen: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Benjamin Poulain 2015-04-07 15:22:32 PDT
Add a conservative DFA minimizer for the content extension matcher
Comment 1 Benjamin Poulain 2015-04-07 16:00:07 PDT
Created attachment 250311 [details]
Patch
Comment 2 Benjamin Poulain 2015-04-07 16:06:10 PDT
Created attachment 250312 [details]
Patch
Comment 3 Alex Christensen 2015-04-08 12:42:09 PDT
Comment on attachment 250312 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=250312&action=review

I still need to look at DFAMinimizer.cpp, but there are some other things that need to be addressed.

> Source/WebCore/ChangeLog:110
> +        Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
> +        work on this patch. Thanks for your wonderful papers about DFA minimization.

It might be nice to put links to the free articles here for reference.  http://arxiv.org/pdf/0802.2826v1.pdf etc.

> Source/WebCore/contentextensions/ContentExtensionCompiler.cpp:-192
> -    double dfaBuildTimeStart = monotonicallyIncreasingTime();

Also remove line 228 which still uses this.  And line 227, which declares dfaBuildTimeEnd, which would also become unused.

> Source/WebCore/contentextensions/DFABytecodeCompiler.cpp:100
> +        m_nodeStartOffsets[index] = std::numeric_limits<unsigned>::max();

You could add an assertion when linking that this value is never used.

> Source/WebCore/contentextensions/DFAMinimizer.h:37
> +namespace DFAMinimizer {

It seems a little excessive to make a namespace just for this.  This is WebCore, where we don't believe in namespaces ;)
What about a class with static methods like NFAToDFA?

> Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:732
> +TEST_F(ContentExtensionTest, MinimizingWithMoreFinalStatesThanNonFinalStates)

It would be nice to add an EXPECT_EQ with the number of non-killed DFA nodes.  I don't think these tests accurately test minimization.
Comment 4 Gavin Barraclough 2015-04-08 12:57:02 PDT
Ben, do you have any performance results? – say, a percent reduction on nodes in the graph for some representative input?
Comment 5 Benjamin Poulain 2015-04-08 13:07:42 PDT
(In reply to comment #4)
> Ben, do you have any performance results? – say, a percent reduction on
> nodes in the graph for some representative input?

I was working with one of the two lists. The bytecode size shrinks by >15% IIRC.
Comment 6 Alex Christensen 2015-04-08 13:24:24 PDT
Here is the output of a large content extension with CONTENT_EXTENSIONS_PERFORMANCE_REPORTING set to 1:

Time spent loading extension 0.120485
    Time spent partitioning the rules into groups: 0.307346
    Time spent building the NFAs: 0.184733
    Time spent building and compiling the DFAs: 8.248510
    Bytecode size 26787531
    DFA count 2

Here's the output of the same content extension with this patch applied:

Time spent loading extension 0.112486
    Time spent partitioning the rules into groups: 0.298134
    Time spent building the NFAs: 0.203010
    Time spent building the DFA 0: 1.145915
    Time spent miniminizing the DFA 0: 0.889264
    Time spent building the DFA 1: 5.517612
    Time spent miniminizing the DFA 1: 3.532788
    Bytecode size 18704055
    DFA count 2
Comment 7 Alex Christensen 2015-04-09 14:50:52 PDT
Comment on attachment 250312 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=250312&action=review

> Source/WebCore/contentextensions/DFAMinimizer.cpp:2
> + * Copyright (C) 2014, 2015 Apple Inc. All rights reserved.

2014?

> Source/WebCore/contentextensions/DFAMinimizer.cpp:326
> +            m_transitionPartition.refineGeneration([](unsigned) { });

This seems a little strange.

> Source/WebCore/contentextensions/DFAMinimizer.cpp:435
> +        hasher.addCharactersAssumingAligned(reinterpret_cast<const UChar*>(actions.data()), sizeof(uint64_t) / sizeof(UChar));

Don't you want actions.size() in here?
Comment 8 Benjamin Poulain 2015-04-13 15:48:46 PDT
Created attachment 250680 [details]
Patch
Comment 9 Benjamin Poulain 2015-04-13 15:49:45 PDT
(In reply to comment #3)
> > Source/WebCore/ChangeLog:110
> > +        Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
> > +        work on this patch. Thanks for your wonderful papers about DFA minimization.
> 
> It might be nice to put links to the free articles here for reference. 
> http://arxiv.org/pdf/0802.2826v1.pdf etc.

Both teams published several papers, I did not bother linking to every one of them.
Comment 10 Alex Christensen 2015-04-13 17:42:35 PDT
Comment on attachment 250680 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=250680&action=review

r=me, with some comments:

> Source/WebCore/contentextensions/DFABytecodeCompiler.cpp:224
> +        set32Bits(m_bytecode, offset, m_nodeStartOffsets[linkRecord.second]);

Add this assertion here:
ASSERT(m_nodeStartOffsets[linkRecord.second] != std::numeric_limits<unsigned>::max());

> Source/WebCore/contentextensions/DFAMinimizer.cpp:37
> +namespace {

Does this add any useful organization?

> Source/WebCore/contentextensions/DFAMinimizer.cpp:102
> +        m_sets.append(SetDescriptor({ 0, size, 0 }));

m_sets.append({ 0, size, 0 });

> Source/WebCore/contentextensions/DFAMinimizer.cpp:171
> +    void iterateSets(const Function& function)

I believe this is unused.  Remove!

> Source/WebCore/contentextensions/DFAMinimizer.cpp:188
> +    size_t findInSet(unsigned setIndex, const TestFunction& testFunction)

Also unused! Remove!

> Source/WebCore/contentextensions/DFAMinimizer.cpp:206
> +    unsigned firstElementInSet(unsigned setIndex) const

This is not mentioned in the ChangeLog entry.

> Source/WebCore/contentextensions/DFAMinimizer.cpp:303
> +                m_flattenedTransitions[start + offset] = Transition({ i, targetNodeIndex, transition.key });

I think this also unnecessarily calls the copy constructor.

> Source/WebCore/contentextensions/DFAMinimizer.cpp:356
> +            m_transitionPartition.refineGeneration([](unsigned) { });
> +            m_fallbackTransitionPartition.refineGeneration([](unsigned) { });

These need a comment explaining why it is a good idea to use a lambda that does nothing.

> Source/WebCore/contentextensions/DFAMinimizer.cpp:418
> +struct Transition {

This is defined as a private struct on line 397.  Remove this unused definition.

> Source/WebCore/contentextensions/DFAMinimizer.cpp:469
> +}

Is this the end of the unnamed namespace?  If we keep that, there should be a comment here.

> Tools/TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:70
> +    EXPECT_EQ(static_cast<size_t>(10), countLiveNodes(dfa));

10u should work here instead of static_cast<size_t>(10).  Similar elsewhere in this file.
Comment 11 Benjamin Poulain 2015-04-14 14:09:02 PDT
Committed r182808: <http://trac.webkit.org/changeset/182808>