Bug 144485

Summary: [Content Extensions] Redo NFA generation
Product: WebKit Reporter: Alex Christensen <achristensen>
Component: WebCore Misc.Assignee: Alex Christensen <achristensen>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, sam
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch benjamin: review+

Description Alex Christensen 2015-04-30 18:48:39 PDT
We want to delete the prefix tree as we generate the NFA to decrease the max memory usage, and we also want to combine all the regexes with the same prefix up to the last quantified term into one NFA to decrease the number of NFAs.
Comment 1 Alex Christensen 2015-04-30 18:56:21 PDT
Created attachment 252129 [details]
Patch
Comment 2 Alex Christensen 2015-04-30 18:58:56 PDT
Comment on attachment 252129 [details]
Patch

This patch doesn't quite pass all the api tests, and it uses recursion instead of a stack and a loop, but it's *pretty good* work in progress.  Good enough to upload.
Comment 3 Alex Christensen 2015-05-01 11:45:33 PDT
Created attachment 252160 [details]
Patch
Comment 4 Alex Christensen 2015-05-01 11:46:42 PDT
This last patch passes most of the tests, but when I make multiple NFAs it generates too many NFA nodes.  I think something is wrong with my calls to generateGraph.  Ben, could you take a look?
Comment 5 Benjamin Poulain 2015-05-01 14:12:02 PDT
Comment on attachment 252160 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=252160&action=review

Some quick comments:

> Source/WebCore/contentextensions/CombinedURLFilters.cpp:171
> +        if (edge.term.quantity() != AtomQuantifier::One)
> +            continue; // This will be a part of another NFA.

That does not look correct. You can have quantifier in subterm of a group. That's why hasFixedLength() exists.

> Source/WebCore/contentextensions/CombinedURLFilters.cpp:173
> +            edge.term.destroy(); // Mark this leaf for deleting.

Instead of making destroy() public, can't you just have:
    edge.term = Term(DeletedValue);
?

> Source/WebCore/contentextensions/CombinedURLFilters.cpp:202
> +        while (!stack.isEmpty() && stack.last().vertex->edges.last().term.quantity() == AtomQuantifier::One)

Please don't use quantifiers directly.
Comment 6 Alex Christensen 2015-05-02 11:13:45 PDT
Created attachment 252240 [details]
Patch
Comment 7 Alex Christensen 2015-05-02 11:15:33 PDT
Comment on attachment 252240 [details]
Patch

This latest patch passes all the current tests (HOORAY!).  I still need to add some tests, make it non-recursive, and address Ben's comments.
Comment 8 Alex Christensen 2015-05-02 12:20:40 PDT
We will still need to combine small DFAs even after this patch.
Comment 9 Alex Christensen 2015-05-02 23:31:42 PDT
Created attachment 252260 [details]
Patch
Comment 10 Alex Christensen 2015-05-02 23:38:25 PDT
Created attachment 252261 [details]
Patch
Comment 11 Alex Christensen 2015-05-02 23:38:52 PDT
This last patch should be pretty final and ready for real review.
Comment 12 Alex Christensen 2015-05-02 23:53:59 PDT
Created attachment 252262 [details]
Patch
Comment 13 Alex Christensen 2015-05-04 08:40:52 PDT
Created attachment 252312 [details]
Patch
Comment 14 Alex Christensen 2015-05-04 08:55:25 PDT
Created attachment 252313 [details]
Patch
Comment 15 Simon Fraser (smfr) 2015-05-04 16:23:37 PDT
Comment on attachment 252313 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=252313&action=review

> Source/WebCore/contentextensions/CombinedURLFilters.cpp:170
> +    /*
> +    // Non-recursive form of this:
> +    for (auto& edge : root.edges) {
> +        if (!edge.term.hasFixedLength())
> +            continue; // This will be a part of another NFA.
> +        unsigned subtreeRootId = edge.term.generateGraph(nfa, nfaRootId, edge.child->finalActions);
> +        generateNFAForSubtree(nfa, subtreeRootId, *edge.child.get());
> +        if (edge.child->edges.isEmpty())
> +            edge.term = Term(Term::DeletedValue); // Mark this leaf for deleting.
> +    }
> +    root.edges.removeAllMatching([](PrefixTreeEdge& edge)
> +    {
> +        return edge.term.isDeletedValue();
> +    });
> +    */

Commented-out code?
Comment 16 Benjamin Poulain 2015-05-04 22:10:02 PDT
Comment on attachment 252313 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=252313&action=review

Looks great. Sorry for the review delay.

> Source/WebCore/contentextensions/CombinedURLFilters.cpp:157
> +    /*
> +    // Non-recursive form of this:

Did you intend to leave this?

I think a description of the algorithm in a few paragraph would be more valuable than the code.

> Source/WebCore/contentextensions/CombinedURLFilters.cpp:180
> +        unsigned rootId;

nfaNodeId?

> Source/WebCore/contentextensions/CombinedURLFilters.cpp:192
> +        if (edgeIndex >= vertex.edges.size()) {

I would inverse the two blocks of the if and the condition.

The reason is readability. As you read, you can read them in order, the first case being the recursive step, the second being the edge case.

> Source/WebCore/contentextensions/CombinedURLFilters.cpp:200
> +                auto& edge = stack.last().vertex.edges[stack.last().edgeIndex];

Let's put stack.last() in a temporary.

> Source/WebCore/contentextensions/CombinedURLFilters.cpp:262
> +        // FIXME: Combine small NFAs to reduce the number of NFAs.

That will be done later, outside this class. That FIXME is at the wrong place.

> Source/WebCore/contentextensions/CombinedURLFilters.cpp:268
> +                if (stack.last()->edges.isEmpty()) {

It might be clearer to use stack[stack.size() - 1] here to make the next line more obvious?

> Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:537
> +    const unsigned size = 100000;

You went for really big :)

> Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:542
> +    // FIXME: DFAToNFA::convert takes way too long on these deep NFAs. We should optimize for that case.

That's weird, that seems like an easy case.
Comment 17 Alex Christensen 2015-05-05 10:29:15 PDT
http://trac.webkit.org/changeset/183818