RESOLVED FIXED 144485
[Content Extensions] Redo NFA generation
https://bugs.webkit.org/show_bug.cgi?id=144485
Summary [Content Extensions] Redo NFA generation
Alex Christensen
Reported 2015-04-30 18:48:39 PDT
We want to delete the prefix tree as we generate the NFA to decrease the max memory usage, and we also want to combine all the regexes with the same prefix up to the last quantified term into one NFA to decrease the number of NFAs.
Attachments
Patch (15.04 KB, patch)
2015-04-30 18:56 PDT, Alex Christensen
no flags
Patch (12.14 KB, patch)
2015-05-01 11:45 PDT, Alex Christensen
no flags
Patch (12.43 KB, patch)
2015-05-02 11:13 PDT, Alex Christensen
no flags
Patch (19.66 KB, patch)
2015-05-02 23:31 PDT, Alex Christensen
no flags
Patch (19.83 KB, patch)
2015-05-02 23:38 PDT, Alex Christensen
no flags
Patch (20.01 KB, patch)
2015-05-02 23:53 PDT, Alex Christensen
no flags
Patch (20.01 KB, patch)
2015-05-04 08:40 PDT, Alex Christensen
no flags
Patch (22.42 KB, patch)
2015-05-04 08:55 PDT, Alex Christensen
benjamin: review+
Alex Christensen
Comment 1 2015-04-30 18:56:21 PDT
Alex Christensen
Comment 2 2015-04-30 18:58:56 PDT
Comment on attachment 252129 [details] Patch This patch doesn't quite pass all the api tests, and it uses recursion instead of a stack and a loop, but it's *pretty good* work in progress. Good enough to upload.
Alex Christensen
Comment 3 2015-05-01 11:45:33 PDT
Alex Christensen
Comment 4 2015-05-01 11:46:42 PDT
This last patch passes most of the tests, but when I make multiple NFAs it generates too many NFA nodes. I think something is wrong with my calls to generateGraph. Ben, could you take a look?
Benjamin Poulain
Comment 5 2015-05-01 14:12:02 PDT
Comment on attachment 252160 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=252160&action=review Some quick comments: > Source/WebCore/contentextensions/CombinedURLFilters.cpp:171 > + if (edge.term.quantity() != AtomQuantifier::One) > + continue; // This will be a part of another NFA. That does not look correct. You can have quantifier in subterm of a group. That's why hasFixedLength() exists. > Source/WebCore/contentextensions/CombinedURLFilters.cpp:173 > + edge.term.destroy(); // Mark this leaf for deleting. Instead of making destroy() public, can't you just have: edge.term = Term(DeletedValue); ? > Source/WebCore/contentextensions/CombinedURLFilters.cpp:202 > + while (!stack.isEmpty() && stack.last().vertex->edges.last().term.quantity() == AtomQuantifier::One) Please don't use quantifiers directly.
Alex Christensen
Comment 6 2015-05-02 11:13:45 PDT
Alex Christensen
Comment 7 2015-05-02 11:15:33 PDT
Comment on attachment 252240 [details] Patch This latest patch passes all the current tests (HOORAY!). I still need to add some tests, make it non-recursive, and address Ben's comments.
Alex Christensen
Comment 8 2015-05-02 12:20:40 PDT
We will still need to combine small DFAs even after this patch.
Alex Christensen
Comment 9 2015-05-02 23:31:42 PDT
Alex Christensen
Comment 10 2015-05-02 23:38:25 PDT
Alex Christensen
Comment 11 2015-05-02 23:38:52 PDT
This last patch should be pretty final and ready for real review.
Alex Christensen
Comment 12 2015-05-02 23:53:59 PDT
Alex Christensen
Comment 13 2015-05-04 08:40:52 PDT
Alex Christensen
Comment 14 2015-05-04 08:55:25 PDT
Simon Fraser (smfr)
Comment 15 2015-05-04 16:23:37 PDT
Comment on attachment 252313 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=252313&action=review > Source/WebCore/contentextensions/CombinedURLFilters.cpp:170 > + /* > + // Non-recursive form of this: > + for (auto& edge : root.edges) { > + if (!edge.term.hasFixedLength()) > + continue; // This will be a part of another NFA. > + unsigned subtreeRootId = edge.term.generateGraph(nfa, nfaRootId, edge.child->finalActions); > + generateNFAForSubtree(nfa, subtreeRootId, *edge.child.get()); > + if (edge.child->edges.isEmpty()) > + edge.term = Term(Term::DeletedValue); // Mark this leaf for deleting. > + } > + root.edges.removeAllMatching([](PrefixTreeEdge& edge) > + { > + return edge.term.isDeletedValue(); > + }); > + */ Commented-out code?
Benjamin Poulain
Comment 16 2015-05-04 22:10:02 PDT
Comment on attachment 252313 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=252313&action=review Looks great. Sorry for the review delay. > Source/WebCore/contentextensions/CombinedURLFilters.cpp:157 > + /* > + // Non-recursive form of this: Did you intend to leave this? I think a description of the algorithm in a few paragraph would be more valuable than the code. > Source/WebCore/contentextensions/CombinedURLFilters.cpp:180 > + unsigned rootId; nfaNodeId? > Source/WebCore/contentextensions/CombinedURLFilters.cpp:192 > + if (edgeIndex >= vertex.edges.size()) { I would inverse the two blocks of the if and the condition. The reason is readability. As you read, you can read them in order, the first case being the recursive step, the second being the edge case. > Source/WebCore/contentextensions/CombinedURLFilters.cpp:200 > + auto& edge = stack.last().vertex.edges[stack.last().edgeIndex]; Let's put stack.last() in a temporary. > Source/WebCore/contentextensions/CombinedURLFilters.cpp:262 > + // FIXME: Combine small NFAs to reduce the number of NFAs. That will be done later, outside this class. That FIXME is at the wrong place. > Source/WebCore/contentextensions/CombinedURLFilters.cpp:268 > + if (stack.last()->edges.isEmpty()) { It might be clearer to use stack[stack.size() - 1] here to make the next line more obvious? > Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:537 > + const unsigned size = 100000; You went for really big :) > Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:542 > + // FIXME: DFAToNFA::convert takes way too long on these deep NFAs. We should optimize for that case. That's weird, that seems like an easy case.
Alex Christensen
Comment 17 2015-05-05 10:29:15 PDT
Note You need to log in before you can comment on or make changes to this bug.