RESOLVED FIXED 142998
Make URL filter patterns matching consistent and add a simple canonicalization step
https://bugs.webkit.org/show_bug.cgi?id=142998
Summary Make URL filter patterns matching consistent and add a simple canonicalizatio...
Benjamin Poulain
Reported 2015-03-23 21:30:48 PDT
Make URL filter patterns matching consistent and add a simple canonicalization step
Attachments
Patch (28.66 KB, patch)
2015-03-23 21:37 PDT, Benjamin Poulain
no flags
Patch (31.73 KB, patch)
2015-03-24 14:25 PDT, Benjamin Poulain
achristensen: review+
Benjamin Poulain
Comment 1 2015-03-23 21:37:50 PDT
Alex Christensen
Comment 2 2015-03-24 10:53:30 PDT
Comment on attachment 249311 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=249311&action=review Hooray! Fewer DFA nodes! This looks good, but there are a few things that need to be addressed. > Source/WebCore/contentextensions/URLFilterParser.cpp:260 > + // ".*" is the only simple term matching any string. > + return isUniversalTransition() && m_quantifier == AtomQuantifier::ZeroOrMore; You forgot about our good friend .? > Source/WebCore/contentextensions/URLFilterParser.cpp:273 > + // -(.)*, (.+)*, (.?)* etc. Put some parsing test cases in for these. testPatternStatus("(.?)*", ContentExtensions::URLFilterParser::ParseStatus::MatchesEverything); etc. > Source/WebCore/contentextensions/URLFilterParser.cpp:768 > + Term canonicalDotStart(Term::UniversalTransition); canonicalDotStar > Source/WebCore/contentextensions/URLFilterParser.cpp:774 > + bool isAfterDotStart = false; isAfterDotStar > Source/WebCore/contentextensions/URLFilterParser.cpp:777 > + m_sunkTerms.remove(termIndex); This is removing from the middle of a vector, which could be slow. I think we should use a Deque for m_sunkTerms, only remove the first term while it matches everything, then prepend the canonicalDotStar if we removed anything. > Source/WebCore/contentextensions/URLFilterParser.cpp:795 > + if (m_sunkTerms.size() > 2 && m_sunkTerms.last().isEndOfLineAssertion() && m_sunkTerms[m_sunkTerms.size() - 1].isKnownToMatchAnyString()) m_sunkTerms[m_sunkTerms.size() - 2].isKnownToMatchAnyString(), right? > Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:348 > + const char* termsKnownToMatchAnythingFilter = "[{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^pre1.*post1$\"}}, {\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^pre2(.*)post2$\"}}, {\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^pre3(.*)?post3$\"}}, {\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^pre4(.*)+post4$\"}}, {\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^pre5(.*)*post5$\"}}, {\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^pre6(.)*post6$\"}}, {\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^pre7(.+)*post7$\"}}, {\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^pre8(.?)*post8$\"}}, {\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^pre9(.+)?post9$\"}}, {\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^pre9(.?)+post9$\"}}]"; Put each action on its own line here and elsewhere. This will match the other tests. Also, you have two pre9/post9 pairs. Replace one with pre0/post0 and add a test for it. > Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:355 > + testRequest(backend, mainDocumentRequest("pre1://webkit.org/post1"), { ContentExtensions::ActionType::BlockLoad }); Put some failures in like pre1://webkit.org/post2 > Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:416 > + // The patten must finish with cat, with any number of of any number of s's following it.
Benjamin Poulain
Comment 3 2015-03-24 14:23:27 PDT
Comment on attachment 249311 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=249311&action=review >> Source/WebCore/contentextensions/URLFilterParser.cpp:260 >> + return isUniversalTransition() && m_quantifier == AtomQuantifier::ZeroOrMore; > > You forgot about our good friend .? I think you are mixing this with matchesAtLeastOneCharacter(). ".?" does not match any string. It only match a string of size 0 or 1. >> Source/WebCore/contentextensions/URLFilterParser.cpp:777 >> + m_sunkTerms.remove(termIndex); > > This is removing from the middle of a vector, which could be slow. I think we should use a Deque for m_sunkTerms, only remove the first term while it matches everything, then prepend the canonicalDotStar if we removed anything. I don't think it is worth speculating the performance of this yet. This should be dwarfed by any operations of the DFA graph since it is so much bigger.
Benjamin Poulain
Comment 4 2015-03-24 14:25:21 PDT
Alex Christensen
Comment 5 2015-03-24 16:00:11 PDT
Comment on attachment 249353 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=249353&action=review > Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:458 > + // The patten must finish with cat, with any number of 's' following it, but no other character. pattern
Benjamin Poulain
Comment 6 2015-03-24 16:25:15 PDT
Note You need to log in before you can comment on or make changes to this bug.