WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
142998
Make URL filter patterns matching consistent and add a simple canonicalization step
https://bugs.webkit.org/show_bug.cgi?id=142998
Summary
Make URL filter patterns matching consistent and add a simple canonicalizatio...
Benjamin Poulain
Reported
2015-03-23 21:30:48 PDT
Make URL filter patterns matching consistent and add a simple canonicalization step
Attachments
Patch
(28.66 KB, patch)
2015-03-23 21:37 PDT
,
Benjamin Poulain
no flags
Details
Formatted Diff
Diff
Patch
(31.73 KB, patch)
2015-03-24 14:25 PDT
,
Benjamin Poulain
achristensen
: review+
Details
Formatted Diff
Diff
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
Benjamin Poulain
Comment 1
2015-03-23 21:37:50 PDT
Created
attachment 249311
[details]
Patch
Alex Christensen
Comment 2
2015-03-24 10:53:30 PDT
Comment on
attachment 249311
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=249311&action=review
Hooray! Fewer DFA nodes! This looks good, but there are a few things that need to be addressed.
> Source/WebCore/contentextensions/URLFilterParser.cpp:260 > + // ".*" is the only simple term matching any string. > + return isUniversalTransition() && m_quantifier == AtomQuantifier::ZeroOrMore;
You forgot about our good friend .?
> Source/WebCore/contentextensions/URLFilterParser.cpp:273 > + // -(.)*, (.+)*, (.?)* etc.
Put some parsing test cases in for these. testPatternStatus("(.?)*", ContentExtensions::URLFilterParser::ParseStatus::MatchesEverything); etc.
> Source/WebCore/contentextensions/URLFilterParser.cpp:768 > + Term canonicalDotStart(Term::UniversalTransition);
canonicalDotStar
> Source/WebCore/contentextensions/URLFilterParser.cpp:774 > + bool isAfterDotStart = false;
isAfterDotStar
> Source/WebCore/contentextensions/URLFilterParser.cpp:777 > + m_sunkTerms.remove(termIndex);
This is removing from the middle of a vector, which could be slow. I think we should use a Deque for m_sunkTerms, only remove the first term while it matches everything, then prepend the canonicalDotStar if we removed anything.
> Source/WebCore/contentextensions/URLFilterParser.cpp:795 > + if (m_sunkTerms.size() > 2 && m_sunkTerms.last().isEndOfLineAssertion() && m_sunkTerms[m_sunkTerms.size() - 1].isKnownToMatchAnyString())
m_sunkTerms[m_sunkTerms.size() - 2].isKnownToMatchAnyString(), right?
> Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:348 > + const char* termsKnownToMatchAnythingFilter = "[{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^pre1.*post1$\"}}, {\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^pre2(.*)post2$\"}}, {\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^pre3(.*)?post3$\"}}, {\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^pre4(.*)+post4$\"}}, {\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^pre5(.*)*post5$\"}}, {\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^pre6(.)*post6$\"}}, {\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^pre7(.+)*post7$\"}}, {\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^pre8(.?)*post8$\"}}, {\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^pre9(.+)?post9$\"}}, {\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^pre9(.?)+post9$\"}}]";
Put each action on its own line here and elsewhere. This will match the other tests. Also, you have two pre9/post9 pairs. Replace one with pre0/post0 and add a test for it.
> Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:355 > + testRequest(backend, mainDocumentRequest("pre1://webkit.org/post1"), { ContentExtensions::ActionType::BlockLoad });
Put some failures in like pre1://webkit.org/post2
> Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:416 > + // The patten must finish with cat, with any number of of
any number of s's following it.
Benjamin Poulain
Comment 3
2015-03-24 14:23:27 PDT
Comment on
attachment 249311
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=249311&action=review
>> Source/WebCore/contentextensions/URLFilterParser.cpp:260 >> + return isUniversalTransition() && m_quantifier == AtomQuantifier::ZeroOrMore; > > You forgot about our good friend .?
I think you are mixing this with matchesAtLeastOneCharacter(). ".?" does not match any string. It only match a string of size 0 or 1.
>> Source/WebCore/contentextensions/URLFilterParser.cpp:777 >> + m_sunkTerms.remove(termIndex); > > This is removing from the middle of a vector, which could be slow. I think we should use a Deque for m_sunkTerms, only remove the first term while it matches everything, then prepend the canonicalDotStar if we removed anything.
I don't think it is worth speculating the performance of this yet. This should be dwarfed by any operations of the DFA graph since it is so much bigger.
Benjamin Poulain
Comment 4
2015-03-24 14:25:21 PDT
Created
attachment 249353
[details]
Patch
Alex Christensen
Comment 5
2015-03-24 16:00:11 PDT
Comment on
attachment 249353
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=249353&action=review
> Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:458 > + // The patten must finish with cat, with any number of 's' following it, but no other character.
pattern
Benjamin Poulain
Comment 6
2015-03-24 16:25:15 PDT
Committed
r181917
: <
http://trac.webkit.org/changeset/181917
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug