Bug 142257

Summary: Add basic support for character sets to the URL Filter parser
Product: WebKit Reporter: Benjamin Poulain <benjamin>
Component: New BugsAssignee: Benjamin Poulain <benjamin>
Status: RESOLVED FIXED    
Severity: Normal CC: achristensen, barraclough, bfulgham, buildbot, commit-queue, rniwa
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Archive of layout-test-results from ews106 for mac-mavericks-wk2
none
Patch
none
Patch
none
Patch none

Description Benjamin Poulain 2015-03-03 21:04:54 PST
Add basic support for character sets to the URL Filter parser
Comment 1 Benjamin Poulain 2015-03-03 21:11:36 PST
Created attachment 247834 [details]
Patch
Comment 2 Build Bot 2015-03-03 22:07:20 PST
Comment on attachment 247834 [details]
Patch

Attachment 247834 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.appspot.com/results/6528593282727936

New failing tests:
http/tests/usercontentfilter/basic-filter.html
Comment 3 Build Bot 2015-03-03 22:07:23 PST
Created attachment 247840 [details]
Archive of layout-test-results from ews106 for mac-mavericks-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews106  Port: mac-mavericks-wk2  Platform: Mac OS X 10.9.5
Comment 4 Benjamin Poulain 2015-03-04 14:12:17 PST
Created attachment 247896 [details]
Patch
Comment 5 WebKit Commit Bot 2015-03-04 14:14:21 PST
Attachment 247896 [details] did not pass style-queue:


ERROR: Source/WebCore/contentextensions/URLFilterParser.cpp:71:  static_cast is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
ERROR: Source/WebCore/contentextensions/URLFilterParser.cpp:73:  static_cast is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
ERROR: Source/WebCore/contentextensions/URLFilterParser.cpp:75:  static_cast is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
ERROR: Source/WebCore/contentextensions/URLFilterParser.cpp:77:  static_cast is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
Total errors found: 4 in 2 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 6 Alex Christensen 2015-03-04 15:03:54 PST
Comment on attachment 247896 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=247896&action=review

I need a bit more time to look at this, but I already have a few questions:

> Source/WebCore/contentextensions/URLFilterParser.cpp:58
>      ZeroToMany = 0x2000,
>      OneToMany = 0x4000

I think these should be named ZeroOrMore and OneOrMore.

> Source/WebCore/contentextensions/URLFilterParser.cpp:261
> +        // Nothing to do here. The character set atom may have a quantifier, we sink the atom on lazily.

"atom on lazily" -> "atom lazily"
Why are we sinking the atom lazily instead of sinking it here?

> Source/WebCore/contentextensions/URLFilterParser.cpp:266
> +        fail(ASCIILiteral("Buildins character class atoms are not supported yet."));

Buildins

> Source/WebCore/contentextensions/URLFilterParser.cpp:499
> +    BitVector m_pendingCharacterSet { 128 };

A BitVector this big would always allocate a buffer.  It would be more efficient to use 2 uint64_t's.  I wish we had a uint128_t.  A lot of SIMD registers have 128 bits.  Could we eventually use those?
Comment 7 Alex Christensen 2015-03-04 17:17:03 PST
Comment on attachment 247896 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=247896&action=review

> Source/WebCore/contentextensions/URLFilterParser.cpp:70
> +    switch (trivialAtom & 0xf000) {

static_cast<AtomQuantifier>(trivialAtom & 0xf00) here to avoid static_cast<unsigned> at each case.  Then the switch will complain if we miss an enum value.
Comment 8 Benjamin Poulain 2015-03-04 17:44:24 PST
(In reply to comment #7)
> Comment on attachment 247896 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=247896&action=review
> 
> > Source/WebCore/contentextensions/URLFilterParser.cpp:70
> > +    switch (trivialAtom & 0xf000) {
> 
> static_cast<AtomQuantifier>(trivialAtom & 0xf00) here to avoid
> static_cast<unsigned> at each case.  Then the switch will complain if we
> miss an enum value.

That is semantically worse though. I like to make sure enum class never take incorrect values by design.
Comment 9 Benjamin Poulain 2015-03-04 18:15:11 PST
Created attachment 247916 [details]
Patch
Comment 10 WebKit Commit Bot 2015-03-04 18:18:07 PST
Attachment 247916 [details] did not pass style-queue:


ERROR: Source/WebCore/contentextensions/URLFilterParser.cpp:71:  static_cast is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
ERROR: Source/WebCore/contentextensions/URLFilterParser.cpp:73:  static_cast is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
ERROR: Source/WebCore/contentextensions/URLFilterParser.cpp:75:  static_cast is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
ERROR: Source/WebCore/contentextensions/URLFilterParser.cpp:77:  static_cast is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
Total errors found: 4 in 2 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 11 Alex Christensen 2015-03-04 20:08:37 PST
Comment on attachment 247916 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=247916&action=review

This looks good to me.  I don't completely understand sinkPendingAtomIfNecessary, but the rest looks good.  Lambdas calling different generateTransition functions confused me a little, but I figured out what was going on.

I really think this needs unit tests.  Tests should have some regular expressions, some strings they match, and some strings they do not match, as well as some invalid regular expressions that fail in each possible way to fail.

> Source/WebCore/contentextensions/URLFilterParser.cpp:255
> +        for (unsigned i = a; i <= b; ++i)

This seems to have correct behavior.  [z-a] should not match anything.

> Source/WebCore/contentextensions/URLFilterParser.cpp:381
> +    void generateTransition(BitVector characterSet, bool isInverted, unsigned source, unsigned target)

A lot of these should probably be const BitVector&.  Here and elsewhere.

> Source/WebCore/contentextensions/URLFilterParser.cpp:386
> +                char character = static_cast<char>(characterIterator);

Is the characterIterator just an integer that counts from 0 to 127 if that bit is set in the BitVector?

> Source/WebCore/contentextensions/URLFilterParser.cpp:422
> +        case BuildMode::PrefixTree: {

The only thing I don't understand is the difference between PrefixTree and DirectGeneration.
Comment 12 Alex Christensen 2015-03-04 20:37:30 PST
Comment on attachment 247916 [details]
Patch

r=me

This needs some serious unit testing, but we can land that in a separate patch.
Comment 13 Benjamin Poulain 2015-03-05 15:22:15 PST
Created attachment 248010 [details]
Patch
Comment 14 WebKit Commit Bot 2015-03-05 15:25:34 PST
Attachment 248010 [details] did not pass style-queue:


ERROR: Source/WebCore/contentextensions/URLFilterParser.cpp:71:  static_cast is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
ERROR: Source/WebCore/contentextensions/URLFilterParser.cpp:73:  static_cast is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
ERROR: Source/WebCore/contentextensions/URLFilterParser.cpp:75:  static_cast is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
ERROR: Source/WebCore/contentextensions/URLFilterParser.cpp:77:  static_cast is incorrectly named. Don't use underscores in your identifier names.  [readability/naming/underscores] [4]
Total errors found: 4 in 7 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 15 Benjamin Poulain 2015-03-05 15:47:42 PST
Comment on attachment 248010 [details]
Patch

Clearing flags on attachment: 248010

Committed r181107: <http://trac.webkit.org/changeset/181107>
Comment 16 Benjamin Poulain 2015-03-05 15:47:47 PST
All reviewed patches have been landed.  Closing bug.
Comment 17 Brent Fulgham 2015-03-05 20:35:58 PST
This patch is causing failures on EFL, Gtk, Mavericks, Yosemite and Windows test bots <http://webkit-test-results.appspot.com/dashboards/flakiness_dashboard.html#showAllRuns=true&tests=http%2Ftests%2Fusercontentfilter%2Fcharacter-set-basic-support.html>.

Stephanie checked in a Skip of this <https://trac.webkit.org/changeset/181129>, but we should not be landing tests that cause all the bots to fail.
Comment 18 Benjamin Poulain 2015-03-05 20:39:11 PST
(In reply to comment #17)
> This patch is causing failures on EFL, Gtk, Mavericks, Yosemite and Windows
> test bots
> <http://webkit-test-results.appspot.com/dashboards/flakiness_dashboard.
> html#showAllRuns=true&tests=http%2Ftests%2Fusercontentfilter%2Fcharacter-set-
> basic-support.html>.
> 
> Stephanie checked in a Skip of this
> <https://trac.webkit.org/changeset/181129>, but we should not be landing
> tests that cause all the bots to fail.

What happened is Alex moved the tests in a new repository before I landed this: http://trac.webkit.org/changeset/181031

As a result, this test was not run as it should have been.