Bug 142799 - content extensions should have special path for global selectors
Summary: content extensions should have special path for global selectors
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Alex Christensen
URL:
Keywords:
: 142298 (view as bug list)
Depends on:
Blocks:
 
Reported: 2015-03-17 15:47 PDT by Alex Christensen
Modified: 2015-03-19 08:58 PDT (History)
3 users (show)

See Also:


Attachments
Patch (15.64 KB, patch)
2015-03-17 15:54 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (9.65 KB, patch)
2015-03-17 16:42 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (5.54 KB, patch)
2015-03-17 17:44 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (20.26 KB, patch)
2015-03-18 11:30 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (34.79 KB, patch)
2015-03-18 18:30 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (30.79 KB, patch)
2015-03-18 19:26 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alex Christensen 2015-03-17 15:47:58 PDT
Lots of selectors will apply to every page, so we should have a special path to compile these once and use them automatically.
Comment 1 Alex Christensen 2015-03-17 15:54:31 PDT
Created attachment 248880 [details]
Patch
Comment 2 Brady Eidson 2015-03-17 16:09:02 PDT
Comment on attachment 248880 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=248880&action=review

> Source/WebCore/contentextensions/CompiledContentExtension.cpp:50
> +    const SerializedActionByte* begin = actions();
> +    unsigned length = actionsLength();
> +    RELEASE_ASSERT(length >= sizeof(unsigned));
> +    unsigned selectorCount = *reinterpret_cast<const unsigned*>(begin);
> +    Vector<String> selectors;
> +    selectors.reserveInitialCapacity(selectorCount);
> +    unsigned position = sizeof(unsigned);
> +    for (unsigned i = 0; i < selectorCount; i++)
> +        selectors.append(deserializeSelector(begin, length, position));
> +    return selectors;

Could use more paragraphing (empty lines between obviously connected chunks of code) - This is dense to read through.

> Source/WebCore/contentextensions/ContentExtensionCompiler.cpp:60
> +    unsigned stringStartIndex = location + sizeof(unsigned) + sizeof(bool);
> +    RELEASE_ASSERT(actionsLength >= stringStartIndex);
> +    unsigned selectorLength = *reinterpret_cast<const unsigned*>(&actions[location]);
> +    bool wideCharacters = actions[location + sizeof(ActionType) + sizeof(unsigned)];
> +    if (wideCharacters) {
> +        RELEASE_ASSERT(actionsLength >= stringStartIndex + selectorLength * sizeof(UChar));
> +        location += sizeof(unsigned) + sizeof(bool) + selectorLength * sizeof(UChar);
> +        return String(reinterpret_cast<const UChar*>(&actions[stringStartIndex]), selectorLength);
> +    }
> +    RELEASE_ASSERT(actionsLength >= stringStartIndex + selectorLength * sizeof(LChar));
> +    location += sizeof(unsigned) + sizeof(bool) + selectorLength * sizeof(LChar);
> +    return String(reinterpret_cast<const LChar*>(&actions[stringStartIndex]), selectorLength);

Ditto

> Source/WebCore/contentextensions/ContentExtensionCompiler.cpp:79
> +    unsigned selectorLength = selector.length();
> +    actions.resize(actions.size() + sizeof(unsigned));
> +    *reinterpret_cast<unsigned*>(&actions[actions.size() - sizeof(unsigned)]) = selectorLength;
> +    bool wideCharacters = !selector.is8Bit();
> +    actions.append(wideCharacters);
> +    if (wideCharacters) {
> +        for (unsigned i = 0; i < selectorLength; i++) {
> +            actions.resize(actions.size() + sizeof(UChar));
> +            *reinterpret_cast<UChar*>(&actions[actions.size() - sizeof(UChar)]) = selector[i];
> +        }
> +    } else {
> +        for (unsigned i = 0; i < selectorLength; i++)
> +            actions.append(selector[i]);
> +    }
> +}

Etc... etc...

> LayoutTests/http/tests/contentextensions/css-display-none.html.json:34
> +    {
> +        "action": {
> +            "type": "css-display-none",
> +            "selector": ".always_hidden"
> +        },
>      }

This patch adds a new option to the json file, which is an action without a trigger.

But what if the json author writes the action with a trigger that always triggers?  i.e. - url-filter: ".*" ?

Shouldn't those go to the global selector collection as well?

And if they do, do we even need the triggerlesss actions at all?
Comment 3 Alex Christensen 2015-03-17 16:16:56 PDT
(In reply to comment #2)
> But what if the json author writes the action with a trigger that always
> triggers?  i.e. - url-filter: ".*" ?
Ben, correct me if I'm wrong, but those would be actions on the root of the DFA, right?
> Shouldn't those go to the global selector collection as well?
Yes.
> And if they do, do we even need the triggerlesss actions at all?
Probably not.  Just have .* mean triggerless and not change the parser.  That's probably a cleaner solution.
Comment 4 Brady Eidson 2015-03-17 16:28:48 PDT
Comment on attachment 248880 [details]
Patch

Based on agreement that things should work a little differently from this, r-
Comment 5 Alex Christensen 2015-03-17 16:42:37 PDT
Created attachment 248885 [details]
Patch
Comment 6 Alex Christensen 2015-03-17 17:20:41 PDT
Comment on attachment 248885 [details]
Patch

A new byte code is not necessary for this.  We should just add a loop at the beginning of the interpreter.  There are many optimizations we could do related to detecting if a regex matches everything.
Comment 7 Alex Christensen 2015-03-17 17:21:46 PDT
And ignore-previous-rules could cause problems.  I think we should start by only allowing rules that match everything at the beginning, so that ignore-previous-rules would simply disable use of the whole compiled stylesheet.
Comment 8 Alex Christensen 2015-03-17 17:44:02 PDT
Created attachment 248891 [details]
Patch
Comment 9 Brady Eidson 2015-03-18 08:38:59 PDT
Comment on attachment 248891 [details]
Patch

I don't see how the DFA code change above can possibly cause the layout test changes in this patch...?
Comment 10 Alex Christensen 2015-03-18 11:30:20 PDT
Created attachment 248945 [details]
Patch
Comment 11 Brady Eidson 2015-03-18 15:42:55 PDT
Comment on attachment 248945 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=248945&action=review

> Source/WebCore/contentextensions/ContentExtensionCompiler.cpp:127
> +                WTFLogAlways("Trigger matching everything found not at beginning.  This may cause incorrect behavior with ignore-previous-rules");

Is it enough to simply log an error for bad outside data? In general we understand that outside data cannot be trusted and might be malformed, and therefore we should handle it correctly.

> Source/WebCore/contentextensions/URLFilterParser.cpp:451
> +        if (matchesEverything)
> +            fail(ASCIILiteral("Matches everything."));

It's unclear why "matches everything" is an error.

Also, representing and error that you reference elsewhere (up above) by a non-shared ASCII literal string seems like a maintenance nightmare.

> Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:350
> +    testPatternError("a*b?.*.?[a-z]?[a-z]*", "Matches everything.");

And is this relevant to the whole "matches everything" string up above?
Comment 12 Alex Christensen 2015-03-18 15:57:27 PDT
Comment on attachment 248945 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=248945&action=review

>> Source/WebCore/contentextensions/ContentExtensionCompiler.cpp:127
>> +                WTFLogAlways("Trigger matching everything found not at beginning.  This may cause incorrect behavior with ignore-previous-rules");
> 
> Is it enough to simply log an error for bad outside data? In general we understand that outside data cannot be trusted and might be malformed, and therefore we should handle it correctly.

I'm not sure.  I think we should fail completely, but Ben thinks we should succeed, but inform developers about semi-bad data like this.

>> Source/WebCore/contentextensions/URLFilterParser.cpp:451
>> +            fail(ASCIILiteral("Matches everything."));
> 
> It's unclear why "matches everything" is an error.
> 
> Also, representing and error that you reference elsewhere (up above) by a non-shared ASCII literal string seems like a maintenance nightmare.

It's not an error.  If a regex matches everything we should not compile it in the NFA because we will put it directly in the root of the DFA later.  

I agree, failure cases should be an enum.  There are a lot of them, and that might be a switch for another patch, though.

>> Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:350
>> +    testPatternError("a*b?.*.?[a-z]?[a-z]*", "Matches everything.");
> 
> And is this relevant to the whole "matches everything" string up above?

Yes.  This is the failure string.
Comment 13 Brady Eidson 2015-03-18 16:07:54 PDT
(In reply to comment #12)
> Comment on attachment 248945 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=248945&action=review
> 
> >> Source/WebCore/contentextensions/ContentExtensionCompiler.cpp:127
> >> +                WTFLogAlways("Trigger matching everything found not at beginning.  This may cause incorrect behavior with ignore-previous-rules");
> > 
> > Is it enough to simply log an error for bad outside data? In general we understand that outside data cannot be trusted and might be malformed, and therefore we should handle it correctly.
> 
> I'm not sure.  I think we should fail completely, but Ben thinks we should
> succeed, but inform developers about semi-bad data like this.

I don't understand why this is not a case that we can make work correctly?

But I would rather see failure with a log message than see incorrect success with an error message.

> 
> >> Source/WebCore/contentextensions/URLFilterParser.cpp:451
> >> +            fail(ASCIILiteral("Matches everything."));
> > 
> > It's unclear why "matches everything" is an error.
> > 
> > Also, representing and error that you reference elsewhere (up above) by a non-shared ASCII literal string seems like a maintenance nightmare.
> 
> It's not an error.  If a regex matches everything we should not compile it
> in the NFA because we will put it directly in the root of the DFA later.  
> 
> I agree, failure cases should be an enum.  There are a lot of them, and that
> might be a switch for another patch, though.

Can we not add new ones in the meantime? Especially since it is apparently just to support a test...?

> 
> >> Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:350
> >> +    testPatternError("a*b?.*.?[a-z]?[a-z]*", "Matches everything.");
> > 
> > And is this relevant to the whole "matches everything" string up above?
> 
> Yes.  This is the failure string.

Having a string in a test hard coded in the code itself just to support the test is highly irregular. Please don't do that?
Comment 14 Alex Christensen 2015-03-18 16:16:59 PDT
(In reply to comment #13)
> (In reply to comment #12)
> > Comment on attachment 248945 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=248945&action=review
> > 
> > >> Source/WebCore/contentextensions/ContentExtensionCompiler.cpp:127
> > >> +                WTFLogAlways("Trigger matching everything found not at beginning.  This may cause incorrect behavior with ignore-previous-rules");
> > > 
> > > Is it enough to simply log an error for bad outside data? In general we understand that outside data cannot be trusted and might be malformed, and therefore we should handle it correctly.
> > 
> > I'm not sure.  I think we should fail completely, but Ben thinks we should
> > succeed, but inform developers about semi-bad data like this.
> 
> I don't understand why this is not a case that we can make work correctly?
If there is ignore-previous-rules between rules that match anything and we compiled a stylesheet of everything that matches everything, then we would not be able to use the compiled stylesheet or we would have incorrect behavior. 
> But I would rather see failure with a log message than see incorrect success
> with an error message.
I agree.  I think it should fail completely.
> 
> > 
> > >> Source/WebCore/contentextensions/URLFilterParser.cpp:451
> > >> +            fail(ASCIILiteral("Matches everything."));
> > > 
> > > It's unclear why "matches everything" is an error.
> > > 
> > > Also, representing and error that you reference elsewhere (up above) by a non-shared ASCII literal string seems like a maintenance nightmare.
> > 
> > It's not an error.  If a regex matches everything we should not compile it
> > in the NFA because we will put it directly in the root of the DFA later.  
> > 
> > I agree, failure cases should be an enum.  There are a lot of them, and that
> > might be a switch for another patch, though.
> 
> Can we not add new ones in the meantime? Especially since it is apparently
> just to support a test...?
It's not just to support a test.  Regular expressions that match everything are not compiled through the NFA because that would just add a lot of transitions, slow down compiling, and end up having all those actions in the root anyways.  We are shortcutting that process to speed up compiling.  I added the test to make sure we are taking the shortcut at the correct times.
> 
> > 
> > >> Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:350
> > >> +    testPatternError("a*b?.*.?[a-z]?[a-z]*", "Matches everything.");
> > > 
> > > And is this relevant to the whole "matches everything" string up above?
> > 
> > Yes.  This is the failure string.
> 
> Having a string in a test hard coded in the code itself just to support the
> test is highly irregular. Please don't do that?
I'll replace fail(const String&) with fail(enum FailureReason)
Comment 15 Brady Eidson 2015-03-18 16:29:47 PDT
(In reply to comment #14)
> (In reply to comment #13)
> > (In reply to comment #12)
> > > Comment on attachment 248945 [details]
> > > Patch
> > > 
> > > View in context:
> > > https://bugs.webkit.org/attachment.cgi?id=248945&action=review
> > > 
> > > >> Source/WebCore/contentextensions/ContentExtensionCompiler.cpp:127
> > > >> +                WTFLogAlways("Trigger matching everything found not at beginning.  This may cause incorrect behavior with ignore-previous-rules");
> > > > 
> > > > Is it enough to simply log an error for bad outside data? In general we understand that outside data cannot be trusted and might be malformed, and therefore we should handle it correctly.
> > > 
> > > I'm not sure.  I think we should fail completely, but Ben thinks we should
> > > succeed, but inform developers about semi-bad data like this.
> > 
> > I don't understand why this is not a case that we can make work correctly?
> If there is ignore-previous-rules between rules that match anything and we
> compiled a stylesheet of everything that matches everything, then we would
> not be able to use the compiled stylesheet or we would have incorrect
> behavior. 

The ContentExtensionCompiler is NOT compiling a stylesheet. It's simply collecting a set of selectors that will be used in the match-all case.

So if the match-all case says "do these 20 selectors, then ignore previous rules, then do these other 30 selectors"... we could simply toss the first 20 and use only the following 30.

The stylesheet will be put together later in the backend.

Does this make sense?
Comment 16 Alex Christensen 2015-03-18 18:30:47 PDT
Created attachment 249006 [details]
Patch
Comment 17 Brady Eidson 2015-03-18 18:48:55 PDT
Comment on attachment 249006 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=249006&action=review

> Source/WebCore/contentextensions/CompiledContentExtension.cpp:41
> +    // FIXME: This needs to be cleaned up, optimized, and moved to DFABytecodeInterpreter.

Why a FIXME, and not doing it now?  Especially since a class friend'ing is required as-is

> Source/WebCore/contentextensions/CompiledContentExtension.cpp:58
> +        // FIXME: Make sure only css selectors can get here.

Why a FIXME, and not doing it in this patch?
Comment 18 Brady Eidson 2015-03-18 18:49:26 PDT
Also, does this patch actually affect the CSS display none test that is changed?
Comment 19 Alex Christensen 2015-03-18 19:26:16 PDT
Created attachment 249008 [details]
Patch
Comment 20 Alex Christensen 2015-03-18 19:29:45 PDT
(In reply to comment #17)
> Comment on attachment 249006 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=249006&action=review
> 
> > Source/WebCore/contentextensions/CompiledContentExtension.cpp:41
> > +    // FIXME: This needs to be cleaned up, optimized, and moved to DFABytecodeInterpreter.
Done.
> 
> Why a FIXME, and not doing it now?  Especially since a class friend'ing is
> required as-is
> 
> > Source/WebCore/contentextensions/CompiledContentExtension.cpp:58
> > +        // FIXME: Make sure only css selectors can get here.
> 
> Why a FIXME, and not doing it in this patch?
Nothing terrible would happen if a non-css action matched everything.  We could add a test for that later if we want to do something special.
(In reply to comment #17)
> Also, does this patch actually affect the CSS display none test that is changed?
I removed the changes to LayoutTests.  They were not really related to this patch.
Comment 21 WebKit Commit Bot 2015-03-18 21:57:57 PDT
Comment on attachment 249008 [details]
Patch

Clearing flags on attachment: 249008

Committed r181726: <http://trac.webkit.org/changeset/181726>
Comment 22 WebKit Commit Bot 2015-03-18 21:58:03 PDT
All reviewed patches have been landed.  Closing bug.
Comment 23 Alex Christensen 2015-03-19 08:58:01 PDT
*** Bug 142298 has been marked as a duplicate of this bug. ***