Bug 145619

Summary: [Content Extensions] Implement branch compaction for DFA byte code
Product: WebKit Reporter: Alex Christensen <achristensen>
Component: WebCore Misc.Assignee: Alex Christensen <achristensen>
Status: RESOLVED FIXED    
Severity: Normal CC: commit-queue, japhet
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch none

Description Alex Christensen 2015-06-03 15:14:38 PDT
We use lots of 0's for the high bits of jumps in the DFA byte code.  Let's compact this down.
This patch reduced the total byte code size in a large test case from 15123003 bytes to 14085291.
Comment 1 Alex Christensen 2015-06-03 15:28:27 PDT
Created attachment 254217 [details]
Patch
Comment 2 Alex Christensen 2015-06-03 16:40:14 PDT
Int24 should be added in a following patch
Comment 3 Benjamin Poulain 2015-06-05 15:07:50 PDT
Comment on attachment 254217 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=254217&action=review

> Source/WebCore/contentextensions/ContentExtensionCompiler.cpp:237
> +    unsigned totalBytecodeSize = 0;

Don't forget to revisit this since I landed the same thing.

> Source/WebCore/contentextensions/DFABytecode.h:83
> +enum DFABytecodeJumpSize {

May I suggest using exclusive bits?
ARM64 has a test-bit-and-branch instruction for those cases to avoid mask+jump.

> Source/WebCore/contentextensions/DFABytecode.h:94
> +    if (longestPossibleJump <= std::numeric_limits<int8_t>::max() && longestPossibleJump >= std::numeric_limits<int8_t>::min())
> +        return Int8;
> +    if (longestPossibleJump <= std::numeric_limits<int16_t>::max() && longestPossibleJump >= std::numeric_limits<int16_t>::min())
> +        return Int16;

I was gonna complain about using masks instead...but actually this is way cleaner.

> Source/WebCore/contentextensions/DFABytecodeCompiler.cpp:64
> +    *reinterpret_cast<IntType*>(&bytecode[index]) = value;

Can we assert we have enough free space to write the value? Too slow?

> Source/WebCore/contentextensions/DFABytecodeCompiler.cpp:69
> +    if (action & 0xFFFF00000000)

Assert the top 2 bytes are zero?

Can you move 0xFFFF00000000 to a constant with a good name?

> Source/WebCore/contentextensions/DFABytecodeCompiler.cpp:104
> +        return m_maxNodeStartOffsets[destinationNodeIndex] - jumpLocation;

This is very pessimistic. As the compaction progress, the jumpLocation drift further appart from the m_maxNodeStartOffsets of its own node.

The jump length should be the worst distance between the two nodes as computed for m_maxNodeStartOffsets - the offset since the start of the current node.
    (m_maxNodeStartOffsets[destinationNodeIndex] - m_maxNodeStartOffsets[sourceNodeIndex] - (m_nodeStartOffsets[sourceNodeIndex] - jumpLocation)).

> Source/WebCore/contentextensions/DFABytecodeCompiler.cpp:114
> +    uint32_t jumpLocation = m_bytecode.size() + sizeof(uint8_t);

Suggestion:
-Have the instruction offset and jumpLocation in LinkRecord.
-Do the relative jump from the instruction position instead of the relative address position.

That should remove all the +sizeof(thingy) that are scaring the hell out of me.

> Source/WebCore/contentextensions/DFABytecodeCompiler.cpp:128
> +    append<uint8_t>(m_bytecode, static_cast<uint8_t>(caseSensitive ? DFABytecodeInstruction::CheckValueCaseSensitive : DFABytecodeInstruction::CheckValueCaseInsensitive) | jumpSize);

Can you move "caseSensitive ? DFABytecodeInstruction::CheckValueCaseSensitive : DFABytecodeInstruction::CheckValueCaseInsensitive" to its own temporary variable?
That line does too much.

> Source/WebCore/contentextensions/DFABytecodeCompiler.cpp:138
> +    uint32_t jumpLocation = m_bytecode.size() + 3 * sizeof(uint8_t);

This is the kind of code I don't like regarding the base of the jumps. Too much magic.

> Source/WebCore/contentextensions/DFABytecodeCompiler.cpp:141
> +    append<uint8_t>(m_bytecode, static_cast<uint8_t>(caseSensitive ? DFABytecodeInstruction::CheckValueRangeCaseSensitive : DFABytecodeInstruction::CheckValueRangeCaseInsensitive) | jumpSize);

Temp variable with the instruction?

> Source/WebCore/contentextensions/DFABytecodeCompiler.cpp:195
> -    memset(destinations, 0xff, sizeof(destinations));
> +    memset(destinations, 0xFF, sizeof(destinations));

WHYYYYYY.

> Source/WebCore/contentextensions/DFABytecodeCompiler.cpp:318
> +    unsigned nextIndex = sizeof(DFAHeader) + compiledNodeMaxBytecodeSize(m_dfa.root);

I don't understand this.

Shouldn't the start offset of the root be at the offset of the header + the actions of the root?

Isn't compiledNodeMaxBytecodeSize(m_dfa.root) too far?

> Source/WebCore/contentextensions/DFABytecodeCompiler.cpp:319
> +    m_maxNodeStartOffsets[m_dfa.root] = nextIndex;

I have the feeling this case should be split for root:

m_maxNodeStartOffsets[m_dfa.root] = offset of header + initial actions;
nextIndex += size of actions;

> Source/WebCore/contentextensions/DFABytecodeInterpreter.cpp:53
> +// FIXME: Get rid of pagesUsed. We don't need to keep track of the pages used.

But, but, I really like it.

> Source/WebCore/contentextensions/DFABytecodeInterpreter.cpp:205
> +                    uint32_t jumpLocation = programCounter + sizeof(DFABytecodeInstruction) + sizeof(uint8_t);

jumpLocation -> jumpBase? To keep the pattern destination = base + offset.
Comment 4 Alex Christensen 2015-06-16 13:31:03 PDT
Created attachment 254965 [details]
Patch
Comment 5 Alex Christensen 2015-06-16 13:39:19 PDT
I addressed everything with these notes:
(In reply to comment #3)
> > Source/WebCore/contentextensions/DFABytecode.h:83
> > +enum DFABytecodeJumpSize {
> 
> May I suggest using exclusive bits?
> ARM64 has a test-bit-and-branch instruction for those cases to avoid
> mask+jump.
Good idea, but the compiler cannot prove that this will always have a valid value in it, and it needs a default: RELEASE_ASSERT_NOT_REACHED() in case we are reading invalid byte code.  This would not improve anything.
> > Source/WebCore/contentextensions/DFABytecodeCompiler.cpp:318
> > +    unsigned nextIndex = sizeof(DFAHeader) + compiledNodeMaxBytecodeSize(m_dfa.root);
> 
> I don't understand this.
> 
> Shouldn't the start offset of the root be at the offset of the header + the
> actions of the root?
> 
> Isn't compiledNodeMaxBytecodeSize(m_dfa.root) too far?
That's right, this is an overestimate, but the root is always compiled first so this is never an issue.  Even if it weren't compiled first, this would only be slightly pessimistic.  I don't think it's worth making the code much more complicated just to make this a little more accurate when it doesn't matter.
> > Source/WebCore/contentextensions/DFABytecodeInterpreter.cpp:53
> > +// FIXME: Get rid of pagesUsed. We don't need to keep track of the pages used.
> 
> But, but, I really like it.
It's annoying and not even accurate ever since I made the jumps relative to the current DFA's byte code.
> 
> > Source/WebCore/contentextensions/DFABytecodeInterpreter.cpp:205
> > +                    uint32_t jumpLocation = programCounter + sizeof(DFABytecodeInstruction) + sizeof(uint8_t);
> 
> jumpLocation -> jumpBase? To keep the pattern destination = base + offset.
I made the jumps relative from the current program counter instead.
Comment 6 Alex Christensen 2015-06-16 14:30:43 PDT
Created attachment 254969 [details]
Patch
Comment 7 WebKit Commit Bot 2015-06-16 15:29:59 PDT
Comment on attachment 254969 [details]
Patch

Clearing flags on attachment: 254969

Committed r185621: <http://trac.webkit.org/changeset/185621>
Comment 8 WebKit Commit Bot 2015-06-16 15:30:02 PDT
All reviewed patches have been landed.  Closing bug.