Bug 42264 - Prepare YARR JIT for matching regexps with iterative parentheses
Summary: Prepare YARR JIT for matching regexps with iterative parentheses
Status: UNCONFIRMED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks: 42159
  Show dependency treegraph
 
Reported: 2010-07-14 08:16 PDT by Peter Varga
Modified: 2023-08-11 10:32 PDT (History)
6 users (show)

See Also:


Attachments
proposed patch (17.68 KB, patch)
2010-07-14 08:20 PDT, Peter Varga
no flags Details | Formatted Diff | Diff
proposed patch v2 (23.27 KB, patch)
2010-07-15 09:05 PDT, Peter Varga
no flags Details | Formatted Diff | Diff
proposed patch v3 (21.27 KB, patch)
2010-07-28 03:06 PDT, Peter Varga
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Peter Varga 2010-07-14 08:16:12 PDT
The current YARR logic allocates the stack in advance. This behaviour is a disadvantage for generating JIT code for iterative parentheses matching.
Comment 1 Peter Varga 2010-07-14 08:20:11 PDT
Created attachment 61520 [details]
proposed patch

This patch prepares JSC for the further modifications which can
solve the iterative parentheses matching problem. It doesn't change the
current results and doesn't add more features to the engine.
This patch only changes the engine's logic to avoid allocating stack in
advance.

The current solution should be avoided, because in case of iterative parentheses matching
the size of the needed stack for backtrack informations isn't known in advance, it depends
on the number of matching iterations.

I suggest a solution which stores frames on the stack in each of the
cases when backtrack information is needed. These frames are marked by a special value on
the stack which indicates what kind of information is stored in the frame.
These marks allow to allocate the stack dinamically by indicating the start
of the frames, instead of using precalculated offsets.

This solution causes more memory usage but it has possibilities for further optimizations.
The goal was to get an identical behaviour to the previous implementation.

This solution passes all the regression and fast/js layout tests.
For details of the logic I have commented the code.

NOTE: This patch is just a suggestion, not the final solution.
I don't offer this for a review now, I just would like to get some feedback before finishing the
whole work on iterative parentheses matching.

Please don't hesitate to comment!
Comment 2 Gavin Barraclough 2010-07-15 00:54:15 PDT
Hi Peter, this is really interesting, there are a couple of quite major problems here but hopefully they are fixable.

(1) You are recursing on the machine stack without stack guards.  This is unsafe, and may prove to be a security vulnerability - particularly in the case of JS execution in worker threads, where the stack may be small, and pages for the stack may lie within address ranges containing pages of the heap.

The trivial fix to this problem would be to set a recursion limit, but I don't think this will be sufficient in this case.  We have seen important use cases with regular expressions that contain quantified parentheses being run over very large strings, and have run into problems with recursion limits being too low in the past.  I'm doubtful that we can safely rely on there always being sufficient space available on the machine stack at the point a match starts to run all reasonable regexs (again, particularly considering worker threads).  Setting too low a threshold would be a functional regression with respect to the current implementation (which allocates memory on the heap, and as such is not bounded in this fashion).

I'd suggest a better approach to fixing this would be to implement a fast stack-like allocator that does bump-pointer memory allocation from pages allocated on the heap, rather than using the machine stack (possibly making an exception for regexes that only require a small, fixed stack space, in order to avoid the overhead of a memory allocation).  A class implementing the allocator could expose a simple data oriented interface to JIT code, which could describe the bounds of the allocation space current available, and could be used from JIT code to perform bounded bump-pointer allocations.  Upon performing an allocation a limit check could be performed, and if this failed we could call out to C code to attempt to allocate further pages.  Such an allocator could also be used from the YARR interpreter (which currently inefficiently mallocs/frees frames, despite having stack-like behaviour), and we would likely want to share a common implementation of an allocator.


(2) Performance.  Applying this patch I see the runtime of v8-regex increase from 227ms to 470ms - is this in line with your expectations?  Given that some of the time in this benchmark is currently spent in PCRE, this is a >2x regression in the YARR JIT code.  Since it is not strictly necessary for the code generation for regular expressions currently supported by the JIT to be changed in adding support for those containing quantified parentheses, then I do not think any regression in the performance of currently supported regexes can be accepted.  I'm assuming that the problem here is related to the overhead you introduce by removing the use of compile-time determined stack frames, and replacing this with continuous dynamic allocation from the stack throughout.  I don't really expect you to be able to make this change without introducing a significant regression, and would suggest that to overcome this you're likely going to have to return to using compile-timer determined stack frames for disjunctions (excluding the variable amount of space required for nested disjunctions with a quantifier >=2).  This way a stack(-like) allocation need only be performed upon each iteration of a quantified set of parentheses, rather than needing to do so for every single quantified atom encountered (or other atom that needs to record information).


One last thought: fixing problem (1) seems only likely to aggravate problem (2) - resolving the problem of stack overruns seems to inevitably involve introducing some form of bounds checking, which is likely to be further overhead upon every stack allocation, which would only makes any move away from using a compile-time determined stack frames more expensive.
Comment 3 Peter Varga 2010-07-15 09:05:03 PDT
Created attachment 61669 [details]
proposed patch v2

I have updated the patch to avoid the performance regression in YARR JIT.
The modified solution uses the original preallocated stack logic in case of patterns which don't contain
iterative parentheses.

These are the performance results:

		  reference		modified
regexp-dna:	22.2ms +/- 1.4%    21.9ms +/- 1.0%
v8-regexp:	435.1ms +/- 1.2%    435.9ms +/- 1.2%
Comment 4 Peter Varga 2010-07-15 09:06:57 PDT
(In reply to comment #2)

Hi Gavin, thank you very much for your thorough feedback.

I'm trying to answer according your points.

(1) I was planning to introduce stack guards after finishing the remaining features.
A custom allocator for YARR sounds good, but I'm not so experienced in the topic of allocators 
therefore I need some time to investigate the possibilities.

(2) On which platform did you measure this performance result? I did a measurement too on v8-regex 
and didn't get such a big performance regression on Linux with the old patch:

				   reference		modified
regexp:       *1.081x as slow*   418.7ms +/- 1.4%    452.7ms +/- 1.3%

Despite that I fixed the issue. You can also check the new combined performance results at the master bug.
Comment 5 Gavin Barraclough 2010-07-15 15:25:17 PDT
Hi Peter,

I'm very sorry, but I'm afraid I don't think that this patch correctly addresses the issue at hand.  Having two code paths for allocation of local storage adds an undesirable and unnecessary additional complexity and maintenance overhead to the engine.

Either the strategy of using dynamic stack (or similar) allocation for individual atoms can provide performance matching that of using compile-time determined stack frames, or it cannot.  If the performance of a dynamic allocation based design can be made sufficient then we should use this for all regular expressions - having the code to calculate stack-frames is unnecessary complexity and overhead if it is not giving us anything, and should be removed.  That said, if the use of stack frames is beneficial there should be no need to exclude a subset of regular expressions from taking advantage of this.

But again, I would suggest you might want to look into the correctness issue before trying to resolve the performance problem.  My expectation is that any solution to this will only further accentuate the need for stack-frame to be calculated by the compiler.

cheers,
G.
Comment 6 Peter Varga 2010-07-16 00:31:07 PDT
(In reply to comment #5)


Hi Gavin,

With my previous patch I just want to show this logic doesn't cause performance regression. 
IMHO the two kinds of allocation in different cases still cause less complexity and maintance overhead
to the engine than do fallback to PCRE. Of course this patch doesn't solve the stack allocation problem
which you mentioned at point (1). To solve that problem I need further investigation.

I think it is important to write an engine which avoids to use of PCRE by reason of the different results of
matching as well.
At the moment I see two ways to solve the problem:
(a) To extend the current YARR JIT implementation with capable of matching the fallback cases. This solution
changes the current logic. To refer at your point (1) I'm thinking on a solution which keeps the original stack
preallocating logic and use heap only for store each iteration of parentheses.
(b) Write a NEW regex JIT engine which doesn't do fallback and is faster than PCRE. If it will slower than
the YARR JIT in the currently handled cases then the YARR JIT can do fallback to this (instead of PCRE).

What would you prefer to solve this problem? Maybe you can suggest another way? Perhaps my train of thought is wrong?
Comment 7 Peter Varga 2010-07-28 03:06:37 PDT
Created attachment 62807 [details]
proposed patch v3

Here is another patch to prepare the YARR JIT logic to be capable for iterative
parentheses matching. This solution is different from the previous ones.

The goal was to free a register in order to use a new heap memory space
for JIT. The logic stores the results of subpattern matching on the
preallocated stack frame instead of an array allocated on heap (output). 
The pointer of the result array (output) is stored on the stack, therefore this 
array can be filled with information from the stack after the whole pattern matching
process completed.

This solution causes more stack usage but it uses preallocated stack.
Here are the performance results:

                    ref                 mod
regexp-dna:    22.1ms +/- 1.0%    22.1ms +/- 1.0%
regexp-v8:    431.6ms +/- 2.2%    430.9ms +/- 2.3%

At the moment I don't see any better way to solve this problem by
modificating the current YARR logic. Of course any other ideas are welcome.
Comment 8 Ahmad Saleem 2023-08-11 10:32:54 PDT
@msaboff - Is it something needed still?