UNCONFIRMED Bug 42264
Prepare YARR JIT for matching regexps with iterative parentheses
https://bugs.webkit.org/show_bug.cgi?id=42264
Summary Prepare YARR JIT for matching regexps with iterative parentheses
Peter Varga
Reported 2010-07-14 08:16:12 PDT
The current YARR logic allocates the stack in advance. This behaviour is a disadvantage for generating JIT code for iterative parentheses matching.
Attachments
proposed patch (17.68 KB, patch)
2010-07-14 08:20 PDT, Peter Varga
no flags
proposed patch v2 (23.27 KB, patch)
2010-07-15 09:05 PDT, Peter Varga
no flags
proposed patch v3 (21.27 KB, patch)
2010-07-28 03:06 PDT, Peter Varga
no flags
Peter Varga
Comment 1 2010-07-14 08:20:11 PDT
Created attachment 61520 [details] proposed patch This patch prepares JSC for the further modifications which can solve the iterative parentheses matching problem. It doesn't change the current results and doesn't add more features to the engine. This patch only changes the engine's logic to avoid allocating stack in advance. The current solution should be avoided, because in case of iterative parentheses matching the size of the needed stack for backtrack informations isn't known in advance, it depends on the number of matching iterations. I suggest a solution which stores frames on the stack in each of the cases when backtrack information is needed. These frames are marked by a special value on the stack which indicates what kind of information is stored in the frame. These marks allow to allocate the stack dinamically by indicating the start of the frames, instead of using precalculated offsets. This solution causes more memory usage but it has possibilities for further optimizations. The goal was to get an identical behaviour to the previous implementation. This solution passes all the regression and fast/js layout tests. For details of the logic I have commented the code. NOTE: This patch is just a suggestion, not the final solution. I don't offer this for a review now, I just would like to get some feedback before finishing the whole work on iterative parentheses matching. Please don't hesitate to comment!
Gavin Barraclough
Comment 2 2010-07-15 00:54:15 PDT
Hi Peter, this is really interesting, there are a couple of quite major problems here but hopefully they are fixable. (1) You are recursing on the machine stack without stack guards. This is unsafe, and may prove to be a security vulnerability - particularly in the case of JS execution in worker threads, where the stack may be small, and pages for the stack may lie within address ranges containing pages of the heap. The trivial fix to this problem would be to set a recursion limit, but I don't think this will be sufficient in this case. We have seen important use cases with regular expressions that contain quantified parentheses being run over very large strings, and have run into problems with recursion limits being too low in the past. I'm doubtful that we can safely rely on there always being sufficient space available on the machine stack at the point a match starts to run all reasonable regexs (again, particularly considering worker threads). Setting too low a threshold would be a functional regression with respect to the current implementation (which allocates memory on the heap, and as such is not bounded in this fashion). I'd suggest a better approach to fixing this would be to implement a fast stack-like allocator that does bump-pointer memory allocation from pages allocated on the heap, rather than using the machine stack (possibly making an exception for regexes that only require a small, fixed stack space, in order to avoid the overhead of a memory allocation). A class implementing the allocator could expose a simple data oriented interface to JIT code, which could describe the bounds of the allocation space current available, and could be used from JIT code to perform bounded bump-pointer allocations. Upon performing an allocation a limit check could be performed, and if this failed we could call out to C code to attempt to allocate further pages. Such an allocator could also be used from the YARR interpreter (which currently inefficiently mallocs/frees frames, despite having stack-like behaviour), and we would likely want to share a common implementation of an allocator. (2) Performance. Applying this patch I see the runtime of v8-regex increase from 227ms to 470ms - is this in line with your expectations? Given that some of the time in this benchmark is currently spent in PCRE, this is a >2x regression in the YARR JIT code. Since it is not strictly necessary for the code generation for regular expressions currently supported by the JIT to be changed in adding support for those containing quantified parentheses, then I do not think any regression in the performance of currently supported regexes can be accepted. I'm assuming that the problem here is related to the overhead you introduce by removing the use of compile-time determined stack frames, and replacing this with continuous dynamic allocation from the stack throughout. I don't really expect you to be able to make this change without introducing a significant regression, and would suggest that to overcome this you're likely going to have to return to using compile-timer determined stack frames for disjunctions (excluding the variable amount of space required for nested disjunctions with a quantifier >=2). This way a stack(-like) allocation need only be performed upon each iteration of a quantified set of parentheses, rather than needing to do so for every single quantified atom encountered (or other atom that needs to record information). One last thought: fixing problem (1) seems only likely to aggravate problem (2) - resolving the problem of stack overruns seems to inevitably involve introducing some form of bounds checking, which is likely to be further overhead upon every stack allocation, which would only makes any move away from using a compile-time determined stack frames more expensive.
Peter Varga
Comment 3 2010-07-15 09:05:03 PDT
Created attachment 61669 [details] proposed patch v2 I have updated the patch to avoid the performance regression in YARR JIT. The modified solution uses the original preallocated stack logic in case of patterns which don't contain iterative parentheses. These are the performance results: reference modified regexp-dna: 22.2ms +/- 1.4% 21.9ms +/- 1.0% v8-regexp: 435.1ms +/- 1.2% 435.9ms +/- 1.2%
Peter Varga
Comment 4 2010-07-15 09:06:57 PDT
(In reply to comment #2) Hi Gavin, thank you very much for your thorough feedback. I'm trying to answer according your points. (1) I was planning to introduce stack guards after finishing the remaining features. A custom allocator for YARR sounds good, but I'm not so experienced in the topic of allocators therefore I need some time to investigate the possibilities. (2) On which platform did you measure this performance result? I did a measurement too on v8-regex and didn't get such a big performance regression on Linux with the old patch: reference modified regexp: *1.081x as slow* 418.7ms +/- 1.4% 452.7ms +/- 1.3% Despite that I fixed the issue. You can also check the new combined performance results at the master bug.
Gavin Barraclough
Comment 5 2010-07-15 15:25:17 PDT
Hi Peter, I'm very sorry, but I'm afraid I don't think that this patch correctly addresses the issue at hand. Having two code paths for allocation of local storage adds an undesirable and unnecessary additional complexity and maintenance overhead to the engine. Either the strategy of using dynamic stack (or similar) allocation for individual atoms can provide performance matching that of using compile-time determined stack frames, or it cannot. If the performance of a dynamic allocation based design can be made sufficient then we should use this for all regular expressions - having the code to calculate stack-frames is unnecessary complexity and overhead if it is not giving us anything, and should be removed. That said, if the use of stack frames is beneficial there should be no need to exclude a subset of regular expressions from taking advantage of this. But again, I would suggest you might want to look into the correctness issue before trying to resolve the performance problem. My expectation is that any solution to this will only further accentuate the need for stack-frame to be calculated by the compiler. cheers, G.
Peter Varga
Comment 6 2010-07-16 00:31:07 PDT
(In reply to comment #5) Hi Gavin, With my previous patch I just want to show this logic doesn't cause performance regression. IMHO the two kinds of allocation in different cases still cause less complexity and maintance overhead to the engine than do fallback to PCRE. Of course this patch doesn't solve the stack allocation problem which you mentioned at point (1). To solve that problem I need further investigation. I think it is important to write an engine which avoids to use of PCRE by reason of the different results of matching as well. At the moment I see two ways to solve the problem: (a) To extend the current YARR JIT implementation with capable of matching the fallback cases. This solution changes the current logic. To refer at your point (1) I'm thinking on a solution which keeps the original stack preallocating logic and use heap only for store each iteration of parentheses. (b) Write a NEW regex JIT engine which doesn't do fallback and is faster than PCRE. If it will slower than the YARR JIT in the currently handled cases then the YARR JIT can do fallback to this (instead of PCRE). What would you prefer to solve this problem? Maybe you can suggest another way? Perhaps my train of thought is wrong?
Peter Varga
Comment 7 2010-07-28 03:06:37 PDT
Created attachment 62807 [details] proposed patch v3 Here is another patch to prepare the YARR JIT logic to be capable for iterative parentheses matching. This solution is different from the previous ones. The goal was to free a register in order to use a new heap memory space for JIT. The logic stores the results of subpattern matching on the preallocated stack frame instead of an array allocated on heap (output). The pointer of the result array (output) is stored on the stack, therefore this array can be filled with information from the stack after the whole pattern matching process completed. This solution causes more stack usage but it uses preallocated stack. Here are the performance results: ref mod regexp-dna: 22.1ms +/- 1.0% 22.1ms +/- 1.0% regexp-v8: 431.6ms +/- 2.2% 430.9ms +/- 2.3% At the moment I don't see any better way to solve this problem by modificating the current YARR logic. Of course any other ideas are welcome.
Ahmad Saleem
Comment 8 2023-08-11 10:32:54 PDT
@msaboff - Is it something needed still?
Note You need to log in before you can comment on or make changes to this bug.