Bug 16036 - Further cleanup to PCRE
Summary: Further cleanup to PCRE
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Macintosh OS X 10.4
: P2 Normal
Assignee: Eric Seidel (no email)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-11-17 21:09 PST by Eric Seidel (no email)
Modified: 2007-12-02 00:55 PST (History)
0 users

See Also:


Attachments
[PATCH] Clean up jsRegExpExecute (24.14 KB, patch)
2007-11-17 21:09 PST, Eric Seidel (no email)
mjs: review+
Details | Formatted Diff | Diff
[PATCH] Remove register keyword and more cleanup (35.50 KB, patch)
2007-11-17 21:09 PST, Eric Seidel (no email)
mjs: review+
Details | Formatted Diff | Diff
[PATCH] reformat get_othercase_range (1.27 KB, patch)
2007-11-17 21:09 PST, Eric Seidel (no email)
sam: review+
Details | Formatted Diff | Diff
[PATCH] Get rid of PCRE custom char types (16.80 KB, patch)
2007-11-17 21:09 PST, Eric Seidel (no email)
sam: review+
Details | Formatted Diff | Diff
[PATCH] Fix spacing for read_repeat_counts (2.29 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
Details | Formatted Diff | Diff
[PATCH] clean up formating in compile_branch (98.14 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
mjs: review+
Details | Formatted Diff | Diff
[PATCH] removing more macros (9.84 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
Details | Formatted Diff | Diff
[PATCH] Clean up compile_branch, move _pcre_ord2utf8, and rename CompileData (20.25 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
Details | Formatted Diff | Diff
[PATCH] Pass around CompileData& instead of CompileData* (10.98 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
timothy: review+
Details | Formatted Diff | Diff
[PATCH] Clean up find_firstassertedchar (5.64 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
Details | Formatted Diff | Diff
[PATCH] give PCRE_STARTLINE a better name and rename match_data to MatchData (13.60 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
Details | Formatted Diff | Diff
[PATCH] Pass around MatchData objects by reference (43.88 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
Details | Formatted Diff | Diff
[PATCH] Remove unused function could_be_empty_branch (6.00 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
mjs: review+
Details | Formatted Diff | Diff
[PATCH] reformat is_anchored (2.31 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
Details | Formatted Diff | Diff
[PATCH] Reformat find_fixedlength (7.91 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review-
Details | Formatted Diff | Diff
[PATCH] clean up check_escape (8.04 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
Details | Formatted Diff | Diff
[PATCH] clean up is_counted_repeat (2.02 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
Details | Formatted Diff | Diff
[PATCH] split first_significant_code into two simpler functions (2.84 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
Details | Formatted Diff | Diff
[PATCH] Use better variable names for case ignoring options (15.70 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
mjs: review+
Details | Formatted Diff | Diff
[PATCH] Section off MatchData arguments into args struct (91.45 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review-
Details | Formatted Diff | Diff
[PATCH] Abstract frame variables into locals and args (72.81 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review-
Details | Formatted Diff | Diff
[PATCH] cleanup _pcre_ucp_othercase (2.67 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
Details | Formatted Diff | Diff
[PATCH] Add inlines for toLowerCase, isWordChar, isSpaceChar for further regexp speedup (25.56 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
Details | Formatted Diff | Diff
[PATCH] Give GET, PUT better names, and add (poor) moveOpcodePtrPastAnyAlternateBranches (37.66 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
Details | Formatted Diff | Diff
[PATCH] Further cleanup GET/PUT inlines (3.66 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
Details | Formatted Diff | Diff
[PATCH] Get rid of GETCHAR* macros, replacing them with better named inlines (28.06 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
Details | Formatted Diff | Diff
[PATCH] Remove redundant match_call_count and move recursion check out of super-hot code path (5.70 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
mjs: review+
Details | Formatted Diff | Diff
[PATCH] Remove redundant eptrblock struct (19.61 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
Details | Formatted Diff | Diff
[PATCH] Remove match_is_group variable for another 5% speedup (23.38 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
mjs: review+
Details | Formatted Diff | Diff
[PATCH] Lower MATCH_LIMIT_RECURSION to more sane levels to prevent hangs on run-javascriptcore-tests (1.01 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
sam: review+
Details | Formatted Diff | Diff
[PATCH] Small speedup (0.7%) by simplifying canUseStackBufferForNextFrame() check (1.91 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
sam: review+
Details | Formatted Diff | Diff
[PATCH] Give subjectPtr and instructionPtr sane names, reduce size of MatchFrame for a 0.2% speedup. (108.42 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
mjs: review+
Details | Formatted Diff | Diff
[PATCH] Comment cleanup (2.43 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
sam: review+
Details | Formatted Diff | Diff
[PATCH] Make i locally scoped for better code clarity (16.08 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
sam: review+
Details | Formatted Diff | Diff
[PATCH] change getChar* functions to return result and push 'c' into local scopes for clarity (29.01 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
mjs: review+
Details | Formatted Diff | Diff
[PATCH] Remove no longer used error code JSRegExpErrorMatchLimit (1.72 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
sam: review-
Details | Formatted Diff | Diff
[PATCH] Add repeatInformationFromInstructionOffset inline (7.08 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
mjs: review+
Details | Formatted Diff | Diff
[PATCH] Remove branch from return (1.74 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
sam: review+
Details | Formatted Diff | Diff
[PATCH] Change _NC operators to use _IGNORING_CASE for clarity (5.03 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
mitz: review+
Details | Formatted Diff | Diff
[PATCH] Make cur_is_word and prev_is_word locals, and change OP_ANY to OP_ANY_CHAR for clarity (6.27 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
mitz: review+
Details | Formatted Diff | Diff
[PATCH] Centralize code for subjectPtr adjustments using inlines, only ever check for a single trailing surrogate (as UTF16 only allows one), possibly fix PCRE bugs involving char classes and garbled UTF16 strings. (16.28 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
mjs: review+
Details | Formatted Diff | Diff
[PATCH] Deprecate jsRegExpExecute's offset-vector fallback code (1.10 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
oliver: review+
Details | Formatted Diff | Diff
[PATCH] give PCRE_MULTILINE a better name: OptionMatchAcrossMultipleLines (3.33 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
mjs: review+
Details | Formatted Diff | Diff
[PATCH] Pull first_byte and req_byte optimizations out into separate static funtions, SunSpider reported this as a win. (10.54 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
sam: review+
Details | Formatted Diff | Diff
[PATCH] Give consistent naming to the RegExp options/compile flags (13.19 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
sam: review+
Details | Formatted Diff | Diff
[PATCH] Further cleanups to calculateCompiledPatternLengthAndFlags (17.92 KB, patch)
2007-11-19 01:47 PST, Eric Seidel (no email)
sam: review+
Details | Formatted Diff | Diff
Combined patch for Maciej's viewing pleasure (361.97 KB, patch)
2007-11-19 01:52 PST, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Eric Seidel (no email) 2007-11-17 21:09:37 PST
Clean up PCRE more so we can make it faster. :)  Going to try attaching a whole bunch of patches to this bug which we can review one at at time.
Comment 1 Eric Seidel (no email) 2007-11-17 21:09:54 PST
Created attachment 17335 [details]
[PATCH] Clean up jsRegExpExecute

 JavaScriptCore/pcre/pcre_compile.cpp |   12 +-
 JavaScriptCore/pcre/pcre_exec.cpp    |  553 ++++++++++++++++------------------
 JavaScriptCore/pcre/pcre_internal.h  |    6 +-
 3 files changed, 269 insertions(+), 302 deletions(-)
Comment 2 Eric Seidel (no email) 2007-11-17 21:09:55 PST
Created attachment 17336 [details]
[PATCH] Remove register keyword and more cleanup

 JavaScriptCore/pcre/pcre_compile.cpp  |   97 ++++++++++++---------------
 JavaScriptCore/pcre/pcre_exec.cpp     |  119 +++++++++++++++++++--------------
 JavaScriptCore/pcre/pcre_internal.h   |   92 ++++++++-----------------
 JavaScriptCore/pcre/pcre_ord2utf8.cpp |   25 +++----
 JavaScriptCore/pcre/pcre_xclass.cpp   |   85 ++++++++++++++----------
 5 files changed, 203 insertions(+), 215 deletions(-)
Comment 3 Eric Seidel (no email) 2007-11-17 21:09:57 PST
Created attachment 17337 [details]
[PATCH] reformat get_othercase_range

 JavaScriptCore/pcre/pcre_compile.cpp |   46 +++++++++++++++++----------------
 1 files changed, 24 insertions(+), 22 deletions(-)
Comment 4 Eric Seidel (no email) 2007-11-17 21:09:58 PST
Created attachment 17338 [details]
[PATCH] Get rid of PCRE custom char types

 JavaScriptCore/pcre/pcre_compile.cpp |   84 +++++++++++++++------------------
 JavaScriptCore/pcre/pcre_exec.cpp    |   48 ++++++++++----------
 JavaScriptCore/pcre/pcre_internal.h  |   10 +---
 3 files changed, 65 insertions(+), 77 deletions(-)
Comment 5 Eric Seidel (no email) 2007-11-17 21:10:00 PST
Created attachment 17339 [details]
[PATCH] Fix spacing for read_repeat_counts

 JavaScriptCore/pcre/pcre_compile.cpp |   79 +++++++++++++++++-----------------
 1 files changed, 39 insertions(+), 40 deletions(-)
Comment 6 Eric Seidel (no email) 2007-11-17 21:10:02 PST
Created attachment 17340 [details]
[PATCH] clean up formating in compile_branch

 JavaScriptCore/pcre/pcre_compile.cpp | 2351 +++++++++++++++++-----------------
 1 files changed, 1145 insertions(+), 1206 deletions(-)
Comment 7 Eric Seidel (no email) 2007-11-17 21:10:03 PST
Created attachment 17341 [details]
[PATCH] removing more macros

 JavaScriptCore/pcre/pcre_compile.cpp |   36 +++++++++++++++++-----------------
 JavaScriptCore/pcre/pcre_exec.cpp    |   24 +++++++++++-----------
 JavaScriptCore/pcre/pcre_internal.h  |   18 ++++++++--------
 JavaScriptCore/pcre/pcre_xclass.cpp  |    4 +-
 4 files changed, 41 insertions(+), 41 deletions(-)
Comment 8 Eric Seidel (no email) 2007-11-17 21:10:05 PST
Created attachment 17342 [details]
[PATCH] Clean up compile_branch, move _pcre_ord2utf8, and rename CompileData

 .../JavaScriptCore.xcodeproj/project.pbxproj       |    4 -
 JavaScriptCore/pcre/pcre_compile.cpp               |  110 ++++++++++++--------
 JavaScriptCore/pcre/pcre_internal.h                |   19 ++--
 JavaScriptCore/pcre/pcre_ord2utf8.cpp              |   75 -------------
 4 files changed, 76 insertions(+), 132 deletions(-)
Comment 9 Eric Seidel (no email) 2007-11-17 21:10:06 PST
Created attachment 17343 [details]
[PATCH] Pass around CompileData& instead of CompileData*

 JavaScriptCore/pcre/pcre_compile.cpp |   69 +++++++++++++++------------------
 1 files changed, 31 insertions(+), 38 deletions(-)
Comment 10 Eric Seidel (no email) 2007-11-17 21:10:09 PST
Created attachment 17344 [details]
[PATCH] Clean up find_firstassertedchar

 JavaScriptCore/pcre/pcre_compile.cpp |  111 +++++++++++++++++----------------
 1 files changed, 57 insertions(+), 54 deletions(-)
Comment 11 Eric Seidel (no email) 2007-11-17 21:10:10 PST
Created attachment 17345 [details]
[PATCH] give PCRE_STARTLINE a better name and rename match_data to MatchData

 JavaScriptCore/pcre/pcre_compile.cpp |  160 ++++++++++++++++-----------------
 JavaScriptCore/pcre/pcre_exec.cpp    |   45 +++++-----
 JavaScriptCore/pcre/pcre_internal.h  |    3 +-
 3 files changed, 104 insertions(+), 104 deletions(-)
Comment 12 Eric Seidel (no email) 2007-11-17 21:10:12 PST
Created attachment 17346 [details]
[PATCH] Pass around MatchData objects by reference

 JavaScriptCore/pcre/pcre_exec.cpp |  294 ++++++++++++++++++------------------
 1 files changed, 147 insertions(+), 147 deletions(-)
Comment 13 Eric Seidel (no email) 2007-11-17 21:10:14 PST
Created attachment 17347 [details]
[PATCH] Remove unused function could_be_empty_branch

 JavaScriptCore/pcre/pcre_compile.cpp |  155 ++--------------------------------
 1 files changed, 7 insertions(+), 148 deletions(-)
Comment 14 Eric Seidel (no email) 2007-11-17 21:10:15 PST
Created attachment 17348 [details]
[PATCH] reformat is_anchored

 JavaScriptCore/pcre/pcre_compile.cpp |   62 +++++++++++++++------------------
 1 files changed, 28 insertions(+), 34 deletions(-)
Comment 15 Eric Seidel (no email) 2007-11-17 21:10:17 PST
Created attachment 17349 [details]
[PATCH] Reformat find_fixedlength

 JavaScriptCore/pcre/pcre_compile.cpp |  281 +++++++++++++++++-----------------
 1 files changed, 143 insertions(+), 138 deletions(-)
Comment 16 Eric Seidel (no email) 2007-11-17 21:10:19 PST
Created attachment 17350 [details]
[PATCH] clean up check_escape

 JavaScriptCore/pcre/pcre_compile.cpp |  242 +++++++++++++++++-----------------
 1 files changed, 118 insertions(+), 124 deletions(-)
Comment 17 Eric Seidel (no email) 2007-11-17 21:10:20 PST
Created attachment 17351 [details]
[PATCH] clean up is_counted_repeat

 JavaScriptCore/pcre/pcre_compile.cpp |   44 ++++++++++++++++-----------------
 1 files changed, 21 insertions(+), 23 deletions(-)
Comment 18 Eric Seidel (no email) 2007-11-17 21:10:22 PST
Created attachment 17352 [details]
[PATCH] split first_significant_code into two simpler functions

 JavaScriptCore/pcre/pcre_compile.cpp |   55 ++++++++++++++++------------------
 1 files changed, 26 insertions(+), 29 deletions(-)
Comment 19 Eric Seidel (no email) 2007-11-17 21:10:23 PST
Created attachment 17353 [details]
[PATCH] Use better variable names for case ignoring options

 JavaScriptCore/pcre/pcre_compile.cpp |   41 ++++++++++++++++-----------------
 JavaScriptCore/pcre/pcre_exec.cpp    |   36 +++++++++++++++---------------
 JavaScriptCore/pcre/pcre_internal.h  |    4 +-
 3 files changed, 40 insertions(+), 41 deletions(-)
Comment 20 Eric Seidel (no email) 2007-11-17 21:10:25 PST
Created attachment 17354 [details]
[PATCH] Section off MatchData arguments into args struct

 JavaScriptCore/pcre/pcre_exec.cpp |  728 +++++++++++++++++++------------------
 1 files changed, 367 insertions(+), 361 deletions(-)
Comment 21 Eric Seidel (no email) 2007-11-17 21:10:27 PST
Created attachment 17355 [details]
[PATCH] Abstract frame variables into locals and args

 JavaScriptCore/pcre/pcre_compile.cpp |    2 +-
 JavaScriptCore/pcre/pcre_exec.cpp    |  465 +++++++++++++++++-----------------
 JavaScriptCore/pcre/pcre_internal.h  |   16 +-
 3 files changed, 244 insertions(+), 239 deletions(-)
Comment 22 Eric Seidel (no email) 2007-11-17 21:10:29 PST
Created attachment 17356 [details]
[PATCH] cleanup _pcre_ucp_othercase

 JavaScriptCore/pcre/pcre_ucp_searchfuncs.cpp |   69 +++++++++++++------------
 1 files changed, 36 insertions(+), 33 deletions(-)
Comment 23 Eric Seidel (no email) 2007-11-17 21:10:30 PST
Created attachment 17357 [details]
[PATCH] Add inlines for toLowerCase, isWordChar, isSpaceChar for further regexp speedup

 JavaScriptCore/pcre/pcre_compile.cpp |   22 +++++-----
 JavaScriptCore/pcre/pcre_exec.cpp    |   75 +++++++++++++++-------------------
 JavaScriptCore/pcre/pcre_internal.h  |   56 ++++++++++++++++---------
 JavaScriptCore/pcre/pcre_xclass.cpp  |    3 +-
 4 files changed, 81 insertions(+), 75 deletions(-)
Comment 24 Eric Seidel (no email) 2007-11-17 21:10:32 PST
Created attachment 17358 [details]
[PATCH] Give GET, PUT better names, and add (poor) moveOpcodePtrPastAnyAlternateBranches

 JavaScriptCore/pcre/pcre_compile.cpp         |  120 +++++++++++++-------------
 JavaScriptCore/pcre/pcre_exec.cpp            |   62 ++++++-------
 JavaScriptCore/pcre/pcre_internal.h          |   47 ++++++----
 JavaScriptCore/pcre/pcre_ucp_searchfuncs.cpp |    4 +-
 4 files changed, 117 insertions(+), 116 deletions(-)
Comment 25 Eric Seidel (no email) 2007-11-17 21:10:33 PST
Created attachment 17359 [details]
[PATCH] Further cleanup GET/PUT inlines

 JavaScriptCore/pcre/pcre_internal.h |   49 ++++++++++++++++++-----------------
 1 files changed, 25 insertions(+), 24 deletions(-)
Comment 26 Eric Seidel (no email) 2007-11-17 21:10:35 PST
Created attachment 17360 [details]
[PATCH] Get rid of GETCHAR* macros, replacing them with better named inlines

 JavaScriptCore/pcre/pcre_compile.cpp |   33 ++++------
 JavaScriptCore/pcre/pcre_exec.cpp    |  112 +++++++++++++++++-----------------
 JavaScriptCore/pcre/pcre_internal.h  |   55 ++++++++++++-----
 3 files changed, 106 insertions(+), 94 deletions(-)
Comment 27 Eric Seidel (no email) 2007-11-17 21:10:36 PST
Created attachment 17361 [details]
[PATCH] Remove redundant match_call_count and move recursion check out of super-hot code path

SunSpider says this is at least an 8% speedup for regexp.
---
 JavaScriptCore/pcre/pcre_exec.cpp   |   33 ++++++++++-----------------------
 JavaScriptCore/pcre/pcre_internal.h |   25 +++++--------------------
 2 files changed, 15 insertions(+), 43 deletions(-)
Comment 28 Eric Seidel (no email) 2007-11-17 21:10:38 PST
Created attachment 17362 [details]
[PATCH] Remove redundant eptrblock struct

 JavaScriptCore/pcre/pcre_exec.cpp |   99 ++++++++++++++----------------------
 1 files changed, 39 insertions(+), 60 deletions(-)
Comment 29 Eric Seidel (no email) 2007-11-17 21:11:40 PST
So I've not marked any of these for review.  But anyone who would like to go through and review them, that would be most appreciated. :)

Hum... I might need to make white-space ignoring versions.
Comment 30 Sam Weinig 2007-11-17 21:23:00 PST
Comment on attachment 17337 [details]
[PATCH] reformat get_othercase_range

The *'s should be next to the type here.

+static bool get_othercase_range(int *cptr, int d, int *ocptr, int *odptr)

Needs changelog.  Otherwise, r=me.
Comment 31 Sam Weinig 2007-11-17 21:24:49 PST
Comment on attachment 17339 [details]
[PATCH] Fix spacing for read_repeat_counts

Needs ChangeLog
r=me.
Comment 32 Sam Weinig 2007-11-17 21:28:09 PST
Comment on attachment 17344 [details]
[PATCH] Clean up find_firstassertedchar

r=me.  Changelog please
Comment 33 Sam Weinig 2007-11-17 21:36:12 PST
Comment on attachment 17348 [details]
[PATCH] reformat is_anchored

We generally spell unsigned int as unsigned.  Otherwise, r=me with a log of changes.
Comment 34 Sam Weinig 2007-11-17 21:38:09 PST
Comment on attachment 17351 [details]
[PATCH] clean up is_counted_repeat

r=me with ChangeLog.
Comment 35 Sam Weinig 2007-11-18 15:10:54 PST
Comment on attachment 17352 [details]
[PATCH] split first_significant_code into two simpler functions

The * should go next to the type here.
+        const uschar *scode = firstSignificantOpCode(code + 1 + LINK_SIZE);

r=me with ChangeLog.
Comment 36 Sam Weinig 2007-11-18 15:14:04 PST
Comment on attachment 17356 [details]
[PATCH] cleanup _pcre_ucp_othercase

In the other patches I have reviewed so far you are changing infinite loops like
+    for (;;) {
to while (true).  I think it is better to be consistent and change this one as well.

r=me with ChangeLog.
Comment 37 Sam Weinig 2007-11-18 15:22:24 PST
Comment on attachment 17359 [details]
[PATCH] Further cleanup GET/PUT inlines

r=me with the ChangeLog
Comment 38 Sam Weinig 2007-11-18 17:48:43 PST
Comment on attachment 17341 [details]
[PATCH] removing more macros

r=me with ChangeLog.
Comment 39 Sam Weinig 2007-11-18 19:10:11 PST
Comment on attachment 17338 [details]
[PATCH] Get rid of PCRE custom char types

The typedef for USPTR was const UChar* but you seem to be using a non-const version in most places you replaced it.  What was the motivation behind that change.  Otherwise, r=me with a ChangeLog.
Comment 40 Sam Weinig 2007-11-18 19:32:13 PST
Comment on attachment 17349 [details]
[PATCH] Reformat find_fixedlength

The * should go next to the type here.
+static int find_fixedlength(uschar *code, int options)

and here
+    uschar *cc = code + 1 + LINK_SIZE;

I think should be while (true) for consistency.
+    for (;;) {

Brace should go on the same line as the switch statement.
+        switch (op)
+        {

I think spliting this onto multiple lines will make it more clear. (appears twice)
+                do cc += GET(cc, 1); while (*cc == OP_ALT);

In the inner switch statement you indent the cases differently than the outer switch.  Consistency is the key.
+                switch (*cc) {
+                case OP_CRSTAR:

r-
Comment 41 Sam Weinig 2007-11-18 19:45:15 PST
Comment on attachment 17350 [details]
[PATCH] clean up check_escape

I am not a fan of multiple case statements on a line but I don't think we have a rule about this.
+        case '1': case '2': case '3': case '4': case '5':
+        case '6': case '7': case '8': case '9':

r=me.
Comment 42 Sam Weinig 2007-11-18 19:49:56 PST
Comment on attachment 17345 [details]
[PATCH] give PCRE_STARTLINE a better name and rename match_data to MatchData

r=me with ChangeLoggy.
Comment 43 Sam Weinig 2007-11-18 19:53:15 PST
Comment on attachment 17342 [details]
[PATCH] Clean up compile_branch, move _pcre_ord2utf8, and rename CompileData

r=me
Comment 44 Eric Seidel (no email) 2007-11-19 00:58:27 PST
Created attachment 17376 [details]
[PATCH] Remove match_is_group variable for another 5% speedup

 JavaScriptCore/pcre/pcre_compile.cpp |    1 +
 JavaScriptCore/pcre/pcre_exec.cpp    |  143 +++++++++++++++------------------
 2 files changed, 66 insertions(+), 78 deletions(-)
Comment 45 Eric Seidel (no email) 2007-11-19 00:58:29 PST
Created attachment 17377 [details]
[PATCH] Lower MATCH_LIMIT_RECURSION to more sane levels to prevent hangs on run-javascriptcore-tests

 JavaScriptCore/pcre/pcre_internal.h |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)
Comment 46 Eric Seidel (no email) 2007-11-19 00:58:30 PST
Created attachment 17378 [details]
[PATCH] Small speedup (0.7%) by simplifying canUseStackBufferForNextFrame() check

 JavaScriptCore/pcre/pcre_exec.cpp |   20 +++++++++-----------
 1 files changed, 9 insertions(+), 11 deletions(-)
Comment 47 Eric Seidel (no email) 2007-11-19 00:58:31 PST
Created attachment 17379 [details]
[PATCH] Give subjectPtr and instructionPtr sane names, reduce size of MatchFrame for a 0.2% speedup.

 JavaScriptCore/pcre/pcre_compile.cpp |   15 +-
 JavaScriptCore/pcre/pcre_exec.cpp    |  653 +++++++++++++++++-----------------
 JavaScriptCore/pcre/pcre_internal.h  |   38 +-
 JavaScriptCore/pcre/pcre_xclass.cpp  |    6 +-
 4 files changed, 355 insertions(+), 357 deletions(-)
Comment 48 Eric Seidel (no email) 2007-11-19 00:58:33 PST
Created attachment 17380 [details]
[PATCH] Comment cleanup

 JavaScriptCore/pcre/pcre_exec.cpp |   14 ++++++++------
 1 files changed, 8 insertions(+), 6 deletions(-)
Comment 49 Eric Seidel (no email) 2007-11-19 00:58:34 PST
Created attachment 17381 [details]
[PATCH] Make i locally scoped for better code clarity

 JavaScriptCore/pcre/pcre_exec.cpp |   64 ++++++++++++++++++-------------------
 1 files changed, 31 insertions(+), 33 deletions(-)
Comment 50 Eric Seidel (no email) 2007-11-19 00:58:35 PST
Created attachment 17382 [details]
[PATCH] change getChar* functions to return result and push 'c' into local scopes for clarity

 JavaScriptCore/pcre/pcre_compile.cpp |   13 ++--
 JavaScriptCore/pcre/pcre_exec.cpp    |  138 ++++++++++++++++------------------
 JavaScriptCore/pcre/pcre_internal.h  |   25 ++++---
 3 files changed, 87 insertions(+), 89 deletions(-)
Comment 51 Eric Seidel (no email) 2007-11-19 00:58:36 PST
Created attachment 17383 [details]
[PATCH] Remove no longer used error code JSRegExpErrorMatchLimit

 JavaScriptCore/kjs/regexp.cpp       |    2 +-
 JavaScriptCore/pcre/pcre.h          |    1 -
 JavaScriptCore/pcre/pcre_internal.h |    2 +-
 3 files changed, 2 insertions(+), 3 deletions(-)
Comment 52 Eric Seidel (no email) 2007-11-19 00:58:37 PST
Created attachment 17384 [details]
[PATCH] Add repeatInformationFromInstructionOffset inline

 JavaScriptCore/pcre/pcre_exec.cpp |   63 ++++++++++++------------------------
 1 files changed, 21 insertions(+), 42 deletions(-)
Comment 53 Eric Seidel (no email) 2007-11-19 00:58:39 PST
Created attachment 17385 [details]
[PATCH] Remove branch from return

 JavaScriptCore/pcre/pcre_compile.cpp |    6 ++++--
 JavaScriptCore/pcre/pcre_exec.cpp    |    3 ++-
 2 files changed, 6 insertions(+), 3 deletions(-)
Comment 54 Eric Seidel (no email) 2007-11-19 00:58:40 PST
Created attachment 17386 [details]
[PATCH] Change _NC operators to use _IGNORING_CASE for clarity

 JavaScriptCore/pcre/pcre_compile.cpp |   16 ++++++++--------
 JavaScriptCore/pcre/pcre_exec.cpp    |    4 ++--
 JavaScriptCore/pcre/pcre_internal.h  |    4 ++--
 3 files changed, 12 insertions(+), 12 deletions(-)
Comment 55 Eric Seidel (no email) 2007-11-19 00:58:41 PST
Created attachment 17387 [details]
[PATCH] Make cur_is_word and prev_is_word locals, and change OP_ANY to OP_ANY_CHAR for clarity

 JavaScriptCore/pcre/pcre_compile.cpp |    8 ++++----
 JavaScriptCore/pcre/pcre_exec.cpp    |   15 +++++++++------
 JavaScriptCore/pcre/pcre_internal.h  |    4 ++--
 3 files changed, 15 insertions(+), 12 deletions(-)
Comment 56 Eric Seidel (no email) 2007-11-19 00:58:42 PST
Created attachment 17388 [details]
[PATCH] Centralize code for subjectPtr adjustments using inlines, only ever check for a single trailing surrogate (as UTF16 only allows one), possibly fix PCRE bugs involving char classes and garbled UTF16 strings.

 JavaScriptCore/pcre/pcre_exec.cpp   |   94 +++++++++++++++--------------------
 JavaScriptCore/pcre/pcre_internal.h |   35 +++++++++++++-
 2 files changed, 74 insertions(+), 55 deletions(-)
Comment 57 Eric Seidel (no email) 2007-11-19 00:58:44 PST
Created attachment 17389 [details]
[PATCH] Deprecate jsRegExpExecute's offset-vector fallback code

 JavaScriptCore/pcre/pcre_exec.cpp |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)
Comment 58 Eric Seidel (no email) 2007-11-19 00:58:45 PST
Created attachment 17390 [details]
[PATCH] give PCRE_MULTILINE a better name: OptionMatchAcrossMultipleLines

 JavaScriptCore/pcre/pcre_compile.cpp |    8 ++++----
 JavaScriptCore/pcre/pcre_exec.cpp    |    2 +-
 JavaScriptCore/pcre/pcre_internal.h  |    2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)
Comment 59 Eric Seidel (no email) 2007-11-19 00:58:46 PST
Created attachment 17391 [details]
[PATCH] Pull first_byte and req_byte optimizations out into separate static funtions, SunSpider reported this as a win.

 JavaScriptCore/pcre/pcre_exec.cpp   |  186 ++++++++++++++++------------------
 JavaScriptCore/pcre/pcre_internal.h |   18 ++--
 2 files changed, 99 insertions(+), 105 deletions(-)
Comment 60 Eric Seidel (no email) 2007-11-19 00:58:47 PST
Created attachment 17392 [details]
[PATCH] Give consistent naming to the RegExp options/compile flags

 JavaScriptCore/pcre/pcre_compile.cpp |   48 +++++++++++++++++-----------------
 JavaScriptCore/pcre/pcre_exec.cpp    |   10 +++---
 JavaScriptCore/pcre/pcre_internal.h  |   16 +++++-----
 3 files changed, 37 insertions(+), 37 deletions(-)
Comment 61 Eric Seidel (no email) 2007-11-19 01:04:53 PST
Ok.  I'm ready to start landing these suckers.  After all these patches, RegExp is 18.8% faster.  Other tests are randomly slower and faster.  None of them beyond 2.7% slower (controlflow-recursive) or 2.9% faster (math-cordic).  These are all tests which have nothing to to with RegExps.

Getting this in will allow others (like darin, ggaren) to pick off some of the remaining low-hanging fruit in the RegExp engine (now that one can actually begin to tell what the code is trying to do!)

If you would like any of the patches re-attached in a white-space ignoring form (or if you'd like a combined patch) I'm happy to provide.  I hope this "do lots of work in git and upload all the little patches" approach is a help, not a hinderance.  I plan to land these all individually (with changelogs), as then it's very easy to track any accidental regressions.
Comment 62 Maciej Stachowiak 2007-11-19 01:20:15 PST
Comment on attachment 17347 [details]
[PATCH] Remove unused function could_be_empty_branch

r=me
Comment 63 Eric Seidel (no email) 2007-11-19 01:23:57 PST
unccing darin to curb the flood of email to his inbox
Comment 64 Maciej Stachowiak 2007-11-19 01:28:38 PST
Comment on attachment 17335 [details]
[PATCH] Clean up jsRegExpExecute

r=me
Comment 65 Maciej Stachowiak 2007-11-19 01:29:24 PST
Comment on attachment 17336 [details]
[PATCH] Remove register keyword and more cleanup

r=me
Comment 66 Maciej Stachowiak 2007-11-19 01:30:04 PST
Comment on attachment 17340 [details]
[PATCH] clean up formating in compile_branch

r=me
Comment 67 Eric Seidel (no email) 2007-11-19 01:47:30 PST
Created attachment 17393 [details]
[PATCH] Further cleanups to calculateCompiledPatternLengthAndFlags

 JavaScriptCore/pcre/pcre_compile.cpp |  154 +++++++++++++++++-----------------
 JavaScriptCore/pcre/pcre_internal.h  |    2 +-
 2 files changed, 79 insertions(+), 77 deletions(-)
Comment 68 Eric Seidel (no email) 2007-11-19 01:52:06 PST
Created attachment 17394 [details]
Combined patch for Maciej's viewing pleasure

Here is a combined patch of all preceding patches for folks to try on their own machines.  (I'm still looking  for individual patch reviews, and plan to land them individually.)
Comment 69 Maciej Stachowiak 2007-11-19 01:52:13 PST
Comment on attachment 17390 [details]
[PATCH] give PCRE_MULTILINE a better name: OptionMatchAcrossMultipleLines

r=me
Comment 70 Maciej Stachowiak 2007-11-19 01:55:41 PST
Comment on attachment 17353 [details]
[PATCH] Use better variable names for case ignoring options

r=me
Comment 71 Eric Seidel (no email) 2007-11-19 17:51:28 PST
I just updated to TOT and re-timed.  SunSpider now claims this total patch is a 1.1% speedup.
Comment 72 Sam Weinig 2007-11-19 18:45:22 PST
Comment on attachment 17346 [details]
[PATCH] Pass around MatchData objects by reference

r=me.

This comment could be slightly altered now that you are passing by reference.
   md          pointer to matching data block, if is_subject is true
Comment 73 Sam Weinig 2007-11-19 18:47:10 PST
Comment on attachment 17392 [details]
[PATCH] Give consistent naming to the RegExp options/compile flags

r=me
Comment 74 Sam Weinig 2007-11-19 19:51:14 PST
Comment on attachment 17357 [details]
[PATCH] Add inlines for toLowerCase, isWordChar, isSpaceChar for further regexp speedup

r=me
Comment 75 Sam Weinig 2007-11-19 20:00:37 PST
Comment on attachment 17360 [details]
[PATCH] Get rid of GETCHAR* macros, replacing them with better named inlines

r=me
Comment 76 Sam Weinig 2007-11-19 20:08:43 PST
Comment on attachment 17380 [details]
[PATCH] Comment cleanup

r=me
Comment 77 Sam Weinig 2007-11-20 11:03:24 PST
Comment on attachment 17358 [details]
[PATCH] Give GET, PUT better names, and add (poor) moveOpcodePtrPastAnyAlternateBranches

r=me
Comment 78 Sam Weinig 2007-11-20 13:37:30 PST
Comment on attachment 17362 [details]
[PATCH] Remove redundant eptrblock struct

r=me
Comment 79 Sam Weinig 2007-11-20 13:39:24 PST
Comment on attachment 17354 [details]
[PATCH] Section off MatchData arguments into args struct

For this and http://bugs.webkit.org/attachment.cgi?id=17355 I would like to see some performance numbers to make sure this restructuring did not regress us in any way.  r-
Comment 80 Sam Weinig 2007-11-20 13:39:55 PST
Comment on attachment 17355 [details]
[PATCH] Abstract frame variables into locals and args

r- for same reason as above.
Comment 81 Sam Weinig 2007-11-20 13:41:20 PST
Comment on attachment 17377 [details]
[PATCH] Lower MATCH_LIMIT_RECURSION to more sane levels to prevent hangs on run-javascriptcore-tests

r=me
Comment 82 Sam Weinig 2007-11-20 13:45:46 PST
Comment on attachment 17378 [details]
[PATCH] Small speedup (0.7%) by simplifying canUseStackBufferForNextFrame() check

This looks good but I think you can use a static const instead of the #define
+#define FRAMES_ON_STACK 16

r=me otherwise.
Comment 83 Sam Weinig 2007-11-20 14:09:48 PST
Comment on attachment 17391 [details]
[PATCH] Pull first_byte and req_byte optimizations out into separate static funtions, SunSpider reported this as a win.

There are a few style issues in tryFirstByteOptimization:

Brace after if.
+        if (first_byte_caseless)
+            while (subjectPtr < endSubject) {

braces after the else
+        else
+            while (subjectPtr < endSubject && *subjectPtr != first_char)
+                subjectPtr++;


Else on the same line as the brace.
+    }
+    /* Or to just after \n for a multiline match if possible */
+    else if (useMultiLineFirstCharOptimization) {

r=me
Comment 84 Sam Weinig 2007-11-20 14:52:08 PST
Comment on attachment 17393 [details]
[PATCH] Further cleanups to calculateCompiledPatternLengthAndFlags

r=me
Comment 85 Sam Weinig 2007-11-20 15:00:19 PST
Comment on attachment 17381 [details]
[PATCH] Make i locally scoped for better code clarity

Bad bad Eric.  This code will decrement the i counter twice as much.  Also, the indenting is off.
-    i = 255;
-    while (!opcode_jump_table[i])
-        opcode_jump_table[i--] = &&CAPTURING_BRACKET;
+    for (int i = 255; !opcode_jump_table[i]; i--)
+            opcode_jump_table[i--] = &&CAPTURING_BRACKET;

Otherwise, this looks great.  r=me.
Comment 86 Sam Weinig 2007-11-20 15:10:39 PST
Comment on attachment 17383 [details]
[PATCH] Remove no longer used error code JSRegExpErrorMatchLimit

I think if you remove a error code it is a good idea to make the range continuous again (make JSRegExpErrorRecursionLimit = -4)
as we aren't keeping any kind of compatibility here.

This change seems like it should go in the patch that originally made the comment.
- Currently that's 100000 - 16 * (23 * 4)  ~ 9MB
+ Currently that's 100000 - 16 * (23 * 4)  ~ 90MB

r-
Comment 87 Sam Weinig 2007-11-20 16:07:02 PST
Comment on attachment 17385 [details]
[PATCH] Remove branch from return

r=me
Comment 88 mitz 2007-11-21 20:50:33 PST
Comment on attachment 17386 [details]
[PATCH] Change _NC operators to use _IGNORING_CASE for clarity

r=me
Comment 89 Maciej Stachowiak 2007-11-21 20:50:41 PST
Comment on attachment 17361 [details]
[PATCH] Remove redundant match_call_count and move recursion check out of super-hot code path

r=me
Comment 90 mitz 2007-11-21 20:51:58 PST
Comment on attachment 17387 [details]
[PATCH] Make cur_is_word and prev_is_word locals, and change OP_ANY to OP_ANY_CHAR for clarity

r=me
Comment 91 Maciej Stachowiak 2007-11-21 21:00:24 PST
Comment on attachment 17387 [details]
[PATCH] Make cur_is_word and prev_is_word locals, and change OP_ANY to OP_ANY_CHAR for clarity

r=me
Comment 92 Maciej Stachowiak 2007-11-22 19:58:03 PST
Comment on attachment 17384 [details]
[PATCH] Add repeatInformationFromInstructionOffset inline

r=me
Comment 93 Maciej Stachowiak 2007-11-22 23:03:40 PST
Comment on attachment 17376 [details]
[PATCH] Remove match_is_group variable for another 5% speedup

r=me
Comment 94 Maciej Stachowiak 2007-11-22 23:34:33 PST
Comment on attachment 17379 [details]
[PATCH] Give subjectPtr and instructionPtr sane names, reduce size of MatchFrame for a 0.2% speedup.

r=me
Comment 95 Maciej Stachowiak 2007-11-22 23:35:17 PST
Comment on attachment 17379 [details]
[PATCH] Give subjectPtr and instructionPtr sane names, reduce size of MatchFrame for a 0.2% speedup.


You should fix the indentation here:

   offset      index into the offset vector
-  eptr        points into the subject
+  subjectPtr        points into the subject
   length      length to be matched
Comment 96 Eric Seidel (no email) 2007-11-24 13:49:27 PST
(In reply to comment #40)
> (From update of attachment 17349 [details] [edit])
> r-

I fixed all your issues except for the case indenting.  That will have to go on a later patch, as a I fear it would break all dependent patches in my tree.  Sadly, you're not around on IRC, but I'm going to assume you're OK with this.  this is git, so I can always go back and change it. :)

Comment 97 Maciej Stachowiak 2007-11-24 19:10:40 PST
Comment on attachment 17382 [details]
[PATCH] change getChar* functions to return result and push 'c' into local scopes for clarity

r=me
Comment 98 Maciej Stachowiak 2007-11-24 19:26:37 PST
Comment on attachment 17388 [details]
[PATCH] Centralize code for subjectPtr adjustments using inlines, only ever check for a single trailing surrogate (as UTF16 only allows one), possibly fix PCRE bugs involving char classes and garbled UTF16 strings.

r=me
Comment 99 Eric Seidel (no email) 2007-11-29 03:02:36 PST
Ok.  I made the suggested changes, added ChangeLogs to each commit, and rechecked performance.  The final number was a 1.7% improvement (some of that is caused by positive noise in the results).

Going to land this in a minute.

The final perf numbers:

TEST                   COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:           1.7% faster     4320.6ms +/- 0.3%   4249.0ms +/- 0.2%     significant

=============================================================================

  3d:                  -                500.2ms +/- 0.4%    498.0ms +/- 0.4% 
    cube:              -                168.2ms +/- 1.0%    166.4ms +/- 0.9% 
    morph:             ??               161.4ms +/- 0.4%    161.6ms +/- 0.4%     not conclusive: might be 0.1% *slower*
    raytrace:          0.4% faster      170.6ms +/- 0.4%    170.0ms +/- 0.0%     significant

  access:              1.3% faster      716.4ms +/- 0.4%    707.0ms +/- 0.4%     significant
    binary-trees:      ??               105.4ms +/- 0.6%    105.6ms +/- 0.6%     not conclusive: might be 0.2% *slower*
    fannkuch:          1.8% faster      348.8ms +/- 0.3%    342.6ms +/- 0.4%     significant
    nbody:             1.6% faster      184.0ms +/- 0.8%    181.0ms +/- 0.5%     significant
    nsieve:            -                 78.2ms +/- 0.7%     77.8ms +/- 0.7% 

  bitops:              -                611.2ms +/- 0.2%    611.0ms +/- 0.2% 
    3bit-bits-in-byte: ??               111.8ms +/- 0.5%    112.0ms +/- 1.1%     not conclusive: might be 0.2% *slower*
    bits-in-byte:      -                150.0ms +/- 0.6%    149.8ms +/- 0.4% 
    bitwise-and:       -                214.0ms +/- 0.0%    214.0ms +/- 0.0% 
    nsieve-bits:       -                135.4ms +/- 0.5%    135.2ms +/- 0.4% 

  controlflow:         -                151.0ms +/- 0.0%    151.0ms +/- 0.0% 
    recursive:         -                151.0ms +/- 0.0%    151.0ms +/- 0.0% 

  crypto:              -                339.8ms +/- 0.5%    340.0ms +/- 0.5% 
    aes:               1.4% *slower*     97.2ms +/- 0.6%     98.6ms +/- 0.7%     significant
    md5:               -                121.2ms +/- 0.5%    121.2ms +/- 0.9% 
    sha1:              1.0% faster      121.4ms +/- 0.6%    120.2ms +/- 0.9%     significant

  date:                0.8% faster      371.8ms +/- 0.3%    369.0ms +/- 0.8%     significant
    format-tofte:      0.9% *slower*    163.8ms +/- 0.3%    165.2ms +/- 1.6%     significant
    format-xparb:      2.0% faster      208.0ms +/- 0.4%    203.8ms +/- 0.3%     significant

  math:                0.4% faster      509.2ms +/- 0.2%    507.0ms +/- 0.5%     significant
    cordic:            0.8% faster      216.0ms +/- 0.4%    214.2ms +/- 0.8%     significant
    partial-sums:      -                172.8ms +/- 0.3%    172.2ms +/- 0.6% 
    spectral-norm:     ??               120.4ms +/- 0.6%    120.6ms +/- 0.6%     not conclusive: might be 0.2% *slower*

  regexp:              18.0% faster     268.2ms +/- 0.4%    220.0ms +/- 0.4%     significant
    dna:               18.0% faster     268.2ms +/- 0.4%    220.0ms +/- 0.4%     significant

  string:              0.8% faster      852.8ms +/- 0.5%    846.0ms +/- 0.4%     significant
    base64:            0.8% faster      121.4ms +/- 0.6%    120.4ms +/- 0.6%     significant
    fasta:             1.2% faster      205.0ms +/- 0.6%    202.6ms +/- 0.3%     significant
    tagcloud:          ??               191.8ms +/- 0.8%    192.4ms +/- 0.4%     not conclusive: might be 0.3% *slower*
    unpack-code:       1.5% faster      199.0ms +/- 0.4%    196.0ms +/- 0.4%     significant
    validate-input:    -                135.6ms +/- 0.8%    134.6ms +/- 1.9%