Bug 16036

Summary: Further cleanup to PCRE
Product: WebKit Reporter: Eric Seidel (no email) <eric>
Component: JavaScriptCoreAssignee: Eric Seidel (no email) <eric>
Status: RESOLVED FIXED    
Severity: Normal    
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Mac   
OS: OS X 10.4   
Attachments:
Description Flags
[PATCH] Clean up jsRegExpExecute
mjs: review+
[PATCH] Remove register keyword and more cleanup
mjs: review+
[PATCH] reformat get_othercase_range
sam: review+
[PATCH] Get rid of PCRE custom char types
sam: review+
[PATCH] Fix spacing for read_repeat_counts
sam: review+
[PATCH] clean up formating in compile_branch
mjs: review+
[PATCH] removing more macros
sam: review+
[PATCH] Clean up compile_branch, move _pcre_ord2utf8, and rename CompileData
sam: review+
[PATCH] Pass around CompileData& instead of CompileData*
timothy: review+
[PATCH] Clean up find_firstassertedchar
sam: review+
[PATCH] give PCRE_STARTLINE a better name and rename match_data to MatchData
sam: review+
[PATCH] Pass around MatchData objects by reference
sam: review+
[PATCH] Remove unused function could_be_empty_branch
mjs: review+
[PATCH] reformat is_anchored
sam: review+
[PATCH] Reformat find_fixedlength
sam: review-
[PATCH] clean up check_escape
sam: review+
[PATCH] clean up is_counted_repeat
sam: review+
[PATCH] split first_significant_code into two simpler functions
sam: review+
[PATCH] Use better variable names for case ignoring options
mjs: review+
[PATCH] Section off MatchData arguments into args struct
sam: review-
[PATCH] Abstract frame variables into locals and args
sam: review-
[PATCH] cleanup _pcre_ucp_othercase
sam: review+
[PATCH] Add inlines for toLowerCase, isWordChar, isSpaceChar for further regexp speedup
sam: review+
[PATCH] Give GET, PUT better names, and add (poor) moveOpcodePtrPastAnyAlternateBranches
sam: review+
[PATCH] Further cleanup GET/PUT inlines
sam: review+
[PATCH] Get rid of GETCHAR* macros, replacing them with better named inlines
sam: review+
[PATCH] Remove redundant match_call_count and move recursion check out of super-hot code path
mjs: review+
[PATCH] Remove redundant eptrblock struct
sam: review+
[PATCH] Remove match_is_group variable for another 5% speedup
mjs: review+
[PATCH] Lower MATCH_LIMIT_RECURSION to more sane levels to prevent hangs on run-javascriptcore-tests
sam: review+
[PATCH] Small speedup (0.7%) by simplifying canUseStackBufferForNextFrame() check
sam: review+
[PATCH] Give subjectPtr and instructionPtr sane names, reduce size of MatchFrame for a 0.2% speedup.
mjs: review+
[PATCH] Comment cleanup
sam: review+
[PATCH] Make i locally scoped for better code clarity
sam: review+
[PATCH] change getChar* functions to return result and push 'c' into local scopes for clarity
mjs: review+
[PATCH] Remove no longer used error code JSRegExpErrorMatchLimit
sam: review-
[PATCH] Add repeatInformationFromInstructionOffset inline
mjs: review+
[PATCH] Remove branch from return
sam: review+
[PATCH] Change _NC operators to use _IGNORING_CASE for clarity
mitz: review+
[PATCH] Make cur_is_word and prev_is_word locals, and change OP_ANY to OP_ANY_CHAR for clarity
mitz: review+
[PATCH] Centralize code for subjectPtr adjustments using inlines, only ever check for a single trailing surrogate (as UTF16 only allows one), possibly fix PCRE bugs involving char classes and garbled UTF16 strings.
mjs: review+
[PATCH] Deprecate jsRegExpExecute's offset-vector fallback code
oliver: review+
[PATCH] give PCRE_MULTILINE a better name: OptionMatchAcrossMultipleLines
mjs: review+
[PATCH] Pull first_byte and req_byte optimizations out into separate static funtions, SunSpider reported this as a win.
sam: review+
[PATCH] Give consistent naming to the RegExp options/compile flags
sam: review+
[PATCH] Further cleanups to calculateCompiledPatternLengthAndFlags
sam: review+
Combined patch for Maciej's viewing pleasure none

Eric Seidel (no email)
Reported 2007-11-17 21:09:37 PST
Clean up PCRE more so we can make it faster. :) Going to try attaching a whole bunch of patches to this bug which we can review one at at time.
Attachments
[PATCH] Clean up jsRegExpExecute (24.14 KB, patch)
2007-11-17 21:09 PST, Eric Seidel (no email)
mjs: review+
[PATCH] Remove register keyword and more cleanup (35.50 KB, patch)
2007-11-17 21:09 PST, Eric Seidel (no email)
mjs: review+
[PATCH] reformat get_othercase_range (1.27 KB, patch)
2007-11-17 21:09 PST, Eric Seidel (no email)
sam: review+
[PATCH] Get rid of PCRE custom char types (16.80 KB, patch)
2007-11-17 21:09 PST, Eric Seidel (no email)
sam: review+
[PATCH] Fix spacing for read_repeat_counts (2.29 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
[PATCH] clean up formating in compile_branch (98.14 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
mjs: review+
[PATCH] removing more macros (9.84 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
[PATCH] Clean up compile_branch, move _pcre_ord2utf8, and rename CompileData (20.25 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
[PATCH] Pass around CompileData& instead of CompileData* (10.98 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
timothy: review+
[PATCH] Clean up find_firstassertedchar (5.64 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
[PATCH] give PCRE_STARTLINE a better name and rename match_data to MatchData (13.60 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
[PATCH] Pass around MatchData objects by reference (43.88 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
[PATCH] Remove unused function could_be_empty_branch (6.00 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
mjs: review+
[PATCH] reformat is_anchored (2.31 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
[PATCH] Reformat find_fixedlength (7.91 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review-
[PATCH] clean up check_escape (8.04 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
[PATCH] clean up is_counted_repeat (2.02 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
[PATCH] split first_significant_code into two simpler functions (2.84 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
[PATCH] Use better variable names for case ignoring options (15.70 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
mjs: review+
[PATCH] Section off MatchData arguments into args struct (91.45 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review-
[PATCH] Abstract frame variables into locals and args (72.81 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review-
[PATCH] cleanup _pcre_ucp_othercase (2.67 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
[PATCH] Add inlines for toLowerCase, isWordChar, isSpaceChar for further regexp speedup (25.56 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
[PATCH] Give GET, PUT better names, and add (poor) moveOpcodePtrPastAnyAlternateBranches (37.66 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
[PATCH] Further cleanup GET/PUT inlines (3.66 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
[PATCH] Get rid of GETCHAR* macros, replacing them with better named inlines (28.06 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
[PATCH] Remove redundant match_call_count and move recursion check out of super-hot code path (5.70 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
mjs: review+
[PATCH] Remove redundant eptrblock struct (19.61 KB, patch)
2007-11-17 21:10 PST, Eric Seidel (no email)
sam: review+
[PATCH] Remove match_is_group variable for another 5% speedup (23.38 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
mjs: review+
[PATCH] Lower MATCH_LIMIT_RECURSION to more sane levels to prevent hangs on run-javascriptcore-tests (1.01 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
sam: review+
[PATCH] Small speedup (0.7%) by simplifying canUseStackBufferForNextFrame() check (1.91 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
sam: review+
[PATCH] Give subjectPtr and instructionPtr sane names, reduce size of MatchFrame for a 0.2% speedup. (108.42 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
mjs: review+
[PATCH] Comment cleanup (2.43 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
sam: review+
[PATCH] Make i locally scoped for better code clarity (16.08 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
sam: review+
[PATCH] change getChar* functions to return result and push 'c' into local scopes for clarity (29.01 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
mjs: review+
[PATCH] Remove no longer used error code JSRegExpErrorMatchLimit (1.72 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
sam: review-
[PATCH] Add repeatInformationFromInstructionOffset inline (7.08 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
mjs: review+
[PATCH] Remove branch from return (1.74 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
sam: review+
[PATCH] Change _NC operators to use _IGNORING_CASE for clarity (5.03 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
mitz: review+
[PATCH] Make cur_is_word and prev_is_word locals, and change OP_ANY to OP_ANY_CHAR for clarity (6.27 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
mitz: review+
[PATCH] Centralize code for subjectPtr adjustments using inlines, only ever check for a single trailing surrogate (as UTF16 only allows one), possibly fix PCRE bugs involving char classes and garbled UTF16 strings. (16.28 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
mjs: review+
[PATCH] Deprecate jsRegExpExecute's offset-vector fallback code (1.10 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
oliver: review+
[PATCH] give PCRE_MULTILINE a better name: OptionMatchAcrossMultipleLines (3.33 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
mjs: review+
[PATCH] Pull first_byte and req_byte optimizations out into separate static funtions, SunSpider reported this as a win. (10.54 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
sam: review+
[PATCH] Give consistent naming to the RegExp options/compile flags (13.19 KB, patch)
2007-11-19 00:58 PST, Eric Seidel (no email)
sam: review+
[PATCH] Further cleanups to calculateCompiledPatternLengthAndFlags (17.92 KB, patch)
2007-11-19 01:47 PST, Eric Seidel (no email)
sam: review+
Combined patch for Maciej's viewing pleasure (361.97 KB, patch)
2007-11-19 01:52 PST, Eric Seidel (no email)
no flags
Eric Seidel (no email)
Comment 1 2007-11-17 21:09:54 PST
Created attachment 17335 [details] [PATCH] Clean up jsRegExpExecute JavaScriptCore/pcre/pcre_compile.cpp | 12 +- JavaScriptCore/pcre/pcre_exec.cpp | 553 ++++++++++++++++------------------ JavaScriptCore/pcre/pcre_internal.h | 6 +- 3 files changed, 269 insertions(+), 302 deletions(-)
Eric Seidel (no email)
Comment 2 2007-11-17 21:09:55 PST
Created attachment 17336 [details] [PATCH] Remove register keyword and more cleanup JavaScriptCore/pcre/pcre_compile.cpp | 97 ++++++++++++--------------- JavaScriptCore/pcre/pcre_exec.cpp | 119 +++++++++++++++++++-------------- JavaScriptCore/pcre/pcre_internal.h | 92 ++++++++----------------- JavaScriptCore/pcre/pcre_ord2utf8.cpp | 25 +++---- JavaScriptCore/pcre/pcre_xclass.cpp | 85 ++++++++++++++---------- 5 files changed, 203 insertions(+), 215 deletions(-)
Eric Seidel (no email)
Comment 3 2007-11-17 21:09:57 PST
Created attachment 17337 [details] [PATCH] reformat get_othercase_range JavaScriptCore/pcre/pcre_compile.cpp | 46 +++++++++++++++++---------------- 1 files changed, 24 insertions(+), 22 deletions(-)
Eric Seidel (no email)
Comment 4 2007-11-17 21:09:58 PST
Created attachment 17338 [details] [PATCH] Get rid of PCRE custom char types JavaScriptCore/pcre/pcre_compile.cpp | 84 +++++++++++++++------------------ JavaScriptCore/pcre/pcre_exec.cpp | 48 ++++++++++---------- JavaScriptCore/pcre/pcre_internal.h | 10 +--- 3 files changed, 65 insertions(+), 77 deletions(-)
Eric Seidel (no email)
Comment 5 2007-11-17 21:10:00 PST
Created attachment 17339 [details] [PATCH] Fix spacing for read_repeat_counts JavaScriptCore/pcre/pcre_compile.cpp | 79 +++++++++++++++++----------------- 1 files changed, 39 insertions(+), 40 deletions(-)
Eric Seidel (no email)
Comment 6 2007-11-17 21:10:02 PST
Created attachment 17340 [details] [PATCH] clean up formating in compile_branch JavaScriptCore/pcre/pcre_compile.cpp | 2351 +++++++++++++++++----------------- 1 files changed, 1145 insertions(+), 1206 deletions(-)
Eric Seidel (no email)
Comment 7 2007-11-17 21:10:03 PST
Created attachment 17341 [details] [PATCH] removing more macros JavaScriptCore/pcre/pcre_compile.cpp | 36 +++++++++++++++++----------------- JavaScriptCore/pcre/pcre_exec.cpp | 24 +++++++++++----------- JavaScriptCore/pcre/pcre_internal.h | 18 ++++++++-------- JavaScriptCore/pcre/pcre_xclass.cpp | 4 +- 4 files changed, 41 insertions(+), 41 deletions(-)
Eric Seidel (no email)
Comment 8 2007-11-17 21:10:05 PST
Created attachment 17342 [details] [PATCH] Clean up compile_branch, move _pcre_ord2utf8, and rename CompileData .../JavaScriptCore.xcodeproj/project.pbxproj | 4 - JavaScriptCore/pcre/pcre_compile.cpp | 110 ++++++++++++-------- JavaScriptCore/pcre/pcre_internal.h | 19 ++-- JavaScriptCore/pcre/pcre_ord2utf8.cpp | 75 ------------- 4 files changed, 76 insertions(+), 132 deletions(-)
Eric Seidel (no email)
Comment 9 2007-11-17 21:10:06 PST
Created attachment 17343 [details] [PATCH] Pass around CompileData& instead of CompileData* JavaScriptCore/pcre/pcre_compile.cpp | 69 +++++++++++++++------------------ 1 files changed, 31 insertions(+), 38 deletions(-)
Eric Seidel (no email)
Comment 10 2007-11-17 21:10:09 PST
Created attachment 17344 [details] [PATCH] Clean up find_firstassertedchar JavaScriptCore/pcre/pcre_compile.cpp | 111 +++++++++++++++++---------------- 1 files changed, 57 insertions(+), 54 deletions(-)
Eric Seidel (no email)
Comment 11 2007-11-17 21:10:10 PST
Created attachment 17345 [details] [PATCH] give PCRE_STARTLINE a better name and rename match_data to MatchData JavaScriptCore/pcre/pcre_compile.cpp | 160 ++++++++++++++++----------------- JavaScriptCore/pcre/pcre_exec.cpp | 45 +++++----- JavaScriptCore/pcre/pcre_internal.h | 3 +- 3 files changed, 104 insertions(+), 104 deletions(-)
Eric Seidel (no email)
Comment 12 2007-11-17 21:10:12 PST
Created attachment 17346 [details] [PATCH] Pass around MatchData objects by reference JavaScriptCore/pcre/pcre_exec.cpp | 294 ++++++++++++++++++------------------ 1 files changed, 147 insertions(+), 147 deletions(-)
Eric Seidel (no email)
Comment 13 2007-11-17 21:10:14 PST
Created attachment 17347 [details] [PATCH] Remove unused function could_be_empty_branch JavaScriptCore/pcre/pcre_compile.cpp | 155 ++-------------------------------- 1 files changed, 7 insertions(+), 148 deletions(-)
Eric Seidel (no email)
Comment 14 2007-11-17 21:10:15 PST
Created attachment 17348 [details] [PATCH] reformat is_anchored JavaScriptCore/pcre/pcre_compile.cpp | 62 +++++++++++++++------------------ 1 files changed, 28 insertions(+), 34 deletions(-)
Eric Seidel (no email)
Comment 15 2007-11-17 21:10:17 PST
Created attachment 17349 [details] [PATCH] Reformat find_fixedlength JavaScriptCore/pcre/pcre_compile.cpp | 281 +++++++++++++++++----------------- 1 files changed, 143 insertions(+), 138 deletions(-)
Eric Seidel (no email)
Comment 16 2007-11-17 21:10:19 PST
Created attachment 17350 [details] [PATCH] clean up check_escape JavaScriptCore/pcre/pcre_compile.cpp | 242 +++++++++++++++++----------------- 1 files changed, 118 insertions(+), 124 deletions(-)
Eric Seidel (no email)
Comment 17 2007-11-17 21:10:20 PST
Created attachment 17351 [details] [PATCH] clean up is_counted_repeat JavaScriptCore/pcre/pcre_compile.cpp | 44 ++++++++++++++++----------------- 1 files changed, 21 insertions(+), 23 deletions(-)
Eric Seidel (no email)
Comment 18 2007-11-17 21:10:22 PST
Created attachment 17352 [details] [PATCH] split first_significant_code into two simpler functions JavaScriptCore/pcre/pcre_compile.cpp | 55 ++++++++++++++++------------------ 1 files changed, 26 insertions(+), 29 deletions(-)
Eric Seidel (no email)
Comment 19 2007-11-17 21:10:23 PST
Created attachment 17353 [details] [PATCH] Use better variable names for case ignoring options JavaScriptCore/pcre/pcre_compile.cpp | 41 ++++++++++++++++----------------- JavaScriptCore/pcre/pcre_exec.cpp | 36 +++++++++++++++--------------- JavaScriptCore/pcre/pcre_internal.h | 4 +- 3 files changed, 40 insertions(+), 41 deletions(-)
Eric Seidel (no email)
Comment 20 2007-11-17 21:10:25 PST
Created attachment 17354 [details] [PATCH] Section off MatchData arguments into args struct JavaScriptCore/pcre/pcre_exec.cpp | 728 +++++++++++++++++++------------------ 1 files changed, 367 insertions(+), 361 deletions(-)
Eric Seidel (no email)
Comment 21 2007-11-17 21:10:27 PST
Created attachment 17355 [details] [PATCH] Abstract frame variables into locals and args JavaScriptCore/pcre/pcre_compile.cpp | 2 +- JavaScriptCore/pcre/pcre_exec.cpp | 465 +++++++++++++++++----------------- JavaScriptCore/pcre/pcre_internal.h | 16 +- 3 files changed, 244 insertions(+), 239 deletions(-)
Eric Seidel (no email)
Comment 22 2007-11-17 21:10:29 PST
Created attachment 17356 [details] [PATCH] cleanup _pcre_ucp_othercase JavaScriptCore/pcre/pcre_ucp_searchfuncs.cpp | 69 +++++++++++++------------ 1 files changed, 36 insertions(+), 33 deletions(-)
Eric Seidel (no email)
Comment 23 2007-11-17 21:10:30 PST
Created attachment 17357 [details] [PATCH] Add inlines for toLowerCase, isWordChar, isSpaceChar for further regexp speedup JavaScriptCore/pcre/pcre_compile.cpp | 22 +++++----- JavaScriptCore/pcre/pcre_exec.cpp | 75 +++++++++++++++------------------- JavaScriptCore/pcre/pcre_internal.h | 56 ++++++++++++++++--------- JavaScriptCore/pcre/pcre_xclass.cpp | 3 +- 4 files changed, 81 insertions(+), 75 deletions(-)
Eric Seidel (no email)
Comment 24 2007-11-17 21:10:32 PST
Created attachment 17358 [details] [PATCH] Give GET, PUT better names, and add (poor) moveOpcodePtrPastAnyAlternateBranches JavaScriptCore/pcre/pcre_compile.cpp | 120 +++++++++++++------------- JavaScriptCore/pcre/pcre_exec.cpp | 62 ++++++------- JavaScriptCore/pcre/pcre_internal.h | 47 ++++++---- JavaScriptCore/pcre/pcre_ucp_searchfuncs.cpp | 4 +- 4 files changed, 117 insertions(+), 116 deletions(-)
Eric Seidel (no email)
Comment 25 2007-11-17 21:10:33 PST
Created attachment 17359 [details] [PATCH] Further cleanup GET/PUT inlines JavaScriptCore/pcre/pcre_internal.h | 49 ++++++++++++++++++----------------- 1 files changed, 25 insertions(+), 24 deletions(-)
Eric Seidel (no email)
Comment 26 2007-11-17 21:10:35 PST
Created attachment 17360 [details] [PATCH] Get rid of GETCHAR* macros, replacing them with better named inlines JavaScriptCore/pcre/pcre_compile.cpp | 33 ++++------ JavaScriptCore/pcre/pcre_exec.cpp | 112 +++++++++++++++++----------------- JavaScriptCore/pcre/pcre_internal.h | 55 ++++++++++++----- 3 files changed, 106 insertions(+), 94 deletions(-)
Eric Seidel (no email)
Comment 27 2007-11-17 21:10:36 PST
Created attachment 17361 [details] [PATCH] Remove redundant match_call_count and move recursion check out of super-hot code path SunSpider says this is at least an 8% speedup for regexp. --- JavaScriptCore/pcre/pcre_exec.cpp | 33 ++++++++++----------------------- JavaScriptCore/pcre/pcre_internal.h | 25 +++++-------------------- 2 files changed, 15 insertions(+), 43 deletions(-)
Eric Seidel (no email)
Comment 28 2007-11-17 21:10:38 PST
Created attachment 17362 [details] [PATCH] Remove redundant eptrblock struct JavaScriptCore/pcre/pcre_exec.cpp | 99 ++++++++++++++---------------------- 1 files changed, 39 insertions(+), 60 deletions(-)
Eric Seidel (no email)
Comment 29 2007-11-17 21:11:40 PST
So I've not marked any of these for review. But anyone who would like to go through and review them, that would be most appreciated. :) Hum... I might need to make white-space ignoring versions.
Sam Weinig
Comment 30 2007-11-17 21:23:00 PST
Comment on attachment 17337 [details] [PATCH] reformat get_othercase_range The *'s should be next to the type here. +static bool get_othercase_range(int *cptr, int d, int *ocptr, int *odptr) Needs changelog. Otherwise, r=me.
Sam Weinig
Comment 31 2007-11-17 21:24:49 PST
Comment on attachment 17339 [details] [PATCH] Fix spacing for read_repeat_counts Needs ChangeLog r=me.
Sam Weinig
Comment 32 2007-11-17 21:28:09 PST
Comment on attachment 17344 [details] [PATCH] Clean up find_firstassertedchar r=me. Changelog please
Sam Weinig
Comment 33 2007-11-17 21:36:12 PST
Comment on attachment 17348 [details] [PATCH] reformat is_anchored We generally spell unsigned int as unsigned. Otherwise, r=me with a log of changes.
Sam Weinig
Comment 34 2007-11-17 21:38:09 PST
Comment on attachment 17351 [details] [PATCH] clean up is_counted_repeat r=me with ChangeLog.
Sam Weinig
Comment 35 2007-11-18 15:10:54 PST
Comment on attachment 17352 [details] [PATCH] split first_significant_code into two simpler functions The * should go next to the type here. + const uschar *scode = firstSignificantOpCode(code + 1 + LINK_SIZE); r=me with ChangeLog.
Sam Weinig
Comment 36 2007-11-18 15:14:04 PST
Comment on attachment 17356 [details] [PATCH] cleanup _pcre_ucp_othercase In the other patches I have reviewed so far you are changing infinite loops like + for (;;) { to while (true). I think it is better to be consistent and change this one as well. r=me with ChangeLog.
Sam Weinig
Comment 37 2007-11-18 15:22:24 PST
Comment on attachment 17359 [details] [PATCH] Further cleanup GET/PUT inlines r=me with the ChangeLog
Sam Weinig
Comment 38 2007-11-18 17:48:43 PST
Comment on attachment 17341 [details] [PATCH] removing more macros r=me with ChangeLog.
Sam Weinig
Comment 39 2007-11-18 19:10:11 PST
Comment on attachment 17338 [details] [PATCH] Get rid of PCRE custom char types The typedef for USPTR was const UChar* but you seem to be using a non-const version in most places you replaced it. What was the motivation behind that change. Otherwise, r=me with a ChangeLog.
Sam Weinig
Comment 40 2007-11-18 19:32:13 PST
Comment on attachment 17349 [details] [PATCH] Reformat find_fixedlength The * should go next to the type here. +static int find_fixedlength(uschar *code, int options) and here + uschar *cc = code + 1 + LINK_SIZE; I think should be while (true) for consistency. + for (;;) { Brace should go on the same line as the switch statement. + switch (op) + { I think spliting this onto multiple lines will make it more clear. (appears twice) + do cc += GET(cc, 1); while (*cc == OP_ALT); In the inner switch statement you indent the cases differently than the outer switch. Consistency is the key. + switch (*cc) { + case OP_CRSTAR: r-
Sam Weinig
Comment 41 2007-11-18 19:45:15 PST
Comment on attachment 17350 [details] [PATCH] clean up check_escape I am not a fan of multiple case statements on a line but I don't think we have a rule about this. + case '1': case '2': case '3': case '4': case '5': + case '6': case '7': case '8': case '9': r=me.
Sam Weinig
Comment 42 2007-11-18 19:49:56 PST
Comment on attachment 17345 [details] [PATCH] give PCRE_STARTLINE a better name and rename match_data to MatchData r=me with ChangeLoggy.
Sam Weinig
Comment 43 2007-11-18 19:53:15 PST
Comment on attachment 17342 [details] [PATCH] Clean up compile_branch, move _pcre_ord2utf8, and rename CompileData r=me
Eric Seidel (no email)
Comment 44 2007-11-19 00:58:27 PST
Created attachment 17376 [details] [PATCH] Remove match_is_group variable for another 5% speedup JavaScriptCore/pcre/pcre_compile.cpp | 1 + JavaScriptCore/pcre/pcre_exec.cpp | 143 +++++++++++++++------------------ 2 files changed, 66 insertions(+), 78 deletions(-)
Eric Seidel (no email)
Comment 45 2007-11-19 00:58:29 PST
Created attachment 17377 [details] [PATCH] Lower MATCH_LIMIT_RECURSION to more sane levels to prevent hangs on run-javascriptcore-tests JavaScriptCore/pcre/pcre_internal.h | 11 ++++++++--- 1 files changed, 8 insertions(+), 3 deletions(-)
Eric Seidel (no email)
Comment 46 2007-11-19 00:58:30 PST
Created attachment 17378 [details] [PATCH] Small speedup (0.7%) by simplifying canUseStackBufferForNextFrame() check JavaScriptCore/pcre/pcre_exec.cpp | 20 +++++++++----------- 1 files changed, 9 insertions(+), 11 deletions(-)
Eric Seidel (no email)
Comment 47 2007-11-19 00:58:31 PST
Created attachment 17379 [details] [PATCH] Give subjectPtr and instructionPtr sane names, reduce size of MatchFrame for a 0.2% speedup. JavaScriptCore/pcre/pcre_compile.cpp | 15 +- JavaScriptCore/pcre/pcre_exec.cpp | 653 +++++++++++++++++----------------- JavaScriptCore/pcre/pcre_internal.h | 38 +- JavaScriptCore/pcre/pcre_xclass.cpp | 6 +- 4 files changed, 355 insertions(+), 357 deletions(-)
Eric Seidel (no email)
Comment 48 2007-11-19 00:58:33 PST
Created attachment 17380 [details] [PATCH] Comment cleanup JavaScriptCore/pcre/pcre_exec.cpp | 14 ++++++++------ 1 files changed, 8 insertions(+), 6 deletions(-)
Eric Seidel (no email)
Comment 49 2007-11-19 00:58:34 PST
Created attachment 17381 [details] [PATCH] Make i locally scoped for better code clarity JavaScriptCore/pcre/pcre_exec.cpp | 64 ++++++++++++++++++------------------- 1 files changed, 31 insertions(+), 33 deletions(-)
Eric Seidel (no email)
Comment 50 2007-11-19 00:58:35 PST
Created attachment 17382 [details] [PATCH] change getChar* functions to return result and push 'c' into local scopes for clarity JavaScriptCore/pcre/pcre_compile.cpp | 13 ++-- JavaScriptCore/pcre/pcre_exec.cpp | 138 ++++++++++++++++------------------ JavaScriptCore/pcre/pcre_internal.h | 25 ++++--- 3 files changed, 87 insertions(+), 89 deletions(-)
Eric Seidel (no email)
Comment 51 2007-11-19 00:58:36 PST
Created attachment 17383 [details] [PATCH] Remove no longer used error code JSRegExpErrorMatchLimit JavaScriptCore/kjs/regexp.cpp | 2 +- JavaScriptCore/pcre/pcre.h | 1 - JavaScriptCore/pcre/pcre_internal.h | 2 +- 3 files changed, 2 insertions(+), 3 deletions(-)
Eric Seidel (no email)
Comment 52 2007-11-19 00:58:37 PST
Created attachment 17384 [details] [PATCH] Add repeatInformationFromInstructionOffset inline JavaScriptCore/pcre/pcre_exec.cpp | 63 ++++++++++++------------------------ 1 files changed, 21 insertions(+), 42 deletions(-)
Eric Seidel (no email)
Comment 53 2007-11-19 00:58:39 PST
Created attachment 17385 [details] [PATCH] Remove branch from return JavaScriptCore/pcre/pcre_compile.cpp | 6 ++++-- JavaScriptCore/pcre/pcre_exec.cpp | 3 ++- 2 files changed, 6 insertions(+), 3 deletions(-)
Eric Seidel (no email)
Comment 54 2007-11-19 00:58:40 PST
Created attachment 17386 [details] [PATCH] Change _NC operators to use _IGNORING_CASE for clarity JavaScriptCore/pcre/pcre_compile.cpp | 16 ++++++++-------- JavaScriptCore/pcre/pcre_exec.cpp | 4 ++-- JavaScriptCore/pcre/pcre_internal.h | 4 ++-- 3 files changed, 12 insertions(+), 12 deletions(-)
Eric Seidel (no email)
Comment 55 2007-11-19 00:58:41 PST
Created attachment 17387 [details] [PATCH] Make cur_is_word and prev_is_word locals, and change OP_ANY to OP_ANY_CHAR for clarity JavaScriptCore/pcre/pcre_compile.cpp | 8 ++++---- JavaScriptCore/pcre/pcre_exec.cpp | 15 +++++++++------ JavaScriptCore/pcre/pcre_internal.h | 4 ++-- 3 files changed, 15 insertions(+), 12 deletions(-)
Eric Seidel (no email)
Comment 56 2007-11-19 00:58:42 PST
Created attachment 17388 [details] [PATCH] Centralize code for subjectPtr adjustments using inlines, only ever check for a single trailing surrogate (as UTF16 only allows one), possibly fix PCRE bugs involving char classes and garbled UTF16 strings. JavaScriptCore/pcre/pcre_exec.cpp | 94 +++++++++++++++-------------------- JavaScriptCore/pcre/pcre_internal.h | 35 +++++++++++++- 2 files changed, 74 insertions(+), 55 deletions(-)
Eric Seidel (no email)
Comment 57 2007-11-19 00:58:44 PST
Created attachment 17389 [details] [PATCH] Deprecate jsRegExpExecute's offset-vector fallback code JavaScriptCore/pcre/pcre_exec.cpp | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-)
Eric Seidel (no email)
Comment 58 2007-11-19 00:58:45 PST
Created attachment 17390 [details] [PATCH] give PCRE_MULTILINE a better name: OptionMatchAcrossMultipleLines JavaScriptCore/pcre/pcre_compile.cpp | 8 ++++---- JavaScriptCore/pcre/pcre_exec.cpp | 2 +- JavaScriptCore/pcre/pcre_internal.h | 2 +- 3 files changed, 6 insertions(+), 6 deletions(-)
Eric Seidel (no email)
Comment 59 2007-11-19 00:58:46 PST
Created attachment 17391 [details] [PATCH] Pull first_byte and req_byte optimizations out into separate static funtions, SunSpider reported this as a win. JavaScriptCore/pcre/pcre_exec.cpp | 186 ++++++++++++++++------------------ JavaScriptCore/pcre/pcre_internal.h | 18 ++-- 2 files changed, 99 insertions(+), 105 deletions(-)
Eric Seidel (no email)
Comment 60 2007-11-19 00:58:47 PST
Created attachment 17392 [details] [PATCH] Give consistent naming to the RegExp options/compile flags JavaScriptCore/pcre/pcre_compile.cpp | 48 +++++++++++++++++----------------- JavaScriptCore/pcre/pcre_exec.cpp | 10 +++--- JavaScriptCore/pcre/pcre_internal.h | 16 +++++----- 3 files changed, 37 insertions(+), 37 deletions(-)
Eric Seidel (no email)
Comment 61 2007-11-19 01:04:53 PST
Ok. I'm ready to start landing these suckers. After all these patches, RegExp is 18.8% faster. Other tests are randomly slower and faster. None of them beyond 2.7% slower (controlflow-recursive) or 2.9% faster (math-cordic). These are all tests which have nothing to to with RegExps. Getting this in will allow others (like darin, ggaren) to pick off some of the remaining low-hanging fruit in the RegExp engine (now that one can actually begin to tell what the code is trying to do!) If you would like any of the patches re-attached in a white-space ignoring form (or if you'd like a combined patch) I'm happy to provide. I hope this "do lots of work in git and upload all the little patches" approach is a help, not a hinderance. I plan to land these all individually (with changelogs), as then it's very easy to track any accidental regressions.
Maciej Stachowiak
Comment 62 2007-11-19 01:20:15 PST
Comment on attachment 17347 [details] [PATCH] Remove unused function could_be_empty_branch r=me
Eric Seidel (no email)
Comment 63 2007-11-19 01:23:57 PST
unccing darin to curb the flood of email to his inbox
Maciej Stachowiak
Comment 64 2007-11-19 01:28:38 PST
Comment on attachment 17335 [details] [PATCH] Clean up jsRegExpExecute r=me
Maciej Stachowiak
Comment 65 2007-11-19 01:29:24 PST
Comment on attachment 17336 [details] [PATCH] Remove register keyword and more cleanup r=me
Maciej Stachowiak
Comment 66 2007-11-19 01:30:04 PST
Comment on attachment 17340 [details] [PATCH] clean up formating in compile_branch r=me
Eric Seidel (no email)
Comment 67 2007-11-19 01:47:30 PST
Created attachment 17393 [details] [PATCH] Further cleanups to calculateCompiledPatternLengthAndFlags JavaScriptCore/pcre/pcre_compile.cpp | 154 +++++++++++++++++----------------- JavaScriptCore/pcre/pcre_internal.h | 2 +- 2 files changed, 79 insertions(+), 77 deletions(-)
Eric Seidel (no email)
Comment 68 2007-11-19 01:52:06 PST
Created attachment 17394 [details] Combined patch for Maciej's viewing pleasure Here is a combined patch of all preceding patches for folks to try on their own machines. (I'm still looking for individual patch reviews, and plan to land them individually.)
Maciej Stachowiak
Comment 69 2007-11-19 01:52:13 PST
Comment on attachment 17390 [details] [PATCH] give PCRE_MULTILINE a better name: OptionMatchAcrossMultipleLines r=me
Maciej Stachowiak
Comment 70 2007-11-19 01:55:41 PST
Comment on attachment 17353 [details] [PATCH] Use better variable names for case ignoring options r=me
Eric Seidel (no email)
Comment 71 2007-11-19 17:51:28 PST
I just updated to TOT and re-timed. SunSpider now claims this total patch is a 1.1% speedup.
Sam Weinig
Comment 72 2007-11-19 18:45:22 PST
Comment on attachment 17346 [details] [PATCH] Pass around MatchData objects by reference r=me. This comment could be slightly altered now that you are passing by reference. md pointer to matching data block, if is_subject is true
Sam Weinig
Comment 73 2007-11-19 18:47:10 PST
Comment on attachment 17392 [details] [PATCH] Give consistent naming to the RegExp options/compile flags r=me
Sam Weinig
Comment 74 2007-11-19 19:51:14 PST
Comment on attachment 17357 [details] [PATCH] Add inlines for toLowerCase, isWordChar, isSpaceChar for further regexp speedup r=me
Sam Weinig
Comment 75 2007-11-19 20:00:37 PST
Comment on attachment 17360 [details] [PATCH] Get rid of GETCHAR* macros, replacing them with better named inlines r=me
Sam Weinig
Comment 76 2007-11-19 20:08:43 PST
Comment on attachment 17380 [details] [PATCH] Comment cleanup r=me
Sam Weinig
Comment 77 2007-11-20 11:03:24 PST
Comment on attachment 17358 [details] [PATCH] Give GET, PUT better names, and add (poor) moveOpcodePtrPastAnyAlternateBranches r=me
Sam Weinig
Comment 78 2007-11-20 13:37:30 PST
Comment on attachment 17362 [details] [PATCH] Remove redundant eptrblock struct r=me
Sam Weinig
Comment 79 2007-11-20 13:39:24 PST
Comment on attachment 17354 [details] [PATCH] Section off MatchData arguments into args struct For this and http://bugs.webkit.org/attachment.cgi?id=17355 I would like to see some performance numbers to make sure this restructuring did not regress us in any way. r-
Sam Weinig
Comment 80 2007-11-20 13:39:55 PST
Comment on attachment 17355 [details] [PATCH] Abstract frame variables into locals and args r- for same reason as above.
Sam Weinig
Comment 81 2007-11-20 13:41:20 PST
Comment on attachment 17377 [details] [PATCH] Lower MATCH_LIMIT_RECURSION to more sane levels to prevent hangs on run-javascriptcore-tests r=me
Sam Weinig
Comment 82 2007-11-20 13:45:46 PST
Comment on attachment 17378 [details] [PATCH] Small speedup (0.7%) by simplifying canUseStackBufferForNextFrame() check This looks good but I think you can use a static const instead of the #define +#define FRAMES_ON_STACK 16 r=me otherwise.
Sam Weinig
Comment 83 2007-11-20 14:09:48 PST
Comment on attachment 17391 [details] [PATCH] Pull first_byte and req_byte optimizations out into separate static funtions, SunSpider reported this as a win. There are a few style issues in tryFirstByteOptimization: Brace after if. + if (first_byte_caseless) + while (subjectPtr < endSubject) { braces after the else + else + while (subjectPtr < endSubject && *subjectPtr != first_char) + subjectPtr++; Else on the same line as the brace. + } + /* Or to just after \n for a multiline match if possible */ + else if (useMultiLineFirstCharOptimization) { r=me
Sam Weinig
Comment 84 2007-11-20 14:52:08 PST
Comment on attachment 17393 [details] [PATCH] Further cleanups to calculateCompiledPatternLengthAndFlags r=me
Sam Weinig
Comment 85 2007-11-20 15:00:19 PST
Comment on attachment 17381 [details] [PATCH] Make i locally scoped for better code clarity Bad bad Eric. This code will decrement the i counter twice as much. Also, the indenting is off. - i = 255; - while (!opcode_jump_table[i]) - opcode_jump_table[i--] = &&CAPTURING_BRACKET; + for (int i = 255; !opcode_jump_table[i]; i--) + opcode_jump_table[i--] = &&CAPTURING_BRACKET; Otherwise, this looks great. r=me.
Sam Weinig
Comment 86 2007-11-20 15:10:39 PST
Comment on attachment 17383 [details] [PATCH] Remove no longer used error code JSRegExpErrorMatchLimit I think if you remove a error code it is a good idea to make the range continuous again (make JSRegExpErrorRecursionLimit = -4) as we aren't keeping any kind of compatibility here. This change seems like it should go in the patch that originally made the comment. - Currently that's 100000 - 16 * (23 * 4) ~ 9MB + Currently that's 100000 - 16 * (23 * 4) ~ 90MB r-
Sam Weinig
Comment 87 2007-11-20 16:07:02 PST
Comment on attachment 17385 [details] [PATCH] Remove branch from return r=me
mitz
Comment 88 2007-11-21 20:50:33 PST
Comment on attachment 17386 [details] [PATCH] Change _NC operators to use _IGNORING_CASE for clarity r=me
Maciej Stachowiak
Comment 89 2007-11-21 20:50:41 PST
Comment on attachment 17361 [details] [PATCH] Remove redundant match_call_count and move recursion check out of super-hot code path r=me
mitz
Comment 90 2007-11-21 20:51:58 PST
Comment on attachment 17387 [details] [PATCH] Make cur_is_word and prev_is_word locals, and change OP_ANY to OP_ANY_CHAR for clarity r=me
Maciej Stachowiak
Comment 91 2007-11-21 21:00:24 PST
Comment on attachment 17387 [details] [PATCH] Make cur_is_word and prev_is_word locals, and change OP_ANY to OP_ANY_CHAR for clarity r=me
Maciej Stachowiak
Comment 92 2007-11-22 19:58:03 PST
Comment on attachment 17384 [details] [PATCH] Add repeatInformationFromInstructionOffset inline r=me
Maciej Stachowiak
Comment 93 2007-11-22 23:03:40 PST
Comment on attachment 17376 [details] [PATCH] Remove match_is_group variable for another 5% speedup r=me
Maciej Stachowiak
Comment 94 2007-11-22 23:34:33 PST
Comment on attachment 17379 [details] [PATCH] Give subjectPtr and instructionPtr sane names, reduce size of MatchFrame for a 0.2% speedup. r=me
Maciej Stachowiak
Comment 95 2007-11-22 23:35:17 PST
Comment on attachment 17379 [details] [PATCH] Give subjectPtr and instructionPtr sane names, reduce size of MatchFrame for a 0.2% speedup. You should fix the indentation here: offset index into the offset vector - eptr points into the subject + subjectPtr points into the subject length length to be matched
Eric Seidel (no email)
Comment 96 2007-11-24 13:49:27 PST
(In reply to comment #40) > (From update of attachment 17349 [details] [edit]) > r- I fixed all your issues except for the case indenting. That will have to go on a later patch, as a I fear it would break all dependent patches in my tree. Sadly, you're not around on IRC, but I'm going to assume you're OK with this. this is git, so I can always go back and change it. :)
Maciej Stachowiak
Comment 97 2007-11-24 19:10:40 PST
Comment on attachment 17382 [details] [PATCH] change getChar* functions to return result and push 'c' into local scopes for clarity r=me
Maciej Stachowiak
Comment 98 2007-11-24 19:26:37 PST
Comment on attachment 17388 [details] [PATCH] Centralize code for subjectPtr adjustments using inlines, only ever check for a single trailing surrogate (as UTF16 only allows one), possibly fix PCRE bugs involving char classes and garbled UTF16 strings. r=me
Eric Seidel (no email)
Comment 99 2007-11-29 03:02:36 PST
Ok. I made the suggested changes, added ChangeLogs to each commit, and rechecked performance. The final number was a 1.7% improvement (some of that is caused by positive noise in the results). Going to land this in a minute. The final perf numbers: TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: 1.7% faster 4320.6ms +/- 0.3% 4249.0ms +/- 0.2% significant ============================================================================= 3d: - 500.2ms +/- 0.4% 498.0ms +/- 0.4% cube: - 168.2ms +/- 1.0% 166.4ms +/- 0.9% morph: ?? 161.4ms +/- 0.4% 161.6ms +/- 0.4% not conclusive: might be 0.1% *slower* raytrace: 0.4% faster 170.6ms +/- 0.4% 170.0ms +/- 0.0% significant access: 1.3% faster 716.4ms +/- 0.4% 707.0ms +/- 0.4% significant binary-trees: ?? 105.4ms +/- 0.6% 105.6ms +/- 0.6% not conclusive: might be 0.2% *slower* fannkuch: 1.8% faster 348.8ms +/- 0.3% 342.6ms +/- 0.4% significant nbody: 1.6% faster 184.0ms +/- 0.8% 181.0ms +/- 0.5% significant nsieve: - 78.2ms +/- 0.7% 77.8ms +/- 0.7% bitops: - 611.2ms +/- 0.2% 611.0ms +/- 0.2% 3bit-bits-in-byte: ?? 111.8ms +/- 0.5% 112.0ms +/- 1.1% not conclusive: might be 0.2% *slower* bits-in-byte: - 150.0ms +/- 0.6% 149.8ms +/- 0.4% bitwise-and: - 214.0ms +/- 0.0% 214.0ms +/- 0.0% nsieve-bits: - 135.4ms +/- 0.5% 135.2ms +/- 0.4% controlflow: - 151.0ms +/- 0.0% 151.0ms +/- 0.0% recursive: - 151.0ms +/- 0.0% 151.0ms +/- 0.0% crypto: - 339.8ms +/- 0.5% 340.0ms +/- 0.5% aes: 1.4% *slower* 97.2ms +/- 0.6% 98.6ms +/- 0.7% significant md5: - 121.2ms +/- 0.5% 121.2ms +/- 0.9% sha1: 1.0% faster 121.4ms +/- 0.6% 120.2ms +/- 0.9% significant date: 0.8% faster 371.8ms +/- 0.3% 369.0ms +/- 0.8% significant format-tofte: 0.9% *slower* 163.8ms +/- 0.3% 165.2ms +/- 1.6% significant format-xparb: 2.0% faster 208.0ms +/- 0.4% 203.8ms +/- 0.3% significant math: 0.4% faster 509.2ms +/- 0.2% 507.0ms +/- 0.5% significant cordic: 0.8% faster 216.0ms +/- 0.4% 214.2ms +/- 0.8% significant partial-sums: - 172.8ms +/- 0.3% 172.2ms +/- 0.6% spectral-norm: ?? 120.4ms +/- 0.6% 120.6ms +/- 0.6% not conclusive: might be 0.2% *slower* regexp: 18.0% faster 268.2ms +/- 0.4% 220.0ms +/- 0.4% significant dna: 18.0% faster 268.2ms +/- 0.4% 220.0ms +/- 0.4% significant string: 0.8% faster 852.8ms +/- 0.5% 846.0ms +/- 0.4% significant base64: 0.8% faster 121.4ms +/- 0.6% 120.4ms +/- 0.6% significant fasta: 1.2% faster 205.0ms +/- 0.6% 202.6ms +/- 0.3% significant tagcloud: ?? 191.8ms +/- 0.8% 192.4ms +/- 0.4% not conclusive: might be 0.3% *slower* unpack-code: 1.5% faster 199.0ms +/- 0.4% 196.0ms +/- 0.4% significant validate-input: - 135.6ms +/- 0.8% 134.6ms +/- 1.9%
Note You need to log in before you can comment on or make changes to this bug.