WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
41238
RegExp performance slow on Dromaeo benchmark
https://bugs.webkit.org/show_bug.cgi?id=41238
Summary
RegExp performance slow on Dromaeo benchmark
Michael Saboff
Reported
2010-06-25 17:21:21 PDT
Other javascript engines appear to cache prior results of regular expression operations. Suggest adding some sort of caching mechanism to regular expression processing.
Attachments
The patch
(4.98 KB, patch)
2010-06-28 15:04 PDT
,
Michael Saboff
darin
: review-
Details
Formatted Diff
Diff
Updated patch.
(8.31 KB, patch)
2010-06-28 18:54 PDT
,
Michael Saboff
darin
: review-
Details
Formatted Diff
Diff
Latest revised patch
(7.31 KB, patch)
2010-06-29 11:32 PDT
,
Michael Saboff
darin
: review+
darin
: commit-queue-
Details
Formatted Diff
Diff
Patch with updated ChangeLog and minor fix.
(5.82 KB, patch)
2010-06-29 14:27 PDT
,
Michael Saboff
no flags
Details
Formatted Diff
Diff
Show Obsolete
(3)
View All
Add attachment
proposed patch, testcase, etc.
Alexey Proskuryakov
Comment 1
2010-06-27 14:40:28 PDT
See also:
bug 37908
,
bug 38828
(I don't know to what degree they are related, if any).
Michael Saboff
Comment 2
2010-06-28 15:04:31 PDT
Created
attachment 59949
[details]
The patch
WebKit Review Bot
Comment 3
2010-06-28 15:09:00 PDT
Attachment 59949
[details]
did not pass style-queue: Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1 JavaScriptCore/runtime/UString.h:163: One space before end of line comments [whitespace/comments] [5] JavaScriptCore/ChangeLog:7: Line contains tab character. [whitespace/tab] [5] JavaScriptCore/runtime/RegExp.cpp:129: One line control clauses should not use braces. [whitespace/braces] [4] JavaScriptCore/runtime/RegExp.cpp:131: An else statement can be removed when the prior "if" concludes with a return, break, continue or goto statement. [readability/control_flow] [4] Total errors found: 4 in 4 files If any of these errors are false positives, please file a bug against check-webkit-style.
Darin Adler
Comment 4
2010-06-28 15:20:38 PDT
Comment on
attachment 59949
[details]
The patch This looks great!
> + , m_lastMatchStr(UString::null())
This shouldn't be needed. A UString will start out null.
> + , m_lastMatchStart(-1) > { > + m_lastOVector.resize(0);
This shouldn't be needed. The vector should start out empty.
> + , m_lastMatchStr(UString::null())
This shouldn't be needed. A UString will start out null.
> + , m_lastMatchStart(-1) > { > + m_lastOVector.resize(0);
This shouldn't be needed. The vector should start out empty.
> + m_lastMatchStr = UString::null();
I think this should just be: m_lastMatchStr = UString(); You should check which is more efficient, but I think my form is.
> + m_lastOVector.resize(0);
It's more efficient to call shrink(0) than resize(0).
> + // Check to see if this match call is the same as the last match invocation
We normally use sentence style for comments, with a period.
> + if ((startOffset == m_lastMatchStart) && (s.rep() == m_lastMatchStr.rep())) {
We normally
> + if (ovector) { > + (Vector<int, 32>&)*ovector = (Vector<int, 32>&)m_lastOVector; > + }
I don't understand why you need these typecasts. They should not be needed. Also, we don't use braces around single line if statement bodies. It should be: if (ovector) *ovector = m_lastOVector;
> + if (m_lastOVector.isEmpty()) > + return -1; > + else > + return m_lastOVector.at(0);
We normally don't do else after return, so the else should be removed.
> + if (ovector) > + (Vector<int, 32>&)m_lastOVector = (Vector<int, 32>&)*ovector; > + else > + (Vector<int, 32>&)m_lastOVector = (Vector<int, 32>&)nonReturnedOvector;
Same comments about typecasts.
> unsigned m_numSubpatterns; > + UString m_lastMatchStr;
We normally try to avoid abbreviations. So I suggest you either call this m_lastMatchString or m_lastMatch and leave out the "str".
> ALWAYS_INLINE bool operator==(const UString& s1, const UString& s2) > { > unsigned size = s1.size(); > + > + if (size != s2.size()) // Sizes not the same, we're done. > + return false; > + > + if (s1.data() == s2.data()) // Quick check of data pointers being the same. > + return true;
Another way to do this optimization is to check if s1.rep() is == s2.rep(). That would be more efficient than the repeated calls to data() this function has. And I think it should be relatively rare to have the two data pointers to match if they are not the exact same rep. You could put the rep check before the size check, in fact. review- because I think you should do at least one of the things I suggested above
Gavin Barraclough
Comment 5
2010-06-28 16:04:59 PDT
Comment on
attachment 59949
[details]
The patch Hey Michael, As discussed: * There is a tab in the changelog, * Please remove the initialization of m_lastMatchStr & the call to resize on m_lastOVector from the RegExp constructor. * Please remove redundant braces & else in the code checking the cache. * Re the UString change, instrumenting & checking if a comparison of the rep()s rather than a compare of data() sounded like a good plan. As discussed, this code can be made much cleaner by both removing the output param & never resizing the array to zero – but I agree with you that this would be better done as a separate patch. Also, one other thing I missed,
> Are all the (Vector<int, 32>&) casts necessary? - could you check if these are still needed? – if the coding style requires a static_cast<> here instead of a C-style case (which could silently be doing nasties like casting away constness).
Oh, a second last thought! - I'm tempted to suggest ASSERTs in the switch statements, where you have removed the checks of size – these may just make it more obvious to someone else coming to the code why the lack of check is safe. What do you think? r-ing this for now, please attach a new patch with the fixes above. cheers, G.
Michael Saboff
Comment 6
2010-06-28 18:54:37 PDT
Created
attachment 59974
[details]
Updated patch. Patch with updates in response to review comments.
WebKit Review Bot
Comment 7
2010-06-29 10:47:45 PDT
Attachment 59974
[details]
did not pass style-queue: Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1 JavaScriptCore/runtime/UString.h:166: One space before end of line comments [whitespace/comments] [5] JavaScriptCore/runtime/UString.h:175: One space before end of line comments [whitespace/comments] [5] JavaScriptCore/runtime/UString.h:178: Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons. [readability/comparison_to_zero] [5] JavaScriptCore/runtime/UString.h:197: Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons. [readability/comparison_to_zero] [5] Total errors found: 4 in 5 files If any of these errors are false positives, please file a bug against check-webkit-style.
Darin Adler
Comment 8
2010-06-29 10:51:32 PDT
Comment on
attachment 59974
[details]
Updated patch. Looks great.
> + Refactored Vector<>::operator=() method into a separate copy() method > + that it calls. The copy() method is explicitly used for deep copying.
I don't understand this change. Why can't we use the assignment operator directly? One small problem I see with adding a copy function is that in WebKit functions with copy in their name are typically functions that return a copy of an object or some sort of new object. So using the name copy for a function with no return value that copies data into an object is a bit non-standard and thus possibly confusing.
> + m_lastMatchString = UString::null(); > + m_lastMatchStart = -1; > + m_lastOVector.resize(0);
This site needs the UString() and shrink(0) changes we made earlier in the same file.
> template<typename T, size_t inlineCapacity> > Vector<T, inlineCapacity>& Vector<T, inlineCapacity>::operator=(const Vector<T, inlineCapacity>& other) > { > + if (&other != this) > + copy(other); > + > + return *this; > + }
I think this should be marked inline. We don't want to add another level of function call here.
> + template<typename T, size_t inlineCapacity> template<size_t otherCapacity> > + void Vector<T, inlineCapacity>::copy(const Vector<T, otherCapacity>& other) > + { > if (&other == this) > - return *this; > + return;
Both the copy function and the assignment operator are doing this check. That seems like overkill to me unless there's a reason I'm missing this needs to be done. review- because I don't want to add this copy function unless there's a good reason to do so, and the patch doesn't give a reason.
Michael Saboff
Comment 9
2010-06-29 11:32:06 PDT
Created
attachment 60039
[details]
Latest revised patch Eliminated Vector.h changes. Cleaned up two review bot issues.
WebKit Review Bot
Comment 10
2010-06-29 11:33:50 PDT
Attachment 60039
[details]
did not pass style-queue: Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1 JavaScriptCore/runtime/UString.h:178: Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons. [readability/comparison_to_zero] [5] JavaScriptCore/runtime/UString.h:197: Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons. [readability/comparison_to_zero] [5] Total errors found: 2 in 4 files If any of these errors are false positives, please file a bug against check-webkit-style.
Darin Adler
Comment 11
2010-06-29 11:40:53 PDT
Comment on
attachment 60039
[details]
Latest revised patch ChangeLog was mismerged here. This removed some change log entries. Because of that, I set commit-queue- on this patch.
> + if (size1 == 0) > + return true;
One strange thing about WebKit style is that we write such this as: if (!size1) return true; And the style bot is telling you about that.
> - return s2.size() == size && memcmp(s1.data(), s2.data(), size * sizeof(UChar)) == 0; > + return memcmp(d1, d2, size1 * sizeof(UChar)) == 0;
The style bot has the same complaint on the existing code on this line (not something you added). I personally think memcmp is an exception. I find it easier to read calls to memcmp and strcmp if there is an "== 0" to mean "==" or "!= 0" to mean "!=" or even "< 0" to mean "<". r=me as is if we fix the change log before committing, but you could consider using !size1 to make the style bot happy
Michael Saboff
Comment 12
2010-06-29 14:27:49 PDT
Created
attachment 60054
[details]
Patch with updated ChangeLog and minor fix. ChangeLog and style bot fixes.
WebKit Review Bot
Comment 13
2010-06-29 14:33:46 PDT
Attachment 60054
[details]
did not pass style-queue: Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1 JavaScriptCore/runtime/UString.h:197: Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons. [readability/comparison_to_zero] [5] Total errors found: 1 in 4 files If any of these errors are false positives, please file a bug against check-webkit-style.
WebKit Commit Bot
Comment 14
2010-06-29 15:01:39 PDT
Comment on
attachment 60054
[details]
Patch with updated ChangeLog and minor fix. Clearing flags on attachment: 60054 Committed
r62148
: <
http://trac.webkit.org/changeset/62148
>
WebKit Commit Bot
Comment 15
2010-06-29 15:01:44 PDT
All reviewed patches have been landed. Closing bug.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug