RESOLVED FIXED 13227
StringImpl::isLower incorrectly assumes islower returns 1 (it can return any non-0)
https://bugs.webkit.org/show_bug.cgi?id=13227
Summary StringImpl::isLower incorrectly assumes islower returns 1 (it can return any ...
Jungshik Shin
Reported 2007-03-29 16:55:12 PDT
We'd better make 'allLower' bool and use '!= 0' before using '&='
Attachments
patch : 2-liner (1.22 KB, patch)
2007-04-05 13:21 PDT, Jungshik Shin
no flags
benchmark source file (1.90 KB, text/plain)
2007-04-10 11:58 PDT, Kirby White
no flags
updated benchmark source file (2.14 KB, text/plain)
2007-04-11 01:26 PDT, Darin Adler
no flags
patch that uses early-exit instead of &= (5.96 KB, patch)
2007-04-11 01:46 PDT, Darin Adler
no flags
patch that uses && instead of & (5.49 KB, patch)
2007-04-11 01:56 PDT, Darin Adler
no flags
patch that uses && instead of & and fixes non-BMP issue (5.49 KB, patch)
2007-04-11 02:03 PDT, Darin Adler
sullivan: review+
Jungshik Shin
Comment 1 2007-04-05 13:21:29 PDT
Created attachment 13972 [details] patch : 2-liner a simple patch.
Darin Adler
Comment 2 2007-04-05 13:45:41 PDT
Comment on attachment 13972 [details] patch : 2-liner What does "not very kosher" mean? Does this actually cause a problem? Does the fix have any performance impact?
Kirby White
Comment 3 2007-04-05 14:14:25 PDT
islower() from the C lib only claims to return nonzero if the character is lower-case, not necessarily 1. If it happened to return 1 in some situation and 2 in some other, the current test would fail. An alternative to allLower &= islower(c & 0x7F) != 0; would be allLower = allLower && islower(c & 0x7F); if that's faster.
Darin Adler
Comment 4 2007-04-09 15:47:22 PDT
(In reply to comment #3) > An alternative to > allLower &= islower(c & 0x7F) != 0; > would be > allLower = allLower && islower(c & 0x7F); > if that's faster. No, that's way slower.
Darin Adler
Comment 5 2007-04-09 15:54:51 PDT
Comment on attachment 13972 [details] patch : 2-liner r=me; I think this might slow things down a bit; if so we might want to implement our own islower instead.
Jungshik Shin
Comment 6 2007-04-09 22:50:34 PDT
Thanks for r. Can you land it?
Kirby White
Comment 7 2007-04-10 11:58:32 PDT
Created attachment 13998 [details] benchmark source file (In reply to comment #4) > (In reply to comment #3) > > An alternative to > > allLower &= islower(c & 0x7F) != 0; > > would be > > allLower = allLower && islower(c & 0x7F); > > if that's faster. > > No, that's way slower. Are you sure? I ran a simple (and completely unoptimized) benchmark that indicates it's noticeably faster than either the original version or the new boolean one, even for an all-lower-case string. And for one with any upper-case, it's faster still, since logical && short-circuits the islower() call as soon as the first upper-case letter is encountered. Lower-case string: Original: 17.819 seconds (wall time) Boolean: 19.257 Logical: 10.185 Upper-case string: Original: 17.229 Boolean: 18.081 Logical: 7.698 User and kernel times from tcsh 'time' are analogous. The benchmark source is attached.
Darin Adler
Comment 8 2007-04-11 01:25:54 PDT
My bad. Looks like I was completely wrong! And in fact, the more obvious "early exit" version that breaks the loop when it finds something that's not lower is faster than any of the others in all cases I tested. The "all lowercase" test was flawed because spaces are not lowercase. All lowercase: Original: 2.225 Boolean: 2.455 Logical: 2.226 Early Exit: 2.088 No lowercase: Original: 2.521 Boolean: 2.782 Logical: 1.066 Early Exit: 0.107
Darin Adler
Comment 9 2007-04-11 01:26:54 PDT
Created attachment 14003 [details] updated benchmark source file
Darin Adler
Comment 10 2007-04-11 01:46:00 PDT
Created attachment 14004 [details] patch that uses early-exit instead of &=
Darin Adler
Comment 11 2007-04-11 01:46:26 PDT
Comment on attachment 13972 [details] patch : 2-liner Clearing the review flag from this older patch.
Darin Adler
Comment 12 2007-04-11 01:50:21 PDT
Uh oh! It's not correct to test unoptimized. When I optimize I get *very* different results for the all-lowercase version: Original: 0.210 Boolean: 0.211 Logical: 0.207 Early Exit: 1.564 This says that we should stick with the logical version, not the early-exit version.
Darin Adler
Comment 13 2007-04-11 01:56:24 PDT
Created attachment 14005 [details] patch that uses && instead of &
Darin Adler
Comment 14 2007-04-11 02:03:07 PDT
Created attachment 14006 [details] patch that uses && instead of & and fixes non-BMP issue
Kirby White
Comment 15 2007-04-11 08:56:39 PDT
(In reply to comment #12) > When I optimize I get *very* different results for the all-lowercase version: > > Original: 0.210 > Boolean: 0.211 > Logical: 0.207 > Early Exit: 1.564 1) I wonder if the compiler is unrolling the loops, with constant test strings? It might be worth testing with a user-input string. 2) I need a faster computer. :)
Darin Adler
Comment 16 2007-04-11 10:37:20 PDT
(In reply to comment #15) > 1) I wonder if the compiler is unrolling the loops, with constant test strings? > It might be worth testing with a user-input string. I don't think it's doing anything special because the test strings are constant, but I've been wrong before.
Kirby White
Comment 17 2007-04-11 10:39:11 PDT
Okay, I ran the test again with a user-input string and -O3 optimization and got the same conclusion you did, only more so. (Using only -O doesn't give the same answer.) thisisastringwithalllowercase (29) Original: 0.522 Boolean: 0.473 Logical: 0.600 Early Exit: 12.309 HERE'S A STRING WITH NO LOWERCASE (33) Original: 0.529 Boolean: 0.531 Logical: 0.523 Early Exit: 0.454
Darin Adler
Comment 18 2007-04-11 10:43:23 PDT
(In reply to comment #17) > thisisastringwithalllowercase (29) > Original: 0.522 > Boolean: 0.473 > Logical: 0.600 > Early Exit: 12.309 That's interesting. Logical is worse than my old code here. But anyway, lets just land the logical version.
John Sullivan
Comment 19 2007-04-15 21:17:52 PDT
Comment on attachment 14006 [details] patch that uses && instead of & and fixes non-BMP issue r=me
Darin Adler
Comment 20 2007-04-16 08:33:52 PDT
Sending WebCore/ChangeLog Sending WebCore/platform/StringImpl.cpp Transmitting file data .. Committed revision 20900.
Note You need to log in before you can comment on or make changes to this bug.