|Summary:||StringImpl::isLower incorrectly assumes islower returns 1 (it can return any non-0)|
|Product:||WebKit||Reporter:||Jungshik Shin <jshin>|
|Component:||WebCore Misc.||Assignee:||Darin Adler <darin>|
|Version:||523.x (Safari 3)|
Description Jungshik Shin 2007-03-29 16:55:12 PDT
We'd better make 'allLower' bool and use '!= 0' before using '&='
Comment 1 Jungshik Shin 2007-04-05 13:21:29 PDT
Created attachment 13972 [details] patch : 2-liner a simple patch.
Comment 2 Darin Adler 2007-04-05 13:45:41 PDT
Comment on attachment 13972 [details] patch : 2-liner What does "not very kosher" mean? Does this actually cause a problem? Does the fix have any performance impact?
Comment 3 Kirby White 2007-04-05 14:14:25 PDT
islower() from the C lib only claims to return nonzero if the character is lower-case, not necessarily 1. If it happened to return 1 in some situation and 2 in some other, the current test would fail. An alternative to allLower &= islower(c & 0x7F) != 0; would be allLower = allLower && islower(c & 0x7F); if that's faster.
Comment 4 Darin Adler 2007-04-09 15:47:22 PDT
(In reply to comment #3) > An alternative to > allLower &= islower(c & 0x7F) != 0; > would be > allLower = allLower && islower(c & 0x7F); > if that's faster. No, that's way slower.
Comment 5 Darin Adler 2007-04-09 15:54:51 PDT
Comment on attachment 13972 [details] patch : 2-liner r=me; I think this might slow things down a bit; if so we might want to implement our own islower instead.
Comment 6 Jungshik Shin 2007-04-09 22:50:34 PDT
Thanks for r. Can you land it?
Comment 7 Kirby White 2007-04-10 11:58:32 PDT
Created attachment 13998 [details] benchmark source file (In reply to comment #4) > (In reply to comment #3) > > An alternative to > > allLower &= islower(c & 0x7F) != 0; > > would be > > allLower = allLower && islower(c & 0x7F); > > if that's faster. > > No, that's way slower. Are you sure? I ran a simple (and completely unoptimized) benchmark that indicates it's noticeably faster than either the original version or the new boolean one, even for an all-lower-case string. And for one with any upper-case, it's faster still, since logical && short-circuits the islower() call as soon as the first upper-case letter is encountered. Lower-case string: Original: 17.819 seconds (wall time) Boolean: 19.257 Logical: 10.185 Upper-case string: Original: 17.229 Boolean: 18.081 Logical: 7.698 User and kernel times from tcsh 'time' are analogous. The benchmark source is attached.
Comment 8 Darin Adler 2007-04-11 01:25:54 PDT
My bad. Looks like I was completely wrong! And in fact, the more obvious "early exit" version that breaks the loop when it finds something that's not lower is faster than any of the others in all cases I tested. The "all lowercase" test was flawed because spaces are not lowercase. All lowercase: Original: 2.225 Boolean: 2.455 Logical: 2.226 Early Exit: 2.088 No lowercase: Original: 2.521 Boolean: 2.782 Logical: 1.066 Early Exit: 0.107
Comment 9 Darin Adler 2007-04-11 01:26:54 PDT
Created attachment 14003 [details] updated benchmark source file
Comment 10 Darin Adler 2007-04-11 01:46:00 PDT
Created attachment 14004 [details] patch that uses early-exit instead of &=
Comment 11 Darin Adler 2007-04-11 01:46:26 PDT
Comment on attachment 13972 [details] patch : 2-liner Clearing the review flag from this older patch.
Comment 12 Darin Adler 2007-04-11 01:50:21 PDT
Uh oh! It's not correct to test unoptimized. When I optimize I get *very* different results for the all-lowercase version: Original: 0.210 Boolean: 0.211 Logical: 0.207 Early Exit: 1.564 This says that we should stick with the logical version, not the early-exit version.
Comment 13 Darin Adler 2007-04-11 01:56:24 PDT
Created attachment 14005 [details] patch that uses && instead of &
Comment 14 Darin Adler 2007-04-11 02:03:07 PDT
Created attachment 14006 [details] patch that uses && instead of & and fixes non-BMP issue
Comment 15 Kirby White 2007-04-11 08:56:39 PDT
(In reply to comment #12) > When I optimize I get *very* different results for the all-lowercase version: > > Original: 0.210 > Boolean: 0.211 > Logical: 0.207 > Early Exit: 1.564 1) I wonder if the compiler is unrolling the loops, with constant test strings? It might be worth testing with a user-input string. 2) I need a faster computer. :)
Comment 16 Darin Adler 2007-04-11 10:37:20 PDT
(In reply to comment #15) > 1) I wonder if the compiler is unrolling the loops, with constant test strings? > It might be worth testing with a user-input string. I don't think it's doing anything special because the test strings are constant, but I've been wrong before.
Comment 17 Kirby White 2007-04-11 10:39:11 PDT
Okay, I ran the test again with a user-input string and -O3 optimization and got the same conclusion you did, only more so. (Using only -O doesn't give the same answer.) thisisastringwithalllowercase (29) Original: 0.522 Boolean: 0.473 Logical: 0.600 Early Exit: 12.309 HERE'S A STRING WITH NO LOWERCASE (33) Original: 0.529 Boolean: 0.531 Logical: 0.523 Early Exit: 0.454
Comment 18 Darin Adler 2007-04-11 10:43:23 PDT
(In reply to comment #17) > thisisastringwithalllowercase (29) > Original: 0.522 > Boolean: 0.473 > Logical: 0.600 > Early Exit: 12.309 That's interesting. Logical is worse than my old code here. But anyway, lets just land the logical version.
Comment 19 John Sullivan 2007-04-15 21:17:52 PDT
Comment on attachment 14006 [details] patch that uses && instead of & and fixes non-BMP issue r=me
Comment 20 Darin Adler 2007-04-16 08:33:52 PDT
Sending WebCore/ChangeLog Sending WebCore/platform/StringImpl.cpp Transmitting file data .. Committed revision 20900.