UNCONFIRMED 66286
Use String [] operator instead of indexing into String::characters() directly
https://bugs.webkit.org/show_bug.cgi?id=66286
Summary Use String [] operator instead of indexing into String::characters() directly
Xianzhu Wang
Reported 2011-08-16 03:05:57 PDT
This is part of removing references to String::characters() which depends on the internal data format of String.
Attachments
patch (31.20 KB, patch)
2011-08-17 00:13 PDT, Xianzhu Wang
gyuyoung.kim: commit-queue-
updated patch (31.29 KB, patch)
2011-08-17 02:48 PDT, Xianzhu Wang
no flags
performance test (5.15 KB, text/x-c++src)
2011-08-17 21:28 PDT, Xianzhu Wang
no flags
StringIterator as the returning type of characters() (32.98 KB, patch)
2011-08-18 03:40 PDT, Xianzhu Wang
gyuyoung.kim: commit-queue-
patch. The last one had a style issue and missed some JavaScriptCore files (35.47 KB, patch)
2011-08-19 04:19 PDT, Xianzhu Wang
no flags
Example of replacing checks in operator[] with ASSERTS (22.05 KB, patch)
2011-08-19 16:28 PDT, Annie Sullivan
no flags
Xianzhu Wang
Comment 1 2011-08-17 00:13:49 PDT
Gyuyoung Kim
Comment 2 2011-08-17 00:27:03 PDT
Collabora GTK+ EWS bot
Comment 3 2011-08-17 00:30:13 PDT
Early Warning System Bot
Comment 4 2011-08-17 00:39:59 PDT
Xianzhu Wang
Comment 5 2011-08-17 00:46:18 PDT
Comment on attachment 104155 [details] patch Please ignore this corrupted patch. I'm making a new new.
Xianzhu Wang
Comment 6 2011-08-17 02:48:41 PDT
Created attachment 104162 [details] updated patch
Andreas Kling
Comment 7 2011-08-17 03:56:09 PDT
This seems very likely to regress performance, since String::operator[] does additional bounds checking. Have you done any perf testing of this change? And is the effort to remove characters() usage tracked somewhere?
Andreas Kling
Comment 8 2011-08-17 04:42:41 PDT
(In reply to comment #7) > And is the effort to remove characters() usage tracked somewhere? D'oh, this is blocking bug 66161. Never mind :)
Darin Adler
Comment 9 2011-08-17 10:25:07 PDT
Most places that use characters() do so because of performance optimization.
Gavin Barraclough
Comment 10 2011-08-17 10:57:56 PDT
Looks good, I think this change should be performance tested before landed (per Darin's comment above). I'm not sure what a good test would be, that might hit any of this code. Dromaeo? peacemaker? - I'll comment again this afternoon when a few more people are around who might have good suggestions on the coverage.
Geoffrey Garen
Comment 11 2011-08-17 12:14:04 PDT
In the future, String::operator[] will do even more checking, to support 8bit vs 16bit representation, right? If so, the performance impact of repeated String::operator[] will be even greater. If we need direct access to a character buffer for performance, maybe we should still allow it, but guard it with an ASSERT, e.g.: if (string.is16Bit()) { // currently always true UChar* buffer = string.characters16(); // guarded by ASSERT ... }
Xianzhu Wang
Comment 12 2011-08-17 21:28:01 PDT
Created attachment 104303 [details] performance test I tested the performance with the attached program. To run it, you should have a chromium port of WebKit, put it under Source/WebKit/chromium/tests, and add it into Source/WebKit/chromium/WebKit.gypi, update and build webkit, then run 'out/Release/webkit_unit_tests --gtest_filter=StringTest.*'. My results are: accessStringPtr: 0.660000 accessStringPtrNoPostPlusPlus: 0.660000 accessCharacters: 0.650000 accessImpl: 0.650000 accessImplCacheLength: 0.660000 accessArray: 1.000000 accessArrayCacheLength: 1.000000 Description of the test: Run 100000 times sequentially accessing each character of a string containing 10000 characters. accessStringPtr: use a StringPtr class to simulate characters() and UChar* accessStringPtrNoPostPlusPlus: same as above, but avoid using ptr++ accessCharacters: use characters() accessImpl: use StringImpl instead of characters(); use length() in loop accessImplCacheLength: same as above, but save length() to a local variable before loop accessArray: use String::operator[]; use length() in loop accessArrayCacheLength: same as above, but save length() to a local variable before loop The timings show that the only performance difference is about the range checking in String::operator[]. If eliminating it, there would be no performance difference. I think we could have two choices: 1. Add String::ptr() to return StringPtr class instead of UChar*, and replace some characters() references with it. 2. Replace the range checking in String::operator[] to an ASSERT. I have no preference and would like to hear your opinions. About the performance cost of checking 8-bit or 16-bit, I will investigate it. However, it seems not in scope of this bug.
Xianzhu Wang
Comment 13 2011-08-17 21:57:30 PDT
Now I prefer the StringPtr class method, because it can hide and optimize checking of 8/16 bit like what Geoffrey suggested. The issue does related to this bug.
Xianzhu Wang
Comment 14 2011-08-18 03:40:02 PDT
Created attachment 104318 [details] StringIterator as the returning type of characters() For now StringIterator is just typedefed to const UChar*. In the future it will be defined as a class that can handle 8 and 16-bit buffers. The subject of this bug should be updated, but I'd like to hear your opinions about the patch and will update the subject if the patch looks good to you.
WebKit Review Bot
Comment 15 2011-08-18 03:41:56 PDT
Attachment 104318 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/css/CSSParser.h:296: The parameter name "str" adds no information, so it should be removed. [readability/parameter_name] [5] Total errors found: 1 in 22 files If any of these errors are false positives, please file a bug against check-webkit-style.
Gyuyoung Kim
Comment 16 2011-08-18 03:49:41 PDT
Comment on attachment 104318 [details] StringIterator as the returning type of characters() Attachment 104318 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/9419559
Early Warning System Bot
Comment 17 2011-08-18 03:50:52 PDT
Comment on attachment 104318 [details] StringIterator as the returning type of characters() Attachment 104318 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/9422575
WebKit Review Bot
Comment 18 2011-08-18 03:51:04 PDT
Comment on attachment 104318 [details] StringIterator as the returning type of characters() Attachment 104318 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/9426277
Gustavo Noronha (kov)
Comment 19 2011-08-18 06:01:15 PDT
Comment on attachment 104318 [details] StringIterator as the returning type of characters() Attachment 104318 [details] did not pass gtk-ews (gtk): Output: http://queues.webkit.org/results/9425364
Geoffrey Garen
Comment 20 2011-08-18 13:03:35 PDT
(In reply to comment #13) > Now I prefer the StringPtr class method, because it can hide and optimize checking of 8/16 bit like what Geoffrey suggested. The issue does related to this bug. Actually, what I suggested was that StringPtr *can't* optimize checking the 8 vs 16 bit flag, because it must do the check on every access.
Xianzhu Wang
Comment 21 2011-08-18 21:46:08 PDT
(In reply to comment #20) > (In reply to comment #13) > > Now I prefer the StringPtr class method, because it can hide and optimize checking of 8/16 bit like what Geoffrey suggested. The issue does related to this bug. > > Actually, what I suggested was that StringPtr *can't* optimize checking the 8 vs 16 bit flag, because it must do the check on every access. You are right that there must be some extra cost, but I think with the encapsulation, we could potentially have some optimizations during the life of a StringPtr to keep the extra cost as small as possible. I'm trying to measure the impact of performance in real applications.
Xianzhu Wang
Comment 22 2011-08-19 04:00:59 PDT
I tried to create a big change which replaces most 'const UChar*' to StringIterator in WebKit to test the performance impact of additional condition check on each access, but gave up because of some bug introduced which is hard to find out from the hundreds of changed files. I did a simpler test with my first patch that changes some characters() access to []. The results are as below: patched original avg medium avg medium url-parser 2339.35 2340.5 2270.75 2249.5 xml-parser 1417.4 1417.5 1429.75 1415 html-parser 1459.15 1457 1455.15 1454.5 Though my StringTest shows 1/2 cost increase for [] accesses, the impact seems not actually visible in real applications. I guess the impact of 8/16-bits implementation of String and StringIterator should be also like this.
Xianzhu Wang
Comment 23 2011-08-19 04:19:13 PDT
Created attachment 104488 [details] patch. The last one had a style issue and missed some JavaScriptCore files
Annie Sullivan
Comment 24 2011-08-19 16:28:57 PDT
Created attachment 104588 [details] Example of replacing checks in operator[] with ASSERTS
Annie Sullivan
Comment 25 2011-08-19 16:42:00 PDT
(In reply to comment #24) > Created an attachment (id=104588) [details] > Example of replacing checks in operator[] with ASSERTS I looked into how much work it would be to change String operator [] to ASSERT instead of do range checks. This patch passes all layout tests. If it seems reasonable to people to use ASSERT instead of range checks, we can do the following: 1. Clean up this patch and put it up for review on a dependent bug. 2. Write a benchmark that checks the performance of additional code that would need to be added to operator[] to work with 8-bit strings. The code would probably look something like: if (m_refCountAndFlags & s_is8bit) return static_cast<UChar>(m_data8[i]); else return m_data16[i]; 3. If that goes well, continue with the original patch on this bug. Or we could go with the StringIterator approach. Would be great to hear what some more experienced WebKit developers think.
Ryosuke Niwa
Comment 26 2011-08-19 16:57:32 PDT
Comment on attachment 104588 [details] Example of replacing checks in operator[] with ASSERTS View in context: https://bugs.webkit.org/attachment.cgi?id=104588&action=review This patch appears to degrade the readability of call sites of operator[]. Can we instead add charAtWithoutBoundsCheck(unsigned) const, which doesn't do any bounds checks, and use that in places where we currently call characters()? > Source/WebCore/css/CSSComputedStyleDeclaration.cpp:590 > + if (!pseudoElementName.isEmpty()) > + nameWithoutColonsStart = pseudoElementName[0] == ':' ? (pseudoElementName[1] == ':' ? 2 : 1) : 0; Shouldn't we check that pseudoElementName's length is at least 2 if pseudoElementName[0] == ':'.
Darin Adler
Comment 27 2011-08-19 17:00:55 PDT
(In reply to comment #26) > Can we instead add charAtWithoutBoundsCheck(unsigned) const I was going to suggest something similar. But I’m not sure it has to have such a long name.
Annie Sullivan
Comment 28 2011-08-19 17:04:14 PDT
(In reply to comment #27) > (In reply to comment #26) > > Can we instead add charAtWithoutBoundsCheck(unsigned) const > > I was going to suggest something similar. But I’m not sure it has to have such a long name. So this would change my plan in comment 25 to: 1. Implement charAtWithoutBoundsCheck (with a shorter name) 2. Rewrite the first patch on this bug to call that instead of operator[] 3. Write a benchmark that checks the performance of additional code that would need to be added to charAtWithoutBoundsCheck and operator[] to work with 8-bit strings. The code would probably look something like: if (m_refCountAndFlags & s_is8bit) return static_cast<UChar>(m_data8[i]); else return m_data16[i]; Do people like that better than the StringIterator approach?
Ryosuke Niwa
Comment 29 2011-08-19 17:04:43 PDT
(In reply to comment #27) > (In reply to comment #26) > > Can we instead add charAtWithoutBoundsCheck(unsigned) const > > I was going to suggest something similar. But I’m not sure it has to have such a long name. fastCharAt? But I personally think that renaming the existing operator[] to charAt and then eliminating the bounds check from operator[] will be better since operator[] on an array doesn't do bounds check.
Darin Adler
Comment 30 2011-08-19 17:18:36 PDT
(In reply to comment #29) > fastCharAt? But I personally think that renaming the existing operator[] to charAt and then eliminating the bounds check from operator[] will be better since operator[] on an array doesn't do bounds check. Sure, later we can do that, but the point is to avoid making global risky changes that could even introduce buffer overrun for little benefit.
Ryosuke Niwa
Comment 31 2011-08-19 17:25:01 PDT
(In reply to comment #30) > Sure, later we can do that, but the point is to avoid making global risky changes that could even introduce buffer overrun for little benefit. I think we can do it safely if we first renamed operator[] to charAt in one patch, and then re-introduced operator[] in another patch.
Xianzhu Wang
Comment 32 2011-08-19 22:43:09 PDT
About the plan in comment 28, I have two points: 1. About the final usages of const UChar*: Extending from the cases in 'Passed to functions as UChar*', 'Passed as substring as UChar*', 'Classes with UChar* members', 'Return characters()' in Annie's spreadsheet, the final usages would fall into the other categories. This could also increase the cases of 'Use [] operator' category, and the performance measured after code change could be more accurate. We could also know the potential problems using 8-bit buffers in strings, especially with the third-party libraries (e.g. ICU). I can work on this. 2. To also change the final usages of 'Use [] operator' category, we should first determine how to pass const UChar* through parameters, class members and returned values: StringIterator, or 'const String&'/String? I think one benefit of StringIterator is its partly compatibility with const UChar*. Using StringIterator we can replace const UChar*s with small code change, until the final usages of them. If the final usage is just iterating through the characters (i.e. this bug), the iteration code can remain unchanged. All intermediate steps that just pass const UChar* to the next step (with or without an offset) can just change the type name.
Annie Sullivan
Comment 33 2011-08-22 12:14:43 PDT
(In reply to comment #32) > About the plan in comment 28, I have two points: > > 1. About the final usages of const UChar*: Extending from the cases in 'Passed to functions as UChar*', 'Passed as substring as UChar*', 'Classes with UChar* members', 'Return characters()' in Annie's spreadsheet, the final usages would fall into the other categories. This could also increase the cases of 'Use [] operator' category, and the performance measured after code change could be more accurate. We could also know the potential problems using 8-bit buffers in strings, especially with the third-party libraries (e.g. ICU). I can work on this. That would be great! > 2. To also change the final usages of 'Use [] operator' category, we should first determine how to pass const UChar* through parameters, class members and returned values: StringIterator, or 'const String&'/String? A lot of these usages pass in something like (characters() + offset, length) to create a substring quickly. As I think about it more, it should be possible to make StringIterator::substring() method which does this internally but returns a StringIterator so it doesn't need to expose the internal format of the character buffer. > I think one benefit of StringIterator is its partly compatibility with const UChar*. Using StringIterator we can replace const UChar*s with small code change, until the final usages of them. If the final usage is just iterating through the characters (i.e. this bug), the iteration code can remain unchanged. All intermediate steps that just pass const UChar* to the next step (with or without an offset) can just change the type name. The more I dig through the code that calls characters(), the more I agree with this. I think StringIterator is the right approach. I set the review flag to r? on your StringIterator patch. Can someone review?
Annie Sullivan
Comment 34 2011-08-22 13:29:05 PDT
On second thought, I created bug 66706 for just the StringIterator part of the patch, since there is a lot of discussion on this bug and I hoped it would be clearer what StringIterator is for if we put it on a separate bug.
Note You need to log in before you can comment on or make changes to this bug.