WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED WONTFIX
66706
Change return type of StringImpl::characters() to StringCharacterIterator
https://bugs.webkit.org/show_bug.cgi?id=66706
Summary
Change return type of StringImpl::characters() to StringCharacterIterator
Annie Sullivan
Reported
2011-08-22 13:25:20 PDT
In order to fix
bug 66161
(storing strings in 8-bit buffers when possible), we want to reduce the number of places where the underlying data format in WTF::String is exposed. The main thing that needs to get fixed are ~200 usages of the String::characters() method. After discussion in
bug 66286
, we think the best approach to the conversion is the following: 1. Typedef const UChar * (the return value of characters() ) StringIterator, and make characters() return StringIterator. 2. Replace usages of const UChar* in callers with StringIterator. At this point, the functionality of the code hasn't changed at all, but it would be possible to implement StringIterator with operator[], etc. in a way that hides the underlying data format of the string. 3. Write various implementations of StringIterator and do performance analysis on how they affect the entire codebase (as opposed to relying solely on microbenchmarks). 4. Pick a performant implementation of StringIterator, and actually convert the code to use 8-bit buffers when possible using this implementation. This bug addresses step 1. I pulled it out of one of the patches from
bug 66286
because there is a lot of discussion on that bug and I wanted to make sure what we're doing here is clear.
Attachments
Patch
(2.42 KB, patch)
2011-08-22 13:26 PDT
,
Annie Sullivan
no flags
Details
Formatted Diff
Diff
Patch
(2.48 KB, patch)
2011-08-22 13:43 PDT
,
Annie Sullivan
no flags
Details
Formatted Diff
Diff
Patch
(3.16 KB, patch)
2011-08-22 16:46 PDT
,
Annie Sullivan
no flags
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Annie Sullivan
Comment 1
2011-08-22 13:26:43 PDT
Created
attachment 104726
[details]
Patch
Darin Adler
Comment 2
2011-08-22 13:32:00 PDT
A string iterator sounds like something that iterates across strings. This iterates across characters within a string. I think this should be named StringCharacterIterator.
Annie Sullivan
Comment 3
2011-08-22 13:43:46 PDT
Created
attachment 104729
[details]
Patch Updated name from StringIterator to StringCharacterIterator
WebKit Review Bot
Comment 4
2011-08-22 13:46:37 PDT
Attachment 104729
[details]
did not pass style-queue: Failed to run "['Tools/Scripts/update-webkit', '--chromium']" exit_code: 2 Last 3072 characters of output: chromium.org/svn/trunk/deps/third_party/libjpeg_turbo@95800 should_process: True processed: True requirements: ['./', 'chromium_deps', 'third_party'] name: third_party/v8-i18n url: From('chromium_deps', 'src/third_party/v8-i18n') parsed_url:
http://v8-i18n.googlecode.com/svn/trunk@4
should_process: True processed: True requirements: ['./', 'chromium_deps', 'third_party'] name: tools/grit url:
http://src.chromium.org/svn/trunk/src/tools/grit@97698
parsed_url:
http://src.chromium.org/svn/trunk/src/tools/grit@97698
should_process: True processed: True requirements: ['./'] name: base url:
http://src.chromium.org/svn/trunk/src/base@97698
parsed_url:
http://src.chromium.org/svn/trunk/src/base@97698
_file_list: ['/mnt/git/webkit-style-queue/Source/WebKit/chromium/base/event_recorder.cc', '/mnt/git/webkit-style-queue/Source/WebKit/chromium/base/event_recorder_win.cc', '/mnt/git/webkit-style-queue/Source/WebKit/chromium/base/base.gypi'] should_process: True processed: True requirements: ['./'] name: third_party/leveldb url: From('chromium_deps', 'src/third_party/leveldb') should_process: True requirements: ['./', 'chromium_deps', 'third_party'] name: sql url:
http://src.chromium.org/svn/trunk/src/sql@97698
should_process: True requirements: ['./'] name: v8 url: From('chromium_deps', 'src/v8') should_process: True requirements: ['./', 'chromium_deps'] name: testing/gtest url: From('chromium_deps', 'src/testing/gtest') should_process: True requirements: ['./', 'chromium_deps', 'testing'] name: third_party/libwebp url:
http://src.chromium.org/svn/trunk/src/third_party/libwebp@97698
should_process: True requirements: ['./', 'third_party'] name: googleurl url: From('chromium_deps', 'src/googleurl') should_process: True requirements: ['./', 'chromium_deps'] name: third_party/skia/include url: From('chromium_deps', 'src/third_party/skia/include') should_process: True requirements: ['./', 'chromium_deps', 'third_party'] name: third_party/ots url: From('chromium_deps', 'src/third_party/ots') should_process: True requirements: ['./', 'chromium_deps', 'third_party'] name: third_party/snappy/src url: From('chromium_deps', 'src/third_party/snappy/src') should_process: True requirements: ['./', 'chromium_deps', 'third_party'] ________ running 'svn update /mnt/git/webkit-style-queue/Source/WebKit/chromium/base --revision 97698 --ignore-externals' in '/mnt/git/webkit-style-queue/Source/WebKit/chromium' D /mnt/git/webkit-style-queue/Source/WebKit/chromium/base/event_recorder.cc A /mnt/git/webkit-style-queue/Source/WebKit/chromium/base/event_recorder_win.cc U /mnt/git/webkit-style-queue/Source/WebKit/chromium/base/base.gypi Updated to revision 97698. Died at Tools/Scripts/update-webkit-chromium line 69. No such file or directory at Tools/Scripts/update-webkit line 100. If any of these errors are false positives, please file a bug against check-webkit-style.
Annie Sullivan
Comment 5
2011-08-22 13:53:15 PDT
I think there is something wrong with style-queue; event_recorder_win.cc isn't even in my patch. The style check passed locally when I uploaded the patch.
Ryosuke Niwa
Comment 6
2011-08-22 13:53:40 PDT
Comment on
attachment 104729
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=104729&action=review
> Source/JavaScriptCore/wtf/text/StringImpl.h:68 > +typedef const UChar* StringCharacterIterator;
This is going to be a class, right? I think we should define a class here and modify the call sites of characters() to use StringCharacterIterator. By the way, we already have TextIterator and CharacterIterator and they have completely different semantics so I'm not sure if we should name this class StringCharacterIterator. How about ConstCharacterAccessor?
Annie Sullivan
Comment 7
2011-08-22 14:11:23 PDT
(In reply to
comment #6
)
> (From update of
attachment 104729
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=104729&action=review
> > > Source/JavaScriptCore/wtf/text/StringImpl.h:68 > > +typedef const UChar* StringCharacterIterator; > > This is going to be a class, right? I think we should define a class here and modify the call sites of characters() to use StringCharacterIterator.
Eventually it will be a class, yes. But we can make much smaller and less risky changes if we start with the typedef. Using the typedef doesn't change the performance or functionality at all. And we can update the code in smaller patches, instead of making a giant patch that changes all 200 callers and the subsequent calls they make, and adds in a new class at the same time. So when we get to step 3 of the plan in the bug description above, we can try various implementations of the class and test performance on various benchmarks, and check in/roll back that single change on its own.
> By the way, we already have TextIterator and CharacterIterator and they have completely different semantics so I'm not sure if we should name this class StringCharacterIterator. > > How about ConstCharacterAccessor?
I don't have a strong opinion. Darin, what do you think?
Darin Adler
Comment 8
2011-08-22 14:19:18 PDT
Comment on
attachment 104729
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=104729&action=review
>>> Source/JavaScriptCore/wtf/text/StringImpl.h:68 >>> +typedef const UChar* StringCharacterIterator; >> >> This is going to be a class, right? I think we should define a class here and modify the call sites of characters() to use StringCharacterIterator. >> By the way, we already have TextIterator and CharacterIterator and they have completely different semantics so I'm not sure if we should name this class StringCharacterIterator. >> >> How about ConstCharacterAccessor? > > Eventually it will be a class, yes. But we can make much smaller and less risky changes if we start with the typedef. Using the typedef doesn't change the performance or functionality at all. And we can update the code in smaller patches, instead of making a giant patch that changes all 200 callers and the subsequent calls they make, and adds in a new class at the same time. > > So when we get to step 3 of the plan in the bug description above, we can try various implementations of the class and test performance on various benchmarks, and check in/roll back that single change on its own.
I recognize that we already have a CharacterIterator, but it seems to me that StringCharacterIterator is clearly something related to the string class. Another possibility would be to name it String::CharacterIterator. Our existing CharacterIterator could be renamed and given a slightly longer, clearer name. It's used in a small enough number of places that it would be straightforward. I do think it’s a problem for the name StringCharacterIterator that this class is for read-only iteration and “iterator” usually means read-write access. It does indeed seem fine to start using this typedef to keep the size of the patch smaller once it becomes a class instead of a typedef. But it might give us a false sense of accomplishment. Some things, such as passing a const UChar* to external library functions such as CFStringCreateWithCharacters, won’t work at all with the iterator. And we won’t uncover those until we make it a class. So we may be putting this typedef in places where it does no good. If I was doing this, I would do it more like this: 1) Check in the typedef. 2) Locally change to a class. 3) Start landing changes to use the typedef in the tree, having already tested with the class that I have not yet checked in. 4) Land the real class when the time is right.
Annie Sullivan
Comment 9
2011-08-22 14:42:44 PDT
(In reply to
comment #8
)
> (From update of
attachment 104729
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=104729&action=review
> > >>> Source/JavaScriptCore/wtf/text/StringImpl.h:68 > >>> +typedef const UChar* StringCharacterIterator; > >> > >> This is going to be a class, right? I think we should define a class here and modify the call sites of characters() to use StringCharacterIterator. > >> By the way, we already have TextIterator and CharacterIterator and they have completely different semantics so I'm not sure if we should name this class StringCharacterIterator. > >> > >> How about ConstCharacterAccessor? > > > > Eventually it will be a class, yes. But we can make much smaller and less risky changes if we start with the typedef. Using the typedef doesn't change the performance or functionality at all. And we can update the code in smaller patches, instead of making a giant patch that changes all 200 callers and the subsequent calls they make, and adds in a new class at the same time. > > > > So when we get to step 3 of the plan in the bug description above, we can try various implementations of the class and test performance on various benchmarks, and check in/roll back that single change on its own. > > I recognize that we already have a CharacterIterator, but it seems to me that StringCharacterIterator is clearly something related to the string class. Another possibility would be to name it String::CharacterIterator. > > Our existing CharacterIterator could be renamed and given a slightly longer, clearer name. It's used in a small enough number of places that it would be straightforward.
I'd be happy to file a blocking bug and write a patch for this. Looks like both CharacterIterator and BackwardsCharacterIterator would be changed, right? Any opinions on the new name? TextCharacterIterator/BackwardsTextCharacterIterator?
> I do think it’s a problem for the name StringCharacterIterator that this class is for read-only iteration and “iterator” usually means read-write access.
That's true. Do you think ConstCharacterAccessor captures that better? Or maybe something like String::ReadOnlyCharIterator would be better?
> It does indeed seem fine to start using this typedef to keep the size of the patch smaller once it becomes a class instead of a typedef. But it might give us a false sense of accomplishment. Some things, such as passing a const UChar* to external library functions such as CFStringCreateWithCharacters, won’t work at all with the iterator. And we won’t uncover those until we make it a class. So we may be putting this typedef in places where it does no good. > > If I was doing this, I would do it more like this: > > 1) Check in the typedef. > 2) Locally change to a class. > 3) Start landing changes to use the typedef in the tree, having already tested with the class that I have not yet checked in. > 4) Land the real class when the time is right.
In practice, we will definitely do something more like this. Sorry for oversimplifying in my comments so far; I wanted to be clear what order the patches would be submitted in.
Darin Adler
Comment 10
2011-08-22 14:49:12 PDT
(In reply to
comment #9
)
> I'd be happy to file a blocking bug and write a patch for this. Looks like both CharacterIterator and BackwardsCharacterIterator would be changed, right? Any opinions on the new name? TextCharacterIterator/BackwardsTextCharacterIterator?
That seems OK.
> > I do think it’s a problem for the name StringCharacterIterator that this class is for read-only iteration and “iterator” usually means read-write access. > > That's true. Do you think ConstCharacterAccessor captures that better? Or maybe something like String::ReadOnlyCharIterator would be better?
The word “const” seems OK if not great. I think ReadOnly is slightly better but more wordy. The word “accessor” seems clearly not as good as iterator to express the fact that it is something you move from one character to the next. The abbreviation “Char” is not to my taste.
> > If I was doing this, I would do it more like this: > > > > 1) Check in the typedef. > > 2) Locally change to a class. > > 3) Start landing changes to use the typedef in the tree, having already tested with the class that I have not yet checked in. > > 4) Land the real class when the time is right. > > In practice, we will definitely do something more like this. Sorry for oversimplifying in my comments so far; I wanted to be clear what order the patches would be submitted in.
I am not sure we are understanding each other. This patch changes lots of call sites to use the iterator class instead of const UChar*. Have those call sites been tested with a class? If we were doing my (3) then we would have had to test these with the class and so we would know which ones were truly “iteration” and which were other kinds of const UChar* uses.
Annie Sullivan
Comment 11
2011-08-22 15:03:42 PDT
(In reply to
comment #10
)
> (In reply to
comment #9
) > > I'd be happy to file a blocking bug and write a patch for this. Looks like both CharacterIterator and BackwardsCharacterIterator would be changed, right? Any opinions on the new name? TextCharacterIterator/BackwardsTextCharacterIterator? > > That seems OK.
Okay, I'll do that.
> > > I do think it’s a problem for the name StringCharacterIterator that this class is for read-only iteration and “iterator” usually means read-write access. > > > > That's true. Do you think ConstCharacterAccessor captures that better? Or maybe something like String::ReadOnlyCharIterator would be better? > > The word “const” seems OK if not great. I think ReadOnly is slightly better but more wordy. The word “accessor” seems clearly not as good as iterator to express the fact that it is something you move from one character to the next. The abbreviation “Char” is not to my taste.
Without abbreviating, StringReadOnlyCharacterIterator is the best name I can think of. Should I change the patch to that?
> > > If I was doing this, I would do it more like this: > > > > > > 1) Check in the typedef. > > > 2) Locally change to a class. > > > 3) Start landing changes to use the typedef in the tree, having already tested with the class that I have not yet checked in. > > > 4) Land the real class when the time is right. > > > > In practice, we will definitely do something more like this. Sorry for oversimplifying in my comments so far; I wanted to be clear what order the patches would be submitted in. > > I am not sure we are understanding each other. This patch changes lots of call sites to use the iterator class instead of const UChar*. Have those call sites been tested with a class? If we were doing my (3) then we would have had to test these with the class and so we would know which ones were truly “iteration” and which were other kinds of const UChar* uses.
The patch on
bug 66286
does indeed change lots of callsites, but the new patch I posted on this bug is just the typedef. I'm sorry if I'm still misunderstanding something?
Ryosuke Niwa
Comment 12
2011-08-22 15:09:59 PDT
(In reply to
comment #11
)
> (In reply to
comment #10
) > > (In reply to
comment #9
) > > > I'd be happy to file a blocking bug and write a patch for this. Looks like both CharacterIterator and BackwardsCharacterIterator would be changed, right? Any opinions on the new name? TextCharacterIterator/BackwardsTextCharacterIterator? > > > > That seems OK. > > Okay, I'll do that.
I'd prefer keeping CharacterIterator and BackwardsCharacterIterator as is, and add String::ConstIterator.
Annie Sullivan
Comment 13
2011-08-22 15:11:08 PDT
(In reply to
comment #12
)
> (In reply to
comment #11
) > > (In reply to
comment #10
) > > > (In reply to
comment #9
) > > > > I'd be happy to file a blocking bug and write a patch for this. Looks like both CharacterIterator and BackwardsCharacterIterator would be changed, right? Any opinions on the new name? TextCharacterIterator/BackwardsTextCharacterIterator? > > > > > > That seems OK. > > > > Okay, I'll do that. > > I'd prefer keeping CharacterIterator and BackwardsCharacterIterator as is, and add String::ConstIterator.
Darin, is that okay with you?
Darin Adler
Comment 14
2011-08-22 15:18:09 PDT
(In reply to
comment #13
)
> > I'd prefer keeping CharacterIterator and BackwardsCharacterIterator as is, and add String::ConstIterator. > > Darin, is that okay with you?
I don’t think it will be easy to make a type used in StringImpl be defined as a member of the String class. The name seems OK.
Annie Sullivan
Comment 15
2011-08-22 15:25:04 PDT
> > I am not sure we are understanding each other. This patch changes lots of call sites to use the iterator class instead of const UChar*. Have those call sites been tested with a class? If we were doing my (3) then we would have had to test these with the class and so we would know which ones were truly “iteration” and which were other kinds of const UChar* uses. > > The patch on
bug 66286
does indeed change lots of callsites, but the new patch I posted on this bug is just the typedef. I'm sorry if I'm still misunderstanding something?
Wait, I think I see where I was misunderstanding you. What I did was change the return value of characters(), assuming all callsites would be StringIterator, when it's clearly true that at least a few will really need access to the underlying bytes. Instead, would you prefer we leave characters() alone and add a second function which returns StringIterator, then in step 3 we change callsites to use it once we've verified that they're really iterators? (We'd use the preferred name, of course).
Annie Sullivan
Comment 16
2011-08-22 15:25:26 PDT
(In reply to
comment #14
)
> (In reply to
comment #13
) > > > I'd prefer keeping CharacterIterator and BackwardsCharacterIterator as is, and add String::ConstIterator. > > > > Darin, is that okay with you? > > I don’t think it will be easy to make a type used in StringImpl be defined as a member of the String class. The name seems OK.
Oops, right. Is StringConstIterator okay?
Darin Adler
Comment 17
2011-08-22 15:32:26 PDT
(In reply to
comment #15
)
> What I did was change the return value of characters()
Maybe you did that in your private tree. But in the public tree we are just using a typedef and it's still the same return type as before, so we did not yet change the return type of characters().
> Instead, would you prefer we leave characters() alone and add a second function which returns StringIterator
I don’t really care all that much about what the function is named. It might be clearer to name it characterIterator() rather than characters(). Having two functions is OK if it helps us test what places will work with the iterator in the future. We should only deploy the new typedef at call sites where we know it will work. I don’t care precisely how we do that testing, but once we know it will work, then it’s great to land the change now in a way that is safe and has no effect until we land more of the work later. It’s not as great to land it places where it was not tested, might not work, and might have to rewritten later. I’d rather have those call sites say const UChar* rather than being churned multiple times.
Darin Adler
Comment 18
2011-08-22 15:33:14 PDT
(In reply to
comment #16
)
> Oops, right. Is StringConstIterator okay?
I guess we’re back to the original name, StringIterator, only this time we added “const”. I still have my original objection. It seems like an iterator that iterates over strings, not over characters within a string, to me.
Annie Sullivan
Comment 19
2011-08-22 16:03:35 PDT
(In reply to
comment #18
)
> (In reply to
comment #16
) > > Oops, right. Is StringConstIterator okay? > > I guess we’re back to the original name, StringIterator, only this time we added “const”. I still have my original objection. It seems like an iterator that iterates over strings, not over characters within a string, to me.
StringConstCharacterIterator is slightly shorter than StringReadOnlyCharacterIterator. And I'm assuming you don't want to abbreviate "str", "char", "iter", etc. So I think that's the best we can do? (In reply to
comment #17
)
> (In reply to
comment #15
) > > What I did was change the return value of characters() > > Maybe you did that in your private tree. But in the public tree we are just using a typedef and it's still the same return type as before, so we did not yet change the return type of characters(). > > > Instead, would you prefer we leave characters() alone and add a second function which returns StringIterator > > I don’t really care all that much about what the function is named. It might be clearer to name it characterIterator() rather than characters(). Having two functions is OK if it helps us test what places will work with the iterator in the future.
I think having the two functions will help us test.
> We should only deploy the new typedef at call sites where we know it will work. I don’t care precisely how we do that testing, but once we know it will work, then it’s great to land the change now in a way that is safe and has no effect until we land more of the work later. > > It’s not as great to land it places where it was not tested, might not work, and might have to rewritten later. I’d rather have those call sites say const UChar* rather than being churned multiple times.
We'll definitely test call sites before changing them.
Annie Sullivan
Comment 20
2011-08-22 16:46:04 PDT
Created
attachment 104765
[details]
Patch Updated name to StringConstCharacterIterator and added characterIterator() function that returns StringConstCharacterIterator instead of const UChar*
Sam Weinig
Comment 21
2012-01-22 14:28:41 PST
StringImpl supports 8-bit buffers, so it seems like this work is not necessary now. Please re-open if there is still value to doing this work.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug