Bug 66706 - Change return type of StringImpl::characters() to StringCharacterIterator
Summary: Change return type of StringImpl::characters() to StringCharacterIterator
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Template Framework (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Annie Sullivan
URL:
Keywords:
Depends on:
Blocks: 66286 66161
  Show dependency treegraph
 
Reported: 2011-08-22 13:25 PDT by Annie Sullivan
Modified: 2012-01-22 14:28 PST (History)
10 users (show)

See Also:


Attachments
Patch (2.42 KB, patch)
2011-08-22 13:26 PDT, Annie Sullivan
no flags Details | Formatted Diff | Diff
Patch (2.48 KB, patch)
2011-08-22 13:43 PDT, Annie Sullivan
no flags Details | Formatted Diff | Diff
Patch (3.16 KB, patch)
2011-08-22 16:46 PDT, Annie Sullivan
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Annie Sullivan 2011-08-22 13:25:20 PDT
In order to fix bug 66161 (storing strings in 8-bit buffers when possible), we want to reduce the number of places where the underlying data format in WTF::String is exposed. The main thing that needs to get fixed are ~200 usages of the String::characters() method. After discussion in bug 66286, we think the best approach to the conversion is the following:

1. Typedef const UChar * (the return value of characters() ) StringIterator, and make characters() return StringIterator.
2. Replace usages of const UChar* in callers with StringIterator. At this point, the functionality of the code hasn't changed at all, but it would be possible to implement StringIterator with operator[], etc. in a way that hides the underlying data format of the string.
3. Write various implementations of StringIterator and do performance analysis on how they affect the entire codebase (as opposed to relying solely on microbenchmarks).
4. Pick a performant implementation of StringIterator, and actually convert the code to use 8-bit buffers when possible using this implementation.

This bug addresses step 1. I pulled it out of one of the patches from bug 66286 because there is a lot of discussion on that bug and I wanted to make sure what we're doing here is clear.
Comment 1 Annie Sullivan 2011-08-22 13:26:43 PDT
Created attachment 104726 [details]
Patch
Comment 2 Darin Adler 2011-08-22 13:32:00 PDT
A string iterator sounds like something that iterates across strings. This iterates across characters within a string. I think this should be named StringCharacterIterator.
Comment 3 Annie Sullivan 2011-08-22 13:43:46 PDT
Created attachment 104729 [details]
Patch

Updated name from StringIterator to StringCharacterIterator
Comment 4 WebKit Review Bot 2011-08-22 13:46:37 PDT
Attachment 104729 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/update-webkit', '--chromium']" exit_code: 2

Last 3072 characters of output:
chromium.org/svn/trunk/deps/third_party/libjpeg_turbo@95800
    should_process: True
    processed: True
    requirements: ['./', 'chromium_deps', 'third_party']
  
    name: third_party/v8-i18n
    url: From('chromium_deps', 'src/third_party/v8-i18n')
    parsed_url: http://v8-i18n.googlecode.com/svn/trunk@4
    should_process: True
    processed: True
    requirements: ['./', 'chromium_deps', 'third_party']
  
    name: tools/grit
    url: http://src.chromium.org/svn/trunk/src/tools/grit@97698
    parsed_url: http://src.chromium.org/svn/trunk/src/tools/grit@97698
    should_process: True
    processed: True
    requirements: ['./']
  
    name: base
    url: http://src.chromium.org/svn/trunk/src/base@97698
    parsed_url: http://src.chromium.org/svn/trunk/src/base@97698
    _file_list: ['/mnt/git/webkit-style-queue/Source/WebKit/chromium/base/event_recorder.cc', '/mnt/git/webkit-style-queue/Source/WebKit/chromium/base/event_recorder_win.cc', '/mnt/git/webkit-style-queue/Source/WebKit/chromium/base/base.gypi']
    should_process: True
    processed: True
    requirements: ['./']
  
    name: third_party/leveldb
    url: From('chromium_deps', 'src/third_party/leveldb')
    should_process: True
    requirements: ['./', 'chromium_deps', 'third_party']
  
    name: sql
    url: http://src.chromium.org/svn/trunk/src/sql@97698
    should_process: True
    requirements: ['./']
  
    name: v8
    url: From('chromium_deps', 'src/v8')
    should_process: True
    requirements: ['./', 'chromium_deps']
  
    name: testing/gtest
    url: From('chromium_deps', 'src/testing/gtest')
    should_process: True
    requirements: ['./', 'chromium_deps', 'testing']
  
    name: third_party/libwebp
    url: http://src.chromium.org/svn/trunk/src/third_party/libwebp@97698
    should_process: True
    requirements: ['./', 'third_party']
  
    name: googleurl
    url: From('chromium_deps', 'src/googleurl')
    should_process: True
    requirements: ['./', 'chromium_deps']
  
    name: third_party/skia/include
    url: From('chromium_deps', 'src/third_party/skia/include')
    should_process: True
    requirements: ['./', 'chromium_deps', 'third_party']
  
    name: third_party/ots
    url: From('chromium_deps', 'src/third_party/ots')
    should_process: True
    requirements: ['./', 'chromium_deps', 'third_party']
  
    name: third_party/snappy/src
    url: From('chromium_deps', 'src/third_party/snappy/src')
    should_process: True
    requirements: ['./', 'chromium_deps', 'third_party']


________ running 'svn update /mnt/git/webkit-style-queue/Source/WebKit/chromium/base --revision 97698 --ignore-externals' in '/mnt/git/webkit-style-queue/Source/WebKit/chromium'
D    /mnt/git/webkit-style-queue/Source/WebKit/chromium/base/event_recorder.cc
A    /mnt/git/webkit-style-queue/Source/WebKit/chromium/base/event_recorder_win.cc
U    /mnt/git/webkit-style-queue/Source/WebKit/chromium/base/base.gypi
Updated to revision 97698.
Died at Tools/Scripts/update-webkit-chromium line 69.
No such file or directory at Tools/Scripts/update-webkit line 100.


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 5 Annie Sullivan 2011-08-22 13:53:15 PDT
I think there is something wrong with style-queue; event_recorder_win.cc isn't even in my patch. The style check passed locally when I uploaded the patch.
Comment 6 Ryosuke Niwa 2011-08-22 13:53:40 PDT
Comment on attachment 104729 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=104729&action=review

> Source/JavaScriptCore/wtf/text/StringImpl.h:68
> +typedef const UChar* StringCharacterIterator;

This is going to be a class, right?  I think we should define a class here and modify the call sites of characters() to use StringCharacterIterator.
By the way, we already have TextIterator and CharacterIterator and they have completely different semantics so I'm not sure if we should name this class StringCharacterIterator.

How about ConstCharacterAccessor?
Comment 7 Annie Sullivan 2011-08-22 14:11:23 PDT
(In reply to comment #6)
> (From update of attachment 104729 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=104729&action=review
> 
> > Source/JavaScriptCore/wtf/text/StringImpl.h:68
> > +typedef const UChar* StringCharacterIterator;
> 
> This is going to be a class, right?  I think we should define a class here and modify the call sites of characters() to use StringCharacterIterator.

Eventually it will be a class, yes. But we can make much smaller and less risky changes if we start with the typedef. Using the typedef doesn't change the performance or functionality at all. And we can update the code in smaller patches, instead of making a giant patch that changes all 200 callers and the subsequent calls they make, and adds in a new class at the same time.

So when we get to step 3 of the plan in the bug description above, we can try various implementations of the class and test performance on various benchmarks, and check in/roll back that single change on its own.

> By the way, we already have TextIterator and CharacterIterator and they have completely different semantics so I'm not sure if we should name this class StringCharacterIterator.
> 
> How about ConstCharacterAccessor?

I don't have a strong opinion. Darin, what do you think?
Comment 8 Darin Adler 2011-08-22 14:19:18 PDT
Comment on attachment 104729 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=104729&action=review

>>> Source/JavaScriptCore/wtf/text/StringImpl.h:68
>>> +typedef const UChar* StringCharacterIterator;
>> 
>> This is going to be a class, right?  I think we should define a class here and modify the call sites of characters() to use StringCharacterIterator.
>> By the way, we already have TextIterator and CharacterIterator and they have completely different semantics so I'm not sure if we should name this class StringCharacterIterator.
>> 
>> How about ConstCharacterAccessor?
> 
> Eventually it will be a class, yes. But we can make much smaller and less risky changes if we start with the typedef. Using the typedef doesn't change the performance or functionality at all. And we can update the code in smaller patches, instead of making a giant patch that changes all 200 callers and the subsequent calls they make, and adds in a new class at the same time.
> 
> So when we get to step 3 of the plan in the bug description above, we can try various implementations of the class and test performance on various benchmarks, and check in/roll back that single change on its own.

I recognize that we already have a CharacterIterator, but it seems to me that StringCharacterIterator is clearly something related to the string class. Another possibility would be to name it String::CharacterIterator.

Our existing CharacterIterator could be renamed and given a slightly longer, clearer name. It's used in a small enough number of places that it would be straightforward.

I do think it’s a problem for the name StringCharacterIterator that this class is for read-only iteration and “iterator” usually means read-write access.

It does indeed seem fine to start using this typedef to keep the size of the patch smaller once it becomes a class instead of a typedef. But it might give us a false sense of accomplishment. Some things, such as passing a const UChar* to external library functions such as CFStringCreateWithCharacters, won’t work at all with the iterator. And we won’t uncover those until we make it a class. So we may be putting this typedef in places where it does no good.

If I was doing this, I would do it more like this:

    1) Check in the typedef.
    2) Locally change to a class.
    3) Start landing changes to use the typedef in the tree, having already tested with the class that I have not yet checked in.
    4) Land the real class when the time is right.
Comment 9 Annie Sullivan 2011-08-22 14:42:44 PDT
(In reply to comment #8)
> (From update of attachment 104729 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=104729&action=review
> 
> >>> Source/JavaScriptCore/wtf/text/StringImpl.h:68
> >>> +typedef const UChar* StringCharacterIterator;
> >> 
> >> This is going to be a class, right?  I think we should define a class here and modify the call sites of characters() to use StringCharacterIterator.
> >> By the way, we already have TextIterator and CharacterIterator and they have completely different semantics so I'm not sure if we should name this class StringCharacterIterator.
> >> 
> >> How about ConstCharacterAccessor?
> > 
> > Eventually it will be a class, yes. But we can make much smaller and less risky changes if we start with the typedef. Using the typedef doesn't change the performance or functionality at all. And we can update the code in smaller patches, instead of making a giant patch that changes all 200 callers and the subsequent calls they make, and adds in a new class at the same time.
> > 
> > So when we get to step 3 of the plan in the bug description above, we can try various implementations of the class and test performance on various benchmarks, and check in/roll back that single change on its own.
> 
> I recognize that we already have a CharacterIterator, but it seems to me that StringCharacterIterator is clearly something related to the string class. Another possibility would be to name it String::CharacterIterator.
> 
> Our existing CharacterIterator could be renamed and given a slightly longer, clearer name. It's used in a small enough number of places that it would be straightforward.

I'd be happy to file a blocking bug and write a patch for this. Looks like both CharacterIterator and BackwardsCharacterIterator would be changed, right? Any opinions on the new name? TextCharacterIterator/BackwardsTextCharacterIterator?

> I do think it’s a problem for the name StringCharacterIterator that this class is for read-only iteration and “iterator” usually means read-write access.

That's true. Do you think ConstCharacterAccessor captures that better? Or maybe something like String::ReadOnlyCharIterator would be better?

> It does indeed seem fine to start using this typedef to keep the size of the patch smaller once it becomes a class instead of a typedef. But it might give us a false sense of accomplishment. Some things, such as passing a const UChar* to external library functions such as CFStringCreateWithCharacters, won’t work at all with the iterator. And we won’t uncover those until we make it a class. So we may be putting this typedef in places where it does no good.
> 
> If I was doing this, I would do it more like this:
> 
>     1) Check in the typedef.
>     2) Locally change to a class.
>     3) Start landing changes to use the typedef in the tree, having already tested with the class that I have not yet checked in.
>     4) Land the real class when the time is right.

In practice, we will definitely do something more like this. Sorry for oversimplifying in my comments so far; I wanted to be clear what order the patches would be submitted in.
Comment 10 Darin Adler 2011-08-22 14:49:12 PDT
(In reply to comment #9)
> I'd be happy to file a blocking bug and write a patch for this. Looks like both CharacterIterator and BackwardsCharacterIterator would be changed, right? Any opinions on the new name? TextCharacterIterator/BackwardsTextCharacterIterator?

That seems OK.

> > I do think it’s a problem for the name StringCharacterIterator that this class is for read-only iteration and “iterator” usually means read-write access.
> 
> That's true. Do you think ConstCharacterAccessor captures that better? Or maybe something like String::ReadOnlyCharIterator would be better?

The word “const” seems OK if not great. I think ReadOnly is slightly better but more wordy. The word “accessor” seems clearly not as good as iterator to express the fact that it is something you move from one character to the next. The abbreviation “Char” is not to my taste.

> > If I was doing this, I would do it more like this:
> > 
> >     1) Check in the typedef.
> >     2) Locally change to a class.
> >     3) Start landing changes to use the typedef in the tree, having already tested with the class that I have not yet checked in.
> >     4) Land the real class when the time is right.
> 
> In practice, we will definitely do something more like this. Sorry for oversimplifying in my comments so far; I wanted to be clear what order the patches would be submitted in.

I am not sure we are understanding each other. This patch changes lots of call sites to use the iterator class instead of const UChar*. Have those call sites been tested with a class? If we were doing my (3) then we would have had to test these with the class and so we would know which ones were truly “iteration” and which were other kinds of const UChar* uses.
Comment 11 Annie Sullivan 2011-08-22 15:03:42 PDT
(In reply to comment #10)
> (In reply to comment #9)
> > I'd be happy to file a blocking bug and write a patch for this. Looks like both CharacterIterator and BackwardsCharacterIterator would be changed, right? Any opinions on the new name? TextCharacterIterator/BackwardsTextCharacterIterator?
> 
> That seems OK.

Okay, I'll do that.

> > > I do think it’s a problem for the name StringCharacterIterator that this class is for read-only iteration and “iterator” usually means read-write access.
> > 
> > That's true. Do you think ConstCharacterAccessor captures that better? Or maybe something like String::ReadOnlyCharIterator would be better?
> 
> The word “const” seems OK if not great. I think ReadOnly is slightly better but more wordy. The word “accessor” seems clearly not as good as iterator to express the fact that it is something you move from one character to the next. The abbreviation “Char” is not to my taste.

Without abbreviating, StringReadOnlyCharacterIterator is the best name I can think of. Should I change the patch to that?

> > > If I was doing this, I would do it more like this:
> > > 
> > >     1) Check in the typedef.
> > >     2) Locally change to a class.
> > >     3) Start landing changes to use the typedef in the tree, having already tested with the class that I have not yet checked in.
> > >     4) Land the real class when the time is right.
> > 
> > In practice, we will definitely do something more like this. Sorry for oversimplifying in my comments so far; I wanted to be clear what order the patches would be submitted in.
> 
> I am not sure we are understanding each other. This patch changes lots of call sites to use the iterator class instead of const UChar*. Have those call sites been tested with a class? If we were doing my (3) then we would have had to test these with the class and so we would know which ones were truly “iteration” and which were other kinds of const UChar* uses.

The patch on bug 66286 does indeed change lots of callsites, but the new patch I posted on this bug is just the typedef. I'm sorry if I'm still misunderstanding something?
Comment 12 Ryosuke Niwa 2011-08-22 15:09:59 PDT
(In reply to comment #11)
> (In reply to comment #10)
> > (In reply to comment #9)
> > > I'd be happy to file a blocking bug and write a patch for this. Looks like both CharacterIterator and BackwardsCharacterIterator would be changed, right? Any opinions on the new name? TextCharacterIterator/BackwardsTextCharacterIterator?
> > 
> > That seems OK.
> 
> Okay, I'll do that.

I'd prefer keeping CharacterIterator and BackwardsCharacterIterator as is, and add String::ConstIterator.
Comment 13 Annie Sullivan 2011-08-22 15:11:08 PDT
(In reply to comment #12)
> (In reply to comment #11)
> > (In reply to comment #10)
> > > (In reply to comment #9)
> > > > I'd be happy to file a blocking bug and write a patch for this. Looks like both CharacterIterator and BackwardsCharacterIterator would be changed, right? Any opinions on the new name? TextCharacterIterator/BackwardsTextCharacterIterator?
> > > 
> > > That seems OK.
> > 
> > Okay, I'll do that.
> 
> I'd prefer keeping CharacterIterator and BackwardsCharacterIterator as is, and add String::ConstIterator.

Darin, is that okay with you?
Comment 14 Darin Adler 2011-08-22 15:18:09 PDT
(In reply to comment #13)
> > I'd prefer keeping CharacterIterator and BackwardsCharacterIterator as is, and add String::ConstIterator.
> 
> Darin, is that okay with you?

I don’t think it will be easy to make a type used in StringImpl be defined as a member of the String class. The name seems OK.
Comment 15 Annie Sullivan 2011-08-22 15:25:04 PDT
> > I am not sure we are understanding each other. This patch changes lots of call sites to use the iterator class instead of const UChar*. Have those call sites been tested with a class? If we were doing my (3) then we would have had to test these with the class and so we would know which ones were truly “iteration” and which were other kinds of const UChar* uses.
> 
> The patch on bug 66286 does indeed change lots of callsites, but the new patch I posted on this bug is just the typedef. I'm sorry if I'm still misunderstanding something?

Wait, I think I see where I was misunderstanding you. What I did was change the return value of characters(), assuming all callsites would be StringIterator, when it's clearly true that at least a few will really need access to the underlying bytes.

Instead, would you prefer we leave characters() alone and add a second function which returns StringIterator, then in step 3 we change callsites to use it once we've verified that they're really iterators? (We'd use the preferred name, of course).
Comment 16 Annie Sullivan 2011-08-22 15:25:26 PDT
(In reply to comment #14)
> (In reply to comment #13)
> > > I'd prefer keeping CharacterIterator and BackwardsCharacterIterator as is, and add String::ConstIterator.
> > 
> > Darin, is that okay with you?
> 
> I don’t think it will be easy to make a type used in StringImpl be defined as a member of the String class. The name seems OK.

Oops, right. Is StringConstIterator okay?
Comment 17 Darin Adler 2011-08-22 15:32:26 PDT
(In reply to comment #15)
> What I did was change the return value of characters()

Maybe you did that in your private tree. But in the public tree we are just using a typedef and it's still the same return type as before, so we did not yet change the return type of characters().

> Instead, would you prefer we leave characters() alone and add a second function which returns StringIterator

I don’t really care all that much about what the function is named. It might be clearer to name it characterIterator() rather than characters(). Having two functions is OK if it helps us test what places will work with the iterator in the future.

We should only deploy the new typedef at call sites where we know it will work. I don’t care precisely how we do that testing, but once we know it will work, then it’s great to land the change now in a way that is safe and has no effect until we land more of the work later.

It’s not as great to land it places where it was not tested, might not work, and might have to rewritten later. I’d rather have those call sites say const UChar* rather than being churned multiple times.
Comment 18 Darin Adler 2011-08-22 15:33:14 PDT
(In reply to comment #16)
> Oops, right. Is StringConstIterator okay?

I guess we’re back to the original name, StringIterator, only this time we added “const”. I still have my original objection. It seems like an iterator that iterates over strings, not over characters within a string, to me.
Comment 19 Annie Sullivan 2011-08-22 16:03:35 PDT
(In reply to comment #18)
> (In reply to comment #16)
> > Oops, right. Is StringConstIterator okay?
> 
> I guess we’re back to the original name, StringIterator, only this time we added “const”. I still have my original objection. It seems like an iterator that iterates over strings, not over characters within a string, to me.

StringConstCharacterIterator is slightly shorter than StringReadOnlyCharacterIterator. And I'm assuming you don't want to abbreviate "str", "char", "iter", etc. So I think that's the best we can do?

(In reply to comment #17)
> (In reply to comment #15)
> > What I did was change the return value of characters()
> 
> Maybe you did that in your private tree. But in the public tree we are just using a typedef and it's still the same return type as before, so we did not yet change the return type of characters().
> 
> > Instead, would you prefer we leave characters() alone and add a second function which returns StringIterator
> 
> I don’t really care all that much about what the function is named. It might be clearer to name it characterIterator() rather than characters(). Having two functions is OK if it helps us test what places will work with the iterator in the future.

I think having the two functions will help us test.

> We should only deploy the new typedef at call sites where we know it will work. I don’t care precisely how we do that testing, but once we know it will work, then it’s great to land the change now in a way that is safe and has no effect until we land more of the work later.
> 
> It’s not as great to land it places where it was not tested, might not work, and might have to rewritten later. I’d rather have those call sites say const UChar* rather than being churned multiple times.

We'll definitely test call sites before changing them.
Comment 20 Annie Sullivan 2011-08-22 16:46:04 PDT
Created attachment 104765 [details]
Patch

Updated name to StringConstCharacterIterator and added characterIterator() function that returns StringConstCharacterIterator instead of const UChar*
Comment 21 Sam Weinig 2012-01-22 14:28:41 PST
StringImpl supports 8-bit buffers, so it seems like this work is not necessary now.  Please re-open if there is still value to doing this work.