WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
137165
Support modern for loops over StringView code units and code points
https://bugs.webkit.org/show_bug.cgi?id=137165
Summary
Support modern for loops over StringView code units and code points
Myles C. Maxfield
Reported
2014-09-26 19:49:44 PDT
Support modern for loops over StringViews
Attachments
Patch
(9.45 KB, patch)
2014-09-26 20:03 PDT
,
Myles C. Maxfield
no flags
Details
Formatted Diff
Diff
Patch
(9.95 KB, patch)
2014-09-28 21:53 PDT
,
Myles C. Maxfield
no flags
Details
Formatted Diff
Diff
Patch
(10.67 KB, patch)
2014-09-28 22:37 PDT
,
Myles C. Maxfield
no flags
Details
Formatted Diff
Diff
Patch
(10.66 KB, patch)
2014-09-28 22:41 PDT
,
Myles C. Maxfield
no flags
Details
Formatted Diff
Diff
Patch
(10.67 KB, patch)
2014-09-28 22:43 PDT
,
Myles C. Maxfield
no flags
Details
Formatted Diff
Diff
Patch
(13.50 KB, patch)
2014-09-29 12:18 PDT
,
Myles C. Maxfield
no flags
Details
Formatted Diff
Diff
Patch
(10.50 KB, patch)
2014-09-30 09:48 PDT
,
Myles C. Maxfield
no flags
Details
Formatted Diff
Diff
Patch
(10.04 KB, patch)
2014-10-02 17:31 PDT
,
Myles C. Maxfield
darin
: review+
Details
Formatted Diff
Diff
Show Obsolete
(7)
View All
Add attachment
proposed patch, testcase, etc.
Myles C. Maxfield
Comment 1
2014-09-26 20:03:16 PDT
Created
attachment 238754
[details]
Patch
Darin Adler
Comment 2
2014-09-28 16:37:54 PDT
Comment on
attachment 238754
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=238754&action=review
I think this is a pretty good idea. But what guarantees these objects don’t last longer than the underlying StringView? It seems really easy to make the mistake of keeping one of these alive when the StringView is gone. For example, this code would not work: for (UChar character : functionReturningStringView().codeUnits()) I think we should consider a different version of these that will work even if the StringView has been destroyed. Probably the best way to do this is to use a StringView rather than a const StringView& as a data member of the iterator. We need to test how codePoints works on invalid sequences; such as when the last character is the lead and there is no trail, and when a trail appears with no lead. U16_NEXT handles these cases, but I am not sure that our code handles them correctly. I am not sure that
> Source/WTF/ChangeLog:8 > + This patch adds two functions, codepoints() and codeunits(), on StringView.
Code point and code unit are each two word phrases, not single words. Lets not smash them together into one word. uNless there is a good reason.
> Source/WTF/wtf/text/StringView.h:40 > +class StringViewCodepoints {
I think this should be a member class. So it would be declared inside the StringView class. class StringView { ... class CodePoints; Then later, like this: class StringView::CodePoints { ... };
> Source/WTF/wtf/text/StringView.h:47 > + bool operator!=(const Iterator&) const;
I think we should supply both == and != operators.
> Source/WTF/wtf/text/StringView.h:57 > + StringViewCodepoints(const StringView&);
Might want this to be explicit.
> Source/WTF/wtf/text/StringView.h:60 > +private:
Need a blank line before this.
> Source/WTF/wtf/text/StringView.h:240 > + StringViewCodepoints codepoints() const > + { > + return StringViewCodepoints(*this); > + } > + > + StringViewCodeunits codeunits() const > + { > + return StringViewCodeunits(*this); > + }
Please put the bodies of these functions outside the class. I have a patch coming soon that does this for all StringView member functions.
> Source/WTF/wtf/text/StringView.h:348 > + if (m_cacheIsValid) { > + m_index = m_nextIndexCache; > + m_cacheIsValid = false;
Would read better with early return.
> Tools/TestWebKitAPI/Tests/WTF/StringView.cpp:158 > + for (auto codepoint : StringView(builder.toString()).codepoints()) {
I think this code would crash if it was run under a memory debugger; the StringView won’t stay alive.
Myles C. Maxfield
Comment 3
2014-09-28 21:47:57 PDT
Comment on
attachment 238754
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=238754&action=review
>> Source/WTF/ChangeLog:8 >> + This patch adds two functions, codepoints() and codeunits(), on StringView. > > Code point and code unit are each two word phrases, not single words. Lets not smash them together into one word. uNless there is a good reason.
Done.
>> Source/WTF/wtf/text/StringView.h:40 >> +class StringViewCodepoints { > > I think this should be a member class. So it would be declared inside the StringView class. > > class StringView { > ... > class CodePoints; > > Then later, like this: > > class StringView::CodePoints { > ... > };
Done. I really like the translation of StringViewCodePoints to StringView::CodePoints. There is a way to describe exactly what I'm trying to describe!
>> Source/WTF/wtf/text/StringView.h:47 >> + bool operator!=(const Iterator&) const; > > I think we should supply both == and != operators.
Done.
>> Source/WTF/wtf/text/StringView.h:57 >> + StringViewCodepoints(const StringView&); > > Might want this to be explicit.
Done.
>> Source/WTF/wtf/text/StringView.h:60 >> +private: > > Need a blank line before this.
Done.
>> Source/WTF/wtf/text/StringView.h:240 >> + } > > Please put the bodies of these functions outside the class. I have a patch coming soon that does this for all StringView member functions.
Done.
>> Source/WTF/wtf/text/StringView.h:348 >> + m_cacheIsValid = false; > > Would read better with early return.
Done.
>> Tools/TestWebKitAPI/Tests/WTF/StringView.cpp:158 >> + for (auto codepoint : StringView(builder.toString()).codepoints()) { > > I think this code would crash if it was run under a memory debugger; the StringView won’t stay alive.
I tested this out and you're right. I've made the internal member a StringView instead of a const StringView&.
Myles C. Maxfield
Comment 4
2014-09-28 21:53:23 PDT
Created
attachment 238839
[details]
Patch
Myles C. Maxfield
Comment 5
2014-09-28 22:37:56 PDT
Created
attachment 238840
[details]
Patch
Myles C. Maxfield
Comment 6
2014-09-28 22:41:41 PDT
Created
attachment 238841
[details]
Patch
Myles C. Maxfield
Comment 7
2014-09-28 22:43:30 PDT
Created
attachment 238842
[details]
Patch
Said Abou-Hallawa
Comment 8
2014-09-29 10:29:38 PDT
I have two comments on this change: 1. There is no way to use this code to traverse the string backward. The std::string creates reverse beginning and reverse end iterators which can be used to traverse the string backward. Maybe you need to do the same for your iterators. Or maybe we can add operator--() to your iterators. 2. It was had for me to keep track to with your code in StringView::CodePoints::Iterator::operator++() and StringView::CodePoints::Iterator::operator*(). It seems you have two different cases: is8Bit() and !is8Bit(). When is8Bit() is true you do not need to cache the values m_nextIndexCache and m_codepointCache. But you are doing the following: a) you are not initializing m_codepointCache in the constructor. I know it is guarded by having m_cacheIsValid set to false in the constructor. And m_codepointCache is going to be fixed in operator++() or operator*() if is8Bit() is false. But I think initializing it to nullptr in the constructor might make things clearer. b) In StringView::CodePoints::Iterator::operator*() and if m_stringView.is8Bit() you are ++m_nextIndexCache. This looks weird to me to have the operator*() increments the index cache in this cache. And actually it can produce bugs: Consider this example: StringView strView = StringView(String("ABC")).codePoints(); Iterator it = strView.begin(); std::cout << *it; std::cout << *it; it++; std::cout << *it; Result will be: AAC Expected: AAB The reason is in operator*() we moved the m_nextIndexCache twice and in operator++() we set the m_index to m_nextIndexCache which was '2'. c) I would suggest in StringView::CodePoints::Iterator::operator*() and StringView::CodePoints::Iterator::operator++() to handle the cases is8Bit() and !is8Bit() in completely different code paths since when is8Bit() is true, you do not need the cache and when is8Bit() is true you need the cache. I personally like the code when it looks like a tree: The code starts and diverges and keep diverging. But I got confused when it looks like a diamond: The code starts, diverge and then converge again. The code in these two functions looks like a diamond.
Myles C. Maxfield
Comment 9
2014-09-29 11:27:56 PDT
Comment on
attachment 238842
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=238842&action=review
> Source/WTF/wtf/text/StringView.h:387 > + ++m_nextIndexCache;
This is wrong if operator++() has been called multiple times since the last call to operator*()
Myles C. Maxfield
Comment 10
2014-09-29 12:14:23 PDT
Replies inline. (In reply to
comment #8
)
> I have two comments on this change: > > 1. There is no way to use this code to traverse the string backward. The std::string creates reverse beginning and reverse end iterators which can be used to traverse the string backward. Maybe you need to do the same for your iterators. Or maybe we can add operator--() to your iterators.
Yep. That is a non-goal. If we need this in the future, we can add it in.
> > 2. It was had for me to keep track to with your code in StringView::CodePoints::Iterator::operator++() and StringView::CodePoints::Iterator::operator*(). It seems you have two different cases: is8Bit() and !is8Bit(). When is8Bit() is true you do not need to cache the values m_nextIndexCache and m_codepointCache. But you are doing the following: > > a) you are not initializing m_codepointCache in the constructor. I know it is guarded by having m_cacheIsValid set to false in the constructor. And m_codepointCache is going to be fixed in operator++() or operator*() if is8Bit() is false. But I think initializing it to nullptr in the constructor might make things clearer.
The m_cacheIsValid should be enough, but it's cheap enough to do this that I'll do it for readability.
> b) In StringView::CodePoints::Iterator::operator*() and if m_stringView.is8Bit() you are ++m_nextIndexCache. This looks weird to me to have the operator*() increments the index cache in this cache. And actually it can produce bugs: Consider this example: > StringView strView = StringView(String("ABC")).codePoints(); > Iterator it = strView.begin(); > std::cout << *it; > std::cout << *it; > it++; > std::cout << *it; > > Result will be: > AAC > > Expected: > AAB > > The reason is in operator*() we moved the m_nextIndexCache twice and in operator++() we set the m_index to m_nextIndexCache which was '2'.
Line 390 makes this not possible. However, you bring up a good point - I don't have any tests regarding calling operator++() or operator*() multiple times in a row. I've added some tests.
> c) I would suggest in StringView::CodePoints::Iterator::operator*() and StringView::CodePoints::Iterator::operator++() to handle the cases is8Bit() and !is8Bit() in completely different code paths since when is8Bit() is true, you do not need the cache and when is8Bit() is true you need the cache. I personally like the code when it looks like a tree: The code starts and diverges and keep diverging. But I got confused when it looks like a diamond: The code starts, diverge and then converge again. The code in these two functions looks like a diamond.
I agree with your premise but disagree with your conclusion. I don't think that having four iterator classes would be worth the amount of complexity that could be avoided. There are only 4 variables inside CodePoints, and each of the 4 is a (variation on an) int.
Myles C. Maxfield
Comment 11
2014-09-29 12:18:20 PDT
(In reply to
comment #8
)
> I have two comments on this change: > > 1. There is no way to use this code to traverse the string backward. The std::string creates reverse beginning and reverse end iterators which can be used to traverse the string backward. Maybe you need to do the same for your iterators. Or maybe we can add operator--() to your iterators. > > 2. It was had for me to keep track to with your code in StringView::CodePoints::Iterator::operator++() and StringView::CodePoints::Iterator::operator*(). It seems you have two different cases: is8Bit() and !is8Bit(). When is8Bit() is true you do not need to cache the values m_nextIndexCache and m_codepointCache. But you are doing the following: > > a) you are not initializing m_codepointCache in the constructor. I know it is guarded by having m_cacheIsValid set to false in the constructor. And m_codepointCache is going to be fixed in operator++() or operator*() if is8Bit() is false. But I think initializing it to nullptr in the constructor might make things clearer. > b) In StringView::CodePoints::Iterator::operator*() and if m_stringView.is8Bit() you are ++m_nextIndexCache. This looks weird to me to have the operator*() increments the index cache in this cache. And actually it can produce bugs: Consider this example: > StringView strView = StringView(String("ABC")).codePoints(); > Iterator it = strView.begin(); > std::cout << *it; > std::cout << *it; > it++; > std::cout << *it; > > Result will be: > AAC > > Expected: > AAB > > The reason is in operator*() we moved the m_nextIndexCache twice and in operator++() we set the m_index to m_nextIndexCache which was '2'. > c) I would suggest in StringView::CodePoints::Iterator::operator*() and StringView::CodePoints::Iterator::operator++() to handle the cases is8Bit() and !is8Bit() in completely different code paths since when is8Bit() is true, you do not need the cache and when is8Bit() is true you need the cache. I personally like the code when it looks like a tree: The code starts and diverges and keep diverging. But I got confused when it looks like a diamond: The code starts, diverge and then converge again. The code in these two functions looks like a diamond.
I've added some ASSERTs that should make it more obvious what is going on.
Myles C. Maxfield
Comment 12
2014-09-29 12:18:44 PDT
Created
attachment 238865
[details]
Patch
Myles C. Maxfield
Comment 13
2014-09-30 09:48:12 PDT
Created
attachment 238931
[details]
Patch
Darin Adler
Comment 14
2014-10-02 16:17:54 PDT
Comment on
attachment 238931
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=238931&action=review
review- because we don’t want StringView == StringView to do what is coded here.
> Source/WTF/wtf/text/StringView.h:41 > + class CodePoints; > + class CodeUnits;
I suggest defining these right before they are used instead of at the top of the class.
> Source/WTF/wtf/text/StringView.h:188 > + bool operator==(const StringView&) const; > + bool operator!=(const StringView&) const;
These operators should be implemented as free functions, not member functions. If these are member functions, then no type conversions can be done on the left side of the == and !=, but if they are free functions they can be.
> Source/WTF/wtf/text/StringView.h:247 > +#if !ASSERT_DISABLED > + mutable Mode m_mode; > +#endif
I think a boolean named: bool m_alreadyIncremented; or something like that, would be clearer than this enum, which I find mysterious.
> Source/WTF/wtf/text/StringView.h:339 > +inline bool StringView::operator==(const StringView& other) const > +{ > + return m_characters == other.m_characters && m_length == other.m_length; > +}
I don’t think it’s good idea to have a StringView == operator that checks identity rather than contents of the strings. I think an == operator on two string views should compare the actual characters, not the pointers. And should return true if the two views look at two different ranges of memory with the same characters, even if one is 8-bit and the other 16-bit. Or we could just not have an == operator, but having one with this meaning seems like a bad idea.
> Source/WTF/wtf/text/StringView.h:396 > + return m_stringView == other.m_stringView && m_index == other.m_index;
This should assert that &m_stringView == &other.m_stringView and should not look at m_stringView at all in non-debug builds.
> Source/WTF/wtf/text/StringView.h:438 > + return m_stringView == other.m_stringView && m_index == other.m_index;
This should assert that &m_stringView == &other.m_stringView and should not look at m_stringView at all in non-debug builds.
Myles C. Maxfield
Comment 15
2014-10-02 17:01:50 PDT
Comment on
attachment 238931
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=238931&action=review
>> Source/WTF/wtf/text/StringView.h:41 >> + class CodeUnits; > > I suggest defining these right before they are used instead of at the top of the class.
Done.
>> Source/WTF/wtf/text/StringView.h:188 >> + bool operator!=(const StringView&) const; > > These operators should be implemented as free functions, not member functions. If these are member functions, then no type conversions can be done on the left side of the == and !=, but if they are free functions they can be.
Given your comment below, I think providing these operators is out of scope for this patch.
>> Source/WTF/wtf/text/StringView.h:247 >> +#endif > > I think a boolean named: > > bool m_alreadyIncremented; > > or something like that, would be clearer than this enum, which I find mysterious.
Done.
>> Source/WTF/wtf/text/StringView.h:339 >> +} > > I don’t think it’s good idea to have a StringView == operator that checks identity rather than contents of the strings. I think an == operator on two string views should compare the actual characters, not the pointers. And should return true if the two views look at two different ranges of memory with the same characters, even if one is 8-bit and the other 16-bit. > > Or we could just not have an == operator, but having one with this meaning seems like a bad idea.
Given your comment below, I think providing an == operator is out of scope for this patch. There are different ways to look at the characters inside a StringView: we could compare them code point by code point, or we could use ICU to canonicalize the strings and then compare the canonicalized versions. We could compare code units instead if we wanted to avoid the U16_NEXT() and know that both strings are in UTF-16.
>> Source/WTF/wtf/text/StringView.h:396 >> + return m_stringView == other.m_stringView && m_index == other.m_index; > > This should assert that &m_stringView == &other.m_stringView and should not look at m_stringView at all in non-debug builds.
Done.
>> Source/WTF/wtf/text/StringView.h:438 >> + return m_stringView == other.m_stringView && m_index == other.m_index; > > This should assert that &m_stringView == &other.m_stringView and should not look at m_stringView at all in non-debug builds.
Done.
Myles C. Maxfield
Comment 16
2014-10-02 17:31:09 PDT
Created
attachment 239162
[details]
Patch
Darin Adler
Comment 17
2014-10-02 17:36:15 PDT
Comment on
attachment 239162
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=239162&action=review
> Source/WTF/wtf/text/StringView.h:223 > + mutable bool m_alreadyIncremented;
I guess this isn’t as clear as I thought it would be, because I meant exactly the opposite of what you did!
> Source/WTF/wtf/text/StringView.h:389 > + m_alreadyIncremented = false;
My idea is that we would be setting this to true here, because as part of the * operator, we “already incremented” the iterator.
Darin Adler
Comment 18
2014-10-03 09:08:20 PDT
Comment on
attachment 239162
[details]
Patch This is OK, but I think the m_alreadyIncremented boolean is confusing. What I proposed was actually the opposite of what you wrote, showing that the two of us don’t even agree on this!
Myles C. Maxfield
Comment 19
2014-10-03 09:11:32 PDT
(In reply to
comment #18
)
> (From update of
attachment 239162
[details]
) > This is OK, but I think the m_alreadyIncremented boolean is confusing. What I proposed was actually the opposite of what you wrote, showing that the two of us don’t even agree on this!
Interesting. I thought you meant I should make line 380 set it to true because, once you've run the increment operator, the iterator has already been incremented, so you can't increment it again.
Darin Adler
Comment 20
2014-10-03 09:17:04 PDT
(In reply to
comment #19
)
> (In reply to
comment #18
) > > This is OK, but I think the m_alreadyIncremented boolean is confusing. What I proposed was actually the opposite of what you wrote, showing that the two of us don’t even agree on this! > > Interesting. I thought you meant I should make line 380 set it to true because, once you've run the increment operator, the iterator has already been incremented, so you can't increment it again.
Yes, makes logical sense but is not what I meant. I meant that once you use the * operator the iterator is in an unusual state. It has already been incremented, so when you then actually use the ++ operator you don’t have to do any incrementing. Further, it’s not safe to do operations like == on it when it’s in this “already incremented” state because they will yield incorrect results. So, critically, StringView::CodePoints::Iterator::operator== is missing assertions that check that neither of the two iterators is in this unusual “already incremented” state, whatever we call the boolean.
Myles C. Maxfield
Comment 21
2014-10-03 09:34:44 PDT
(In reply to
comment #20
)
> (In reply to
comment #19
) > > (In reply to
comment #18
) > > > This is OK, but I think the m_alreadyIncremented boolean is confusing. What I proposed was actually the opposite of what you wrote, showing that the two of us don’t even agree on this! > > > > Interesting. I thought you meant I should make line 380 set it to true because, once you've run the increment operator, the iterator has already been incremented, so you can't increment it again. > > Yes, makes logical sense but is not what I meant. > > I meant that once you use the * operator the iterator is in an unusual state. It has already been incremented, so when you then actually use the ++ operator you don’t have to do any incrementing. Further, it’s not safe to do operations like == on it when it’s in this “already incremented” state because they will yield incorrect results. So, critically, StringView::CodePoints::Iterator::operator== is missing assertions that check that neither of the two iterators is in this unusual “already incremented” state, whatever we call the boolean.
Ah, I understand. Your thinking of the term "increment" is referring to the internal m_index variable, and my thinking of the term "increment" is referring to the operator++ API call. I'll switch the boolean (and add the missing ASSERTs).
Myles C. Maxfield
Comment 22
2014-10-03 10:58:11 PDT
http://trac.webkit.org/changeset/174271
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug