Bug 187864 - WTF::StringView::split should have an allowEmptyEntries flag
Summary: WTF::StringView::split should have an allowEmptyEntries flag
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Template Framework (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Ross Kirsling
URL:
Keywords: InRadar
Depends on:
Blocks: 187859
  Show dependency treegraph
 
Reported: 2018-07-20 10:39 PDT by Ross Kirsling
Modified: 2018-07-24 10:35 PDT (History)
10 users (show)

See Also:


Attachments
Patch (7.42 KB, patch)
2018-07-20 10:54 PDT, Ross Kirsling
no flags Details | Formatted Diff | Diff
Patch (7.50 KB, patch)
2018-07-20 13:36 PDT, Ross Kirsling
no flags Details | Formatted Diff | Diff
Patch (7.49 KB, patch)
2018-07-20 21:56 PDT, Ross Kirsling
no flags Details | Formatted Diff | Diff
Patch (7.69 KB, patch)
2018-07-23 10:07 PDT, Ross Kirsling
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ross Kirsling 2018-07-20 10:39:20 PDT
WTF::String::split takes an allowEmptyEntries flag -- StringView should follow suit.
Comment 1 Ross Kirsling 2018-07-20 10:54:34 PDT
Created attachment 345461 [details]
Patch
Comment 2 Konstantin Tokarev 2018-07-20 11:07:39 PDT
I think this is not the best API, because by looking at call site like

 a.split(' ', true)

you won't have a slightest idea what does it mean until you look up implementation. In Qt we call this "boolean parameter trap" [1]. It would be better to use enum value, or other kind of named and typed flag

[1] https://wiki.qt.io/API_Design_Principles#The_Boolean_Parameter_Trap
Comment 3 Ross Kirsling 2018-07-20 11:11:40 PDT
(In reply to Konstantin Tokarev from comment #2)
> I think this is not the best API, because by looking at call site like
> 
>  a.split(' ', true)
> 
> you won't have a slightest idea what does it mean until you look up
> implementation. In Qt we call this "boolean parameter trap" [1]. It would be
> better to use enum value, or other kind of named and typed flag
> 
> [1] https://wiki.qt.io/API_Design_Principles#The_Boolean_Parameter_Trap

That's a fair point, but I was assuming we would want the API to align with String::split:
https://github.com/WebKit/webkit/blob/master/Source/WTF/wtf/text/WTFString.cpp#L705
Comment 4 Konstantin Tokarev 2018-07-20 11:14:39 PDT
I think it would be better to fix String::split as well, though the latter is a bit more readable: split(separator, result) vs split(separator, true, result)
Comment 5 Ross Kirsling 2018-07-20 13:36:53 PDT
Created attachment 345478 [details]
Patch
Comment 6 Ross Kirsling 2018-07-20 21:56:19 PDT
Created attachment 345504 [details]
Patch
Comment 7 Fujii Hironori 2018-07-22 23:04:23 PDT
Assertion fails in Debug build.

> ASSERTION FAILED: m_position < m_result.m_string.length()
> c:\webkit\ga\webkitbuild\debug\derivedsources\forwardingheaders\wtf\text\stringview.h(946) : WTF::StringView::SplitResult::Iterator::operator *
Comment 8 Ross Kirsling 2018-07-23 10:07:38 PDT
Created attachment 345579 [details]
Patch
Comment 9 Ross Kirsling 2018-07-23 10:09:29 PDT
(In reply to Fujii Hironori from comment #7)
> Assertion fails in Debug build.
> 
> > ASSERTION FAILED: m_position < m_result.m_string.length()
> > c:\webkit\ga\webkitbuild\debug\derivedsources\forwardingheaders\wtf\text\stringview.h(946) : WTF::StringView::SplitResult::Iterator::operator *

Whoops, thanks! Assertion updated.
Comment 10 Konstantin Tokarev 2018-07-23 10:59:59 PDT
Comment on attachment 345579 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=345579&action=review

> Source/WTF/wtf/text/StringView.h:953
> +    return m_position == other.m_position && m_isDone == other.m_isDone;

This looks suspicious to me. 2 iterators may have same position but differ in m_isDone value only if they have result.m_allowEmptyEntries

Shouldn't we compare that value instead of m_isDone? May iterators with different allowEmptyEntries be ever considered equal? What is the usage context of this operator?
Comment 11 Konstantin Tokarev 2018-07-23 11:04:03 PDT
Comment on attachment 345579 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=345579&action=review

>> Source/WTF/wtf/text/StringView.h:953
>> +    return m_position == other.m_position && m_isDone == other.m_isDone;
> 
> This looks suspicious to me. 2 iterators may have same position but differ in m_isDone value only if they have result.m_allowEmptyEntries
> 
> Shouldn't we compare that value instead of m_isDone? May iterators with different allowEmptyEntries be ever considered equal? What is the usage context of this operator?

Actually, it seems that m_isDone affects control flow only when assers are enabled, this means that it should be guarded with #ifndef NDEBUG, and it should be excluded from operator==
Comment 12 Ross Kirsling 2018-07-23 11:16:40 PDT
(In reply to Konstantin Tokarev from comment #11)
> Comment on attachment 345579 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=345579&action=review
> 
> >> Source/WTF/wtf/text/StringView.h:953
> >> +    return m_position == other.m_position && m_isDone == other.m_isDone;
> > 
> > This looks suspicious to me. 2 iterators may have same position but differ in m_isDone value only if they have result.m_allowEmptyEntries
> > 
> > Shouldn't we compare that value instead of m_isDone? May iterators with different allowEmptyEntries be ever considered equal? What is the usage context of this operator?
> 
> Actually, it seems that m_isDone affects control flow only when assers are
> enabled, this means that it should be guarded with #ifndef NDEBUG, and it
> should be excluded from operator==

No, the asserts are just asserts; I simply forgot to update them, as I was in Release...

m_isDone exists to prevent having begin() == end() one iteration too early when the input string is either empty or ends in a separator. That is, when we reach the point where m_position == length(), we still have one more empty entry that must be returned. Thus m_position alone cannot suffice as an end condition.
Comment 13 Konstantin Tokarev 2018-07-23 14:57:13 PDT
Comment on attachment 345579 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=345579&action=review

>>>> Source/WTF/wtf/text/StringView.h:953
>>>> +    return m_position == other.m_position && m_isDone == other.m_isDone;
>>> 
>>> This looks suspicious to me. 2 iterators may have same position but differ in m_isDone value only if they have result.m_allowEmptyEntries
>>> 
>>> Shouldn't we compare that value instead of m_isDone? May iterators with different allowEmptyEntries be ever considered equal? What is the usage context of this operator?
>> 
>> Actually, it seems that m_isDone affects control flow only when assers are enabled, this means that it should be guarded with #ifndef NDEBUG, and it should be excluded from operator==
> 
> No, the asserts are just asserts; I simply forgot to update them, as I was in Release...
> 
> m_isDone exists to prevent having begin() == end() one iteration too early when the input string is either empty or ends in a separator. That is, when we reach the point where m_position == length(), we still have one more empty entry that must be returned. Thus m_position alone cannot suffice as an end condition.

Thanks for explanation, now I have better idea of what's going on here. However, I don't get why empty string (corresponding to result.m_string.isEmpty()) should produce any entries at all - it doesn't have any separators.
Comment 14 Ross Kirsling 2018-07-23 15:31:02 PDT
(In reply to Konstantin Tokarev from comment #13)
> Comment on attachment 345579 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=345579&action=review
> 
> >>>> Source/WTF/wtf/text/StringView.h:953
> >>>> +    return m_position == other.m_position && m_isDone == other.m_isDone;
> >>> 
> >>> This looks suspicious to me. 2 iterators may have same position but differ in m_isDone value only if they have result.m_allowEmptyEntries
> >>> 
> >>> Shouldn't we compare that value instead of m_isDone? May iterators with different allowEmptyEntries be ever considered equal? What is the usage context of this operator?
> >> 
> >> Actually, it seems that m_isDone affects control flow only when assers are enabled, this means that it should be guarded with #ifndef NDEBUG, and it should be excluded from operator==
> > 
> > No, the asserts are just asserts; I simply forgot to update them, as I was in Release...
> > 
> > m_isDone exists to prevent having begin() == end() one iteration too early when the input string is either empty or ends in a separator. That is, when we reach the point where m_position == length(), we still have one more empty entry that must be returned. Thus m_position alone cannot suffice as an end condition.
> 
> Thanks for explanation, now I have better idea of what's going on here.
> However, I don't get why empty string (corresponding to
> result.m_string.isEmpty()) should produce any entries at all - it doesn't
> have any separators.

I'm just following the standard behavior of string split in every language I am aware of (including WTF::String::split and boost::algorithm::split).
The rationale is that we want "abc".split(',') to return { "abc" } (just the original string), so that sort of implies that "".split(',') should return { "" }.
Comment 15 WebKit Commit Bot 2018-07-23 17:01:55 PDT
Comment on attachment 345579 [details]
Patch

Clearing flags on attachment: 345579

Committed r234122: <https://trac.webkit.org/changeset/234122>
Comment 16 WebKit Commit Bot 2018-07-23 17:01:57 PDT
All reviewed patches have been landed.  Closing bug.
Comment 17 Radar WebKit Bug Importer 2018-07-23 17:02:46 PDT
<rdar://problem/42521814>
Comment 18 Darin Adler 2018-07-23 17:38:02 PDT
I agree that using a boolean here is not good style and I am unhappy to see this pattern repeated from WTF::String rather than fixing it in both places.
Comment 19 Darin Adler 2018-07-23 17:39:28 PDT
(In reply to Darin Adler from comment #18)
> I agree that using a boolean here is not good style and I am unhappy to see
> this pattern repeated from WTF::String rather than fixing it in both places.

Oh, I see, we used a named argument instead of a boolean -- better. However, I think we should simply have used a new function named splitAllowingEmptyEntries.
Comment 20 Ross Kirsling 2018-07-23 17:43:15 PDT
(In reply to Darin Adler from comment #19)
> (In reply to Darin Adler from comment #18)
> > I agree that using a boolean here is not good style and I am unhappy to see
> > this pattern repeated from WTF::String rather than fixing it in both places.
> 
> Oh, I see, we used a named argument instead of a boolean -- better. However,
> I think we should simply have used a new function named
> splitAllowingEmptyEntries.

I'd be happy to address this for String and StringView alike in a new ticket -- would that be okay?
Comment 21 Ross Kirsling 2018-07-24 10:35:59 PDT
(In reply to Ross Kirsling from comment #20)
> I'd be happy to address this for String and StringView alike in a new ticket
> -- would that be okay?

Created https://bugs.webkit.org/show_bug.cgi?id=187963 for this purpose.