Bug 109784 - String(Vector) behaves differently from String(vector.data(), vector.size()) for vectors with inline capacity in the size=0 case
Summary: String(Vector) behaves differently from String(vector.data(), vector.size()) ...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Eric Seidel (no email)
URL:
Keywords:
Depends on: 109638
Blocks:
  Show dependency treegraph
 
Reported: 2013-02-13 22:16 PST by Eric Seidel (no email)
Modified: 2013-02-14 12:03 PST (History)
9 users (show)

See Also:


Attachments
Patch (9.52 KB, patch)
2013-02-13 22:23 PST, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
Patch (2.68 KB, patch)
2013-02-14 01:31 PST, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
Patch for landing (2.68 KB, patch)
2013-02-14 01:38 PST, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
Back to the original fix, now with more context (2.73 KB, patch)
2013-02-14 10:26 PST, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Eric Seidel (no email) 2013-02-13 22:16:43 PST
REGRESSION(r142712): attribute values show up as "(null)" instead of null with the threaded parser
Comment 1 Eric Seidel (no email) 2013-02-13 22:23:22 PST
Created attachment 188263 [details]
Patch
Comment 2 Eric Seidel (no email) 2013-02-13 22:24:27 PST
I stab you in the face, visual studio.
Comment 3 Eric Seidel (no email) 2013-02-13 22:29:28 PST
I've asked on webkit-dev about the sln issue:
https://lists.webkit.org/pipermail/webkit-dev/2013-February/023844.html
Comment 4 Benjamin Poulain 2013-02-13 23:21:55 PST
Comment on attachment 188263 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=188263&action=review

> Source/WTF/wtf/text/WTFString.h:115
> +    // If the Vector is size == 0, this will produce an empty string.
>      template<size_t inlineCapacity>
>      explicit String(const Vector<UChar, inlineCapacity>&);

Ouch.
I considered changing adopt() to work the other way around.

It will also mean String(vector) != String(vector.data(), vector.size()), which is counter-intuitive in my opinion.

Why do you think empty() is the correct output?
Shouldn't the specific call site be changed instead? Is
Comment 5 Eric Seidel (no email) 2013-02-14 01:02:02 PST
(In reply to comment #4)
> (From update of attachment 188263 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=188263&action=review
> 
> > Source/WTF/wtf/text/WTFString.h:115
> > +    // If the Vector is size == 0, this will produce an empty string.
> >      template<size_t inlineCapacity>
> >      explicit String(const Vector<UChar, inlineCapacity>&);
> 
> Ouch.
> I considered changing adopt() to work the other way around.
> 
> It will also mean String(vector) != String(vector.data(), vector.size()), which is counter-intuitive in my opinion.

String(UChar*, size_t) already returns the empty string when size ==0:

// Construct a string with UTF-16 data.
String::String(const UChar* characters, unsigned length)
    : m_impl(characters ? StringImpl::create(characters, length) : 0)
{
}

So this change makes the behavior consistent.

That was infact the bug.  I changed callsites which used to use String(vector.data(), vector.size()) to use String(vector) and tests started failing (for the threaded parser).  This patch fixes that.

> Why do you think empty() is the correct output?

Vector has no concept of null.  I'm really just trying to make String(vector) match String(vector.data(), vector.size()), since that seems to me they should match. :)

> Shouldn't the specific call site be changed instead?

see above.
Comment 6 Benjamin Poulain 2013-02-14 01:05:32 PST
> > It will also mean String(vector) != String(vector.data(), vector.size()), which is counter-intuitive in my opinion.
> 
> String(UChar*, size_t) already returns the empty string when size ==0:
> 
> // Construct a string with UTF-16 data.
> String::String(const UChar* characters, unsigned length)
>     : m_impl(characters ? StringImpl::create(characters, length) : 0)
> {
> }
> 
> So this change makes the behavior consistent.

Not when the length is zero, when characters is zero!
Comment 7 Eric Seidel (no email) 2013-02-14 01:07:55 PST
(In reply to comment #6)
> > > It will also mean String(vector) != String(vector.data(), vector.size()), which is counter-intuitive in my opinion.
> > 
> > String(UChar*, size_t) already returns the empty string when size ==0:
> > 
> > // Construct a string with UTF-16 data.
> > String::String(const UChar* characters, unsigned length)
> >     : m_impl(characters ? StringImpl::create(characters, length) : 0)
> > {
> > }
> > 
> > So this change makes the behavior consistent.
> 
> Not when the length is zero, when characters is zero!

Correct.  That only returns *null* when characters are zero, but returns *empty* when length is zero. :)  That's what I"m changing String(Vector) to do. :)  Before this patch, they're inconsistent.  STring(Vector) in TOT will return a null string when the vector has size 0.
Comment 8 Eric Seidel (no email) 2013-02-14 01:09:36 PST
Vector can never produce a null .data(), so String(vector.data(), vector.size()) can never return a null string.  Just like what I'm changing String(Vector) to do with this patch.
Comment 9 Benjamin Poulain 2013-02-14 01:10:37 PST
(In reply to comment #8)
> Vector can never produce a null .data(), so String(vector.data(), vector.size()) can never return a null string.  Just like what I'm changing String(Vector) to do with this patch.

Oh, really, I got that wrong about vector. 
I thought data() of an empty vector was 0.
Comment 10 Eric Seidel (no email) 2013-02-14 01:13:39 PST
(In reply to comment #9)
> (In reply to comment #8)
> > Vector can never produce a null .data(), so String(vector.data(), vector.size()) can never return a null string.  Just like what I'm changing String(Vector) to do with this patch.
> 
> Oh, really, I got that wrong about vector. 
> I thought data() of an empty vector was 0.

Hmm..  You may be right.

Looking at VectorBufferBase, it appears m_buffer can be null.

Maybe I'm barking up the wrong tree.  Will look again in the morning.
Comment 11 Eric Seidel (no email) 2013-02-14 01:16:12 PST
It's possible that Vectors with inline capacity can never have a data() which is null.

If so, that may be a "bug" in vector, that data() is non-null for inline vectors.  Or at least that would explain this behavior difference.
Comment 12 Benjamin Poulain 2013-02-14 01:17:38 PST
> Hmm..  You may be right.
> 
> Looking at VectorBufferBase, it appears m_buffer can be null.
> 
> Maybe I'm barking up the wrong tree.  Will look again in the morning.

Skimming over Vector: your data non-zero because its inline capacity is non zero.

This is messed up, I don't have an easy answer here for the Vector->String conversion :(
Comment 13 Eric Seidel (no email) 2013-02-14 01:18:20 PST
Yeah.  It appears that Vectors of size 0 with an inline capacity will never have a null data(), but Vectors without inline capacity and size 0 will.

To fix this, we should just key off of vector.data() instead, and then the behavior will match String(UChar*, size_t) exactly. :)
Comment 14 Eric Seidel (no email) 2013-02-14 01:23:09 PST
I've filed bug 109792 to track the Vector oddity.  I'll post an updated patch now.
Comment 15 Benjamin Poulain 2013-02-14 01:28:51 PST
Pretty funny we could be arguing while being both right. Silly Vector :)
Comment 16 Eric Seidel (no email) 2013-02-14 01:31:17 PST
Created attachment 188282 [details]
Patch
Comment 17 Eric Seidel (no email) 2013-02-14 01:33:19 PST
We were just emphatically agreeing with one another. :)
Comment 18 Benjamin Poulain 2013-02-14 01:33:32 PST
Comment on attachment 188282 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=188282&action=review

> Source/WTF/ChangeLog:20
> +        between threads).  The main-thread parser path uses AtomicHTMLToken which

Two spaces after the period.
Comment 19 Eric Seidel (no email) 2013-02-14 01:38:54 PST
Created attachment 188288 [details]
Patch for landing
Comment 20 Benjamin Poulain 2013-02-14 01:47:26 PST
To come back to the problem in general:

Usually, I am strongly against reading the pointer of an array/vector if the passed length is 0.

For String, it just works well in practice that creating a String without a buffer (characters = 0) means there are no characters->null strings.
The problem is this behavior should be kept consistent with String(pointer, length), and we end up with oddities like checking the pointer before the length.

I start to think it may be a better choice to have String(char*) and String(char*, size_t) return empty strings. It would become the job of the caller to create null strings if necessary. A recent discussion with Darin also put the idea in my head that the implicit creation of null String can lead to unfortunate side effects.

I tend to like null string because they are really really cheap to pass around code path. But problems like this patch make me think we may be abusing them in WebKit.
Comment 21 WebKit Review Bot 2013-02-14 02:24:56 PST
Comment on attachment 188288 [details]
Patch for landing

Clearing flags on attachment: 188288

Committed r142863: <http://trac.webkit.org/changeset/142863>
Comment 22 WebKit Review Bot 2013-02-14 02:25:01 PST
All reviewed patches have been landed.  Closing bug.
Comment 23 Eric Seidel (no email) 2013-02-14 09:47:34 PST
(In reply to comment #20)
> To come back to the problem in general:
> 
> Usually, I am strongly against reading the pointer of an array/vector if the passed length is 0.
> 
> For String, it just works well in practice that creating a String without a buffer (characters = 0) means there are no characters->null strings.
> The problem is this behavior should be kept consistent with String(pointer, length), and we end up with oddities like checking the pointer before the length.
> 
> I start to think it may be a better choice to have String(char*) and String(char*, size_t) return empty strings. It would become the job of the caller to create null strings if necessary. A recent discussion with Darin also put the idea in my head that the implicit creation of null String can lead to unfortunate side effects.

That seems totally reasonable to me.

> I tend to like null string because they are really really cheap to pass around code path. But problems like this patch make me think we may be abusing them in WebKit.

Agreed.
Comment 24 Darin Adler 2013-02-14 09:48:50 PST
I don’t think this is the best fix. I suggest we always create empty strings, not null strings, in this constructor.
Comment 25 Darin Adler 2013-02-14 09:50:21 PST
We should still check size, not data, but we should construct an empty string, not a null string, in that case. It’s a bad idea to behave differently based on what the data pointer returns for an empty vector. That leads to the confusing FIXME in the header and it’s not what we want for these classes.
Comment 26 Darin Adler 2013-02-14 09:58:00 PST
We can continue to discuss whether to remove the "create a null string if passed a null pointer" behavior of the pointer-based versions of these functions. But for the vector version, we should never create a null string since there is no such thing as a null vector.

The patch we landed accidentally invents the concept of a null vector because if the the vector's data is 0 it creates a null string. We don’t want that for vectors.

I understand that the old code glued together vectors and strings in the obvious way, and therefore it also had that strange behavior of inventing the concept of a null vector. And I agree with Ben that one way to fix that is to remove the String constructor function concept of making a null string if passed a null character pointer. But we can and should fix the vector part of this now while we continue that discussion.
Comment 27 Eric Seidel (no email) 2013-02-14 10:00:00 PST
It sounds like Darin is advocating for the first patch on this bug, which checked size and returned empty().  What we landed matches previous String(UChar*, size_t) behavior, but I agree, is confusing.  I'm totally happy to see both String(Vector) and STring(UChar*, size_t) move to returning empty() in the length=0 case and never null.
Comment 28 Eric Seidel (no email) 2013-02-14 10:02:03 PST
I'll post a patch to fix String(Vector) to always return empty on size=0 shortly.
Comment 29 Darin Adler 2013-02-14 10:05:20 PST
(In reply to comment #27)
> It sounds like Darin is advocating for the first patch on this bug, which checked size and returned empty().

Yes.

> What we landed matches previous String(UChar*, size_t) behavior, but I agree, is confusing.

It matches String(UChar*, size_t) behavior iff you are also assuming you’d pass vector’s data and size through directly.

But the point is that we want to correctly convert an empty vector, and that requires more than just fetching the data and size and passing them through. In my mind this is a separate issue from the design of String(const UChar*, size_t).

> I'm totally happy to see both String(Vector) and String(UChar*, size_t) move to returning empty() in the length=0 case and never null.

I am certain we should correct String(Vector) so it correctly makes an empty string from an empty vector, even if the call sites inside the tokenizer just passed through the data and size before. There’s a higher standard for a shared WTF function than for local code in one part of WebKit.

I am not sure that removing the “you can get a null string by passing a null pointer” behavior from String(UChar*, size_t) is the right thing to do and we should reflect on that more before making a change.

One tiny factor is that, as Ben points out, constructing an empty string is more costly than constructing a null string. And given the inlining, there’s even a bit of code bloat factor.
Comment 30 Eric Seidel (no email) 2013-02-14 10:26:11 PST
Reopening to attach new patch.
Comment 31 Eric Seidel (no email) 2013-02-14 10:26:13 PST
Created attachment 188378 [details]
Back to the original fix, now with more context
Comment 32 Darin Adler 2013-02-14 10:38:15 PST
Comment on attachment 188378 [details]
Back to the original fix, now with more context

View in context: https://bugs.webkit.org/attachment.cgi?id=188378&action=review

> Source/WTF/wtf/text/WTFString.h:118
> +    // NOTE: This is different from String(vector.data(), vector.size())
> +    // which will sometimes return a null string when vector.data() is null
> +    // which can only occur for vectors without inline capacity.
> +    // See: https://bugs.webkit.org/show_bug.cgi?id=109792

I personally would not include this note, but I’m OK leaving it in.
Comment 33 WebKit Review Bot 2013-02-14 11:18:52 PST
Comment on attachment 188378 [details]
Back to the original fix, now with more context

Clearing flags on attachment: 188378

Committed r142894: <http://trac.webkit.org/changeset/142894>
Comment 34 WebKit Review Bot 2013-02-14 11:18:58 PST
All reviewed patches have been landed.  Closing bug.
Comment 35 Benjamin Poulain 2013-02-14 12:03:33 PST
(In reply to comment #32)
> (From update of attachment 188378 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=188378&action=review
> 
> > Source/WTF/wtf/text/WTFString.h:118
> > +    // NOTE: This is different from String(vector.data(), vector.size())
> > +    // which will sometimes return a null string when vector.data() is null
> > +    // which can only occur for vectors without inline capacity.
> > +    // See: https://bugs.webkit.org/show_bug.cgi?id=109792
> 
> I personally would not include this note, but I’m OK leaving it in.

This is the reason why I wanted to change everything at once.

I would prefer to have completely consistent behavior, no exception in one direction or an other.