RESOLVED FIXED Bug 109784
String(Vector) behaves differently from String(vector.data(), vector.size()) for vectors with inline capacity in the size=0 case
https://bugs.webkit.org/show_bug.cgi?id=109784
Summary String(Vector) behaves differently from String(vector.data(), vector.size()) ...
Eric Seidel (no email)
Reported 2013-02-13 22:16:43 PST
REGRESSION(r142712): attribute values show up as "(null)" instead of null with the threaded parser
Attachments
Patch (9.52 KB, patch)
2013-02-13 22:23 PST, Eric Seidel (no email)
no flags
Patch (2.68 KB, patch)
2013-02-14 01:31 PST, Eric Seidel (no email)
no flags
Patch for landing (2.68 KB, patch)
2013-02-14 01:38 PST, Eric Seidel (no email)
no flags
Back to the original fix, now with more context (2.73 KB, patch)
2013-02-14 10:26 PST, Eric Seidel (no email)
no flags
Eric Seidel (no email)
Comment 1 2013-02-13 22:23:22 PST
Eric Seidel (no email)
Comment 2 2013-02-13 22:24:27 PST
I stab you in the face, visual studio.
Eric Seidel (no email)
Comment 3 2013-02-13 22:29:28 PST
Benjamin Poulain
Comment 4 2013-02-13 23:21:55 PST
Comment on attachment 188263 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=188263&action=review > Source/WTF/wtf/text/WTFString.h:115 > + // If the Vector is size == 0, this will produce an empty string. > template<size_t inlineCapacity> > explicit String(const Vector<UChar, inlineCapacity>&); Ouch. I considered changing adopt() to work the other way around. It will also mean String(vector) != String(vector.data(), vector.size()), which is counter-intuitive in my opinion. Why do you think empty() is the correct output? Shouldn't the specific call site be changed instead? Is
Eric Seidel (no email)
Comment 5 2013-02-14 01:02:02 PST
(In reply to comment #4) > (From update of attachment 188263 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=188263&action=review > > > Source/WTF/wtf/text/WTFString.h:115 > > + // If the Vector is size == 0, this will produce an empty string. > > template<size_t inlineCapacity> > > explicit String(const Vector<UChar, inlineCapacity>&); > > Ouch. > I considered changing adopt() to work the other way around. > > It will also mean String(vector) != String(vector.data(), vector.size()), which is counter-intuitive in my opinion. String(UChar*, size_t) already returns the empty string when size ==0: // Construct a string with UTF-16 data. String::String(const UChar* characters, unsigned length) : m_impl(characters ? StringImpl::create(characters, length) : 0) { } So this change makes the behavior consistent. That was infact the bug. I changed callsites which used to use String(vector.data(), vector.size()) to use String(vector) and tests started failing (for the threaded parser). This patch fixes that. > Why do you think empty() is the correct output? Vector has no concept of null. I'm really just trying to make String(vector) match String(vector.data(), vector.size()), since that seems to me they should match. :) > Shouldn't the specific call site be changed instead? see above.
Benjamin Poulain
Comment 6 2013-02-14 01:05:32 PST
> > It will also mean String(vector) != String(vector.data(), vector.size()), which is counter-intuitive in my opinion. > > String(UChar*, size_t) already returns the empty string when size ==0: > > // Construct a string with UTF-16 data. > String::String(const UChar* characters, unsigned length) > : m_impl(characters ? StringImpl::create(characters, length) : 0) > { > } > > So this change makes the behavior consistent. Not when the length is zero, when characters is zero!
Eric Seidel (no email)
Comment 7 2013-02-14 01:07:55 PST
(In reply to comment #6) > > > It will also mean String(vector) != String(vector.data(), vector.size()), which is counter-intuitive in my opinion. > > > > String(UChar*, size_t) already returns the empty string when size ==0: > > > > // Construct a string with UTF-16 data. > > String::String(const UChar* characters, unsigned length) > > : m_impl(characters ? StringImpl::create(characters, length) : 0) > > { > > } > > > > So this change makes the behavior consistent. > > Not when the length is zero, when characters is zero! Correct. That only returns *null* when characters are zero, but returns *empty* when length is zero. :) That's what I"m changing String(Vector) to do. :) Before this patch, they're inconsistent. STring(Vector) in TOT will return a null string when the vector has size 0.
Eric Seidel (no email)
Comment 8 2013-02-14 01:09:36 PST
Vector can never produce a null .data(), so String(vector.data(), vector.size()) can never return a null string. Just like what I'm changing String(Vector) to do with this patch.
Benjamin Poulain
Comment 9 2013-02-14 01:10:37 PST
(In reply to comment #8) > Vector can never produce a null .data(), so String(vector.data(), vector.size()) can never return a null string. Just like what I'm changing String(Vector) to do with this patch. Oh, really, I got that wrong about vector. I thought data() of an empty vector was 0.
Eric Seidel (no email)
Comment 10 2013-02-14 01:13:39 PST
(In reply to comment #9) > (In reply to comment #8) > > Vector can never produce a null .data(), so String(vector.data(), vector.size()) can never return a null string. Just like what I'm changing String(Vector) to do with this patch. > > Oh, really, I got that wrong about vector. > I thought data() of an empty vector was 0. Hmm.. You may be right. Looking at VectorBufferBase, it appears m_buffer can be null. Maybe I'm barking up the wrong tree. Will look again in the morning.
Eric Seidel (no email)
Comment 11 2013-02-14 01:16:12 PST
It's possible that Vectors with inline capacity can never have a data() which is null. If so, that may be a "bug" in vector, that data() is non-null for inline vectors. Or at least that would explain this behavior difference.
Benjamin Poulain
Comment 12 2013-02-14 01:17:38 PST
> Hmm.. You may be right. > > Looking at VectorBufferBase, it appears m_buffer can be null. > > Maybe I'm barking up the wrong tree. Will look again in the morning. Skimming over Vector: your data non-zero because its inline capacity is non zero. This is messed up, I don't have an easy answer here for the Vector->String conversion :(
Eric Seidel (no email)
Comment 13 2013-02-14 01:18:20 PST
Yeah. It appears that Vectors of size 0 with an inline capacity will never have a null data(), but Vectors without inline capacity and size 0 will. To fix this, we should just key off of vector.data() instead, and then the behavior will match String(UChar*, size_t) exactly. :)
Eric Seidel (no email)
Comment 14 2013-02-14 01:23:09 PST
I've filed bug 109792 to track the Vector oddity. I'll post an updated patch now.
Benjamin Poulain
Comment 15 2013-02-14 01:28:51 PST
Pretty funny we could be arguing while being both right. Silly Vector :)
Eric Seidel (no email)
Comment 16 2013-02-14 01:31:17 PST
Eric Seidel (no email)
Comment 17 2013-02-14 01:33:19 PST
We were just emphatically agreeing with one another. :)
Benjamin Poulain
Comment 18 2013-02-14 01:33:32 PST
Comment on attachment 188282 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=188282&action=review > Source/WTF/ChangeLog:20 > + between threads). The main-thread parser path uses AtomicHTMLToken which Two spaces after the period.
Eric Seidel (no email)
Comment 19 2013-02-14 01:38:54 PST
Created attachment 188288 [details] Patch for landing
Benjamin Poulain
Comment 20 2013-02-14 01:47:26 PST
To come back to the problem in general: Usually, I am strongly against reading the pointer of an array/vector if the passed length is 0. For String, it just works well in practice that creating a String without a buffer (characters = 0) means there are no characters->null strings. The problem is this behavior should be kept consistent with String(pointer, length), and we end up with oddities like checking the pointer before the length. I start to think it may be a better choice to have String(char*) and String(char*, size_t) return empty strings. It would become the job of the caller to create null strings if necessary. A recent discussion with Darin also put the idea in my head that the implicit creation of null String can lead to unfortunate side effects. I tend to like null string because they are really really cheap to pass around code path. But problems like this patch make me think we may be abusing them in WebKit.
WebKit Review Bot
Comment 21 2013-02-14 02:24:56 PST
Comment on attachment 188288 [details] Patch for landing Clearing flags on attachment: 188288 Committed r142863: <http://trac.webkit.org/changeset/142863>
WebKit Review Bot
Comment 22 2013-02-14 02:25:01 PST
All reviewed patches have been landed. Closing bug.
Eric Seidel (no email)
Comment 23 2013-02-14 09:47:34 PST
(In reply to comment #20) > To come back to the problem in general: > > Usually, I am strongly against reading the pointer of an array/vector if the passed length is 0. > > For String, it just works well in practice that creating a String without a buffer (characters = 0) means there are no characters->null strings. > The problem is this behavior should be kept consistent with String(pointer, length), and we end up with oddities like checking the pointer before the length. > > I start to think it may be a better choice to have String(char*) and String(char*, size_t) return empty strings. It would become the job of the caller to create null strings if necessary. A recent discussion with Darin also put the idea in my head that the implicit creation of null String can lead to unfortunate side effects. That seems totally reasonable to me. > I tend to like null string because they are really really cheap to pass around code path. But problems like this patch make me think we may be abusing them in WebKit. Agreed.
Darin Adler
Comment 24 2013-02-14 09:48:50 PST
I don’t think this is the best fix. I suggest we always create empty strings, not null strings, in this constructor.
Darin Adler
Comment 25 2013-02-14 09:50:21 PST
We should still check size, not data, but we should construct an empty string, not a null string, in that case. It’s a bad idea to behave differently based on what the data pointer returns for an empty vector. That leads to the confusing FIXME in the header and it’s not what we want for these classes.
Darin Adler
Comment 26 2013-02-14 09:58:00 PST
We can continue to discuss whether to remove the "create a null string if passed a null pointer" behavior of the pointer-based versions of these functions. But for the vector version, we should never create a null string since there is no such thing as a null vector. The patch we landed accidentally invents the concept of a null vector because if the the vector's data is 0 it creates a null string. We don’t want that for vectors. I understand that the old code glued together vectors and strings in the obvious way, and therefore it also had that strange behavior of inventing the concept of a null vector. And I agree with Ben that one way to fix that is to remove the String constructor function concept of making a null string if passed a null character pointer. But we can and should fix the vector part of this now while we continue that discussion.
Eric Seidel (no email)
Comment 27 2013-02-14 10:00:00 PST
It sounds like Darin is advocating for the first patch on this bug, which checked size and returned empty(). What we landed matches previous String(UChar*, size_t) behavior, but I agree, is confusing. I'm totally happy to see both String(Vector) and STring(UChar*, size_t) move to returning empty() in the length=0 case and never null.
Eric Seidel (no email)
Comment 28 2013-02-14 10:02:03 PST
I'll post a patch to fix String(Vector) to always return empty on size=0 shortly.
Darin Adler
Comment 29 2013-02-14 10:05:20 PST
(In reply to comment #27) > It sounds like Darin is advocating for the first patch on this bug, which checked size and returned empty(). Yes. > What we landed matches previous String(UChar*, size_t) behavior, but I agree, is confusing. It matches String(UChar*, size_t) behavior iff you are also assuming you’d pass vector’s data and size through directly. But the point is that we want to correctly convert an empty vector, and that requires more than just fetching the data and size and passing them through. In my mind this is a separate issue from the design of String(const UChar*, size_t). > I'm totally happy to see both String(Vector) and String(UChar*, size_t) move to returning empty() in the length=0 case and never null. I am certain we should correct String(Vector) so it correctly makes an empty string from an empty vector, even if the call sites inside the tokenizer just passed through the data and size before. There’s a higher standard for a shared WTF function than for local code in one part of WebKit. I am not sure that removing the “you can get a null string by passing a null pointer” behavior from String(UChar*, size_t) is the right thing to do and we should reflect on that more before making a change. One tiny factor is that, as Ben points out, constructing an empty string is more costly than constructing a null string. And given the inlining, there’s even a bit of code bloat factor.
Eric Seidel (no email)
Comment 30 2013-02-14 10:26:11 PST
Reopening to attach new patch.
Eric Seidel (no email)
Comment 31 2013-02-14 10:26:13 PST
Created attachment 188378 [details] Back to the original fix, now with more context
Darin Adler
Comment 32 2013-02-14 10:38:15 PST
Comment on attachment 188378 [details] Back to the original fix, now with more context View in context: https://bugs.webkit.org/attachment.cgi?id=188378&action=review > Source/WTF/wtf/text/WTFString.h:118 > + // NOTE: This is different from String(vector.data(), vector.size()) > + // which will sometimes return a null string when vector.data() is null > + // which can only occur for vectors without inline capacity. > + // See: https://bugs.webkit.org/show_bug.cgi?id=109792 I personally would not include this note, but I’m OK leaving it in.
WebKit Review Bot
Comment 33 2013-02-14 11:18:52 PST
Comment on attachment 188378 [details] Back to the original fix, now with more context Clearing flags on attachment: 188378 Committed r142894: <http://trac.webkit.org/changeset/142894>
WebKit Review Bot
Comment 34 2013-02-14 11:18:58 PST
All reviewed patches have been landed. Closing bug.
Benjamin Poulain
Comment 35 2013-02-14 12:03:33 PST
(In reply to comment #32) > (From update of attachment 188378 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=188378&action=review > > > Source/WTF/wtf/text/WTFString.h:118 > > + // NOTE: This is different from String(vector.data(), vector.size()) > > + // which will sometimes return a null string when vector.data() is null > > + // which can only occur for vectors without inline capacity. > > + // See: https://bugs.webkit.org/show_bug.cgi?id=109792 > > I personally would not include this note, but I’m OK leaving it in. This is the reason why I wanted to change everything at once. I would prefer to have completely consistent behavior, no exception in one direction or an other.
Note You need to log in before you can comment on or make changes to this bug.