REGRESSION(r142712): attribute values show up as "(null)" instead of null with the threaded parser
Created attachment 188263 [details] Patch
I stab you in the face, visual studio.
I've asked on webkit-dev about the sln issue: https://lists.webkit.org/pipermail/webkit-dev/2013-February/023844.html
Comment on attachment 188263 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=188263&action=review > Source/WTF/wtf/text/WTFString.h:115 > + // If the Vector is size == 0, this will produce an empty string. > template<size_t inlineCapacity> > explicit String(const Vector<UChar, inlineCapacity>&); Ouch. I considered changing adopt() to work the other way around. It will also mean String(vector) != String(vector.data(), vector.size()), which is counter-intuitive in my opinion. Why do you think empty() is the correct output? Shouldn't the specific call site be changed instead? Is
(In reply to comment #4) > (From update of attachment 188263 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=188263&action=review > > > Source/WTF/wtf/text/WTFString.h:115 > > + // If the Vector is size == 0, this will produce an empty string. > > template<size_t inlineCapacity> > > explicit String(const Vector<UChar, inlineCapacity>&); > > Ouch. > I considered changing adopt() to work the other way around. > > It will also mean String(vector) != String(vector.data(), vector.size()), which is counter-intuitive in my opinion. String(UChar*, size_t) already returns the empty string when size ==0: // Construct a string with UTF-16 data. String::String(const UChar* characters, unsigned length) : m_impl(characters ? StringImpl::create(characters, length) : 0) { } So this change makes the behavior consistent. That was infact the bug. I changed callsites which used to use String(vector.data(), vector.size()) to use String(vector) and tests started failing (for the threaded parser). This patch fixes that. > Why do you think empty() is the correct output? Vector has no concept of null. I'm really just trying to make String(vector) match String(vector.data(), vector.size()), since that seems to me they should match. :) > Shouldn't the specific call site be changed instead? see above.
> > It will also mean String(vector) != String(vector.data(), vector.size()), which is counter-intuitive in my opinion. > > String(UChar*, size_t) already returns the empty string when size ==0: > > // Construct a string with UTF-16 data. > String::String(const UChar* characters, unsigned length) > : m_impl(characters ? StringImpl::create(characters, length) : 0) > { > } > > So this change makes the behavior consistent. Not when the length is zero, when characters is zero!
(In reply to comment #6) > > > It will also mean String(vector) != String(vector.data(), vector.size()), which is counter-intuitive in my opinion. > > > > String(UChar*, size_t) already returns the empty string when size ==0: > > > > // Construct a string with UTF-16 data. > > String::String(const UChar* characters, unsigned length) > > : m_impl(characters ? StringImpl::create(characters, length) : 0) > > { > > } > > > > So this change makes the behavior consistent. > > Not when the length is zero, when characters is zero! Correct. That only returns *null* when characters are zero, but returns *empty* when length is zero. :) That's what I"m changing String(Vector) to do. :) Before this patch, they're inconsistent. STring(Vector) in TOT will return a null string when the vector has size 0.
Vector can never produce a null .data(), so String(vector.data(), vector.size()) can never return a null string. Just like what I'm changing String(Vector) to do with this patch.
(In reply to comment #8) > Vector can never produce a null .data(), so String(vector.data(), vector.size()) can never return a null string. Just like what I'm changing String(Vector) to do with this patch. Oh, really, I got that wrong about vector. I thought data() of an empty vector was 0.
(In reply to comment #9) > (In reply to comment #8) > > Vector can never produce a null .data(), so String(vector.data(), vector.size()) can never return a null string. Just like what I'm changing String(Vector) to do with this patch. > > Oh, really, I got that wrong about vector. > I thought data() of an empty vector was 0. Hmm.. You may be right. Looking at VectorBufferBase, it appears m_buffer can be null. Maybe I'm barking up the wrong tree. Will look again in the morning.
It's possible that Vectors with inline capacity can never have a data() which is null. If so, that may be a "bug" in vector, that data() is non-null for inline vectors. Or at least that would explain this behavior difference.
> Hmm.. You may be right. > > Looking at VectorBufferBase, it appears m_buffer can be null. > > Maybe I'm barking up the wrong tree. Will look again in the morning. Skimming over Vector: your data non-zero because its inline capacity is non zero. This is messed up, I don't have an easy answer here for the Vector->String conversion :(
Yeah. It appears that Vectors of size 0 with an inline capacity will never have a null data(), but Vectors without inline capacity and size 0 will. To fix this, we should just key off of vector.data() instead, and then the behavior will match String(UChar*, size_t) exactly. :)
I've filed bug 109792 to track the Vector oddity. I'll post an updated patch now.
Pretty funny we could be arguing while being both right. Silly Vector :)
Created attachment 188282 [details] Patch
We were just emphatically agreeing with one another. :)
Comment on attachment 188282 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=188282&action=review > Source/WTF/ChangeLog:20 > + between threads). The main-thread parser path uses AtomicHTMLToken which Two spaces after the period.
Created attachment 188288 [details] Patch for landing
To come back to the problem in general: Usually, I am strongly against reading the pointer of an array/vector if the passed length is 0. For String, it just works well in practice that creating a String without a buffer (characters = 0) means there are no characters->null strings. The problem is this behavior should be kept consistent with String(pointer, length), and we end up with oddities like checking the pointer before the length. I start to think it may be a better choice to have String(char*) and String(char*, size_t) return empty strings. It would become the job of the caller to create null strings if necessary. A recent discussion with Darin also put the idea in my head that the implicit creation of null String can lead to unfortunate side effects. I tend to like null string because they are really really cheap to pass around code path. But problems like this patch make me think we may be abusing them in WebKit.
Comment on attachment 188288 [details] Patch for landing Clearing flags on attachment: 188288 Committed r142863: <http://trac.webkit.org/changeset/142863>
All reviewed patches have been landed. Closing bug.
(In reply to comment #20) > To come back to the problem in general: > > Usually, I am strongly against reading the pointer of an array/vector if the passed length is 0. > > For String, it just works well in practice that creating a String without a buffer (characters = 0) means there are no characters->null strings. > The problem is this behavior should be kept consistent with String(pointer, length), and we end up with oddities like checking the pointer before the length. > > I start to think it may be a better choice to have String(char*) and String(char*, size_t) return empty strings. It would become the job of the caller to create null strings if necessary. A recent discussion with Darin also put the idea in my head that the implicit creation of null String can lead to unfortunate side effects. That seems totally reasonable to me. > I tend to like null string because they are really really cheap to pass around code path. But problems like this patch make me think we may be abusing them in WebKit. Agreed.
I don’t think this is the best fix. I suggest we always create empty strings, not null strings, in this constructor.
We should still check size, not data, but we should construct an empty string, not a null string, in that case. It’s a bad idea to behave differently based on what the data pointer returns for an empty vector. That leads to the confusing FIXME in the header and it’s not what we want for these classes.
We can continue to discuss whether to remove the "create a null string if passed a null pointer" behavior of the pointer-based versions of these functions. But for the vector version, we should never create a null string since there is no such thing as a null vector. The patch we landed accidentally invents the concept of a null vector because if the the vector's data is 0 it creates a null string. We don’t want that for vectors. I understand that the old code glued together vectors and strings in the obvious way, and therefore it also had that strange behavior of inventing the concept of a null vector. And I agree with Ben that one way to fix that is to remove the String constructor function concept of making a null string if passed a null character pointer. But we can and should fix the vector part of this now while we continue that discussion.
It sounds like Darin is advocating for the first patch on this bug, which checked size and returned empty(). What we landed matches previous String(UChar*, size_t) behavior, but I agree, is confusing. I'm totally happy to see both String(Vector) and STring(UChar*, size_t) move to returning empty() in the length=0 case and never null.
I'll post a patch to fix String(Vector) to always return empty on size=0 shortly.
(In reply to comment #27) > It sounds like Darin is advocating for the first patch on this bug, which checked size and returned empty(). Yes. > What we landed matches previous String(UChar*, size_t) behavior, but I agree, is confusing. It matches String(UChar*, size_t) behavior iff you are also assuming you’d pass vector’s data and size through directly. But the point is that we want to correctly convert an empty vector, and that requires more than just fetching the data and size and passing them through. In my mind this is a separate issue from the design of String(const UChar*, size_t). > I'm totally happy to see both String(Vector) and String(UChar*, size_t) move to returning empty() in the length=0 case and never null. I am certain we should correct String(Vector) so it correctly makes an empty string from an empty vector, even if the call sites inside the tokenizer just passed through the data and size before. There’s a higher standard for a shared WTF function than for local code in one part of WebKit. I am not sure that removing the “you can get a null string by passing a null pointer” behavior from String(UChar*, size_t) is the right thing to do and we should reflect on that more before making a change. One tiny factor is that, as Ben points out, constructing an empty string is more costly than constructing a null string. And given the inlining, there’s even a bit of code bloat factor.
Reopening to attach new patch.
Created attachment 188378 [details] Back to the original fix, now with more context
Comment on attachment 188378 [details] Back to the original fix, now with more context View in context: https://bugs.webkit.org/attachment.cgi?id=188378&action=review > Source/WTF/wtf/text/WTFString.h:118 > + // NOTE: This is different from String(vector.data(), vector.size()) > + // which will sometimes return a null string when vector.data() is null > + // which can only occur for vectors without inline capacity. > + // See: https://bugs.webkit.org/show_bug.cgi?id=109792 I personally would not include this note, but I’m OK leaving it in.
Comment on attachment 188378 [details] Back to the original fix, now with more context Clearing flags on attachment: 188378 Committed r142894: <http://trac.webkit.org/changeset/142894>
(In reply to comment #32) > (From update of attachment 188378 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=188378&action=review > > > Source/WTF/wtf/text/WTFString.h:118 > > + // NOTE: This is different from String(vector.data(), vector.size()) > > + // which will sometimes return a null string when vector.data() is null > > + // which can only occur for vectors without inline capacity. > > + // See: https://bugs.webkit.org/show_bug.cgi?id=109792 > > I personally would not include this note, but I’m OK leaving it in. This is the reason why I wanted to change everything at once. I would prefer to have completely consistent behavior, no exception in one direction or an other.