Bug 32831 - Replace UString::Rep implementation, following introduction of ropes to JSC.
: Replace UString::Rep implementation, following introduction of ropes to JSC.
Status: RESOLVED FIXED
: WebKit
JavaScriptCore
: 528+ (Nightly build)
: PC Mac OS X 10.5
: P2 Normal
Assigned To:
:
:
:
:
  Show dependency treegraph
 
Reported: 2009-12-21 11:14 PST by
Modified: 2010-01-13 17:21 PST (History)


Attachments
The patch (85.89 KB, patch)
2009-12-21 11:19 PST, Gavin Barraclough
darin: review-
Review Patch | Details | Formatted Diff | Diff
Review comment fixage (86.68 KB, patch)
2009-12-21 13:38 PST, Gavin Barraclough
darin: review+
Review Patch | Details | Formatted Diff | Diff


Note

You need to log in before you can comment on or make changes to this bug.


Description From 2009-12-21 11:14:44 PST
Remove redundant overcapacity mechanisms, reduce memory cost of Rep's, and add a inline storage mechanism akin to that in WebCore's StringImpl.
------- Comment #1 From 2009-12-21 11:19:24 PST -------
Created an attachment (id=45339) [details]
The patch
------- Comment #2 From 2009-12-21 11:20:23 PST -------
Attachment 45339 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
JavaScriptCore/runtime/UStringImpl.cpp:25:  You should add a blank line after implementation file's own header.  [build/include_order] [4]
JavaScriptCore/runtime/UStringImpl.cpp:124:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
JavaScriptCore/runtime/UStringImpl.cpp:170:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
JavaScriptCore/runtime/UStringImpl.h:276:  #endif line should be "#endif // UStringImpl_h"  [build/header_guard] [5]
JavaScriptCore/runtime/UStringImpl.h:32:  Alphabetical sorting problem.  [build/include_order] [4]
JavaScriptCore/runtime/UStringImpl.h:55:  Missing spaces around &=  [whitespace/operators] [3]
JavaScriptCore/runtime/UStringImpl.h:61:  Missing spaces around |=  [whitespace/operators] [3]
JavaScriptCore/runtime/UStringImpl.h:130:  More than one command on the same line  [whitespace/newline] [4]
JavaScriptCore/runtime/UStringImpl.h:130:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
JavaScriptCore/runtime/UStringImpl.h:131:  More than one command on the same line  [whitespace/newline] [4]
JavaScriptCore/runtime/UStringImpl.h:132:  More than one command on the same line  [whitespace/newline] [4]
JavaScriptCore/runtime/UStringImpl.h:134:  More than one command on the same line  [whitespace/newline] [4]
JavaScriptCore/runtime/UStringImpl.h:136:  More than one command on the same line  [whitespace/newline] [4]
JavaScriptCore/runtime/UStringImpl.h:136:  Missing spaces around +=  [whitespace/operators] [3]
JavaScriptCore/runtime/UStringImpl.h:137:  Missing spaces around -=  [whitespace/operators] [3]
JavaScriptCore/runtime/UStringImpl.h:137:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
WebCore/ForwardingHeaders/runtime/UStringImpl.h:1:  #ifndef header guard has wrong style, please use: UStringImpl_h  [build/header_guard] [5]
WebCore/ForwardingHeaders/runtime/UStringImpl.h:4:  #endif line should be "#endif // UStringImpl_h"  [build/header_guard] [5]
JavaScriptCore/runtime/UString.cpp:242:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
Total errors found: 19
------- Comment #3 From 2009-12-21 11:51:16 PST -------
(From update of attachment 45339 [details])
It would have been nice to have smaller patches. For example, one patch could have removed all the unneeded stuff without changing anything that was left behind.

> +    UString::Rep m_base;

Can we start calling these UStringImpl in code we're modifying, instead of using the typedef? I think a later patch should just eliminate "Rep" entirely.

>      // make sure UString doesn't try to reuse the buffer by pretending we have one more character in it

This comment should go. You removed all the code that went with it.

> +UStringImpl* UStringImpl::nullUStringImpl;
> +UStringImpl* UStringImpl::emptyUStringImpl;

Can we give these names that don't repeat the class name? If you are using the "s_" naming, s_null and s_empty would seem to do.

> +        // UString constructors passed char*s assume 7-bit ASCII enciding; for UTF8 use 'createFromUTF8', belwo.

Typo: "enciding".
Typo: "belwo".

The functions zero-extend the characters, so they don't assume ASCII encoding. They do ISO Latin-1, not UTF-8. I think the comment could be worded more precisely.

If the function assumed ASCII encoding then it would be illegal to pass a non-ASCII character and we would assert that it was not done. Maybe we should switch to that.

> -        // Constructor for null-terminated ASCII string.
>          UString(const char*);

A shame to lose the "null-terminated" aspect of this comment.

> Index: JavaScriptCore/runtime/UStringImpl.cpp
> ===================================================================
> --- JavaScriptCore/runtime/UStringImpl.cpp	(revision 0)
> +++ JavaScriptCore/runtime/UStringImpl.cpp	(revision 0)
> @@ -0,0 +1,176 @@
> +/*
> + *  Copyright (C) 1999-2000 Harri Porten (porten@kde.org)
> + *  Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009 Apple Inc. All rights reserved.
> + *  Copyright (C) 2007 Cameron Zwarich (cwzwarich@uwaterloo.ca)
> + *  Copyright (C) 2009 Google Inc. All rights reserved.

There may be too much copyright here.

It would be best, if this file is really a copied piece of UString.cpp, to do this by using "svn copy" so we see the deletion of the non-moved part of the file, and then the changes to the moved bit. The way you did it here, the code all looks new and has no history. But yet the copyright indicates you think it does have history. If the only part you kept was the hash function then I don't think you need non-Apple copyrights. Maciej and I wrote the hash function.

> +#include "config.h"
> +#include "UStringImpl.h"
> +#include "Identifier.h"
> +#include "UString.h"
> +#include "UTF8.h"

The UStringImpl.h include should be in a separate paragraph by itself.

> +using namespace WTF;

This should not be needed. If it is, then the WTF headers are done wrong, I believe.

> +COMPILE_ASSERT(sizeof(UChar) == 2, uchar_is_2_bytes);

What code below assumes this?

> +    ASSERT((bufferOwnership() == BufferShared) || ((bufferOwnership() == BufferOwned) && !m_dataBuffer.asPtr<void*>()));

I find it hard to read this assertion. Maybe too many parentheses?

> Index: JavaScriptCore/runtime/UStringImpl.h
> ===================================================================
> --- JavaScriptCore/runtime/UStringImpl.h	(revision 0)
> +++ JavaScriptCore/runtime/UStringImpl.h	(revision 0)
> @@ -0,0 +1,276 @@
> +/*
> + *  Copyright (C) 1999-2000 Harri Porten (porten@kde.org)
> + *  Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009 Apple Inc. All rights reserved.
> + *  Copyright (C) 2009 Google Inc. All rights reserved.

Same comments about copyright and "svn copy". Lets keep the appropriate amount of history.

> +#include <stdint.h>

Why? Are these really used in the header?

> +#include <wtf/FastMalloc.h>

Not needed. We use Noncopyable and if we include that header, then FastMalloc.h is already taken care of.

> +#include <limits.h>

Why? Don't we get everything we need from the C++ <limits>?

> +    UntypedPtrAndBitfield() {}

Uninitialized? Really? Maybe it should be zeroed instead? Or perhaps we don't need it at all?

Does this code come from WebCore::StringImpl? If so, could we svn copy from there?

> +    unsigned hash() const { if (m_hash == 0) m_hash = computeHash(data(), m_length); return m_hash; }

Our coding style would be !m_hash.

> +    void setHash(unsigned hash) { ASSERT(hash == computeHash(data(), m_length)); m_hash = hash; } // fast path for Identifiers

Do we really need a public function for this?

> +    UStringImpl* ref() { m_refCount+=2; return this; }
> +    ALWAYS_INLINE void deref() { if ((m_refCount-=2) == 0) destroy(); }

We normally put spaces around operators like "+=" and "-=".

You need a comment right here explaining why we bump the reference count by 2 instead of one.

Why does ref return UStringImpl* instead of void?

> +        if (length > (std::numeric_limits<size_t>::max() / sizeof(UChar)))

Extra parentheses here make it harder for me to read.

> +#ifdef NDEBUG
> +    void checkConsistency() const {}
> +#else
> +    void checkConsistency() const
> +    {
> +        // There is no recursion of substrings.
> +        ASSERT(bufferOwnerString()->bufferOwnership() != BufferSubstring);
> +        // Static strings cannot be put in identifier tables, because they are globally shared.
> +        ASSERT(!isStatic() || !identifierTable());
> +    }
> +#endif

Normally I prefer to write such ifdefs so there is only one declaration even if there are two definitions. In fact, I suggest putting the debug version into the cpp file since we normally don't inline in debug builds anyway. We'd just do this:

    #ifdef NDEBUG
    inline UStringImpl::checkConsistency() const
    {
    }
    #endif

And keep the class definition tidy with no ifdefs in it at all.

> +    enum BufferOwnership {
> +        BufferInternal = 0,
> +        BufferOwned = 1,
> +        BufferSubstring = 2,
> +        BufferShared = 3,
> +    };

Is there a reason to specify the numbers for each of these?

> +    // For SmallStringStorage which allocates an array, and uses an in-place new.

There should be a comma before the word which. And no comma before the word and.

> +    // Used to construct normal strings with an internal or external buffer.
> +    UStringImpl(UChar* data, int length, BufferOwnership ownership)

I'd expect an assertion that matches the comment.

> +    // Used to construct static strings with an special refCount of 1 - since refCount
> +    // is incremented/decremented by 2, this can never hit zero.  This means that the
> +    // static string will never be destroyed.  This is important because static strings
> +    // will be shared across threads & ref-counted in a non-threadsafe manner.

We use one space after a period, not two. I suggest a named constant for the special reference count of 1 so you don't have to put a comment on the line where you use it.

> +#ifndef NDEBUG
> +    bool isStatic() const { return m_refCount & 1; }
> +#endif

Since this is already private, maybe it doesn't have to be debug only.

> +    int m_refCount;

Since the code relies on overflow semantics for the special reference count, this should be unsigned, which has defined behavior on overflow, rather than int, which does not.

I'll say review-, but this looks really good. You can probably land after making a few fixes.
------- Comment #4 From 2009-12-21 13:37:37 PST -------
> > +    UString::Rep m_base;
> 
> Can we start calling these UStringImpl in code we're modifying, instead of
> using the typedef? I think a later patch should just eliminate "Rep" entirely.

Absolutely agreed.  I think it might make sense to look at further bringing UStringImpl and StringImpl closer together before doing so, but I definitely see this as the way forwards.

> > +    UntypedPtrAndBitfield() {}
> 
> Uninitialized? Really? Maybe it should be zeroed instead? Or perhaps we don't
> need it at all?

This constructor is used by the non argument UStringImpl constructor (which also leaves the object completely uninitialized), which is used by SmallStrings (In declaring an array of UStringImpls).  This behaviour replicated that of the previous UString::Rep implementation.

> Does this code come from WebCore::StringImpl? If so, could we svn copy from
> there?

No, I think this is all now fresh code (originally copied from UString::Rep, but everything has been replaced).

> > +    void setHash(unsigned hash) { ASSERT(hash == computeHash(data(), m_length)); m_hash = hash; } // fast path for Identifiers
> 
> Do we really need a public function for this?

Short answer is yes – this is provided from the identifier map.  I think we only do this during construction, so this could possibly be replaced with a fifth constructor (which just adds a lot of lines to solve a small problem), or we could friend Identifier (which didn't seem ideal from an encapsulation perspective.).

> Why does ref return UStringImpl* instead of void?

Again this is replicating the behaviour of the previous implementation of UString::Rep.  This is used in various other parts of the code base, which assign a UString::Rep* to a variable and reference it in a single line.  

> > +#ifdef NDEBUG
> > +    void checkConsistency() const {}
> > +#else
> > +    void checkConsistency() const
> > +    {
> > +        // There is no recursion of substrings.
> > +        ASSERT(bufferOwnerString()->bufferOwnership() != BufferSubstring);
> > +        // Static strings cannot be put in identifier tables, because they are globally shared.
> > +        ASSERT(!isStatic() || !identifierTable());
> > +    }
> > +#endif
> 
> Normally I prefer to write such ifdefs so there is only one declaration even if
> there are two definitions. In fact, I suggest putting the debug version into
> the cpp file since we normally don't inline in debug builds anyway. We'd just
> do this:

Moving the function into the cpp is difficult since it is called from the header, from methods that are used from WebCore.  This means that an entry in the .exp would be required in debug builds, but no implementation of the method would exist in release builds.
------- Comment #5 From 2009-12-21 13:38:35 PST -------
Created an attachment (id=45347) [details]
Review comment fixage
------- Comment #6 From 2009-12-21 13:43:56 PST -------
Attachment 45347 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
JavaScriptCore/runtime/UStringImpl.cpp:125:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
JavaScriptCore/runtime/UStringImpl.cpp:171:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
JavaScriptCore/runtime/UStringImpl.h:274:  #endif line should be "#endif // UStringImpl_h"  [build/header_guard] [5]
JavaScriptCore/runtime/UStringImpl.h:33:  Alphabetical sorting problem.  [build/include_order] [4]
JavaScriptCore/runtime/UStringImpl.h:55:  Missing spaces around &=  [whitespace/operators] [3]
JavaScriptCore/runtime/UStringImpl.h:61:  Missing spaces around |=  [whitespace/operators] [3]
JavaScriptCore/runtime/UStringImpl.h:130:  More than one command on the same line  [whitespace/newline] [4]
JavaScriptCore/runtime/UStringImpl.h:131:  More than one command on the same line  [whitespace/newline] [4]
JavaScriptCore/runtime/UStringImpl.h:132:  More than one command on the same line  [whitespace/newline] [4]
JavaScriptCore/runtime/UStringImpl.h:134:  More than one command on the same line  [whitespace/newline] [4]
JavaScriptCore/runtime/UStringImpl.h:136:  More than one command on the same line  [whitespace/newline] [4]
WebCore/ForwardingHeaders/runtime/UStringImpl.h:1:  #ifndef header guard has wrong style, please use: UStringImpl_h  [build/header_guard] [5]
WebCore/ForwardingHeaders/runtime/UStringImpl.h:4:  #endif line should be "#endif // UStringImpl_h"  [build/header_guard] [5]
JavaScriptCore/runtime/UString.cpp:242:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
Total errors found: 14
------- Comment #7 From 2009-12-21 13:51:52 PST -------
(In reply to comment #4)
> > > +#ifdef NDEBUG
> > > +    void checkConsistency() const {}
> > > +#else
> > > +    void checkConsistency() const
> > > +    {
> > > +        // There is no recursion of substrings.
> > > +        ASSERT(bufferOwnerString()->bufferOwnership() != BufferSubstring);
> > > +        // Static strings cannot be put in identifier tables, because they are globally shared.
> > > +        ASSERT(!isStatic() || !identifierTable());
> > > +    }
> > > +#endif
> > 
> > Normally I prefer to write such ifdefs so there is only one declaration even if
> > there are two definitions. In fact, I suggest putting the debug version into
> > the cpp file since we normally don't inline in debug builds anyway. We'd just
> > do this:
> 
> Moving the function into the cpp is difficult since it is called from the
> header, from methods that are used from WebCore.  This means that an entry in
> the .exp would be required in debug builds, but no implementation of the method
> would exist in release builds.

OK. Well we could still do the ifdef'ing in the header. I'd put the ifdef in the body of the function then. And probably put the definition outside the class to avoid having that whole mess up in the class definition.
------- Comment #8 From 2009-12-21 13:53:45 PST -------
(From update of attachment 45347 [details])
> +        bool isEmpty() const { return (!m_rep->size()); }

Those parens! Yuck.

r=me
------- Comment #9 From 2009-12-21 16:41:32 PST -------
(In reply to comment #7)
> (In reply to comment #4)
> > > > +#ifdef NDEBUG
> > > > +    void checkConsistency() const {}
> > > > +#else
> > > > +    void checkConsistency() const
> > > > +    {
> > > > +        // There is no recursion of substrings.
> > > > +        ASSERT(bufferOwnerString()->bufferOwnership() != BufferSubstring);
> > > > +        // Static strings cannot be put in identifier tables, because they are globally shared.
> > > > +        ASSERT(!isStatic() || !identifierTable());
> > > > +    }
> > > > +#endif
> OK. Well we could still do the ifdef'ing in the header. I'd put the ifdef in
> the body of the function then. And probably put the definition outside the
> class to avoid having that whole mess up in the class definition.

I realized that since the function only contains ASSERTs (and since these compile to nothing in release builds) we can remove the #ifdef here altogether!

Also removed a couple of spare sets of parens, as per comment #8.

Transmitting file data ..............
Committed revision 52463.
------- Comment #10 From 2010-01-12 08:26:36 PST -------
>    void* operator new(size_t size) { return fastMalloc(size); }
>    void* operator new(size_t, void* inPlace) { return inPlace; }

Hi, Gavin, why is above code necessary? and why are delete operators not defined?

Thanks.
------- Comment #11 From 2010-01-12 11:12:22 PST -------
(In reply to comment #10)
> >    void* operator new(size_t size) { return fastMalloc(size); }
> >    void* operator new(size_t, void* inPlace) { return inPlace; }
> 
> Hi, Gavin, why is above code necessary? and why are delete operators not
> defined?

The in-place new is used in the SmallStringsStorage to declare an array on UStringImpls (these are never deleted).  Since a new operator has been defined the C++ compiler won't provide a default new(size_t, so one must be explicitly declared.

A delete wasn't defined because this appeared to be redundant, on mac at leaset – is there a reason we need one on another platform?
------- Comment #12 From 2010-01-12 11:14:05 PST -------
(In reply to comment #11)
> A delete wasn't defined because this appeared to be redundant, on mac at leaset
> – is there a reason we need one on another platform?

I suggest implementing a matching delete or deriving from FastAllocBase.

We are planning on removing the global operator new and delete eventually on Mac OS X. To be more specific, we plan on replacing them with ASSERT_NOT_REACHED, because we don’t want to accidentally rely on them, nor accidentally use the non-FastMalloc version of malloc within the framework.
------- Comment #13 From 2010-01-13 07:39:58 PST -------
(In reply to comment #11)
> (In reply to comment #10)
> > >    void* operator new(size_t size) { return fastMalloc(size); }
> > >    void* operator new(size_t, void* inPlace) { return inPlace; }
> > 
> > Hi, Gavin, why is above code necessary? and why are delete operators not
> > defined?
> 
> The in-place new is used in the SmallStringsStorage to declare an array on
> UStringImpls (these are never deleted).  Since a new operator has been defined
> the C++ compiler won't provide a default new(size_t, so one must be explicitly
> declared.
> 
> A delete wasn't defined because this appeared to be redundant, on mac at leaset
> – is there a reason we need one on another platform?

then the first one is not necessary because I remember even Noncopyable is derived from FastAllocBase. 

I just worry about this case: you implement operator new with fastMalloc, but if FastAllocBase::new does it in different way. then when you call delete, FastAllocBase::delete will be called and then will crash...
------- Comment #14 From 2010-01-13 07:44:59 PST -------
(In reply to comment #12)
> (In reply to comment #11)
> > A delete wasn't defined because this appeared to be redundant, on mac at leaset
> > – is there a reason we need one on another platform?
> 
> I suggest implementing a matching delete or deriving from FastAllocBase.
> 
> We are planning on removing the global operator new and delete eventually on
> Mac OS X. To be more specific, we plan on replacing them with
> ASSERT_NOT_REACHED, because we don’t want to accidentally rely on them, nor
> accidentally use the non-FastMalloc version of malloc within the framework.

"removing the global operator new and delete": why? then what if STL call global new/delete operators? and  "new int[10]"? Personally, I prefer global new/delete operators than FastAllocBase.
------- Comment #15 From 2010-01-13 09:00:07 PST -------
(In reply to comment #13)
> I just worry about this case: you implement operator new with fastMalloc, but
> if FastAllocBase::new does it in different way. then when you call delete,
> FastAllocBase::delete will be called and then will crash...

Yes, I am also worried about that.
------- Comment #16 From 2010-01-13 17:21:39 PST -------
operator new replaced with a 'using' to import the parent Class's new operator in r53221.