Bug 23175 - String and UString should be able to share a UChar* buffer.
: String and UString should be able to share a UChar* buffer.
Status: RESOLVED FIXED
: WebKit
JavaScriptCore
: 528+ (Nightly build)
: All All
: P2 Normal
Assigned To:
:
:
: 23199 25039 25126 26057
: 22720
  Show dependency treegraph
 
Reported: 2009-01-07 15:20 PST by
Modified: 2009-06-29 16:00 PST (History)


Attachments
Patch for bug. (80.07 KB, patch)
2009-01-07 17:29 PST, David Levin
no flags Review Patch | Details | Formatted Diff | Diff
Attempted to make the SharedDataHolder usage more clear. (82.39 KB, patch)
2009-01-08 17:06 PST, David Levin
no flags Review Patch | Details | Formatted Diff | Diff
Part 1. Some refactoring work. (32.42 KB, patch)
2009-01-09 04:53 PST, David Levin
no flags Review Patch | Details | Formatted Diff | Diff
Minor change to previous patch. (1.99 KB, patch)
2009-01-09 11:22 PST, David Levin
no flags Review Patch | Details | Formatted Diff | Diff
Part 2: Refactor UString::Rep into two classes which reduces memory usage for the base Rep class. (27.80 KB, patch)
2009-01-11 11:32 PST, David Levin
darin: review-
Review Patch | Details | Formatted Diff | Diff
Addressed comments -- Part 2: Refactor UString::Rep into two classes which reduces memory usage for the base Rep class. (28.36 KB, patch)
2009-01-11 16:51 PST, David Levin
no flags Review Patch | Details | Formatted Diff | Diff
Addressed Darin's comments from the last review. (28.30 KB, patch)
2009-01-11 18:06 PST, David Levin
no flags Review Patch | Details | Formatted Diff | Diff
Part3: Allow UString to share a UChar* with another class. (32.77 KB, patch)
2009-01-12 04:39 PST, David Levin
no flags Review Patch | Details | Formatted Diff | Diff
Part 3: Add CrossThreadRefCounted. (10.05 KB, patch)
2009-02-02 16:30 PST, David Levin
no flags Review Patch | Details | Formatted Diff | Diff


Note

You need to log in before you can comment on or make changes to this bug.


Description From 2009-01-07 15:20:21 PST
Also, it would be nice to share a String across threads and share the buffer in that case as well.
------- Comment #1 From 2009-01-07 17:29:01 PST -------
Created an attachment (id=26515) [details]
Patch for bug.
------- Comment #2 From 2009-01-07 17:34:25 PST -------
There is a simple follow on patch that could enable StringImpl to automatically share the buf from a UString, but I'm slightly more worried about potential memory regressions (due to UString's buffers being bigger than StringImpl), so I wanted to get in this part first.
------- Comment #3 From 2009-01-07 22:51:00 PST -------
If anyone wants to start looking at the diff, it would be great.

However, I'm looking at changing making some changes to mrowe's comments in irc.   My summary: Basically "SharedDataHolder" is confusing and badly named (and I goofed in the changelog "visa versa" should be "vice versa").

So I've be changing SharedDataHolder some how to make it clearer in how to use it (and likely changing its name).
------- Comment #4 From 2009-01-08 17:06:22 PST -------
Created an attachment (id=26550) [details]
Attempted to make the SharedDataHolder usage more clear.

This has all known changes done to it.

There is one known issue with the change:
JavaScriptCore/wtf/SharedDataHolder.h no longer matches the contents of the file.

This means that I should change the name of the file and the references to it in various places (build files, includes, and ChangeLog).  However, in case there is more feedback about the naming of this class, I avoid those changes for the moment.
------- Comment #5 From 2009-01-08 17:13:41 PST -------
Now it is ready for review.
------- Comment #6 From 2009-01-09 04:53:19 PST -------
Created an attachment (id=26563) [details]
Part 1.  Some refactoring work.
------- Comment #7 From 2009-01-09 04:55:49 PST -------
Here's a much smaller patch that is ready for review.

After this one gets in, I'll do another incremental one.
------- Comment #8 From 2009-01-09 09:55:41 PST -------
Committing to http://svn.webkit.org/repository/webkit/trunk ...
    M    JavaScriptCore/ChangeLog
    M    JavaScriptCore/GNUmakefile.am
    M    JavaScriptCore/JavaScriptCore.exp
    M    JavaScriptCore/JavaScriptCore.vcproj/WTF/WTF.vcproj
    M    JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj
    M    JavaScriptCore/runtime/Identifier.cpp
    M    JavaScriptCore/runtime/InitializeThreading.cpp
    M    JavaScriptCore/runtime/JSGlobalData.cpp
    M    JavaScriptCore/runtime/PropertyNameArray.cpp
    M    JavaScriptCore/runtime/UString.cpp
    M    JavaScriptCore/runtime/UString.h
    M    JavaScriptGlue/ChangeLog
    M    WebCore/ChangeLog
Committed r39747
Committing to http://svn.webkit.org/repository/webkit/trunk ...
    A    JavaScriptGlue/ForwardingHeaders/wtf/PtrAndFlags.h
    A    WebCore/ForwardingHeaders/wtf/PtrAndFlags.h
Committed r39748
------- Comment #9 From 2009-01-09 11:22:41 PST -------
Created an attachment (id=26565) [details]
Minor change to previous patch.
------- Comment #10 From 2009-01-09 12:17:30 PST -------
(From update of attachment 26565 [details])
Committing to http://svn.webkit.org/repository/webkit/trunk ...
    M    JavaScriptCore/ChangeLog
    M    JavaScriptCore/jsc.cpp
    M    JavaScriptCore/runtime/JSGlobalData.cpp
Committed r39755
------- Comment #11 From 2009-01-11 01:45:10 PST -------
(From update of attachment 26563 [details])
Clearing review flags from committed patches to get the bug out of commit queue.
------- Comment #12 From 2009-01-11 11:32:33 PST -------
Created an attachment (id=26609) [details]
Part 2: Refactor UString::Rep into two classes which reduces memory usage for the base Rep class.
------- Comment #13 From 2009-01-11 14:14:15 PST -------
(From update of attachment 26609 [details])
This looks good.

> +static const unsigned numCharactersToStore = 0x100; // Note that SmallStrings::m_singleCharacterStrings should be the same size.

We have COMPILE_ASSERT so that such things can be checked rather than just mentioned in comments.

> +}  // namespace JSC
> +
> +#endif  // SmallStrings_h

If the patch is going to touch some files just to add comments, then I will be in full nitpick mode. I suggest only one space before "//" to match the style elsewhere, rather than two spaces.

> -            rep->capacity = newCapacity;
> +            rep->baseString()->capacity = newCapacity;

I would have expected you to use "base" here rather than re-fetching rep->baseString(). Is this better in some way?

> -            bool baseIsSelf() const { return baseString == this; }
> +            bool baseIsSelf() const { return m_identifierTableAndFlags.isFlagSet(BaseStringFlag); }

Why can't we keep the old definition, but move it down in the file where it can see the BaseString class definition? Or can't this check if baseString is zero instead? Do we really need the bit in the flags? Does the bit help performance or something?

> -            UChar* data() const { return baseString->buf + baseString->preCapacity + offset; }
> +            UChar* data() const;

I'm surprised this can be non-inline without hurting performance slightly.

> +            void setBaseString(PassRefPtr<BaseString> base) { ASSERT(base != this); m_baseString = base.releaseRef(); }
> +            BaseString* baseString() { return reinterpret_cast<BaseString*>(baseIsSelf() ? this : m_baseString); }
> +            const BaseString* baseString() const { return const_cast<const BaseString*>(const_cast<Rep*>(this)->baseString());  }

It's annoying to have these reinterpret_cast here. Instead you should just define these inline functions outside the class definition, in a place that can see the definition of BaseString.

> +            void* m_baseString; // If "this" is a BaseString instance, it is 0. BaseString* otherwise.

Seems really ugly to have this be a void*. Can't it be a BaseString* instead? When is the fact that it's a void* helpful?

> +
> +            static BaseString& null() { return *nullBaseString; }
> +            static BaseString& empty() { return *emptyBaseString; }
> +        private:
> +            friend void initializeUString();

I'd prefer a blank line before the private line.

> +    inline UChar* UString::Rep::data() const
> +    {
> +        const UString::BaseString* base = baseString();
> +        return base->buf + base->preCapacity + offset;
> +    }

Do you need the UString:: qualifier on BaseString here?

> +        UString::BaseString* base = m_rep->baseString();

And here?

I'm going to say review- because of the reinterpret_cast question, and the question of whether we really do need a bit to tell whether something is a base string. But this does seem like a great change and about ready to land.
------- Comment #14 From 2009-01-11 16:51:55 PST -------
Created an attachment (id=26618) [details]
Addressed comments -- Part 2: Refactor UString::Rep into two classes which reduces memory usage for the base Rep class.


> > +static const unsigned numCharactersToStore = 0x100; // Note that SmallStrings::m_singleCharacterStrings should be the same size.
>We have COMPILE_ASSERT so that such things can be checked rather than just mentioned in comments.
Fixed.

> > +}  // namespace JSC
> > +
> > +#endif  // SmallStrings_h
> I suggest only one space before "//" to match the style elsewhere, rather than two spaces.
Fixed.

> > -            rep->capacity = newCapacity;
> > +            rep->baseString()->capacity = newCapacity;

> I would have expected you to use "base" here rather than re-fetching
> rep->baseString(). Is this better in some way?

rep has just been recreated, so the local variable base is no longer valid at this point.


> > +            UChar* data() const;
> I'm surprised this can be non-inline without hurting performance slightly.

Still inlined just not defined in the class definition.


> > +            void setBaseString(PassRefPtr<BaseString> base) { ASSERT(base != this); m_baseString = base.releaseRef(); }
> > +            BaseString* baseString() { return reinterpret_cast<BaseString*>(baseIsSelf() ? this : m_baseString); }
> > +            const BaseString* baseString() const { return const_cast<const BaseString*>(const_cast<Rep*>(this)->baseString());  }

> It's annoying to have these reinterpret_cast here. Instead you should just
> define these inline functions outside the class definition, in a place that can
> see the definition of BaseString.

Fixed.

> > +            static BaseString& empty() { return *emptyBaseString; }
> > +        private:
> > +            friend void initializeUString();
> I'd prefer a blank line before the private line.

Fixed.

> +    inline UChar* UString::Rep::data() const
> +    {
> +        const UString::BaseString* base = baseString();
> Do you need the UString:: qualifier on BaseString here?

Fixed.


> > +        UString::BaseString* base = m_rep->baseString();
>And here?
Fixed.


> > -            bool baseIsSelf() const { return baseString == this; }
>> +            bool baseIsSelf() const { return m_identifierTableAndFlags.isFlagSet(BaseStringFlag); }

> Why can't we keep the old definition, but move it down in the file where it can...

and

> > +            void* m_baseString; // If "this" is a BaseString instance, it is 0. BaseString* otherwise.
> Seems really ugly to have this be a void*. Can't it be a BaseString* instead?
> When is the fact that it's a void* helpful?

and

> I'm going to say review- because of the reinterpret_cast question, and the
> question of whether we really do need a bit to tell whether something is a base
> string.

In the next change, I'll make this point to something else if it is a BaseString.  This is the reason for the flag and putting 0 in there in the case of the BaseString and for making a it void* that no code can use directly.

Why re-use this field? In order to not make BaseString any bigger.  (I've been under the impression that this is desired.)
------- Comment #15 From 2009-01-11 17:47:54 PST -------
(In reply to comment #14)
> In the next change, I'll make this point to something else if it is a
> BaseString.  This is the reason for the flag and putting 0 in there in the case
> of the BaseString and for making a it void* that no code can use directly.

A union is typically better than a void* to use the same storage for pointers of two different types.
------- Comment #16 From 2009-01-11 17:51:57 PST -------
(From update of attachment 26618 [details])
> +    COMPILE_ASSERT(numCharactersToStore == sizeof(SmallStrings::m_singleCharacterStrings) / sizeof(SmallStrings::m_singleCharacterStrings[0]), 
> +                   WTF_IsNumCharactersConstInSyncWithClassUsage);

No need to put that WTF_ prefix on the COMPILE_ASSERT argument. That name is only used for a local symbol, which can't conflict with anything outside this function. The only place you *might* want that prefix would be if you used COMPILE_ASSERT in a header, and even there I don't think it’s needed.

Do you need those SmallStrings:: qualifiers? I'd think this would compile without them since we’re inside a SmallStrings constructor definition.

> +            void setBaseString(PassRefPtr<BaseString> base);

This is a classic case where we would not name the argument in the function declaration, since the argument's type and the function name make it perfectly clear without a name.

r=me
------- Comment #17 From 2009-01-11 18:06:04 PST -------
Created an attachment (id=26619) [details]
Addressed Darin's comments from the last review.

The only change is to fix the two comments on the R+ review from before.
------- Comment #18 From 2009-01-11 19:19:59 PST -------
(From update of attachment 26619 [details])
Landed as http://trac.webkit.org/changeset/39815
------- Comment #19 From 2009-01-11 19:20:13 PST -------
(From update of attachment 26619 [details])
Clearing review flag because it was landed.
------- Comment #20 From 2009-01-12 04:39:18 PST -------
Created an attachment (id=26629) [details]
Allow UString to share a UChar* with another class.

The underlying class used for sharing also allows for sharing across threads which will be used in StringImpl::copy.
------- Comment #21 From 2009-01-26 22:23:19 PST -------
I currently like the names
  FastRefCountForThreads
or
  FastThreadableRefCount

instead of
  RefCountedMalloced

but I'd like a second opinion before I change it.  (The fact that it hold something that was malloc'ed is easily changeable if it is ever desired to be used to for another allocation type.)
------- Comment #22 From 2009-01-27 08:57:25 PST -------
(In reply to comment #21)
> I currently like the names
>   FastRefCountForThreads
> or
>   FastThreadableRefCount
> 
> instead of
>   RefCountedMalloced
> 
> but I'd like a second opinion before I change it.  (The fact that it hold
> something that was malloc'ed is easily changeable if it is ever desired to be
> used to for another allocation type.)

I like the general direction here; the new suggested names seem better.

Despite our unfortunate use of the word "fast" in "fastMalloc", I generally would like to avoid the word "fast". It makes me want to ask the question, "When we would I ever *not* use this?" I don't want my other code to be slow!

This is easier to do in person, but generally I like to describe the class and what it's for in prose, and then select words from that discussion to name the class. That exercise often works well.
------- Comment #23 From 2009-01-27 12:56:12 PST -------
>  describe the class and what it's for in prose

Here's my attempt:

The class is to allow for sharing an item with multiple objects across threads that can each control its lifetime independently (using a ref counting scheme) which is similar to RefCounted.  

It particularly useful in performance critical scenarios where ref/deref are done often without significantly impacting performance and avoids using atomic* until absolutely necessary (unlike ThreadSafeShared which does it on every call).

In return, the way to use it is slightly more complicated than ThreadSafeShared.  An instance of the class is not threadsafe, but it is designed to be usable in a multi-thread scenario.  The code has to call ::copy() to get another instance which can be used on another thread (while still holding on to the same object).
------- Comment #24 From 2009-01-29 13:57:51 PST -------
(From update of attachment 26618 [details])
Removed r+ b/c it was landed.
------- Comment #25 From 2009-02-02 16:30:10 PST -------
Created an attachment (id=27266) [details]
Part 3: Add CrossThreadRefCounted.

Ready for review.

Also I broke up the previous (part 3) patch into something smaller with the hope that it will be easier to review like this, and came up with a better name for my class.  The class isn't in the build yet because it isn't used in this patch.  (It will be used in the next patch.)
------- Comment #26 From 2009-03-09 09:38:37 PST -------
(From update of attachment 27266 [details])
> +    // ThreadSafeShared can have a siginificant perf impact when used in low level classes

Typo here in the spelling of "significant".

> +        // data is freed using *fastFree*.

I believe this comment is wrong. I think the data is deallocated with delete, not fastFree.

> +        // Used to make an instance that can be used on another thread.
> +        PassRefPtr<CrossThreadRefCounted<T> > copy();

I think this naming is getting increasingly unclear. The functions that copy objects to be used on other threads probably need a name that somehow talks about threads rather then just the word "copy".

> +        bool isShared() const { return !m_refCounter.hasOneRef() || dataAccessMustBeThreadSafe(); }

I don't understand the semantics of this function. When would you call it and what would you do based on the result.

> +        bool dataAccessMustBeThreadSafe() const { return m_threadSafeRefCounter && !m_threadSafeRefCounter->hasOneRef(); }

Same question.

> +        ~CrossThreadRefCounted()
> +        {
> +            if (m_data && !m_threadSafeRefCounter)
> +                delete m_data;
> +        }

The null check of m_data is redundant with what the compiler will already do automatically, so I suggest omitting it.

> +#if USE(LOCKFREE_THREADSAFESHARED)
> +        if (atomicDecrement(&m_refCount) <= 0)
> +#else
> +        {
> +            MutexLocker locker(m_mutex);
> +            --m_refCount;
> +        }
> +        if (m_refCount <= 0)
> +#endif
> +            return true;

I think this would read more clearly if the return statement was repeated isnide the #if.

r=me
------- Comment #27 From 2009-03-09 14:06:24 PST -------
(From update of attachment 27266 [details])
Committed in r41536.
------- Comment #28 From 2009-06-29 16:00:15 PST -------
This is all working now.  Changing to FIXED.