Bug 23175 - String and UString should be able to share a UChar* buffer.
Summary: String and UString should be able to share a UChar* buffer.
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: David Levin
URL:
Keywords:
Depends on: 23199 25039 25126 26057
Blocks: 22720
  Show dependency treegraph
 
Reported: 2009-01-07 15:20 PST by David Levin
Modified: 2009-06-29 16:00 PDT (History)
7 users (show)

See Also:


Attachments
Patch for bug. (80.07 KB, patch)
2009-01-07 17:29 PST, David Levin
no flags Details | Formatted Diff | Diff
Attempted to make the SharedDataHolder usage more clear. (82.39 KB, patch)
2009-01-08 17:06 PST, David Levin
no flags Details | Formatted Diff | Diff
Part 1. Some refactoring work. (32.42 KB, patch)
2009-01-09 04:53 PST, David Levin
no flags Details | Formatted Diff | Diff
Minor change to previous patch. (1.99 KB, patch)
2009-01-09 11:22 PST, David Levin
no flags Details | Formatted Diff | Diff
Part 2: Refactor UString::Rep into two classes which reduces memory usage for the base Rep class. (27.80 KB, patch)
2009-01-11 11:32 PST, David Levin
darin: review-
Details | Formatted Diff | Diff
Addressed comments -- Part 2: Refactor UString::Rep into two classes which reduces memory usage for the base Rep class. (28.36 KB, patch)
2009-01-11 16:51 PST, David Levin
no flags Details | Formatted Diff | Diff
Addressed Darin's comments from the last review. (28.30 KB, patch)
2009-01-11 18:06 PST, David Levin
no flags Details | Formatted Diff | Diff
Part3: Allow UString to share a UChar* with another class. (32.77 KB, patch)
2009-01-12 04:39 PST, David Levin
no flags Details | Formatted Diff | Diff
Part 3: Add CrossThreadRefCounted. (10.05 KB, patch)
2009-02-02 16:30 PST, David Levin
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description David Levin 2009-01-07 15:20:21 PST
Also, it would be nice to share a String across threads and share the buffer in that case as well.
Comment 1 David Levin 2009-01-07 17:29:01 PST
Created attachment 26515 [details]
Patch for bug.
Comment 2 David Levin 2009-01-07 17:34:25 PST
There is a simple follow on patch that could enable StringImpl to automatically share the buf from a UString, but I'm slightly more worried about potential memory regressions (due to UString's buffers being bigger than StringImpl), so I wanted to get in this part first.
Comment 3 David Levin 2009-01-07 22:51:00 PST
If anyone wants to start looking at the diff, it would be great.

However, I'm looking at changing making some changes to mrowe's comments in irc.   My summary: Basically "SharedDataHolder" is confusing and badly named (and I goofed in the changelog "visa versa" should be "vice versa").

So I've be changing SharedDataHolder some how to make it clearer in how to use it (and likely changing its name).
Comment 4 David Levin 2009-01-08 17:06:22 PST
Created attachment 26550 [details]
Attempted to make the SharedDataHolder usage more clear.

This has all known changes done to it.

There is one known issue with the change:
JavaScriptCore/wtf/SharedDataHolder.h no longer matches the contents of the file.

This means that I should change the name of the file and the references to it in various places (build files, includes, and ChangeLog).  However, in case there is more feedback about the naming of this class, I avoid those changes for the moment.
Comment 5 David Levin 2009-01-08 17:13:41 PST
Now it is ready for review.
Comment 6 David Levin 2009-01-09 04:53:19 PST
Created attachment 26563 [details]
Part 1.  Some refactoring work.
Comment 7 David Levin 2009-01-09 04:55:49 PST
Here's a much smaller patch that is ready for review.

After this one gets in, I'll do another incremental one.
Comment 8 Oliver Hunt 2009-01-09 09:55:41 PST
Committing to http://svn.webkit.org/repository/webkit/trunk ...
	M	JavaScriptCore/ChangeLog
	M	JavaScriptCore/GNUmakefile.am
	M	JavaScriptCore/JavaScriptCore.exp
	M	JavaScriptCore/JavaScriptCore.vcproj/WTF/WTF.vcproj
	M	JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj
	M	JavaScriptCore/runtime/Identifier.cpp
	M	JavaScriptCore/runtime/InitializeThreading.cpp
	M	JavaScriptCore/runtime/JSGlobalData.cpp
	M	JavaScriptCore/runtime/PropertyNameArray.cpp
	M	JavaScriptCore/runtime/UString.cpp
	M	JavaScriptCore/runtime/UString.h
	M	JavaScriptGlue/ChangeLog
	M	WebCore/ChangeLog
Committed r39747
Committing to http://svn.webkit.org/repository/webkit/trunk ...
	A	JavaScriptGlue/ForwardingHeaders/wtf/PtrAndFlags.h
	A	WebCore/ForwardingHeaders/wtf/PtrAndFlags.h
Committed r39748

Comment 9 David Levin 2009-01-09 11:22:41 PST
Created attachment 26565 [details]
Minor change to previous patch.
Comment 10 Oliver Hunt 2009-01-09 12:17:30 PST
Comment on attachment 26565 [details]
Minor change to previous patch.

Committing to http://svn.webkit.org/repository/webkit/trunk ...
	M	JavaScriptCore/ChangeLog
	M	JavaScriptCore/jsc.cpp
	M	JavaScriptCore/runtime/JSGlobalData.cpp
Committed r39755
Comment 11 Alexey Proskuryakov 2009-01-11 01:45:10 PST
Comment on attachment 26563 [details]
Part 1.  Some refactoring work.

Clearing review flags from committed patches to get the bug out of commit queue.
Comment 12 David Levin 2009-01-11 11:32:33 PST
Created attachment 26609 [details]
Part 2: Refactor UString::Rep into two classes which reduces memory usage for the base Rep class.
Comment 13 Darin Adler 2009-01-11 14:14:15 PST
Comment on attachment 26609 [details]
Part 2: Refactor UString::Rep into two classes which reduces memory usage for the base Rep class.

This looks good.

> +static const unsigned numCharactersToStore = 0x100; // Note that SmallStrings::m_singleCharacterStrings should be the same size.

We have COMPILE_ASSERT so that such things can be checked rather than just mentioned in comments.

> +}  // namespace JSC
> +
> +#endif  // SmallStrings_h

If the patch is going to touch some files just to add comments, then I will be in full nitpick mode. I suggest only one space before "//" to match the style elsewhere, rather than two spaces.

> -            rep->capacity = newCapacity;
> +            rep->baseString()->capacity = newCapacity;

I would have expected you to use "base" here rather than re-fetching rep->baseString(). Is this better in some way?

> -            bool baseIsSelf() const { return baseString == this; }
> +            bool baseIsSelf() const { return m_identifierTableAndFlags.isFlagSet(BaseStringFlag); }

Why can't we keep the old definition, but move it down in the file where it can see the BaseString class definition? Or can't this check if baseString is zero instead? Do we really need the bit in the flags? Does the bit help performance or something?

> -            UChar* data() const { return baseString->buf + baseString->preCapacity + offset; }
> +            UChar* data() const;

I'm surprised this can be non-inline without hurting performance slightly.

> +            void setBaseString(PassRefPtr<BaseString> base) { ASSERT(base != this); m_baseString = base.releaseRef(); }
> +            BaseString* baseString() { return reinterpret_cast<BaseString*>(baseIsSelf() ? this : m_baseString); }
> +            const BaseString* baseString() const { return const_cast<const BaseString*>(const_cast<Rep*>(this)->baseString());  }

It's annoying to have these reinterpret_cast here. Instead you should just define these inline functions outside the class definition, in a place that can see the definition of BaseString.

> +            void* m_baseString; // If "this" is a BaseString instance, it is 0. BaseString* otherwise.

Seems really ugly to have this be a void*. Can't it be a BaseString* instead? When is the fact that it's a void* helpful?

> +
> +            static BaseString& null() { return *nullBaseString; }
> +            static BaseString& empty() { return *emptyBaseString; }
> +        private:
> +            friend void initializeUString();

I'd prefer a blank line before the private line.

> +    inline UChar* UString::Rep::data() const
> +    {
> +        const UString::BaseString* base = baseString();
> +        return base->buf + base->preCapacity + offset;
> +    }

Do you need the UString:: qualifier on BaseString here?

> +        UString::BaseString* base = m_rep->baseString();

And here?

I'm going to say review- because of the reinterpret_cast question, and the question of whether we really do need a bit to tell whether something is a base string. But this does seem like a great change and about ready to land.
Comment 14 David Levin 2009-01-11 16:51:55 PST
Created attachment 26618 [details]
Addressed comments -- Part 2: Refactor UString::Rep into two classes which reduces memory usage for the base Rep class.


> > +static const unsigned numCharactersToStore = 0x100; // Note that SmallStrings::m_singleCharacterStrings should be the same size.
>We have COMPILE_ASSERT so that such things can be checked rather than just mentioned in comments.
Fixed.

> > +}  // namespace JSC
> > +
> > +#endif  // SmallStrings_h
> I suggest only one space before "//" to match the style elsewhere, rather than two spaces.
Fixed.

> > -            rep->capacity = newCapacity;
> > +            rep->baseString()->capacity = newCapacity;

> I would have expected you to use "base" here rather than re-fetching
> rep->baseString(). Is this better in some way?

rep has just been recreated, so the local variable base is no longer valid at this point.


> > +            UChar* data() const;
> I'm surprised this can be non-inline without hurting performance slightly.

Still inlined just not defined in the class definition.


> > +            void setBaseString(PassRefPtr<BaseString> base) { ASSERT(base != this); m_baseString = base.releaseRef(); }
> > +            BaseString* baseString() { return reinterpret_cast<BaseString*>(baseIsSelf() ? this : m_baseString); }
> > +            const BaseString* baseString() const { return const_cast<const BaseString*>(const_cast<Rep*>(this)->baseString());  }

> It's annoying to have these reinterpret_cast here. Instead you should just
> define these inline functions outside the class definition, in a place that can
> see the definition of BaseString.

Fixed.

> > +            static BaseString& empty() { return *emptyBaseString; }
> > +        private:
> > +            friend void initializeUString();
> I'd prefer a blank line before the private line.

Fixed.

> +    inline UChar* UString::Rep::data() const
> +    {
> +        const UString::BaseString* base = baseString();
> Do you need the UString:: qualifier on BaseString here?

Fixed.


> > +        UString::BaseString* base = m_rep->baseString();
>And here?
Fixed.


> > -            bool baseIsSelf() const { return baseString == this; }
>> +            bool baseIsSelf() const { return m_identifierTableAndFlags.isFlagSet(BaseStringFlag); }

> Why can't we keep the old definition, but move it down in the file where it can...

and

> > +            void* m_baseString; // If "this" is a BaseString instance, it is 0. BaseString* otherwise.
> Seems really ugly to have this be a void*. Can't it be a BaseString* instead?
> When is the fact that it's a void* helpful?

and

> I'm going to say review- because of the reinterpret_cast question, and the
> question of whether we really do need a bit to tell whether something is a base
> string.

In the next change, I'll make this point to something else if it is a BaseString.  This is the reason for the flag and putting 0 in there in the case of the BaseString and for making a it void* that no code can use directly.

Why re-use this field? In order to not make BaseString any bigger.  (I've been under the impression that this is desired.)
Comment 15 Darin Adler 2009-01-11 17:47:54 PST
(In reply to comment #14)
> In the next change, I'll make this point to something else if it is a
> BaseString.  This is the reason for the flag and putting 0 in there in the case
> of the BaseString and for making a it void* that no code can use directly.

A union is typically better than a void* to use the same storage for pointers of two different types.
Comment 16 Darin Adler 2009-01-11 17:51:57 PST
Comment on attachment 26618 [details]
Addressed comments -- Part 2: Refactor UString::Rep into two classes which reduces memory usage for the base Rep class.

> +    COMPILE_ASSERT(numCharactersToStore == sizeof(SmallStrings::m_singleCharacterStrings) / sizeof(SmallStrings::m_singleCharacterStrings[0]), 
> +                   WTF_IsNumCharactersConstInSyncWithClassUsage);

No need to put that WTF_ prefix on the COMPILE_ASSERT argument. That name is only used for a local symbol, which can't conflict with anything outside this function. The only place you *might* want that prefix would be if you used COMPILE_ASSERT in a header, and even there I don't think it’s needed.

Do you need those SmallStrings:: qualifiers? I'd think this would compile without them since we’re inside a SmallStrings constructor definition.

> +            void setBaseString(PassRefPtr<BaseString> base);

This is a classic case where we would not name the argument in the function declaration, since the argument's type and the function name make it perfectly clear without a name.

r=me
Comment 17 David Levin 2009-01-11 18:06:04 PST
Created attachment 26619 [details]
Addressed Darin's comments from the last review.

The only change is to fix the two comments on the R+ review from before.
Comment 18 Darin Adler 2009-01-11 19:19:59 PST
Comment on attachment 26619 [details]
Addressed Darin's comments from the last review.

Landed as http://trac.webkit.org/changeset/39815
Comment 19 Darin Adler 2009-01-11 19:20:13 PST
Comment on attachment 26619 [details]
Addressed Darin's comments from the last review.

Clearing review flag because it was landed.
Comment 20 David Levin 2009-01-12 04:39:18 PST
Created attachment 26629 [details]
Part3: Allow UString to share a UChar* with another class.

The underlying class used for sharing also allows for sharing across threads which will be used in StringImpl::copy.
Comment 21 David Levin 2009-01-26 22:23:19 PST
I currently like the names
  FastRefCountForThreads
or
  FastThreadableRefCount

instead of
  RefCountedMalloced

but I'd like a second opinion before I change it.  (The fact that it hold something that was malloc'ed is easily changeable if it is ever desired to be used to for another allocation type.)
 
Comment 22 Darin Adler 2009-01-27 08:57:25 PST
(In reply to comment #21)
> I currently like the names
>   FastRefCountForThreads
> or
>   FastThreadableRefCount
> 
> instead of
>   RefCountedMalloced
> 
> but I'd like a second opinion before I change it.  (The fact that it hold
> something that was malloc'ed is easily changeable if it is ever desired to be
> used to for another allocation type.)

I like the general direction here; the new suggested names seem better.

Despite our unfortunate use of the word "fast" in "fastMalloc", I generally would like to avoid the word "fast". It makes me want to ask the question, "When we would I ever *not* use this?" I don't want my other code to be slow!

This is easier to do in person, but generally I like to describe the class and what it's for in prose, and then select words from that discussion to name the class. That exercise often works well.
Comment 23 David Levin 2009-01-27 12:56:12 PST
>  describe the class and what it's for in prose

Here's my attempt:

The class is to allow for sharing an item with multiple objects across threads that can each control its lifetime independently (using a ref counting scheme) which is similar to RefCounted.  

It particularly useful in performance critical scenarios where ref/deref are done often without significantly impacting performance and avoids using atomic* until absolutely necessary (unlike ThreadSafeShared which does it on every call).

In return, the way to use it is slightly more complicated than ThreadSafeShared.  An instance of the class is not threadsafe, but it is designed to be usable in a multi-thread scenario.  The code has to call ::copy() to get another instance which can be used on another thread (while still holding on to the same object).
Comment 24 David Levin 2009-01-29 13:57:51 PST
Comment on attachment 26618 [details]
Addressed comments -- Part 2: Refactor UString::Rep into two classes which reduces memory usage for the base Rep class.

Removed r+ b/c it was landed.
Comment 25 David Levin 2009-02-02 16:30:10 PST
Created attachment 27266 [details]
Part 3: Add CrossThreadRefCounted.

Ready for review.

Also I broke up the previous (part 3) patch into something smaller with the hope that it will be easier to review like this, and came up with a better name for my class.  The class isn't in the build yet because it isn't used in this patch.  (It will be used in the next patch.)
Comment 26 Darin Adler 2009-03-09 09:38:37 PDT
Comment on attachment 27266 [details]
Part 3: Add CrossThreadRefCounted.

> +    // ThreadSafeShared can have a siginificant perf impact when used in low level classes

Typo here in the spelling of "significant".

> +        // data is freed using *fastFree*.

I believe this comment is wrong. I think the data is deallocated with delete, not fastFree.

> +        // Used to make an instance that can be used on another thread.
> +        PassRefPtr<CrossThreadRefCounted<T> > copy();

I think this naming is getting increasingly unclear. The functions that copy objects to be used on other threads probably need a name that somehow talks about threads rather then just the word "copy".

> +        bool isShared() const { return !m_refCounter.hasOneRef() || dataAccessMustBeThreadSafe(); }

I don't understand the semantics of this function. When would you call it and what would you do based on the result.

> +        bool dataAccessMustBeThreadSafe() const { return m_threadSafeRefCounter && !m_threadSafeRefCounter->hasOneRef(); }

Same question.

> +        ~CrossThreadRefCounted()
> +        {
> +            if (m_data && !m_threadSafeRefCounter)
> +                delete m_data;
> +        }

The null check of m_data is redundant with what the compiler will already do automatically, so I suggest omitting it.

> +#if USE(LOCKFREE_THREADSAFESHARED)
> +        if (atomicDecrement(&m_refCount) <= 0)
> +#else
> +        {
> +            MutexLocker locker(m_mutex);
> +            --m_refCount;
> +        }
> +        if (m_refCount <= 0)
> +#endif
> +            return true;

I think this would read more clearly if the return statement was repeated isnide the #if.

r=me
Comment 27 David Levin 2009-03-09 14:06:24 PDT
Comment on attachment 27266 [details]
Part 3: Add CrossThreadRefCounted.

Committed in r41536.
Comment 28 David Levin 2009-06-29 16:00:15 PDT
This is all working now.  Changing to FIXED.