Bug 27762

Summary: [v8] Cache atomic strings in externalized v8 strings
Product: WebKit Reporter: Christian Plesner Hansen <christian.plesner.hansen>
Component: WebCore Misc.Assignee: David Levin <levin>
Status: RESOLVED FIXED    
Severity: Normal CC: antonm, eric, levin, sam
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
Initial version
none
Updated ChangeLog
none
Check that cached atomic string stay on the main thread
eric: review-
Fixed style issues; added main thread assert
none
Fixed whitespace issue
levin: review-
Next iteration
levin: review-
Fixed memory waste etc.
levin: review+
GetElementById benchmark
none
Shark sample of benchmark in WebKit r46722 built RELEASE for Mac with JSC none

Description Christian Plesner Hansen 2009-07-28 05:50:43 PDT
We currently only cache plain webkit strings in externalized v8 strings and then convert them on demand to atomic strings.  With this change we also cache the atomic strings the first time they are created.

Also, cleaned up some code in the v8 code generator and added a hint that the parameter to getElementById is actually an atomic string.  We may want to do that in more places but I'm leaving that for another changelist.
Comment 1 Christian Plesner Hansen 2009-07-28 05:52:57 PDT
Created attachment 33618 [details]
Initial version
Comment 2 Christian Plesner Hansen 2009-07-28 06:34:34 PDT
Created attachment 33620 [details]
Updated ChangeLog
Comment 3 anton muhin 2009-07-28 06:56:12 PDT
Just a minor comments/questions.

1) Looks like you don't use HintAtomic and there is no support for it in codegen.  Maybe this change should be just postponed?

2) do we need to store String now?  String(StringImpl*) ctor seems very cheap.  Maybe only store AtomicString even for String (it might though perform a find on creation, but it might benefit us later as most of DOM api uses AtomicStrings anyway)?

3) Safari has a nice optimization that if we don't find AtmoicString, we're guaranteed not to get node by the given idea for example.  Is it possible to incorporate it into Chrome as well?  Of course, not for this CL.
Comment 4 Christian Plesner Hansen 2009-07-28 07:06:08 PDT
> 1) Looks like you don't use HintAtomic and there is no support for it in
> codegen.  Maybe this change should be just postponed?

I don't use HintAtomic value but I do use the presence/absence of a V8Custom annotation in line 1574.

> 2) do we need to store String now?  String(StringImpl*) ctor seems very cheap. 
> Maybe only store AtomicString even for String (it might though perform a find
> on creation, but it might benefit us later as most of DOM api uses
> AtomicStrings anyway)?

The reason for the m_impl field is to make sure the String value is kept alive even after it has been converted to an AtomicString.  V8 may use the buffer underlying m_impl even after m_atomic_impl has been set so both have to be kept alive.

> 3) Safari has a nice optimization that if we don't find AtmoicString, we're
> guaranteed not to get node by the given idea for example.  Is it possible to
> incorporate it into Chrome as well?  Of course, not for this CL.

Good point, we should definitely do that.
Comment 5 anton muhin 2009-07-28 07:13:25 PDT
(In reply to comment #4)
> > 1) Looks like you don't use HintAtomic and there is no support for it in
> > codegen.  Maybe this change should be just postponed?
> 
> I don't use HintAtomic value but I do use the presence/absence of a V8Custom
> annotation in line 1574.
> 
> > 2) do we need to store String now?  String(StringImpl*) ctor seems very cheap. 
> > Maybe only store AtomicString even for String (it might though perform a find
> > on creation, but it might benefit us later as most of DOM api uses
> > AtomicStrings anyway)?
> 
> The reason for the m_impl field is to make sure the String value is kept alive
> even after it has been converted to an AtomicString.  V8 may use the buffer
> underlying m_impl even after m_atomic_impl has been set so both have to be kept
> alive.

AtomicString has String m_string field, cannot we reuse that instead?

> 
> > 3) Safari has a nice optimization that if we don't find AtmoicString, we're
> > guaranteed not to get node by the given idea for example.  Is it possible to
> > incorporate it into Chrome as well?  Of course, not for this CL.
> 
> Good point, we should definitely do that.
Comment 6 Christian Plesner Hansen 2009-07-28 07:20:14 PDT
Created attachment 33628 [details]
Check that cached atomic string stay on the main thread
Comment 7 Christian Plesner Hansen 2009-07-28 07:24:18 PDT
The issue was raised in another review that atomic string are not safe to use across threads.  I've added an assertion that only the main thread accesses the cached atomic string which, as far as I can tell, is a safe assumption to make.  It is still possible to use the atomic string on other threads as if it were a plain string; I assume that is safe.
Comment 8 Christian Plesner Hansen 2009-07-28 07:26:17 PDT
> AtomicString has String m_string field, cannot we reuse that instead?

That field is already being used to hold the string value shared between all AtomicString values with the same contents.
Comment 9 Eric Seidel (no email) 2009-07-28 10:27:12 PDT
Comment on attachment 33628 [details]
Check that cached atomic string stay on the main thread

There are tabs in your ChangeLog.

m_atomic_impl is not WebKit style.  m_atomicString would be, or m_atomicImpl (although we've historically tried to kill Impl, excepting a few places)

+        if (m_atomic_impl.isNull()) {
+            m_atomic_impl = AtomicString(m_impl);
+        }

is not WK style.

WK uses 4 space indent:
+    if (!v8String->MakeExternal(resource)) {
+      // In case of a failure delete the external resource as it was not used.
+      delete resource;
+    }

I think you should consider running WebKit's copy of cpplint, it's found in WebKitTools/Scripts/


JSC does this by passing UStrings directly to an AtomicString constructor (they're implicitly constructed when getElementById is called).

PassRefPtr<StringImpl> AtomicString::add(const JSC::UString& ustring)

is where the real magic is.  (Just hashes the string and makes a StringImpl wrapper if necessary.)

Does V8 not want to just do the same thing?  Add a new constructor for AtomicString for a Handle<String> (or whatever the appropriate v8 string base type is).
Comment 10 Eric Seidel (no email) 2009-07-28 10:27:45 PDT
The r- was for the style stuff mostly.  But I'm wondering if you really need to/want to add a custom attribute for this.
Comment 11 Christian Plesner Hansen 2009-07-28 10:49:44 PDT
> There are tabs in your ChangeLog.
> 
> m_atomic_impl is not WebKit style.  m_atomicString would be, or m_atomicImpl
> (although we've historically tried to kill Impl, excepting a few places)
> 
> +        if (m_atomic_impl.isNull()) {
> +            m_atomic_impl = AtomicString(m_impl);
> +        }
> 
> is not WK style.
> 
> WK uses 4 space indent:
> +    if (!v8String->MakeExternal(resource)) {
> +      // In case of a failure delete the external resource as it was not used.
> +      delete resource;
> +    }
> 
> I think you should consider running WebKit's copy of cpplint, it's found in
> WebKitTools/Scripts/

Sorry, it's been like two years since I last worked on WK code.  I'll lint and clean up the code.

> JSC does this by passing UStrings directly to an AtomicString constructor
> (they're implicitly constructed when getElementById is called).
> 
> PassRefPtr<StringImpl> AtomicString::add(const JSC::UString& ustring)
> 
> is where the real magic is.  (Just hashes the string and makes a StringImpl
> wrapper if necessary.)

It's the 'add' call I'm trying to get rid of; currently around 10% of getElementById is spent just looking up atomic strings and that seems to hold for most operations that use them.
Comment 12 Christian Plesner Hansen 2009-07-28 10:51:51 PDT
> The r- was for the style stuff mostly.  But I'm wondering if you really need
> to/want to add a custom attribute for this.

Just so I understand what you're asking: are you suggesting getting rid of the annotation altogether or replacing [V8Custom=...] with a special-purpose [Atomic] annotation?
Comment 13 Eric Seidel (no email) 2009-07-28 10:55:50 PDT
So do you believe JSC to be slow for the same reasons?  Or is add() somehow faster for JSC?
Comment 14 Christian Plesner Hansen 2009-07-28 11:20:04 PDT
> So do you believe JSC to be slow for the same reasons?  Or is add() somehow
> faster for JSC?

I haven't looked at JSC's code but unless they're doing something clever to avoid it I would expect they're doing at least one call to add() that could be saved if there was somewhere to cache the atomic string.
Comment 15 Eric Seidel (no email) 2009-07-28 11:31:59 PDT
I believe both UString and StringImpl know how to cache their hash values.  So add() is just a single lookup in the AtomicString hash table, you don't have to re-hash the string.
Comment 16 Christian Plesner Hansen 2009-07-28 11:34:58 PDT
(In reply to comment #15)
> I believe both UString and StringImpl know how to cache their hash values.  So
> add() is just a single lookup in the AtomicString hash table, you don't have to
> re-hash the string.

We currently cache a StringImpl so if they do hash caching we already get the benefits of that.  The lookup itself still shows up high on our profiles.

Also, to answer your question from before: I think add() is as fast for us as for JSC.
Comment 17 Eric Seidel (no email) 2009-07-28 11:39:14 PDT
Comment on attachment 33628 [details]
Check that cached atomic string stay on the main thread

If what you say is true, and that add() would show up just as much for JSC, it seems we should fix this in a different way which fixes it for all JS engines. :)  Hard to know without seeing the profile in question.
Comment 18 Christian Plesner Hansen 2009-07-28 23:23:31 PDT
Created attachment 33695 [details]
Fixed style issues; added main thread assert
Comment 19 Christian Plesner Hansen 2009-07-29 01:04:00 PDT
(In reply to comment #17)
> (From update of attachment 33628 [details])
> If what you say is true, and that add() would show up just as much for JSC, it
> seems we should fix this in a different way which fixes it for all JS engines.
> :)  Hard to know without seeing the profile in question.

I agree, a solution that covers both JSC and v8 would be better.  Just as an experiment I tried replacing my changes with a naive atomic string cache in StringImpls and it makes my getElementById microbenchmark ~8% faster.  So that might be worth doing if the cost of an extra word per StringImpl is acceptable.

I've only tried it in webkit/v8 though because webkit/jsc doesn't run on linux (as far as I know).
Comment 20 Eric Seidel (no email) 2009-07-29 09:23:17 PDT
Comment on attachment 33695 [details]
Fixed style issues; added main thread assert

Of course JSC runs on linux. :)  I still think v8-specific hacks are a bad thing.

Spacing:
+    if (!v8String->MakeExternal(resource)) {
+        // In case of a failure delete the external resource as it was not used.
+       delete resource;
+    }
Comment 21 Sam Weinig 2009-07-29 09:56:35 PDT
What test are you profiling against?
Comment 22 Christian Plesner Hansen 2009-07-29 10:13:09 PDT
> Of course JSC runs on linux. :)  I still think v8-specific hacks are a bad
> thing.

I know jsc runs on linux but I assumed that chrome-jsc doesn't.  I may be wrong
about that and it would be very useful if it does run.

I agree that a solution that improves both vms would be better but I don't know
enough about webkit to decide how best to implement it or decide if it should
be implemented at all.  V8's string representation happens to be particularly
well suited for this kind of caching because we allow transparent
representation changes; I don't consider this a hack but just taking advantage
of the hooks that are available in v8 and not, as far as I know, in jsc.
Comment 23 Christian Plesner Hansen 2009-07-29 10:17:13 PDT
> What test are you profiling against?

Right now I'm working specifically on document.getElementById because it performed abysmally in chromium so the benchmark I'm running is a tight loop that just calls document.getElementById over and over again.  But I expect the atom conversion part of getElementById is representative of all operations that use atomic strings.
Comment 24 Christian Plesner Hansen 2009-07-29 10:23:02 PDT
Created attachment 33721 [details]
Fixed whitespace issue
Comment 25 David Levin 2009-07-30 20:58:12 PDT
Comment on attachment 33721 [details]
Fixed whitespace issue

Just a few things to address and this can be done.

Please get the patch to be based from WebKit or else it will be harder to commit this for you (i.e. in this case, all path for files below should be prefixed with WebCore/).

> Index: ChangeLog
> ===================================================================
> --- ChangeLog	(revision 46477)
> +++ ChangeLog	(working copy)
> @@ -1,3 +1,21 @@
> +2009-07-28  Christian Plesner Hansen  <christian.plesner.hansen@gmail.com>
> +
> +    Reviewed by NOBODY (OOPS!).
> +
Add title and bug link like this:

[v8] Cache atomic strings in externalized v8 strings
https://bugs.webkit.org/show_bug.cgi?id=27762


> +    Cache atomic strings in externalized strings the first time
> +    the string is converted.  Added V8Custom IDL annotation for
> +    hinting that converting a string directly to an atomic string is a
> +    good idea.

Get rid of TABs in the changelog.



> Index: bindings/v8/V8Binding.cpp
> +    explicit WebCoreStringResource(const AtomicString& string)
> +        : m_impl(string.impl()), m_atomicImpl(string.impl())
Please format like this:
    explicit WebCoreStringResource(const AtomicString& string)
        : m_impl(string.impl())
        , m_atomicImpl(string.impl())




> +    AtomicString atomicString()
> +    {
> +        ASSERT(WTF::isMainThread());
> +        if (m_atomicImpl.isNull())
> +            m_atomicImpl = AtomicString(m_impl);

It seems that you should adjust the external memory here:
        if (m_atomicImpl.isNull()) {
            m_atomicImpl = AtomicString(m_impl);
            if (!m_impl.inTable())
                v8::V8::AdjustAmountOfExternalAllocatedMemory(2 * length());
        }

And change the destructor like this:

    virtual ~WebCoreStringResource()
    {
        ASSERT(WTF::isMainThread());
        int reducedExternalMemory = -2 * length();
        if (!m_impl.inTable())
            reducedExternalMemory *= 2;
        v8::V8::AdjustAmountOfExternalAllocatedMemory(reducedExternalMemory);
    }

>  private:
>      // A shallow copy of the string. Keeps the string buffer alive until the V8 engine garbage collects it.
>      String m_impl;
> +    // If this string is atomic or has been made atomic earlier the
> +    // atomic string is held here.  In the case where the string starts

Please only put one space after periods.

> +    // off non-atomic and becomes atomic later it is necessary to keep
> +    // the original string alive because v8 may keep derived pointers
> +    // into that string.
> +    AtomicString m_atomicImpl;

Why is it m_atomicImpl instead of just m_atomicString?


> @@ -130,13 +151,34 @@ String v8ValueToWebCoreString(v8::Handle
>  
>  AtomicString v8StringToAtomicWebCoreString(v8::Handle<v8::String> v8String)
>  {
> +    WebCoreStringResource* stringResource = static_cast<WebCoreStringResource*>(v8String->GetExternalStringResource());
> +    if (stringResource)
> +        return stringResource->atomicString();
> +
> +    int length = v8String->Length();
> +    if (!length) {
> +        // Avoid trying to morph empty strings, as they do not have enough room to contain the external reference.
> +        return StringImpl::empty();
> +    }
> +
> +    UChar* buffer;
> +    String plainResult = String::createUninitialized(length, buffer);
> +    v8String->Write(reinterpret_cast<uint16_t*>(buffer), 0, length);
> +    AtomicString result(plainResult);
> +
> +    WebCoreStringResource* resource = new WebCoreStringResource(result);
> +    if (!v8String->MakeExternal(resource)) {
> +        // In case of a failure delete the external resource as it was not used.
> +        delete resource;
> +    }
> +    return result;

This duplicates a fair amount of code from v8StringToWebCoreString. Can you factor the code to avoid this?

btw, if you create a new function (or add to an existing function) and want a "bool" parameter, please make it an enum instead.
Comment 26 Christian Plesner Hansen 2009-07-31 03:02:52 PDT
Created attachment 33865 [details]
Next iteration
Comment 27 Christian Plesner Hansen 2009-07-31 03:35:23 PDT
Fixed all issues except one.  I don't see any tabs in the ChangeLog but I had to create a new one anyway so if there were any they should be gone now.

There is one potential issue left though.  Is is correct that an atomic thread created on one thread cannot safely be used on another?  Or is it just that the operation of making a string atomic is not thread safe?  If it is the first case then as far as I can see plain String objects can't be shared either, which my changes assume they can.

> Please get the patch to be based from WebKit or else it will be harder to
> commit this for you (i.e. in this case, all path for files below should be
> prefixed with WebCore/).

On linux, using svn-create-patch from third_party/WebKit gave me an empty patch so I had to be a little creative.  Let me know if the patch gives you any problems.

> Why is it m_atomicImpl instead of just m_atomicString?

Renamed one to m_plainString and the other to m_atomicString to make it clear what the difference is.
Comment 28 David Levin 2009-07-31 09:01:29 PDT
Comment on attachment 33865 [details]
Next iteration

Just a few more thigns.

> Index: WebCore/ChangeLog
> +2009-07-31  Christian Plesner Hansen  <christian.plesner.hansen@gmail.com>
> +
> +        Reviewed by NOBODY (OOPS!).
> +
> +        [V8] Cache atomic strings in externalized v8 strings
> +        https://bugs.webkit.org/show_bug.cgi?id=27762
> +
> +        * bindings/scripts/CodeGeneratorV8.pm:
> +        * bindings/v8/V8Binding.cpp:
> +        (WebCore::WebCoreStringResource::WebCoreStringResource):
> +        (WebCore::WebCoreStringResource::~WebCoreStringResource):
> +        (WebCore::WebCoreStringResource::data):
> +        (WebCore::WebCoreStringResource::length):
> +        (WebCore::WebCoreStringResource::webcoreString):
> +        (WebCore::WebCoreStringResource::atomicString):
> +        (WebCore::WebCoreStringResource::forString):
> +        (WebCore::v8StringToWebCoreString):
> +        (WebCore::v8StringToAtomicWebCoreString):
> +        (WebCore::v8ValueToWebCoreString):
> +        (WebCore::v8ValueToAtomicWebCoreString):

I think this is ok for this change, but in general (brief) per function comments are encouraged to explain what happened in the function.  See other entries in the change log.

> Index: WebCore/bindings/v8/V8Binding.cpp

> @@ -49,33 +49,66 @@ namespace WebCore {
>  class WebCoreStringResource : public v8::String::ExternalStringResource {
>  public:
>      explicit WebCoreStringResource(const String& string)
> -        : m_impl(string.impl())
> +        : m_plainString(string)
>      {
Nice to add assert here:
        ASSERT(WTF::isMainThread());

>          v8::V8::AdjustAmountOfExternalAllocatedMemory(2 * length());
>      }
>  
> +    explicit WebCoreStringResource(const AtomicString& string)
> +        : m_plainString(string)
> +        , m_atomicString(string)
> +    {
> +        ASSERT(WTF::isMainThread());
> +        v8::V8::AdjustAmountOfExternalAllocatedMemory(2 * length());
> +    }
> +
>      virtual ~WebCoreStringResource()
>      {
> -        v8::V8::AdjustAmountOfExternalAllocatedMemory(-2 * length());

Nice to add assert here:
        ASSERT(WTF::isMainThread());

(Really all we care about is that everything is on the same thread and for now this will do.)


> +        int reducedExternalMemory = -2 * length();
> +        if (!m_plainString.impl()->inTable())
> +            reducedExternalMemory *= 2;
> +        v8::V8::AdjustAmountOfExternalAllocatedMemory(reducedExternalMemory);
>      }



> +    AtomicString atomicString()
> +    {
> +        ASSERT(WTF::isMainThread());
> +        if (m_atomicString.isNull()) {
> +            m_atomicString = AtomicString(m_plainString);
> +            if (m_plainString.impl()->inTable())

This should be
   if (!m_plainString.impl()->inTable())

> +                v8::V8::AdjustAmountOfExternalAllocatedMemory(2 * length());
> +        }
> +        return m_atomicString;
> +    }
> +
> +    static WebCoreStringResource *forString(v8::Handle<v8::String> v8String)

Move the "*"
   static WebCoreStringResource*

Also the name "forString" doesn't read that well to me. How about toStringResource?  (toString may have worked but there are so many *String* in here, that I thought it would be nice to distinguish.)


> +AtomicString v8StringToAtomicWebCoreString(v8::Handle<v8::String> v8String)
> +{
> +    WebCoreStringResource* stringResource = WebCoreStringResource::forString(v8String);
> +    if (!stringResource) {
> +        // If this string hasn't been externalized we force it now.

Nice to add a comma after the if clause.
   If this string hasn't been externalized, we force it now.



And as discussed, this solution removed the duplicate code but increased memory usage unnecessarily.
Comment 29 Christian Plesner Hansen 2009-07-31 09:12:12 PDT
Created attachment 33879 [details]
Fixed memory waste etc.
Comment 30 Christian Plesner Hansen 2009-07-31 09:15:03 PDT
> This should be
>    if (!m_plainString.impl()->inTable())

Eek, how did I get this wrong?!?

> Also the name "forString" doesn't read that well to me. How about
> toStringResource?  (toString may have worked but there are so many *String* in
> here, that I thought it would be nice to distinguish.)

That's a v8-ism inherited, I believe, from SmallTalk.  I've changed it.

> And as discussed, this solution removed the duplicate code but increased memory
> usage unnecessarily.

I've changed it to avoid that.
Comment 31 David Levin 2009-07-31 09:20:13 PDT
Comment on attachment 33879 [details]
Fixed memory waste etc.

If you want a review, it is good to set the r? (or no one will see this).  I'm adding it for you.
Comment 32 Christian Plesner Hansen 2009-07-31 09:21:56 PDT
Thanks!  I thought that was just the first time.
Comment 33 David Levin 2009-07-31 09:40:06 PDT
Comment on attachment 33879 [details]
Fixed memory waste etc.

Just a few things that can be fixed on landing..


> Index: WebCore/ChangeLog
> ===================================================================
> +2009-07-31  Christian Plesner Hansen  <christian.plesner.hansen@gmail.com>
> +
> +        Reviewed by NOBODY (OOPS!).
> +
> +        [V8] Cache atomic strings in externalized v8 strings
> +        https://bugs.webkit.org/show_bug.cgi?id=27762
> +

I should have said this before, but typically there is a comment here about why no layout tests were added if none were.

In this case, there is no change in functionality, so you could say:

        No change in behavior, so no tests.


> Index: WebCore/bindings/v8/V8Binding.cpp
> ===================================================================

> +String v8StringToWebCoreString(v8::Handle<v8::String> v8String, ExternalMode external,
> +        StringType type)

This is typically indented to align with the (.



> Index: WebCore/bindings/v8/V8Binding.h

Enum members should user InterCaps with an initial capital letter.

> +    enum ExternalMode { EXTERNALIZE, DONT_EXTERNALIZE };

Externalize, DoNotExternalize

> +    enum StringType { PLAIN, ATOMIC };

Since these enum are in the WebCore namespace (and cased like classes), it feels like the names for StringType should be a little more verbose.

Here's some simple alternatives:
PlainStringType
AtomicStringType
Comment 34 Eric Seidel (no email) 2009-07-31 10:03:22 PDT
Comment on attachment 33879 [details]
Fixed memory waste etc.

I still disagree with this approach.  1.  it seems silly to hold two StringImpl pointers, one in a String and one in an AtomicString.  2.  I think this is wrong to "fix" this issue just for V8.  If AtomicString::add is really such an issue, it should be fixed using StringImpl changes instead of in the individual JS engines.
Comment 35 Eric Seidel (no email) 2009-07-31 10:08:10 PDT
Comment on attachment 33879 [details]
Fixed memory waste etc.

You still haven't convinced me that this need to be fixed in V8 and not in WebCore.
Comment 36 Eric Seidel (no email) 2009-07-31 10:10:42 PDT
Do you have a copy of your test so that one of us can run it in a regular WebKit (non-chomium) build and see if this is really just a V8 problem?
Comment 37 David Levin 2009-07-31 11:41:00 PDT
Created attachment 33890 [details]
GetElementById benchmark

Responding to the several comments asking for the benchmark:

Christian told me where to get the benchmark.  I did a few minor changes (added the explanation text that he told me and made it one file instead of having a separate one for the js so that it was more self contained), so I take credit for any ugliness, mistakes, etc.
Comment 38 David Levin 2009-07-31 16:32:23 PDT
I tried to be brief as possible in this, but it still was long. Sorry.

My overall conclusion:
  If devs want to increase the perf of code using v8 bindings by taking advantage of a v8 feature and there doesn't seem a clear path to do something more generic, that seems like a really reasonable thing to do and not require an equivalent change in JSC. (The reverse goes without saying.)

Details:
1. The big issue seems to be "Why is this needed in V8 and not in something done for both JSC and V8?"

  For V8, this patch solution uses v8::string::MakeExternal (which stores the WebCore String and uses that buffer pointer inside of V8).

  JSC uses features of UString to speed up this code (-- for example, keeping the precomputed hash).  In addition, I believe that JSC could do even by taking advantage of the fact that it can share the UChar* buffer from the UString.

  After debugging both JSC and V8 on the benchmar, I saw no significant difference in way that AtomicStrings are formed from UStrings vs String, but the code path for UString was obviously invested in to make sure it preformed well.

  I'm not sure why this seemed to be a good optimization for Chromium (and has not been for Safari), but if there isn't something apparent in AtomicString that can be fixed, then it seems reasonable to use a feature of v8::String to handle it.

2. The second issue was "It seems silly to hold two StringImpl pointers, one in a String and one in an AtomicString."

  A. The WebCoreStringResource may be created with a WebCore::String to begin with.  (It doesn't make sense to incur the cost of an AtomicString if it isn't need.)

  B. When WebCoreStringResource is created, it calls MakeExternal with that WebCore::String.
  At this point the buffer in that WebCore::String must stay around for as long as the v8::String is alive.  

  C. Later, that same string value may come from v8 and want to be AtomicString.  In that case, the AtomicString is cached as well due to performance reasons.
Comment 39 Eric Seidel (no email) 2009-08-03 10:32:19 PDT
Ok.  Talking with Dave Levin over IM:

1.  JSC has this same hot-spot.
2.  JSC based WebKit is 4x as fast as V8 WebKit on this same benchmark.
3.  It's unclear what other things v8 is doing to make it so slow on this benchmark.
4.  We could add this optimization to JSC in a similar way by converting UString's hash into a hash/stringimpl* union and keeping around a bit to know which it was.  Or we could add a StringImpl* to UString.  Neither seems necessary at this point.

I no-longer object to this change.  I'm still confused as to if this is what accounts for the 4x slowdown in v8 vs. JSC on this benchmark.
Comment 40 David Levin 2009-08-03 10:44:02 PDT
I'll do a profile of chromium to compare to the one I did for webkit using nightlies and try to see why there is a huge difference here.
Comment 41 Eric Seidel (no email) 2009-08-03 10:44:57 PDT
Your free to move forward with this change, btw.  I no longer mean to stand in your way.
Comment 42 Eric Seidel (no email) 2009-08-03 12:29:31 PDT
DEBUG r46722 WebKit (JSC) with Safari Version 4.0 (5530.17, 532+) on my machine:
Runs:      10
Average:   0.5603
Deviation: 0.1520%
Values:	   0.5602 0.5605 0.5605 0.5587 0.5596 0.5599 0.5596 0.5612 0.5618 0.5609

RELEASE r46722 WebKit (JSC) with Safari Version 4.0 (5530.17, 532+) on my machine:
Runs:      10
Average:   6.580
Deviation: 0.2311%
Values:    6.591 6.585 6.585 6.579 6.542 6.591 6.585 6.585 6.560 6.591

RELEASE Chrome 3.0.196.0 for Mac:
Runs:      10
Average:   1.610
Deviation: 0.5771%
Values:    1.599 1.594 1.610 1.608 1.609 1.618 1.623 1.623 1.610 1.604

Larger numbers are better.
Comment 43 Eric Seidel (no email) 2009-08-03 12:32:58 PDT
Created attachment 33998 [details]
Shark sample of benchmark in WebKit r46722 built RELEASE for Mac with JSC

AtomicString::add is 4th on the list for JSC.  Most of the time seems to be spent ref-counting and looking up nodes via toJS(Node*)
Comment 44 Eric Seidel (no email) 2009-08-03 12:37:13 PDT
I don't have a version of Chrome with symbols, so although I made a Shark sample for that too, it's useless.
Comment 45 David Levin 2009-08-03 13:24:22 PDT
Comment on attachment 33879 [details]
Fixed memory waste etc.

Per conversation with Eric and the comments in this bug, I'm adding my r+ back to this patch.
Comment 46 David Levin 2009-08-03 17:44:57 PDT
Assigning to levin to fix style issue on landing.
Comment 47 David Levin 2009-08-03 22:42:15 PDT
http://trac.webkit.org/changeset/46745