Bug 30261

Summary: Optimization: Reduce calls to StringImpl's upper / lower / isLower
Product: WebKit Reporter: Jens Alfke <jens>
Component: WebCore Misc.Assignee: Jens Alfke <jens>
Status: RESOLVED FIXED    
Severity: Enhancement CC: commit-queue, darin, eric, ojan, simon.fraser
Priority: P3    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 48789    
Attachments:
Description Flags
patch
darin: review+
patch 2
darin: review+
patch 3 none

Description Jens Alfke 2009-10-09 14:38:17 PDT
* Darin pointed out that my recent optimization to StringImpl::lower() obviates the need for the isLower() check, which now just wastes time. It was only being used in a handful of places, so that was easy to do.
* I added lower() and upper() methods to AtomicString, which check whether the underlying StringImpl call was a no-op, and if so just return the receiver instead of doing the hash lookup again.
* By far the major usage of StringImpl::upper (hundreds of times loading yahoo.com, 1600 loading pitchfork.com) was in HTMLElement::nodeName(), to uppercase the tag name. There was already a FIXME comment suggesting that this conversion could be cached. I did so by adding a cache field to QualifiedName's impl, and an accessor localNameUpper(). This reduced the number of uppercase conversions to about 30 for a typical page, and since QualifiedNameImpls are themselves cached, the number of conversions stays low; I saw only 69 after surfing through a number of pages.
Comment 1 Jens Alfke 2009-10-09 19:52:02 PDT
Created attachment 40980 [details]
patch
Comment 2 Eric Seidel (no email) 2009-10-10 19:40:45 PDT
Comment on attachment 40980 [details]
patch

Makes sense to me.  Do we have any numbers from PLT runs to confirm this makes things faster?  I'm happy to r+ this, but I think that Darin should at least get a chance to see it.
Comment 3 Jens Alfke 2009-10-10 21:43:15 PDT
My concerns were more about memory usage — even small temporary strings can lead to memory fragmentation and increased high-water-marks — but it should have some nonzero effect on pure speed too.

Is there documentation on running the PLT? I could give it a shot.
Comment 4 Eric Seidel (no email) 2009-10-10 21:47:39 PDT
Running the PLT in safari is non-trivial.  First you have to actually have a copy of the PLT.  Quieting your machine (so that the results are stable) is the most difficult part.  Apple has some scripts for this sort of thing, some of which are in svn.webkit.org:
http://trac.webkit.org/browser/trunk/WebKitTools/Scripts/run-pageloadtest

Probably easier to use Google's page cycler bots, but you'd have to find out of the try-bots run the performance tests.  If they do, then you could easily get numbers that way.
Comment 5 Jens Alfke 2009-10-11 10:07:03 PDT
Thanks, Eric. I'll give that a try on Monday.
Comment 6 Darin Adler 2009-10-11 20:18:30 PDT
Comment on attachment 40980 [details]
patch

>      const AtomicString& prefix() const { return m_impl->m_prefix; }
>      const AtomicString& localName() const { return m_impl->m_localName; }
> +    const AtomicString& localNameUpper() const;
>      const AtomicString& namespaceURI() const { return m_impl->m_namespace; }

I think localNameUpper should get its own paragraph in the class definition. The other three pieces are the fundamental building blocks that make up a QualifiedName, whereas localNameUpper is just a convenience function. Having it int he same paragraph gives it inappropriately high status ;-)

Are you sure the space/speed tradeoff is right on QualifiedName::localNameUpper? How did you gauge that?

I'll say r=me on this but I do look forward to hearing how the performance testing comes out.

I could imagine a different implementation that actually knows how to search the atomic string hash table without allocating a new string. There could be a fixed size stack buffer where we lowercase. Then we'd avoid the allocation in the case that the AtomicString already exists but be slower for the case where it does not. It's not clear this is important. It all depends on how hot the AtomicString::lower and upper functions are and how often they are called on a string that happens to already be in the atomic string table.
Comment 7 Jens Alfke 2009-10-13 13:42:58 PDT
Darin, thanks for nudging me to do some performance testing. I've been running Dromaeo, since there was a scare about my lower() optimization possibly affecting it.

Turns out that taking out isLower did affect Dromaeo because the initial test for no-op in my optimized lower() was slower than the existing isLower code, and also the AtomicString::lower method was doing more refcount fu than necessary. I spent some time tweaking the code and running both Dromaeo and micro-benchmarks, and got the performance to be equal to what it is without this patch.
Comment 8 Jens Alfke 2009-10-13 13:43:08 PDT
Created attachment 41124 [details]
patch 2
Comment 9 Darin Adler 2009-10-13 15:46:31 PDT
Comment on attachment 41124 [details]
patch 2

> +    // Note: This is a hot function in the Dromaeo benchmark.
> +    StringImpl *myImpl = impl();

The * here needs to be next to StringImpl, not next to myImpl.

Informally, the style in WebKit code is to do it like this:

    StringImpl* impl = this->impl();

Rather than using a prefix like "my".

> -    for (int i = 0; i < length; i++) {
> -        UChar c = m_data[i];
> -        ored |= c;
> -        noUpper = noUpper && !isASCIIUpper(c);
> +    for (const UChar* chp = m_data, *end = m_data + m_length; chp != end; chp++) {
> +        if (UNLIKELY(isASCIIUpper(*chp)))
> +            noUpper = false;
> +        ored |= *chp;
>      }

I suggest declaring the end pointer outside the loop to avoid the multiple initialization in the for, which is something we almost never do.

We've found in the past that indexing is as efficient as pointer chasing, so I'm surprised that this change was needed to speed things up.

r=me but you should consider both comments above. Some reviewers might review- just based on the StringImpl* part ;-)
Comment 10 Jens Alfke 2009-10-13 16:18:58 PDT
Created attachment 41136 [details]
patch 3

OK, this fixes those issues. I'm usually reluctant to shadow a member name with a local, but if that's the convention I'll change it.

I wrote a micro-benchmark that just runs that one method and the pointer-chasing was faster. Most of the added speed comes from the change to the way noUpper is set, though; you'd think the two forms would optimize the same, but they don't.
Comment 11 Yong Li 2009-10-19 08:44:49 PDT
Comment on attachment 41136 [details]
patch 3

Let commit-bot commit it
Comment 12 WebKit Commit Bot 2009-10-19 12:11:45 PDT
Comment on attachment 41136 [details]
patch 3

Clearing flags on attachment: 41136

Committed r49798: <http://trac.webkit.org/changeset/49798>
Comment 13 WebKit Commit Bot 2009-10-19 12:11:48 PDT
All reviewed patches have been landed.  Closing bug.
Comment 14 Simon Fraser (smfr) 2010-05-16 18:16:10 PDT
This change added a new pointer to QualifiedName, which in turn affects Attribute, MappedAttribute etc. We make thousands of these, so there's significant memory impact. I'm not sure adding m_localNameUpper was the best solution here.
Comment 15 Simon Fraser (smfr) 2010-05-16 20:22:26 PDT
Running the first 29 membuster tests, I see around 800 QualifiedNameImpls live. That's about 6Kb more memory in 64-bit with this change.
Comment 16 Eric Seidel (no email) 2010-11-01 16:06:40 PDT
Turns out this optimization caused a crash regression.  We just only discovered it a year later. :)  See bug 48789.