30261 – Optimization: Reduce calls to StringImpl's upper / lower / isLower

RESOLVED FIXED 30261

Optimization: Reduce calls to StringImpl's upper / lower / isLower

https://bugs.webkit.org/show_bug.cgi?id=30261

Summary Optimization: Reduce calls to StringImpl's upper / lower / isLower

Jens Alfke

Reported 2009-10-09 14:38:17 PDT

* Darin pointed out that my recent optimization to StringImpl::lower() obviates the need for the isLower() check, which now just wastes time. It was only being used in a handful of places, so that was easy to do. * I added lower() and upper() methods to AtomicString, which check whether the underlying StringImpl call was a no-op, and if so just return the receiver instead of doing the hash lookup again. * By far the major usage of StringImpl::upper (hundreds of times loading yahoo.com, 1600 loading pitchfork.com) was in HTMLElement::nodeName(), to uppercase the tag name. There was already a FIXME comment suggesting that this conversion could be cached. I did so by adding a cache field to QualifiedName's impl, and an accessor localNameUpper(). This reduced the number of uppercase conversions to about 30 for a typical page, and since QualifiedNameImpls are themselves cached, the number of conversions stays low; I saw only 69 after surfing through a number of pages.

Attachments
patch (8.77 KB, patch) 2009-10-09 19:52 PDT, Jens Alfke	darin: review+	Details Formatted Diff Diff
patch 2 (9.69 KB, patch) 2009-10-13 13:43 PDT, Jens Alfke	darin: review+	Details Formatted Diff Diff
patch 3 (9.71 KB, patch) 2009-10-13 16:18 PDT, Jens Alfke	no flags	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

Jens Alfke

Comment 1 2009-10-09 19:52:02 PDT

Created attachment 40980 [details] patch

Eric Seidel (no email)

Comment 2 2009-10-10 19:40:45 PDT

Comment on attachment 40980 [details] patch Makes sense to me. Do we have any numbers from PLT runs to confirm this makes things faster? I'm happy to r+ this, but I think that Darin should at least get a chance to see it.

Jens Alfke

Comment 3 2009-10-10 21:43:15 PDT

My concerns were more about memory usage — even small temporary strings can lead to memory fragmentation and increased high-water-marks — but it should have some nonzero effect on pure speed too. Is there documentation on running the PLT? I could give it a shot.

Eric Seidel (no email)

Comment 4 2009-10-10 21:47:39 PDT

Running the PLT in safari is non-trivial. First you have to actually have a copy of the PLT. Quieting your machine (so that the results are stable) is the most difficult part. Apple has some scripts for this sort of thing, some of which are in svn.webkit.org: http://trac.webkit.org/browser/trunk/WebKitTools/Scripts/run-pageloadtest Probably easier to use Google's page cycler bots, but you'd have to find out of the try-bots run the performance tests. If they do, then you could easily get numbers that way.

Jens Alfke

Comment 5 2009-10-11 10:07:03 PDT

Thanks, Eric. I'll give that a try on Monday.

Darin Adler

Comment 6 2009-10-11 20:18:30 PDT

Comment on attachment 40980 [details] patch > const AtomicString& prefix() const { return m_impl->m_prefix; } > const AtomicString& localName() const { return m_impl->m_localName; } > + const AtomicString& localNameUpper() const; > const AtomicString& namespaceURI() const { return m_impl->m_namespace; } I think localNameUpper should get its own paragraph in the class definition. The other three pieces are the fundamental building blocks that make up a QualifiedName, whereas localNameUpper is just a convenience function. Having it int he same paragraph gives it inappropriately high status ;-) Are you sure the space/speed tradeoff is right on QualifiedName::localNameUpper? How did you gauge that? I'll say r=me on this but I do look forward to hearing how the performance testing comes out. I could imagine a different implementation that actually knows how to search the atomic string hash table without allocating a new string. There could be a fixed size stack buffer where we lowercase. Then we'd avoid the allocation in the case that the AtomicString already exists but be slower for the case where it does not. It's not clear this is important. It all depends on how hot the AtomicString::lower and upper functions are and how often they are called on a string that happens to already be in the atomic string table.

Jens Alfke

Comment 7 2009-10-13 13:42:58 PDT

Darin, thanks for nudging me to do some performance testing. I've been running Dromaeo, since there was a scare about my lower() optimization possibly affecting it. Turns out that taking out isLower did affect Dromaeo because the initial test for no-op in my optimized lower() was slower than the existing isLower code, and also the AtomicString::lower method was doing more refcount fu than necessary. I spent some time tweaking the code and running both Dromaeo and micro-benchmarks, and got the performance to be equal to what it is without this patch.

Jens Alfke

Comment 8 2009-10-13 13:43:08 PDT

Created attachment 41124 [details] patch 2

Darin Adler

Comment 9 2009-10-13 15:46:31 PDT

Comment on attachment 41124 [details] patch 2 > + // Note: This is a hot function in the Dromaeo benchmark. > + StringImpl *myImpl = impl(); The * here needs to be next to StringImpl, not next to myImpl. Informally, the style in WebKit code is to do it like this: StringImpl* impl = this->impl(); Rather than using a prefix like "my". > - for (int i = 0; i < length; i++) { > - UChar c = m_data[i]; > - ored |= c; > - noUpper = noUpper && !isASCIIUpper(c); > + for (const UChar* chp = m_data, *end = m_data + m_length; chp != end; chp++) { > + if (UNLIKELY(isASCIIUpper(*chp))) > + noUpper = false; > + ored |= *chp; > } I suggest declaring the end pointer outside the loop to avoid the multiple initialization in the for, which is something we almost never do. We've found in the past that indexing is as efficient as pointer chasing, so I'm surprised that this change was needed to speed things up. r=me but you should consider both comments above. Some reviewers might review- just based on the StringImpl* part ;-)

Jens Alfke

Comment 10 2009-10-13 16:18:58 PDT

Created attachment 41136 [details] patch 3 OK, this fixes those issues. I'm usually reluctant to shadow a member name with a local, but if that's the convention I'll change it. I wrote a micro-benchmark that just runs that one method and the pointer-chasing was faster. Most of the added speed comes from the change to the way noUpper is set, though; you'd think the two forms would optimize the same, but they don't.

Yong Li

Comment 11 2009-10-19 08:44:49 PDT

Comment on attachment 41136 [details] patch 3 Let commit-bot commit it

WebKit Commit Bot

Comment 12 2009-10-19 12:11:45 PDT

Comment on attachment 41136 [details] patch 3 Clearing flags on attachment: 41136 Committed r49798: <http://trac.webkit.org/changeset/49798>

WebKit Commit Bot

Comment 13 2009-10-19 12:11:48 PDT

All reviewed patches have been landed. Closing bug.

Simon Fraser (smfr)

Comment 14 2010-05-16 18:16:10 PDT

This change added a new pointer to QualifiedName, which in turn affects Attribute, MappedAttribute etc. We make thousands of these, so there's significant memory impact. I'm not sure adding m_localNameUpper was the best solution here.

Simon Fraser (smfr)

Comment 15 2010-05-16 20:22:26 PDT

Running the first 29 membuster tests, I see around 800 QualifiedNameImpls live. That's about 6Kb more memory in 64-bit with this change.

Eric Seidel (no email)

Comment 16 2010-11-01 16:06:40 PDT

Turns out this optimization caused a crash regression. We just only discovered it a year later. :) See bug 48789.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P3

Severity Enhancement

Classification Unclassified

Version 528+ (Nightly build)

Hardware All

OS All

Product WebKit

Component WebCore Misc.

Assignee

Jens Alfke

Reported

2009-10-09 14:38 PDT

Modified

2010-11-01 16:06 PDT History

CC List

5 users Show

URL

Keywords

Depends on

Blocks

48789

Dependencies

tree graph