50516 – Avoid strlen() in AtomicString::fromUTF8

RESOLVED FIXED 50516

Avoid strlen() in AtomicString::fromUTF8

https://bugs.webkit.org/show_bug.cgi?id=50516

Summary Avoid strlen() in AtomicString::fromUTF8

Patrick R. Gansterer

Reported 2010-12-04 05:40:34 PST

see patch

Attachments
Patch (4.58 KB, patch) 2010-12-04 05:51 PST, Patrick R. Gansterer	no flags	Details Formatted Diff Diff
XML parser benchmark (1.96 KB, text/html) 2010-12-16 02:20 PST, Patrick R. Gansterer	no flags	Details
Patch (6.65 KB, patch) 2011-01-16 12:39 PST, Patrick R. Gansterer	no flags	Details Formatted Diff Diff
Patch (6.66 KB, patch) 2011-01-16 13:26 PST, Patrick R. Gansterer	darin: review+	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

Patrick R. Gansterer

Comment 1 2010-12-04 05:51:36 PST

Created attachment 75606 [details] Patch

Patrick R. Gansterer

Comment 2 2010-12-04 06:18:53 PST

about 2% performance improvement

Build Bot

Comment 3 2010-12-04 07:04:19 PST

Attachment 75606 [details] did not build on win: Build output: http://queues.webkit.org/results/6732040

Eric Seidel (no email)

Comment 4 2010-12-12 02:33:02 PST

Comment on attachment 75606 [details] Patch Woh. Nearly a page of code for "2%"? On what benchmark? w/o more explanation, this is clearly not worth it.

Darin Adler

Comment 5 2010-12-13 10:05:39 PST

(In reply to comment #2) > about 2% performance improvement 2% improvement by what measure on what platform? I don’t agree with Eric in that I do think this change is probably a good idea. We can probably factor things so that a lot more of the code is shared and still retain the speed. But I’d like to understand more about what is made faster by this.

Patrick R. Gansterer

Comment 6 2010-12-13 10:16:46 PST

(In reply to comment #5) > (In reply to comment #2) > > about 2% performance improvement > > 2% improvement by what measure on what platform? I did a javascript loop with (new DOMParser()).parseFromString("...", "text/xml") and did the same time measuremet like in WebCore/benchmarks/parser on my MacBook Pro. I'll upload a test.

Eric Seidel (no email)

Comment 7 2010-12-13 11:36:46 PST

I think Darin and I actually agree. :)

Patrick R. Gansterer

Comment 8 2010-12-16 02:20:10 PST

Created attachment 76745 [details] XML parser benchmark

Patrick R. Gansterer

Comment 9 2010-12-16 02:33:13 PST

I've done the benchmark now again and got over 3% only with this patch. Tested on a MacBook Pro 2.4 GHz, 4GB RAM, OS X 10.6.5. avg median stdev OS X 10.6 original >4000 >4000 ;-) r74063 2185.72 2182 12.126895728091345 r74063 with patch 2108.01 2095.5 99.8814792641759 -3.56% -3,96%

Darin Adler

Comment 10 2010-12-16 10:54:21 PST

I feel conflicted about this. I like the idea of having a fast code path. But we’ve long believed main cause of slowness with XML parsing is the approach of using libxml2. A serious effort to speed up XML parsing would probably involve switching to a new parser or creating our own. Since one of the main issues with this patch is a bit too much replicated code, we should refactor this so we can share the code. We can possibly make the two fromUT8 functions share more code by using an inline function for everything after the call to calculateStringHashFromUTF8 or use a function template. Similarly, we can make calculateStringHashFromUTF8 share almost all the code by using a function template that is inlined. I think if we do that we can land this, but maybe we shouldn’t land it as is. It’s good to know it’s a measurable speedup on an XML parsing micro-benchmark.

Eric Seidel (no email)

Comment 11 2010-12-16 14:04:59 PST

My concerns are with maintainability of this code. The benchmark needs to be checked in, so that 5 years form now, others can validate that this is still a hot spot. The code needs to be well factored so that it can be re-used, and possibly removed at some future time when this is not a hot spot. We've certainly added and removed optimizations over the years. Each page of code we add to webkit has a maintenance cost. I'm hoping to minimize that cost. :) A recent example was the HTML parser. Which had all sorts of crazy optimizations which we dropped on the HTML5 parser re-write. Because testing (and finally splitting out the code into classes) showed they didn't matter. If what Darin says is true (that this isn't really the important hotspot for XML, rather libxml is), if a big attempt to optimize XML really replaced our libxml implementation, this code might not be needed at all (and thus simply blulk). But you've seen the shark profiles and I havent. :)

Patrick R. Gansterer

Comment 12 2011-01-16 12:39:08 PST

Created attachment 79107 [details] Patch

Early Warning System Bot

Comment 13 2011-01-16 12:50:00 PST

Attachment 79107 [details] did not build on qt: Build output: http://queues.webkit.org/results/7529117

Patrick R. Gansterer

Comment 14 2011-01-16 13:26:49 PST

Created attachment 79110 [details] Patch

Patrick R. Gansterer

Comment 15 2011-02-01 10:11:29 PST

@darin: ping?

Darin Adler

Comment 16 2011-02-01 10:39:08 PST

Comment on attachment 79110 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=79110&action=review > Source/JavaScriptCore/wtf/unicode/UTF8.cpp:334 > + if (dataLength) > + *dataLength += 1; Seems better to not have if statements like this inside relatively hot loops. We should instead have callers supply a place to put the length even if they won’t use it. > Source/JavaScriptCore/wtf/unicode/UTF8.cpp:369 > + if (utf16Length) > + *utf16Length += 2; No reason to have these branches in since utf16Length is never null. > Source/JavaScriptCore/wtf/unicode/UTF8.h:75 > unsigned calculateStringHashFromUTF8(const char* data, const char* dataEnd, unsigned& utf16Length); > > + unsigned calculateStringHashAndLengthFromUTF8(const char* data, unsigned& dataLength, unsigned& utf16Length); I’d group these together rather than leaving a blank line.

Patrick R. Gansterer

Comment 17 2011-02-01 14:03:32 PST

Committed r77297: <http://trac.webkit.org/changeset/77297>

Patrick R. Gansterer

Comment 18 2011-02-03 08:03:14 PST

Committed r77491: <http://trac.webkit.org/changeset/77491>

WebKit Review Bot

Comment 19 2011-02-03 08:16:53 PST

http://trac.webkit.org/changeset/77491 might have broken Leopard Intel Release (Build)

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware All

OS All

Product WebKit

Component Web Template Framework

Assignee

Patrick R. Gansterer

Reported

2010-12-04 05:40 PST

Modified

2011-02-03 08:16 PST History

CC List

6 users Show

URL

Keywords

Depends on

51612 53538

Blocks

43085

Dependencies

tree graph