WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
50516
Avoid strlen() in AtomicString::fromUTF8
https://bugs.webkit.org/show_bug.cgi?id=50516
Summary
Avoid strlen() in AtomicString::fromUTF8
Patrick R. Gansterer
Reported
2010-12-04 05:40:34 PST
see patch
Attachments
Patch
(4.58 KB, patch)
2010-12-04 05:51 PST
,
Patrick R. Gansterer
no flags
Details
Formatted Diff
Diff
XML parser benchmark
(1.96 KB, text/html)
2010-12-16 02:20 PST
,
Patrick R. Gansterer
no flags
Details
Patch
(6.65 KB, patch)
2011-01-16 12:39 PST
,
Patrick R. Gansterer
no flags
Details
Formatted Diff
Diff
Patch
(6.66 KB, patch)
2011-01-16 13:26 PST
,
Patrick R. Gansterer
darin
: review+
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Patrick R. Gansterer
Comment 1
2010-12-04 05:51:36 PST
Created
attachment 75606
[details]
Patch
Patrick R. Gansterer
Comment 2
2010-12-04 06:18:53 PST
about 2% performance improvement
Build Bot
Comment 3
2010-12-04 07:04:19 PST
Attachment 75606
[details]
did not build on win: Build output:
http://queues.webkit.org/results/6732040
Eric Seidel (no email)
Comment 4
2010-12-12 02:33:02 PST
Comment on
attachment 75606
[details]
Patch Woh. Nearly a page of code for "2%"? On what benchmark? w/o more explanation, this is clearly not worth it.
Darin Adler
Comment 5
2010-12-13 10:05:39 PST
(In reply to
comment #2
)
> about 2% performance improvement
2% improvement by what measure on what platform? I don’t agree with Eric in that I do think this change is probably a good idea. We can probably factor things so that a lot more of the code is shared and still retain the speed. But I’d like to understand more about what is made faster by this.
Patrick R. Gansterer
Comment 6
2010-12-13 10:16:46 PST
(In reply to
comment #5
)
> (In reply to
comment #2
) > > about 2% performance improvement > > 2% improvement by what measure on what platform?
I did a javascript loop with (new DOMParser()).parseFromString("...", "text/xml") and did the same time measuremet like in WebCore/benchmarks/parser on my MacBook Pro. I'll upload a test.
Eric Seidel (no email)
Comment 7
2010-12-13 11:36:46 PST
I think Darin and I actually agree. :)
Patrick R. Gansterer
Comment 8
2010-12-16 02:20:10 PST
Created
attachment 76745
[details]
XML parser benchmark
Patrick R. Gansterer
Comment 9
2010-12-16 02:33:13 PST
I've done the benchmark now again and got over 3% only with this patch. Tested on a MacBook Pro 2.4 GHz, 4GB RAM, OS X 10.6.5. avg median stdev OS X 10.6 original >4000 >4000 ;-)
r74063
2185.72 2182 12.126895728091345
r74063
with patch 2108.01 2095.5 99.8814792641759 -3.56% -3,96%
Darin Adler
Comment 10
2010-12-16 10:54:21 PST
I feel conflicted about this. I like the idea of having a fast code path. But we’ve long believed main cause of slowness with XML parsing is the approach of using libxml2. A serious effort to speed up XML parsing would probably involve switching to a new parser or creating our own. Since one of the main issues with this patch is a bit too much replicated code, we should refactor this so we can share the code. We can possibly make the two fromUT8 functions share more code by using an inline function for everything after the call to calculateStringHashFromUTF8 or use a function template. Similarly, we can make calculateStringHashFromUTF8 share almost all the code by using a function template that is inlined. I think if we do that we can land this, but maybe we shouldn’t land it as is. It’s good to know it’s a measurable speedup on an XML parsing micro-benchmark.
Eric Seidel (no email)
Comment 11
2010-12-16 14:04:59 PST
My concerns are with maintainability of this code. The benchmark needs to be checked in, so that 5 years form now, others can validate that this is still a hot spot. The code needs to be well factored so that it can be re-used, and possibly removed at some future time when this is not a hot spot. We've certainly added and removed optimizations over the years. Each page of code we add to webkit has a maintenance cost. I'm hoping to minimize that cost. :) A recent example was the HTML parser. Which had all sorts of crazy optimizations which we dropped on the HTML5 parser re-write. Because testing (and finally splitting out the code into classes) showed they didn't matter. If what Darin says is true (that this isn't really the important hotspot for XML, rather libxml is), if a big attempt to optimize XML really replaced our libxml implementation, this code might not be needed at all (and thus simply blulk). But you've seen the shark profiles and I havent. :)
Patrick R. Gansterer
Comment 12
2011-01-16 12:39:08 PST
Created
attachment 79107
[details]
Patch
Early Warning System Bot
Comment 13
2011-01-16 12:50:00 PST
Attachment 79107
[details]
did not build on qt: Build output:
http://queues.webkit.org/results/7529117
Patrick R. Gansterer
Comment 14
2011-01-16 13:26:49 PST
Created
attachment 79110
[details]
Patch
Patrick R. Gansterer
Comment 15
2011-02-01 10:11:29 PST
@darin: ping?
Darin Adler
Comment 16
2011-02-01 10:39:08 PST
Comment on
attachment 79110
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=79110&action=review
> Source/JavaScriptCore/wtf/unicode/UTF8.cpp:334 > + if (dataLength) > + *dataLength += 1;
Seems better to not have if statements like this inside relatively hot loops. We should instead have callers supply a place to put the length even if they won’t use it.
> Source/JavaScriptCore/wtf/unicode/UTF8.cpp:369 > + if (utf16Length) > + *utf16Length += 2;
No reason to have these branches in since utf16Length is never null.
> Source/JavaScriptCore/wtf/unicode/UTF8.h:75 > unsigned calculateStringHashFromUTF8(const char* data, const char* dataEnd, unsigned& utf16Length); > > + unsigned calculateStringHashAndLengthFromUTF8(const char* data, unsigned& dataLength, unsigned& utf16Length);
I’d group these together rather than leaving a blank line.
Patrick R. Gansterer
Comment 17
2011-02-01 14:03:32 PST
Committed
r77297
: <
http://trac.webkit.org/changeset/77297
>
Patrick R. Gansterer
Comment 18
2011-02-03 08:03:14 PST
Committed
r77491
: <
http://trac.webkit.org/changeset/77491
>
WebKit Review Bot
Comment 19
2011-02-03 08:16:53 PST
http://trac.webkit.org/changeset/77491
might have broken Leopard Intel Release (Build)
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug