RESOLVED FIXED 50516
Avoid strlen() in AtomicString::fromUTF8
https://bugs.webkit.org/show_bug.cgi?id=50516
Summary Avoid strlen() in AtomicString::fromUTF8
Patrick R. Gansterer
Reported 2010-12-04 05:40:34 PST
see patch
Attachments
Patch (4.58 KB, patch)
2010-12-04 05:51 PST, Patrick R. Gansterer
no flags
XML parser benchmark (1.96 KB, text/html)
2010-12-16 02:20 PST, Patrick R. Gansterer
no flags
Patch (6.65 KB, patch)
2011-01-16 12:39 PST, Patrick R. Gansterer
no flags
Patch (6.66 KB, patch)
2011-01-16 13:26 PST, Patrick R. Gansterer
darin: review+
Patrick R. Gansterer
Comment 1 2010-12-04 05:51:36 PST
Patrick R. Gansterer
Comment 2 2010-12-04 06:18:53 PST
about 2% performance improvement
Build Bot
Comment 3 2010-12-04 07:04:19 PST
Eric Seidel (no email)
Comment 4 2010-12-12 02:33:02 PST
Comment on attachment 75606 [details] Patch Woh. Nearly a page of code for "2%"? On what benchmark? w/o more explanation, this is clearly not worth it.
Darin Adler
Comment 5 2010-12-13 10:05:39 PST
(In reply to comment #2) > about 2% performance improvement 2% improvement by what measure on what platform? I don’t agree with Eric in that I do think this change is probably a good idea. We can probably factor things so that a lot more of the code is shared and still retain the speed. But I’d like to understand more about what is made faster by this.
Patrick R. Gansterer
Comment 6 2010-12-13 10:16:46 PST
(In reply to comment #5) > (In reply to comment #2) > > about 2% performance improvement > > 2% improvement by what measure on what platform? I did a javascript loop with (new DOMParser()).parseFromString("...", "text/xml") and did the same time measuremet like in WebCore/benchmarks/parser on my MacBook Pro. I'll upload a test.
Eric Seidel (no email)
Comment 7 2010-12-13 11:36:46 PST
I think Darin and I actually agree. :)
Patrick R. Gansterer
Comment 8 2010-12-16 02:20:10 PST
Created attachment 76745 [details] XML parser benchmark
Patrick R. Gansterer
Comment 9 2010-12-16 02:33:13 PST
I've done the benchmark now again and got over 3% only with this patch. Tested on a MacBook Pro 2.4 GHz, 4GB RAM, OS X 10.6.5. avg median stdev OS X 10.6 original >4000 >4000 ;-) r74063 2185.72 2182 12.126895728091345 r74063 with patch 2108.01 2095.5 99.8814792641759 -3.56% -3,96%
Darin Adler
Comment 10 2010-12-16 10:54:21 PST
I feel conflicted about this. I like the idea of having a fast code path. But we’ve long believed main cause of slowness with XML parsing is the approach of using libxml2. A serious effort to speed up XML parsing would probably involve switching to a new parser or creating our own. Since one of the main issues with this patch is a bit too much replicated code, we should refactor this so we can share the code. We can possibly make the two fromUT8 functions share more code by using an inline function for everything after the call to calculateStringHashFromUTF8 or use a function template. Similarly, we can make calculateStringHashFromUTF8 share almost all the code by using a function template that is inlined. I think if we do that we can land this, but maybe we shouldn’t land it as is. It’s good to know it’s a measurable speedup on an XML parsing micro-benchmark.
Eric Seidel (no email)
Comment 11 2010-12-16 14:04:59 PST
My concerns are with maintainability of this code. The benchmark needs to be checked in, so that 5 years form now, others can validate that this is still a hot spot. The code needs to be well factored so that it can be re-used, and possibly removed at some future time when this is not a hot spot. We've certainly added and removed optimizations over the years. Each page of code we add to webkit has a maintenance cost. I'm hoping to minimize that cost. :) A recent example was the HTML parser. Which had all sorts of crazy optimizations which we dropped on the HTML5 parser re-write. Because testing (and finally splitting out the code into classes) showed they didn't matter. If what Darin says is true (that this isn't really the important hotspot for XML, rather libxml is), if a big attempt to optimize XML really replaced our libxml implementation, this code might not be needed at all (and thus simply blulk). But you've seen the shark profiles and I havent. :)
Patrick R. Gansterer
Comment 12 2011-01-16 12:39:08 PST
Early Warning System Bot
Comment 13 2011-01-16 12:50:00 PST
Patrick R. Gansterer
Comment 14 2011-01-16 13:26:49 PST
Patrick R. Gansterer
Comment 15 2011-02-01 10:11:29 PST
@darin: ping?
Darin Adler
Comment 16 2011-02-01 10:39:08 PST
Comment on attachment 79110 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=79110&action=review > Source/JavaScriptCore/wtf/unicode/UTF8.cpp:334 > + if (dataLength) > + *dataLength += 1; Seems better to not have if statements like this inside relatively hot loops. We should instead have callers supply a place to put the length even if they won’t use it. > Source/JavaScriptCore/wtf/unicode/UTF8.cpp:369 > + if (utf16Length) > + *utf16Length += 2; No reason to have these branches in since utf16Length is never null. > Source/JavaScriptCore/wtf/unicode/UTF8.h:75 > unsigned calculateStringHashFromUTF8(const char* data, const char* dataEnd, unsigned& utf16Length); > > + unsigned calculateStringHashAndLengthFromUTF8(const char* data, unsigned& dataLength, unsigned& utf16Length); I’d group these together rather than leaving a blank line.
Patrick R. Gansterer
Comment 17 2011-02-01 14:03:32 PST
Patrick R. Gansterer
Comment 18 2011-02-03 08:03:14 PST
WebKit Review Bot
Comment 19 2011-02-03 08:16:53 PST
http://trac.webkit.org/changeset/77491 might have broken Leopard Intel Release (Build)
Note You need to log in before you can comment on or make changes to this bug.