RESOLVED FIXED 40172
HTML5 parser should be within 1% of old parser performance
https://bugs.webkit.org/show_bug.cgi?id=40172
Summary HTML5 parser should be within 1% of old parser performance
Adam Barth
Reported 2010-06-04 10:27:07 PDT
HTML5 parser should be within 1% of old parser performance
Attachments
Patch (9.74 KB, patch)
2010-06-04 10:30 PDT, Adam Barth
no flags
Patch (11.15 KB, patch)
2010-06-04 11:19 PDT, Adam Barth
no flags
Patch (11.15 KB, patch)
2010-06-04 13:17 PDT, Adam Barth
abarth: review-
patch with check converted to assert (11.67 KB, patch)
2010-06-04 22:48 PDT, Adam Barth
darin: review+
Adam Barth
Comment 1 2010-06-04 10:30:06 PDT
Eric Seidel (no email)
Comment 2 2010-06-04 10:36:03 PDT
Early Warning System Bot
Comment 3 2010-06-04 10:37:10 PDT
Eric Seidel (no email)
Comment 4 2010-06-04 10:39:27 PDT
Comment on attachment 57893 [details] Patch Seems we should understand which of these changes actually matter. The m_appropriateEndTagName one seems unrelated. Also, this changes behavior. This should fix the \0 layout tests.
Darin Adler
Comment 5 2010-06-04 10:41:35 PDT
I don’t think that we should limit ourselves to “within 1%”. It would be fine with me if it was 10% faster ;-)
Darin Adler
Comment 6 2010-06-04 10:43:36 PDT
Comment on attachment 57893 [details] Patch Looks like this doesn't compile either because of distinct Vector types. To assign between vectors, currently the inline capacities need to be identical. Also, it's likely to be inefficient to assign between vectors with inline capacities since the characters need to be copied. We could add code to make assignment between vectors with different inline capacities possible, but it might be worth looking at a design that to uses pointers a bit more so that is not needed.
WebKit Review Bot
Comment 7 2010-06-04 10:59:56 PDT
Adam Barth
Comment 8 2010-06-04 11:03:50 PDT
There is a function for assigning between vectors of different inline capacities: template<typename T, size_t inlineCapacity> template<size_t otherCapacity> Vector<T, inlineCapacity>& Vector<T, inlineCapacity>::operator=(const Vector<T, otherCapacity>& other) I had to change this locally to get it to build, but I must have lost the change somehow.
Adam Barth
Comment 9 2010-06-04 11:07:25 PDT
> Also, it's likely to be inefficient to assign between vectors with inline capacities since the > characters need to be copied. We could add code to make assignment between vectors with > different inline capacities possible, but it might be worth looking at a design that to uses > pointers a bit more so that is not needed. That's an interesting idea. I'd have to look at the code to how we can avoid more copies. For the moment, this patch seems like a big perf win, so I'm inclined to start with it (once it builds) and then iterate.
Adam Barth
Comment 10 2010-06-04 11:12:01 PDT
Benchmark numbers: HTML5 parser w/ patch: Running 20 times Ignoring warm-up run (3176) 2805 2795 2810 2810 2789 2817 2822 2814 2811 2840 2826 2826 2831 2845 2841 2832 2849 2857 2849 2841 avg 2825.5 stdev 18.481071397513727 Old parser: Running 20 times Ignoring warm-up run (3103) 2769 2781 2770 2771 2786 2783 2775 2780 2796 2787 2784 2808 2799 2795 2814 2807 2810 2805 2833 2822 avg 2793.75 stdev 17.699929378390188
Adam Barth
Comment 11 2010-06-04 11:19:27 PDT
WebKit Review Bot
Comment 12 2010-06-04 12:24:53 PDT
Eric Seidel (no email)
Comment 13 2010-06-04 13:08:28 PDT
../../JavaScriptCore/wtf/Vector.h: In member function ‘WTF::Vector<T, inlineCapacity>& WTF::Vector<T, inlineCapacity>::operator=(const WTF::Vector<T, otherCapacity>&) [with unsigned int otherCapacity = 1024u, T = short unsigned int, unsigned int inlineCapacity = 32u]’: ../../WebCore/html/HTML5Lexer.cpp:1589: instantiated from here ../../JavaScriptCore/wtf/Vector.h:706: error: invalid cast from type ‘const WTF::Vector<short unsigned int, 1024u>’ to type ‘const void*’ make: *** [out/Release/obj.target/webcore/../../WebCore/html/HTML5Lexer.o] Error 1
Eric Seidel (no email)
Comment 14 2010-06-04 13:09:42 PDT
Comment on attachment 57899 [details] Patch Would be nice to know which part of this is actually the big speedup. In general I like the concept, but can't really r+ since it would break the world. :(
Adam Barth
Comment 15 2010-06-04 13:17:58 PDT
WebKit Review Bot
Comment 16 2010-06-04 13:36:47 PDT
Darin Adler
Comment 17 2010-06-04 17:34:13 PDT
Comment on attachment 57908 [details] Patch Why reinterpret_cast? I'm pretty sure that static_cast will work. Another possibility would be to use local variables or a helper function then you would need no casting at all. // Helper function that does a type conversion. inline bool typelessPointersAreEqual(const void* a, const void* b) { return a == b; } if (typelessPointersAreEqual(&other, this)) return *this; I think that's way better than a reinterpret_cast.
Adam Barth
Comment 18 2010-06-04 19:14:42 PDT
> I think that's way better than a reinterpret_cast. Ok. They're pretty close semantically because static_cast to void* doesn't do any of the multi-inheritance offsets that static_cast usually does. Using a helper function is just getting the compiler to generate the static_cast implicitly.
Adam Barth
Comment 19 2010-06-04 19:21:36 PDT
Actually, the more I think about it, that check can probably be removed. If the objects occupy the same address, they must have the same inline capacity, which means we'll call the more specific template. We can change it into an ASSERT if we're scared of just deleting it.
Darin Adler
Comment 20 2010-06-04 22:27:21 PDT
(In reply to comment #18) > They're pretty close semantically because static_cast to void* doesn't do any of the multi-inheritance offsets that static_cast usually does. Sure, what the two casts do is pretty close. But the point of the differently named casts is more about cases where they the casts can't be used be used. A reinterpret_cast could convert an integer to a pointer for example, but when you see a static_cast you know that’s not going on even without looking at the rest of the line of code. I don’t want to see these sorts of conversions to void* when I grep through the code for all reinterpret_casts. > Using a helper function is just getting the compiler to generate the static_cast implicitly. That’s a funny way to put it. In my opinion, relying on normal type conversion rules and avoiding a cast is not “just a way to generate [a cast]”. Casts can force the compiler to perform conversions that normally would not be done. Type conversion does a more limited set. It’s great, though, that you can avoid the check entirely. I do think you should just omit it.
Adam Barth
Comment 21 2010-06-04 22:48:12 PDT
Created attachment 57962 [details] patch with check converted to assert
Adam Barth
Comment 22 2010-06-04 22:57:48 PDT
Thanks Darin.
Adam Barth
Comment 23 2010-06-04 23:26:03 PDT
Note You need to log in before you can comment on or make changes to this bug.