Optimize URLParser
Created attachment 289143 [details] Patch
Comment on attachment 289143 [details] Patch Incomplete and needs tests. Based partially on feedback from https://bugs.webkit.org/show_bug.cgi?id=161968
Created attachment 289419 [details] Patch
There's probably a better way to encode invalid code points. I'm not familiar with it, though.
By the way, just using the result of U8_APPEND_UNSAFE with invalid code points matches the existing URL::parse. Should we just keep doing that, or try to match Chrome and Firefox?
Created attachment 289421 [details] Patch
Created attachment 289424 [details] Patch
Comment on attachment 289424 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=289424&action=review r=me > Source/WebCore/ChangeLog:8 > + Covered by new API tests. What's the benchmark result? > Source/WebCore/platform/URLParser.cpp:468 > +const size_t replacementCharacterUTF8PercentEncodedLength = 9; you can use sizeof(replacementCharacterUTF8PercentEncoded) - 1 here.
Created attachment 289433 [details] Patch
(In reply to comment #8) > Comment on attachment 289424 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=289424&action=review > > r=me > > > Source/WebCore/ChangeLog:8 > > + Covered by new API tests. > > What's the benchmark result? This is about a 5% speedup on my url parsing benchmark. > > > Source/WebCore/platform/URLParser.cpp:468 > > +const size_t replacementCharacterUTF8PercentEncodedLength = 9; > > you can use sizeof(replacementCharacterUTF8PercentEncoded) - 1 here. replacementCharacterUTF8PercentEncoded is a pointer.
http://trac.webkit.org/changeset/206198
Comment on attachment 289433 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=289433&action=review > Source/WebCore/platform/URLParser.cpp:467 > +const char* replacementCharacterUTF8PercentEncoded = "%EF%BF%BD"; Should be either: const char* const replacementCharacterUTF8PercentEncoded = "%EF%BF%BD"; Or: const char replacementCharacterUTF8PercentEncoded[] = "%EF%BF%BD"; > Source/WebCore/platform/URLParser.cpp:468 > +const size_t replacementCharacterUTF8PercentEncodedLength = 9; This should not be necessary with modern compilers. Instead you should be able to use strlen. The compilers do constant folding if you call strlen on a constant string. However, because of the issue above, replacementCharacterUTF8PercentEncoded was probably not a constant string, which I suppose made this necessary/valuable. > Source/WebCore/platform/URLParser.cpp:516 > + if (!U_IS_UNICODE_CHAR(codePoint)) { I don’t think the test coverage covers all the cases in U_IS_UNICODE_CHAR: a) U+D800-U+DFFF b) U+FDD0-U+FDEF (new with Unicode 3.1) c) U+xxFFFE and U+xxFFFF where xx is 00-10, a total of 34 different code points d) any value greater than U+10FFFF I think we are focusing on (a), but we’d need to try at least some characters in (b), (c), and (d) to be sure we match what the standard calls for and confirm that this is the correct macro to use. And cover some other unusual character to make sure they are not turned into the replacement characters, such as private use characters and characters right next to the range here.