RESOLVED FIXED Bug 162105
Optimize URLParser
https://bugs.webkit.org/show_bug.cgi?id=162105
Summary Optimize URLParser
Alex Christensen
Reported 2016-09-16 16:57:40 PDT
Optimize URLParser
Attachments
Patch (15.58 KB, patch)
2016-09-16 16:58 PDT, Alex Christensen
no flags
Patch (8.24 KB, patch)
2016-09-20 18:18 PDT, Alex Christensen
no flags
Patch (8.43 KB, patch)
2016-09-20 18:36 PDT, Alex Christensen
no flags
Patch (8.66 KB, patch)
2016-09-20 19:20 PDT, Alex Christensen
no flags
Patch (9.85 KB, patch)
2016-09-20 23:30 PDT, Alex Christensen
no flags
Alex Christensen
Comment 1 2016-09-16 16:58:30 PDT
Alex Christensen
Comment 2 2016-09-16 17:00:45 PDT
Comment on attachment 289143 [details] Patch Incomplete and needs tests. Based partially on feedback from https://bugs.webkit.org/show_bug.cgi?id=161968
Alex Christensen
Comment 3 2016-09-20 18:18:24 PDT
Alex Christensen
Comment 4 2016-09-20 18:19:14 PDT
There's probably a better way to encode invalid code points. I'm not familiar with it, though.
Alex Christensen
Comment 5 2016-09-20 18:32:54 PDT
By the way, just using the result of U8_APPEND_UNSAFE with invalid code points matches the existing URL::parse. Should we just keep doing that, or try to match Chrome and Firefox?
Alex Christensen
Comment 6 2016-09-20 18:36:03 PDT
Alex Christensen
Comment 7 2016-09-20 19:20:50 PDT
Geoffrey Garen
Comment 8 2016-09-20 20:24:50 PDT
Comment on attachment 289424 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=289424&action=review r=me > Source/WebCore/ChangeLog:8 > + Covered by new API tests. What's the benchmark result? > Source/WebCore/platform/URLParser.cpp:468 > +const size_t replacementCharacterUTF8PercentEncodedLength = 9; you can use sizeof(replacementCharacterUTF8PercentEncoded) - 1 here.
Alex Christensen
Comment 9 2016-09-20 23:30:12 PDT
Alex Christensen
Comment 10 2016-09-20 23:34:40 PDT
(In reply to comment #8) > Comment on attachment 289424 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=289424&action=review > > r=me > > > Source/WebCore/ChangeLog:8 > > + Covered by new API tests. > > What's the benchmark result? This is about a 5% speedup on my url parsing benchmark. > > > Source/WebCore/platform/URLParser.cpp:468 > > +const size_t replacementCharacterUTF8PercentEncodedLength = 9; > > you can use sizeof(replacementCharacterUTF8PercentEncoded) - 1 here. replacementCharacterUTF8PercentEncoded is a pointer.
Alex Christensen
Comment 11 2016-09-20 23:36:56 PDT
Darin Adler
Comment 12 2016-09-21 10:55:48 PDT
Comment on attachment 289433 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=289433&action=review > Source/WebCore/platform/URLParser.cpp:467 > +const char* replacementCharacterUTF8PercentEncoded = "%EF%BF%BD"; Should be either: const char* const replacementCharacterUTF8PercentEncoded = "%EF%BF%BD"; Or: const char replacementCharacterUTF8PercentEncoded[] = "%EF%BF%BD"; > Source/WebCore/platform/URLParser.cpp:468 > +const size_t replacementCharacterUTF8PercentEncodedLength = 9; This should not be necessary with modern compilers. Instead you should be able to use strlen. The compilers do constant folding if you call strlen on a constant string. However, because of the issue above, replacementCharacterUTF8PercentEncoded was probably not a constant string, which I suppose made this necessary/valuable. > Source/WebCore/platform/URLParser.cpp:516 > + if (!U_IS_UNICODE_CHAR(codePoint)) { I don’t think the test coverage covers all the cases in U_IS_UNICODE_CHAR: a) U+D800-U+DFFF b) U+FDD0-U+FDEF (new with Unicode 3.1) c) U+xxFFFE and U+xxFFFF where xx is 00-10, a total of 34 different code points d) any value greater than U+10FFFF I think we are focusing on (a), but we’d need to try at least some characters in (b), (c), and (d) to be sure we match what the standard calls for and confirm that this is the correct macro to use. And cover some other unusual character to make sure they are not turned into the replacement characters, such as private use characters and characters right next to the range here.
Note You need to log in before you can comment on or make changes to this bug.