162105 – Optimize URLParser

RESOLVED FIXED 162105

Optimize URLParser

https://bugs.webkit.org/show_bug.cgi?id=162105

Summary Optimize URLParser

Alex Christensen

Reported 2016-09-16 16:57:40 PDT

Optimize URLParser

Attachments
Patch (15.58 KB, patch) 2016-09-16 16:58 PDT, Alex Christensen	no flags	Details Formatted Diff Diff
Patch (8.24 KB, patch) 2016-09-20 18:18 PDT, Alex Christensen	no flags	Details Formatted Diff Diff
Patch (8.43 KB, patch) 2016-09-20 18:36 PDT, Alex Christensen	no flags	Details Formatted Diff Diff
Patch (8.66 KB, patch) 2016-09-20 19:20 PDT, Alex Christensen	no flags	Details Formatted Diff Diff
Patch (9.85 KB, patch) 2016-09-20 23:30 PDT, Alex Christensen	no flags	Details Formatted Diff Diff
Show Obsolete (4) View All Add attachment proposed patch, testcase, etc.

Alex Christensen

Comment 1 2016-09-16 16:58:30 PDT

Created attachment 289143 [details] Patch

Alex Christensen

Comment 2 2016-09-16 17:00:45 PDT

Comment on attachment 289143 [details] Patch Incomplete and needs tests. Based partially on feedback from https://bugs.webkit.org/show_bug.cgi?id=161968

Alex Christensen

Comment 3 2016-09-20 18:18:24 PDT

Created attachment 289419 [details] Patch

Alex Christensen

Comment 4 2016-09-20 18:19:14 PDT

There's probably a better way to encode invalid code points. I'm not familiar with it, though.

Alex Christensen

Comment 5 2016-09-20 18:32:54 PDT

By the way, just using the result of U8_APPEND_UNSAFE with invalid code points matches the existing URL::parse. Should we just keep doing that, or try to match Chrome and Firefox?

Alex Christensen

Comment 6 2016-09-20 18:36:03 PDT

Created attachment 289421 [details] Patch

Alex Christensen

Comment 7 2016-09-20 19:20:50 PDT

Created attachment 289424 [details] Patch

Geoffrey Garen

Comment 8 2016-09-20 20:24:50 PDT

Comment on attachment 289424 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=289424&action=review r=me > Source/WebCore/ChangeLog:8 > + Covered by new API tests. What's the benchmark result? > Source/WebCore/platform/URLParser.cpp:468 > +const size_t replacementCharacterUTF8PercentEncodedLength = 9; you can use sizeof(replacementCharacterUTF8PercentEncoded) - 1 here.

Alex Christensen

Comment 9 2016-09-20 23:30:12 PDT

Created attachment 289433 [details] Patch

Alex Christensen

Comment 10 2016-09-20 23:34:40 PDT

(In reply to comment #8) > Comment on attachment 289424 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=289424&action=review > > r=me > > > Source/WebCore/ChangeLog:8 > > + Covered by new API tests. > > What's the benchmark result? This is about a 5% speedup on my url parsing benchmark. > > > Source/WebCore/platform/URLParser.cpp:468 > > +const size_t replacementCharacterUTF8PercentEncodedLength = 9; > > you can use sizeof(replacementCharacterUTF8PercentEncoded) - 1 here. replacementCharacterUTF8PercentEncoded is a pointer.

Alex Christensen

Comment 11 2016-09-20 23:36:56 PDT

http://trac.webkit.org/changeset/206198

Darin Adler

Comment 12 2016-09-21 10:55:48 PDT

Comment on attachment 289433 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=289433&action=review > Source/WebCore/platform/URLParser.cpp:467 > +const char* replacementCharacterUTF8PercentEncoded = "%EF%BF%BD"; Should be either: const char* const replacementCharacterUTF8PercentEncoded = "%EF%BF%BD"; Or: const char replacementCharacterUTF8PercentEncoded[] = "%EF%BF%BD"; > Source/WebCore/platform/URLParser.cpp:468 > +const size_t replacementCharacterUTF8PercentEncodedLength = 9; This should not be necessary with modern compilers. Instead you should be able to use strlen. The compilers do constant folding if you call strlen on a constant string. However, because of the issue above, replacementCharacterUTF8PercentEncoded was probably not a constant string, which I suppose made this necessary/valuable. > Source/WebCore/platform/URLParser.cpp:516 > + if (!U_IS_UNICODE_CHAR(codePoint)) { I don’t think the test coverage covers all the cases in U_IS_UNICODE_CHAR: a) U+D800-U+DFFF b) U+FDD0-U+FDEF (new with Unicode 3.1) c) U+xxFFFE and U+xxFFFF where xx is 00-10, a total of 34 different code points d) any value greater than U+10FFFF I think we are focusing on (a), but we’d need to try at least some characters in (b), (c), and (d) to be sure we match what the standard calls for and confirm that this is the correct macro to use. And cover some other unusual character to make sure they are not turned into the replacement characters, such as private use characters and characters right next to the range here.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component New Bugs

Assignee

Alex Christensen

Reported

2016-09-16 16:57 PDT

Modified

2016-09-21 10:55 PDT History

CC List

2 users Show

URL

Keywords

Depends on

Blocks