WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
Bug 162105
Optimize URLParser
https://bugs.webkit.org/show_bug.cgi?id=162105
Summary
Optimize URLParser
Alex Christensen
Reported
2016-09-16 16:57:40 PDT
Optimize URLParser
Attachments
Patch
(15.58 KB, patch)
2016-09-16 16:58 PDT
,
Alex Christensen
no flags
Details
Formatted Diff
Diff
Patch
(8.24 KB, patch)
2016-09-20 18:18 PDT
,
Alex Christensen
no flags
Details
Formatted Diff
Diff
Patch
(8.43 KB, patch)
2016-09-20 18:36 PDT
,
Alex Christensen
no flags
Details
Formatted Diff
Diff
Patch
(8.66 KB, patch)
2016-09-20 19:20 PDT
,
Alex Christensen
no flags
Details
Formatted Diff
Diff
Patch
(9.85 KB, patch)
2016-09-20 23:30 PDT
,
Alex Christensen
no flags
Details
Formatted Diff
Diff
Show Obsolete
(4)
View All
Add attachment
proposed patch, testcase, etc.
Alex Christensen
Comment 1
2016-09-16 16:58:30 PDT
Created
attachment 289143
[details]
Patch
Alex Christensen
Comment 2
2016-09-16 17:00:45 PDT
Comment on
attachment 289143
[details]
Patch Incomplete and needs tests. Based partially on feedback from
https://bugs.webkit.org/show_bug.cgi?id=161968
Alex Christensen
Comment 3
2016-09-20 18:18:24 PDT
Created
attachment 289419
[details]
Patch
Alex Christensen
Comment 4
2016-09-20 18:19:14 PDT
There's probably a better way to encode invalid code points. I'm not familiar with it, though.
Alex Christensen
Comment 5
2016-09-20 18:32:54 PDT
By the way, just using the result of U8_APPEND_UNSAFE with invalid code points matches the existing URL::parse. Should we just keep doing that, or try to match Chrome and Firefox?
Alex Christensen
Comment 6
2016-09-20 18:36:03 PDT
Created
attachment 289421
[details]
Patch
Alex Christensen
Comment 7
2016-09-20 19:20:50 PDT
Created
attachment 289424
[details]
Patch
Geoffrey Garen
Comment 8
2016-09-20 20:24:50 PDT
Comment on
attachment 289424
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=289424&action=review
r=me
> Source/WebCore/ChangeLog:8 > + Covered by new API tests.
What's the benchmark result?
> Source/WebCore/platform/URLParser.cpp:468 > +const size_t replacementCharacterUTF8PercentEncodedLength = 9;
you can use sizeof(replacementCharacterUTF8PercentEncoded) - 1 here.
Alex Christensen
Comment 9
2016-09-20 23:30:12 PDT
Created
attachment 289433
[details]
Patch
Alex Christensen
Comment 10
2016-09-20 23:34:40 PDT
(In reply to
comment #8
)
> Comment on
attachment 289424
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=289424&action=review
> > r=me > > > Source/WebCore/ChangeLog:8 > > + Covered by new API tests. > > What's the benchmark result?
This is about a 5% speedup on my url parsing benchmark.
> > > Source/WebCore/platform/URLParser.cpp:468 > > +const size_t replacementCharacterUTF8PercentEncodedLength = 9; > > you can use sizeof(replacementCharacterUTF8PercentEncoded) - 1 here.
replacementCharacterUTF8PercentEncoded is a pointer.
Alex Christensen
Comment 11
2016-09-20 23:36:56 PDT
http://trac.webkit.org/changeset/206198
Darin Adler
Comment 12
2016-09-21 10:55:48 PDT
Comment on
attachment 289433
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=289433&action=review
> Source/WebCore/platform/URLParser.cpp:467 > +const char* replacementCharacterUTF8PercentEncoded = "%EF%BF%BD";
Should be either: const char* const replacementCharacterUTF8PercentEncoded = "%EF%BF%BD"; Or: const char replacementCharacterUTF8PercentEncoded[] = "%EF%BF%BD";
> Source/WebCore/platform/URLParser.cpp:468 > +const size_t replacementCharacterUTF8PercentEncodedLength = 9;
This should not be necessary with modern compilers. Instead you should be able to use strlen. The compilers do constant folding if you call strlen on a constant string. However, because of the issue above, replacementCharacterUTF8PercentEncoded was probably not a constant string, which I suppose made this necessary/valuable.
> Source/WebCore/platform/URLParser.cpp:516 > + if (!U_IS_UNICODE_CHAR(codePoint)) {
I don’t think the test coverage covers all the cases in U_IS_UNICODE_CHAR: a) U+D800-U+DFFF b) U+FDD0-U+FDEF (new with Unicode 3.1) c) U+xxFFFE and U+xxFFFF where xx is 00-10, a total of 34 different code points d) any value greater than U+10FFFF I think we are focusing on (a), but we’d need to try at least some characters in (b), (c), and (d) to be sure we match what the standard calls for and confirm that this is the correct macro to use. And cover some other unusual character to make sure they are not turned into the replacement characters, such as private use characters and characters right next to the range here.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug