WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
160811
Initial URLParser implementation
https://bugs.webkit.org/show_bug.cgi?id=160811
Summary
Initial URLParser implementation
Alex Christensen
Reported
2016-08-12 12:09:10 PDT
Initial URLParser implementation
Attachments
Patch
(12.47 KB, patch)
2016-08-12 12:09 PDT
,
Alex Christensen
no flags
Details
Formatted Diff
Diff
Patch
(12.69 KB, patch)
2016-08-12 12:57 PDT
,
Alex Christensen
no flags
Details
Formatted Diff
Diff
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
Alex Christensen
Comment 1
2016-08-12 12:09:58 PDT
Created
attachment 285926
[details]
Patch
Brady Eidson
Comment 2
2016-08-12 12:26:27 PDT
Comment on
attachment 285926
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=285926&action=review
r+ is contingent on fixing the buffer safety issues.
> Source/WebCore/platform/URLParser.cpp:40 > +static bool isC0Control(const StringView::CodePoints::Iterator& c) { return *c <= 0x001F; } > +static bool isC0ControlOrSpace(const StringView::CodePoints::Iterator& c) { return isC0Control(c) || *c == 0x0020; } > +static bool isTabOrNewline(const StringView::CodePoints::Iterator& c) { return *c == 0x0009 || *c == 0x000A || *c == 0x000D; } > +static bool isASCIIDigit(const StringView::CodePoints::Iterator& c) { return *c >= 0x0030 && *c <= 0x0039; } > +static bool isASCIIAlpha(const StringView::CodePoints::Iterator& c) { return (*c >= 0x0041 && *c <= 0x005A) || (*c >= 0x0061 && *c <= 0x007A); } > +static bool isASCIIAlphanumeric(const StringView::CodePoints::Iterator& c) { return isASCIIDigit(c) || isASCIIAlpha(c); }
Do we seriously not have any of these things anywhere else?
> Source/WebCore/platform/URLParser.cpp:55 > +// 4.2. URL parsing > +Optional<URL> URLParser::parse(const String& input, const URL& base, const TextEncoding&)
This comment literally repeats what the code on the line below it already says. Not sure it's useful.
> Source/WebCore/platform/URLParser.cpp:69 > + SchemeEndCheckForSlashes, // Not explicitly in spec, but in scheme state steps 2. 8.
Not sure this comment is useful/helpful.
> Source/WebCore/platform/URLParser.cpp:117 > + if (urlScheme == "file")
ASCIILiteral
> Source/WebCore/platform/URLParser.cpp:153 > + ++c; > + if (*c == '/') {
Deref'ing c after incrementing it without checking for end. :(
> Source/WebCore/platform/URLParser.cpp:181 > + ++c; > + buffer.append('/'); > + if (*c == '/') {
Deref'ing c after incrementing it without checking for end. :(
> Source/WebCore/platform/URLParser.cpp:275 > + if (*c == '/') { > + ++c; > + if (*c == '.') { > + ++c; > + if (*c == '.') > + notImplemented();
Deref'ing c after incrementing it without checking for end. :(
Brady Eidson
Comment 3
2016-08-12 12:26:39 PDT
And making the build work.
Alex Christensen
Comment 4
2016-08-12 12:56:44 PDT
(In reply to
comment #2
)
> Do we seriously not have any of these things anywhere else?
We probably do, but not ones that take these iterators. We can probably consolidate this code later.
> > Source/WebCore/platform/URLParser.cpp:69 > > + SchemeEndCheckForSlashes, // Not explicitly in spec, but in scheme state steps 2. 8. > > Not sure this comment is useful/helpful.
It is. All the other states are specified in the spec. This one is not.
> ASCIILiteral
That doesn't compile. This whole case needs to be rewritten anyways, hence the comment.
Alex Christensen
Comment 5
2016-08-12 12:57:30 PDT
Created
attachment 285934
[details]
Patch
Alex Christensen
Comment 6
2016-08-12 13:05:32 PDT
http://trac.webkit.org/changeset/204417
Brady Eidson
Comment 7
2016-08-12 15:32:19 PDT
(In reply to
comment #4
)
> (In reply to
comment #2
) > > Do we seriously not have any of these things anywhere else? > We probably do, but not ones that take these iterators. We can probably > consolidate this code later. > > > Source/WebCore/platform/URLParser.cpp:69 > > > + SchemeEndCheckForSlashes, // Not explicitly in spec, but in scheme state steps 2. 8. > > > > Not sure this comment is useful/helpful. > It is. All the other states are specified in the spec. This one is not.
I understand that, because that's what the comment says. I just don't see how it's relevant here. There should be a relatively high bar to breaking up source code with comments, and it's usually when the code doesn't explain itself. This name explains itself.
> > ASCIILiteral > That doesn't compile. This whole case needs to be rewritten anyways, hence > the comment.
Weird, why not?
Alex Christensen
Comment 8
2016-08-12 15:38:20 PDT
> > > ASCIILiteral > > That doesn't compile. This whole case needs to be rewritten anyways, hence > > the comment. > > Weird, why not?
/Users/alexchristensen/code/OpenSource/Source/WebCore/platform/URLParser.cpp:115:31: error: use of overloaded operator '==' is ambiguous (with operand types 'WTF::String' and 'WTF::ASCIILiteral') if (urlScheme == ASCIILiteral("file")) ~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~ In file included from /Users/alexchristensen/code/OpenSource/Source/WebCore/platform/URLParser.cpp:27: In file included from /Users/alexchristensen/code/OpenSource/Source/WebCore/platform/URLParser.h:28: In file included from /Users/alexchristensen/code/OpenSource/Source/WebCore/platform/text/TextEncoding.h:29: In file included from /Users/alexchristensen/code/OpenSource/Source/WebCore/platform/text/TextCodec.h:34: /Users/alexchristensen/code/OpenSource/WebKitBuild/Debug/usr/local/include/wtf/text/WTFString.h:481:13: note: candidate function inline bool operator==(const String& a, const String& b) { return equal(a.impl(), b.impl()); } ^ /Users/alexchristensen/code/OpenSource/WebKitBuild/Debug/usr/local/include/wtf/text/WTFString.h:483:13: note: candidate function inline bool operator==(const String& a, const char* b) { return equal(a.impl(), reinterpret_cast<const LChar*>(b)); }
Darin Adler
Comment 9
2016-08-15 23:38:48 PDT
How is this different from the algorithms inside URL.cpp?
Alex Christensen
Comment 10
2016-08-16 10:54:09 PDT
The new URLParser will be more standards compliant. It is designed around one finite state machine that looks more like the spec, so as the spec evolves it will be easier to make changes to keep up. It uses code point iterators to ideally just go through the URL once instead of making several copies and looking at different parts of the input multiple times. The character class table and percent encode table in URL.cpp are wrong. There are a growing number of security bugs based on the way the current parser works. I decided it would be better to just start from scratch instead of trying to massage the old URL parser into a new one.
Darin Adler
Comment 11
2016-08-18 10:35:42 PDT
(In reply to
comment #10
)
> The new URLParser will be more standards compliant. > It is designed around one finite state machine that looks more like the > spec, so as the spec evolves it will be easier to make changes to keep up. > It uses code point iterators to ideally just go through the URL once instead > of making several copies and looking at different parts of the input > multiple times. > The character class table and percent encode table in URL.cpp are wrong. > There are a growing number of security bugs based on the way the current > parser works. I decided it would be better to just start from scratch > instead of trying to massage the old URL parser into a new one.
All that makes sense. But I don’t think it’s OK to end up having two URL parsers. How can we make sure we get rid of the old one?
Alex Christensen
Comment 12
2016-08-19 11:38:19 PDT
I plan to develop the new one as the URLParser class until it is good enough to replace the URL::parse and URL::init functions, then replace those functions with the code I'm developing now.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug