RESOLVED FIXED 160811
Initial URLParser implementation
https://bugs.webkit.org/show_bug.cgi?id=160811
Summary Initial URLParser implementation
Alex Christensen
Reported 2016-08-12 12:09:10 PDT
Initial URLParser implementation
Attachments
Patch (12.47 KB, patch)
2016-08-12 12:09 PDT, Alex Christensen
no flags
Patch (12.69 KB, patch)
2016-08-12 12:57 PDT, Alex Christensen
no flags
Alex Christensen
Comment 1 2016-08-12 12:09:58 PDT
Brady Eidson
Comment 2 2016-08-12 12:26:27 PDT
Comment on attachment 285926 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=285926&action=review r+ is contingent on fixing the buffer safety issues. > Source/WebCore/platform/URLParser.cpp:40 > +static bool isC0Control(const StringView::CodePoints::Iterator& c) { return *c <= 0x001F; } > +static bool isC0ControlOrSpace(const StringView::CodePoints::Iterator& c) { return isC0Control(c) || *c == 0x0020; } > +static bool isTabOrNewline(const StringView::CodePoints::Iterator& c) { return *c == 0x0009 || *c == 0x000A || *c == 0x000D; } > +static bool isASCIIDigit(const StringView::CodePoints::Iterator& c) { return *c >= 0x0030 && *c <= 0x0039; } > +static bool isASCIIAlpha(const StringView::CodePoints::Iterator& c) { return (*c >= 0x0041 && *c <= 0x005A) || (*c >= 0x0061 && *c <= 0x007A); } > +static bool isASCIIAlphanumeric(const StringView::CodePoints::Iterator& c) { return isASCIIDigit(c) || isASCIIAlpha(c); } Do we seriously not have any of these things anywhere else? > Source/WebCore/platform/URLParser.cpp:55 > +// 4.2. URL parsing > +Optional<URL> URLParser::parse(const String& input, const URL& base, const TextEncoding&) This comment literally repeats what the code on the line below it already says. Not sure it's useful. > Source/WebCore/platform/URLParser.cpp:69 > + SchemeEndCheckForSlashes, // Not explicitly in spec, but in scheme state steps 2. 8. Not sure this comment is useful/helpful. > Source/WebCore/platform/URLParser.cpp:117 > + if (urlScheme == "file") ASCIILiteral > Source/WebCore/platform/URLParser.cpp:153 > + ++c; > + if (*c == '/') { Deref'ing c after incrementing it without checking for end. :( > Source/WebCore/platform/URLParser.cpp:181 > + ++c; > + buffer.append('/'); > + if (*c == '/') { Deref'ing c after incrementing it without checking for end. :( > Source/WebCore/platform/URLParser.cpp:275 > + if (*c == '/') { > + ++c; > + if (*c == '.') { > + ++c; > + if (*c == '.') > + notImplemented(); Deref'ing c after incrementing it without checking for end. :(
Brady Eidson
Comment 3 2016-08-12 12:26:39 PDT
And making the build work.
Alex Christensen
Comment 4 2016-08-12 12:56:44 PDT
(In reply to comment #2) > Do we seriously not have any of these things anywhere else? We probably do, but not ones that take these iterators. We can probably consolidate this code later. > > Source/WebCore/platform/URLParser.cpp:69 > > + SchemeEndCheckForSlashes, // Not explicitly in spec, but in scheme state steps 2. 8. > > Not sure this comment is useful/helpful. It is. All the other states are specified in the spec. This one is not. > ASCIILiteral That doesn't compile. This whole case needs to be rewritten anyways, hence the comment.
Alex Christensen
Comment 5 2016-08-12 12:57:30 PDT
Alex Christensen
Comment 6 2016-08-12 13:05:32 PDT
Brady Eidson
Comment 7 2016-08-12 15:32:19 PDT
(In reply to comment #4) > (In reply to comment #2) > > Do we seriously not have any of these things anywhere else? > We probably do, but not ones that take these iterators. We can probably > consolidate this code later. > > > Source/WebCore/platform/URLParser.cpp:69 > > > + SchemeEndCheckForSlashes, // Not explicitly in spec, but in scheme state steps 2. 8. > > > > Not sure this comment is useful/helpful. > It is. All the other states are specified in the spec. This one is not. I understand that, because that's what the comment says. I just don't see how it's relevant here. There should be a relatively high bar to breaking up source code with comments, and it's usually when the code doesn't explain itself. This name explains itself. > > ASCIILiteral > That doesn't compile. This whole case needs to be rewritten anyways, hence > the comment. Weird, why not?
Alex Christensen
Comment 8 2016-08-12 15:38:20 PDT
> > > ASCIILiteral > > That doesn't compile. This whole case needs to be rewritten anyways, hence > > the comment. > > Weird, why not? /Users/alexchristensen/code/OpenSource/Source/WebCore/platform/URLParser.cpp:115:31: error: use of overloaded operator '==' is ambiguous (with operand types 'WTF::String' and 'WTF::ASCIILiteral') if (urlScheme == ASCIILiteral("file")) ~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~ In file included from /Users/alexchristensen/code/OpenSource/Source/WebCore/platform/URLParser.cpp:27: In file included from /Users/alexchristensen/code/OpenSource/Source/WebCore/platform/URLParser.h:28: In file included from /Users/alexchristensen/code/OpenSource/Source/WebCore/platform/text/TextEncoding.h:29: In file included from /Users/alexchristensen/code/OpenSource/Source/WebCore/platform/text/TextCodec.h:34: /Users/alexchristensen/code/OpenSource/WebKitBuild/Debug/usr/local/include/wtf/text/WTFString.h:481:13: note: candidate function inline bool operator==(const String& a, const String& b) { return equal(a.impl(), b.impl()); } ^ /Users/alexchristensen/code/OpenSource/WebKitBuild/Debug/usr/local/include/wtf/text/WTFString.h:483:13: note: candidate function inline bool operator==(const String& a, const char* b) { return equal(a.impl(), reinterpret_cast<const LChar*>(b)); }
Darin Adler
Comment 9 2016-08-15 23:38:48 PDT
How is this different from the algorithms inside URL.cpp?
Alex Christensen
Comment 10 2016-08-16 10:54:09 PDT
The new URLParser will be more standards compliant. It is designed around one finite state machine that looks more like the spec, so as the spec evolves it will be easier to make changes to keep up. It uses code point iterators to ideally just go through the URL once instead of making several copies and looking at different parts of the input multiple times. The character class table and percent encode table in URL.cpp are wrong. There are a growing number of security bugs based on the way the current parser works. I decided it would be better to just start from scratch instead of trying to massage the old URL parser into a new one.
Darin Adler
Comment 11 2016-08-18 10:35:42 PDT
(In reply to comment #10) > The new URLParser will be more standards compliant. > It is designed around one finite state machine that looks more like the > spec, so as the spec evolves it will be easier to make changes to keep up. > It uses code point iterators to ideally just go through the URL once instead > of making several copies and looking at different parts of the input > multiple times. > The character class table and percent encode table in URL.cpp are wrong. > There are a growing number of security bugs based on the way the current > parser works. I decided it would be better to just start from scratch > instead of trying to massage the old URL parser into a new one. All that makes sense. But I don’t think it’s OK to end up having two URL parsers. How can we make sure we get rid of the old one?
Alex Christensen
Comment 12 2016-08-19 11:38:19 PDT
I plan to develop the new one as the URLParser class until it is good enough to replace the URL::parse and URL::init functions, then replace those functions with the code I'm developing now.
Note You need to log in before you can comment on or make changes to this bug.