Bug 162228 - URLParser should parse serialized valid URLs faster than unknown input
Summary: URLParser should parse serialized valid URLs faster than unknown input
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Alex Christensen
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-09-19 13:20 PDT by Alex Christensen
Modified: 2016-09-19 16:05 PDT (History)
1 user (show)

See Also:


Attachments
Patch (22.31 KB, patch)
2016-09-19 13:22 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (19.44 KB, patch)
2016-09-19 15:52 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alex Christensen 2016-09-19 13:20:03 PDT
URLParser should parse serialized valid URLs faster than unknown input
Comment 1 Alex Christensen 2016-09-19 13:22:31 PDT
Created attachment 289250 [details]
Patch
Comment 2 Chris Dumez 2016-09-19 14:17:24 PDT
Comment on attachment 289250 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=289250&action=review

> Source/WebCore/ChangeLog:12
> +        No new tests.  No change in behavior.

nit: extra space :)

> Source/WebCore/platform/URLParser.cpp:855
> +template<typename CharacterType, bool serialized>

Why the extra template parameter here? It is not used.

> Source/WebCore/platform/URLParser.cpp:890
> +template<typename CharacterType, bool serialized>

If we reversed the parameters, we could let the compiler deduct the second one

> Source/WebCore/platform/URLParser.cpp:891
> +void incrementIterator(CodePointIterator<CharacterType>& iterator)

I would call this something like "skipWhileTabOrNewline" and move the first ++iterator out. This seems clearer and more reusable.
The call sites can then do:
++iterator;
if (!serialized)
    skipWhileTabOrNewline();

Alternatively, since this seems to be used a lot, you could simply rename this function to something clearer like: incrementIteratorSkippingTabAndNewLine.

> Source/WebCore/platform/URLParser.cpp:898
> +template<typename CharacterType, bool serialized>

Would look nicer at call sites to use an enum class instead of a boolean. e.g. enum class IsSerialized { No, Yes };

If we reversed the parameters, we could let the compiler deduct the second one
Comment 3 Alex Christensen 2016-09-19 15:52:14 PDT
Created attachment 289268 [details]
Patch
Comment 4 Alex Christensen 2016-09-19 15:54:08 PDT
Comment on attachment 289268 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=289268&action=review

> Source/WebCore/platform/URLParser.cpp:439
> +template<bool serialized>
>  inline static void utf8PercentEncode(UChar32 codePoint, Vector<LChar>& destination, bool(*isInCodeSet)(UChar32))

I made these templates instead of another parameter to be more consistent with the other uses of serialized.  The generated code is equivalent once optimized.

> Source/WebCore/platform/URLParser.cpp:890
> +    const bool serialized = true;
> +    if (input.is8Bit())
> +        return parse<serialized>(input.characters8(), input.length(), { }, UTF8Encoding());

I named the bool instead of adding an enum.  This makes this call site easier to read, and it makes the use sites less cluttered.

> Source/WebCore/platform/URLParser.cpp:1111
> +                incrementIteratorSkippingTabAndNewLine<serialized>(c);

I chose the verbose name.
Comment 5 Alex Christensen 2016-09-19 16:05:25 PDT
http://trac.webkit.org/changeset/206125