Bug 165521 - Add Link header support for preload.
Summary: Add Link header support for preload.
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-12-07 01:45 PST by Yoav Weiss
Modified: 2017-01-24 02:01 PST (History)
4 users (show)

See Also:


Attachments
Patch (33.37 KB, patch)
2016-12-08 13:14 PST, Yoav Weiss
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews100 for mac-yosemite (1.18 MB, application/zip)
2016-12-08 14:17 PST, Build Bot
no flags Details
Archive of layout-test-results from ews114 for mac-yosemite (1.85 MB, application/zip)
2016-12-08 14:27 PST, Build Bot
no flags Details
Archive of layout-test-results from ews104 for mac-yosemite-wk2 (994.14 KB, application/zip)
2016-12-08 14:27 PST, Build Bot
no flags Details
Archive of layout-test-results from ews125 for ios-simulator-wk2 (18.72 MB, application/zip)
2016-12-08 14:42 PST, Build Bot
no flags Details
Patch (33.72 KB, patch)
2016-12-08 16:53 PST, Yoav Weiss
no flags Details | Formatted Diff | Diff
Patch (33.87 KB, patch)
2016-12-09 13:09 PST, Yoav Weiss
no flags Details | Formatted Diff | Diff
Patch (38.06 KB, patch)
2016-12-13 06:59 PST, Yoav Weiss
no flags Details | Formatted Diff | Diff
Patch (40.34 KB, patch)
2017-01-18 02:27 PST, Yoav Weiss
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews103 for mac-elcapitan (865.03 KB, application/zip)
2017-01-18 03:35 PST, Build Bot
no flags Details
Patch (40.42 KB, patch)
2017-01-18 14:00 PST, Yoav Weiss
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yoav Weiss 2016-12-07 01:45:54 PST
Add Link header support for preload.
Comment 1 Yoav Weiss 2016-12-08 13:14:00 PST
Created attachment 296554 [details]
Patch
Comment 2 Build Bot 2016-12-08 14:17:31 PST
Comment on attachment 296554 [details]
Patch

Attachment 296554 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/2655791

New failing tests:
http/tests/preload/dynamic_remove_preload_href.html
Comment 3 Build Bot 2016-12-08 14:17:33 PST
Created attachment 296566 [details]
Archive of layout-test-results from ews100 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews100  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 4 Build Bot 2016-12-08 14:27:38 PST
Comment on attachment 296554 [details]
Patch

Attachment 296554 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/2655814

New failing tests:
http/tests/preload/dynamic_remove_preload_href.html
Comment 5 Build Bot 2016-12-08 14:27:40 PST
Created attachment 296568 [details]
Archive of layout-test-results from ews114 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews114  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 6 Build Bot 2016-12-08 14:27:50 PST
Comment on attachment 296554 [details]
Patch

Attachment 296554 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/2655897

New failing tests:
http/tests/preload/dynamic_remove_preload_href.html
Comment 7 Build Bot 2016-12-08 14:27:53 PST
Created attachment 296569 [details]
Archive of layout-test-results from ews104 for mac-yosemite-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews104  Port: mac-yosemite-wk2  Platform: Mac OS X 10.10.5
Comment 8 Build Bot 2016-12-08 14:42:11 PST
Comment on attachment 296554 [details]
Patch

Attachment 296554 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/2655931

New failing tests:
http/tests/preload/dynamic_remove_preload_href.html
Comment 9 Build Bot 2016-12-08 14:42:14 PST
Created attachment 296573 [details]
Archive of layout-test-results from ews125 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews125  Port: ios-simulator-wk2  Platform: Mac OS X 10.11.6
Comment 10 Yoav Weiss 2016-12-08 16:53:33 PST
Created attachment 296593 [details]
Patch
Comment 11 Yoav Weiss 2016-12-09 13:09:08 PST
Created attachment 296683 [details]
Patch
Comment 12 Alex Christensen 2016-12-12 16:42:36 PST
Comment on attachment 296683 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=296683&action=review

> Source/WebCore/loader/LinkHeader.cpp:39
> +static bool isWhitespace(CharType chr)
> +{
> +    return (chr == ' ') || (chr == '\t');
> +}

This function should have a better name.  We probably already have a function that checks for tab or space.

> Source/WebCore/loader/LinkHeader.cpp:45
> +static bool isValidURLChar(CharType chr)
> +{
> +    return !isWhitespace(chr) && chr != '>';
> +}

Why do we think a character is valid for a URL if it's not ' ', '\t' or '>'?  URLs can have spaces and '>', and they ignore tabs while parsing.

> Source/WebCore/loader/LinkLoader.cpp:161
> +std::optional<std::unique_ptr<LinkPreloadResourceClient>> LinkLoader::preloadIfNeeded(const LinkRelAttribute& relAttribute, const URL& href, Document& document, const String& as,

std::unique_ptr already has a value that indicates it is empty.  The std::optional seems excessive.

> Source/WebCore/loader/LinkLoader.cpp:162
> +    const String& crossOriginMode, LinkLoader* loader, LinkLoaderClient* client)

This doesn't need to be on a new line.  Also, making client a pointer instead of a reference seems like a step in the wrong direction.  This function has one call site, and it could pass a reference.

> Source/WebCore/loader/LinkLoader.cpp:164
> +    fprintf(stderr, "preloadIfNeeded\n");

Do we really want to print to stderr every time this function is called?

> LayoutTests/http/tests/preload/resources/download_resources_from_header.php:2
> +header("Link: <../resources/dummy.js>; rel=preload; as=script", false);

We need more tests with non-ASCII characters, invalid URLs, attempts at making CORS requests from link headers, null characters, vertical tabs, URLs with '>' in them, etc.
Comment 13 Yoav Weiss 2016-12-13 06:52:36 PST
Comment on attachment 296683 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=296683&action=review

>> Source/WebCore/loader/LinkHeader.cpp:39
>> +}
> 
> This function should have a better name.  We probably already have a function that checks for tab or space.

The only one I found is a static inline one in html/track/WebVTTParser.h
I could move this one to platform/network/HTTPParsers and reuse it in there, if you think it's a good idea. In the mean time, I changed its name to isSpaceOrTab.

>> Source/WebCore/loader/LinkHeader.cpp:45
>> +}
> 
> Why do we think a character is valid for a URL if it's not ' ', '\t' or '>'?  URLs can have spaces and '>', and they ignore tabs while parsing.

The function's name was misleading. It was aimed at finding the end of the URL part of the header. At the same time, it might be better/safer to only read URLs up to the first invalid char. I changed the function accordingly.

>> Source/WebCore/loader/LinkLoader.cpp:161
>> +std::optional<std::unique_ptr<LinkPreloadResourceClient>> LinkLoader::preloadIfNeeded(const LinkRelAttribute& relAttribute, const URL& href, Document& document, const String& as,
> 
> std::unique_ptr already has a value that indicates it is empty.  The std::optional seems excessive.

True. Removed the std::optional

>> Source/WebCore/loader/LinkLoader.cpp:162
>> +    const String& crossOriginMode, LinkLoader* loader, LinkLoaderClient* client)
> 
> This doesn't need to be on a new line.  Also, making client a pointer instead of a reference seems like a step in the wrong direction.  This function has one call site, and it could pass a reference.

OK regarding the new line.
Regarding making LinkLoaderClient a reference, this function is called from both loadLinksFromHeader and from loadLink.
In loadLinksFromHeader there is no client (as it doesn't make sense to register load/error events in this context), which is the reason I chose to pass it as a pointer, and pass nullptr when a client is missing.

>> Source/WebCore/loader/LinkLoader.cpp:164
>> +    fprintf(stderr, "preloadIfNeeded\n");
> 
> Do we really want to print to stderr every time this function is called?

No, I don't. Omission from debugging phase :/

>> LayoutTests/http/tests/preload/resources/download_resources_from_header.php:2
>> +header("Link: <../resources/dummy.js>; rel=preload; as=script", false);
> 
> We need more tests with non-ASCII characters, invalid URLs, attempts at making CORS requests from link headers, null characters, vertical tabs, URLs with '>' in them, etc.

Good call. Adding a test for invalid URLs. What do you want to test with CORS requests?
Comment 14 Yoav Weiss 2016-12-13 06:59:50 PST
Created attachment 297011 [details]
Patch
Comment 15 Yoav Weiss 2017-01-13 06:30:51 PST
Friendly ping :)
Comment 16 Alex Christensen 2017-01-13 13:43:08 PST
Comment on attachment 297011 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=297011&action=review

> Source/WebCore/loader/LinkHeader.cpp:42
> +static bool isValidURLChar(CharType chr)

This seems very strange.  Determining if a character is valid for a URL without context isn't something you should be able to do.  For example, if you have an invalid surrogate pair with a valid begin and an invalid end that is one of the characters that is determined here to be "invalid" then this would naively think the second half of an invalid surrogate pair is an invalid character when in fact the whole thing is something that will be URL encoded to be the replacement character.  It's bad enough that CSP has its own definition of what parsing part of a URL from a header should be (and maybe we should align with that "parsing") that is separate from the URL spec.  Let's not add a third definition of what a URL is supposed to look like.

> Source/WebCore/loader/LinkHeader.cpp:95
> +static bool parseURL(CharType*& position, CharType* end, String& url)

end should definitely be const
I don't like the name "parseURL".  This function doesn't parse the URL, it just finds the end of the URL.
This should return a std::optional<String>

> Source/WebCore/loader/LinkHeader.cpp:122
> +    ASSERT(position <= end);

This assertion should be at every function that has a position and an end.
It seems like an iterator type would be better than raw pointers in this whole patch.  StringView.h has code point and code unit iterators.

> Source/WebCore/loader/LinkHeader.cpp:138
> +static bool parseParameterDelimiter(CharType*& position, CharType* end, bool& isValid)

This should return std::optional<bool> instead of a bool with a bool& parameter.

> Source/WebCore/loader/LinkHeader.cpp:274
> +    else if (name == LinkParameterAnchor)

It looks like a switch would be better here.

> Source/WebCore/loader/LinkHeader.h:78
> +    LinkHeader& operator[](size_t i) { return m_headerSet[i]; }

This doesn't have bounds checks.

> Source/WebCore/loader/LinkHeader.h:82
> +    template <typename CharType>

typename CharacterType

> Source/WebCore/loader/LinkHeader.h:83
> +    void init(CharType* headerValue, unsigned len);

unsigned len -> size_t length
Or even better, use an iterator type.  See below.

> Source/WebCore/loader/LinkLoader.cpp:99
> +        if (url == baseURL)

What if url and baseURL only differ in the fragment?

> LayoutTests/http/tests/preload/resources/download_resources_from_header.php:2
> +header("Link: <../resources/dummy.js>; rel=preload; as=script", false);

Can we put a complete absolute URL here?  We should add a test with a URL that looks like http://localhost:8000/preload/resources/something.html on a page with CSP that doesn't allow loading from localhost.  We should also add a test with an invalid URL like http://host:invalidport/ just to see what happens.  It's hard to make a URL fail to parse when only giving the path.  A fragment-only URL might also be interesting. Also a lowercase "link", a missing colon, no <> or URL, <> but no characters inside it, and a missing space after the colon.  Also what about emojis, invalid surrogate pairs, and null characters in the middle of the URL?
Comment 17 Yoav Weiss 2017-01-18 02:20:06 PST
Comment on attachment 297011 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=297011&action=review

>> Source/WebCore/loader/LinkHeader.cpp:42
>> +static bool isValidURLChar(CharType chr)
> 
> This seems very strange.  Determining if a character is valid for a URL without context isn't something you should be able to do.  For example, if you have an invalid surrogate pair with a valid begin and an invalid end that is one of the characters that is determined here to be "invalid" then this would naively think the second half of an invalid surrogate pair is an invalid character when in fact the whole thing is something that will be URL encoded to be the replacement character.  It's bad enough that CSP has its own definition of what parsing part of a URL from a header should be (and maybe we should align with that "parsing") that is separate from the URL spec.  Let's not add a third definition of what a URL is supposed to look like.

OK. Since URL validation is not critical here (and will happen further down the pipe), I've changed this to just detect ">" as the URL boundary.

>> Source/WebCore/loader/LinkHeader.cpp:95
>> +static bool parseURL(CharType*& position, CharType* end, String& url)
> 
> end should definitely be const
> I don't like the name "parseURL".  This function doesn't parse the URL, it just finds the end of the URL.
> This should return a std::optional<String>

Good point regarding end. Changed parseURL into FindURLBoundaries with an optional return value.

>> Source/WebCore/loader/LinkHeader.cpp:122
>> +    ASSERT(position <= end);
> 
> This assertion should be at every function that has a position and an end.
> It seems like an iterator type would be better than raw pointers in this whole patch.  StringView.h has code point and code unit iterators.

Good point regarding the assertion. Added everywhere else.

Regarding switching to StringView's iterators I tried to convert the logic to that, but it seems to be missing some concepts needed here (skipWhile, skipExactly and/or getting an iterator for a position in the middle of a StringView).
For now I kept it using pointers, but if we want to go the iterator route, it'd require adding a new iterator type (e.g. ParserStringView) which will include the required concepts. Let me know if you think this is the route we should take and if so, where should such a class be located.

>> Source/WebCore/loader/LinkHeader.cpp:138
>> +static bool parseParameterDelimiter(CharType*& position, CharType* end, bool& isValid)
> 
> This should return std::optional<bool> instead of a bool with a bool& parameter.

OK

>> Source/WebCore/loader/LinkHeader.cpp:274
>> +    else if (name == LinkParameterAnchor)
> 
> It looks like a switch would be better here.

switched

>> Source/WebCore/loader/LinkHeader.h:78
>> +    LinkHeader& operator[](size_t i) { return m_headerSet[i]; }
> 
> This doesn't have bounds checks.

This  as well as size() don't seem to be required. Removed both

>> Source/WebCore/loader/LinkHeader.h:82
>> +    template <typename CharType>
> 
> typename CharacterType

OK

>> Source/WebCore/loader/LinkHeader.h:83
>> +    void init(CharType* headerValue, unsigned len);
> 
> unsigned len -> size_t length
> Or even better, use an iterator type.  See below.

switched to size_t for now

>> Source/WebCore/loader/LinkLoader.cpp:99
>> +        if (url == baseURL)
> 
> What if url and baseURL only differ in the fragment?

Good point. Switched to use equalIgnoringFragmentIdentifier

>> LayoutTests/http/tests/preload/resources/download_resources_from_header.php:2
>> +header("Link: <../resources/dummy.js>; rel=preload; as=script", false);
> 
> Can we put a complete absolute URL here?  We should add a test with a URL that looks like http://localhost:8000/preload/resources/something.html on a page with CSP that doesn't allow loading from localhost.  We should also add a test with an invalid URL like http://host:invalidport/ just to see what happens.  It's hard to make a URL fail to parse when only giving the path.  A fragment-only URL might also be interesting. Also a lowercase "link", a missing colon, no <> or URL, <> but no characters inside it, and a missing space after the colon.  Also what about emojis, invalid surrogate pairs, and null characters in the middle of the URL?

Added tests for CSP blocked, invalid port, fragment-only, lower-case link, missing colon, missing spaces, missing URL, etc.
(Invalid port passes the isPreloaded() test, but I'm guessing that it's being blocked further down the stack)

I'm not sure how to add emojis, invalid surrogate pairs and null chars in PHP... Any examples you can point me to?
Comment 18 Yoav Weiss 2017-01-18 02:27:50 PST
Created attachment 299128 [details]
Patch
Comment 19 Build Bot 2017-01-18 03:35:07 PST
Comment on attachment 299128 [details]
Patch

Attachment 299128 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/2908914

New failing tests:
imported/w3c/web-platform-tests/html/browsers/history/the-location-interface/location-protocol-setter-non-broken.html
Comment 20 Build Bot 2017-01-18 03:35:11 PST
Created attachment 299131 [details]
Archive of layout-test-results from ews103 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews103  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 21 Alex Christensen 2017-01-18 10:10:22 PST
(In reply to comment #17)
> >> Source/WebCore/loader/LinkHeader.cpp:122
> >> +    ASSERT(position <= end);
> > 
> > This assertion should be at every function that has a position and an end.
> > It seems like an iterator type would be better than raw pointers in this whole patch.  StringView.h has code point and code unit iterators.
> 
> Good point regarding the assertion. Added everywhere else.
> 
> Regarding switching to StringView's iterators I tried to convert the logic
> to that, but it seems to be missing some concepts needed here (skipWhile,
> skipExactly and/or getting an iterator for a position in the middle of a
> StringView).
> For now I kept it using pointers, but if we want to go the iterator route,
> it'd require adding a new iterator type (e.g. ParserStringView) which will
> include the required concepts. Let me know if you think this is the route we
> should take and if so, where should such a class be located.
Yeah, it's probably not worth adding another iterator type just for this.
> >> Source/WebCore/loader/LinkHeader.cpp:274
> >> +    else if (name == LinkParameterAnchor)
> > 
> > It looks like a switch would be better here.
> 
> switched
lol
> >> LayoutTests/http/tests/preload/resources/download_resources_from_header.php:2
> >> +header("Link: <../resources/dummy.js>; rel=preload; as=script", false);
> > 
> > Can we put a complete absolute URL here?  We should add a test with a URL that looks like http://localhost:8000/preload/resources/something.html on a page with CSP that doesn't allow loading from localhost.  We should also add a test with an invalid URL like http://host:invalidport/ just to see what happens.  It's hard to make a URL fail to parse when only giving the path.  A fragment-only URL might also be interesting. Also a lowercase "link", a missing colon, no <> or URL, <> but no characters inside it, and a missing space after the colon.  Also what about emojis, invalid surrogate pairs, and null characters in the middle of the URL?
> 
> Added tests for CSP blocked, invalid port, fragment-only, lower-case link,
> missing colon, missing spaces, missing URL, etc.
> (Invalid port passes the isPreloaded() test, but I'm guessing that it's
> being blocked further down the stack)
> 
> I'm not sure how to add emojis, invalid surrogate pairs and null chars in
> PHP... Any examples you can point me to?
Now that I think about it, HTTP headers can only contain 8-bit values, so we can't have 16-bit characters in an HTTP header.

EWS failure looks unrelated.  I'll re-review this soon.
Comment 22 Alex Christensen 2017-01-18 12:40:54 PST
Comment on attachment 299128 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=299128&action=review

> Source/WebCore/loader/LinkHeader.cpp:115
> +static bool validFieldEnd(CharacterType*& position, CharacterType * const end)

extra space after CharacterType.  I see 9 instances of this.
I guess we don't have a lot of T* const in WebKit, but we should.  It's more WebKit-style than T * const.

> Source/WebCore/loader/LinkHeader.cpp:215
> +    completeQuotes = false;

This should be initialized by the caller.

> Source/WebCore/loader/LinkHeader.cpp:262
> +    if (hasQuotes)
> +        ++valueStart;
> +    if (completeQuotes)
> +        --valueEnd;

There should be some checks or at least assertions to make sure we don't go out of bounds.  We should be cautious with out bounds checks.  What if there is only a single quote?  (It would probably fail earlier, but still)

if (valueStart < valueEnd && hasQuotes)
    ++valueStart;
if (valueStart < valueEnd && completeQuotes)
    --valueEnd;

> Source/WebCore/loader/LinkHeader.cpp:309
> +    : m_isValid(true)

This should be initialized in the header.
bool m_isValid { true };
Comment 23 Yoav Weiss 2017-01-18 13:58:23 PST
Comment on attachment 299128 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=299128&action=review

>> Source/WebCore/loader/LinkHeader.cpp:115
>> +static bool validFieldEnd(CharacterType*& position, CharacterType * const end)
> 
> extra space after CharacterType.  I see 9 instances of this.
> I guess we don't have a lot of T* const in WebKit, but we should.  It's more WebKit-style than T * const.

changed

>> Source/WebCore/loader/LinkHeader.cpp:215
>> +    completeQuotes = false;
> 
> This should be initialized by the caller.

moved initialization to the caller

>> Source/WebCore/loader/LinkHeader.cpp:262
>> +        --valueEnd;
> 
> There should be some checks or at least assertions to make sure we don't go out of bounds.  We should be cautious with out bounds checks.  What if there is only a single quote?  (It would probably fail earlier, but still)
> 
> if (valueStart < valueEnd && hasQuotes)
>     ++valueStart;
> if (valueStart < valueEnd && completeQuotes)
>     --valueEnd;

Added ASSERT as well as a single quote test case

>> Source/WebCore/loader/LinkHeader.cpp:309
>> +    : m_isValid(true)
> 
> This should be initialized in the header.
> bool m_isValid { true };

done
Comment 24 Yoav Weiss 2017-01-18 14:00:29 PST
Created attachment 299178 [details]
Patch
Comment 25 Yoav Weiss 2017-01-18 22:49:37 PST
Comment on attachment 299178 [details]
Patch

Thanks for reviewing! :)
Comment 26 WebKit Commit Bot 2017-01-18 23:15:06 PST
Comment on attachment 299178 [details]
Patch

Clearing flags on attachment: 299178

Committed r210914: <http://trac.webkit.org/changeset/210914>
Comment 27 WebKit Commit Bot 2017-01-18 23:15:11 PST
All reviewed patches have been landed.  Closing bug.
Comment 28 Alex Christensen 2017-01-23 11:12:07 PST
We should add some more tests like this:
Link: <<url>>
LiNk: <url>
Link: <url>>
etc.
Comment 29 Alex Christensen 2017-01-23 11:56:28 PST
Also duplicate key-value pairs in the list:
Link: <../resources/test.mp4>; rel=preload; as=media; as=font
Comment 30 Yoav Weiss 2017-01-24 02:00:13 PST
(In reply to comment #28)
> We should add some more tests like this:
> Link: <<url>>
> LiNk: <url>
> Link: <url>>
> etc.

Added to https://bugs.webkit.org/show_bug.cgi?id=167366
Comment 31 Yoav Weiss 2017-01-24 02:01:09 PST
(In reply to comment #29)
> Also duplicate key-value pairs in the list:
> Link: <../resources/test.mp4>; rel=preload; as=media; as=font

That test would be significantly easier to add once https://bugs.webkit.org/show_bug.cgi?id=159678 lands