Bug 172748 - Consider blocking requests to HTTP(S) URLs that contain both `\n` and `<` characters.
Summary: Consider blocking requests to HTTP(S) URLs that contain both `\n` and `<` cha...
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: WebKit Nightly Build
Hardware: All All
: P2 Enhancement
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-05-31 06:29 PDT by Mike West
Modified: 2019-10-08 11:50 PDT (History)
5 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Mike West 2017-05-31 06:29:01 PDT
In the hopes of mitigating one form of dangling-markup-based exfiltration, Blink plans to block requests whose URLs contained both removable whitespace (`\n`, `\r`, `\t`) _and_ raw less-than (`<`) characters. https://github.com/whatwg/fetch/issues/546 lays out the strategy and justification in more detail, proposed patches to URL and Fetch are up for review at https://github.com/whatwg/url/pull/284 and https://github.com/whatwg/fetch/pull/519 respectively, and Blink's "Intent to Remove" might be helpful: https://groups.google.com/a/chromium.org/d/msg/blink-dev/KaA_YNOlTPk/VmmoV88xBgAJ.

CCing achristensen@ who's had helpful comments on the URL patch, though I don't think they're in favor of the exact implementation strategy outlined there. :)

WDYT?
Comment 1 Alex Christensen 2017-05-31 10:57:21 PDT
As outlined in https://github.com/whatwg/url/pull/284 I am very opposed to this approach to mitigating the problem.  Please don't do this in Chromium or the specifications.
Comment 2 Mike West 2017-05-31 22:57:21 PDT
Hey Alex!

My understanding from your comments in the patch against URL (particularly https://github.com/whatwg/url/pull/284#issuecomment-304087641) is that you're not opposed to the behavior, but opposed to doing it by patching URL as opposed to HTML. Is that not accurate?
Comment 3 Alex Christensen 2019-09-13 08:57:20 PDT
URLs are used in a lot of places that aren't vulnerable to dangling markup attacks, so it definitely shouldn't go in the URL parser or specification.  HTML is a more appropriate place because you're trying to avoid URLs that look like HTML, and URLs should not need to know anything about HTML.

That said, I'm worried about compatibility.  I'm under the impression that hand written URLs sometimes contain tabs, newlines, < and > for good reasons, but I have no data to back that up.
Comment 4 Mike West 2019-10-08 11:50:54 PDT
> URLs are used in a lot of places that aren't vulnerable to dangling markup attacks,
> so it definitely shouldn't go in the URL parser or specification.  HTML is a more
> appropriate place because you're trying to avoid URLs that look like HTML, and URLs
> should not need to know anything about HTML.

It's totally possible to implement this outside the URL parser. In Chromium, it's implemented as a flag that the URL parser sets during parsing (https://cs.chromium.org/chromium/src/url/url_canon_etc.cc?rcl=2bd9bea1c6b9ace95707a0e8715f40793c9dc909&l=26). We're scanning the URL anyway at that point to remove whitespace, and scanning the string prior to canonicalizing it turned out to show up in benchmarks. There is likely a clever way to avoid that performance impact, but it's what Chromium is doing today.

From a spec perspective, I'd be fine with this all living in HTML, with the caveat that it seems like a large amount of work to go through that spec to find all the places where URLs could be parsed and wire them up to some parsing proxy. I don't have time right now to do that work. :(

> That said, I'm worried about compatibility.  I'm under the impression that hand
> written URLs sometimes contain tabs, newlines, < and > for good reasons, but I
> have no data to back that up.

FWIW, Chrome has been shipping this behavior since 2017.