<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>172748</bug_id>
          
          <creation_ts>2017-05-31 06:29:01 -0700</creation_ts>
          <short_desc>Consider blocking requests to HTTP(S) URLs that contain both `\n` and `&lt;` characters.</short_desc>
          <delta_ts>2024-05-30 10:54:49 -0700</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>WebCore Misc.</component>
          <version>WebKit Nightly Build</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>NEW</bug_status>
          <resolution></resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>Enhancement</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Mike West">mkwst</reporter>
          <assigned_to name="Nobody">webkit-unassigned</assigned_to>
          <cc>achristensen</cc>
    
    <cc>ap</cc>
    
    <cc>bfulgham</cc>
    
    <cc>cdumez</cc>
    
    <cc>cyb.ai.815</cc>
    
    <cc>wilander</cc>
    
    <cc>zcorpan</cc>
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>1314138</commentid>
    <comment_count>0</comment_count>
    <who name="Mike West">mkwst</who>
    <bug_when>2017-05-31 06:29:01 -0700</bug_when>
    <thetext>In the hopes of mitigating one form of dangling-markup-based exfiltration, Blink plans to block requests whose URLs contain both removable whitespace (`\n`, `\r`, `\t`) _and_ raw less-than (`&lt;`) characters. https://github.com/whatwg/fetch/issues/546 lays out the strategy and justification in more detail. Proposed patches to URL and Fetch are up for review at https://github.com/whatwg/url/pull/284 and https://github.com/whatwg/fetch/pull/519 respectively, and Blink&apos;s &quot;Intent to Remove&quot; might be helpful: https://groups.google.com/a/chromium.org/d/msg/blink-dev/KaA_YNOlTPk/VmmoV88xBgAJ.
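
As a minimal sketch of that check (Python; `should_block` is a hypothetical helper for illustration, not Blink code, and it only handles absolute URLs because it does no base-URL resolution):

```python
REMOVABLE_WHITESPACE = "\n\r\t"
LESS_THAN = "\x3c"  # raw less-than sign (U+003C)


def should_block(raw_url):
    """Hypothetical helper: True if an absolute HTTP(S) URL contains both
    removable whitespace and a raw less-than sign."""
    # Drop removable whitespace before reading the scheme, since a URL
    # parser would ignore those characters anyway.
    stripped = "".join(ch for ch in raw_url if ch not in REMOVABLE_WHITESPACE)
    scheme = stripped.lstrip().split(":", 1)[0].lower()
    if scheme not in ("http", "https"):
        return False
    has_whitespace = any(ch in REMOVABLE_WHITESPACE for ch in raw_url)
    return has_whitespace and LESS_THAN in raw_url
```

Either kind of character on its own is left alone; only the combination is treated as suspicious.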

CCing achristensen@ who&apos;s had helpful comments on the URL patch, though I don&apos;t think they&apos;re in favor of the exact implementation strategy outlined there. :)

WDYT?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1314223</commentid>
    <comment_count>1</comment_count>
    <who name="Alex Christensen">achristensen</who>
    <bug_when>2017-05-31 10:57:21 -0700</bug_when>
    <thetext>As outlined in https://github.com/whatwg/url/pull/284 I am very opposed to this approach to mitigating the problem.  Please don&apos;t do this in Chromium or the specifications.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1314548</commentid>
    <comment_count>2</comment_count>
    <who name="Mike West">mkwst</who>
    <bug_when>2017-05-31 22:57:21 -0700</bug_when>
    <thetext>Hey Alex!

My understanding from your comments in the patch against URL (particularly https://github.com/whatwg/url/pull/284#issuecomment-304087641) is that you&apos;re not against the behavior itself, but against implementing it by patching URL rather than HTML. Is that not accurate?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1570595</commentid>
    <comment_count>3</comment_count>
    <who name="Alex Christensen">achristensen</who>
    <bug_when>2019-09-13 08:57:20 -0700</bug_when>
    <thetext>URLs are used in a lot of places that aren&apos;t vulnerable to dangling markup attacks, so it definitely shouldn&apos;t go in the URL parser or specification.  HTML is a more appropriate place because you&apos;re trying to avoid URLs that look like HTML, and URLs should not need to know anything about HTML.

That said, I&apos;m worried about compatibility.  I&apos;m under the impression that hand written URLs sometimes contain tabs, newlines, &lt; and &gt; for good reasons, but I have no data to back that up.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1577857</commentid>
    <comment_count>4</comment_count>
    <who name="Mike West">mkwst</who>
    <bug_when>2019-10-08 11:50:54 -0700</bug_when>
    <thetext>&gt; URLs are used in a lot of places that aren&apos;t vulnerable to dangling markup attacks,
&gt; so it definitely shouldn&apos;t go in the URL parser or specification.  HTML is a more
&gt; appropriate place because you&apos;re trying to avoid URLs that look like HTML, and URLs
&gt; should not need to know anything about HTML.

It&apos;s totally possible to implement this outside the URL parser. In Chromium, it&apos;s implemented as a flag that the URL parser sets during parsing (https://cs.chromium.org/chromium/src/url/url_canon_etc.cc?rcl=2bd9bea1c6b9ace95707a0e8715f40793c9dc909&amp;l=26). We&apos;re scanning the URL anyway at that point to remove whitespace, and a separate scan of the string prior to canonicalizing it turned out to show up in benchmarks. There is likely a clever way to avoid that performance impact, but the flag is what Chromium is doing today.
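
A rough sketch of the single-pass idea described above, in Python rather than C++ and not taken from the Chromium source; `strip_and_flag` and its return shape are hypothetical, chosen only for illustration:

```python
REMOVABLE_WHITESPACE = "\n\r\t"
LESS_THAN = "\x3c"  # raw less-than sign (U+003C)


def strip_and_flag(raw_url):
    """Hypothetical helper: remove tab/newline characters in one pass and
    report whether the input held both whitespace and a less-than sign."""
    kept = []
    saw_whitespace = False
    saw_less_than = False
    for ch in raw_url:
        if ch in REMOVABLE_WHITESPACE:
            saw_whitespace = True
            continue  # dropped, as whitespace removal would do anyway
        if ch == LESS_THAN:
            saw_less_than = True
        kept.append(ch)
    return "".join(kept), saw_whitespace and saw_less_than
```

The caller gets the cleaned string plus a boolean, so the decision to block can be made later, outside the parsing step, without a second scan of the input.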

From a spec perspective, I&apos;d be fine with this all living in HTML, with the caveat that it seems like a large amount of work to go through that spec to find all the places where URLs could be parsed and wire them up to some parsing proxy. I don&apos;t have time right now to do that work. :(

&gt; That said, I&apos;m worried about compatibility.  I&apos;m under the impression that hand
&gt; written URLs sometimes contain tabs, newlines, &lt; and &gt; for good reasons, but I
&gt; have no data to back that up.

FWIW, Chrome has been shipping this behavior since 2017.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>2038739</commentid>
    <comment_count>5</comment_count>
    <who name="Simon Pieters (:zcorpan)">zcorpan</who>
    <bug_when>2024-05-30 04:26:16 -0700</bug_when>
    <thetext>The current proposal is https://github.com/whatwg/html/pull/10022</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>