Bug 51638 - Protect path of HTTP Referer Header
Summary: Protect path of HTTP Referer Header
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading
Version: 528+ (Nightly build)
Hardware: All All
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks: 41801
Reported: 2010-12-27 04:16 PST by Robert Hogan
Modified: 2018-03-21 03:47 PDT

Description Robert Hogan 2010-12-27 04:16:29 PST
From https://bugzilla.mozilla.org/show_bug.cgi?id=587523:

"The browser's http referer header is a source of significant amount of private
data leakage. See http://www.cs.wpi.edu/~cew/papers/wosn08.pdf and
http://online.wsj.com/article/NA_WSJ_PUB:SB10001424052748704513104575256701215465596.html
as an example of something that was fixed by particular sites (Facebook).

One issue that has not been fixed yet is the fact that users' search terms
leak to 3rd party sites via the referer header, when they click on results from
the search engine results page.

An example of this can be seen by searching for 'no knead bread' with Google,
and clicking on the 4th search result, which takes you to
www.breadtopia.com/basic-no-knead-method/, a page which "helpfully" lets you
know that it is aware of the search terms that brought you to the site.

This bug (https://bugzilla.mozilla.org/show_bug.cgi?id=55477) has quite a bit
of debate about the idea of stripping some info from the referer header. One of
the good ideas in that bug is the idea of a REFERRER_3RDPARTY_NO_PREPATH
option, which "Strip[s] off the path from the referrer for 3rd party requests,
otherwise leave[s] it alone."

Under such a model, a user visiting wikipedia.com, and clicking on a link to
another page on wikipedia would still have the full referer transmitted.
However, a user clicking on a link from Google.com's results page to a 3rd
party site would result in the referer of "http://www.google.com" being sent. 

Looking through that and other bug reports, the two main positive use cases
for the delivery of the referer header to 3rd party sites seem to be:

1. Stopping bandwidth leeching. E.g. Stopping other sites from including images
from your site, which cost you bandwidth when users visit those sites.
2. Analytics: Helping webmasters to figure out where their traffic is coming
from.

Item 1 is easy to solve, since even with just the domain in the referer header,
it would still be easy to determine that a myspace.com user had embedded your
content in their site. You wouldn't know which myspace user had done so, but
you could still easily block such requests (or give them a different image).

With regard to item 2: webmasters simply do not have any natural right to know
where their users are coming from. Yes, this is how the web has always worked,
but it doesn't need to stay this way, particularly when the user has expressed
a desire to be private, by turning on private browsing mode.

By switching to a model of scrubbing the path, but not the domain, from the
referer header, these 3rd party sites would still have a rough idea of where
users are coming from, but wouldn't learn the exact page the user is on. For
sites that include private info in the URL (for example:
http://www.webmd.com/breast-cancer/default.htm), this would lead to a
significant improvement in user privacy.

Furthermore, for webmasters that want to find out what search terms are drawing
users, Google already offers aggregate stats for individual webmasters, which
can be viewed at http://www.google.com/webmaster. These webmasters would merely
be denied this info about individual users in real time, and would instead have
to make do with aggregate info.

What I propose is adding this option to strip the path from the referer headers
sent to third party sites. This option should not be enabled by default, but if
a user wishes to go into about:config and enable it, so be it. However, this
option would be enabled whenever the user goes into private browsing mode. Once
the user leaves private browsing mode, their browser will go back to sending full
headers again."
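
For illustration, the REFERRER_3RDPARTY_NO_PREPATH behaviour quoted above could be sketched roughly as follows. This is Python rather than WebKit code, and it rests on assumptions: the names is_same_site and referer_for are made up for this example, and "third party" is approximated by comparing host names, whereas a real implementation would presumably compare registrable domains.

```python
from urllib.parse import urlsplit


def is_same_site(source_url: str, target_url: str) -> bool:
    """Hypothetical first-party check: compare host names only.

    Assumption: a real browser would likely compare registrable domains
    (so that two subdomains of one site count as the same site).
    """
    source_host = (urlsplit(source_url).hostname or "").lower()
    target_host = (urlsplit(target_url).hostname or "").lower()
    return source_host == target_host


def referer_for(source_url: str, target_url: str) -> str:
    """Return the Referer value to send when navigating source -> target."""
    if is_same_site(source_url, target_url):
        # First-party request: leave the referer alone.
        return source_url
    # Third-party request: keep scheme and host, drop path and query.
    parts = urlsplit(source_url)
    return f"{parts.scheme}://{parts.netloc}"


# Search-results example from the description (the query string is illustrative).
print(referer_for(
    "http://www.google.com/search?q=no+knead+bread",
    "http://www.breadtopia.com/basic-no-knead-method/",
))
```

With these inputs the third-party target only sees the search engine's scheme and host, while a link between two pages on the same host would still carry the full URL.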
Comment 1 Marco Peereboom 2011-07-11 11:46:28 PDT
I would love a knob for this as well.  Something along the lines of "default", "domain_only" and "disabled".  Just like cookies.
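
A rough sketch of what such a three-way knob might look like, in Python for illustration only. The function name apply_referer_knob is hypothetical, and the meaning of each setting is an assumption read off the names: "default" sends the full referer, "domain_only" sends only scheme and host, and "disabled" sends no referer at all.

```python
from typing import Optional
from urllib.parse import urlsplit


def apply_referer_knob(mode: str, source_url: str) -> Optional[str]:
    """Return the Referer value for the given knob setting, or None for no header.

    The three mode names come from the comment above; their exact
    semantics here are an assumption.
    """
    if mode == "default":
        # Current behaviour: send the full referring URL.
        return source_url
    if mode == "domain_only":
        # Keep scheme and host, drop path and query.
        parts = urlsplit(source_url)
        return f"{parts.scheme}://{parts.netloc}"
    if mode == "disabled":
        # Send no Referer header at all.
        return None
    raise ValueError(f"unknown referer mode: {mode!r}")


# Example using the URL from the description.
for mode in ("default", "domain_only", "disabled"):
    print(mode, apply_referer_knob(mode, "http://www.webmd.com/breast-cancer/default.htm"))
```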