Bug 105590

Summary: Add functions for detecting top level and high level domain names
Product: WebKit Reporter: Vicki Pfau <jeffrey+webkit>
Component: WebCore Misc.Assignee: Vicki Pfau <jeffrey+webkit>
Status: NEW ---    
Severity: Normal CC: abarth, ap, beidson, benjamin, ddkilzer, fishd, jberlin, jonlee, matthewjamesengott, mjs, mrowe, ojan.autocc, sam, webkit.review.bot
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch jeffrey+webkit: commit-queue-

Description Vicki Pfau 2012-12-20 16:34:12 PST
WTF should have functions for detecting top level and high level domain names. This is useful for tasks such as eliminating the subdomains from a hostname.
Comment 1 Vicki Pfau 2012-12-20 16:45:03 PST
Created attachment 180443 [details]
Patch
Comment 2 WebKit Review Bot 2012-12-20 16:47:20 PST
Attachment 180443 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WTF/ChangeLog', u'Source/WTF/Derive..." exit_code: 1
Source/WTF/wtf/SuffixList.in:352:  Line contains tab character.  [whitespace/tab] [5]
Total errors found: 1 in 11 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Mark Rowe (bdash) 2012-12-20 17:54:51 PST
Comment on attachment 180443 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=180443&action=review

> Source/WTF/DerivedSources.make:25
> +# Copyright (C) 2012 Apple Inc. All rights reserved.
> +#
> +# Redistribution and use in source and binary forms, with or without
> +# modification, are permitted provided that the following conditions
> +# are met:
> +#
> +# 1.  Redistributions of source code must retain the above copyright
> +#     notice, this list of conditions and the following disclaimer.
> +# 2.  Redistributions in binary form must reproduce the above copyright
> +#     notice, this list of conditions and the following disclaimer in the
> +#     documentation and/or other materials provided with the distribution.
> +# 3.  Neither the name of Apple Computer, Inc. ("Apple") nor the names of
> +#     its contributors may be used to endorse or promote products derived
> +#     from this software without specific prior written permission.
> +#
> +# THIS SOFTWARE IS PROVIDED BY APPLE AND ITS CONTRIBUTORS "AS IS" AND ANY
> +# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
> +# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
> +# DISCLAIMED. IN NO EVENT SHALL APPLE OR ITS CONTRIBUTORS BE LIABLE FOR ANY
> +# DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
> +# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
> +# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
> +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
> +# THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

This is the wrong license. New files use the two-clause BSD license and don't refer to "Apple Computer, Inc".

> Source/WTF/WTF.xcodeproj/project.pbxproj:1506
> +			baseConfigurationReference = 5D247B7314689C4700E78B76 /* WTF.xcconfig */;

I don't think it makes sense for the Derived Sources target to be based on the WTF configuration. I think it shouldn't be based on anything at this level, and should just inherit everything from the project level.

> Source/WTF/generate-suffix-list.py:1
> +#!/usr/bin/env python

This should probably have a license header too.

> Source/WTF/generate-suffix-list.py:55
> +        stateFile.write(struct.pack('I', pos // 4))
> +        stateFile.seek(pos, os.SEEK_SET)
> +    inputStates.extend([(item[1], pos + x * 8 + 12) for (item, x) in zip(subtable, xrange(len(subtable)))])

I'm not sure why this script is using multiple temporary files only to read them back in to memory. Why isn't the computation simply done in memory? I'm sure that manipulating a data structure in memory before serializing it to the output file would also result in the logic being much easier to follow.

> Source/WTF/wtf/SuffixList.h:13
> + * THIS SOFTWARE IS PROVIDED BY APPLE COMPUTER, INC. ``AS IS'' AND ANY

"APPLE COMPUTER, INC." is not correct.

> Source/WTF/wtf/SuffixList.in:5
> +// ===BEGIN ICANN DOMAINS===

There should be a note somewhere describing where this file originated, and which version of this file we are using (if it has the concept of version numbers) or the date on which it was retreived.
Comment 4 Mark Rowe (bdash) 2012-12-20 20:24:55 PST
Comment on attachment 180443 [details]
Patch

Marking as r- due to the reasons previously described.
Comment 5 Benjamin Poulain 2012-12-21 08:00:42 PST
I think WTF may not be the best place for this (at least for now).

Other platforms already integrate that list in lower level frameworks linked with WebKit. In many case, that huge list will end up twice in memory.

Unless you have a good reason to have this in WTF, it would probably make more sense to have it in WebCore/platform. There, it should be easy to have platform override.


On a other subject, don't forget to use String allocation from literal when possible. :-)
Comment 6 Alexey Proskuryakov 2012-12-21 10:45:21 PST
Yes, having the list in memory twice won't be nice on OS X either.
Comment 7 Vicki Pfau 2012-12-21 17:33:05 PST
Created attachment 180579 [details]
Patch
Comment 8 Vicki Pfau 2012-12-21 17:37:38 PST
(In reply to comment #4)
> (From update of attachment 180443 [details])
> Marking as r- due to the reasons previously described.

I believe I've fixed all of these now.(In reply to comment #5)

> I think WTF may not be the best place for this (at least for now).
> 
> Other platforms already integrate that list in lower level frameworks linked with WebKit. In many case, that huge list will end up twice in memory.
> 
> Unless you have a good reason to have this in WTF, it would probably make more sense to have it in WebCore/platform. There, it should be easy to have platform override.

I couldn't think of a good reason to keep it in WTF, so WebCore/platform it is!

> On a other subject, don't forget to use String allocation from literal when possible. :-)

I looked over the code and I wasn't quite sure what literals you would have been talking about.

(In reply to comment #6)
> Yes, having the list in memory twice won't be nice on OS X either.

Fortunately, the way this is allocated (const strings), it doesn't use up extra memory per-process. Unfortunately, we don't seem to be able to use the underlying OS X implementation currently, so it needs to be duplicated for now. Hopefully the duplication can be eliminated soon though.
Comment 9 Mark Rowe (bdash) 2013-01-03 15:46:30 PST
Comment on attachment 180579 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=180579&action=review

> Source/WebCore/make-suffix-list.py:33
> +tree = [True, {}]

It would help the readability of this code if you were to use a slightly more descriptive data structure. A trivial way to do so would be to use Python's namedtuple type (http://docs.python.org/2/library/collections.html#collections.namedtuple). You can use it something like:

State = namedtuple('State', ('accepts', 'states'))
tree = State(True, {})
tree.states['foo'] = State(False, {})

> Source/WebCore/make-suffix-list.py:56
> +    else:
> +        subtree[0] = not negative

The else clause on the for loop seems unnecessary. It's purpose is to run code only when the loop executes to completion rather than being interrupted due to a break, but in this case the loop body contains no break statements. I think you should remove the else clause.

> Source/WebCore/make-suffix-list.py:62
> +flatStates = array.array('l')

"flatStates" doesn't seem like a great name.  Is there a reason why you're using an array here rather than using a regular list? It seems like you just treat it as an array of numbers furhter on when you write it to the output file.

> Source/WebCore/make-suffix-list.py:65
> +    state, parent = inputStates.pop(0)

It appears from reading the code that follows that "parent" is really the position within the data that falls inside the parent and is where the "extent offset" should be stored at? "parent" doesn't seem like a particularly clear name for this.

> Source/WebCore/make-suffix-list.py:66
> +    pos = len(flatStates)

I'm assuming that "pos" is an abbreviation of "position". We prefer to avoid abbreviations in variable names.

> Source/WebCore/make-suffix-list.py:71
> +    inputStates.extend([(item[1], pos + x * 2 + 3) for (item, x) in zip(subtable, xrange(len(subtable)))])

for (item, x) in zip(subtable, xrange(len(subtable))) is equivalent to for (i, item) in enumerate(subtable).

The [ ] around the comprehension likely aren't necessary in this context, since list.extend should work as well with a generator as it does with a list.

This is another case where the use of tuples rather than a type with named fields make it harder to follow. The magic numbers in this calculation don't help either. I'm struggling to see where the 3 in the calcluation comes from.

> Source/WebCore/make-suffix-list.py:79
> +flatString = array.array('c')

What does using an array here buy you over just using a list of strings?

> Source/WebCore/make-suffix-list.py:82
> +while stringTables:
> +    table = stringTables.pop(0)

for table in stringTables:

> Source/WebCore/make-suffix-list.py:84
> +    entry = 0
> +    for string in table[1]:

for entry, string in enumerate(table[1]):

> Source/WebCore/make-suffix-list.py:92
> +eof = len(flatStates)

"eof" isn't a good name for a variable that contains a length.

> Source/WebCore/make-suffix-list.py:115
> +for value in flatString:

It seems like it would be clearer for this to operate a string at a time rather than a character at a time. That would let you drop the "current" state.
Comment 10 Benjamin Poulain 2013-01-03 15:58:51 PST
(In reply to comment #0)
> WTF should have functions for detecting top level and high level domain names. This is useful for tasks such as eliminating the subdomains from a hostname.

Outside the technical aspect, why do you add this to WebKit at all?
Where is this gonna be used?
Comment 11 Adam Barth 2013-01-03 16:15:08 PST
Comment on attachment 180579 [details]
Patch

Can you put this behind a USE macro?  Chromium has its own copy of this list and we do not want two copies compiled into the Chromium binary.
Comment 12 Adam Barth 2013-01-03 16:17:07 PST
Note: Chromium calls this concept a "registry-controlled domain" rather than a "top-level domain."
Comment 13 Adam Barth 2013-01-03 16:19:35 PST
> Where is this gonna be used?

I'm also curious about the answer to this question.  We've so far been able to keep this information out of WebCore since it's irrelevant to many consumers of WebCore.  Generally speaking, the fewer uses of the public suffix list in the platform, the better.  The suffix list is needed for cookies, but WebCore isn't concerned with the details of how cookies work.
Comment 14 Sam Weinig 2013-01-04 10:56:45 PST
(In reply to comment #13)
> > Where is this gonna be used?
> 
> I'm also curious about the answer to this question.  We've so far been able to keep this information out of WebCore since it's irrelevant to many consumers of WebCore.  Generally speaking, the fewer uses of the public suffix list in the platform, the better.  The suffix list is needed for cookies, but WebCore isn't concerned with the details of how cookies work.

Though probably not Jeffrey's intended use, it could probably be used to make our implementation of document.domain match the spec (http://www.whatwg.org/specs/web-apps/current-work/multipage/origin-0.html#dom-document-domain) which requires checking against the public suffix list.
Comment 15 Vicki Pfau 2013-01-04 14:37:29 PST
(In reply to comment #11)
> (From update of attachment 180579 [details])
> Can you put this behind a USE macro?  Chromium has its own copy of this list and we do not want two copies compiled into the Chromium binary.

The way I intended this to work was that specific ports could either include their stock version by including TopLevelDomain.cpp in their build, or if the list already exists in their port, this just exposes the API to WebCore and they can implement a different .cpp file. Hence I'm not sure fi the USE macro should be around just TopLevelDomain.cpp or around the header, too.

(In reply to comment #12)
> Note: Chromium calls this concept a "registry-controlled domain" rather than a "top-level domain."

Mozilla and other places seem to call it a public suffix, so I'm thinking about changing TopLevelDomain.* to PublicSuffix.*.

(In reply to comment #13)
> > Where is this gonna be used?
> 
> I'm also curious about the answer to this question.  We've so far been able to keep this information out of WebCore since it's irrelevant to many consumers of WebCore.  Generally speaking, the fewer uses of the public suffix list in the platform, the better.  The suffix list is needed for cookies, but WebCore isn't concerned with the details of how cookies work.

I agree that this list should be used sparingly. In specific, this list will be used for one of the storage blocking techniques where it ensures that different domains (e.g. a.com and b.com) can't access the same cache items, but subdomains that share the domain's cache (e.g. a.com, b.a.com and c.a.com) all have access to the same cache items. Doing it strictly per full hostname caused more cache misses than was desirable on some websites, and this prevents egregious use of the cache for tracking.
Comment 16 Benjamin Poulain 2013-01-04 14:57:43 PST
> I agree that this list should be used sparingly. In specific, this list will be used for one of the storage blocking techniques where it ensures that different domains (e.g. a.com and b.com) can't access the same cache items, but subdomains that share the domain's cache (e.g. a.com, b.a.com and c.a.com) all have access to the same cache items. Doing it strictly per full hostname caused more cache misses than was desirable on some websites, and this prevents egregious use of the cache for tracking.

Just my opinion:
-Before landing, this should consider the support for OS X and Chromium. Duplicating the list seems really bad.
-Because this is not feature #ifdef-ed, you should land this together with your other patch.

If the whole feature is not obvious, please first announce it on webkit-dev, and have a feature #define until it is ready for prime time.
Comment 17 Vicki Pfau 2013-01-04 15:02:01 PST
(In reply to comment #16)
> > I agree that this list should be used sparingly. In specific, this list will be used for one of the storage blocking techniques where it ensures that different domains (e.g. a.com and b.com) can't access the same cache items, but subdomains that share the domain's cache (e.g. a.com, b.a.com and c.a.com) all have access to the same cache items. Doing it strictly per full hostname caused more cache misses than was desirable on some websites, and this prevents egregious use of the cache for tracking.
> 
> Just my opinion:
> -Before landing, this should consider the support for OS X and Chromium. Duplicating the list seems really bad.
> -Because this is not feature #ifdef-ed, you should land this together with your other patch.
> 
> If the whole feature is not obvious, please first announce it on webkit-dev, and have a feature #define until it is ready for prime time.

This is a subfeature of the general storage blocking feature that's been in progress for a few months. It's not used mandatorily--there's a setting that controls it. AFAIK, nothing except the Mac port uses that setting. I'll probably need to do some #ifdefs around things to make sure that nothing tries to compile against functions that don't exist on other ports, though.
Comment 18 Vicki Pfau 2013-01-04 16:41:31 PST
Created attachment 181405 [details]
Patch

This doesn't have the #ifdefs currently, so this version is for reviewing and not landing yet.
Comment 19 Adam Barth 2013-01-04 16:42:10 PST
Please include the #error for Chromium:

#if PLATFORM(CHROMIUM)
#error "Do not add this file to the Chromium port. Chromium uses the registry-controlled domain list from http://src.chromium.org/viewvc/chrome/trunk/src/net/base/registry_controlled_domains/"
#endif

Otherwise, it's too each for someone to include this file in the build and they we'll end up with two (likely not identical) copies of the list in Chromium.

Even in the apple-mac build, this patch will result in Safari haven't two (likely not identical) copies of this list because CFNetwork already has a copy as required by cookies.  That seems both inefficient and potentially dangerous.

IMHO, this patch is bad for WebKit.  Many folks use WebKit in places that can't easily be updated (e.g., embedded devices).  Having this list in WebKit means that more people will have out-of-date copies of this list, which screws people in obscure jurisdictions.

Also, the public suffix list favors the powerful over the less powerful, which is bad for the web.  For example, the current version of the public suffix list includes appspot.com but not heroku.com even though the reasons for including appspot.com also apply to heroku.com.

Anyway, I don't particularly want to block you from adding the public suffix list to WebKit, but I want to be clear that I think this is a bad idea and I don't want to use this list (or even compile it) in Chromium.  Existing WebKit features do not appear to require the list (i.e., WebKit has been happy without the list for a decade).  I don't think we should design any new features require the list.  New features that require the list are broken and should be redesigned not to use the list.
Comment 20 Vicki Pfau 2013-01-04 18:36:38 PST
<rdar://problem/12961592>
Comment 21 Darin Adler 2013-01-16 14:10:16 PST
Comment on attachment 181405 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=181405&action=review

A patch that adds a function, currently unused by WebKit except in its own unit tests, is unusual for the WebKit project. Is there some simple use of this that could be incorporated with the patch so the code would be both added and used?

> Source/WebCore/ChangeLog:21
> +        (WebCore):

Please remove bogus lines like this one from change log and/or fix the script so it doesn’t add them.

> Source/WebCore/DerivedSources.make:1076
> +SuffixList.cpp : make-suffix-list.py platform/SuffixList.in
> +	python $^

How does the script know where to write its output? Normally we would have to pass $@ to the script too.

> Source/WebCore/platform/PublicSuffix.cpp:40
> +String childStateName(const SubstateTuple*);
> +
> +String childStateName(const SubstateTuple* tuple)

This is an anti-pattern. If the function is used outside this file, then it should be declared in a header file. If it’s not used outside the file its definition should be marked “static” and so we would not need a separate declaration.

> Source/WebCore/platform/PublicSuffix.cpp:44
> +    return String(&publicSuffixStrings[tuple->nameOffset]);

It seems wasteful to construct a String just to return a null-terminated C string. Is there some reason we have to wrap these in String objects? If we just want to compare with a String as is done in the searchForState function we can do that without doing memory allocation.

> Source/WebCore/platform/PublicSuffix.cpp:87
> +    for (size_t nextDot = UINT_MAX, previousDot = domain.reverseFind('.', UINT_MAX); nextDot != notFound; nextDot = previousDot, previousDot = domain.reverseFind('.', nextDot - 1)) {

The use of UINT_MAX here is strange. The type of nextDot is not unsigned, so UINT_MAX is the wrong maximum. I would have suggested using numeric_limits<size_t>::max(),

For reverseFind, I think you should just omit the second argument, because it starts from the end of the string by default. But also, we could probably rewrite the loop so the start code and next code didn’t have to repeat the reverseFind.

> Source/WebCore/platform/PublicSuffix.cpp:91
> +            substate = searchForState(state, "*");

Seems unfortunate that this will construct a String containing the single character "*" repeatedly an in an inefficient way.

> Tools/ChangeLog:12
> +        (TestWebKitAPI):

Again, bogus line that should be deleted.
Comment 22 Darin Adler 2013-01-16 14:11:10 PST
Comment on attachment 181405 [details]
Patch

I just saw the comments from Adam Barth, given those, I am clearing my review+ until I better understand what the strategy is.
Comment 23 Maciej Stachowiak 2013-01-18 21:24:04 PST
I'd suggest an ENABLE macro around the whole file instead of a Chromium-specific error. Adam, do you think both are needed?

Also an ENABLE would make it easier to substitute platform-specific implementations that can use an existing copy of the list.

Adam, you suggested that any feature using such a list is likely broken. Jeffrey explained his use in comment #15:

> I agree that this list should be used sparingly. In specific, this list will be used for one of the storage blocking techniques where it ensures that different domains (e.g. a.com and b.com) can't access the same cache items, but subdomains that share the domain's cache (e.g. a.com, b.a.com and c.a.com) all have access to the same cache items. Doing it strictly per full hostname caused more cache misses than was desirable on some websites, and this prevents egregious use of the cache for tracking.

Do you think this specific use is broken? The consequence of an outdated copy of the list in this case would be either failure to block tracking via the http cache across sites that are actually unrelated but share a public suffix for extra entries; or missed caching opportunities in third-party contexts for missing entries. This risk seems less than it would be for other hypothetical uses.

And as a final comment, this file should probably be called something relating to "PublicSuffix" rather than "TopLevelDomain" because the domains it identifies are not top-level domains in the officially defined sense.
Comment 24 Adam Barth 2013-01-19 15:08:47 PST
> I'd suggest an ENABLE macro around the whole file instead of a Chromium-specific error. Adam, do you think both are needed?

I would like a #error somewhere to catch someone in the future who tries to turn this on for Chromium.  If the whole file wrapped in an ENABLE macro, then we can put the #error in a Chromium-specific file somewhere else and have it be triggered by the ENABLE macro.

> Do you think this specific use is broken?

It's certainly not as bad as other uses of the public suffix list in that it doesn't cause an interoperability problem.  What it means is that this feature will make caching behavior pretty arbitrary.  For example, two high schools in Californian will be able to share the same cached version of jQuery but two University of California campuses would not.

My bigger concern is that once we have a copy of this list in WebCore, we'll be tempted to use it for other things, such as document.domain.  As I wrote above, I'm don't want to block this patch patch given that its only use is for WebKit2.  I'm just want to be clear about my point of view so that there's no surprises later if/when someone tries to use the list in a way that affects other ports (such as for document.domain).

Perhaps we ought to put the list in Source/WebKit2/Platform to make it clear that this code is for WebKit2 alone and is not to be used in WebCore.
Comment 25 Maciej Stachowiak 2013-01-19 16:37:34 PST
(In reply to comment #24)
> > I'd suggest an ENABLE macro around the whole file instead of a Chromium-specific error. Adam, do you think both are needed?
> 
> I would like a #error somewhere to catch someone in the future who tries to turn this on for Chromium.  If the whole file wrapped in an ENABLE macro, then we can put the #error in a Chromium-specific file somewhere else and have it be triggered by the ENABLE macro.

OK.

> 
> > Do you think this specific use is broken?
> 
> It's certainly not as bad as other uses of the public suffix list in that it doesn't cause an interoperability problem.  What it means is that this feature will make caching behavior pretty arbitrary.  For example, two high schools in Californian will be able to share the same cached version of jQuery but two University of California campuses would not.

Yeah, we're prepared to accept this as a downside in the case where a user has third-party data blocking enabled for privacy purposes.

> 
> My bigger concern is that once we have a copy of this list in WebCore, we'll be tempted to use it for other things, such as document.domain. 

Do you think it would be worthwhile documenting the limited purpose of this file?

Also, if the WebKit community as a whole would object to following the spec requirement to use the Public Suffix List for document.domain, we should probably both clearly document somewhere what we intend to do instead and why, and make sure to give feedback on the spec. (I have no personal opinion on whether document.domain should use the public suffix list or do something else, it just seems helpful to clearly document intentional divergences from the relevant spec in the cases where we need them.)

> As I wrote above, I'm don't want to block this patch patch given that its only use is for WebKit2.  I'm just want to be clear about my point of view so that there's no surprises later if/when someone tries to use the list in a way that affects other ports (such as for document.domain).
> 
> Perhaps we ought to put the list in Source/WebKit2/Platform to make it clear that this code is for WebKit2 alone and is not to be used in WebCore.

It's not really related to WebKit2 at all. WebKit/mac will use it too, for instance. Or any other port that wants to support third-party data blocking and doesn't have ready access to another form of this list.
Comment 26 Adam Barth 2013-01-19 20:46:46 PST
Another approach that we might want to consider is not hard-coding this sort of policy information in WebCore.  Perhaps WebCore should simply provide client callbacks into the embedding application for controlling these sorts of cache policies.  In that approach, the embedding application could make decisions about caching, perhaps using the public suffix list or perhaps not.  That approach seems more consistent with the general approach we've taken to WebCore in other places.
Comment 27 Adam Barth 2013-01-19 20:51:18 PST
> Also, if the WebKit community as a whole would object to following the spec requirement to use the Public Suffix List for document.domain, we should probably both clearly document somewhere what we intend to do instead and why, and make sure to give feedback on the spec. (I have no personal opinion on whether document.domain should use the public suffix list or do something else, it just seems helpful to clearly document intentional divergences from the relevant spec in the cases where we need them.)

Yeah, it's something I would like to discuss eventually with the standards community.  I would also like to discuss and harmonize the security model for data URLs.  I'm secretly hoping that document.domain fades into obscurity and that we can eventually remove it from the platform.
Comment 28 Maciej Stachowiak 2013-01-19 23:54:12 PST
(In reply to comment #26)
> Another approach that we might want to consider is not hard-coding this sort of policy information in WebCore.  Perhaps WebCore should simply provide client callbacks into the embedding application for controlling these sorts of cache policies.  In that approach, the embedding application could make decisions about caching, perhaps using the public suffix list or perhaps not.  That approach seems more consistent with the general approach we've taken to WebCore in other places.

Typically for the Mac WebKit public API (both WK1 and WK2), we apply the following general rules to callbacks to the WebKit client: (1) we only add them when it's reasonably likely different clients will actually want to do different things; (2) they're always strictly optional; the framework provides sensible defaults for clients that don't do anything for a given callback.

If it were not for the concern about adding the public suffix list to the WebKit repository, then by (1) we likely wouldn't add this callback (we don't expect different clients would actually want different levels of granularity for cache partitioning with third-party data blocking enabled) and by (2), if we did we'd likely want to offer the current non-customizable behavior as the framework default.

So while it's technologically possible, it would go against our usual API design principles.

One potentially interesting open question is whether any other pots would be interested in offering cache partitioning for tracking prevention to their clients, and if so, whether they want a different policy or a customizable policy for the partitioning boundaries. That would affect how much it makes sense to make the behavior customizable.
Comment 29 Adam Barth 2013-01-20 00:11:57 PST
I guess it depends on whether you think the public suffix list is a sensible default.  IMHO, the list more or less arbitrary rather than representing any sort of "sense", but reasonable people can disagree on that point.

Anyway, it seems that this conversation has run its course.  It sounds like we should accept this patch with the ENABLE macro, the #error, and potentially some documentation about how the list ought to be used in WebCore.