Bug 190792

Summary: topPrivatelyControlledDomain is slow
Product: WebKit Reporter: Antti Koivisto <koivisto>
Component: Page LoadingAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: achristensen, beidson, cdumez, commit-queue, ews-watchlist, ggaren, tsavell, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=190814
Attachments:
Description Flags
patch
none
patch
achristensen: review+
patch
ews-watchlist: commit-queue-
Archive of layout-test-results from ews117 for mac-sierra
none
with locking
none
with locking
none
patch none

Description Antti Koivisto 2018-10-22 06:22:06 PDT
It calls into some slowish CFNetwork code and ends up showing up in profiles.
Comment 1 Antti Koivisto 2018-10-22 06:57:24 PDT
Created attachment 352888 [details]
patch
Comment 2 Antti Koivisto 2018-10-22 07:21:24 PDT
Created attachment 352890 [details]
patch
Comment 3 Alex Christensen 2018-10-22 07:27:38 PDT
Comment on attachment 352890 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=352890&action=review

> Source/WebCore/platform/mac/PublicSuffixMac.mm:51
> +    static NeverDestroyed<HashMap<String, AtomicString, ASCIICaseInsensitiveHash>> cache;

Is there really a benefit to using AtomicString here?
Comment 4 Antti Koivisto 2018-10-22 07:36:43 PDT
> Is there really a benefit to using AtomicString here?

At least in theory. Many domains may map into the same top domain. It shouldn't hurt.
Comment 5 Chris Dumez 2018-10-22 08:44:00 PDT
Comment on attachment 352890 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=352890&action=review

>> Source/WebCore/platform/mac/PublicSuffixMac.mm:51
>> +    static NeverDestroyed<HashMap<String, AtomicString, ASCIICaseInsensitiveHash>> cache;
> 
> Is there really a benefit to using AtomicString here?

Have you confirmed that this method is not called from multiple threads? I suspect it could be (e.g. for ITP).
Comment 6 Chris Dumez 2018-10-22 08:47:46 PDT
Comment on attachment 352890 [details]
patch

At the very least, it seems we should ASSERT.
Comment 7 Chris Dumez 2018-10-22 09:44:04 PDT
Comment on attachment 352890 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=352890&action=review

> Source/WebCore/platform/mac/PublicSuffixMac.mm:55
> +        cache.get().clear();

Seems very aggressive to clear the whole cache? Why not simply kick one value out like we usually do?
Comment 8 Antti Koivisto 2018-10-22 10:20:08 PDT
> Seems very aggressive to clear the whole cache? Why not simply kick one
> value out like we usually do?

I tested and it turns out it is a terrible strategy for this sort of thing at least in the usual 'map.remove(map.begin())' form. Since the table is never rehashed we always end up removing keys that hash to the early part of the table and the rest are stuck forever.
Comment 9 Antti Koivisto 2018-10-22 10:21:03 PDT
> Have you confirmed that this method is not called from multiple threads? I
> suspect it could be (e.g. for ITP).

Ok, I'll just switch to String.
Comment 10 Chris Dumez 2018-10-22 10:26:28 PDT
(In reply to Antti Koivisto from comment #9)
> > Have you confirmed that this method is not called from multiple threads? I
> > suspect it could be (e.g. for ITP).
> 
> Ok, I'll just switch to String.

Switching to String would not be sufficient as you'd still have HashMap operations from multiple threads. That said, I do not know for sure we have multithreaded access and an assertion would suffice. If we do have multi threaded access though, then I'd suggest using String + Locks.
Comment 11 Chris Dumez 2018-10-22 10:27:20 PDT
(In reply to Antti Koivisto from comment #8)
> > Seems very aggressive to clear the whole cache? Why not simply kick one
> > value out like we usually do?
> 
> I tested and it turns out it is a terrible strategy for this sort of thing
> at least in the usual 'map.remove(map.begin())' form. Since the table is
> never rehashed we always end up removing keys that hash to the early part of
> the table and the rest are stuck forever.

That's good to know. Sounds like we may want a method that returns a random iterator inside the HashMap? e.g. HashMap::random() or HashMap::any()
Comment 12 Antti Koivisto 2018-10-22 10:37:27 PDT
> Switching to String would not be sufficient as you'd still have HashMap
> operations from multiple threads. That said, I do not know for sure we have
> multithreaded access and an assertion would suffice. If we do have multi
> threaded access though, then I'd suggest using String + Locks.

Yep. I'll see if it is being used.
Comment 13 Antti Koivisto 2018-10-22 10:41:33 PDT
> That's good to know. Sounds like we may want a method that returns a random
> iterator inside the HashMap? e.g. HashMap::random() or HashMap::any()

Or more explicit support for this pattern:

HashMap::shrinkRandomlyToSize(size_t)
Comment 14 Geoffrey Garen 2018-10-22 12:27:58 PDT
(In reply to Antti Koivisto from comment #8)
> > Seems very aggressive to clear the whole cache? Why not simply kick one
> > value out like we usually do?
> 
> I tested and it turns out it is a terrible strategy for this sort of thing
> at least in the usual 'map.remove(map.begin())' form. Since the table is
> never rehashed we always end up removing keys that hash to the early part of
> the table and the rest are stuck forever.

Why doesn't the table rehash?
Comment 15 Chris Dumez 2018-10-22 12:49:47 PDT
(In reply to Geoffrey Garen from comment #14)
> (In reply to Antti Koivisto from comment #8)
> > > Seems very aggressive to clear the whole cache? Why not simply kick one
> > > value out like we usually do?
> > 
> > I tested and it turns out it is a terrible strategy for this sort of thing
> > at least in the usual 'map.remove(map.begin())' form. Since the table is
> > never rehashed we always end up removing keys that hash to the early part of
> > the table and the rest are stuck forever.
> 
> Why doesn't the table rehash?

My understanding is that we only rehash when shrinking / expanding the HashTable capacity. In this kind of pattern, we have a max entries (in this case 100) so we never need to expand / shrink. We reach 100 entries, remove HashTable::begin() then add the new entry.
Comment 16 Geoffrey Garen 2018-10-22 13:13:16 PDT
(In reply to Chris Dumez from comment #15)
> (In reply to Geoffrey Garen from comment #14)
> > (In reply to Antti Koivisto from comment #8)
> > > > Seems very aggressive to clear the whole cache? Why not simply kick one
> > > > value out like we usually do?
> > > 
> > > I tested and it turns out it is a terrible strategy for this sort of thing
> > > at least in the usual 'map.remove(map.begin())' form. Since the table is
> > > never rehashed we always end up removing keys that hash to the early part of
> > > the table and the rest are stuck forever.
> > 
> > Why doesn't the table rehash?
> 
> My understanding is that we only rehash when shrinking / expanding the
> HashTable capacity. In this kind of pattern, we have a max entries (in this
> case 100) so we never need to expand / shrink. We reach 100 entries, remove
> HashTable::begin() then add the new entry.

I see. If we always remove one and add one, then the table never rehashes. Yeah, an explicit idiom to select a random entry would be nice.
Comment 17 Antti Koivisto 2018-10-23 01:29:22 PDT
Created attachment 352960 [details]
patch
Comment 18 EWS Watchlist 2018-10-23 02:40:38 PDT
Comment on attachment 352960 [details]
patch

Attachment 352960 [details] did not pass mac-debug-ews (mac):
Output: https://webkit-queues.webkit.org/results/9703747

Number of test failures exceeded the failure limit.
Comment 19 EWS Watchlist 2018-10-23 02:40:40 PDT
Created attachment 352965 [details]
Archive of layout-test-results from ews117 for mac-sierra

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews117  Port: mac-sierra  Platform: Mac OS X 10.12.6
Comment 20 Antti Koivisto 2018-10-23 03:53:24 PDT
Created attachment 352967 [details]
with locking
Comment 21 Antti Koivisto 2018-10-23 05:40:28 PDT
Created attachment 352969 [details]
with locking
Comment 22 Chris Dumez 2018-10-23 09:00:14 PDT
We should update the code to use https://bugs.webkit.org/show_bug.cgi?id=190814 when it lands.
Comment 23 WebKit Commit Bot 2018-10-23 11:02:41 PDT
Comment on attachment 352969 [details]
with locking

Clearing flags on attachment: 352969

Committed r237357: <https://trac.webkit.org/changeset/237357>
Comment 24 WebKit Commit Bot 2018-10-23 11:02:43 PDT
All reviewed patches have been landed.  Closing bug.
Comment 25 Radar WebKit Bug Importer 2018-10-23 11:03:58 PDT
<rdar://problem/45492955>
Comment 26 Truitt Savell 2018-10-23 12:48:09 PDT
After the changes in https://trac.webkit.org/changeset/237357/webkit

The API test is now failing

Log:
https://build.webkit.org/builders/Apple%20Sierra%20Release%20WK1%20%28Tests%29/builds/13555/steps/run-api-tests/logs/stdio

build:
https://build.webkit.org/builders/Apple%20Sierra%20Release%20WK1%20%28Tests%29/builds/13555

Failed

    TestWebKitAPI.PublicSuffix.TopPrivatelyControlledDomain
        
        /Volumes/Data/slave/sierra-release/build/Tools/TestWebKitAPI/Tests/WebCore/PublicSuffix.cpp:159
        Expected equality of these values:
          String()
            Which is: 
          topPrivatelyControlledDomain("")
            Which is:
Comment 27 Truitt Savell 2018-10-23 13:53:25 PDT
Reverted r237357 for reason:

API test is now failing on all platforms.

Committed r237366: <https://trac.webkit.org/changeset/237366>
Comment 28 Antti Koivisto 2018-10-24 00:53:43 PDT
Created attachment 353026 [details]
patch
Comment 29 Antti Koivisto 2018-10-24 03:22:44 PDT
https://trac.webkit.org/changeset/237379/webkit