Bug 153575 - [WK1] Gather some rudimentary statistics during resource load
Summary: [WK1] Gather some rudimentary statistics during resource load
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: WebKit Nightly Build
Hardware: Mac All
: P2 Normal
Assignee: Brent Fulgham
URL:
Keywords:
Depends on:
Blocks: 154278
  Show dependency treegraph
 
Reported: 2016-01-27 17:28 PST by Brent Fulgham
Modified: 2016-02-15 21:36 PST (History)
16 users (show)

See Also:


Attachments
Patch (69.55 KB, patch)
2016-01-27 17:47 PST, Brent Fulgham
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews104 for mac-yosemite-wk2 (1.49 MB, application/zip)
2016-01-27 20:39 PST, Build Bot
no flags Details
Archive of layout-test-results from ews112 for mac-yosemite (1.20 MB, application/zip)
2016-01-27 20:44 PST, Build Bot
no flags Details
Archive of layout-test-results from ews100 for mac-yosemite (1.40 MB, application/zip)
2016-01-27 20:47 PST, Build Bot
no flags Details
Patch (57.58 KB, patch)
2016-01-29 14:24 PST, Brent Fulgham
no flags Details | Formatted Diff | Diff
Patch (58.78 KB, patch)
2016-01-29 14:36 PST, Brent Fulgham
no flags Details | Formatted Diff | Diff
Patch (70.04 KB, patch)
2016-01-29 18:14 PST, Brent Fulgham
no flags Details | Formatted Diff | Diff
Patch (72.03 KB, patch)
2016-01-30 16:06 PST, Brent Fulgham
no flags Details | Formatted Diff | Diff
Patch (72.05 KB, patch)
2016-01-31 12:41 PST, Brent Fulgham
no flags Details | Formatted Diff | Diff
Updated based on Darin's comments. (69.61 KB, patch)
2016-02-01 21:35 PST, Brent Fulgham
no flags Details | Formatted Diff | Diff
Patch (60.01 KB, patch)
2016-02-02 12:51 PST, Brent Fulgham
no flags Details | Formatted Diff | Diff
Patch (66.76 KB, patch)
2016-02-02 17:41 PST, Brent Fulgham
no flags Details | Formatted Diff | Diff
Patch (66.68 KB, patch)
2016-02-03 11:54 PST, Brent Fulgham
no flags Details | Formatted Diff | Diff
Patch (59.93 KB, patch)
2016-02-08 15:28 PST, Brent Fulgham
no flags Details | Formatted Diff | Diff
Patch (62.65 KB, patch)
2016-02-10 16:32 PST, Brent Fulgham
no flags Details | Formatted Diff | Diff
Patch (77.41 KB, patch)
2016-02-12 13:26 PST, Brent Fulgham
no flags Details | Formatted Diff | Diff
Patch (74.57 KB, patch)
2016-02-12 17:24 PST, Brent Fulgham
no flags Details | Formatted Diff | Diff
Disable write-to-disk in normal build. (74.71 KB, patch)
2016-02-15 13:09 PST, Brent Fulgham
beidson: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Brent Fulgham 2016-01-27 17:28:44 PST
Simple POC trying to gather some statistics about resource loads. Do not review for landing -- I just don't want to lose this code.
Comment 1 Brent Fulgham 2016-01-27 17:47:58 PST
Created attachment 270069 [details]
Patch
Comment 2 WebKit Commit Bot 2016-01-27 17:50:59 PST
Attachment 270069 [details] did not pass style-queue:


ERROR: Source/WebCore/loader/cf/ResourceLoadStatisticsPersistenceCF.cpp:30:  Streams are highly discouraged.  [readability/streams] [3]
Total errors found: 1 in 17 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Build Bot 2016-01-27 20:39:30 PST
Comment on attachment 270069 [details]
Patch

Attachment 270069 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/748187

Number of test failures exceeded the failure limit.
Comment 4 Build Bot 2016-01-27 20:39:35 PST
Created attachment 270083 [details]
Archive of layout-test-results from ews104 for mac-yosemite-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews104  Port: mac-yosemite-wk2  Platform: Mac OS X 10.10.5
Comment 5 Build Bot 2016-01-27 20:44:34 PST
Comment on attachment 270069 [details]
Patch

Attachment 270069 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/748192

Number of test failures exceeded the failure limit.
Comment 6 Build Bot 2016-01-27 20:44:37 PST
Created attachment 270084 [details]
Archive of layout-test-results from ews112 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews112  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 7 Build Bot 2016-01-27 20:47:11 PST
Comment on attachment 270069 [details]
Patch

Attachment 270069 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/748205

Number of test failures exceeded the failure limit.
Comment 8 Build Bot 2016-01-27 20:47:15 PST
Created attachment 270085 [details]
Archive of layout-test-results from ews100 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews100  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 9 Brady Eidson 2016-01-27 23:13:21 PST
Comment on attachment 270069 [details]
Patch

I don't have time before bed to give a full review, but I can both resolve the style complaint (don't use streams!), simplify a lot of the CF-style code, and set this up to be more easily cross platform going forward:  Use KeyedEncoder/Decoder instead of your own custom property list code.

They already exist in WebCore/platform and the CF backend is already a property list.
Comment 10 Brent Fulgham 2016-01-29 14:24:58 PST
Created attachment 270254 [details]
Patch
Comment 11 Brent Fulgham 2016-01-29 14:36:21 PST
Created attachment 270255 [details]
Patch
Comment 12 Sam Weinig 2016-01-29 14:40:13 PST
What makes this Mac specific?
Comment 13 Brent Fulgham 2016-01-29 18:14:47 PST
Created attachment 270280 [details]
Patch
Comment 14 Brent Fulgham 2016-01-29 18:15:38 PST
Currently this patch is only useful for WK1 measurements.
Comment 15 Brent Fulgham 2016-01-29 18:16:14 PST
(In reply to comment #12)
> What makes this Mac specific?

It's not. It's now cross-platform, but I've only bothered to do a WK1 integration for now.
Comment 16 Brent Fulgham 2016-01-30 16:06:57 PST
Created attachment 270325 [details]
Patch
Comment 17 Brady Eidson 2016-01-30 20:17:14 PST
All patches here are marked obsolete, and none are for review.
Intentional?
Comment 18 Brent Fulgham 2016-01-30 23:17:26 PST
(In reply to comment #17)
> All patches here are marked obsolete, and none are for review.
> Intentional?

Yes. I found a bug in that patch that wild cause test failures, so I pulled it before anyone wasted time on it.
Comment 19 Brent Fulgham 2016-01-31 12:41:21 PST
Created attachment 270344 [details]
Patch
Comment 20 Darin Adler 2016-01-31 19:18:40 PST
Comment on attachment 270344 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=270344&action=review

This patch is interesting. I don’t understand the basic goals and techniques here. I can’t think of anything else that just writes things into the home directory, and it’s also a little peculiar to build the writing in at such a low level.

> Source/WebCore/dom/Document.h:1676
> +    bool m_userHasInteractedWithDocument { false };

This is unused. Please don’t add it.

> Source/WebCore/dom/UserGestureIndicator.cpp:59
> +        Document& topDocument = document->topDocument();
> +        
> +        if (!topDocument.userHasInteractedWithDocument())
> +            ResourceLoadObserver::sharedObserver().logUserInteraction(topDocument);
> +
> +        topDocument.updateLastHandledUserGestureTimestamp();

Could this logic go into updateLastHandledUserGestureTimestamp instead of here? If so, then we could avoid touching Document.h at all.

> Source/WebCore/loader/DocumentLoader.cpp:528
> +    const URL& targetUrl = newRequest.url();

Should be targetURL. But also, not sure I see the benefit of having this local variable.

> Source/WebCore/loader/DocumentLoader.cpp:529
> +    const Ref<SecurityOrigin> targetOrigin(SecurityOrigin::create(targetUrl));

No need to use const here. Also, this local variable is computed but never used, so please don’t make this change.

> Source/WebCore/loader/DocumentLoader.cpp:532
> +    Frame& frame = frameLoader()->frame();
> +    Frame& topFrame = m_frame->tree().top();

Not sure why we have both frame and m_frame here. Really hard to see how the two differ.

> Source/WebCore/loader/DocumentLoader.cpp:534
> +    ResourceLoadObserver::sharedObserver().logFrameNavigation(!redirectResponse.isNull(), frame.document()->url(), targetUrl, frame.isMainFrame(), topFrame.document()->url());

Do we have guarantees that both frame.document() and topFrame.document() are non-null?

> Source/WebCore/loader/ResourceLoadObserver.cpp:61
> +void ResourceLoadObserver::logFrameNavigation(bool isRedirect, const URL& sourceUrl, const URL& targetUrl, bool isMainFrame, const URL& mainFrameUrl)

sourceURL, targetURL, mainFrameURL

This function is quite long. Please consider refactoring it into multiple smaller functions.

> Source/WebCore/loader/ResourceLoadObserver.cpp:67
> +    if (!targetUrl.isValid() || !mainFrameUrl.isValid())
> +        return;

Why check these two and not sourceURL?

> Source/WebCore/loader/ResourceLoadObserver.cpp:70
> +    const Ref<SecurityOrigin> targetOrigin(SecurityOrigin::create(targetUrl));
> +    const Ref<SecurityOrigin> mainFrameOrigin(SecurityOrigin::create(mainFrameUrl)); // Note: Is the same as sourceOrigin if navigation happens in the top frame

No need for the local variables to be const. I suggest just using auto for these:

   auto targetOrigin = SecurityOrigin::create(targetURL);

> Source/WebCore/loader/ResourceLoadObserver.cpp:72
> +    const String& targetOriginString = targetOrigin->toString();

I suggest the type String or auto rather than const String& here.

> Source/WebCore/loader/ResourceLoadObserver.cpp:74
> +    auto& targetOriginResourceStatistics = resourceStatisticsForPrimaryDomain(targetUrl);

In the context of this function perhaps we can use shorter names. No need to keep saying "resourceStatistics" all the time. And to say "origin" all the time. I think targetStatistics would be a good name for the local variable.

> Source/WebCore/loader/ResourceLoadObserver.cpp:76
> +    NetworkStorageSession& session = NetworkStorageSession::defaultStorageSession();

I suggest auto&.

> Source/WebCore/loader/ResourceLoadObserver.cpp:77
> +    String cookieStringForTargetUrl = platformStrategies()->cookiesStrategy()->cookiesForDOM(session, targetUrl, targetUrl);

Are we guaranteed these strategy pointers are non-null? We should change them to be references.

I suggest you call this cookies rather than cookieStringForTargetUrl, unless there is some reason to constantly be reminded exactly which cookies.

> Source/WebCore/loader/ResourceLoadObserver.cpp:91
> +            size_t numberOfTimesSubFrameUnderThisMainFrameOrigin = targetOriginResourceStatistics.subFrameUnderTopFrameOrigins.get(mainFrameOrigin->toString());
> +            targetOriginResourceStatistics.subFrameUnderTopFrameOrigins.set(mainFrameOrigin->toString(), numberOfTimesSubFrameUnderThisMainFrameOrigin + 1);

This is an inefficient way to use a HashMap. I suggest we use a HashCountedSet instead if the whole point is to count something; it does this efficiently.

> Source/WebCore/loader/ResourceLoadObserver.cpp:96
> +        Ref<SecurityOrigin> redirectingOrigin(SecurityOrigin::create(sourceUrl));

Again, I suggest using auto here for the type.

> Source/WebCore/loader/ResourceLoadObserver.cpp:99
> +            auto& redirectingOriginResourceStatistics = resourceStatisticsForPrimaryDomain(sourceUrl);

How about "redirectingOriginStatistics". No need to keep repeating "resource". Or maybe sourceStatistics.

> Source/WebCore/loader/ResourceLoadObserver.cpp:103
> +                size_t numberOfTimesRedirectedToThisPrevalentOrigin = redirectingOriginResourceStatistics.redirectedToOtherPrevalentResourceOrigins.get(targetOriginString);
> +                redirectingOriginResourceStatistics.redirectedToOtherPrevalentResourceOrigins.set(targetOriginString, numberOfTimesRedirectedToThisPrevalentOrigin + 1);

HashCountedSet would be better.

> Source/WebCore/loader/ResourceLoadObserver.cpp:117
> +        Ref<SecurityOrigin> sourceOrigin(SecurityOrigin::create(sourceUrl));

auto would be better

> Source/WebCore/loader/ResourceLoadObserver.cpp:171
> +            double totalVisited = std::max(static_cast<double>(m_originsVisitedMap.size()), 1.0);

This will always be 1 since nothing is populating m_originsVisitedMap.  There is no need to convert this to double before doing the math below. Only one of the numbers involved needs to be a double.

> Source/WebCore/loader/ResourceLoadObserver.cpp:178
> +        double totalVisited = std::max(static_cast<double>(m_originsVisitedMap.size()), 1.0);

This will always be 1 since nothing is populating m_originsVisitedMap. There is no need to convert this to double before doing the math below. Only one of the numbers involved needs to be a double.

> Source/WebCore/loader/ResourceLoadObserver.cpp:180
> +        targetOriginResourceStatistics.subresourceHasBeenSubresourceCountDividedByTotalNumberOfOriginsVisited = static_cast<double>(targetOriginResourceStatistics.subresourceHasBeenSubresourceCount) / totalVisited;

This code and much of the rest of this clause is copied and pasted. Please refactor to share more code.

> Source/WebCore/loader/ResourceLoadObserver.cpp:191
> +    auto& topOriginResourceStatistics = resourceStatisticsForPrimaryDomain(document.url());

Just "statistics" would be fine for this local variable.

> Source/WebCore/loader/ResourceLoadObserver.cpp:198
> +    UNUSED_PARAM(url);

This is an ugly use of UNUSED_PARAM for a parameter that is not always unused. Would be nicer to do this below in an #else to make it clearer.

> Source/WebCore/loader/ResourceLoadObserver.cpp:214
> +    String primaryDomain = getPrimaryDomain(url);

No need for the local variable here.

> Source/WebCore/loader/ResourceLoadObserver.cpp:235
> +String ResourceLoadObserver::getPrimaryDomain(const URL& url) const

WebKit coding style does not use "get" in function names like this.

Why is this a const member function rather than a non-member function or a static member function?

> Source/WebCore/loader/ResourceLoadObserver.cpp:242
> +    String primaryDomain = "";

This is initialized in all branches below, so no need to initialize it here. Also, this is the slow way to initialize to the empty string. The fast way is emptyString().

> Source/WebCore/loader/ResourceLoadObserver.cpp:249
> +        // Skip TLD and then up to two domains smaller than 4 characters

This seems peculiar. Why not the real TLD rule?

> Source/WebCore/loader/ResourceLoadObserver.cpp:263
> +            // All parts are smaller than

Strange comment here.

> Source/WebCore/loader/ResourceLoadObserver.cpp:277
> +    m_statisticsPersistence->writeDataToDisk(m_resourceStatisticsMap);

Why no check for Settings::resourceLoadStatisticsEnabled here?

> Source/WebCore/loader/ResourceLoadObserver.h:63
> +    HashMap<String, size_t> m_originsVisitedMap;

We want to fold case for origins, so this should be using ASCIICaseInsensitiveHash.

I see no code writing to this map, just code calling size() on it. I have no idea what the value in the map is. A visit count maybe?

> Source/WebCore/loader/ResourceLoadObserver.h:64
> +    HashMap<String, std::unique_ptr<ResourceLoadStatistics>> m_resourceStatisticsMap;

If the key here is an origin, then we should use ASCIICaseInsensitiveHash.

Should just use ResourceLoadStatistics for the values in the map rather than using unique_ptr.

> Source/WebCore/loader/ResourceLoadStatistics.cpp:45
> +// Magic numbers
> +static size_t redirectedToOtherPrevalentResourceOriginLimit = 3;
> +
> +// Sub frame thresholds
> +static size_t subFrameUnderTopFrameOriginsThreshold = 2;
> +static size_t subFrameHasBeenRedirectedFromThreshold = 2;
> +static size_t subFrameHasBeenRedirectedToThreshold = 3;
> +
> +// Subresource thresholds
> +static size_t subresourceUnderTopFrameOriginsThreshold = 3;
> +static size_t subresourceHasBeenRedirectedFromThreshold = 3;
> +static size_t subresourceHasBeenRedirectedToThreshold = 5;
> +
> +static double subresourceHasBeenSubresourceCountDividedByTotalNumberOfOriginsVisitedThreshold = 1.5;
> +static size_t minimumOriginsVisitedForPrevalenceClassification = 100;

Why aren’t any of these marked const?

> Source/WebCore/loader/ResourceLoadStatistics.cpp:47
> +bool ResourceLoadStatistics::shouldBecomePrevalentResource(const ResourceLoadStatistics& statistics, size_t originsVisitedSoFar)

Why is this a static member function rather than a const member function?

> Source/WebCore/loader/ResourceLoadStatistics.h:39
> +    bool doesHavePrevalentRedirection() const;
> +    bool doesHavePrevalentFrameMembership() const;

Should be named "has" rather than "doesHave"

> Source/WebCore/loader/ResourceLoadStatistics.h:42
> +    bool hasHadUserInteraction { false };

Should just be m_hadUserInteraction.

> Source/WebCore/loader/ResourceLoadStatistics.h:49
> +    uint32_t topFrameHasBeenRedirectedTo { 0 };
> +    uint32_t topFrameHasBeenRedirectedFrom { 0 };
> +    uint32_t topFrameInitialLoadCount { 0 };
> +    uint32_t topFrameHasBeenNavigatedTo { 0 };
> +    uint32_t topFrameHasBeenNavigatedFrom { 0 };

Why uint32_t? So specific. Elsewhere you were using size_t. And we normally use unsigned.

> Source/WebCore/loader/ResourceLoadStatistics.h:71
> +    HashMap<String, uint32_t> redirectedToOtherPrevalentResourceOrigins;

Seems like this should use ASCIICaseInsensitiveHash.

> Source/WebCore/loader/ResourceLoadStatisticsPersistence.cpp:40
> +String ResourceLoadStatisticsPersistence::persistentStoragePath(const String& label) const

I’m not sure I’m entirely comfortable with this kind of direct file system manipulation. Highly unusual for WebKit code. Is this the right idiom?

> Source/WebCore/loader/ResourceLoadStatisticsPersistence.cpp:52
> +    String resourceLog(pwd.pw_dir);

This is no the correct way to convert a file system path into a String. It won’t work if any characters are non-ASCII because it will treat them as Latin-1, not UTF-8.

All the functions in FileSystemMac.mm seem to use String::fromUTF8 for this. And there's one named homeDirectoryPath(), so I suggest just using that!

> Source/WebCore/loader/ResourceLoadStatisticsPersistence.cpp:85
> +        const auto& keys = resourceStatistics.subFrameUnderTopFrameOrigins.keys();

The const here is not all that helpful.

> Source/WebCore/loader/ResourceLoadStatisticsPersistence.cpp:86
> +        encoder.encodeObjects("subFrameUnderTopFrameOrigins", keys.begin(), keys.end(), [&](KeyedEncoder& encoderInner, const String& origin) {

Please consider naming the captured variable explicitly instead of using [&].

> Source/WebCore/loader/ResourceLoadStatisticsPersistence.cpp:101
> +        const auto& keys = resourceStatistics.subresourceUnderTopFrameOrigins.keys();
> +        encoder.encodeObjects("subresourceUnderTopFrameOrigins", keys.begin(), keys.end(), [&](KeyedEncoder& encoderInner, const String& underTopFrameOrigin) {

Ditto.

> Source/WebCore/loader/ResourceLoadStatisticsPersistence.cpp:110
> +        const auto& keys = resourceStatistics.redirectedToOtherPrevalentResourceOrigins.keys();
> +        encoder.encodeObjects("redirectedToOtherPrevalentResourceOrigins", keys.begin(), keys.end(), [&](KeyedEncoder& encoderInner, const String& otherPrevalentOrigin) {

Ditto.

> Source/WebCore/loader/ResourceLoadStatisticsPersistence.cpp:125
> +    ASSERT(storedOrigin == originString);
> +    UNUSED_PARAM(originString);

ASSERT_UNUSED is designed for uses like this one. Please use it.

> Source/WebCore/loader/ResourceLoadStatisticsPersistence.cpp:183
> +    Vector<String> ignore;
> +    decoder.decodeObjects("subFrameUnderTopFrameOrigins", ignore, [&](KeyedDecoder& decoderInner, String& origin) {
> +        if (!decoderInner.decodeString("origin", origin))
> +            return false;
> +
> +        uint32_t count;
> +        if (!decoderInner.decodeUInt32("count", count))
> +            return false;
> +
> +        resourceStatistics.subFrameUnderTopFrameOrigins.set(origin, count);
> +        return true;
> +    });

Looks like if the inner decoding fails, we won’t fail the outer decoding. Probably need to look at the return value from decodeObjects.

> Source/WebCore/loader/ResourceLoadStatisticsPersistence.cpp:204
> +    decoder.decodeObjects("subresourceUnderTopFrameOrigins", ignore, [&](KeyedDecoder& decoderInner, String& origin) {

Ditto.

> Source/WebCore/loader/ResourceLoadStatisticsPersistence.cpp:247
> +    int64_t writtenBytes = writeToFile(handle, rawData->data(), rawData->size());

I suggest the type auto for writtenBytes instead of int64_t.

> Source/WebCore/loader/ResourceLoadStatisticsPersistence.cpp:251
> +    if (writtenBytes != static_cast<int64_t>(rawData->size())) {
> +        deleteFile(resourceLog);

Is it our error handling policy to delete the file if we can’t write it completely successfully a good one? Seems it would make things mysterious. How helpful is this anyway? Do we think partially written files are a likely error case and deleting those files will save us time in some way?

> Source/WebCore/loader/ResourceLoadStatisticsPersistence.cpp:252
> +        return;

The return here doesn’t do anything useful. This is the end of the function.

> Source/WebCore/loader/ResourceLoadStatisticsPersistence.cpp:260
> +    String resourceLog = persistentStoragePath(label);
> +
> +    RefPtr<SharedBuffer> rawData = SharedBuffer::createWithContentsOfFile(resourceLog);

Would read better without the local variable, I think.

> Source/WebCore/loader/ResourceLoadStatisticsPersistence.cpp:278
> +    String label = origin;
> +    label.replace('/', '_');
> +    label.replace(':', '+');

What is this scheme about?

FileSystem.h has a function named encodeForFileName that I think is intended to deal with things like this.

> Source/WebCore/loader/ResourceLoadStatisticsPersistence.cpp:310
> +        browsingStatistics.set(origin, std::make_unique<ResourceLoadStatistics>());
> +        auto& statistics = *browsingStatistics.get(origin);

This is an inefficient way to add something to a set. Not sure exactly why it's being done like this but this does the same thing twice as fast:

    auto& statistics = browsingStatistics.add(origin, std::make_unique<ResourceLoadStatistics>()).iterator->value;

> Source/WebCore/loader/ResourceLoadStatisticsPersistence.h:40
> +class KeyedDecoder;
> +class KeyedEncoder;

Not sure about the strategy of using KeyedEncoder/Decoder for this at all. I don’t think it’s designed for this kind of use. Please discuss with Anders.

> Source/WebCore/loader/ResourceLoadStatisticsPersistence.h:50
> +    ResourceLoadStatisticsPersistence()
> +    {
> +    }
> +    
> +    ~ResourceLoadStatisticsPersistence()
> +    {
> +    }

Just leave these out completely. Defining them like this has no effect.

> Source/WebCore/loader/ResourceLoadStatisticsPersistence.h:63
> +    void writeDataToDisk(const String& originString, const ResourceLoadStatistics&) const;
> +    void writeDataToDisk(const HashMap<String, std::unique_ptr<ResourceLoadStatistics>>&) const;
> +    void readDataFromDisk(HashMap<String, std::unique_ptr<ResourceLoadStatistics>>&) const;
> +
> +private:
> +    void addOriginToEncoder(KeyedEncoder&, const String& origin, const ResourceLoadStatistics&) const;
> +    void writeEncoderToDisk(KeyedEncoder&, const String& label) const;
> +
> +    bool readOriginFromDecoder(KeyedDecoder&, const String& origin, ResourceLoadStatistics&) const;
> +    std::unique_ptr<KeyedDecoder> createDecoderFromDisk(const String& label) const;
> +
> +    String persistentStoragePath(const String& label) const;

Since this class has no state, all its members could just be static member functions. Or these could just be non-member functions.
Comment 21 Darin Adler 2016-01-31 19:19:15 PST
Why is this patch marked review?
Comment 22 Brent Fulgham 2016-01-31 19:21:24 PST
(In reply to comment #21)
> Why is this patch marked review?

I spoke with Brady and others about this code, and we wanted to get it into the tree so more people could play with it. I'm waiting for Brady to do a review.
Comment 23 Darin Adler 2016-01-31 19:34:40 PST
OK, I was confused because above it says "Do not review for landing" and there was no comment about when that changed.
Comment 24 Brent Fulgham 2016-02-01 21:35:52 PST
Created attachment 270471 [details]
Updated based on Darin's comments.
Comment 25 Brent Fulgham 2016-02-01 21:37:43 PST
Comment on attachment 270471 [details]
Updated based on Darin's comments.

This patch still isn't perfect, but is much better now.

I would like to get Brady's review of the encoding logic, and hopefully Anders can give it a quick look.
Comment 26 Darin Adler 2016-02-02 09:03:00 PST
Comment on attachment 270471 [details]
Updated based on Darin's comments.

View in context: https://bugs.webkit.org/attachment.cgi?id=270471&action=review

Many of my comments on the original patch were ignored. Here I repeat them.

> Source/WebCore/dom/Document.h:1189
> +    bool userHasInteractedWithDocument() const { return !!m_lastHandledUserGestureTimestamp; }

No value to adding this function. It would be clearer if this rule ("0 means never stored a time stamp before") was implemented directly in the updateLastHandledUserGestureTimestamp.

> Source/WebCore/loader/ResourceLoadObserver.cpp:42
> +#undef LOG_TO_FILE

More conventional to use an always defined symbol and define it to either 0 or 1.

> Source/WebCore/loader/ResourceLoadObserver.cpp:46
> +typedef HashMap<String, ResourceLoadStatistics>::iterator ResourceLoadStatisticsValue;

Unused typedef should be removed.

> Source/WebCore/loader/ResourceLoadObserver.cpp:72
> +    const Ref<SecurityOrigin> targetOrigin(SecurityOrigin::create(targetURL));
> +    const Ref<SecurityOrigin> mainFrameOrigin(SecurityOrigin::create(mainFrameURL)); // Note: Is the same as sourceOrigin if navigation happens in the top frame

OK, but a little peculiar, to use const here. we have lots of other local variables that are const but we don’t state it explicitly.

> Source/WebCore/loader/ResourceLoadObserver.cpp:74
> +    const String& targetOriginString = targetOrigin->toString();

Clearer to have the type here be String or even auto. Using const String& is misleading because it’s going to be a reference to a temporary anyway.

> Source/WebCore/loader/ResourceLoadObserver.cpp:105
> +                targetStatistics.topFrameHasBeenRedirectedTo++;

I think we should put the ++ at the start of the line on all of these. Makes it easier to spot, and kind of standard for C++ anyway.

> Source/WebCore/loader/ResourceLoadObserver.cpp:123
> +        } else if (!sourceOrigin.get().equal(&targetOrigin.get())) {

We normally prefer targetOrigin.ptr() over &targetOrigin.get().

But also wondering why the equal function requires a pointer rather than a reference.

> Source/WebCore/loader/ResourceLoadObserver.cpp:167
> +            double totalVisited = std::max(static_cast<double>(m_originsVisitedMap.size()), 1.0);
> +            
> +            targetStatistics.subresourceHasBeenSubresourceCountDividedByTotalNumberOfOriginsVisited = static_cast<double>(targetStatistics.subresourceHasBeenSubresourceCount) / totalVisited;

This is overdoing the "double" casting. We need only a single double here to make the floating point division use double, but instead we are making the size and the constant 1 both be double, and also casting the numerator to double. I suggest writing:

    auto totalVisited = std::max(m_originsVisitedMap.size(), 1U);

Or something like that. No need to convert to double before doing the division.

Might also be a better design to just track the numerator and denominator and do the math later when reporting. Not valuable to keep recomputing the ratio as the numbers change.

> Source/WebCore/loader/ResourceLoadObserver.cpp:174
> +        double totalVisited = std::max(static_cast<double>(m_originsVisitedMap.size()), 1.0);
> +        
> +        targetStatistics.subresourceHasBeenSubresourceCountDividedByTotalNumberOfOriginsVisited = static_cast<double>(targetStatistics.subresourceHasBeenSubresourceCount) / totalVisited;

Ditto.

> Source/WebCore/loader/ResourceLoadObserver.cpp:190
> +void ResourceLoadObserver::logIfPrevalentResource(const URL& url, ResourceLoadStatistics& statistics)

Peculiar that this function has a side effect of setting the isPrevalentResource boolean given its name. Quite unclear.

> Source/WebCore/loader/ResourceLoadObserver.cpp:255
> +            primaryDomain = hostSplitOnDot.at(primaryDomainCutOffIndex);
> +            for (size_t j = primaryDomainCutOffIndex + 1; j < hostSplitOnDot.size(); j++)
> +                primaryDomain = primaryDomain + "." + hostSplitOnDot.at(j);

Should not append strings in a loop like this. Super slow and inefficient. Should use StringBuilder for this sort of thing.

> Source/WebCore/loader/ResourceLoadStatistics.cpp:37
> +// Sub frame thresholds
> +static const unsigned subFrameUnderTopFrameOriginsThreshold = 2;
> +static const unsigned subFrameHasBeenRedirectedFromThreshold = 2;
> +static const unsigned subFrameHasBeenRedirectedToThreshold = 3;

We should use the word “subframe”, not the two words “sub frame”, so the identifiers should not have a capital “F” and the comment should not have a space between “Sub” and “frame”.

> Source/WebCore/loader/ResourceLoadStatistics.h:34
> +class ResourceLoadStatistics {

This should be a struct, not a class.

> Source/WebCore/loader/ResourceLoadStatisticsPersistence.cpp:54
> +    // Home Directory
> +    char buffer[4096];
> +    int bufferSize = sizeof(buffer);
> +    struct passwd pwd;
> +    struct passwd* result = nullptr;
> +    if (getpwuid_r(getuid(), &pwd, buffer, bufferSize, &result) || !result) {
> +        WTFLogAlways("ResourceLoadStatisticsPersistence::persistentStoragePath: Couldn't find home directory\n");
> +        return String();
> +    }
> +    
> +    String resourceLog(pwd.pw_dir);

Should use the homeDirectoryPath() function from FileSystem.h instead of reimplementing it.

> Source/WebCore/loader/ResourceLoadStatisticsPersistence.cpp:258
> +        deleteFile(resourceLog);

Still a slightly strange choice to delete the file in this case; not entirely sure what the benefit is.

> Source/WebCore/loader/ResourceLoadStatisticsPersistence.cpp:259
> +        return;

Strange to have a return here at the end of a function.

> Source/WebCore/loader/ResourceLoadStatisticsPersistence.h:42
> +class ResourceLoadStatisticsPersistence {

This should not be a class. It has no state. If we do use a class all the function should be static member functions, not const member functions.

> Source/WebCore/loader/ResourceLoadStatisticsPersistence.h:50
> +    ResourceLoadStatisticsPersistence()
> +    {
> +    }
> +    
> +    ~ResourceLoadStatisticsPersistence()
> +    {
> +    }

Please omit these. Defining them like this has no effect.
Comment 27 Brady Eidson 2016-02-02 09:06:25 PST
(I'll review after the next patch is uploaded, which I assume is imminent due to Darin's feedback)
Comment 28 Brent Fulgham 2016-02-02 11:52:06 PST
Comment on attachment 270471 [details]
Updated based on Darin's comments.

View in context: https://bugs.webkit.org/attachment.cgi?id=270471&action=review

>> Source/WebCore/dom/Document.h:1189
>> +    bool userHasInteractedWithDocument() const { return !!m_lastHandledUserGestureTimestamp; }
> 
> No value to adding this function. It would be clearer if this rule ("0 means never stored a time stamp before") was implemented directly in the updateLastHandledUserGestureTimestamp.

OK!

>> Source/WebCore/loader/ResourceLoadObserver.cpp:42
>> +#undef LOG_TO_FILE
> 
> More conventional to use an always defined symbol and define it to either 0 or 1.

I see that in a few other cases; I'll do the same.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:46
>> +typedef HashMap<String, ResourceLoadStatistics>::iterator ResourceLoadStatisticsValue;
> 
> Unused typedef should be removed.

Done.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:72
>> +    const Ref<SecurityOrigin> mainFrameOrigin(SecurityOrigin::create(mainFrameURL)); // Note: Is the same as sourceOrigin if navigation happens in the top frame
> 
> OK, but a little peculiar, to use const here. we have lots of other local variables that are const but we don’t state it explicitly.

I'll switch to auto, as we did elsewhere.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:74
>> +    const String& targetOriginString = targetOrigin->toString();
> 
> Clearer to have the type here be String or even auto. Using const String& is misleading because it’s going to be a reference to a temporary anyway.

OK!

>> Source/WebCore/loader/ResourceLoadObserver.cpp:105
>> +                targetStatistics.topFrameHasBeenRedirectedTo++;
> 
> I think we should put the ++ at the start of the line on all of these. Makes it easier to spot, and kind of standard for C++ anyway.

Will do.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:123
>> +        } else if (!sourceOrigin.get().equal(&targetOrigin.get())) {
> 
> We normally prefer targetOrigin.ptr() over &targetOrigin.get().
> 
> But also wondering why the equal function requires a pointer rather than a reference.

At a minimum, it does seem like it should be "!sourceOrigin->equal(targetOrigin.ptr())"

It's a little weird that the SourceOrigin class wants a pointer to the element being compared with, rather than a reference. Maybe SourceOrigins are frequently nullptr when being compared?

>> Source/WebCore/loader/ResourceLoadObserver.cpp:167
>> +            targetStatistics.subresourceHasBeenSubresourceCountDividedByTotalNumberOfOriginsVisited = static_cast<double>(targetStatistics.subresourceHasBeenSubresourceCount) / totalVisited;
> 
> This is overdoing the "double" casting. We need only a single double here to make the floating point division use double, but instead we are making the size and the constant 1 both be double, and also casting the numerator to double. I suggest writing:
> 
>     auto totalVisited = std::max(m_originsVisitedMap.size(), 1U);
> 
> Or something like that. No need to convert to double before doing the division.
> 
> Might also be a better design to just track the numerator and denominator and do the math later when reporting. Not valuable to keep recomputing the ratio as the numbers change.

Hmm. Maybe it could just be a member function of the statistics class that takes the visited count as an argument.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:174
>> +        targetStatistics.subresourceHasBeenSubresourceCountDividedByTotalNumberOfOriginsVisited = static_cast<double>(targetStatistics.subresourceHasBeenSubresourceCount) / totalVisited;
> 
> Ditto.

Will do.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:190
>> +void ResourceLoadObserver::logIfPrevalentResource(const URL& url, ResourceLoadStatistics& statistics)
> 
> Peculiar that this function has a side effect of setting the isPrevalentResource boolean given its name. Quite unclear.

I changed it to "logAndSetAsPrevalentResourceIfNecessary", and changed "shouldBecomePrevalentResource" to "checkAndSetAsPrevalentResourceIfNecessary"

>> Source/WebCore/loader/ResourceLoadObserver.cpp:255
>> +                primaryDomain = primaryDomain + "." + hostSplitOnDot.at(j);
> 
> Should not append strings in a loop like this. Super slow and inefficient. Should use StringBuilder for this sort of thing.

Oh, Yes! Thank you for catching this.

>> Source/WebCore/loader/ResourceLoadStatistics.cpp:37
>> +static const unsigned subFrameHasBeenRedirectedToThreshold = 3;
> 
> We should use the word “subframe”, not the two words “sub frame”, so the identifiers should not have a capital “F” and the comment should not have a space between “Sub” and “frame”.

Done.

>> Source/WebCore/loader/ResourceLoadStatistics.h:34
>> +class ResourceLoadStatistics {
> 
> This should be a struct, not a class.

Done.

>> Source/WebCore/loader/ResourceLoadStatisticsPersistence.cpp:54
>> +    String resourceLog(pwd.pw_dir);
> 
> Should use the homeDirectoryPath() function from FileSystem.h instead of reimplementing it.

OK!

>> Source/WebCore/loader/ResourceLoadStatisticsPersistence.cpp:258
>> +        deleteFile(resourceLog);
> 
> Still a slightly strange choice to delete the file in this case; not entirely sure what the benefit is.

OK. I'll just log that something unexpected happened.

>> Source/WebCore/loader/ResourceLoadStatisticsPersistence.cpp:259
>> +        return;
> 
> Strange to have a return here at the end of a function.

OK!

>> Source/WebCore/loader/ResourceLoadStatisticsPersistence.h:42
>> +class ResourceLoadStatisticsPersistence {
> 
> This should not be a class. It has no state. If we do use a class all the function should be static member functions, not const member functions.

I think I'll merge it back into ResourceLoadStatistics.

>> Source/WebCore/loader/ResourceLoadStatisticsPersistence.h:50
>> +    }
> 
> Please omit these. Defining them like this has no effect.

OK!
Comment 29 Brent Fulgham 2016-02-02 12:51:55 PST
Created attachment 270508 [details]
Patch
Comment 30 Brent Fulgham 2016-02-02 17:14:20 PST
Comment on attachment 270508 [details]
Patch

I somehow missed a string name change I made while tidying up the patch. Doh!
Comment 31 Brent Fulgham 2016-02-02 17:41:58 PST
Created attachment 270535 [details]
Patch
Comment 32 Darin Adler 2016-02-02 20:35:21 PST
Comment on attachment 270535 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=270535&action=review

> Source/WebCore/loader/ResourceLoadObserver.cpp:240
> +            primaryDomain = hostSplitOnDot.at(primaryDomainCutOffIndex);
> +            StringBuilder builder;
> +            builder.append(primaryDomain);

This should just be:

    StringBuilder builder;
    builder.append(hostSplitOnDot.at(primaryDomainCutOffIndex));

There’s no reason to initialize primaryDomain to this string and then just overwrite it later.

> Source/WebCore/loader/ResourceLoadObserver.cpp:242
> +                builder.append(".");

Should append '.', the character, rather than ".", the string.

> Source/WebCore/loader/ResourceLoadObserver.cpp:291
> +    String fileName = label + "_resourceLog.plist";
> +    return pathByAppendingComponent(homeDirectoryPath(), fileName);

Should be a one liner, not two lines:

    return pathByAppendingComponent(homeDirectoryPath(), label + "_resourceLog.plist");
Comment 33 Alexey Proskuryakov 2016-02-02 22:07:22 PST
Comment on attachment 270535 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=270535&action=review

Is this purely debugging code for MiniBrowser only? If that's the case, I think that we should find a way to not expose this functionality in production via user defaults.

> Source/WebCore/loader/ResourceLoadObserver.cpp:54
> +void ResourceLoadObserver::logFrameNavigation(bool isRedirect, const URL& sourceURL, const URL& targetURL, bool isMainFrame, const URL& mainFrameURL)

This function hardcodes a lot of assumption about how loading works, so much that it seems unlikely that we could maintain it going forward.

> Source/WebCore/loader/ResourceLoadObserver.cpp:65
> +    auto& session = NetworkStorageSession::defaultStorageSession();

Why is the default session always appropriate here?

> Source/WebCore/loader/ResourceLoadObserver.cpp:70
> +                targetStatistics.topFrameHadCookiesBeforeFirstNavigationTo = true;

cookiesForDOM is not the right function to use when dealing with navigation, as it excludes HTTPOnly cookies.

> Source/WebCore/loader/ResourceLoadObserver.cpp:191
> +bool ResourceLoadObserver::isPrevalentResource(const URL& url) const

Even after reading this function, I'm still not sure what a prevalent resource is. Did we use this concept before?

> Source/WebCore/loader/ResourceLoadObserver.cpp:209
> +String ResourceLoadObserver::primaryDomain(const URL& url)

I have not seen the concept of "primary domain" before, what is this function expected to do?

> Source/WebCore/loader/ResourceLoadObserver.h:38
> +class Document;
> +class KeyedDecoder;
> +class KeyedEncoder;
> +struct ResourceLoadStatistics;
> +class URL;

I think that we normally sort these lines, so all class come before all structs.

> Source/WebCore/loader/ResourceLoadStatistics.h:30
> +#include <wtf/text/WTFString.h>

This include shouldn't be needed, wtf/Forward.h declares String.
Comment 34 Brent Fulgham 2016-02-03 11:50:36 PST
Comment on attachment 270535 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=270535&action=review

>> Source/WebCore/loader/ResourceLoadObserver.cpp:54
>> +void ResourceLoadObserver::logFrameNavigation(bool isRedirect, const URL& sourceURL, const URL& targetURL, bool isMainFrame, const URL& mainFrameURL)
> 
> This function hardcodes a lot of assumption about how loading works, so much that it seems unlikely that we could maintain it going forward.

Alexey, I'll have John go over these topics with you separately.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:240
>> +            builder.append(primaryDomain);
> 
> This should just be:
> 
>     StringBuilder builder;
>     builder.append(hostSplitOnDot.at(primaryDomainCutOffIndex));
> 
> There’s no reason to initialize primaryDomain to this string and then just overwrite it later.

OK!

>> Source/WebCore/loader/ResourceLoadObserver.cpp:242
>> +                builder.append(".");
> 
> Should append '.', the character, rather than ".", the string.

Done.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:291
>> +    return pathByAppendingComponent(homeDirectoryPath(), fileName);
> 
> Should be a one liner, not two lines:
> 
>     return pathByAppendingComponent(homeDirectoryPath(), label + "_resourceLog.plist");

Done.

>> Source/WebCore/loader/ResourceLoadObserver.h:38
>> +class URL;
> 
> I think that we normally sort these lines, so all class come before all structs.

Done.

>> Source/WebCore/loader/ResourceLoadStatistics.h:30
>> +#include <wtf/text/WTFString.h>
> 
> This include shouldn't be needed, wtf/Forward.h declares String.

OK!
Comment 35 Brent Fulgham 2016-02-03 11:52:30 PST
Comment on attachment 270535 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=270535&action=review

>>> Source/WebCore/loader/ResourceLoadStatistics.h:30
>>> +#include <wtf/text/WTFString.h>
>> 
>> This include shouldn't be needed, wtf/Forward.h declares String.
> 
> OK!

Nope -- it won't build without the include.
Comment 36 Brent Fulgham 2016-02-03 11:54:03 PST
Created attachment 270593 [details]
Patch
Comment 37 Brent Fulgham 2016-02-08 15:28:01 PST
Created attachment 270886 [details]
Patch
Comment 38 Brady Eidson 2016-02-08 16:29:31 PST
Comment on attachment 270886 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=270886&action=review

I have a few major quibbles.

My overarching concern is that we're starting a new mechanism from scratch - one that will run code on every single load, and one that will touch the disk - yet there's pretty huge gaps in "best practices".

Let's get this clean from the start.

> Source/WebCore/ChangeLog:8
> +        No new tests: No change in behavior.

I'm concerned about the testing strategy here.

I know this is pretty much "initial scaffolding" for a new feature, but IMHO that's the most important time to lay groundwork for testing.

> Source/WebCore/dom/Document.cpp:6401
> +    if (!m_lastHandledUserGestureTimestamp)
> +        ResourceLoadObserver::sharedObserver().logUserInteraction(*this);

So we only care the first time the user interacts with a document?

Why not any subsequent time?

> Source/WebCore/loader/DocumentLoader.cpp:535
> +    ResourceLoadObserver::sharedObserver().logFrameNavigation(!redirectResponse.isNull(), m_frame->document()->url(), newRequest.url(), m_frame->isMainFrame(), topFrame.document()->url());

While this feature is on the ground floor I think we need to get in the habit of always checking if its even enabled before we log stuff.

On the one hand, logFrameNavigation() probably only logs things if logging is enabled.

But on the other hand, we do a lot of unnecessary work just in calling logFrameNavigation() if logging is not enabled.

Historically, all those little allocations and conversations between object types tend to show up on profiles in aggregate.

> Source/WebCore/loader/ResourceLoadObserver.cpp:65
> +    auto& session = NetworkStorageSession::defaultStorageSession();

Why the default session?

> Source/WebCore/loader/ResourceLoadObserver.cpp:79
> +            auto mainFrameOrigin = SecurityOrigin::create(mainFrameURL); // Note: Is the same as sourceOrigin if navigation happens in the top frame

Please don't have comments on the same line as the code they're commenting on.

> Source/WebCore/loader/ResourceLoadObserver.cpp:88
> +            if (isPrevalentResource(targetURL))

What is a "prevalent resource"?
As someone familiar with loading and networking code, I don't know what this means, nor can I guess.

> Source/WebCore/loader/ResourceLoadObserver.cpp:184
> +#if LOG_STATISTICS_TO_INDIVIDUAL_FILES

We're developing something brand new, from the ground up - Why are we giving it two different *compile-time* modes right out of the gate?

It doesn't make sense to me why we need this.

> Source/WebCore/loader/ResourceLoadObserver.cpp:223
> +        int primaryDomainCutOffIndex = hostSplitOnDot.size() - 2; // Type int since we're looping backwards and it can become negative

Please don't have comments on the same line as the code they're commenting on.

> Source/WebCore/loader/ResourceLoadObserver.cpp:225
> +        size_t numberOfParts = 1; // Start with TLD as a given part

Please don't have comments on the same line as the code they're commenting on.

> Source/WebCore/loader/ResourceLoadObserver.cpp:232
> +            if (hostSplitOnDot.at(primaryDomainCutOffIndex).length() >= 4 || numberOfParts >= 3) {
> +                // We have either a domain part that's 4 chars or longer, or 3 domain parts including TLD
> +                break;
> +            }

This comment could go above the "if", removing the stylistic need for the braces.

> Source/WebCore/loader/ResourceLoadObserver.cpp:321
> +    ASSERT(visitedOrigins == static_cast<size_t>(loadedOrigins.size())); // I add an extra key/value to hold total size

Please don't have comments on the same line as the code they're commenting on.

"I add..." is kind of a weird comment.

> Source/WebCore/loader/ResourceLoadObserver.cpp:337
> +void ResourceLoadObserver::writeEncoderToDisk(KeyedEncoder& encoder, const String& label) const
> +{
> +#if LOG_STATISTICS_TO_FILE

This method is called "write to disk", then if this mysterious compile time flag is set it writes stuff to disk... but if the flag is not set, it does literally nothing.

Again, wondering what the point is?

> Source/WebCore/loader/SubresourceLoader.cpp:194
> +    ResourceLoadObserver::sharedObserver().logSubresourceLoading(!redirectResponse.isNull(), redirectResponse.url(), newRequest.url(), m_frame ? m_frame->mainFrame().document()->url() : URL());

As mentioned before, this should only be called if the feature is currently enabled (otherwise we do some potentially costly work for no reason)

> Tools/MiniBrowser/mac/SettingsController.m:116
> +    [self _addItemWithTitle:@"Enable Resource Load Statistics" action:@selector(toggleResourceLoadStatisticsEnabled:) indented:NO];

Indication that this is WK1-only for now?
Comment 39 Brent Fulgham 2016-02-08 17:59:18 PST
Comment on attachment 270886 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=270886&action=review

>> Source/WebCore/ChangeLog:8
>> +        No new tests: No change in behavior.
> 
> I'm concerned about the testing strategy here.
> 
> I know this is pretty much "initial scaffolding" for a new feature, but IMHO that's the most important time to lay groundwork for testing.

OK

>> Source/WebCore/loader/DocumentLoader.cpp:535
>> +    ResourceLoadObserver::sharedObserver().logFrameNavigation(!redirectResponse.isNull(), m_frame->document()->url(), newRequest.url(), m_frame->isMainFrame(), topFrame.document()->url());
> 
> While this feature is on the ground floor I think we need to get in the habit of always checking if its even enabled before we log stuff.
> 
> On the one hand, logFrameNavigation() probably only logs things if logging is enabled.
> 
> But on the other hand, we do a lot of unnecessary work just in calling logFrameNavigation() if logging is not enabled.
> 
> Historically, all those little allocations and conversations between object types tend to show up on profiles in aggregate.

That sounds reasonable. We can revise things to avoid the work if the feature is disabled.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:79
>> +            auto mainFrameOrigin = SecurityOrigin::create(mainFrameURL); // Note: Is the same as sourceOrigin if navigation happens in the top frame
> 
> Please don't have comments on the same line as the code they're commenting on.

OK!

>> Source/WebCore/loader/ResourceLoadObserver.cpp:184
>> +#if LOG_STATISTICS_TO_INDIVIDUAL_FILES
> 
> We're developing something brand new, from the ground up - Why are we giving it two different *compile-time* modes right out of the gate?
> 
> It doesn't make sense to me why we need this.

This is basically debugging code that I left in for convenience. I'll strip it from the patch since no one else is likely to want this stuff, and it's easy enough to put back if I do want it.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:223
>> +        int primaryDomainCutOffIndex = hostSplitOnDot.size() - 2; // Type int since we're looping backwards and it can become negative
> 
> Please don't have comments on the same line as the code they're commenting on.

OK!

>> Source/WebCore/loader/ResourceLoadObserver.cpp:225
>> +        size_t numberOfParts = 1; // Start with TLD as a given part
> 
> Please don't have comments on the same line as the code they're commenting on.

OK!

>> Source/WebCore/loader/ResourceLoadObserver.cpp:232
>> +            }
> 
> This comment could go above the "if", removing the stylistic need for the braces.

Good idea!

>> Source/WebCore/loader/ResourceLoadObserver.cpp:321
>> +    ASSERT(visitedOrigins == static_cast<size_t>(loadedOrigins.size())); // I add an extra key/value to hold total size
> 
> Please don't have comments on the same line as the code they're commenting on.
> 
> "I add..." is kind of a weird comment.

Ha! It's not even true anymore!

A great example of why comments are worse than code that just says what it does.

>> Source/WebCore/loader/ResourceLoadObserver.cpp:337
>> +#if LOG_STATISTICS_TO_FILE
> 
> This method is called "write to disk", then if this mysterious compile time flag is set it writes stuff to disk... but if the flag is not set, it does literally nothing.
> 
> Again, wondering what the point is?

One of the reasons I wanted you to review this patch was to discuss how to store these statistics. Right now I'm throwing them into a plist so I can play around with this stuff, but I'm not sure how this should be handled longer term.

Essentially this is "writeEncoderToLongTermStorage" or something like that. But today, that's a plist in my home directory -- that's not how this should ship.

This is essentially a cache that I want to reload when I launch so I can continue gathering statistics as I browse the web. What's the correct place to store this kind of stuff?

>> Source/WebCore/loader/SubresourceLoader.cpp:194
>> +    ResourceLoadObserver::sharedObserver().logSubresourceLoading(!redirectResponse.isNull(), redirectResponse.url(), newRequest.url(), m_frame ? m_frame->mainFrame().document()->url() : URL());
> 
> As mentioned before, this should only be called if the feature is currently enabled (otherwise we do some potentially costly work for no reason)

Done!

>> Tools/MiniBrowser/mac/SettingsController.m:116
>> +    [self _addItemWithTitle:@"Enable Resource Load Statistics" action:@selector(toggleResourceLoadStatisticsEnabled:) indented:NO];
> 
> Indication that this is WK1-only for now?

Hmm. I can rename this to "Enable WK1 Resource Load Statistics" for now.
Comment 40 Brady Eidson 2016-02-09 14:10:45 PST
(In reply to comment #39)
> Comment on attachment 270886 [details]
> Patch
> >> Source/WebCore/loader/ResourceLoadObserver.cpp:337
> >> +#if LOG_STATISTICS_TO_FILE
> > 
> > This method is called "write to disk", then if this mysterious compile time flag is set it writes stuff to disk... but if the flag is not set, it does literally nothing.
> > 
> > Again, wondering what the point is?
> 
> One of the reasons I wanted you to review this patch was to discuss how to
> store these statistics. Right now I'm throwing them into a plist so I can
> play around with this stuff, but I'm not sure how this should be handled
> longer term.
> 
> Essentially this is "writeEncoderToLongTermStorage" or something like that.
> But today, that's a plist in my home directory -- that's not how this should
> ship.
> 
> This is essentially a cache that I want to reload when I launch so I can
> continue gathering statistics as I browse the web. What's the correct place
> to store this kind of stuff?

A flat plist is fine while its under development (obviously not for shipping).

We can figure out the longer term later.

You call it a "cache", so if you're right and it really is a cache (e.g., purgable) then it can go in ~/Library/Caches/<app bundle id>

If you don't want it to be anywhere purgable, ~/Library/Application Support/<app bundle id>

In either case, the path should come from WebKit and not WebCore.
Comment 41 Brent Fulgham 2016-02-10 16:32:01 PST
Created attachment 271037 [details]
Patch
Comment 42 Brent Fulgham 2016-02-10 16:35:37 PST
(In reply to comment #40)
> You call it a "cache", so if you're right and it really is a cache (e.g.,
> purgable) then it can go in ~/Library/Caches/<app bundle id>
> 
> If you don't want it to be anywhere purgable, ~/Library/Application
> Support/<app bundle id>
> 
> In either case, the path should come from WebKit and not WebCore.

I decided it wasn't really a cache because we can't purge it at any time without losing history. So I am using the OS API calls to get the program's "Application Support" directory to store the file.
Comment 43 Brent Fulgham 2016-02-11 12:25:03 PST
(In reply to comment #39)
> Comment on attachment 270886 [details]
> >> +        No new tests: No change in behavior.
> > 
> > I'm concerned about the testing strategy here.
> > 
> > I know this is pretty much "initial scaffolding" for a new feature, but IMHO that's the most important time to lay groundwork for testing.

I think what I'd like to do is expose something through the Internals object that would allow us to dump the statistics in JS as part of a layout test.

I could do something like our existing redirect or history tests, and just confirm that counts changed as expected or something.

At present, this could only be a WK1 test.

Do you want that to be in place as part of this initial scaffolding, or can we add it as part of the next step?
Comment 44 Brady Eidson 2016-02-11 13:18:35 PST
(In reply to comment #43)
> (In reply to comment #39)
> > Comment on attachment 270886 [details]
> > >> +        No new tests: No change in behavior.
> > > 
> > > I'm concerned about the testing strategy here.
> > > 
> > > I know this is pretty much "initial scaffolding" for a new feature, but IMHO that's the most important time to lay groundwork for testing.
> 
> I think what I'd like to do is expose something through the Internals object
> that would allow us to dump the statistics in JS as part of a layout test.
> 
> I could do something like our existing redirect or history tests, and just
> confirm that counts changed as expected or something.
> 
> At present, this could only be a WK1 test.
> 
> Do you want that to be in place as part of this initial scaffolding, or can
> we add it as part of the next step?

Ready for mind==blown?  How about the Internals test dumping support... *before* this patch?  =D =D =D
Comment 45 Brent Fulgham 2016-02-12 13:26:57 PST
Created attachment 271210 [details]
Patch
Comment 46 Brady Eidson 2016-02-12 14:26:25 PST
Comment on attachment 271210 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=271210&action=review

> Source/WebCore/loader/ResourceLoadObserver.cpp:64
> +    auto& session = NetworkStorageSession::defaultStorageSession();

Both Alexey and I have asked this, and I don't see a response yet - Why is the default session the right one to use?

This seems quite wrong.
Comment 47 Brent Fulgham 2016-02-12 16:47:46 PST
(In reply to comment #46)
> Comment on attachment 271210 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=271210&action=review
> 
> > Source/WebCore/loader/ResourceLoadObserver.cpp:64
> > +    auto& session = NetworkStorageSession::defaultStorageSession();
> 
> Both Alexey and I have asked this, and I don't see a response yet - Why is
> the default session the right one to use?

It turns out this isn't needed. I'm correcting the patch.
Comment 48 Brent Fulgham 2016-02-12 17:24:05 PST
Created attachment 271252 [details]
Patch
Comment 49 Brent Fulgham 2016-02-15 13:09:24 PST
Created attachment 271363 [details]
Disable write-to-disk in normal build.
Comment 50 Brady Eidson 2016-02-15 16:09:44 PST
Comment on attachment 271363 [details]
Disable write-to-disk in normal build.

r+ as long as Windows builds!
Comment 51 Brent Fulgham 2016-02-15 20:35:12 PST
Windows build was failing because it didn't compile the KeyedEncoderCF/KeyedDecoderCF code.
Comment 52 Brent Fulgham 2016-02-15 21:36:35 PST
Committed r196622: <http://trac.webkit.org/changeset/196622>