Bug 164527 - Network event record/replay
Summary: Network event record/replay
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit2 (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Keith Rollin
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2016-11-08 15:16 PST by Keith Rollin
Modified: 2016-12-08 23:49 PST (History)
9 users (show)

See Also:


Attachments
This is not entirely tested, so doesn't necessarily completely work. I'm uploading this early patch for early review comments since there's a significant amount of work here. (143.57 KB, patch)
2016-11-08 15:55 PST, Keith Rollin
no flags Details | Formatted Diff | Diff
Patch (135.04 KB, patch)
2016-11-16 12:32 PST, Keith Rollin
no flags Details | Formatted Diff | Diff
Patch (134.85 KB, patch)
2016-11-18 16:01 PST, Keith Rollin
no flags Details | Formatted Diff | Diff
Patch (134.92 KB, patch)
2016-11-18 16:19 PST, Keith Rollin
no flags Details | Formatted Diff | Diff
Patch (515.36 KB, patch)
2016-11-30 16:16 PST, Keith Rollin
no flags Details | Formatted Diff | Diff
Patch (515.36 KB, patch)
2016-11-30 17:40 PST, Keith Rollin
no flags Details | Formatted Diff | Diff
Patch (516.59 KB, patch)
2016-12-07 13:23 PST, Keith Rollin
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Keith Rollin 2016-11-08 15:16:57 PST
Add mechanism for recording and replaying network events as part of rdar://problem/28273781 ("New Content Capture for Performance Benchmarks").
Comment 1 Radar WebKit Bug Importer 2016-11-08 15:20:12 PST
<rdar://problem/29168157>
Comment 2 Keith Rollin 2016-11-08 15:55:16 PST
Created attachment 294198 [details]
This is not entirely tested, so doesn't necessarily completely work. I'm uploading this early patch for early review comments since there's a significant amount of work here.
Comment 3 Keith Rollin 2016-11-16 12:32:55 PST
Created attachment 294962 [details]
Patch
Comment 4 WebKit Commit Bot 2016-11-16 12:35:16 PST
Attachment 294962 [details] did not pass style-queue:


ERROR: Source/WebKit2/NetworkProcess/capture/NetworkCaptureCachedResource.h:37:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:38:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Source/WebKit2/NetworkProcess/capture/NetworkCaptureRecorder.h:34:  Alphabetical sorting problem.  [build/include_order] [4]
Total errors found: 3 in 26 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 5 Keith Rollin 2016-11-18 16:01:15 PST
Created attachment 295208 [details]
Patch
Comment 6 Keith Rollin 2016-11-18 16:19:42 PST
Created attachment 295212 [details]
Patch
Comment 7 WebKit Commit Bot 2016-11-18 16:22:50 PST
Attachment 295212 [details] did not pass style-queue:


ERROR: Source/WebKit2/NetworkProcess/capture/NetworkCaptureCachedResource.h:37:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:38:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Source/WebKit2/NetworkProcess/capture/NetworkCaptureRecorder.h:34:  Alphabetical sorting problem.  [build/include_order] [4]
Total errors found: 3 in 26 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 8 Darin Adler 2016-11-22 10:50:29 PST
Comment on attachment 295212 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=295212&action=review

I didn’t get a chance to review the whole patch yet, but this looks really good. I’ll review the details of the patch as soon as I get a chance, but if someone else wants to do it first, please don’t wait for me.

> Source/WebKit2/NetworkProcess/NetworkLoad.cpp:95
> +    if (!m_task)
> +#endif
> +
>      m_task = NetworkDataTask::create(networkSession, *this, m_parameters);

This is awkward. We should try to find a more elegant way to do this. Maybe just create a shouldCreateTask boolean to avoid the if that straddles and #endif boundary.
Comment 9 Antti Koivisto 2016-11-23 06:19:55 PST
Comment on attachment 295212 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=295212&action=review

>> Source/WebKit2/NetworkProcess/NetworkLoad.cpp:95
>>      m_task = NetworkDataTask::create(networkSession, *this, m_parameters);
> 
> This is awkward. We should try to find a more elegant way to do this. Maybe just create a shouldCreateTask boolean to avoid the if that straddles and #endif boundary.

Or just move if (!m_task) to the other side. It is not like we need to branch optimize here.

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureCachedResource.cpp:38
> +namespace WebKit {
> +namespace NetworkCapture {
> +
> +CachedResource::CachedResource(const String& eventFilePath)

Does "Cached" add anything? Maybe just WebCore::NetworkCapture::Resource?

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureCachedResource.cpp:97
> +    , m_mappedEventFile(WebCore::MappedFileData(m_eventFilePath, m_haveMappedEventFile))

, m_mappedEventFile(m_eventFilePath, m_haveMappedEventFile)

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureCachedResource.cpp:102
> +WTF::Optional<CaptureEvent> CachedResource::EventStream::nextEvent()

No need for "WTF::", here or elsewhere.

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureCachedResource.cpp:106
> +        return WTF::Optional<CaptureEvent>();

return { };

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureCachedResource.h:87
> +    // Lazily calculated.
> +    WebCore::URL m_url;
> +    WebCore::URL m_baseURL;
> +    WebCore::URLParser::URLEncodedForm m_queryParameters;
> +
> +    bool m_haveURL { false };
> +    bool m_haveBaseURL { false };
> +    bool m_haveQueryParameters { false };

Often this sort of bool+field pairs end up reading better with Optional:

Optional<WebCore::URL> m_url;

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.cpp:46
> +void ArgumentCoder<CaptureTimeType>::encode(IPC::Encoder& encoder, const CaptureTimeType& time)

As discussed, this shouldn't use IPC coders since they are not safe to persist. Cache coders are a better option though they haven't really been desingned for permanent storage either. Using something like plist or json might be a good idea.

Still, this is probably ok to get things up and running.

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.h:82
> +    static constexpr uint8_t kType { CaptureEventTypes::TypeIndex<RequestSentEvent> };

We don't usually use kFoo naming. How about just 'type'?

Does constexpr do something here?

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:468
> +bool Manager::ensureFileHandle(const String& filePath, FileHandle* fileHandle)

This could take FileHandle& instead. 

This is always called before printToFile(). Maybe it could be invoked unconditionally and this functionality be part of printToFile.

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:486
> +    FileHandle f = FileHandle::openFile(filePath, mode);

Please use full variable names. Also this sort of things can use auto.
Comment 10 Keith Rollin 2016-11-23 16:28:13 PST
(In reply to comment #8)
> Comment on attachment 295212 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=295212&action=review
> 
> I didn’t get a chance to review the whole patch yet, but this looks really
> good. I’ll review the details of the patch as soon as I get a chance, but if
> someone else wants to do it first, please don’t wait for me.

I've asked Antti and Alex for their reviews.

> > Source/WebKit2/NetworkProcess/NetworkLoad.cpp:95
> > +    if (!m_task)
> > +#endif
> > +
> >      m_task = NetworkDataTask::create(networkSession, *this, m_parameters);
> 
> This is awkward. We should try to find a more elegant way to do this. Maybe
> just create a shouldCreateTask boolean to avoid the if that straddles and
> #endif boundary.

I agree on the awkwardness. I went with Antti's suggestion.
Comment 11 Keith Rollin 2016-11-23 16:40:04 PST
(In reply to comment #9)
> Comment on attachment 295212 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=295212&action=review
> 
> >> Source/WebKit2/NetworkProcess/NetworkLoad.cpp:95
> >>      m_task = NetworkDataTask::create(networkSession, *this, m_parameters);
> > 
> > This is awkward. We should try to find a more elegant way to do this. Maybe just create a shouldCreateTask boolean to avoid the if that straddles and #endif boundary.
> 
> Or just move if (!m_task) to the other side. It is not like we need to
> branch optimize here.

Done.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureCachedResource.cpp:38
> > +namespace WebKit {
> > +namespace NetworkCapture {
> > +
> > +CachedResource::CachedResource(const String& eventFilePath)
> 
> Does "Cached" add anything? Maybe just WebCore::NetworkCapture::Resource?

I could agree to that. Especially since there's another CachedResource in WebCore.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureCachedResource.cpp:97
> > +    , m_mappedEventFile(WebCore::MappedFileData(m_eventFilePath, m_haveMappedEventFile))
> 
> , m_mappedEventFile(m_eventFilePath, m_haveMappedEventFile)

Yikes, what was I thinking?

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureCachedResource.cpp:102
> > +WTF::Optional<CaptureEvent> CachedResource::EventStream::nextEvent()
> 
> No need for "WTF::", here or elsewhere.

Done.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureCachedResource.cpp:106
> > +        return WTF::Optional<CaptureEvent>();
> 
> return { };

Done.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureCachedResource.h:87
> > +    // Lazily calculated.
> > +    WebCore::URL m_url;
> > +    WebCore::URL m_baseURL;
> > +    WebCore::URLParser::URLEncodedForm m_queryParameters;
> > +
> > +    bool m_haveURL { false };
> > +    bool m_haveBaseURL { false };
> > +    bool m_haveQueryParameters { false };
> 
> Often this sort of bool+field pairs end up reading better with Optional:
> 
> Optional<WebCore::URL> m_url;

Done. This will lead to a change in behavior, but I'm OK with that. The old way, if the value could not be lazily calculated, an empty instance will be returned. The new way, there may not be an empty instance to return, so the process may crash (due to calling Optional::get() on an "unengaged" instance). I'm OK with that because there's nothing useful to be done if the lazily-calculated value can't be calculated.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.cpp:46
> > +void ArgumentCoder<CaptureTimeType>::encode(IPC::Encoder& encoder, const CaptureTimeType& time)
> 
> As discussed, this shouldn't use IPC coders since they are not safe to
> persist. Cache coders are a better option though they haven't really been
> desingned for permanent storage either. Using something like plist or json
> might be a good idea.
> 
> Still, this is probably ok to get things up and running.

I've got a JSON-based implementation for an upcoming patch. It looks and works great.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.h:82
> > +    static constexpr uint8_t kType { CaptureEventTypes::TypeIndex<RequestSentEvent> };
> 
> We don't usually use kFoo naming. How about just 'type'?
> 
> Does constexpr do something here?

This will change in the JSON-based implementation. A scalar-based kType seemed like a good idea for an IPC::Encoder-based storage mechanism, but a string-based kTypeName fits better with a JSON-based storage mechanism. Those strings are already just "const", and I'll change the name to "typeName".

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:468
> > +bool Manager::ensureFileHandle(const String& filePath, FileHandle* fileHandle)
> 
> This could take FileHandle& instead. 

Done.

> This is always called before printToFile(). Maybe it could be invoked
> unconditionally and this functionality be part of printToFile.

I'm not sure what you're envisioning here. Do you mean change something like:

    if (ensureFileHandle(reportLoadPath(), m_loadFileHandle))
        printToFile(m_loadFileHandle, "%s\n", DEBUG_STR(resource.url().string()));

to:

    printToFile(reportLoadPath(), m_loadFileHandle, "%s\n", DEBUG_STR(resource.url().string()));

I would have to pass in the name of the file in order to merge ensureFileHandle wth printToFile. And while that seems unwieldy to me, I guess it's nonetheless better the two-line sequences I currently have.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:486
> > +    FileHandle f = FileHandle::openFile(filePath, mode);
> 
> Please use full variable names. Also this sort of things can use auto.

Done (here and in other places).
Comment 12 Keith Rollin 2016-11-23 16:41:02 PST
I'll post a new patch once Alex has a chance to review the old one.
Comment 13 Antti Koivisto 2016-11-24 02:07:34 PST
> I'm not sure what you're envisioning here. Do you mean change something like:
> 
>     if (ensureFileHandle(reportLoadPath(), m_loadFileHandle))
>         printToFile(m_loadFileHandle, "%s\n",
> DEBUG_STR(resource.url().string()));
> 
> to:
> 
>     printToFile(reportLoadPath(), m_loadFileHandle, "%s\n",
> DEBUG_STR(resource.url().string()));
> 
> I would have to pass in the name of the file in order to merge
> ensureFileHandle wth printToFile. And while that seems unwieldy to me, I
> guess it's nonetheless better the two-line sequences I currently have.

Or maybe just do ensureFileHandle() for the relevant file once when starting recording/playback and have printToFile do nothing if the file is not open. 

Or maybe FileHandle could be more of a File and have open/closed states. Then it could encapsulate the path for files that are not open too:

Manager::Manager() {
   : m_loadFile(reportLoadPath())

printToFile(m_loadFile, ...):
...

void printToFile(File& file) {
    if (!file.ensureOpen())
       return;
    ...

Anyway, not super important.
Comment 14 Alex Christensen 2016-11-28 22:33:32 PST
Comment on attachment 295212 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=295212&action=review

partial review.  This is going to be really cool, but it needs some refining.

>>>>> Source/WebKit2/NetworkProcess/NetworkLoad.cpp:95
>>>>>      m_task = NetworkDataTask::create(networkSession, *this, m_parameters);
>>>> 
>>>> This is awkward. We should try to find a more elegant way to do this. Maybe just create a shouldCreateTask boolean to avoid the if that straddles and #endif boundary.
>>> 
>>> Or just move if (!m_task) to the other side. It is not like we need to branch optimize here.
>> 
>> I agree on the awkwardness. I went with Antti's suggestion.
> 
> Done.

Most users of WebKit will not be using NetworkCapture at all, so I think we should make it a quick check, which would also make it more straightforward.

switch (NetworkCapture::Manager::singleton().state()) {
case Disabled:
    // This is the common case.
    doRegularStuff();
    return;
case Recording:
    doRecordingStuff();
    return;
case Replaying:
    doReplayingStuff();
    return;
}

> Source/WebKit2/NetworkProcess/NetworkProcess.cpp:233
> +        parameters.shouldRecordNetworkRequests,
> +        parameters.shouldReplayNetworkRequests,
> +        parameters.recordReplayCacheLocation);

This could definitely be put into an enum instead of two bools.  Would we ever be recording and replaying at the same time?

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureCachedResource.cpp:84
> +        m_queryParameters = WebCore::URLParser::parseURLEncodedForm(url().query());

I intended this to only be used with URLSearchParams which takes any string as input.  This takes a URL's query, which has already been parsed.  I'm not sure what I think about the use here.

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureCachedResource.cpp:111
> +        DEBUG_LOG_ERROR("Unable to read event type: file = %{public}s", DEBUG_STR(m_eventFilePath));

I'm curious about this "%{public}s".  You're the only one who seems to have used it in WebKit, Keith.  What is it and what does it do?  Should more people be using it?

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureCachedResource.cpp:130
> +        return WTF::Optional<CaptureEvent>();

Isn't this the same as Nullopt?

>>> Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.cpp:46
>>> +void ArgumentCoder<CaptureTimeType>::encode(IPC::Encoder& encoder, const CaptureTimeType& time)
>> 
>> As discussed, this shouldn't use IPC coders since they are not safe to persist. Cache coders are a better option though they haven't really been desingned for permanent storage either. Using something like plist or json might be a good idea.
>> 
>> Still, this is probably ok to get things up and running.
> 
> I've got a JSON-based implementation for an upcoming patch. It looks and works great.

If you have a JSON-based implementation, should we even land this binary implementation?

Also, I don't quite understand how this recorded cache is going to be used.  Will it be read from disk and the entire recording stored in memory in the NetworkProcess, or just read from disk as needed?  It could get really big.  (The answer to this question is probably in this patch, but I haven't read the whole thing yet.)

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.cpp:135
> +    // Force the creation of needed platform-specific data that otherwise is not created.

Why?  If this is actually needed, we shouldn't need (void).

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.cpp:156
> +        return WTF::Optional<CaptureEvent>();

Nullopt

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.cpp:195
> +        return WTF::Optional<CaptureEvent>();

Nullopt

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.cpp:197
> +    return WTF::Optional<CaptureEvent>(CaptureEvent(FinishedEvent(WTFMove(time), WTFMove(error))));

WTF::Optional<...> not needed, I think.

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.h:72
> +    CaptureTimeType time;

Is this used in this patch?  It seems like it's just encoded, decoded, and moved around.  It would be good to have.

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.h:88
> +        : TimedEvent(), response(response) { }

response should be on its own line here and elsewhere in this file.

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureLogging.h:42
> +#define DEBUG_LOG(...)                  ((void) 0)

Lots of unnecessary spaces.
Also, why not just define it to be nothing?

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:63
> +// TODO: What's the threading model in which we're running? We're acting as a
> +// singleton that gets called all over the place, but we're not doing anything
> +// to protect ourself from multi-threaded access. On the other hand, perhaps
> +// we're OK without synchronization. When recording, we're just writing data to
> +// files. When playing things back, we initialize our data up front and then
> +// use it in a read-only manner afterwards.

Things like this are done on the main thread in the NetworkProcess right now.

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:71
> +void Manager::initialize(

Let's just put all the parameters on the same line.

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:90
> +void Manager::terminate()
> +{
> +}

This doesn't do anything.  Why not just make the Manager a member variable of the NetworkProcess singleton?

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:160
> +int Manager::fuzzyMatchURLs(

parameters on same line.

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:167
> +#if 0

:(

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:215
> +    int score = 0;

can score ever be negative?

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:402
> +    WTF::MD5 md5;

This is a strange hash function.  Why MD5?  We already have StringHash and StringHasher.  Were they lacking something?

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:417
> +    String result;

StringBuilder.

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.h:85
> +    int fuzzyMatchURLs(

Is the need for fuzzy matching driven by any real-world examples?  I could see it leading to nondeterministic behavior, which could cause strange things to happen when measuring performance data.  Let's talk in person about this.

> Source/WebKit2/NetworkProcess/capture/NetworkDataTaskReplay.h:92
> +    enum { kPositionNotSpecified = -1 };

This is never used.
Comment 15 Alex Christensen 2016-11-29 07:24:56 PST
Comment on attachment 295212 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=295212&action=review

>> Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.h:85
>> +    int fuzzyMatchURLs(
> 
> Is the need for fuzzy matching driven by any real-world examples?  I could see it leading to nondeterministic behavior, which could cause strange things to happen when measuring performance data.  Let's talk in person about this.

Cache busting.  Of course this is necessary
Comment 16 Keith Rollin 2016-11-29 10:59:32 PST
(In reply to comment #14)
> Comment on attachment 295212 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=295212&action=review
> 
> partial review.  This is going to be really cool, but it needs some refining.
> 
> >>>>> Source/WebKit2/NetworkProcess/NetworkLoad.cpp:95
> >>>>>      m_task = NetworkDataTask::create(networkSession, *this, m_parameters);
> >>>> 
> >>>> This is awkward. We should try to find a more elegant way to do this. Maybe just create a shouldCreateTask boolean to avoid the if that straddles and #endif boundary.
> >>> 
> >>> Or just move if (!m_task) to the other side. It is not like we need to branch optimize here.
> >> 
> >> I agree on the awkwardness. I went with Antti's suggestion.
> > 
> > Done.
> 
> Most users of WebKit will not be using NetworkCapture at all, so I think we
> should make it a quick check, which would also make it more straightforward.
> 
> switch (NetworkCapture::Manager::singleton().state()) {
> case Disabled:
>     // This is the common case.
>     doRegularStuff();
>     return;
> case Recording:
>     doRecordingStuff();
>     return;
> case Replaying:
>     doReplayingStuff();
>     return;
> }

Currently, the NetworkCapture facility is optionally included. I'm not sure what maintaining that would do to the above. Wouldn't it end up looking like the following?

#if ENABLE(NETWORK_CAPTURE)
switch (NetworkCapture::Manager::singleton().state()) {
case Disabled:
    // This is the common case.
    doRegularStuff();
    return;
case Recording:
    doRecordingStuff();
    return;
case Replaying:
    doReplayingStuff();
    return;
}
#else
    doRegularStuff();
#endif

> > Source/WebKit2/NetworkProcess/NetworkProcess.cpp:233
> > +        parameters.shouldRecordNetworkRequests,
> > +        parameters.shouldReplayNetworkRequests,
> > +        parameters.recordReplayCacheLocation);
> 
> This could definitely be put into an enum instead of two bools.  Would we
> ever be recording and replaying at the same time?

Probably not. I'll make the change.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureCachedResource.cpp:84
> > +        m_queryParameters = WebCore::URLParser::parseURLEncodedForm(url().query());
> 
> I intended this to only be used with URLSearchParams which takes any string
> as input.  This takes a URL's query, which has already been parsed.  I'm not
> sure what I think about the use here.

What's the alternative for getting the query parameters from the URL in a form where I can examine each key/value pair?

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureCachedResource.cpp:111
> > +        DEBUG_LOG_ERROR("Unable to read event type: file = %{public}s", DEBUG_STR(m_eventFilePath));
> 
> I'm curious about this "%{public}s".  You're the only one who seems to have
> used it in WebKit, Keith.  What is it and what does it do?  Should more
> people be using it?

It depends on your approach to debugging. Some people use Xcode and its debugger. Some use logging to the Xcode console. Some use logging to the terminal. Some use logging to the Console. There are likely other approaches of which I'm completely unaware. Personally, when I joined WebKit, I was first involved with adding support for the new os_log() logging facility and the new Console application. I thus got headed down that route out of the starting gate. As part of os_log(), strings are not captured by default. In order to tell os_log() that you are not capturing information that is possibly personally identifiable, you need to annotate the string as you see it. Since this logging is only enabled when the record/replay facility is activated, and since it won't be activated on user's devices, I've marked all strings as {public}, allowing me to capture things such as URLs and filenames.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureCachedResource.cpp:130
> > +        return WTF::Optional<CaptureEvent>();
> 
> Isn't this the same as Nullopt?

I'm not familiar with Nullopt and don't find it in WebKit sources anywhere. I see std::nullopt on cppreference.com. Is that what you're referring to? I've not used this before. I now see it used liberally through WebKit. I believe you're suggesting I use "return std::nullopt"? I could do that, but Antti suggested using "return { }". Is one preferred over the other?

> >>> Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.cpp:46
> >>> +void ArgumentCoder<CaptureTimeType>::encode(IPC::Encoder& encoder, const CaptureTimeType& time)
> >> 
> >> As discussed, this shouldn't use IPC coders since they are not safe to persist. Cache coders are a better option though they haven't really been desingned for permanent storage either. Using something like plist or json might be a good idea.
> >> 
> >> Still, this is probably ok to get things up and running.
> > 
> > I've got a JSON-based implementation for an upcoming patch. It looks and works great.
> 
> If you have a JSON-based implementation, should we even land this binary
> implementation?

I could upload the JSON version if you want. I was working on the JSON version as a follow-up to the binary version in order to keep a pipeline of changes going. But with the Thanksgiving holiday holding up reviews for a while, the JSON implementation came in before this current patch could be fully reviewed.

So, whichever way you prefer. Whatever makes it easier for you to review.

> Also, I don't quite understand how this recorded cache is going to be used. 
> Will it be read from disk and the entire recording stored in memory in the
> NetworkProcess, or just read from disk as needed?  It could get really big. 
> (The answer to this question is probably in this patch, but I haven't read
> the whole thing yet.)

The idea is that minimal cache information will be read in to serve as an index of available resources. The actual resources won't be read in until actually needed.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.cpp:135
> > +    // Force the creation of needed platform-specific data that otherwise is not created.
> 
> Why?  If this is actually needed, we shouldn't need (void).

The information that we need created is some state internal to the object (in particular, the m_nsResponse object).

This kludge is not needed or used in the JSON implementation, wherein the ResourceResponse is created in a different way that allows the m_nsResponse to be created without forcing it with this dummy call.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.cpp:156
> > +        return WTF::Optional<CaptureEvent>();
> 
> Nullopt
> 
> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.cpp:195
> > +        return WTF::Optional<CaptureEvent>();
> 
> Nullopt

Ok. Ok. :-)

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.cpp:197
> > +    return WTF::Optional<CaptureEvent>(CaptureEvent(FinishedEvent(WTFMove(time), WTFMove(error))));
> 
> WTF::Optional<...> not needed, I think.

I think this was supposed to follow the form of the other similar methods that resurrect stored events and call the wrap() function to take care of all that nesting. And I think I tried it without the WTF::Optional<...> and it didn't work.

Since this code is based on IPC::Decoder, it's all getting replaced in the JSON version anyway.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.h:72
> > +    CaptureTimeType time;
> 
> Is this used in this patch?  It seems like it's just encoded, decoded, and
> moved around.  It would be good to have.

It's not currently used. It's there because we intend to use it to optionally replay events at the same rate they were recorded.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.h:88
> > +        : TimedEvent(), response(response) { }
> 
> response should be on its own line here and elsewhere in this file.

OK.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureLogging.h:42
> > +#define DEBUG_LOG(...)                  ((void) 0)
> 
> Lots of unnecessary spaces.
> Also, why not just define it to be nothing?

The spaces are there to line up the definitions. For me, it makes it easier to read. It also helps me see the differences between similar macros.

Defining a function-like macro to ((void) 0) is a standard technique for accounting for the trailing semi-colon. If you don't account for it, some compilers will complain about "if" or "else" bodies that contain a bare semi-colon, or about seeing two semi-colons in a row (one from the now-missing logging line, and one from the previous statement). You can find other examples in WebKit by searching for "((void)0)".

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:71
> > +void Manager::initialize(
> 
> Let's just put all the parameters on the same line.

OK.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:90
> > +void Manager::terminate()
> > +{
> > +}
> 
> This doesn't do anything.

It used to. It used to close the various logging files that I open up and write to over the course of recording or replaying. The code in there got removed when I added the FileHandle class to manage the lifetime of the open files. I kept the terminate function around in case I came up with a new use for it. And come to think of it, I may actually still need it for closing files. Since the Manager is a NeverDestroyed singleton, it's destructor will never be called. That means that the FileHandle objects managing the lifetime of my files will never be destroyed. Which means that the files they manage will not be properly closed. I don't know if this will lead to information not getting flushed to disk or not.

> Why not just make the Manager a member variable
> of the NetworkProcess singleton?

I guess that could be done. What are the advantages or reasons for doing so?

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:160
> > +int Manager::fuzzyMatchURLs(
> 
> parameters on same line.

OK.

> 
> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:167
> > +#if 0
> 
> :(

Gone in the JSON version.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:215
> > +    int score = 0;
> 
> can score ever be negative?

Yes, if the URLs are sufficiently different (I add to the score on positive correlations and subtract from it on mismatches). I think I have a note somewhere considering completely ignoring results with negative scores, regardless of whether or not they are considered the best match.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:402
> > +    WTF::MD5 md5;
> 
> This is a strange hash function.  Why MD5?  We already have StringHash and
> StringHasher.  Were they lacking something?

I'd never known MD5 to be odd or of it being insufficient for the task I'm using it for. Mostly I'm using it because the approach I'm using for naming files is based on code I was originally using from Apache mod_cache. I was using that code because of an early desire to be compatible with our current PLT data corpus, which is stored using that facility. I later moved away from that decision, but the MD5 code stayed because there was no reason to change it. Additionally, I was not familiar with StringHash and/or StringHasher. I could look into them but am not sure they'll be what I need. Going forward, I may be creating a hash out of more than a single string. I'll need to throw several strings into the hash, and MD5 handles that type of usage.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:417
> > +    String result;
> 
> StringBuilder.

OK. Looking into that, I see that String::append() is considered inefficient and slated for possible removal. I didn't know that.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.h:85
> > +    int fuzzyMatchURLs(
> 
> Is the need for fuzzy matching driven by any real-world examples?  I could
> see it leading to nondeterministic behavior, which could cause strange
> things to happen when measuring performance data.  Let's talk in person
> about this.

Yes, this is driven be real-world cases. In addition to cache-busters as you noted in a follow-up, URLs can contain session IDs and/or analytics such as time screen size. One of the main motivations for implementing this new record/replay facility was to address these dynamic URLs in a way that couldn't properly be done with the current Apache + mod_rewrite + hand-tweaked corpus approach.

> > Source/WebKit2/NetworkProcess/capture/NetworkDataTaskReplay.h:92
> > +    enum { kPositionNotSpecified = -1 };
> 
> This is never used.

Yeah, I don't know what was up with that. It's been removed in the JSON version.
Comment 17 Alex Christensen 2016-11-29 15:04:31 PST
(In reply to comment #16)
> Currently, the NetworkCapture facility is optionally included. I'm not sure
> what maintaining that would do to the above. Wouldn't it end up looking like
> the following?
> 
> #if ENABLE(NETWORK_CAPTURE)
> switch (NetworkCapture::Manager::singleton().state()) {
> case Disabled:
>     // This is the common case.
>     doRegularStuff();
>     return;
> case Recording:
>     doRecordingStuff();
>     return;
> case Replaying:
>     doReplayingStuff();
>     return;
> }
> #else
>     doRegularStuff();
> #endif
It would.  I think this is straightforward and efficient.

> > > Source/WebKit2/NetworkProcess/capture/NetworkCaptureCachedResource.cpp:84
> > > +        m_queryParameters = WebCore::URLParser::parseURLEncodedForm(url().query());
> > 
> > I intended this to only be used with URLSearchParams which takes any string
> > as input.  This takes a URL's query, which has already been parsed.  I'm not
> > sure what I think about the use here.
> 
> What's the alternative for getting the query parameters from the URL in a
> form where I can examine each key/value pair?
The URL query has been parsed as a query, but not as key/value pairs until we make a URLEncodedForm.  This is exactly what that function is for.  Disregard my comment.

> > > Source/WebKit2/NetworkProcess/capture/NetworkCaptureCachedResource.cpp:130
> > > +        return WTF::Optional<CaptureEvent>();
> > 
> > Isn't this the same as Nullopt?
> 
> I'm not familiar with Nullopt and don't find it in WebKit sources anywhere.
> I see std::nullopt on cppreference.com. Is that what you're referring to?
> I've not used this before. I now see it used liberally through WebKit. I
> believe you're suggesting I use "return std::nullopt"? I could do that, but
> Antti suggested using "return { }". Is one preferred over the other?
I find 656 instances of "Nullopt" in WebKit, unless someone did something big in the last week since I last updated.  It means the same as std::nullopt, but with WTF::Optional instead of std::optional.  I prefer Nullopt over { }.  Others may disagree.

> > If you have a JSON-based implementation, should we even land this binary
> > implementation?
> 
> I could upload the JSON version if you want. I was working on the JSON
> version as a follow-up to the binary version in order to keep a pipeline of
> changes going. But with the Thanksgiving holiday holding up reviews for a
> while, the JSON implementation came in before this current patch could be
> fully reviewed.
Let's see the JSON version.  Do we base-64 encode binary data or something?

> > Why not just make the Manager a member variable
> > of the NetworkProcess singleton?
> 
> I guess that could be done. What are the advantages or reasons for doing so?
That would make more organizational sense.  Unfortunately we still do some networking in the WebProcess, so I don't think we can actually do this right now.

> > This is a strange hash function.  Why MD5?  We already have StringHash and
> > StringHasher.  Were they lacking something?
> 
> I'd never known MD5 to be odd or of it being insufficient for the task I'm
> using it for. Mostly I'm using it because the approach I'm using for naming
> files is based on code I was originally using from Apache mod_cache. I was
> using that code because of an early desire to be compatible with our current
> PLT data corpus, which is stored using that facility. I later moved away
> from that decision, but the MD5 code stayed because there was no reason to
> change it. Additionally, I was not familiar with StringHash and/or
> StringHasher. I could look into them but am not sure they'll be what I need.
> Going forward, I may be creating a hash out of more than a single string.
> I'll need to throw several strings into the hash, and MD5 handles that type
> of usage.
I was just thinking of other hash functions I've reviewed, like IDBDatabaseIdentifier::hash which combines hashes of other things.  I was just curious if you explicitly chose MD5 because of some cool properties instead of SHA1 or SHA256 or one of our custom hashes.  I guess it's not a big deal here.
Comment 18 Keith Rollin 2016-11-29 15:39:59 PST
(In reply to comment #17)
> I find 656 instances of "Nullopt" in WebKit, unless someone did something
> big in the last week since I last updated.  It means the same as
> std::nullopt, but with WTF::Optional instead of std::optional.  I prefer
> Nullopt over { }.  Others may disagree.

Yes, the contents of Optional.h was replaced with something more akin to std::optional a few days ago. The latter is now used instead of WTF::Optional. I guess WTF::Nullopt was changed to std::nullopt at the same time.

> Let's see the JSON version.

OK, I'll upload a new version once I've finished incorporating the previous review comments.

> Do we base-64 encode binary data or something?

It is currently base-64 encoded. I've discussed plans with Gavin regarding possibly splitting the raw data out into its own file if the base-64 encoding/decoding is too expensive.

> I was just thinking of other hash functions I've reviewed, like
> IDBDatabaseIdentifier::hash which combines hashes of other things.  I was
> just curious if you explicitly chose MD5 because of some cool properties
> instead of SHA1 or SHA256 or one of our custom hashes.  I guess it's not a
> big deal here.

No deep purpose here. I just started with the mod_cache code (which used MD5) and evolved it. I could use SHA1 or SHA256, but the main need for hashing here is to create a unique name of reasonable size with legal characters based on the URL (and, in the future, some other identifying data). I don't need to avoid collision attacks or any of the other reasons why SHA is favored over MD5. And MD5 is faster.
Comment 19 Keith Rollin 2016-11-30 16:16:58 PST
Created attachment 295782 [details]
Patch
Comment 20 WebKit Commit Bot 2016-11-30 16:19:12 PST
Attachment 295782 [details] did not pass style-queue:


ERROR: Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:38:  Alphabetical sorting problem.  [build/include_order] [4]
Total errors found: 1 in 27 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 21 Keith Rollin 2016-11-30 16:19:51 PST
This new patch includes the initial JSON implementation that uses the header-only JSON library. Gavin wanted me to get a version up for review sooner than later, suggesting that we can revisit the storage mechanism (which can include a binary format, one that is XML-based, one that uses JSC's JSON facilities, etc.) in a subsequent update.
Comment 22 Keith Rollin 2016-11-30 17:29:21 PST
The mac-debug build failed because of:

/Volumes/Data/EWS/WebKit/WebKitBuild/Debug/DerivedSources/WebCore/JSInternalSettingsGenerated.cpp:986:10: error: no member named 'setAsyncImageDecodingEnabled' in 'WebCore::InternalSettingsGenerated'
    impl.setAsyncImageDecodingEnabled(WTFMove(asyncImageDecodingEnabled));
    ~~~~ ^
1 error generated.

It's not apparent to me how this is related to my patch. I see that setAsyncImageDecodingEnabled was removed in 209159 just a couple of hours ago, so perhaps there's a build lag.

The gtk-wk2 build failed with:

g++-4.9: error: ../../Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp: No such file or directory
g++-4.9: fatal error: no input files
compilation terminated.

And others. I'm not sure why this is happening. I've updated CMakeLists.txt with my new files. And a previous patch successfully built with the same change to CMakeLists.txt.
Comment 23 Keith Rollin 2016-11-30 17:40:21 PST
Created attachment 295800 [details]
Patch
Comment 24 WebKit Commit Bot 2016-11-30 17:42:58 PST
Attachment 295800 [details] did not pass style-queue:


ERROR: Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:38:  Alphabetical sorting problem.  [build/include_order] [4]
Total errors found: 1 in 27 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 25 Sam Weinig 2016-12-02 12:32:56 PST
Comment on attachment 295800 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=295800&action=review

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureResource.h:55
> +public:

This public is unnecessary.

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureTypes.h:36
> +class FileHandle final {
> +public:

This seems like an odd file to define a primitive like this.  It should probably go in its own file, and probably go to WebCore.

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureTypes.h:95
> +template <typename ... Types>
> +struct TypeHolder {

This seems to duplicate a bit of what we have in Brigand.h. I think you can brigand::list<...> and brigand::for_each<list>(...).
Comment 26 Keith Rollin 2016-12-02 12:43:29 PST
(In reply to comment #25)
> Comment on attachment 295800 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=295800&action=review
> 
> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureResource.h:55
> > +public:
> 
> This public is unnecessary.

I tend to use additional public/private statements to serve as section dividers -- to break apart different areas of the interface. But I'll take it out since it sounds like that approach is not part of WebKit's MO.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureTypes.h:36
> > +class FileHandle final {
> > +public:
> 
> This seems like an odd file to define a primitive like this.  It should
> probably go in its own file, and probably go to WebCore.

I totally agree. Antti suggested that it could be expanded a little bit, too. I figured I would do that work and move it into FileSystem.h in a follow-up patch.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureTypes.h:95
> > +template <typename ... Types>
> > +struct TypeHolder {
> 
> This seems to duplicate a bit of what we have in Brigand.h. I think you can
> brigand::list<...> and brigand::for_each<list>(...).

Cool. I wasn't familiar with Brigand.h. I'll look into it.
Comment 27 Alex Christensen 2016-12-05 23:13:18 PST
Comment on attachment 295800 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=295800&action=review

Let's land this after addressing these comments

> Source/WebKit2/ChangeLog:150
> +        * NetworkProcess/capture/json.hpp: Added.

After discussing with Gavin, I'm ok with us having this in WebKit for short-term experimentation for this project, but it should have comments added saying not to use it for anything else, especially not anything a user would ever run.  We should also put a FIXME in saying to make JSONParse use callbacks for constructing objects like our YarrParser does, and replace this use and other uses including the JSONParse call in ContentExtensionParser.cpp

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.cpp:50
> +static Headers copyHeaders(const WebCore::HTTPHeaderMap& httpHeaders)

HTTPHeaderMap::isolatedCopy() does the same thing as this.

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.cpp:173
> +using json = nlohmann::basic_json<>;

grumble, grumble, grumble
This would be a good place for a comment saying we should use JSONParse after its generalization is complete.

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.cpp:358
> +        (void) base64Decode(str.c_str(), str.size(), data);

Let's assert the return value of this is true instead of the (void)
This should probably return an optional value, but let's not do that in this patch.

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.h:65
> +struct KeyValuePair {

Can we just use std::pair<String, String>?  This class seems excessive.

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.h:75
> +struct ResourceRequest {

I don't particularly like using these names because they already exist in the WebCore namespace.  I know, that's what namespaces are for, but I often search for class names, and IDEs get confused.  What about just Request, Response, and Error?

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.h:207
> +struct FinishedEvent final : public TimedEvent {

Seems like a lot of structs in one file.

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:92
> +    // DEBUG_LOG("URL = %{public}s", DEBUG_STR(request.url().string()));

uncomment or remove

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:120
> +        // DEBUG_LOG("Found exact match: %{public}s", DEBUG_STR(lower->url().string()));

ditto

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:407
> +    for (int dirNum = 0; dirNum < kNumberOfParentDirectories; ++dirNum) {

This loop seems unnecessary since kNumberOfParentDirectories is always exactly 1.

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:496
> +bool Manager::readFile(const String& filePath, Vector<Vector<String>>& lines)

This should return a std::optional<Vector<Vector<String>>>

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:499
> +    WebCore::MappedFileData file(filePath, success);

I'm not sure it's necessary to find a consecutive section of the virtual address space large enough for the file, especially since all we need is a bunch of little pieces of the file.  I think this would be better implemented with openFile/readFromFile/closeFile

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureRecorder.cpp:74
> +    // TODO: Is there a better response to receiving a multi-part resource?

multipart responses probably won't be common in any content we're going to be recording.  It's mostly used for webcams these days.  If we find it's necessary or useful, we can make new abstractions.

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureRecorder.cpp:140
> +        if (handle.writeToFile(asString.c_str(), asString.size() + 1) == -1) {

Do we want to write the null-terminating character to file?

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureTypes.h:35
> +class FileHandle final {

I don't think this adds much that PlatformFileHandle doesn't already do, but maybe that's just the old C programmer in me.  It doesn't do anything cool like handle errors, keep state, buffer, have stream operators, etc.
Comment 28 Keith Rollin 2016-12-06 12:56:26 PST
(In reply to comment #27)
> Comment on attachment 295800 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=295800&action=review
> 
> Let's land this after addressing these comments
> 
> > Source/WebKit2/ChangeLog:150
> > +        * NetworkProcess/capture/json.hpp: Added.
> 
> After discussing with Gavin, I'm ok with us having this in WebKit for
> short-term experimentation for this project, but it should have comments
> added saying not to use it for anything else, especially not anything a user
> would ever run.  We should also put a FIXME in saying to make JSONParse use
> callbacks for constructing objects like our YarrParser does, and replace
> this use and other uses including the JSONParse call in
> ContentExtensionParser.cpp

OK.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.cpp:50
> > +static Headers copyHeaders(const WebCore::HTTPHeaderMap& httpHeaders)
> 
> HTTPHeaderMap::isolatedCopy() does the same thing as this.

This function copies from HTTPHeaderMap to Headers, which is a different structure. Structures like Headers, Request, Response, and Error were defined to help bridge the transition between WebKit types and a json library that works best with STL-like structures.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.cpp:173
> > +using json = nlohmann::basic_json<>;
> 
> grumble, grumble, grumble
> This would be a good place for a comment saying we should use JSONParse
> after its generalization is complete.

OK.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.cpp:358
> > +        (void) base64Decode(str.c_str(), str.size(), data);
> 
> Let's assert the return value of this is true instead of the (void)
> This should probably return an optional value, but let's not do that in this
> patch.

OK.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.h:65
> > +struct KeyValuePair {
> 
> Can we just use std::pair<String, String>?  This class seems excessive.

OK.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.h:75
> > +struct ResourceRequest {
> 
> I don't particularly like using these names because they already exist in
> the WebCore namespace.  I know, that's what namespaces are for, but I often
> search for class names, and IDEs get confused.  What about just Request,
> Response, and Error?

OK.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.h:207
> > +struct FinishedEvent final : public TimedEvent {
> 
> Seems like a lot of structs in one file.

Are you suggesting splitting them out into ten different files? I'm not sure what we'd gain from that. These structs tend to be used in tandem, so clients of them would end up #including ten files instead of one.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:92
> > +    // DEBUG_LOG("URL = %{public}s", DEBUG_STR(request.url().string()));
> 
> uncomment or remove
> 
> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:120
> > +        // DEBUG_LOG("Found exact match: %{public}s", DEBUG_STR(lower->url().string()));
> 
> ditto

These line exist because they were useful during bring-up, but are now commented out because they have served their purpose. I kept them around because I can imagine their being useful again when we revisit the fuzzy matching and try to make it better. If we remove them, someone will just have to add them back when needed.

How about my changing them to DEBUG_LOG_VERBOSE, which will be a no-op in the normal case but could be re-enabled if/when needed?

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:407
> > +    for (int dirNum = 0; dirNum < kNumberOfParentDirectories; ++dirNum) {
> 
> This loop seems unnecessary since kNumberOfParentDirectories is always
> exactly 1.

OK. That loop is an artifact of (a) the fact that I originally adopted the code from Apache mod_cache as part of being compatible with their disk format, and (b) that I didn't know if we'd be staying with the xx/xxxxx... format specifically. But we're no longer trying to be compatible with Apache, and the xx/xxxxx... format looks like it will hold up, so I'll simplify the code.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:496
> > +bool Manager::readFile(const String& filePath, Vector<Vector<String>>& lines)
> 
> This should return a std::optional<Vector<Vector<String>>>

OK.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:499
> > +    WebCore::MappedFileData file(filePath, success);
> 
> I'm not sure it's necessary to find a consecutive section of the virtual
> address space large enough for the file, especially since all we need is a
> bunch of little pieces of the file.  I think this would be better
> implemented with openFile/readFromFile/closeFile

We'll be using this record/replay facility not only for the PLT, but for the PLUM as well. The idea (suggested by Stephanie and Gavin) was to use an approach that wouldn't affect our memory-reporting facilities. Doing an open/read/close would create some dirty memory that would then be reported. If we map the file, not only will it not get reported, but the feeling is that mapping the file rather than reading it would be faster.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureRecorder.cpp:140
> > +        if (handle.writeToFile(asString.c_str(), asString.size() + 1) == -1) {
> 
> Do we want to write the null-terminating character to file?

Yes. The json library I'm using reads until a terminating NUL when reading from a string. If I didn't write the NUL to the file, I'd have to copy the substring that I was interested in from the file (identifying its end in some way) and add a NUL to it. Rather than do all that, I just write the NUL to the file. I can add a comment to that effect if you think it would help.

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureTypes.h:35
> > +class FileHandle final {
> 
> I don't think this adds much that PlatformFileHandle doesn't already do, but
> maybe that's just the old C programmer in me.  It doesn't do anything cool
> like handle errors, keep state, buffer, have stream operators, etc.

The original intent of the FileHandle was to merely manage the closing of the PlatformFileHandle in the destructor. I then added openFile and readFromFile because "file.readFromFile(...)" seemed better than "WebCore::readFromFile(file, ...)". I added writeToFile for symmetry. I added closeFile when I realized that I needed a way to manually close the file because my destructor wasn't being called. I will probably add an ensureOpenFile at Antti's suggestion, and flesh it out more and move it to FileSystem.h at Sam's suggestion.
Comment 29 Alex Christensen 2016-12-06 14:58:35 PST
(In reply to comment #28)
> > > Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.h:207
> > > +struct FinishedEvent final : public TimedEvent {
> > 
> > Seems like a lot of structs in one file.
> 
> Are you suggesting splitting them out into ten different files?
Not necessarily.  We usually have one header per class, but I think this is fine.
> 
> > > Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:92
> > > +    // DEBUG_LOG("URL = %{public}s", DEBUG_STR(request.url().string()));
> > 
> > uncomment or remove
> > 
> > > Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:120
> > > +        // DEBUG_LOG("Found exact match: %{public}s", DEBUG_STR(lower->url().string()));
> > 
> > ditto
> 
> These line exist because they were useful during bring-up, but are now
> commented out because they have served their purpose. I kept them around
> because I can imagine their being useful again when we revisit the fuzzy
> matching and try to make it better. If we remove them, someone will just
> have to add them back when needed.
> 
> How about my changing them to DEBUG_LOG_VERBOSE, which will be a no-op in
> the normal case but could be re-enabled if/when needed?
Sure.  Commented out debug logs are clutter.
> > > Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:499
> > > +    WebCore::MappedFileData file(filePath, success);
> > 
> > I'm not sure it's necessary to find a consecutive section of the virtual
> > address space large enough for the file, especially since all we need is a
> > bunch of little pieces of the file.  I think this would be better
> > implemented with openFile/readFromFile/closeFile
> 
> We'll be using this record/replay facility not only for the PLT, but for the
> PLUM as well. The idea (suggested by Stephanie and Gavin) was to use an
> approach that wouldn't affect our memory-reporting facilities. Doing an
> open/read/close would create some dirty memory that would then be reported.
> If we map the file, not only will it not get reported, but the feeling is
> that mapping the file rather than reading it would be faster.
We could read it to buffers that are fixed size and only on the stack.  That wouldn't do anything to the heap memory.
Comment 30 Keith Rollin 2016-12-06 15:06:49 PST
(In reply to comment #29)
> We could read it to buffers that are fixed size and only on the stack.  That
> wouldn't do anything to the heap memory.

If I understand your proposal correctly, the amount of work to take that approach would be pretty large. We're reading JSON strings from this file. These strings could be large since they include the resource data. This in turn means that the JSON parser would have to support paging in subsets of the string as needed, since not all of the string could fit in this fixed-size buffer.
Comment 31 Alex Christensen 2016-12-06 15:33:06 PST
(In reply to comment #30)
> (In reply to comment #29)
> > We could read it to buffers that are fixed size and only on the stack.  That
> > wouldn't do anything to the heap memory.
> 
> If I understand your proposal correctly, the amount of work to take that
> approach would be pretty large. We're reading JSON strings from this file.
> These strings could be large since they include the resource data. This in
> turn means that the JSON parser would have to support paging in subsets of
> the string as needed, since not all of the string could fit in this
> fixed-size buffer.

Whatever.  If you're ok with the current mmap approach, I'm ok with it.
Comment 32 Carlos Garcia Campos 2016-12-06 23:49:33 PST
(In reply to comment #27)
> Comment on attachment 295800 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=295800&action=review
> 
> Let's land this after addressing these comments
> 
> > Source/WebKit2/ChangeLog:150
> > +        * NetworkProcess/capture/json.hpp: Added.
> 
> After discussing with Gavin, I'm ok with us having this in WebKit for
> short-term experimentation for this project, but it should have comments
> added saying not to use it for anything else, especially not anything a user
> would ever run.  We should also put a FIXME in saying to make JSONParse use
> callbacks for constructing objects like our YarrParser does, and replace
> this use and other uses including the JSONParse call in
> ContentExtensionParser.cpp
> 
> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.cpp:50
> > +static Headers copyHeaders(const WebCore::HTTPHeaderMap& httpHeaders)
> 
> HTTPHeaderMap::isolatedCopy() does the same thing as this.
> 
> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.cpp:173
> > +using json = nlohmann::basic_json<>;
> 
> grumble, grumble, grumble
> This would be a good place for a comment saying we should use JSONParse
> after its generalization is complete.

I wonder, would it be possible to move JSONParse to WTF after the generalization? or will not be so general?

> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.cpp:358
> > +        (void) base64Decode(str.c_str(), str.size(), data);
> 
> Let's assert the return value of this is true instead of the (void)
> This should probably return an optional value, but let's not do that in this
> patch.
> 
> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.h:65
> > +struct KeyValuePair {
> 
> Can we just use std::pair<String, String>?  This class seems excessive.
> 
> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.h:75
> > +struct ResourceRequest {
> 
> I don't particularly like using these names because they already exist in
> the WebCore namespace.  I know, that's what namespaces are for, but I often
> search for class names, and IDEs get confused.  What about just Request,
> Response, and Error?
> 
> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureEvent.h:207
> > +struct FinishedEvent final : public TimedEvent {
> 
> Seems like a lot of structs in one file.
> 
> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:92
> > +    // DEBUG_LOG("URL = %{public}s", DEBUG_STR(request.url().string()));
> 
> uncomment or remove
> 
> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:120
> > +        // DEBUG_LOG("Found exact match: %{public}s", DEBUG_STR(lower->url().string()));
> 
> ditto
> 
> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:407
> > +    for (int dirNum = 0; dirNum < kNumberOfParentDirectories; ++dirNum) {
> 
> This loop seems unnecessary since kNumberOfParentDirectories is always
> exactly 1.
> 
> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:496
> > +bool Manager::readFile(const String& filePath, Vector<Vector<String>>& lines)
> 
> This should return a std::optional<Vector<Vector<String>>>
> 
> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:499
> > +    WebCore::MappedFileData file(filePath, success);
> 
> I'm not sure it's necessary to find a consecutive section of the virtual
> address space large enough for the file, especially since all we need is a
> bunch of little pieces of the file.  I think this would be better
> implemented with openFile/readFromFile/closeFile
> 
> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureRecorder.cpp:74
> > +    // TODO: Is there a better response to receiving a multi-part resource?
> 
> multipart responses probably won't be common in any content we're going to
> be recording.  It's mostly used for webcams these days.  If we find it's
> necessary or useful, we can make new abstractions.
> 
> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureRecorder.cpp:140
> > +        if (handle.writeToFile(asString.c_str(), asString.size() + 1) == -1) {
> 
> Do we want to write the null-terminating character to file?
> 
> > Source/WebKit2/NetworkProcess/capture/NetworkCaptureTypes.h:35
> > +class FileHandle final {
> 
> I don't think this adds much that PlatformFileHandle doesn't already do, but
> maybe that's just the old C programmer in me.  It doesn't do anything cool
> like handle errors, keep state, buffer, have stream operators, etc.
Comment 33 Keith Rollin 2016-12-07 13:23:38 PST
Created attachment 296410 [details]
Patch
Comment 34 WebKit Commit Bot 2016-12-07 13:25:18 PST
Attachment 296410 [details] did not pass style-queue:


ERROR: Source/WebKit2/NetworkProcess/capture/NetworkCaptureManager.cpp:38:  Alphabetical sorting problem.  [build/include_order] [4]
Total errors found: 1 in 27 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 35 Keith Rollin 2016-12-07 13:25:38 PST
(In reply to comment #32)
> > This would be a good place for a comment saying we should use JSONParse
> > after its generalization is complete.
> 
> I wonder, would it be possible to move JSONParse to WTF after the
> generalization? or will not be so general?

I'm hoping to the be the one to do this work. I'll keep your suggestion in mind. I have no idea what I'll find once I get in there.
Comment 36 Alex Christensen 2016-12-07 13:50:05 PST
JSONParse should be general enough to not depend on JavaScriptCore kind of like YarrParser, but I don't see a reason to move it.  I don't see a reason not to move it, either.
Comment 37 Alex Christensen 2016-12-07 13:56:54 PST
Comment on attachment 296410 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=296410&action=review

> Source/WebKit2/NetworkProcess/capture/NetworkCaptureTypes.h:35
> +class FileHandle final {

Let's move this to WebCore in a file called FileHandle.h.
Comment 38 WebKit Commit Bot 2016-12-07 16:44:49 PST
Comment on attachment 296410 [details]
Patch

Clearing flags on attachment: 296410

Committed r209498: <http://trac.webkit.org/changeset/209498>
Comment 39 WebKit Commit Bot 2016-12-07 16:44:57 PST
All reviewed patches have been landed.  Closing bug.
Comment 40 Alex Christensen 2016-12-08 00:12:02 PST
This broke the iOS debug build. https://build.webkit.org/builders/Apple%20iOS%2010%20Simulator%20Debug%20%28Build%29/builds/2458/steps/compile-webkit/logs/stdio

Speculative fix in https://trac.webkit.org/changeset/209530
Please investigate before re-enabling logs.
Comment 41 Carlos Garcia Campos 2016-12-08 23:49:39 PST
(In reply to comment #36)
> JSONParse should be general enough to not depend on JavaScriptCore kind of
> like YarrParser, but I don't see a reason to move it.  I don't see a reason
> not to move it, either.

I'm currently working on WebDriver implementation for the GTK+ port. The Web Driver process needs to parse JSON, so I made it link to JSC only to use JSONParse (well, actually InspectorValues). If we had a general purpose JSON parser in WTF, the Web Driver process would only depend on WTF. Not a big deal in any case.