Bug 213306 (event-collector)

Summary: WIP: we should be able to capture mouse, keyboard, and wheel events to investigate tracking
Product: WebKit Reporter: Eli Lucherini <elucherini>
Component: WebKit Misc.Assignee: Nobody <webkit-unassigned>
Status: NEW ---    
Severity: Normal CC: bfulgham, cdumez, cmarcelo, elucherini, ews-watchlist, fred.wang, ggaren, graouts, jamesr, japhet, katherine_cheney, luiz, simon.fraser, thorton, tonikitoo, wenson_hsieh, wilander
Priority: P2    
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch none

Description Eli Lucherini 2020-06-17 10:46:35 PDT
Goal: capturing mouse, keyboard, and wheel events on webpages to investigate tracking capabilities of bad actors.
Comment 1 Eli Lucherini 2020-06-17 11:35:10 PDT
Created attachment 402138 [details]
Patch
Comment 2 John Wilander 2020-06-17 13:16:06 PDT
Comment on attachment 402138 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=402138&action=review

Nice to see some code! We should consider gating this behind a runtime flag and to put the code behind the RESOURCE_LOAD_STATISTICS compile-time flag.

The build error on non-Cocoa platforms seems to be that they don't recognize the header file. I wonder if it needs to go to some CMake text file for those platforms?

> Source/WebCore/ChangeLog:3
> +        Add support to capture mouse, keyboard, and wheel events.

I think this title is a bit too generic. I would say "Capture mouse, keyboard, and wheel events for statistical purposes". I don't think we use periods in change log titles.

> Source/WebCore/ChangeLog:15
> +        * page/EventCollector.h: Added.

Make sure to move the new files to their correct alphabetical position in the Project Navigator.

> Source/WebKit/ChangeLog:11
> +        (WebKit::m_schemeRegistry): Deleted.

This looks worrying. However, looking at the code, I don't think it was deleted. Please make sure and then remove this line in the change log.

> Source/WebCore/page/EventCollector.cpp:2
> + * Copyright (C) 2018 Apple Inc. All rights reserved.

There is a template for creating new WebKit files which should get the year right. New file –> scroll down.

> Source/WebCore/page/EventCollector.cpp:45
> +    WTFLogAlways("Keyboard event with key %s, key identifier %s, code %s, text '%s'", platformKeyboardEvent.key().ascii().data(), platformKeyboardEvent.keyIdentifier().ascii().data(), platformKeyboardEvent.code().ascii().data(), platformKeyboardEvent.text().ascii().data());

I use .utf8(), not .ascii() for log out put of strings. That's safer than ASCII.

> Source/WebCore/page/EventCollector.cpp:66
> +    // TODO: Store in database

I know this is a comment that'll be removed soon but we always use periods in comments.

> Source/WebCore/page/EventCollector.h:2
> + * Copyright (C) 2015 Apple Inc. All rights reserved.

Same thing here. Use the template instead.

> Source/WebCore/page/EventCollector.h:51
> +class EventCollector

I think this name is too generic. Maybe EventStatisticsCollector?

> Source/WebCore/page/EventCollector.h:53
> +    // keeping this in WebCore for the moment; could also go to WebKit?

If you intend to land this comment: It needs to start with upper case and WebCore doesn't know about "WebKit" so we should probably not mention it in a comment.

> Source/WebCore/page/EventCollector.h:55
> +    EventCollector() { }

Was something in Xcode or the build telling you to add an empty constructor?

> Source/WebCore/page/EventCollector.h:63
> +    // something like this but with hashable key

If WallTime is not hashable you can try getting its raw seconds.

> Source/WebKit/NetworkProcess/NetworkConnectionToWebProcess.cpp:114
> +    , m_eventCollector(EventCollector::create())

I'm not sure this does what you want it to do. You're creating two different EventCollector objects – one in the network process and one in every web content process. So when you call it here in the network process to say that something happened, it's not going to be the same object the ones in the web content processes. My impression is that such an object is only needed in the web content processes. Is the instance here in the network process only for the purposes of telling the web content process for a specific page ID that the page loaded? If so, I'm sure there is such an event already available in the web content process since the page code itself gets the onload event.

> Source/WebKit/NetworkProcess/NetworkConnectionToWebProcess.cpp:535
> +    // TODO: send info to WebProcess

Period here too.

> Source/WebKit/WebProcess/WebPage/WebPage.cpp:2713
> +    WebProcess::singleton().eventCollector().collectMouseEvent(page->identifier(), platformMouseEvent);

Unless we need to do something right away with the event, we try to collect it as late as possible so that performance critical code can run first. See if it makes sense to collect this later in the function.

> Source/WebKit/WebProcess/WebPage/WebPage.cpp:2821
> +    WebProcess::singleton().eventCollector().collectKeyEvent(platform(keyboardEvent));

Same thing here on where to collect the data.

> Source/WebKit/WebProcess/WebProcess.h:645
> +    Ref<WebCore::EventCollector> m_eventCollector;

I think this name is too generic. Maybe m_eventStatisticsCollector?
Comment 3 John Wilander 2020-06-17 13:18:54 PDT
Taking a step back, I would try to add this code to the existing WebCore::ResourceLoadObserver and its subclass WebKit::WebResourceLoadObserver since it is already set up as a singleton in the web content process and has the capability to forward data to the network process and ITP.
Comment 4 Kate Cheney 2020-06-17 13:34:38 PDT
Comment on attachment 402138 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=402138&action=review

Yay first patch!! Excited to see work being done in this space.

> Source/WebCore/page/EventCollector.cpp:40
> +    WTFLogAlways("Mouse event on page %llu at x=%d, y=%d, type %d, button %d, clickcount %d, force %f", pageID.toUInt64(), platformMouseEvent.position().x(), platformMouseEvent.position().y(), platformMouseEvent.type(), platformMouseEvent.button(), platformMouseEvent.clickCount(), platformMouseEvent.force());

Wondering if this should instead be RELEASE_LOG_IF_ALLOWED? Or maybe even only logged in debug builds. See CachedResource.cpp for examples.

> Source/WebCore/page/EventCollector.cpp:50
> +    WTFLogAlways("Wheel event at x=%d, y=%d, ticksX %f, ticksY %f, granularity %d, velocity width %f height %f", platformWheelEvent.position().x(), platformWheelEvent.position().y(), platformWheelEvent.wheelTicksX(), platformWheelEvent.wheelTicksY(), platformWheelEvent.granularity(), platformWheelEvent.scrollingVelocity().width(), platformWheelEvent.scrollingVelocity().height());

Ditto.

> Source/WebCore/page/EventCollector.cpp:55
> +    WTFLogAlways("Page %llu load complete", webPageID.toUInt64());

Ditto.

> Source/WebCore/page/EventCollector.h:33
> +

I think no space here.

>> Source/WebCore/page/EventCollector.h:63
>> +    // something like this but with hashable key
> 
> If WallTime is not hashable you can try getting its raw seconds.

You could use a double here? secondsSinceEpoch().value() would convert a WallTime value into a double.

>> Source/WebKit/NetworkProcess/NetworkConnectionToWebProcess.cpp:114
>> +    , m_eventCollector(EventCollector::create())
> 
> I'm not sure this does what you want it to do. You're creating two different EventCollector objects – one in the network process and one in every web content process. So when you call it here in the network process to say that something happened, it's not going to be the same object the ones in the web content processes. My impression is that such an object is only needed in the web content processes. Is the instance here in the network process only for the purposes of telling the web content process for a specific page ID that the page loaded? If so, I'm sure there is such an event already available in the web content process since the page code itself gets the onload event.

I agree EventCollector should probably just be used in the web content process. You could maybe create an EventCollector for each WebPage object, and send collected data to the NetworkProcess in batches when the WebPage closes or reloads.
Comment 5 Kate Cheney 2020-06-17 13:47:51 PDT
(In reply to John Wilander from comment #3)
> Taking a step back, I would try to add this code to the existing
> WebCore::ResourceLoadObserver and its subclass
> WebKit::WebResourceLoadObserver since it is already set up as a singleton in
> the web content process and has the capability to forward data to the
> network process and ITP.

This is probably a better idea than mine, all of the IPC infrastructure is already in place to send data to the network process.
Comment 6 Eli Lucherini 2020-06-19 12:02:31 PDT
Created attachment 402302 [details]
Patch
Comment 7 Eli Lucherini 2020-06-19 13:56:39 PDT
(In reply to John Wilander from comment #3)
> Taking a step back, I would try to add this code to the existing
> WebCore::ResourceLoadObserver and its subclass
> WebKit::WebResourceLoadObserver since it is already set up as a singleton in
> the web content process and has the capability to forward data to the
> network process and ITP.

I changed the design and integrated EventCollector in ResourceLoadObserver/WebResourceLoadObserver. The patch I uploaded simply logs the events. I removed the code in WebKit::NetworkConnectionToWebProcess that detects that a page has loaded.
Comment 8 Kate Cheney 2020-06-19 14:33:02 PDT
Comment on attachment 402302 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=402302&action=review

> Source/WebKit/WebProcess/WebCoreSupport/WebResourceLoadObserver.cpp:417
> +    WTFLogAlways("Keyboard event with key %s, key identifier %s, code %s, text '%s'", platformKeyboardEvent.key().utf8().data(), platformKeyboardEvent.keyIdentifier().utf8().data(), platformKeyboardEvent.code().utf8().data(), platformKeyboardEvent.text().utf8().data());

Could logging the text value be a privacy or security issue here? For passwords or sensitive data?
Comment 9 Eli Lucherini 2020-06-19 14:48:34 PDT
(In reply to katherine_cheney from comment #8)
> Comment on attachment 402302 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=402302&action=review
> 
> > Source/WebKit/WebProcess/WebCoreSupport/WebResourceLoadObserver.cpp:417
> > +    WTFLogAlways("Keyboard event with key %s, key identifier %s, code %s, text '%s'", platformKeyboardEvent.key().utf8().data(), platformKeyboardEvent.keyIdentifier().utf8().data(), platformKeyboardEvent.code().utf8().data(), platformKeyboardEvent.text().utf8().data());
> 
> Could logging the text value be a privacy or security issue here? For
> passwords or sensitive data?

Good point. Actually, all the information I am gathering about keyboard events can be potentially sensitive as it can be easily reconnected to the characters entered. So, if anything, I should just log that a key event has happened at a given time.
Comment 10 Chris Dumez 2020-06-19 14:53:11 PDT
Comment on attachment 402302 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=402302&action=review

> Source/WebKit/WebProcess/WebCoreSupport/WebResourceLoadObserver.cpp:422
> +    WTFLogAlways("Mouse event at x=%d, y=%d, type %d, button %d, clickcount %d, force %f", platformMouseEvent.position().x(), platformMouseEvent.position().y(), platformMouseEvent.type(), platformMouseEvent.button(), platformMouseEvent.clickCount(), platformMouseEvent.force());

Doesn't this generate a LOT of logging whenever I move the mouse on the view?
Comment 11 Eli Lucherini 2020-06-19 15:16:04 PDT
(In reply to Chris Dumez from comment #10)
> Comment on attachment 402302 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=402302&action=review
> 
> > Source/WebKit/WebProcess/WebCoreSupport/WebResourceLoadObserver.cpp:422
> > +    WTFLogAlways("Mouse event at x=%d, y=%d, type %d, button %d, clickcount %d, force %f", platformMouseEvent.position().x(), platformMouseEvent.position().y(), platformMouseEvent.type(), platformMouseEvent.button(), platformMouseEvent.clickCount(), platformMouseEvent.force());
> 
> Doesn't this generate a LOT of logging whenever I move the mouse on the view?

It does. My goal would be to study mouse movement patterns here. This being said, before landing this, I would put the code behind the WEB_API_STATISTICS flag (https://bugs.webkit.org/show_bug.cgi?id=213319) so it logs the events only if enabled at runtime.
Comment 12 Chris Dumez 2020-06-19 15:17:02 PDT
(In reply to Elena from comment #11)
> (In reply to Chris Dumez from comment #10)
> > Comment on attachment 402302 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=402302&action=review
> > 
> > > Source/WebKit/WebProcess/WebCoreSupport/WebResourceLoadObserver.cpp:422
> > > +    WTFLogAlways("Mouse event at x=%d, y=%d, type %d, button %d, clickcount %d, force %f", platformMouseEvent.position().x(), platformMouseEvent.position().y(), platformMouseEvent.type(), platformMouseEvent.button(), platformMouseEvent.clickCount(), platformMouseEvent.force());
> > 
> > Doesn't this generate a LOT of logging whenever I move the mouse on the view?
> 
> It does. My goal would be to study mouse movement patterns here. This being
> said, before landing this, I would put the code behind the
> WEB_API_STATISTICS flag (https://bugs.webkit.org/show_bug.cgi?id=213319) so
> it logs the events only if enabled at runtime.

Ok then, if the patch is not ready, please do not set review flag.
Comment 13 Tim Horton 2020-06-19 15:34:54 PDT
(In reply to Chris Dumez from comment #12)
> Ok then, if the patch is not ready, please do not set review flag.

(if you use webkit-patch like I suspect from the title, pass `--no-review`)
Comment 14 Eli Lucherini 2020-06-19 15:38:41 PDT
Created attachment 402337 [details]
Patch
Comment 15 Eli Lucherini 2020-06-19 15:39:46 PDT
(In reply to Tim Horton from comment #13)
> (In reply to Chris Dumez from comment #12)
> > Ok then, if the patch is not ready, please do not set review flag.
> 
> (if you use webkit-patch like I suspect from the title, pass `--no-review`)

Thanks!
Comment 16 Simon Fraser (smfr) 2020-06-19 16:48:30 PDT
Comment on attachment 402337 [details]
Patch

You can't just add WTFLogAlways() to the code. This needs to be being a switch, off by default. Maybe it should use RELEASE_LOG stuff.
Comment 17 Eli Lucherini 2020-06-23 16:20:44 PDT
Created attachment 402605 [details]
Patch
Comment 18 Eli Lucherini 2020-06-23 16:25:41 PDT
I uploaded a new patch with two changes:

1. I introduced the internal runtime flag LogUserEventsEnabled.
2. I changed all WTFLogAlways to RELEASE_LOG_IF_ALLOWED.