Bug 158678

Summary: The sampling profiler should further protect itself against certain forms of sampling bias that arise due to the sampling interval being in sync with some other system process
Product: WebKit Reporter: Saam Barati <saam>
Component: JavaScriptCoreAssignee: Saam Barati <saam>
Status: RESOLVED FIXED    
Severity: Normal CC: bburg, benjamin, commit-queue, fpizlo, ggaren, gskachkov, joepeck, keith_miller, mark.lam, msaboff, naserid13, oliver, sukolsak, ysuzuki
Priority: P2    
Version: WebKit Local Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
patch
saam: review-
patch
none
patch none

Description Saam Barati 2016-06-12 17:07:50 PDT
This problem was first brought to my attention after reading this paper.
See section 6.2:
http://plv.colorado.edu/papers/mytkowicz-pldi10.pdf

It seems somewhat rare that this will happen in practice, and it's
unlikely to happen many times in a row after restarting the profiler,
but it's good practice to defend against this form of bias.
Comment 1 Saam Barati 2016-06-13 09:31:15 PDT
Created attachment 281176 [details]
patch
Comment 2 WebKit Commit Bot 2016-06-13 09:32:55 PDT
Attachment 281176 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/runtime/SamplingProfiler.cpp:52:  Alphabetical sorting problem.  [build/include_order] [4]
Total errors found: 1 in 2 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Benjamin Poulain 2016-06-13 12:03:54 PDT
Comment on attachment 281176 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=281176&action=review

> Source/JavaScriptCore/runtime/SamplingProfiler.cpp:237
> +        std::chrono::microseconds randomFluctuation = std::chrono::microseconds(static_cast<uint64_t>(randomNumber() * (static_cast<double>(m_timingInterval.count()) * 0.20l)));

Is this right?

I understand a distribution around "t" but randomNumber() is always positive, m_timingInterval.count() is always positive.
Shouldn't the distribution be spread on both sides of t?

This line would also benefit from mentioning the paper in a comment.
Comment 4 Saam Barati 2016-06-13 12:45:32 PDT
Comment on attachment 281176 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=281176&action=review

>> Source/JavaScriptCore/runtime/SamplingProfiler.cpp:237
>> +        std::chrono::microseconds randomFluctuation = std::chrono::microseconds(static_cast<uint64_t>(randomNumber() * (static_cast<double>(m_timingInterval.count()) * 0.20l)));
> 
> Is this right?
> 
> I understand a distribution around "t" but randomNumber() is always positive, m_timingInterval.count() is always positive.
> Shouldn't the distribution be spread on both sides of t?
> 
> This line would also benefit from mentioning the paper in a comment.

Yeah, obviously. I wasn't thinking. This is definitely wrong.
I shall fix and update with a link. Thanks for catching this
Comment 5 Saam Barati 2016-06-13 18:40:09 PDT
Created attachment 281221 [details]
patch
Comment 6 WebKit Commit Bot 2016-06-13 18:42:03 PDT
Attachment 281221 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/runtime/SamplingProfiler.cpp:52:  Alphabetical sorting problem.  [build/include_order] [4]
Total errors found: 1 in 2 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 7 Benjamin Poulain 2016-06-13 18:46:01 PDT
Comment on attachment 281221 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=281221&action=review

> Source/JavaScriptCore/runtime/SamplingProfiler.cpp:241
> +        double randomSignedNumber = (randomNumber() * 2.0l) - 1.0l; // A random number between [-1, 1).

What's the .01?
Why not "(randomNumber() * 2.) - 1"?

> Source/JavaScriptCore/runtime/SamplingProfiler.cpp:242
> +        std::chrono::microseconds randomFluctuation = std::chrono::microseconds(static_cast<uint64_t>(randomSignedNumber * static_cast<double>(m_timingInterval.count()) * 0.20l));

What's that "* 0.20l" factor here?
Comment 8 Saam Barati 2016-06-13 18:48:48 PDT
Comment on attachment 281221 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=281221&action=review

>> Source/JavaScriptCore/runtime/SamplingProfiler.cpp:241
>> +        double randomSignedNumber = (randomNumber() * 2.0l) - 1.0l; // A random number between [-1, 1).
> 
> What's the .01?
> Why not "(randomNumber() * 2.) - 1"?

it's an 'L' not a 1.
For a double literal. I'll just switch to 2.0 and 1.0

>> Source/JavaScriptCore/runtime/SamplingProfiler.cpp:242
>> +        std::chrono::microseconds randomFluctuation = std::chrono::microseconds(static_cast<uint64_t>(randomSignedNumber * static_cast<double>(m_timingInterval.count()) * 0.20l));
> 
> What's that "* 0.20l" factor here?

We add +-(0.2t) where t is our sampling interval.
We're just choosing some range between [-t, t]
Comment 9 Saam Barati 2016-06-13 18:50:02 PDT
Created attachment 281223 [details]
patch
Comment 10 Saam Barati 2016-06-13 18:56:58 PDT
(In reply to comment #8)
> Comment on attachment 281221 [details]
> patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=281221&action=review
> 
> >> Source/JavaScriptCore/runtime/SamplingProfiler.cpp:241
> >> +        double randomSignedNumber = (randomNumber() * 2.0l) - 1.0l; // A random number between [-1, 1).
> > 
> > What's the .01?
> > Why not "(randomNumber() * 2.) - 1"?
> 
> it's an 'L' not a 1.
> For a double literal. I'll just switch to 2.0 and 1.0
> 
> >> Source/JavaScriptCore/runtime/SamplingProfiler.cpp:242
> >> +        std::chrono::microseconds randomFluctuation = std::chrono::microseconds(static_cast<uint64_t>(randomSignedNumber * static_cast<double>(m_timingInterval.count()) * 0.20l));
> > 
> > What's that "* 0.20l" factor here?
> 
> We add +-(0.2t) where t is our sampling interval.
> We're just choosing some range between [-t, t]

To elaborate on this, we don't do the full
[-t, t] because we want to defend against longer sample
times for short running programs. Even though short
running programs are hard to run under a sampling profiler,
we don't want to hurt such uses. We choose 0.2 because
a random 20% of our default (1ms) range is enough to add 
randomness to our sample times so we don't end up
in sync with some other system process.
Comment 11 WebKit Commit Bot 2016-06-13 19:27:50 PDT
Comment on attachment 281223 [details]
patch

Clearing flags on attachment: 281223

Committed r202021: <http://trac.webkit.org/changeset/202021>
Comment 12 WebKit Commit Bot 2016-06-13 19:27:55 PDT
All reviewed patches have been landed.  Closing bug.