Bug 228137 - Improve time precision when cross-origin isolated via COOP+COEP
Summary: Improve time precision when cross-origin isolated via COOP+COEP
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Chris Dumez
URL:
Keywords: InRadar
Depends on: 229559
Blocks: 228755
  Show dependency treegraph
 
Reported: 2021-07-20 20:34 PDT by Keith Miller
Modified: 2021-09-01 14:35 PDT (History)
17 users (show)

See Also:


Attachments
Patch (10.16 KB, patch)
2021-08-27 13:39 PDT, Chris Dumez
no flags Details | Formatted Diff | Diff
Patch (10.16 KB, patch)
2021-08-27 13:40 PDT, Chris Dumez
no flags Details | Formatted Diff | Diff
Patch (4.19 KB, patch)
2021-09-01 08:35 PDT, Chris Dumez
no flags Details | Formatted Diff | Diff
Patch (4.13 KB, patch)
2021-09-01 11:55 PDT, Chris Dumez
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Keith Miller 2021-07-20 20:34:41 PDT
It doesn’t make any sense to keep time precision reduced if SAB is enabled for a page.
Comment 1 Sam Weinig 2021-07-21 16:53:39 PDT
(I'm terrible with acronyms, sorry) what is SAB?
Comment 2 Yusuke Suzuki 2021-07-21 17:34:31 PDT
(In reply to Sam Weinig from comment #1)
> (I'm terrible with acronyms, sorry) what is SAB?

SharedArrayBuffer :)
Comment 3 Sam Weinig 2021-07-22 09:28:36 PDT
(In reply to Yusuke Suzuki from comment #2)
> (In reply to Sam Weinig from comment #1)
> > (I'm terrible with acronyms, sorry) what is SAB?
> 
> SharedArrayBuffer :)

Ah! Thanks!.
Comment 4 Radar WebKit Bug Importer 2021-07-27 20:35:17 PDT
<rdar://problem/81197138>
Comment 5 Chris Dumez 2021-08-27 10:39:19 PDT
According to https://developer.chrome.com/blog/cross-origin-isolated-hr-timers/ and blink code inspection, it looks like Chrome is using 100us precision in general and 5ns when cross-origin isolated.

It looks like WebKit is using a much higher 1ms precision in general currently.

I guess we need to decide what precision we want to use when cross-origin-isolated. Do we want to go back to 100us, like before r226495? It would still make us nowhere near Chrome in terms of precision.
Comment 6 Chris Dumez 2021-08-27 10:49:23 PDT
(In reply to Chris Dumez from comment #5)
> According to
> https://developer.chrome.com/blog/cross-origin-isolated-hr-timers/ and blink
> code inspection, it looks like Chrome is using 100us precision in general
> and 5ns when cross-origin isolated.
> 
> It looks like WebKit is using a much higher 1ms precision in general
> currently.
> 
> I guess we need to decide what precision we want to use when
> cross-origin-isolated. Do we want to go back to 100us, like before r226495?
> It would still make us nowhere near Chrome in terms of precision.

Looks like Gecko is using 1ms for general precision and 20us for cross-origin-isolated precision.

Maybe we should at least match Gecko from cross-origin-isolated precision?
Comment 7 Keith Miller 2021-08-27 10:56:34 PDT
I'm not positive but I would guess that someone could get a statistically significant sub-us precision from SAB (due to scheduling on the timer thread). So once we have SAB it's not clear how much value there is around giving something greater than 1us timing. It's probably worth verifying my assumption though.
Comment 8 Ryosuke Niwa 2021-08-27 12:30:26 PDT
Matching Gecko makes sense. We need to come up with a strategy to mitigate malicious websites using timing attacks to read out history database. Full GPU process support & move of visited link color work done in GPU process will mitigate that to a large extent.
Comment 9 Chris Dumez 2021-08-27 13:39:38 PDT
Created attachment 436668 [details]
Patch
Comment 10 Chris Dumez 2021-08-27 13:40:53 PDT
Created attachment 436669 [details]
Patch
Comment 11 Ryosuke Niwa 2021-08-27 14:46:57 PDT
Comment on attachment 436669 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=436669&action=review

> Source/WebCore/page/Performance.cpp:63
> +constexpr Seconds highTimePrecision { 20_us };

Does Mozilla allow 20microseconds without jitters or with jitters?
If with jitters, since we don't have jitters, we may want to stick with 50microseconds.

> Source/WebCore/page/Performance.cpp:97
> +void Performance::setShouldAllowHighPrecisionTime()

Maybe we should call this allowHighPrecisionTime?

> Source/WebCore/page/Performance.cpp:104
> -    double resolution = (1000_us).seconds();
> +    double resolution = shouldAllowHighPrecisionTime ? highTimePrecision.seconds() : regularTimePrecision.seconds();

It's not great that we'd have to do a runtime switch on every function call although the branch prediction helps.
Can this also be static given we determine this at web process initialization time?
Comment 12 Chris Dumez 2021-09-01 08:28:09 PDT
Comment on attachment 436669 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=436669&action=review

>> Source/WebCore/page/Performance.cpp:63
>> +constexpr Seconds highTimePrecision { 20_us };
> 
> Does Mozilla allow 20microseconds without jitters or with jitters?
> If with jitters, since we don't have jitters, we may want to stick with 50microseconds.

I don't know what jitters means here. I am not sure what the answer is.

The fact is though that Chrome is providing much more accuracy. Also, according to both Phil and Keith, SharedArrayBuffer already provides a much greater precision anyway.
Comment 13 Chris Dumez 2021-09-01 08:35:55 PDT
Created attachment 437034 [details]
Patch
Comment 14 Alex Christensen 2021-09-01 11:26:51 PDT
(In reply to Chris Dumez from comment #5)
> According to
> https://developer.chrome.com/blog/cross-origin-isolated-hr-timers/ and blink
> code inspection, it looks like Chrome is using 100us precision in general
> and 5ns when cross-origin isolated.
5us not 5ns.

> I don't know what jitters means here. I am not sure what the answer is.
"jitters" here means adding a random value in an attempt to make timing attacks harder.  That sounds expensive and not as helpful as you might think.

(In reply to Ryosuke Niwa from comment #11)
> Does Mozilla allow 20microseconds without jitters or with jitters?
I think both.
Comment 15 Ryosuke Niwa 2021-09-01 11:35:04 PDT
(In reply to Alex Christensen from comment #14)
>
> (In reply to Ryosuke Niwa from comment #11)
> > Does Mozilla allow 20microseconds without jitters or with jitters?
> I think both.

What do you mean by both? Either you have a jitter, not, right?
This is about https://bugzilla.mozilla.org/show_bug.cgi?id=1425462
Comment 16 Alex Christensen 2021-09-01 11:36:40 PDT
I think 20us is the high resolution precision when jitters are used and when they are not used.  I'm referring to https://searchfox.org/mozilla-central/search?q=clampedAndJittered&redirect=false
Comment 17 Ryosuke Niwa 2021-09-01 11:40:29 PDT
(In reply to Alex Christensen from comment #16)
> I think 20us is the high resolution precision when jitters are used and when
> they are not used.  I'm referring to
> https://searchfox.org/mozilla-central/
> search?q=clampedAndJittered&redirect=false

But is jitters enabled when COOP+COEP is on? If so, we may need to reduce our precision to 40-50us. My precious study concluded that having jitter is roughly equivalent to 2-5x precision loss.
Comment 18 Alex Christensen 2021-09-01 11:42:20 PDT
It looks like Gecko only has jitters on when COOP+COEP is on.
Comment 19 Ryosuke Niwa 2021-09-01 11:45:00 PDT
(In reply to Alex Christensen from comment #18)
> It looks like Gecko only has jitters on when COOP+COEP is on.

Really? No jitters when isolation is off? That's contrary to https://bugzilla.mozilla.org/show_bug.cgi?id=1451790.
Comment 20 Alex Christensen 2021-09-01 11:49:15 PDT
Comment on attachment 437034 [details]
Patch

I'm just looking at unconditionalClamping in nsRFPService::ReduceTimePrecisionImpl in searchfox.  I think I actually got it backwards.  I think they only jitter when coop/coep is off.
Comment 21 Ryosuke Niwa 2021-09-01 11:52:15 PDT
(In reply to Alex Christensen from comment #20)
> Comment on attachment 437034 [details]
> Patch
> 
> I'm just looking at unconditionalClamping in
> nsRFPService::ReduceTimePrecisionImpl in searchfox.  I think I actually got
> it backwards.  I think they only jitter when coop/coep is off.

Ok, that makes sense. So maybe it's okay for us to match Gekco's precision when COOP+COEP is turned on.
Comment 22 Alex Christensen 2021-09-01 11:53:10 PDT
I think we should.  20us seems better
Comment 23 Chris Dumez 2021-09-01 11:55:18 PDT
Created attachment 437048 [details]
Patch
Comment 24 Ryosuke Niwa 2021-09-01 12:05:13 PDT
Comment on attachment 437048 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=437048&action=review

> Source/WebCore/page/Performance.cpp:99
> -    double resolution = (1000_us).seconds();
> +    double resolution = timePrecision.seconds();
>      double reduced = std::floor(seconds.seconds() / resolution) * resolution;

I still don't like the fact we'd be doing memory dependent math here now.
Can we instead if else two cases so that division is done with an immediate value (compile time constant)?
Comment 25 Chris Dumez 2021-09-01 12:14:20 PDT
Comment on attachment 437048 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=437048&action=review

> Source/WebCore/page/Performance.cpp:61
> +constexpr Seconds highTimePrecision { 20_us };

So:
```
constexpr auto highTimeResolution { 20_us.seconds() };
constexpr auto lowTimeResolution { 1_ms.seconds() };
static bool shouldUseHighTimeResolution = false;
```

>> Source/WebCore/page/Performance.cpp:99
>>      double reduced = std::floor(seconds.seconds() / resolution) * resolution;
> 
> I still don't like the fact we'd be doing memory dependent math here now.
> Can we instead if else two cases so that division is done with an immediate value (compile time constant)?

And then:
```
double reduced;
if (shouldUseHighTimeResolution)
    reduced = std::floor(seconds.seconds() / highTimeResolution) * highTimeResolution;
else
    reduced = std::floor(seconds.seconds() / lowTimeResolution) * lowTimeResolution;
```

Is that it?

Looks a bit weird.
Comment 26 Alex Christensen 2021-09-01 12:22:16 PDT
Comment on attachment 437048 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=437048&action=review

> Source/WebCore/page/Performance.cpp:103
> +void Performance::allowHighPrecisionTime()

We could also assert that shouldUseHighTimeResolution was false before this change.
Comment 27 Ryosuke Niwa 2021-09-01 12:39:13 PDT
More like this:

template <Seconds precision> Seconds reduceTimeToPrecision(Seconds seconds)
{
    double resolution = precision.seconds();
    double reduced = std::floor(seconds.seconds() / resolution) * resolution;
    return Seconds(reduced);
}

Seconds Performance::reduceTimeResolution(Seconds seconds)
{
    return s_highPrecisionTimeIsAllowed ? reduceTimeToPrecision<20_us>(reseconds) : reduceTimeToPrecision<1000_us>(reseconds); 
}
Comment 28 Darin Adler 2021-09-01 12:46:43 PDT
Is this a real difference? Is there a guarantee the compiler will generate more secure code if we do it this way?
Comment 29 Ryosuke Niwa 2021-09-01 13:06:34 PDT
(In reply to Darin Adler from comment #28)
> Is this a real difference? Is there a guarantee the compiler will generate
> more secure code if we do it this way?

Oh, I'm not concerned about security. I'm rather concerned of the performance implication on relying on a non-local/stack memory value to do computation. Having a memory load dependent branch is a lot cheaper because of the branch predictor in CPU.
Comment 30 Chris Dumez 2021-09-01 13:20:20 PDT
(In reply to Ryosuke Niwa from comment #27)
> More like this:
> 
> template <Seconds precision> Seconds reduceTimeToPrecision(Seconds seconds)
> {
>     double resolution = precision.seconds();
>     double reduced = std::floor(seconds.seconds() / resolution) * resolution;
>     return Seconds(reduced);
> }

I don't think we can use Seconds as template parameter until C++20, at least that's what the compiler tells me.
Comment 31 Chris Dumez 2021-09-01 13:24:09 PDT
FYI, with clang -03, this program:
```
static double timePrecision { 0.001 };

__attribute__((noinline))
double reduceTimePrecision(double value)
{
    return std::floor(value / timePrecision) * timePrecision;
}

int main()
{
    printf("%f", reduceTimePrecision(0.234556475859));
    return 0;
}
```

gives

```
.LCPI0_0:
        .quad   0x3f50624dd2f1a9fc              # double 0.001
_Z19reduceTimePrecisiond:               # @_Z19reduceTimePrecisiond
        pushq   %rax
        divsd   .LCPI0_0(%rip), %xmm0
        callq   floor@PLT
        mulsd   .LCPI0_0(%rip), %xmm0
        popq    %rax
        retq
.LCPI1_0:
        .quad   0x3fce05f2547090c9              # double 0.234556475859
main:                                   # @main
        pushq   %rax
        movsd   .LCPI1_0(%rip), %xmm0           # xmm0 = mem[0],zero
        callq   _Z19reduceTimePrecisiond
        movl    $.L.str, %edi
        movb    $1, %al
        callq   printf
        xorl    %eax, %eax
        popq    %rcx
        retq
.L.str:
        .asciz  "%f"
```


while this program:
```
constexpr double highTimePrecision { 0.00002 };
constexpr double lowTimePrecision { 0.001 };
static bool s_highPrecisionTimeIsAllowed { false };

__attribute__((noinline))
double reduceTimePrecision(double value)
{
    if (s_highPrecisionTimeIsAllowed)
        return std::floor(value / highTimePrecision) * highTimePrecision;
    else
        return std::floor(value / lowTimePrecision) * lowTimePrecision;}

int main()
{
    printf("%f", reduceTimePrecision(0.234556475859));
    return 0;
}
```

gives
```
.LCPI0_0:
        .quad   0x3f50624dd2f1a9fc              # double 0.001
_Z19reduceTimePrecisiond:               # @_Z19reduceTimePrecisiond
        pushq   %rax
        divsd   .LCPI0_0(%rip), %xmm0
        callq   floor@PLT
        mulsd   .LCPI0_0(%rip), %xmm0
        popq    %rax
        retq
.LCPI1_0:
        .quad   0x3fce05f2547090c9              # double 0.234556475859
main:                                   # @main
        pushq   %rax
        movsd   .LCPI1_0(%rip), %xmm0           # xmm0 = mem[0],zero
        callq   _Z19reduceTimePrecisiond
        movl    $.L.str, %edi
        movb    $1, %al
        callq   printf
        xorl    %eax, %eax
        popq    %rcx
        retq
_GLOBAL__sub_I_example.cpp:             # @_GLOBAL__sub_I_example.cpp
        pushq   %rax
        movl    $_ZStL8__ioinit, %edi
        callq   _ZNSt8ios_base4InitC1Ev
        movl    $_ZNSt8ios_base4InitD1Ev, %edi
        movl    $_ZStL8__ioinit, %esi
        movl    $__dso_handle, %edx
        popq    %rax
        jmp     __cxa_atexit                    # TAILCALL
.L.str:
        .asciz  "%f"
```
Comment 32 Chris Dumez 2021-09-01 13:27:08 PDT
Sorry, bad copy / paste, trying again:

FYI, with clang -03, this program:
```
static double timePrecision { 0.001 };

__attribute__((noinline))
double reduceTimePrecision(double value)
{
    return std::floor(value / timePrecision) * timePrecision;
}

int main()
{
    printf("%f", reduceTimePrecision(0.234556475859));
    return 0;
}
```

gives

```
.LCPI0_0:
        .quad   0x3f50624dd2f1a9fc              # double 0.001
_Z19reduceTimePrecisiond:               # @_Z19reduceTimePrecisiond
        pushq   %rax
        divsd   .LCPI0_0(%rip), %xmm0
        callq   floor@PLT
        mulsd   .LCPI0_0(%rip), %xmm0
        popq    %rax
        retq
.LCPI1_0:
        .quad   0x3fce05f2547090c9              # double 0.234556475859
main:                                   # @main
        pushq   %rax
        movsd   .LCPI1_0(%rip), %xmm0           # xmm0 = mem[0],zero
        callq   _Z19reduceTimePrecisiond
        movl    $.L.str, %edi
        movb    $1, %al
        callq   printf
        xorl    %eax, %eax
        popq    %rcx
        retq
.L.str:
        .asciz  "%f"
```


while this program:
```
constexpr double highTimePrecision { 0.00002 };
constexpr double lowTimePrecision { 0.001 };
static bool s_highPrecisionTimeIsAllowed { false };

__attribute__((noinline))
double reduceTimePrecision(double value)
{
    if (s_highPrecisionTimeIsAllowed)
        return std::floor(value / highTimePrecision) * highTimePrecision;
    else
        return std::floor(value / lowTimePrecision) * lowTimePrecision;}

int main()
{
    printf("%f", reduceTimePrecision(0.234556475859));
    return 0;
}
```

gives
```
.LCPI0_0:
        .quad   0x3f50624dd2f1a9fc              # double 0.001
_Z19reduceTimePrecisiond:               # @_Z19reduceTimePrecisiond
        pushq   %rax
        divsd   .LCPI0_0(%rip), %xmm0
        callq   floor@PLT
        mulsd   .LCPI0_0(%rip), %xmm0
        popq    %rax
        retq
.LCPI1_0:
        .quad   0x3fce05f2547090c9              # double 0.234556475859
main:                                   # @main
        pushq   %rax
        movsd   .LCPI1_0(%rip), %xmm0           # xmm0 = mem[0],zero
        callq   _Z19reduceTimePrecisiond
        movl    $.L.str, %edi
        movb    $1, %al
        callq   printf
        xorl    %eax, %eax
        popq    %rcx
        retq
.L.str:
        .asciz  "%f"
```
Comment 33 Chris Dumez 2021-09-01 13:31:26 PDT
Arg, the compiler is optimizing away some things I don't want it too. Adding volatile to help:

FYI, with clang -03, this program:
```
static volatile double timePrecision { 0.001 };

__attribute__((noinline))
double reduceTimePrecision(double value)
{
    return std::floor(value / timePrecision) * timePrecision;
}

int main()
{
    printf("%f", reduceTimePrecision(0.234556475859));
    return 0;
}
```

gives

```
_Z19reduceTimePrecisiond:               # @_Z19reduceTimePrecisiond
        pushq   %rax
        divsd   _ZL13timePrecision(%rip), %xmm0
        callq   floor@PLT
        mulsd   _ZL13timePrecision(%rip), %xmm0
        popq    %rax
        retq
.LCPI1_0:
        .quad   0x3fce05f2547090c9              # double 0.234556475859
main:                                   # @main
        pushq   %rax
        movsd   .LCPI1_0(%rip), %xmm0           # xmm0 = mem[0],zero
        callq   _Z19reduceTimePrecisiond
        movl    $.L.str, %edi
        movb    $1, %al
        callq   printf
        xorl    %eax, %eax
        popq    %rcx
        retq
_ZL13timePrecision:
        .quad   0x3f50624dd2f1a9fc              # double 0.001

.L.str:
        .asciz  "%f"
```


while this program:
```
constexpr double highTimePrecision { 0.00002 };
constexpr double lowTimePrecision { 0.001 };
static volatile bool s_highPrecisionTimeIsAllowed { false };

__attribute__((noinline))
double reduceTimePrecision(double value)
{
    if (s_highPrecisionTimeIsAllowed)
        return std::floor(value / highTimePrecision) * highTimePrecision;
    else
        return std::floor(value / lowTimePrecision) * lowTimePrecision;}

int main()
{
    printf("%f", reduceTimePrecision(0.234556475859));
    return 0;
}
```

gives
```
.LCPI0_0:
        .quad   0x3ef4f8b588e368f1              # double 2.0000000000000002E-5
        .quad   0x3f50624dd2f1a9fc              # double 0.001
_Z19reduceTimePrecisiond:               # @_Z19reduceTimePrecisiond
        pushq   %rax
        xorl    %eax, %eax
        cmpb    $0, _ZL28s_highPrecisionTimeIsAllowed(%rip)
        sete    %al
        movsd   .LCPI0_0(,%rax,8), %xmm1        # xmm1 = mem[0],zero
        movsd   %xmm1, (%rsp)                   # 8-byte Spill
        divsd   %xmm1, %xmm0
        callq   floor@PLT
        mulsd   (%rsp), %xmm0                   # 8-byte Folded Reload
        popq    %rax
        retq
.LCPI1_0:
        .quad   0x3fce05f2547090c9              # double 0.234556475859
main:                                   # @main
        pushq   %rax
        movsd   .LCPI1_0(%rip), %xmm0           # xmm0 = mem[0],zero
        callq   _Z19reduceTimePrecisiond
        movl    $.L.str, %edi
        movb    $1, %al
        callq   printf
        xorl    %eax, %eax
        popq    %rcx
        retq
.L.str:
        .asciz  "%f"
```
Comment 34 Ryosuke Niwa 2021-09-01 13:57:46 PDT
Hm... weird that we're not using an immediate value for the double values. Some quick search tells me that this might be a limitation of the assembly language itself.
Comment 35 Ryosuke Niwa 2021-09-01 14:21:07 PDT
Comment on attachment 437048 [details]
Patch

Ok, sounds like this is the optimal approach after taking with Yusuke.
Comment 36 EWS 2021-09-01 14:35:07 PDT
Committed r281879 (241209@main): <https://commits.webkit.org/241209@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 437048 [details].