It doesn’t make any sense to keep time precision reduced if SAB is enabled for a page.
(I'm terrible with acronyms, sorry) what is SAB?
(In reply to Sam Weinig from comment #1) > (I'm terrible with acronyms, sorry) what is SAB? SharedArrayBuffer :)
(In reply to Yusuke Suzuki from comment #2) > (In reply to Sam Weinig from comment #1) > > (I'm terrible with acronyms, sorry) what is SAB? > > SharedArrayBuffer :) Ah! Thanks!
<rdar://problem/81197138>
According to https://developer.chrome.com/blog/cross-origin-isolated-hr-timers/ and blink code inspection, it looks like Chrome is using 100us precision in general and 5ns when cross-origin isolated. It looks like WebKit is using a much higher 1ms precision in general currently. I guess we need to decide what precision we want to use when cross-origin-isolated. Do we want to go back to 100us, like before r226495? It would still make us nowhere near Chrome in terms of precision.
(In reply to Chris Dumez from comment #5) > According to > https://developer.chrome.com/blog/cross-origin-isolated-hr-timers/ and blink > code inspection, it looks like Chrome is using 100us precision in general > and 5ns when cross-origin isolated. > > It looks like WebKit is using a much higher 1ms precision in general > currently. > > I guess we need to decide what precision we want to use when > cross-origin-isolated. Do we want to go back to 100us, like before r226495? > It would still make us nowhere near Chrome in terms of precision. Looks like Gecko is using 1ms for general precision and 20us for cross-origin-isolated precision. Maybe we should at least match Gecko's cross-origin-isolated precision?
I'm not positive but I would guess that someone could get a statistically significant sub-us precision from SAB (due to scheduling on the timer thread). So once we have SAB it's not clear how much value there is around giving something greater than 1us timing. It's probably worth verifying my assumption though.
Matching Gecko makes sense. We need to come up with a strategy to mitigate malicious websites using timing attacks to read out the history database. Full GPU process support and moving the visited-link color work into the GPU process will mitigate that to a large extent.
Created attachment 436668 [details] Patch
Created attachment 436669 [details] Patch
Comment on attachment 436669 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=436669&action=review

> Source/WebCore/page/Performance.cpp:63
> +constexpr Seconds highTimePrecision { 20_us };

Does Mozilla allow 20 microseconds without jitter or with jitter? If with jitter, since we don't have jitter, we may want to stick with 50 microseconds.

> Source/WebCore/page/Performance.cpp:97
> +void Performance::setShouldAllowHighPrecisionTime()

Maybe we should call this allowHighPrecisionTime?

> Source/WebCore/page/Performance.cpp:104
> -    double resolution = (1000_us).seconds();
> +    double resolution = shouldAllowHighPrecisionTime ? highTimePrecision.seconds() : regularTimePrecision.seconds();

It's not great that we'd have to do a runtime switch on every function call, although branch prediction helps. Can this also be static, given that we determine this at web process initialization time?
Comment on attachment 436669 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=436669&action=review

>> Source/WebCore/page/Performance.cpp:63
>> +constexpr Seconds highTimePrecision { 20_us };
>
> Does Mozilla allow 20microseconds without jitters or with jitters?
> If with jitters, since we don't have jitters, we may want to stick with 50microseconds.

I don't know what "jitters" means here, so I am not sure what the answer is. The fact is, though, that Chrome is providing much more accuracy. Also, according to both Phil and Keith, SharedArrayBuffer already provides much greater precision anyway.
Created attachment 437034 [details] Patch
(In reply to Chris Dumez from comment #5) > According to > https://developer.chrome.com/blog/cross-origin-isolated-hr-timers/ and blink > code inspection, it looks like Chrome is using 100us precision in general > and 5ns when cross-origin isolated. 5us not 5ns. > I don't know what jitters means here. I am not sure what the answer is. "jitters" here means adding a random value in an attempt to make timing attacks harder. That sounds expensive and not as helpful as you might think. (In reply to Ryosuke Niwa from comment #11) > Does Mozilla allow 20microseconds without jitters or with jitters? I think both.
(In reply to Alex Christensen from comment #14) > > (In reply to Ryosuke Niwa from comment #11) > > Does Mozilla allow 20microseconds without jitters or with jitters? > I think both. What do you mean by both? Either you have a jitter or you don't, right? This is about https://bugzilla.mozilla.org/show_bug.cgi?id=1425462
I think 20us is the high resolution precision when jitters are used and when they are not used. I'm referring to https://searchfox.org/mozilla-central/search?q=clampedAndJittered&redirect=false
(In reply to Alex Christensen from comment #16) > I think 20us is the high resolution precision when jitters are used and when > they are not used. I'm referring to > https://searchfox.org/mozilla-central/ > search?q=clampedAndJittered&redirect=false But is jitter enabled when COOP+COEP is on? If so, we may need to reduce our precision to 40-50us. My previous study concluded that having jitter is roughly equivalent to a 2-5x precision loss.
It looks like Gecko only has jitters on when COOP+COEP is on.
(In reply to Alex Christensen from comment #18) > It looks like Gecko only has jitters on when COOP+COEP is on. Really? No jitters when isolation is off? That's contrary to https://bugzilla.mozilla.org/show_bug.cgi?id=1451790.
Comment on attachment 437034 [details] Patch I'm just looking at unconditionalClamping in nsRFPService::ReduceTimePrecisionImpl in searchfox. I think I actually got it backwards. I think they only jitter when coop/coep is off.
(In reply to Alex Christensen from comment #20) > Comment on attachment 437048 [details] > Patch > > I'm just looking at unconditionalClamping in > nsRFPService::ReduceTimePrecisionImpl in searchfox. I think I actually got > it backwards. I think they only jitter when coop/coep is off. Ok, that makes sense. So maybe it's okay for us to match Gecko's precision when COOP+COEP is turned on.
I think we should. 20us seems better.
Created attachment 437048 [details] Patch
Comment on attachment 437048 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=437048&action=review

> Source/WebCore/page/Performance.cpp:99
> -    double resolution = (1000_us).seconds();
> +    double resolution = timePrecision.seconds();
>      double reduced = std::floor(seconds.seconds() / resolution) * resolution;

I still don't like the fact that we'd be doing memory-dependent math here now. Can we instead if/else the two cases so that the division is done with an immediate value (compile-time constant)?
Comment on attachment 437048 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=437048&action=review

> Source/WebCore/page/Performance.cpp:61
> +constexpr Seconds highTimePrecision { 20_us };

So:

```
constexpr auto highTimeResolution { (20_us).seconds() };
constexpr auto lowTimeResolution { (1_ms).seconds() };
static bool shouldUseHighTimeResolution = false;
```

>> Source/WebCore/page/Performance.cpp:99
>> double reduced = std::floor(seconds.seconds() / resolution) * resolution;
>
> I still don't like the fact we'd be doing memory dependent math here now.
> Can we instead if else two cases so that division is done with an immediate value (compile time constant)?

And then:

```
double reduced;
if (shouldUseHighTimeResolution)
    reduced = std::floor(seconds.seconds() / highTimeResolution) * highTimeResolution;
else
    reduced = std::floor(seconds.seconds() / lowTimeResolution) * lowTimeResolution;
```

Is that it? Looks a bit weird.
Comment on attachment 437048 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=437048&action=review

> Source/WebCore/page/Performance.cpp:103
> +void Performance::allowHighPrecisionTime()

We could also assert that shouldUseHighTimeResolution was false before this change.
More like this:

```
template <Seconds precision> Seconds reduceTimeToPrecision(Seconds seconds)
{
    double resolution = precision.seconds();
    double reduced = std::floor(seconds.seconds() / resolution) * resolution;
    return Seconds(reduced);
}

Seconds Performance::reduceTimeResolution(Seconds seconds)
{
    return s_highPrecisionTimeIsAllowed ? reduceTimeToPrecision<20_us>(seconds) : reduceTimeToPrecision<1000_us>(seconds);
}
```
Is this a real difference? Is there a guarantee the compiler will generate more secure code if we do it this way?
(In reply to Darin Adler from comment #28)
> Is this a real difference? Is there a guarantee the compiler will generate
> more secure code if we do it this way?

Oh, I'm not concerned about security. I'm rather concerned about the performance implication of relying on a non-local/non-stack memory value to do the computation. Having a memory-load-dependent branch is a lot cheaper because of the CPU's branch predictor.
(In reply to Ryosuke Niwa from comment #27)
> More like this:
>
> template <Seconds precision> Seconds reduceTimeToPrecision(Seconds seconds)
> {
>     double resolution = precision.seconds();
>     double reduced = std::floor(seconds.seconds() / resolution) * resolution;
>     return Seconds(reduced);
> }

I don't think we can use Seconds as a template parameter until C++20; at least, that's what the compiler tells me.
FYI, with clang -O3, this program:

```
static double timePrecision { 0.001 };

__attribute__((noinline)) double reduceTimePrecision(double value)
{
    return std::floor(value / timePrecision) * timePrecision;
}

int main()
{
    printf("%f", reduceTimePrecision(0.234556475859));
    return 0;
}
```

gives

```
.LCPI0_0:
        .quad   0x3f50624dd2f1a9fc      # double 0.001
_Z19reduceTimePrecisiond:               # @_Z19reduceTimePrecisiond
        pushq   %rax
        divsd   .LCPI0_0(%rip), %xmm0
        callq   floor@PLT
        mulsd   .LCPI0_0(%rip), %xmm0
        popq    %rax
        retq
.LCPI1_0:
        .quad   0x3fce05f2547090c9      # double 0.234556475859
main:                                   # @main
        pushq   %rax
        movsd   .LCPI1_0(%rip), %xmm0   # xmm0 = mem[0],zero
        callq   _Z19reduceTimePrecisiond
        movl    $.L.str, %edi
        movb    $1, %al
        callq   printf
        xorl    %eax, %eax
        popq    %rcx
        retq
.L.str:
        .asciz  "%f"
```

while this program:

```
constexpr double highTimePrecision { 0.00002 };
constexpr double lowTimePrecision { 0.001 };
static bool s_highPrecisionTimeIsAllowed { false };

__attribute__((noinline)) double reduceTimePrecision(double value)
{
    if (s_highPrecisionTimeIsAllowed)
        return std::floor(value / highTimePrecision) * highTimePrecision;
    else
        return std::floor(value / lowTimePrecision) * lowTimePrecision;
}

int main()
{
    printf("%f", reduceTimePrecision(0.234556475859));
    return 0;
}
```

gives

```
.LCPI0_0:
        .quad   0x3f50624dd2f1a9fc      # double 0.001
_Z19reduceTimePrecisiond:               # @_Z19reduceTimePrecisiond
        pushq   %rax
        divsd   .LCPI0_0(%rip), %xmm0
        callq   floor@PLT
        mulsd   .LCPI0_0(%rip), %xmm0
        popq    %rax
        retq
.LCPI1_0:
        .quad   0x3fce05f2547090c9      # double 0.234556475859
main:                                   # @main
        pushq   %rax
        movsd   .LCPI1_0(%rip), %xmm0   # xmm0 = mem[0],zero
        callq   _Z19reduceTimePrecisiond
        movl    $.L.str, %edi
        movb    $1, %al
        callq   printf
        xorl    %eax, %eax
        popq    %rcx
        retq
_GLOBAL__sub_I_example.cpp:             # @_GLOBAL__sub_I_example.cpp
        pushq   %rax
        movl    $_ZStL8__ioinit, %edi
        callq   _ZNSt8ios_base4InitC1Ev
        movl    $_ZNSt8ios_base4InitD1Ev, %edi
        movl    $_ZStL8__ioinit, %esi
        movl    $__dso_handle, %edx
        popq    %rax
        jmp     __cxa_atexit            # TAILCALL
.L.str:
        .asciz  "%f"
```
Sorry, bad copy / paste, trying again:

FYI, with clang -O3, this program:

```
static double timePrecision { 0.001 };

__attribute__((noinline)) double reduceTimePrecision(double value)
{
    return std::floor(value / timePrecision) * timePrecision;
}

int main()
{
    printf("%f", reduceTimePrecision(0.234556475859));
    return 0;
}
```

gives

```
.LCPI0_0:
        .quad   0x3f50624dd2f1a9fc      # double 0.001
_Z19reduceTimePrecisiond:               # @_Z19reduceTimePrecisiond
        pushq   %rax
        divsd   .LCPI0_0(%rip), %xmm0
        callq   floor@PLT
        mulsd   .LCPI0_0(%rip), %xmm0
        popq    %rax
        retq
.LCPI1_0:
        .quad   0x3fce05f2547090c9      # double 0.234556475859
main:                                   # @main
        pushq   %rax
        movsd   .LCPI1_0(%rip), %xmm0   # xmm0 = mem[0],zero
        callq   _Z19reduceTimePrecisiond
        movl    $.L.str, %edi
        movb    $1, %al
        callq   printf
        xorl    %eax, %eax
        popq    %rcx
        retq
.L.str:
        .asciz  "%f"
```

while this program:

```
constexpr double highTimePrecision { 0.00002 };
constexpr double lowTimePrecision { 0.001 };
static bool s_highPrecisionTimeIsAllowed { false };

__attribute__((noinline)) double reduceTimePrecision(double value)
{
    if (s_highPrecisionTimeIsAllowed)
        return std::floor(value / highTimePrecision) * highTimePrecision;
    else
        return std::floor(value / lowTimePrecision) * lowTimePrecision;
}

int main()
{
    printf("%f", reduceTimePrecision(0.234556475859));
    return 0;
}
```

gives

```
.LCPI0_0:
        .quad   0x3f50624dd2f1a9fc      # double 0.001
_Z19reduceTimePrecisiond:               # @_Z19reduceTimePrecisiond
        pushq   %rax
        divsd   .LCPI0_0(%rip), %xmm0
        callq   floor@PLT
        mulsd   .LCPI0_0(%rip), %xmm0
        popq    %rax
        retq
.LCPI1_0:
        .quad   0x3fce05f2547090c9      # double 0.234556475859
main:                                   # @main
        pushq   %rax
        movsd   .LCPI1_0(%rip), %xmm0   # xmm0 = mem[0],zero
        callq   _Z19reduceTimePrecisiond
        movl    $.L.str, %edi
        movb    $1, %al
        callq   printf
        xorl    %eax, %eax
        popq    %rcx
        retq
.L.str:
        .asciz  "%f"
```
Arg, the compiler is optimizing away some things I don't want it to. Adding volatile to help:

FYI, with clang -O3, this program:

```
static volatile double timePrecision { 0.001 };

__attribute__((noinline)) double reduceTimePrecision(double value)
{
    return std::floor(value / timePrecision) * timePrecision;
}

int main()
{
    printf("%f", reduceTimePrecision(0.234556475859));
    return 0;
}
```

gives

```
_Z19reduceTimePrecisiond:               # @_Z19reduceTimePrecisiond
        pushq   %rax
        divsd   _ZL13timePrecision(%rip), %xmm0
        callq   floor@PLT
        mulsd   _ZL13timePrecision(%rip), %xmm0
        popq    %rax
        retq
.LCPI1_0:
        .quad   0x3fce05f2547090c9      # double 0.234556475859
main:                                   # @main
        pushq   %rax
        movsd   .LCPI1_0(%rip), %xmm0   # xmm0 = mem[0],zero
        callq   _Z19reduceTimePrecisiond
        movl    $.L.str, %edi
        movb    $1, %al
        callq   printf
        xorl    %eax, %eax
        popq    %rcx
        retq
_ZL13timePrecision:
        .quad   0x3f50624dd2f1a9fc      # double 0.001
.L.str:
        .asciz  "%f"
```

while this program:

```
constexpr double highTimePrecision { 0.00002 };
constexpr double lowTimePrecision { 0.001 };
static volatile bool s_highPrecisionTimeIsAllowed { false };

__attribute__((noinline)) double reduceTimePrecision(double value)
{
    if (s_highPrecisionTimeIsAllowed)
        return std::floor(value / highTimePrecision) * highTimePrecision;
    else
        return std::floor(value / lowTimePrecision) * lowTimePrecision;
}

int main()
{
    printf("%f", reduceTimePrecision(0.234556475859));
    return 0;
}
```

gives

```
.LCPI0_0:
        .quad   0x3ef4f8b588e368f1      # double 2.0000000000000002E-5
        .quad   0x3f50624dd2f1a9fc      # double 0.001
_Z19reduceTimePrecisiond:               # @_Z19reduceTimePrecisiond
        pushq   %rax
        xorl    %eax, %eax
        cmpb    $0, _ZL28s_highPrecisionTimeIsAllowed(%rip)
        sete    %al
        movsd   .LCPI0_0(,%rax,8), %xmm1 # xmm1 = mem[0],zero
        movsd   %xmm1, (%rsp)           # 8-byte Spill
        divsd   %xmm1, %xmm0
        callq   floor@PLT
        mulsd   (%rsp), %xmm0           # 8-byte Folded Reload
        popq    %rax
        retq
.LCPI1_0:
        .quad   0x3fce05f2547090c9      # double 0.234556475859
main:                                   # @main
        pushq   %rax
        movsd   .LCPI1_0(%rip), %xmm0   # xmm0 = mem[0],zero
        callq   _Z19reduceTimePrecisiond
        movl    $.L.str, %edi
        movb    $1, %al
        callq   printf
        xorl    %eax, %eax
        popq    %rcx
        retq
.L.str:
        .asciz  "%f"
```
Hm... weird that we're not using an immediate value for the double values. Some quick searching suggests this is a limitation of the instruction set itself: these x86 instructions have no encoding that takes a floating-point immediate, so doubles have to be loaded from memory.
Comment on attachment 437048 [details] Patch Ok, sounds like this is the optimal approach after talking with Yusuke.
Committed r281879 (241209@main): <https://commits.webkit.org/241209@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 437048 [details].