RESOLVED FIXED Bug 228137
Improve time precision when cross-origin isolated via COOP+COEP
https://bugs.webkit.org/show_bug.cgi?id=228137
Summary Improve time precision when cross-origin isolated via COOP+COEP
Keith Miller
Reported 2021-07-20 20:34:41 PDT
It doesn’t make any sense to keep time precision reduced if SAB is enabled for a page.
Attachments
Patch (10.16 KB, patch)
2021-08-27 13:39 PDT, Chris Dumez
no flags
Patch (10.16 KB, patch)
2021-08-27 13:40 PDT, Chris Dumez
no flags
Patch (4.19 KB, patch)
2021-09-01 08:35 PDT, Chris Dumez
no flags
Patch (4.13 KB, patch)
2021-09-01 11:55 PDT, Chris Dumez
no flags
Sam Weinig
Comment 1 2021-07-21 16:53:39 PDT
(I'm terrible with acronyms, sorry) what is SAB?
Yusuke Suzuki
Comment 2 2021-07-21 17:34:31 PDT
(In reply to Sam Weinig from comment #1) > (I'm terrible with acronyms, sorry) what is SAB? SharedArrayBuffer :)
Sam Weinig
Comment 3 2021-07-22 09:28:36 PDT
(In reply to Yusuke Suzuki from comment #2) > (In reply to Sam Weinig from comment #1) > > (I'm terrible with acronyms, sorry) what is SAB? > > SharedArrayBuffer :) Ah! Thanks!.
Radar WebKit Bug Importer
Comment 4 2021-07-27 20:35:17 PDT
Chris Dumez
Comment 5 2021-08-27 10:39:19 PDT
According to https://developer.chrome.com/blog/cross-origin-isolated-hr-timers/ and blink code inspection, it looks like Chrome is using 100us precision in general and 5ns when cross-origin isolated. It looks like WebKit is using a much higher 1ms precision in general currently. I guess we need to decide what precision we want to use when cross-origin-isolated. Do we want to go back to 100us, like before r226495? It would still make us nowhere near Chrome in terms of precision.
Chris Dumez
Comment 6 2021-08-27 10:49:23 PDT
(In reply to Chris Dumez from comment #5) > According to > https://developer.chrome.com/blog/cross-origin-isolated-hr-timers/ and blink > code inspection, it looks like Chrome is using 100us precision in general > and 5ns when cross-origin isolated. > > It looks like WebKit is using a much higher 1ms precision in general > currently. > > I guess we need to decide what precision we want to use when > cross-origin-isolated. Do we want to go back to 100us, like before r226495? > It would still make us nowhere near Chrome in terms of precision. Looks like Gecko is using 1ms for general precision and 20us for cross-origin-isolated precision. Maybe we should at least match Gecko from cross-origin-isolated precision?
Keith Miller
Comment 7 2021-08-27 10:56:34 PDT
I'm not positive but I would guess that someone could get a statistically significant sub-us precision from SAB (due to scheduling on the timer thread). So once we have SAB it's not clear how much value there is around giving something greater than 1us timing. It's probably worth verifying my assumption though.
Ryosuke Niwa
Comment 8 2021-08-27 12:30:26 PDT
Matching Gecko makes sense. We need to come up with a strategy to mitigate malicious websites using timing attacks to read out history database. Full GPU process support & move of visited link color work done in GPU process will mitigate that to a large extent.
Chris Dumez
Comment 9 2021-08-27 13:39:38 PDT
Chris Dumez
Comment 10 2021-08-27 13:40:53 PDT
Ryosuke Niwa
Comment 11 2021-08-27 14:46:57 PDT
Comment on attachment 436669 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=436669&action=review > Source/WebCore/page/Performance.cpp:63 > +constexpr Seconds highTimePrecision { 20_us }; Does Mozilla allow 20microseconds without jitters or with jitters? If with jitters, since we don't have jitters, we may want to stick with 50microseconds. > Source/WebCore/page/Performance.cpp:97 > +void Performance::setShouldAllowHighPrecisionTime() Maybe we should call this allowHighPrecisionTime? > Source/WebCore/page/Performance.cpp:104 > - double resolution = (1000_us).seconds(); > + double resolution = shouldAllowHighPrecisionTime ? highTimePrecision.seconds() : regularTimePrecision.seconds(); It's not great that we'd have to do a runtime switch on every function call although the branch prediction helps. Can this also be static given we determine this at web process initialization time?
Chris Dumez
Comment 12 2021-09-01 08:28:09 PDT
Comment on attachment 436669 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=436669&action=review >> Source/WebCore/page/Performance.cpp:63 >> +constexpr Seconds highTimePrecision { 20_us }; > > Does Mozilla allow 20microseconds without jitters or with jitters? > If with jitters, since we don't have jitters, we may want to stick with 50microseconds. I don't know what jitters means here. I am not sure what the answer is. The fact is though that Chrome is providing much more accuracy. Also, according to both Phil and Keith, SharedArrayBuffer already provides a much greater precision anyway.
Chris Dumez
Comment 13 2021-09-01 08:35:55 PDT
Alex Christensen
Comment 14 2021-09-01 11:26:51 PDT
(In reply to Chris Dumez from comment #5) > According to > https://developer.chrome.com/blog/cross-origin-isolated-hr-timers/ and blink > code inspection, it looks like Chrome is using 100us precision in general > and 5ns when cross-origin isolated. 5us not 5ns. > I don't know what jitters means here. I am not sure what the answer is. "jitters" here means adding a random value in an attempt to make timing attacks harder. That sounds expensive and not as helpful as you might think. (In reply to Ryosuke Niwa from comment #11) > Does Mozilla allow 20microseconds without jitters or with jitters? I think both.
Ryosuke Niwa
Comment 15 2021-09-01 11:35:04 PDT
(In reply to Alex Christensen from comment #14) > > (In reply to Ryosuke Niwa from comment #11) > > Does Mozilla allow 20microseconds without jitters or with jitters? > I think both. What do you mean by both? Either you have a jitter, not, right? This is about https://bugzilla.mozilla.org/show_bug.cgi?id=1425462
Alex Christensen
Comment 16 2021-09-01 11:36:40 PDT
I think 20us is the high resolution precision when jitters are used and when they are not used. I'm referring to https://searchfox.org/mozilla-central/search?q=clampedAndJittered&redirect=false
Ryosuke Niwa
Comment 17 2021-09-01 11:40:29 PDT
(In reply to Alex Christensen from comment #16) > I think 20us is the high resolution precision when jitters are used and when > they are not used. I'm referring to > https://searchfox.org/mozilla-central/ > search?q=clampedAndJittered&redirect=false But is jitters enabled when COOP+COEP is on? If so, we may need to reduce our precision to 40-50us. My precious study concluded that having jitter is roughly equivalent to 2-5x precision loss.
Alex Christensen
Comment 18 2021-09-01 11:42:20 PDT
It looks like Gecko only has jitters on when COOP+COEP is on.
Ryosuke Niwa
Comment 19 2021-09-01 11:45:00 PDT
(In reply to Alex Christensen from comment #18) > It looks like Gecko only has jitters on when COOP+COEP is on. Really? No jitters when isolation is off? That's contrary to https://bugzilla.mozilla.org/show_bug.cgi?id=1451790.
Alex Christensen
Comment 20 2021-09-01 11:49:15 PDT
Comment on attachment 437034 [details] Patch I'm just looking at unconditionalClamping in nsRFPService::ReduceTimePrecisionImpl in searchfox. I think I actually got it backwards. I think they only jitter when coop/coep is off.
Ryosuke Niwa
Comment 21 2021-09-01 11:52:15 PDT
(In reply to Alex Christensen from comment #20) > Comment on attachment 437034 [details] > Patch > > I'm just looking at unconditionalClamping in > nsRFPService::ReduceTimePrecisionImpl in searchfox. I think I actually got > it backwards. I think they only jitter when coop/coep is off. Ok, that makes sense. So maybe it's okay for us to match Gekco's precision when COOP+COEP is turned on.
Alex Christensen
Comment 22 2021-09-01 11:53:10 PDT
I think we should. 20us seems better
Chris Dumez
Comment 23 2021-09-01 11:55:18 PDT
Ryosuke Niwa
Comment 24 2021-09-01 12:05:13 PDT
Comment on attachment 437048 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=437048&action=review > Source/WebCore/page/Performance.cpp:99 > - double resolution = (1000_us).seconds(); > + double resolution = timePrecision.seconds(); > double reduced = std::floor(seconds.seconds() / resolution) * resolution; I still don't like the fact we'd be doing memory dependent math here now. Can we instead if else two cases so that division is done with an immediate value (compile time constant)?
Chris Dumez
Comment 25 2021-09-01 12:14:20 PDT
Comment on attachment 437048 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=437048&action=review > Source/WebCore/page/Performance.cpp:61 > +constexpr Seconds highTimePrecision { 20_us }; So: ``` constexpr auto highTimeResolution { 20_us.seconds() }; constexpr auto lowTimeResolution { 1_ms.seconds() }; static bool shouldUseHighTimeResolution = false; ``` >> Source/WebCore/page/Performance.cpp:99 >> double reduced = std::floor(seconds.seconds() / resolution) * resolution; > > I still don't like the fact we'd be doing memory dependent math here now. > Can we instead if else two cases so that division is done with an immediate value (compile time constant)? And then: ``` double reduced; if (shouldUseHighTimeResolution) reduced = std::floor(seconds.seconds() / highTimeResolution) * highTimeResolution; else reduced = std::floor(seconds.seconds() / lowTimeResolution) * lowTimeResolution; ``` Is that it? Looks a bit weird.
Alex Christensen
Comment 26 2021-09-01 12:22:16 PDT
Comment on attachment 437048 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=437048&action=review > Source/WebCore/page/Performance.cpp:103 > +void Performance::allowHighPrecisionTime() We could also assert that shouldUseHighTimeResolution was false before this change.
Ryosuke Niwa
Comment 27 2021-09-01 12:39:13 PDT
More like this: template <Seconds precision> Seconds reduceTimeToPrecision(Seconds seconds) { double resolution = precision.seconds(); double reduced = std::floor(seconds.seconds() / resolution) * resolution; return Seconds(reduced); } Seconds Performance::reduceTimeResolution(Seconds seconds) { return s_highPrecisionTimeIsAllowed ? reduceTimeToPrecision<20_us>(reseconds) : reduceTimeToPrecision<1000_us>(reseconds); }
Darin Adler
Comment 28 2021-09-01 12:46:43 PDT
Is this a real difference? Is there a guarantee the compiler will generate more secure code if we do it this way?
Ryosuke Niwa
Comment 29 2021-09-01 13:06:34 PDT
(In reply to Darin Adler from comment #28) > Is this a real difference? Is there a guarantee the compiler will generate > more secure code if we do it this way? Oh, I'm not concerned about security. I'm rather concerned of the performance implication on relying on a non-local/stack memory value to do computation. Having a memory load dependent branch is a lot cheaper because of the branch predictor in CPU.
Chris Dumez
Comment 30 2021-09-01 13:20:20 PDT
(In reply to Ryosuke Niwa from comment #27) > More like this: > > template <Seconds precision> Seconds reduceTimeToPrecision(Seconds seconds) > { > double resolution = precision.seconds(); > double reduced = std::floor(seconds.seconds() / resolution) * resolution; > return Seconds(reduced); > } I don't think we can use Seconds as template parameter until C++20, at least that's what the compiler tells me.
Chris Dumez
Comment 31 2021-09-01 13:24:09 PDT
FYI, with clang -03, this program: ``` static double timePrecision { 0.001 }; __attribute__((noinline)) double reduceTimePrecision(double value) { return std::floor(value / timePrecision) * timePrecision; } int main() { printf("%f", reduceTimePrecision(0.234556475859)); return 0; } ``` gives ``` .LCPI0_0: .quad 0x3f50624dd2f1a9fc # double 0.001 _Z19reduceTimePrecisiond: # @_Z19reduceTimePrecisiond pushq %rax divsd .LCPI0_0(%rip), %xmm0 callq floor@PLT mulsd .LCPI0_0(%rip), %xmm0 popq %rax retq .LCPI1_0: .quad 0x3fce05f2547090c9 # double 0.234556475859 main: # @main pushq %rax movsd .LCPI1_0(%rip), %xmm0 # xmm0 = mem[0],zero callq _Z19reduceTimePrecisiond movl $.L.str, %edi movb $1, %al callq printf xorl %eax, %eax popq %rcx retq .L.str: .asciz "%f" ``` while this program: ``` constexpr double highTimePrecision { 0.00002 }; constexpr double lowTimePrecision { 0.001 }; static bool s_highPrecisionTimeIsAllowed { false }; __attribute__((noinline)) double reduceTimePrecision(double value) { if (s_highPrecisionTimeIsAllowed) return std::floor(value / highTimePrecision) * highTimePrecision; else return std::floor(value / lowTimePrecision) * lowTimePrecision;} int main() { printf("%f", reduceTimePrecision(0.234556475859)); return 0; } ``` gives ``` .LCPI0_0: .quad 0x3f50624dd2f1a9fc # double 0.001 _Z19reduceTimePrecisiond: # @_Z19reduceTimePrecisiond pushq %rax divsd .LCPI0_0(%rip), %xmm0 callq floor@PLT mulsd .LCPI0_0(%rip), %xmm0 popq %rax retq .LCPI1_0: .quad 0x3fce05f2547090c9 # double 0.234556475859 main: # @main pushq %rax movsd .LCPI1_0(%rip), %xmm0 # xmm0 = mem[0],zero callq _Z19reduceTimePrecisiond movl $.L.str, %edi movb $1, %al callq printf xorl %eax, %eax popq %rcx retq _GLOBAL__sub_I_example.cpp: # @_GLOBAL__sub_I_example.cpp pushq %rax movl $_ZStL8__ioinit, %edi callq _ZNSt8ios_base4InitC1Ev movl $_ZNSt8ios_base4InitD1Ev, %edi movl $_ZStL8__ioinit, %esi movl $__dso_handle, %edx popq %rax jmp __cxa_atexit # TAILCALL .L.str: .asciz "%f" ```
Chris Dumez
Comment 32 2021-09-01 13:27:08 PDT
Sorry, bad copy / paste, trying again: FYI, with clang -03, this program: ``` static double timePrecision { 0.001 }; __attribute__((noinline)) double reduceTimePrecision(double value) { return std::floor(value / timePrecision) * timePrecision; } int main() { printf("%f", reduceTimePrecision(0.234556475859)); return 0; } ``` gives ``` .LCPI0_0: .quad 0x3f50624dd2f1a9fc # double 0.001 _Z19reduceTimePrecisiond: # @_Z19reduceTimePrecisiond pushq %rax divsd .LCPI0_0(%rip), %xmm0 callq floor@PLT mulsd .LCPI0_0(%rip), %xmm0 popq %rax retq .LCPI1_0: .quad 0x3fce05f2547090c9 # double 0.234556475859 main: # @main pushq %rax movsd .LCPI1_0(%rip), %xmm0 # xmm0 = mem[0],zero callq _Z19reduceTimePrecisiond movl $.L.str, %edi movb $1, %al callq printf xorl %eax, %eax popq %rcx retq .L.str: .asciz "%f" ``` while this program: ``` constexpr double highTimePrecision { 0.00002 }; constexpr double lowTimePrecision { 0.001 }; static bool s_highPrecisionTimeIsAllowed { false }; __attribute__((noinline)) double reduceTimePrecision(double value) { if (s_highPrecisionTimeIsAllowed) return std::floor(value / highTimePrecision) * highTimePrecision; else return std::floor(value / lowTimePrecision) * lowTimePrecision;} int main() { printf("%f", reduceTimePrecision(0.234556475859)); return 0; } ``` gives ``` .LCPI0_0: .quad 0x3f50624dd2f1a9fc # double 0.001 _Z19reduceTimePrecisiond: # @_Z19reduceTimePrecisiond pushq %rax divsd .LCPI0_0(%rip), %xmm0 callq floor@PLT mulsd .LCPI0_0(%rip), %xmm0 popq %rax retq .LCPI1_0: .quad 0x3fce05f2547090c9 # double 0.234556475859 main: # @main pushq %rax movsd .LCPI1_0(%rip), %xmm0 # xmm0 = mem[0],zero callq _Z19reduceTimePrecisiond movl $.L.str, %edi movb $1, %al callq printf xorl %eax, %eax popq %rcx retq .L.str: .asciz "%f" ```
Chris Dumez
Comment 33 2021-09-01 13:31:26 PDT
Arg, the compiler is optimizing away some things I don't want it too. Adding volatile to help: FYI, with clang -03, this program: ``` static volatile double timePrecision { 0.001 }; __attribute__((noinline)) double reduceTimePrecision(double value) { return std::floor(value / timePrecision) * timePrecision; } int main() { printf("%f", reduceTimePrecision(0.234556475859)); return 0; } ``` gives ``` _Z19reduceTimePrecisiond: # @_Z19reduceTimePrecisiond pushq %rax divsd _ZL13timePrecision(%rip), %xmm0 callq floor@PLT mulsd _ZL13timePrecision(%rip), %xmm0 popq %rax retq .LCPI1_0: .quad 0x3fce05f2547090c9 # double 0.234556475859 main: # @main pushq %rax movsd .LCPI1_0(%rip), %xmm0 # xmm0 = mem[0],zero callq _Z19reduceTimePrecisiond movl $.L.str, %edi movb $1, %al callq printf xorl %eax, %eax popq %rcx retq _ZL13timePrecision: .quad 0x3f50624dd2f1a9fc # double 0.001 .L.str: .asciz "%f" ``` while this program: ``` constexpr double highTimePrecision { 0.00002 }; constexpr double lowTimePrecision { 0.001 }; static volatile bool s_highPrecisionTimeIsAllowed { false }; __attribute__((noinline)) double reduceTimePrecision(double value) { if (s_highPrecisionTimeIsAllowed) return std::floor(value / highTimePrecision) * highTimePrecision; else return std::floor(value / lowTimePrecision) * lowTimePrecision;} int main() { printf("%f", reduceTimePrecision(0.234556475859)); return 0; } ``` gives ``` .LCPI0_0: .quad 0x3ef4f8b588e368f1 # double 2.0000000000000002E-5 .quad 0x3f50624dd2f1a9fc # double 0.001 _Z19reduceTimePrecisiond: # @_Z19reduceTimePrecisiond pushq %rax xorl %eax, %eax cmpb $0, _ZL28s_highPrecisionTimeIsAllowed(%rip) sete %al movsd .LCPI0_0(,%rax,8), %xmm1 # xmm1 = mem[0],zero movsd %xmm1, (%rsp) # 8-byte Spill divsd %xmm1, %xmm0 callq floor@PLT mulsd (%rsp), %xmm0 # 8-byte Folded Reload popq %rax retq .LCPI1_0: .quad 0x3fce05f2547090c9 # double 0.234556475859 main: # @main pushq %rax movsd .LCPI1_0(%rip), %xmm0 # xmm0 = mem[0],zero callq _Z19reduceTimePrecisiond movl $.L.str, %edi movb $1, %al callq printf xorl %eax, %eax popq %rcx retq .L.str: .asciz "%f" ```
Ryosuke Niwa
Comment 34 2021-09-01 13:57:46 PDT
Hm... weird that we're not using an immediate value for the double values. Some quick search tells me that this might be a limitation of the assembly language itself.
Ryosuke Niwa
Comment 35 2021-09-01 14:21:07 PDT
Comment on attachment 437048 [details] Patch Ok, sounds like this is the optimal approach after taking with Yusuke.
EWS
Comment 36 2021-09-01 14:35:07 PDT
Committed r281879 (241209@main): <https://commits.webkit.org/241209@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 437048 [details].
Note You need to log in before you can comment on or make changes to this bug.