It doesn’t make any sense to keep time precision reduced if SAB is enabled for a page.
(I'm terrible with acronyms, sorry) what is SAB?
(In reply to Sam Weinig from comment #1) > (I'm terrible with acronyms, sorry) what is SAB? SharedArrayBuffer :)
(In reply to Yusuke Suzuki from comment #2) > (In reply to Sam Weinig from comment #1) > > (I'm terrible with acronyms, sorry) what is SAB? > > SharedArrayBuffer :) Ah! Thanks!
<rdar://problem/81197138>
According to https://developer.chrome.com/blog/cross-origin-isolated-hr-timers/ and blink code inspection, it looks like Chrome is using 100us precision in general and 5ns when cross-origin isolated. It looks like WebKit is using a much higher 1ms precision in general currently. I guess we need to decide what precision we want to use when cross-origin-isolated. Do we want to go back to 100us, like before r226495? It would still make us nowhere near Chrome in terms of precision.
(In reply to Chris Dumez from comment #5) > According to > https://developer.chrome.com/blog/cross-origin-isolated-hr-timers/ and blink > code inspection, it looks like Chrome is using 100us precision in general > and 5ns when cross-origin isolated. > > It looks like WebKit is using a much higher 1ms precision in general > currently. > > I guess we need to decide what precision we want to use when > cross-origin-isolated. Do we want to go back to 100us, like before r226495? > It would still make us nowhere near Chrome in terms of precision. Looks like Gecko is using 1ms for general precision and 20us for cross-origin-isolated precision. Maybe we should at least match Gecko's cross-origin-isolated precision?
I'm not positive but I would guess that someone could get a statistically significant sub-us precision from SAB (due to scheduling on the timer thread). So once we have SAB it's not clear how much value there is around giving something greater than 1us timing. It's probably worth verifying my assumption though.
Matching Gecko makes sense. We need to come up with a strategy to mitigate malicious websites using timing attacks to read out the history database. Full GPU process support and moving the visited-link color work into the GPU process will mitigate that to a large extent.
Created attachment 436668 [details] Patch
Created attachment 436669 [details] Patch
Comment on attachment 436669 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=436669&action=review

> Source/WebCore/page/Performance.cpp:63
> +constexpr Seconds highTimePrecision { 20_us };

Does Mozilla allow 20 microseconds without jitter or with jitter? If with jitter, since we don't have jitter, we may want to stick with 50 microseconds.

> Source/WebCore/page/Performance.cpp:97
> +void Performance::setShouldAllowHighPrecisionTime()

Maybe we should call this allowHighPrecisionTime?

> Source/WebCore/page/Performance.cpp:104
> -    double resolution = (1000_us).seconds();
> +    double resolution = shouldAllowHighPrecisionTime ? highTimePrecision.seconds() : regularTimePrecision.seconds();

It's not great that we'd have to do a runtime switch on every function call, although branch prediction helps. Can this also be static, given that we determine this at web process initialization time?
Comment on attachment 436669 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=436669&action=review

>> Source/WebCore/page/Performance.cpp:63
>> +constexpr Seconds highTimePrecision { 20_us };
>
> Does Mozilla allow 20microseconds without jitters or with jitters?
> If with jitters, since we don't have jitters, we may want to stick with 50microseconds.

I don't know what "jitters" means here, so I am not sure what the answer is. The fact is, though, that Chrome is providing much more accuracy. Also, according to both Phil and Keith, SharedArrayBuffer already provides much greater precision anyway.
Created attachment 437034 [details] Patch
(In reply to Chris Dumez from comment #5) > According to > https://developer.chrome.com/blog/cross-origin-isolated-hr-timers/ and blink > code inspection, it looks like Chrome is using 100us precision in general > and 5ns when cross-origin isolated. 5us not 5ns. > I don't know what jitters means here. I am not sure what the answer is. "jitters" here means adding a random value in an attempt to make timing attacks harder. That sounds expensive and not as helpful as you might think. (In reply to Ryosuke Niwa from comment #11) > Does Mozilla allow 20microseconds without jitters or with jitters? I think both.
(In reply to Alex Christensen from comment #14) > > (In reply to Ryosuke Niwa from comment #11) > > Does Mozilla allow 20microseconds without jitters or with jitters? > I think both. What do you mean by both? Either you have a jitter or you don't, right? This is about https://bugzilla.mozilla.org/show_bug.cgi?id=1425462
I think 20us is the high resolution precision when jitters are used and when they are not used. I'm referring to https://searchfox.org/mozilla-central/search?q=clampedAndJittered&redirect=false
(In reply to Alex Christensen from comment #16) > I think 20us is the high resolution precision when jitters are used and when > they are not used. I'm referring to > https://searchfox.org/mozilla-central/ > search?q=clampedAndJittered&redirect=false But is jitter enabled when COOP+COEP is on? If so, we may need to reduce our precision to 40-50us. My previous study concluded that having jitter is roughly equivalent to a 2-5x precision loss.
It looks like Gecko only has jitters on when COOP+COEP is on.
(In reply to Alex Christensen from comment #18) > It looks like Gecko only has jitters on when COOP+COEP is on. Really? No jitters when isolation is off? That's contrary to https://bugzilla.mozilla.org/show_bug.cgi?id=1451790.
Comment on attachment 437034 [details] Patch I'm just looking at unconditionalClamping in nsRFPService::ReduceTimePrecisionImpl in searchfox. I think I actually got it backwards. I think they only jitter when coop/coep is off.
(In reply to Alex Christensen from comment #20) > Comment on attachment 437048 [details] > Patch > > I'm just looking at unconditionalClamping in > nsRFPService::ReduceTimePrecisionImpl in searchfox. I think I actually got > it backwards. I think they only jitter when coop/coep is off. Ok, that makes sense. So maybe it's okay for us to match Gecko's precision when COOP+COEP is turned on.
I think we should. 20us seems better.
Created attachment 437048 [details] Patch
Comment on attachment 437048 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=437048&action=review

> Source/WebCore/page/Performance.cpp:99
> -    double resolution = (1000_us).seconds();
> +    double resolution = timePrecision.seconds();
>      double reduced = std::floor(seconds.seconds() / resolution) * resolution;

I still don't like the fact that we'd be doing memory-dependent math here now. Can we instead if/else the two cases so that the division is done with an immediate value (compile-time constant)?
Comment on attachment 437048 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=437048&action=review

> Source/WebCore/page/Performance.cpp:61
> +constexpr Seconds highTimePrecision { 20_us };

So:

```
constexpr auto highTimeResolution { (20_us).seconds() };
constexpr auto lowTimeResolution { (1_ms).seconds() };
static bool shouldUseHighTimeResolution = false;
```

>> Source/WebCore/page/Performance.cpp:99
>> double reduced = std::floor(seconds.seconds() / resolution) * resolution;
>
> I still don't like the fact we'd be doing memory dependent math here now.
> Can we instead if else two cases so that division is done with an immediate value (compile time constant)?

And then:

```
double reduced;
if (shouldUseHighTimeResolution)
    reduced = std::floor(seconds.seconds() / highTimeResolution) * highTimeResolution;
else
    reduced = std::floor(seconds.seconds() / lowTimeResolution) * lowTimeResolution;
```

Is that it? Looks a bit weird.
Comment on attachment 437048 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=437048&action=review

> Source/WebCore/page/Performance.cpp:103
> +void Performance::allowHighPrecisionTime()

We could also assert that shouldUseHighTimeResolution was false before this change.
More like this:

```
template <Seconds precision> Seconds reduceTimeToPrecision(Seconds seconds)
{
    double resolution = precision.seconds();
    double reduced = std::floor(seconds.seconds() / resolution) * resolution;
    return Seconds(reduced);
}

Seconds Performance::reduceTimeResolution(Seconds seconds)
{
    return s_highPrecisionTimeIsAllowed ? reduceTimeToPrecision<20_us>(seconds) : reduceTimeToPrecision<1000_us>(seconds);
}
```
Is this a real difference? Is there a guarantee the compiler will generate more secure code if we do it this way?
(In reply to Darin Adler from comment #28)
> Is this a real difference? Is there a guarantee the compiler will generate
> more secure code if we do it this way?

Oh, I'm not concerned about security. I'm rather concerned about the performance implication of relying on a non-local/non-stack memory value to do the computation. Having a memory-load-dependent branch is a lot cheaper because of the CPU's branch predictor.
(In reply to Ryosuke Niwa from comment #27)
> More like this:
>
> template <Seconds precision> Seconds reduceTimeToPrecision(Seconds seconds)
> {
>     double resolution = precision.seconds();
>     double reduced = std::floor(seconds.seconds() / resolution) * resolution;
>     return Seconds(reduced);
> }

I don't think we can use Seconds as a template parameter until C++20; at least, that's what the compiler tells me.
FYI, with clang -O3, this program:

```
static double timePrecision { 0.001 };

__attribute__((noinline)) double reduceTimePrecision(double value)
{
    return std::floor(value / timePrecision) * timePrecision;
}

int main()
{
    printf("%f", reduceTimePrecision(0.234556475859));
    return 0;
}
```

gives

```
.LCPI0_0:
        .quad   0x3f50624dd2f1a9fc      # double 0.001
_Z19reduceTimePrecisiond:               # @_Z19reduceTimePrecisiond
        pushq   %rax
        divsd   .LCPI0_0(%rip), %xmm0
        callq   floor@PLT
        mulsd   .LCPI0_0(%rip), %xmm0
        popq    %rax
        retq
.LCPI1_0:
        .quad   0x3fce05f2547090c9      # double 0.234556475859
main:                                   # @main
        pushq   %rax
        movsd   .LCPI1_0(%rip), %xmm0   # xmm0 = mem[0],zero
        callq   _Z19reduceTimePrecisiond
        movl    $.L.str, %edi
        movb    $1, %al
        callq   printf
        xorl    %eax, %eax
        popq    %rcx
        retq
.L.str:
        .asciz  "%f"
```

while this program:

```
constexpr double highTimePrecision { 0.00002 };
constexpr double lowTimePrecision { 0.001 };
static bool s_highPrecisionTimeIsAllowed { false };

__attribute__((noinline)) double reduceTimePrecision(double value)
{
    if (s_highPrecisionTimeIsAllowed)
        return std::floor(value / highTimePrecision) * highTimePrecision;
    else
        return std::floor(value / lowTimePrecision) * lowTimePrecision;
}

int main()
{
    printf("%f", reduceTimePrecision(0.234556475859));
    return 0;
}
```

gives

```
.LCPI0_0:
        .quad   0x3f50624dd2f1a9fc      # double 0.001
_Z19reduceTimePrecisiond:               # @_Z19reduceTimePrecisiond
        pushq   %rax
        divsd   .LCPI0_0(%rip), %xmm0
        callq   floor@PLT
        mulsd   .LCPI0_0(%rip), %xmm0
        popq    %rax
        retq
.LCPI1_0:
        .quad   0x3fce05f2547090c9      # double 0.234556475859
main:                                   # @main
        pushq   %rax
        movsd   .LCPI1_0(%rip), %xmm0   # xmm0 = mem[0],zero
        callq   _Z19reduceTimePrecisiond
        movl    $.L.str, %edi
        movb    $1, %al
        callq   printf
        xorl    %eax, %eax
        popq    %rcx
        retq
_GLOBAL__sub_I_example.cpp:             # @_GLOBAL__sub_I_example.cpp
        pushq   %rax
        movl    $_ZStL8__ioinit, %edi
        callq   _ZNSt8ios_base4InitC1Ev
        movl    $_ZNSt8ios_base4InitD1Ev, %edi
        movl    $_ZStL8__ioinit, %esi
        movl    $__dso_handle, %edx
        popq    %rax
        jmp     __cxa_atexit            # TAILCALL
.L.str:
        .asciz  "%f"
```
Sorry, bad copy / paste, trying again:

FYI, with clang -O3, this program:

```
static double timePrecision { 0.001 };

__attribute__((noinline)) double reduceTimePrecision(double value)
{
    return std::floor(value / timePrecision) * timePrecision;
}

int main()
{
    printf("%f", reduceTimePrecision(0.234556475859));
    return 0;
}
```

gives

```
.LCPI0_0:
        .quad   0x3f50624dd2f1a9fc      # double 0.001
_Z19reduceTimePrecisiond:               # @_Z19reduceTimePrecisiond
        pushq   %rax
        divsd   .LCPI0_0(%rip), %xmm0
        callq   floor@PLT
        mulsd   .LCPI0_0(%rip), %xmm0
        popq    %rax
        retq
.LCPI1_0:
        .quad   0x3fce05f2547090c9      # double 0.234556475859
main:                                   # @main
        pushq   %rax
        movsd   .LCPI1_0(%rip), %xmm0   # xmm0 = mem[0],zero
        callq   _Z19reduceTimePrecisiond
        movl    $.L.str, %edi
        movb    $1, %al
        callq   printf
        xorl    %eax, %eax
        popq    %rcx
        retq
.L.str:
        .asciz  "%f"
```

while this program:

```
constexpr double highTimePrecision { 0.00002 };
constexpr double lowTimePrecision { 0.001 };
static bool s_highPrecisionTimeIsAllowed { false };

__attribute__((noinline)) double reduceTimePrecision(double value)
{
    if (s_highPrecisionTimeIsAllowed)
        return std::floor(value / highTimePrecision) * highTimePrecision;
    else
        return std::floor(value / lowTimePrecision) * lowTimePrecision;
}

int main()
{
    printf("%f", reduceTimePrecision(0.234556475859));
    return 0;
}
```

gives

```
.LCPI0_0:
        .quad   0x3f50624dd2f1a9fc      # double 0.001
_Z19reduceTimePrecisiond:               # @_Z19reduceTimePrecisiond
        pushq   %rax
        divsd   .LCPI0_0(%rip), %xmm0
        callq   floor@PLT
        mulsd   .LCPI0_0(%rip), %xmm0
        popq    %rax
        retq
.LCPI1_0:
        .quad   0x3fce05f2547090c9      # double 0.234556475859
main:                                   # @main
        pushq   %rax
        movsd   .LCPI1_0(%rip), %xmm0   # xmm0 = mem[0],zero
        callq   _Z19reduceTimePrecisiond
        movl    $.L.str, %edi
        movb    $1, %al
        callq   printf
        xorl    %eax, %eax
        popq    %rcx
        retq
.L.str:
        .asciz  "%f"
```
Arg, the compiler is optimizing away some things I don't want it to. Adding volatile to help:

FYI, with clang -O3, this program:

```
static volatile double timePrecision { 0.001 };

__attribute__((noinline)) double reduceTimePrecision(double value)
{
    return std::floor(value / timePrecision) * timePrecision;
}

int main()
{
    printf("%f", reduceTimePrecision(0.234556475859));
    return 0;
}
```

gives

```
_Z19reduceTimePrecisiond:               # @_Z19reduceTimePrecisiond
        pushq   %rax
        divsd   _ZL13timePrecision(%rip), %xmm0
        callq   floor@PLT
        mulsd   _ZL13timePrecision(%rip), %xmm0
        popq    %rax
        retq
.LCPI1_0:
        .quad   0x3fce05f2547090c9      # double 0.234556475859
main:                                   # @main
        pushq   %rax
        movsd   .LCPI1_0(%rip), %xmm0   # xmm0 = mem[0],zero
        callq   _Z19reduceTimePrecisiond
        movl    $.L.str, %edi
        movb    $1, %al
        callq   printf
        xorl    %eax, %eax
        popq    %rcx
        retq
_ZL13timePrecision:
        .quad   0x3f50624dd2f1a9fc      # double 0.001
.L.str:
        .asciz  "%f"
```

while this program:

```
constexpr double highTimePrecision { 0.00002 };
constexpr double lowTimePrecision { 0.001 };
static volatile bool s_highPrecisionTimeIsAllowed { false };

__attribute__((noinline)) double reduceTimePrecision(double value)
{
    if (s_highPrecisionTimeIsAllowed)
        return std::floor(value / highTimePrecision) * highTimePrecision;
    else
        return std::floor(value / lowTimePrecision) * lowTimePrecision;
}

int main()
{
    printf("%f", reduceTimePrecision(0.234556475859));
    return 0;
}
```

gives

```
.LCPI0_0:
        .quad   0x3ef4f8b588e368f1      # double 2.0000000000000002E-5
        .quad   0x3f50624dd2f1a9fc      # double 0.001
_Z19reduceTimePrecisiond:               # @_Z19reduceTimePrecisiond
        pushq   %rax
        xorl    %eax, %eax
        cmpb    $0, _ZL28s_highPrecisionTimeIsAllowed(%rip)
        sete    %al
        movsd   .LCPI0_0(,%rax,8), %xmm1 # xmm1 = mem[0],zero
        movsd   %xmm1, (%rsp)           # 8-byte Spill
        divsd   %xmm1, %xmm0
        callq   floor@PLT
        mulsd   (%rsp), %xmm0           # 8-byte Folded Reload
        popq    %rax
        retq
.LCPI1_0:
        .quad   0x3fce05f2547090c9      # double 0.234556475859
main:                                   # @main
        pushq   %rax
        movsd   .LCPI1_0(%rip), %xmm0   # xmm0 = mem[0],zero
        callq   _Z19reduceTimePrecisiond
        movl    $.L.str, %edi
        movb    $1, %al
        callq   printf
        xorl    %eax, %eax
        popq    %rcx
        retq
.L.str:
        .asciz  "%f"
```
Hm... weird that we're not using an immediate value for the double values. Some quick searching suggests this is a limitation of the instruction set itself: these x86 instructions have no encoding that takes a floating-point immediate, so doubles have to be loaded from memory.
Comment on attachment 437048 [details] Patch Ok, sounds like this is the optimal approach after talking with Yusuke.
Committed r281879 (241209@main): <https://commits.webkit.org/241209@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 437048 [details].