resist-fingerprinting makes it hard to have smooth (60hz) animations
Categories
(Core :: DOM: Core & HTML, defect)
Tracking
()
People
(Reporter: jgilbert, Assigned: tjr)
References
(Regressed 1 open bug)
Details
Attachments
(5 files)
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
pascalc
:
approval-mozilla-beta+
|
Details |
(deleted),
text/x-phabricator-request
|
pascalc
:
approval-mozilla-beta+
|
Details |
We know some of why this is (performance.now being truncated), but I'd like to know more about where we want the trade-offs to be, and where we are along that line.
This is a problem because people obviously want to resist fingerprinting, but they definitely don't realize it'll have all the adverse effects they run into.
What we don't want is people turning on RFP and then ditching Firefox because they end up with bad experiences around the web.
Reporter | ||
Comment 1•4 years ago
|
||
Raised most recently via https://github.com/aframevr/aframe/issues/4793
Reporter | ||
Comment 2•4 years ago
|
||
I'm thinking that RFP should truncate to the RAF interval, which would still be pretty good, without causing this class of issue.
RFP either already restricts RAF to 60hz, or it should be made to to prevent fingerprinting on RAF_count/second?
Assignee | ||
Comment 3•4 years ago
|
||
I think there's two questions here: What should we do for Firefox, and what should Tor do. Typically, we've taken the opinion that RFP exists for Tor, so RFP should do what Tor wants it to do. I'm not strictly opposed to doing something different in Firefox via some mechanism (e.g. through an additional pref Tor sets) but it seems like we should first figure out what Tor wants and ideally we could keep thing uniform.
Bug 1577243 is where we started applying reduced precision to the timestamp in rAF; Bug 1585589 is where we exempted the System Principal context. Bug 1662277 refactored this code to where it lives today.
As background. RFP is the acronym for Reduce Timer Precision and refers to the current-default mode of Firefox were we clamp+jitter to 1ms. RFP refers to Resist Fingerprinting. We have four types of Timer Clamping+Jittering:
- Absolutely none - this is used for browser internal stuff.
- Unconditional: this clamps to 20 ns and is in effect for (today) COOP/COEP documents and was always the case (pre-spectre) for performance.now. It is also used in Animation Timelines because it was hard to make RTP work there. No jitter. In the code this can be identified by RFPOnly mode.
- RTP - the default for Firefox: 1ms + jitter
- RFP - 100ms + jitter
My understanding from re-reading Bug Bug 1577243 is that the rAF timestamp wasn't originally being affected by any timer clamping not for RFP, not even to do the unconditional 20ns clamping. Animation Timelines however were being affected by Unconditional. To make Animation Timelines match up with rAF and resolve the test failures, we applied (unconditional) Timer Clamping to rAF which also make rAF do RFP clamping when RFP is enabled.
We're focusing on changing behavior for RFP mode here. We would want to keep Animation Timelines and rAF in sync, so if we change how rAF behaves we will also change how Animation Timelines behave.
We could switch Animation Timelines and rAF to Unconditional Clamping in RFP mode. (This would be a 'fifth' type of Timer Clamping+Jittering but what's one more?) I think the argument for this would be: rAF gets called on a regular interval, 16.666666666666667 ms (or 60 times a second) so even thought the timer in the function argument lies to you, you always know the time delta between function calls anyway. Additionally, rAF yields execution to the browser, so you can't set a global variable in an rAF callback and then immediately use it in another JS function and keep the precision.
I think the argument against this is that rAF only gets called on a regular 60 times a second when operating at high speed. It'll never go faster than that. It will go slower. So if rAF is called, hypothetically, 30 times a second because your computer is slow (either by hardware specs or because an attacker is slowing it down) the calls to rAF will get a high-resolution timer that is disclosing useful information. It seems feasible that an attacker could queue a lot of work in a rAF callback that pegs the system, and then the next rAF callback, look at the amount of work completed, and be able to glean useful information about the time deltas between the callbacks. I don't know if this attack would work. Jeff, what do you think about this concern?
Given that using timing attacks to glean information from a browser hasn't been seen in practice (I think?) in the wild, I am sympathetic to the argument that Tor's level of timer precision is hurting usability with minimal security gain. Maybe a revisit of the precision for all of RFP is in order, and we could get a big gain of usability by lowering it to 10ms. Or 20ms. or 50ms. (Whatever the highest value is that makes things work well?)
Reporter | ||
Comment 4•4 years ago
|
||
Fwiw bug 1666160 is "RFP can surprise users by being painful", which I see as one way to help this problem for most users affected by it. We'll keep the discussion here about "assuming a user wants the RFP tradeoffs, how can they be less painful".
In order for things to go mostly smoothly, I think we need our RAF target interval to be the basis for clamping, or at least related to it. I could see this being 16.67ms clamping, and mandating RAF operate at 60hz maximum under RFP. I think that's a reasonable tradeoff here. Thoughts?
(Note that even the clamping to 1ms for RTP would probably be smoother if it were 1000/60/16 = 1.041667 clamping, rather than one in three consecutive raf frames commonly having 1/16.67 = 6% variance from true presentation time)
It seems feasible that an attacker could queue a lot of work in a rAF callback that pegs the system, and then the next rAF callback, look at the amount of work completed, and be able to glean useful information about the time deltas between the callbacks. I don't know if this attack would work. Jeff, what do you think about this concern?
One of the mitigation options here is to only update "is it done" flags at frame boundaries, or some other acceptable/regular timer. (tjr:) Do we handle (via timer-gating?) this case for events to/from a worker?
Reporter | ||
Updated•4 years ago
|
Reporter | ||
Comment 5•4 years ago
|
||
(That said, I love that the truncation to 1.00ms makes all timing code have pretty round numbers usually!)
Assignee | ||
Comment 6•4 years ago
|
||
(In reply to Jeff Gilbert [:jgilbert] from comment #4)
In order for things to go mostly smoothly, I think we need our RAF target interval to be the basis for clamping, or at least related to it. I could see this being 16.67ms clamping, and mandating RAF operate at 60hz maximum under RFP. I think that's a reasonable tradeoff here. Thoughts?
16.67 ms seems like a good starting point for consideration (kind of a PITA to write tests for a non-terminating decimal...)
60hz maximum seems reasonable; is that something easy to implement?
It seems feasible that an attacker could queue a lot of work in a rAF callback that pegs the system, and then the next rAF callback, look at the amount of work completed, and be able to glean useful information about the time deltas between the callbacks. I don't know if this attack would work. Jeff, what do you think about this concern?
One of the mitigation options here is to only update "is it done" flags at frame boundaries, or some other acceptable/regular timer. (tjr:) Do we handle (via timer-gating?) this case for events to/from a worker?
If I understand your question, it's roughly "Do we do anything that affects the behavior of <anything> based on how we reduce time precision?" and the answer is not directly. An inability to make assertions about Animations working as before if RTP was applied to them was one example of deciding not to do this.
Fuzzyfox (whose code is still in-tree but will be ripped out at some point soonish) actually delays browser operations - it will sleep the main thread and other threads will await events firing from there; but our timer mitigations today typically munge a time value directly before it is exposed to JS.
In the past we had instances where we were storing munged values and it led to bugs - so I'm not aware of situations where we store a munged value and then base logic on it, but it's not out of the question. But in general, gating browser behavior based on timer clamping/jittering is probably hard, and probably why Fuzzyfox was designed the way it was. I could be wrong about all this, but that's my educated guesswork.
Reporter | ||
Comment 7•4 years ago
|
||
RAF rate is controlled via layout.frame_rate
, where -1 is 'use vsync' and e.g. 50 means 50Hz. RFP should set that to 60 if it doesn't already. (this will negatively affect e.g. 90hz displays) 30hz is another option that works fine for 90hz, but is a little chunky-feeling.
16.67 is indeed pretty annoying, but we're supposed to avoid magic numbers anyways, eh? :) const MS_60HZ= 1000 / 60;
I think about the best we can do is to use RAF interval as our time-atom. For example, it looks like CSS Animations (for example) only update once a tick, which is good.
Reporter | ||
Comment 8•4 years ago
|
||
Updated•4 years ago
|
Assignee | ||
Comment 9•4 years ago
|
||
FYI, I gave this a r+ so we can move it forward and experience it on Nightly. Tor Browser for Android is based off release (not ESR), but the old behavior can be reverted to in TB by specifying privacy.resistFingerprinting.reduceTimerPrecision.microseconds = 100000
. (Well, except for the locking-the-display-to-60Hz aspect of the patch, but that's an improvement to things anyway.)
Updated•4 years ago
|
Comment 10•4 years ago
|
||
There's a r+ patch which didn't land and no activity in this bug for 2 weeks.
:jgilbert, could you have a look please?
For more information, please visit auto_nag documentation.
Updated•4 years ago
|
Assignee | ||
Comment 12•3 years ago
|
||
Assignee | ||
Comment 13•3 years ago
|
||
Depends on D122045
Assignee | ||
Comment 14•3 years ago
|
||
Jeff - I took a look at the test failures for this patch and think I resolved them. There were some failures on try but they seem unrelated and there's always a host of failures for a really full run.
https://treeherder.mozilla.org/jobs?repo=try&revision=5ed43b930879a7cc1344c35c9619313b0ebe0dde
https://treeherder.mozilla.org/push-health/push?revision=5ed43b930879a7cc1344c35c9619313b0ebe0dde&repo=try&testGroup=pr&selectedTest=dommediatesttestmediarecorderplaybackcanrepeathtml&selectedTaskId=®ressionsGroupBy=path&selectedJobName=dom%2Fmedia%2Ftest%2Ftest_mediarecorder_pause_resume_video.html+test-windows10-32-qr%2Fopt-mochitest-media-fis-gli-e10s-3
The most convincing failure is dom/media/test/test_mediarecorder_playback_can_repeat.html - there are several failures on mediarecorder on Windows so that might possible be something...
Reporter | ||
Comment 15•3 years ago
|
||
I worry that you're missing changes that I had made locally, but hadn't posted yet, but I'm happy to have to chip away at this.
Ping the assignee next time before taking over patches, please!
Reporter | ||
Comment 16•3 years ago
|
||
(I also would have preferred that you left the initial changes in my name, so we know who to blame :) )
Updated•3 years ago
|
Comment 17•3 years ago
|
||
Comment 18•3 years ago
|
||
Backed out for causing failures at test_getUserMedia_addTrackRemoveTrack.
Backout link: https://hg.mozilla.org/integration/autoland/rev/36974366b0fbc17ac7e2ee7e6a39ce6cead207fa
Failure log:
Assignee | ||
Comment 19•3 years ago
|
||
After a bunch of bisections on try; I eventually figured out that the test failures are due to this code:
+ Preferences::RegisterCallback(
+ FrameRatePrefChanged,
+ nsDependentCString(
+ StaticPrefs::GetPrefName_privacy_resistFingerprinting()));
I couldn't reproduce any of these failures locally so I'm not sure why this would be causing a problem. Jeff, would you have a guess? If not, you can pass back the NI and I'll do a bunch of MOZ_LOG debugging...
Comment 20•3 years ago
|
||
Hows does this fit in with Bug 1442863 ? Does it solve it?
at the very least animations should be 30fps (33ms) or preferably 60fps (17ms)
Comment 21•3 years ago
|
||
FYI, Bug 1589060 is another one that could benefit from 60fps (if you wanted that)
Reporter | ||
Comment 22•3 years ago
|
||
(In reply to Tom Ritter [:tjr] (OOTO until Aug 23rd) from comment #19)
After a bunch of bisections on try; I eventually figured out that the test failures are due to this code:
+ Preferences::RegisterCallback( + FrameRatePrefChanged, + nsDependentCString( + StaticPrefs::GetPrefName_privacy_resistFingerprinting()));
I couldn't reproduce any of these failures locally so I'm not sure why this would be causing a problem. Jeff, would you have a guess? If not, you can pass back the NI and I'll do a bunch of MOZ_LOG debugging...
Oh! We probably just need to dispatch to the main thread for FrameRatePrefChanged, that's probably our threading issue then.
Assignee | ||
Comment 23•3 years ago
|
||
Reporter | ||
Comment 27•3 years ago
|
||
Better: https://treeherder.mozilla.org/jobs?repo=try&revision=d5f3c3e70eeb71e69cb077fe26c3bd299ec66ec0
But not as green as reference: https://treeherder.mozilla.org/jobs?repo=try&revision=a41b91b1026ffb0ee6cf5ca56b589c54b6483457
Assignee | ||
Comment 28•3 years ago
|
||
Looked at this again rebasing and running your patches. Confirmed that the issue is the listening for the RFP pref change. Remove that and it's green. I wondered if moving back and forth from a framterate of 60 for RFP and -1 from layout.frame_rate was causing an issue, so I set layout.frame_rate to 55 but it didn't make a difference.
At this point I'm wondering what the consequence of not listening for the pref change would be? It wouldn't affect people who have RFP enabled all the time. I don't know how often users toggle RFP on and off in a single session (I do it occasionally myself for sure though...) but would the consequence be significant?
Comment 29•3 years ago
|
||
I don't know how often users toggle RFP on and off in a single session (I do it occasionally myself for sure though...) but would the consequence be significant?
Only per eTLD+1, where e.g. workers can retain the original state (even new instances), but the top level document doesn't. My understanding is that when we add site exceptions, it would behave much like Brave's Shields, where toggling the brave mode reloads the page (in our case it might be a message to say reload the page for changes to take affect). With multiple tabs from the same domain open, IDK what happens there (with Brave or what your thoughts are). If/when we give RFP a UI front, toggling it can also show the same message (reload web pages to take affect).
So I guess you really need to decide how RFP should work
Updated•3 years ago
|
Assignee | ||
Comment 30•3 years ago
|
||
I rebased and sent in a try run which doesn't seem to reproduce the intermittents anymore (possibly after the changes in Bug 1765399 ....)
https://treeherder.mozilla.org/jobs?repo=try&revision=4bfcc119125da296149cdf03ea3141f541cbe124
Reporter | ||
Comment 32•3 years ago
|
||
Let's land it then!
Comment 33•2 years ago
|
||
The severity field for this bug is relatively low, S3. However, the bug has 3 duplicates.
:tjr, could you consider increasing the bug severity?
For more information, please visit auto_nag documentation.
Assignee | ||
Comment 34•2 years ago
|
||
Okay; I've sent in a full try run here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b61436faa9fd28ac40a138ce2ef3feec185b7cd4 and if that looks good I'll post two new patches for review.
Assignee | ||
Comment 35•2 years ago
|
||
Push health has a bunch of failures, of course... these ones stood out to me:
gfx/layers/apz/test/mochitest/test_group_keyboard.html fails on all (I think?) Windows platforms - but this test has a bunch of different intermittent bugs: Bug 1761669, Bug 1771835, Bug 1771836, Bug 1659729....
browser/base/content/test/performance/browser_startup_flicker.js seems like a relevant test to this change; and it fails consistently on one flavor of platform: x64 opt devedition-qr swr fis - so it makes me think it might be okay if it's not failing on any other platform?
browser/base/content/test/performance/browser_vsync_accessibility.js also seems relevant - it fails on x64 opt linux qr swr fis only.
I'm going to post the patches and let you take a look at the results - let me know what you think we should do.
Assignee | ||
Comment 36•2 years ago
|
||
Assignee | ||
Comment 37•2 years ago
|
||
Depends on D148122
Comment 38•2 years ago
|
||
Comment 39•2 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/ca7e6cc96539
https://hg.mozilla.org/mozilla-central/rev/25dd05e29a4b
Comment 40•2 years ago
|
||
tested: clearly a lot smoother: https://ciechanow.ski/mechanical-watch/ - I played them side by side in nightly vs beta with RFP
Assignee | ||
Comment 41•2 years ago
|
||
[Tracking Requested - why for this release]: It would be great if we could uplift this to 102 so Tor Browser can get it in the ESR without them needing to track the patch themselves.
Comment 42•2 years ago
|
||
Tom, could you make an uplift request? Thanks
Assignee | ||
Comment 43•2 years ago
|
||
Comment on attachment 9279416 [details]
Bug 1692609 - Force 60hz for RFP and use that as the time-atom r?jgilbert
Beta/Release Uplift Approval Request
- User impact if declined: Tor will have to carry an additional patchset
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Only affects Resist Fingerprinting Mode
- String changes made/needed:
- Is Android affected?: Yes
Assignee | ||
Updated•2 years ago
|
Updated•2 years ago
|
Comment 44•2 years ago
|
||
Comment on attachment 9279416 [details]
Bug 1692609 - Force 60hz for RFP and use that as the time-atom r?jgilbert
Approved for 102 beta 6, thanks.
Comment 45•2 years ago
|
||
Comment on attachment 9279417 [details]
Bug 1692609 - Update tests for a higher-precision RFP r?jgilbert
Approved for 102 beta 6, thanks.
Comment 46•2 years ago
|
||
bugherder uplift |
Updated•2 years ago
|
Description
•