Crash in RtlAcquireSRWLockExclusive | mozilla::detail::MutexImpl::lock | mozilla::gfx::VsyncSource::Display::NotifyVsync
Categories
(Core :: Graphics, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox65 | --- | unaffected |
firefox66 | + | fixed |
firefox67 | --- | fixed |
People
(Reporter: philipp, Assigned: kats)
References
Details
(Keywords: crash, regression)
Crash Data
Attachments
(1 file)
(deleted),
text/x-phabricator-request
|
jcristau
:
approval-mozilla-beta+
|
Details |
This bug is for crash report bp-a1b74f48-d272-42a8-b228-991990190130.
Top 10 frames of crashing thread:
0 ntdll.dll RtlAcquireSRWLockExclusive
1 mozglue.dll mozilla::detail::MutexImpl::lock mozglue/misc/Mutex_windows.cpp:22
2 xul.dll mozilla::gfx::VsyncSource::Display::NotifyVsync gfx/thebes/VsyncSource.cpp:69
3 xul.dll void D3DVsyncSource::D3DVsyncDisplay::VBlankLoop gfx/thebes/gfxWindowsPlatform.cpp:1880
4 xul.dll nsresult mozilla::detail::RunnableMethodImpl< xpcom/threads/nsThreadUtils.h:1171
5 xul.dll bool MessageLoop::DeferOrRunPendingTask ipc/chromium/src/base/message_loop.cc:450
6 xul.dll MessageLoop::DoWork ipc/chromium/src/base/message_loop.cc:523
7 xul.dll base::MessagePumpDefault::Run ipc/chromium/src/base/message_pump_default.cc:35
8 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:308
9 xul.dll base::Thread::ThreadMain ipc/chromium/src/base/thread.cc:192
these browser crash signatures are regressing (at least in volume) in firefox 66. there isn't too much data for this yet since 66 started into the beta cycle just yesterday but one crash comment mentioned it took place while livestreaming on a video site.
Tracking for 66 to keep an eye on it, since there are significant crashes in beta 3.
Comment 2•6 years ago
|
||
Assigning to myself to keep an eye on it
Comment 3•6 years ago
|
||
It looks as if all of these crashes happened during startup, to there are no comments or URLs.
Is there anyone who can investigate further? This is a bit worrying for 66 release.
Comment 5•6 years ago
|
||
(In reply to Liz Henry (:lizzard) (use needinfo) from comment #4)
Is there anyone who can investigate further? This is a bit worrying for 66 release.
Will figure out who could look at this
Updated•6 years ago
|
Comment 6•6 years ago
|
||
So it looks like we're passing in garbage to the mutex lock in RefreshTimerVsyncDispatcher::NotifyVsync(). I guess something is not initialized here.
Assignee | ||
Comment 7•6 years ago
|
||
Bug 1503339 touched some vsync code in 66, might be related.
Comment 8•6 years ago
|
||
Most likely because mVsyncDispatcher is null
Updated•6 years ago
|
Comment 9•6 years ago
|
||
This was all for stuff that's now disabled on 66 so if necessary we could just back things out from 66?
Assignee | ||
Comment 10•6 years ago
|
||
So I think one possibility here is if thread A calls MoveListenersToNewSource and acquires the lock, and then thread B calls NotifyVsync and blocks on the same lock and then after thread A releases the lock, thread B runs the function while mRefreshTimerVsyncDispatcher is nullptr. It's calling a nonvirtual function so presumably it could result in the crash seen here.
Assignee | ||
Comment 11•6 years ago
|
||
The data I'm seeing is consistent with my theory in comment 10. Specifically:
-
the codepath in question is invoked via ReinitFrameRate() which should generally run only on startup. The crashes are mostly startup crashes.
-
ReinitFrameRate should only run if we're deciding to flip the frame rate from default to 30fps or vice-versa, which in turn should only happen for beta users if they update from a 65 beta into an early 66 beta (in which case we'd go from default -> 30fps) or from an early 66 beta to a late 66 beta (in which case we'd go from 30fps back to default). The 1:1 install:crash ratio supports this, as do the number of crashes steadily declining from b3 to b6 (since users are progressively less likely to update from 65 beta to a higher-numbered 66 beta).
Assignee | ||
Comment 12•6 years ago
|
||
I think a null check should be all that's needed to fix this. Building it locally now.
Assignee | ||
Comment 13•6 years ago
|
||
Comment 14•6 years ago
|
||
Comment 15•6 years ago
|
||
bugherder |
Assignee | ||
Comment 16•6 years ago
|
||
Comment on attachment 9043710 [details]
Bug 1523956 - Add a null check against mRefreshTimerVsyncDispatcher getting nulled concurrently. r?Gijs
Beta/Release Uplift Approval Request
Feature/Bug causing the regression
User impact if declined
Possible crash due to race condition on low-end devices when we switch framerates
Is this code covered by automated tests?
No
Has the fix been verified in Nightly?
No
Needs manual test from QE?
No
If yes, steps to reproduce
List of other uplifts needed
none
Risk to taking this patch
Low
Why is the change risky/not risky? (and alternatives if risky)
Low risk fix, null pointer check. Main risk is that it doesn't actually fix the crash in which case we're no worse off than before.
String changes made/needed
none
Comment 17•6 years ago
|
||
The crashes seem to have gone away after unsetting EARLY_BETA_OR_EARLIER (and thus, presumably, turning off performance.adjust_to_machine. Do we need this on 66 anyway?
Assignee | ||
Comment 18•6 years ago
|
||
It's safer to put it in, in case somebody fiddles with the prefs manually. But could also skip it if you're uneasy about it.
Comment 19•6 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #18)
It's safer to put it in, in case somebody fiddles with the prefs manually.
Yes. Similar code will still run if users manually alter layout.frame_rate in about:config .
Comment 20•6 years ago
|
||
Comment on attachment 9043710 [details]
Bug 1523956 - Add a null check against mRefreshTimerVsyncDispatcher getting nulled concurrently. r?Gijs
add a null check to prevent crash, approved for 66.0b10
Comment 21•6 years ago
|
||
bugherder uplift |
Description
•