Crash in [@ RtlpWaitOnCriticalSection | RtlpEnterCriticalSectionContended | RtlEnterCriticalSection | <unknown in nvencmfth264x.dll> | CSerialWorkQueue::QueueItem::ExecuteWorkItem]
Categories
(Core :: Audio/Video, defect, P3)
People
(Reporter: RyanVM, Unassigned, NeedInfo)
Details
(Keywords: crash)
Crash Data
Maybe Fission related. (DOMFissionEnabled=1)
I see reports for this going back to September. Not sure how actionable this is on our end.
Crash report: https://crash-stats.mozilla.org/report/index/d297f57f-4230-4ec3-9d2d-389670220128
Reason: EXCEPTION_ACCESS_VIOLATION_WRITE
Top 10 frames of crashing thread:
0 ntdll.dll RtlpWaitOnCriticalSection
1 ntdll.dll RtlpEnterCriticalSectionContended
2 ntdll.dll RtlEnterCriticalSection
3 nvencmfth264x.dll <unknown in nvencmfth264x.dll>
4 nvencmfth264x.dll <unknown in nvencmfth264x.dll>
5 rtworkq.dll int CSerialWorkQueue::QueueItem::ExecuteWorkItem
6 rtworkq.dll virtual long CSerialWorkQueue::QueueItem::OnWorkItemAsyncCallback::Invoke
7 rtworkq.dll ThreadPoolWorkCallback
8 ntdll.dll TppWorkpExecuteCallback
9 ntdll.dll TppWorkerThread
Comment 2•3 years ago (Reporter)
This is spiking on release quite a bit. Can we find an owner for this?
Comment 3•3 years ago
(In reply to Ryan VanderMeulen [:RyanVM] from comment #1)
> Is this possibly related to bug 1751964?
That's not in release yet.
Comment 4•3 years ago
We thought this might be tied to the WebRTC update, but that shipped in 96. This looks like an issue in an NVIDIA hardware encoding library (nvencmfth264x.dll) that crashes in the content process. Unfortunately, the stacks don't give us much to go on.
Comment 5•3 years ago (Reporter)
Looking at just the Nightly reports, it looks like the first ones started coming in back during the 94 Nightly cycle, but the big spike started in 96. And then it rode to release on 97. Does that line up with any feature work that comes to mind? Note that this is currently the #10 top content process crash.
Comment 6•3 years ago
So then this points to the webrtc update we did in 96.
FYI in 99, we disabled hardware encoding for webrtc due to win32k lockdown, so this should fall off then. Eventually we'll move that to the RDD, so it might show back up there.
Will triage with the webrtc team. Note though the stacks here don't tell us much. Might be worth my posting about this to the nvidia list as well.
Comment 7•3 years ago
- Hardware encoding in content was disabled in 100. We're working on moving this to the RDD. So this should go away over time.
- We have the ability to block specific hardware encoders, and could consider that here if it comes back in the RDD.
Comment 8•3 years ago
I've put together a prototype of a tool that filters out sensitive information from minidumps: https://github.com/jrmuizel/minidump-filter
We should be able to use it to send a minidump to Nvidia so that they can perhaps help us out.
Comment 9•2 years ago
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 content process crashes on beta
:jimm, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Comment 10•2 years ago
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit auto_nag documentation.
Comment 11•2 years ago
FYI, dueling bots here. When the second bot removed the topcrash setting, it should also have cleared the needinfo (ni) for triage.
Comment 12•2 years ago
Good point, filed https://github.com/mozilla/relman-auto-nag/issues/1684.
Hopefully this is not happening too frequently, as usually topcrashes stay topcrashes for longer than a week!
Comment 13•2 years ago
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 content process crashes on beta
:jimm, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Comment 14•2 years ago
For some reason this crash is oscillating between topcrash and non-topcrash status. We should increase the delay between the bot's last keyword change and its next one to reduce the noise, or add a margin between the thresholds: add the keyword when the signature is in the top 10, but don't remove it until it drops below, say, the top 20.
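A rough sketch of what that could look like, combining both ideas (a cool-down and separate add/remove thresholds). This is not the bot's actual code: `BugState`, `next_action`, and the constants are hypothetical names, the top 10 / top 20 values come from the suggestion above, and the 7-day cool-down is only an assumed value.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical values: the ranks follow the suggestion in this comment,
# the cool-down length is an assumption for illustration only.
ADD_RANK = 10                  # add the keyword when the signature is in the top 10
REMOVE_RANK = 20               # remove it only once it drops below the top 20
MIN_DELAY = timedelta(days=7)  # minimum time between two keyword changes


@dataclass
class BugState:
    has_topcrash: bool
    last_change: Optional[datetime] = None  # when the bot last touched the keyword


def next_action(state: BugState, rank: Optional[int], now: datetime) -> Optional[str]:
    """Return "add", "remove", or None (leave the keyword alone).

    rank is the signature's 1-based position in the crash list,
    or None if the signature is no longer ranked at all.
    """
    # Cool-down: never flip the keyword again too soon after the last change.
    if state.last_change is not None and now - state.last_change < MIN_DELAY:
        return None
    # Add only when the signature is clearly a topcrash.
    if not state.has_topcrash and rank is not None and rank <= ADD_RANK:
        return "add"
    # Remove only when it has fallen well outside the list.
    if state.has_topcrash and (rank is None or rank > REMOVE_RANK):
        return "remove"
    # Ranks 11 to 20 are a dead band: keep whatever state the bug already has.
    return None
```

With this shape, a signature hovering around rank 10 or 11 keeps its current keyword state instead of triggering a bot comment on every run.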
Comment 15•2 years ago
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit auto_nag documentation.