Open Bug 1754511 Opened 3 years ago Updated 1 year ago

Crash in [@ RtlpWaitOnCriticalSection | RtlpEnterCriticalSectionContended | RtlEnterCriticalSection | <unknown in nvencmfth264x.dll> | CSerialWorkQueue::QueueItem::ExecuteWorkItem]

Categories

(Core :: Audio/Video, defect, P3)

Unspecified
Windows

People

(Reporter: RyanVM, Unassigned, NeedInfo)

Details

(Keywords: crash)

Crash Data

Maybe Fission related. (DOMFissionEnabled=1)

I see reports for this going back to September. Not sure how actionable this is on our end.

Crash report: https://crash-stats.mozilla.org/report/index/d297f57f-4230-4ec3-9d2d-389670220128

Reason: EXCEPTION_ACCESS_VIOLATION_WRITE

Top 10 frames of crashing thread:

0 ntdll.dll RtlpWaitOnCriticalSection 
1 ntdll.dll RtlpEnterCriticalSectionContended 
2 ntdll.dll RtlEnterCriticalSection 
3 nvencmfth264x.dll <unknown in nvencmfth264x.dll> 
4 nvencmfth264x.dll <unknown in nvencmfth264x.dll> 
5 rtworkq.dll int CSerialWorkQueue::QueueItem::ExecuteWorkItem 
6 rtworkq.dll virtual long CSerialWorkQueue::QueueItem::OnWorkItemAsyncCallback::Invoke 
7 rtworkq.dll ThreadPoolWorkCallback 
8 ntdll.dll TppWorkpExecuteCallback 
9 ntdll.dll TppWorkerThread 

Is this possibly related to bug 1751964?

Flags: needinfo?(jolin)

This is spiking on release quite a bit. Can we find an owner for this?

Flags: needinfo?(jmathies)
Blocks: media-triage
Flags: needinfo?(jmathies)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #1)

> Is this possibly related to bug 1751964?

That's not in release yet.

We thought this might be tied to the webrtc update, but that shipped in 96. This is some sort of issue with an NVIDIA hardware encoding library that crashes in the content process. Unfortunately, we don't have much to go on based on the stacks.

Flags: needinfo?(jolin)

Looking at just the Nightly reports, it looks like the first ones started coming in back during the 94 Nightly cycle, but the big spike started in 96. And then it rode to release on 97. Does that line up with any feature work that comes to mind? Note that this is currently the #10 top content process crash.

Blocks: webrtc-triage
No longer blocks: media-triage

So then this points to the webrtc update we did in 96.

FYI in 99, we disabled hardware encoding for webrtc due to win32k lockdown, so this should fall off then. Eventually we'll move that to the RDD, so it might show back up there.

Will triage with the webrtc team. Note though the stacks here don't tell us much. Might be worth my posting about this to the nvidia list as well.

Blocks: media-triage
No longer blocks: webrtc-triage
  • Hardware encoding in content was disabled in 100, and we're working on moving it to the RDD, so this should go away over time.
  • We have the ability to block specific hardware encoders, and could consider that here if it comes back in the RDD.
No longer blocks: media-triage
Severity: S2 → S4
Priority: -- → P3

I've put together a prototype of a tool that filters out sensitive information from minidumps: https://github.com/jrmuizel/minidump-filter

We should be able to use it to send a minidump to Nvidia so that they can perhaps help us out.

Flags: needinfo?(jmuizelaar)

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 content process crashes on beta

:jimm, could you consider increasing the severity of this top-crash bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jmathies)
Keywords: topcrash

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash

FYI, dueling bots here. When the second bot removed the topcrash keyword, it should have also cleared the needinfo for triage.

Flags: needinfo?(jmathies) → needinfo?(mcastelluccio)

Good point, filed https://github.com/mozilla/relman-auto-nag/issues/1684.
Hopefully this is not happening too frequently, as usually topcrashes stay topcrashes for longer than a week!

Flags: needinfo?(mcastelluccio)

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 content process crashes on beta

:jimm, could you consider increasing the severity of this top-crash bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jmathies)
Keywords: topcrash

For some reason this crash is oscillating between topcrash and non-topcrash status. We should either increase the delay between the bot's last keyword change and the next one to reduce the noise, or add a margin to the threshold: add the keyword when the signature is in the top 10, but don't remove it until it falls below, say, the top 20.
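
To illustrate the second option, here is a minimal sketch of a two-threshold (hysteresis) rule. It is not the bot's actual code: `TopcrashState`, `update_keyword`, `count_changes`, and the sample ranks are hypothetical names and data made up for this example, and the 10/20 thresholds are just the values suggested above.

```python
from dataclasses import dataclass


@dataclass
class TopcrashState:
    """Hypothetical record of whether the topcrash keyword is currently set."""
    has_keyword: bool = False


def update_keyword(state: TopcrashState, rank: int,
                   add_at: int = 10, remove_below: int = 20) -> bool:
    """Apply one ranking observation and return True if the keyword changed.

    rank is the signature's 1-based position in the crash list.
    The keyword is added once rank <= add_at, and removed only once
    rank > remove_below, so ranks in between leave it untouched.
    """
    if not state.has_keyword and rank <= add_at:
        state.has_keyword = True
        return True
    if state.has_keyword and rank > remove_below:
        state.has_keyword = False
        return True
    return False


def count_changes(ranks, add_at=10, remove_below=20):
    """Count keyword flips over a sequence of observed ranks."""
    state = TopcrashState()
    return sum(update_keyword(state, r, add_at, remove_below) for r in ranks)


# Made-up ranks hovering around the top-10 boundary, then dropping off.
sample_ranks = [9, 11, 10, 12, 9, 25]
print(count_changes(sample_ranks))                   # two thresholds
print(count_changes(sample_ranks, remove_below=10))  # single threshold
```

With a single threshold, every crossing of rank 10 flips the keyword. With the wider removal threshold, the keyword is set once and cleared only when the signature clearly drops out of the list.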

Flags: needinfo?(jmathies)

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash