Closed Bug 1601992 Opened 5 years ago Closed 2 years ago

Crash in [@ shutdownhang | mdns_service::mdns_service_stop]

Categories

(Core :: WebRTC: Networking, defect, P2)

72 Branch
All
Windows
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox-esr68 --- unaffected
firefox71 --- unaffected
firefox72 --- wontfix
firefox73 --- fix-optional
firefox74 --- fix-optional

People

(Reporter: philipp, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

This bug is for crash report bp-f004d1c9-b479-4915-8bec-bb0c30191206.

Top 10 frames of crashing thread:

0 ntdll.dll KiFastSystemCallRet 
1 ntdll.dll ZwWaitForSingleObject 
2 kernelbase.dll WaitForSingleObjectEx 
3 kernel32.dll WaitForSingleObjectExImplementation 
4 kernel32.dll WaitForSingleObject 
5 xul.dll void mdns_service::mdns_service_stop media/mtransport/mdns_service/src/lib.rs:554
6 xul.dll void mozilla::net::StunAddrsRequestParent::MDNSServiceWrapper::~MDNSServiceWrapper media/mtransport/ipc/StunAddrsRequestParent.cpp:247
7 xul.dll unsigned long mozilla::net::StunAddrsRequestParent::MDNSServiceWrapper::Release media/mtransport/ipc/StunAddrsRequestParent.cpp:261
8 xul.dll void mozilla::net::StunAddrsRequestParent::ActorDestroy media/mtransport/ipc/StunAddrsRequestParent.cpp:141
9 xul.dll mozilla::ipc::IProtocol::DestroySubtree ipc/glue/ProtocolUtils.cpp:572

These shutdownhang crashes on Windows started showing up in Firefox 72. The first affected Nightly was 72.0a1, build 20191107094905.

Assignee: nobody → dminor
Blocks: 1544770
Priority: -- → P2

We don't currently set a write timeout on the UDP socket, which could cause
write calls to block indefinitely. It is possible that these calls block
long enough to cause the shutdown hangs seen in Bug 1601992.

This also bumps the number of times we retry failed queries from 2 to 3 to
account for the increased likelihood of not sending a query or answer.
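
A minimal sketch of the idea in this patch description, using only the Rust standard library. The 100 ms timeout, the helper names, and the attempt limit of 3 are assumptions for illustration; the bug does not state the timeout value. The actual patch retries whole queries, whereas this sketch simply retries the send.

```rust
use std::io;
use std::net::UdpSocket;
use std::time::Duration;

// Hypothetical values; the timeout used by the real patch is not given in this bug.
const WRITE_TIMEOUT: Duration = Duration::from_millis(100);
const MAX_ATTEMPTS: u32 = 3;

/// Bind a UDP socket whose sends fail after `WRITE_TIMEOUT`
/// instead of blocking indefinitely.
fn bind_with_write_timeout(addr: &str) -> io::Result<UdpSocket> {
    let socket = UdpSocket::bind(addr)?;
    socket.set_write_timeout(Some(WRITE_TIMEOUT))?;
    Ok(socket)
}

/// Try to send `payload` up to `MAX_ATTEMPTS` times.
/// Returns the number of attempts used, or the last error seen.
fn send_with_retries(socket: &UdpSocket, payload: &[u8], dest: &str) -> io::Result<u32> {
    let mut last_err = None;
    for attempt in 1..=MAX_ATTEMPTS {
        match socket.send_to(payload, dest) {
            Ok(_) => return Ok(attempt),
            Err(e) => last_err = Some(e),
        }
    }
    Err(last_err.expect("MAX_ATTEMPTS is non-zero"))
}

fn main() -> io::Result<()> {
    // A local receiver stands in for the multicast destination.
    let receiver = UdpSocket::bind("127.0.0.1:0")?;
    let dest = receiver.local_addr()?.to_string();
    let sender = bind_with_write_timeout("127.0.0.1:0")?;
    let attempts = send_with_retries(&sender, b"query", &dest)?;
    println!("sent after {} attempt(s)", attempts);
    Ok(())
}
```

With a timeout set, a send that cannot complete returns an error rather than stalling the thread, which is why a higher retry count is a reasonable companion change.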

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla73

Is this worth uplifting to 72?

Flags: needinfo?(dminor)

Comment on attachment 9114850 [details]
Bug 1601992 - Set write timeout on udp socket; r=ng!

Beta/Release Uplift Approval Request

  • User impact if declined: Shutdown hangs / crashes.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This sets a write timeout on the socket to prevent it from blocking indefinitely. This could cause more sends to fail, but that should be ok because we already have provision for timing out and retrying requests.
  • String changes made/needed: None
Flags: needinfo?(dminor)
Attachment #9114850 - Flags: approval-mozilla-beta?

Comment on attachment 9114850 [details]
Bug 1601992 - Set write timeout on udp socket; r=ng!

Tries to fix some shutdown hangs; approved for 72.0b7.

Attachment #9114850 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Looks like we're still seeing this hang; we'll have to see whether the patch at least reduced its frequency.

So, doing what I should have done in the first place, I looked at the mdns_service thread in a few of the crash reports. In most cases the hang occurs on the Rust side when we try to drop the Receiver [1] as the thread exits here [2].

There are a few where the hang actually shows up while waiting on a read from the UDP socket [3]. The timeout there is set to 10 ms, so it seems unlikely that this read is pushing us over the 60-second grace period allowed for shutdown.

The fix in Bug 1603349 might change the behaviour here, so I think I'll try to get that uplifted before I investigate this one further.

[1] https://github.com/rust-lang/rust/blob/625451e376bb2e5283fc4741caa0a3e8a2ca4d54/src/libstd/sync/mpsc/mod.rs#L1551
[2] https://hg.mozilla.org/releases/mozilla-beta/annotate/cd6d215cb8fa8d2aba4c4a105a0df668a22965cc/media/mtransport/mdns_service/src/lib.rs#l490
[3] https://hg.mozilla.org/releases/mozilla-beta/annotate/cd6d215cb8fa8d2aba4c4a105a0df668a22965cc/media/mtransport/mdns_service/src/lib.rs#l382
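
The shutdown path discussed in this comment can be sketched roughly as follows. This is not the mdns_service code: the `Message`, `Service`, `start`, and `stop` names are hypothetical, and joining the thread in `stop` is an assumption about how the caller waits. The sketch only shows the general pattern of a worker that polls a UDP socket with a 10 ms read timeout, checks an mpsc channel for a stop request, and drops its Receiver when it exits.

```rust
use std::io::ErrorKind;
use std::net::UdpSocket;
use std::sync::mpsc::{channel, Sender, TryRecvError};
use std::thread::{self, JoinHandle};
use std::time::Duration;

// Hypothetical message type; the real service's messages are not shown in this bug.
enum Message {
    Stop,
}

struct Service {
    sender: Sender<Message>,
    handle: JoinHandle<u64>,
}

fn start() -> std::io::Result<Service> {
    let socket = UdpSocket::bind("127.0.0.1:0")?;
    // 10 ms read timeout, the value mentioned in the comment.
    socket.set_read_timeout(Some(Duration::from_millis(10)))?;
    let (sender, receiver) = channel();
    let handle = thread::spawn(move || {
        let mut buf = [0u8; 1500];
        let mut polls = 0u64;
        loop {
            // Check for a stop request without blocking.
            match receiver.try_recv() {
                Ok(Message::Stop) | Err(TryRecvError::Disconnected) => break,
                Err(TryRecvError::Empty) => {}
            }
            match socket.recv_from(&mut buf) {
                Ok(_) => {} // a real service would parse the packet here
                // A read timeout surfaces as WouldBlock on Unix and TimedOut on Windows.
                Err(e) if e.kind() == ErrorKind::WouldBlock
                    || e.kind() == ErrorKind::TimedOut => {}
                Err(_) => break,
            }
            polls += 1;
        }
        // `receiver` and `socket` are dropped when the closure returns.
        polls
    });
    Ok(Service { sender, handle })
}

fn stop(service: Service) -> u64 {
    // Ignore the send error if the thread has already exited.
    let _ = service.sender.send(Message::Stop);
    service.handle.join().expect("service thread panicked")
}

fn main() -> std::io::Result<()> {
    let service = start()?;
    thread::sleep(Duration::from_millis(50));
    let polls = stop(service);
    println!("service thread exited after {} polls", polls);
    Ok(())
}
```

In this pattern the worker notices a stop request within roughly one read timeout, which fits the observation that a 10 ms read is unlikely to account for a 60-second hang by itself.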

Reopening as this is still happening.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

In builds that include the fix from Bug 1603349 (beta 9 and 10), I'm only seeing hangs in the UDP socket read, although lower usage over the holiday season may be skewing the numbers.

Depends on: 1605862
Status: REOPENED → NEW
Target Milestone: mozilla73 → ---
Assignee: dminor → nobody
Severity: normal → S3

Closing because no crashes reported for 12 weeks.

Status: NEW → RESOLVED
Closed: 5 years ago → 2 years ago
Resolution: --- → WORKSFORME