Intermittent deadlock in webrtc/RTCDataChannel-close.html breaking tsan?
Categories: Core :: WebRTC: Networking, defect, P2
People: Reporter: bwc, Unassigned
This is different from what I observed while working on bug 1635911:
[task 2021-08-31T23:32:14.266Z] 23:32:14 INFO - PID 1268 | [Child 1575: Main Thread]: D/DataChannel 7b2c00021160: Close()ing 7b3400034750
[task 2021-08-31T23:33:13.494Z] 23:33:13 INFO - PID 1268 | [Child 1575: Unnamed thread 7b440003bd80]: D/DataChannel In receive_cb, ulp_info=41
[task 2021-08-31T23:33:13.495Z] 23:33:13 INFO - PID 1268 | [Child 1575: Unnamed thread 7b440003bd80]: D/DataChannel In ReceiveCallback
[task 2021-08-31T23:35:21.451Z] 23:35:21 INFO - Got timeout in harness
[task 2021-08-31T23:35:21.454Z] 23:35:21 INFO - TEST-UNEXPECTED-TIMEOUT | /webrtc/RTCDataChannel-close.html | TestRunner hit external timeout (this may indicate a hang)
[task 2021-08-31T23:35:21.454Z] 23:35:21 INFO - TEST-INFO took 195004ms
That last log line is here:
We do not see the subsequent logging, which means we are in the case where !!data:
From the logging, we are on an unnamed thread (in other words, we're getting a callback from libusrsctp), so we'll end up trying to lock here:
Right before that, we see the "Close()ing" line; this ends up locking the same mutex here:
https://searchfox.org/mozilla-central/source/netwerk/sctp/datachannel/DataChannel.cpp#2989
It looks like there might be cases where we call into libusrsctp while holding that lock. That could cause a lock-order inversion, deadlocking the main thread, which would explain why we stop seeing logging from the entire process. This is just a hypothesis, though.
Updated • 3 years ago (Reporter)

Comment 1 • 3 years ago (Reporter)
Maybe related to bug 1735972?
Comment 2 • 2 years ago
Are we still seeing this? Thinking this could be an S3 rather than an S2, since it's not user-facing.
Comment 3 • 2 years ago (Reporter)
I'm fairly sure that bug 1795697 fixes this.