Closed
Bug 1411977
Opened 7 years ago
Closed 7 years ago
RUN_ON_THREAD() should not "queue jump" when dispatching to same-thread
Categories
(Core :: WebRTC: Networking, enhancement, P2)
Tracking
()
RESOLVED
FIXED
mozilla60
People
(Reporter: jesup, Assigned: bwc)
References
(Depends on 3 open bugs)
Details
Attachments
(7 files, 2 obsolete files)
(deleted),
patch
|
drno
:
review+
|
Details | Diff | Splinter Review |
(deleted),
text/x-review-board-request
|
drno
:
review+
|
Details |
(deleted),
text/x-review-board-request
|
drno
:
review+
|
Details |
(deleted),
text/x-review-board-request
|
drno
:
review+
|
Details |
(deleted),
text/x-review-board-request
|
drno
:
review+
|
Details |
(deleted),
text/x-review-board-request
|
drno
:
review+
|
Details |
(deleted),
text/x-review-board-request
|
drno
:
review+
|
Details |
RUN_ON_THREAD gained a same-thread "queue jump" property due to bug 824851, which landed in dec 2012. We short-circuit when dispatching to the same thread (which 'hops' to the front of the event queue, basically). That "solved" the bug at the time, but was the wrong fix - the real bug there was "RUN_ON_THREAD(..., WrapRunnableRet(this,...));" -- 'this' didn't hold a ref to the pipeline, so no surprise it crashed in some cases. We avoid that now, and should explicitly document the lifetime guarantee for calls passing a bare 'this' to WrapRunnable.
The unexpected queue-jumping has led to a number of bits of code that are fragile or having timing issues, depending on if there's a related runnable already in the queue or not (that'll get jumped). Disabling queue-jumping has exposed a number of bugs or hidden assumptions, like that a runnable will run and take a ref before the caller can drop it's ref, or that a particular state will be exposed to JS. Also, we know that with e10s, negotiation of peerconnections and connection of ICE fails when we remove this.
We should remove queue-jumping-by-default, and if there is a need for some calling code to queue-jump, it should explicitly request that (via a different call or an optional parameter). Need for queue jumping should be clearly exposed in the calling source code and justified.
The belief is that removing this and cleaning up the accidental dependencies may cut into our unexplained-timeouts in tests, and will reduce the chance of introducing new timing bugs.
This will be the master bug for removing queue-jumping, and will have a number of blocking bugs. I expect it will make sense to land it ASAP at the start of 59 Nightly.
Reporter | ||
Comment 1•7 years ago
|
||
Attachment #8922337 -
Flags: review?(drno)
Comment 2•7 years ago
|
||
Comment on attachment 8922337 [details] [diff] [review]
don't bypass event queue in RUN_ON_THREAD
Review of attachment 8922337 [details] [diff] [review]:
-----------------------------------------------------------------
LGTM
Attachment #8922337 -
Flags: review?(drno) → review+
Comment hidden (mozreview-request) |
Updated•7 years ago
|
Attachment #8924014 -
Attachment is obsolete: true
Attachment #8924014 -
Flags: review?(rjesup)
Updated•7 years ago
|
Attachment #8924014 -
Attachment is obsolete: false
Updated•7 years ago
|
Attachment #8924014 -
Attachment is obsolete: true
Reporter | ||
Comment 4•7 years ago
|
||
mozreview-review |
Comment on attachment 8924014 [details]
Bug 1411977: dispatch inside SingletonThreadHolder only if needed.
https://reviewboard.mozilla.org/r/195222/#review200288
r+, though it doesn't seem to give me the option..
Reporter | ||
Comment 5•7 years ago
|
||
reviewed on the right bug...
Assignee | ||
Comment 6•7 years ago
|
||
Assignee | ||
Comment 7•7 years ago
|
||
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → docfaraday
Comment 14•7 years ago
|
||
mozreview-review |
Comment on attachment 8945583 [details]
Bug 1411977 - Part 1: Stop queue jumping in RUN_ON_THREAD.
https://reviewboard.mozilla.org/r/215724/#review221492
Attachment #8945583 -
Flags: review?(drno) → review+
Comment 15•7 years ago
|
||
mozreview-review |
Comment on attachment 8945584 [details]
Bug 1411977 - Part 2: Stop using sync dispatch and queue jumping with SingletonThreadHolder.
https://reviewboard.mozilla.org/r/215726/#review221506
LGMT
::: media/mtransport/nr_socket_prsock.cpp:1134
(Diff revision 1)
> - // also guarantees socket_child_ is released from the io_thread, and
> - // tells the SingletonThreadHolder we're done with it
> -
> #if defined(MOZILLA_INTERNAL_API)
> // close(), but transfer the socket_child_ reference to die as well
> + // dispatches back to STS to call ReleaseUse, to avoid shutting down the IO
The comment threw me off for a little bit as it clearly does not dispatch to the STS thread right here. Maybe change to something like "destroy_i() dispatches internally back to STS...".
Attachment #8945584 -
Flags: review?(drno) → review+
Comment 16•7 years ago
|
||
mozreview-review |
Comment on attachment 8945585 [details]
Bug 1411977 - Part 3: Clear the SingletonThreadHolder _after_ thread shutdowns are finished, not before they are started.
https://reviewboard.mozilla.org/r/215728/#review221510
Attachment #8945585 -
Flags: review?(drno) → review+
Comment 17•7 years ago
|
||
mozreview-review |
Comment on attachment 8945586 [details]
Bug 1411977 - Part 4: Only try to dispatch the release of TransportLayers when there is a target thread.
https://reviewboard.mozilla.org/r/215730/#review221512
Attachment #8945586 -
Flags: review?(drno) → review+
Comment 18•7 years ago
|
||
mozreview-review |
Comment on attachment 8945587 [details]
Bug 1411977 - Part 5: Don't pass a pointer to a temporary to NotifyDataChannel_m.
https://reviewboard.mozilla.org/r/215732/#review221528
Attachment #8945587 -
Flags: review?(drno) → review+
Comment 19•7 years ago
|
||
mozreview-review |
Comment on attachment 8945588 [details]
Bug 1411977 - Part 6: Don't unwind the stack when firing onsignalingstatechange.
https://reviewboard.mozilla.org/r/215734/#review221568
::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp:3217
(Diff revision 1)
> if (!pco) {
> return;
> }
> +
> WrappableJSErrorResult rv;
> - RUN_ON_THREAD(mThread,
> + pco->OnStateChange(PCObserverStateType::IceConnectionState, rv);
A couple of lines below in PeerConnectionImpl::IceGatheringStateChange() we call pco->OnStateChange() only via mThread->Dispatch() (to avoid queue jumping no longer RUN_ON_THREAD).
Which of the two is right? Can we make them to them same thing either way please?
Attachment #8945588 -
Flags: review?(drno) → review+
Assignee | ||
Comment 20•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8945588 [details]
Bug 1411977 - Part 6: Don't unwind the stack when firing onsignalingstatechange.
https://reviewboard.mozilla.org/r/215734/#review221568
> A couple of lines below in PeerConnectionImpl::IceGatheringStateChange() we call pco->OnStateChange() only via mThread->Dispatch() (to avoid queue jumping no longer RUN_ON_THREAD).
>
> Which of the two is right? Can we make them to them same thing either way please?
Maybe we can do this. There's some convoluted logic around making sure that gathering state changes, events for trickle candidates, and callbacks for SLD success happen in the proper order here. Maybe transceivers will let us untangle some of it?
Comment hidden (mozreview-request) |
Comment 22•7 years ago
|
||
mozreview-review |
Comment on attachment 8946310 [details]
Bug 1411977 - Part 7: Stop using extra dispatches for ICE candidates and gathering state changes.
https://reviewboard.mozilla.org/r/216280/#review222142
::: dom/media/PeerConnection.js:382
(Diff revision 1)
> // canTrickle == null means unknown; when a remote description is received it
> // is set to true or false based on the presence of the "trickle" ice-option
> this._canTrickle = null;
>
> // States
> - this._iceGatheringState = this._iceConnectionState = "new";
> + this._iceConnectionState = "new";
What is the reason to delete the iceGatheringState, but not the iceConnectionState?
::: dom/media/PeerConnection.js:1798
(Diff revision 1)
> //
> // complete The ICE agent has completed gathering. Events such as adding
> // a new interface or a new TURN server will cause the state to
> // go back to gathering.
> //
> - handleIceGatheringStateChange(gatheringState) {
> + handleIceGatheringStateChange() {
Lets delete this function all together and call directly notifyIceGatheringStateChange() instead.
::: dom/media/PeerConnection.js:1813
(Diff revision 1)
> case "IceConnectionState":
> this.handleIceConnectionStateChange(this._dompc._pc.iceConnectionState);
> break;
>
> case "IceGatheringState":
> - this.handleIceGatheringStateChange(this._dompc._pc.iceGatheringState);
> + this.handleIceGatheringStateChange();
I think this could now directly call this._dompc.notifyIceGatheringStateChange().
::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp:3239
(Diff revision 1)
> - NS_DISPATCH_NORMAL);
>
> if (mIceGatheringState == PCImplIceGatheringState::Complete) {
> SendLocalIceCandidateToContent(0, "", "");
> }
> +
According to spec https://www.w3.org/TR/webrtc/#update-the-ice-gathering-state we are suppose to first fire the event and then the empty candidate. Let's make that change here while we are at it.
Attachment #8946310 -
Flags: review?(drno) → review+
Assignee | ||
Comment 23•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8946310 [details]
Bug 1411977 - Part 7: Stop using extra dispatches for ICE candidates and gathering state changes.
https://reviewboard.mozilla.org/r/216280/#review222142
> What is the reason to delete the iceGatheringState, but not the iceConnectionState?
Scope creep. ;)
> According to spec https://www.w3.org/TR/webrtc/#update-the-ice-gathering-state we are suppose to first fire the event and then the empty candidate. Let's make that change here while we are at it.
This makes our test-cases sad. Let's drop part 7, and instead do this work in a new bug.
Assignee | ||
Updated•7 years ago
|
Attachment #8946310 -
Attachment is obsolete: true
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 29•7 years ago
|
||
Pushed by bcampen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4480819ddb2b
Part 1: Stop queue jumping in RUN_ON_THREAD. r=drno
https://hg.mozilla.org/integration/autoland/rev/d6c47acf39b0
Part 2: Stop using sync dispatch and queue jumping with SingletonThreadHolder. r=drno
https://hg.mozilla.org/integration/autoland/rev/481ba7607ae2
Part 3: Clear the SingletonThreadHolder _after_ thread shutdowns are finished, not before they are started. r=drno
https://hg.mozilla.org/integration/autoland/rev/2a59e9ad8ba0
Part 4: Only try to dispatch the release of TransportLayers when there is a target thread. r=drno
https://hg.mozilla.org/integration/autoland/rev/ef56214b5e2c
Part 5: Don't pass a pointer to a temporary to NotifyDataChannel_m. r=drno
https://hg.mozilla.org/integration/autoland/rev/d0cd0ad9ca5d
Part 6: Don't unwind the stack when firing onsignalingstatechange. r=drno
Comment 30•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/4480819ddb2b
https://hg.mozilla.org/mozilla-central/rev/d6c47acf39b0
https://hg.mozilla.org/mozilla-central/rev/481ba7607ae2
https://hg.mozilla.org/mozilla-central/rev/2a59e9ad8ba0
https://hg.mozilla.org/mozilla-central/rev/ef56214b5e2c
https://hg.mozilla.org/mozilla-central/rev/d0cd0ad9ca5d
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox60:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
You need to log in
before you can comment on or make changes to this bug.
Description
•