Closed Bug 1411977 Opened 7 years ago Closed 7 years ago

RUN_ON_THREAD() should not "queue jump" when dispatching to same-thread

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla60

Tracking Flags:

Tracking

Status

firefox-esr52

---

wontfix

firefox56

---

wontfix

firefox57

---

wontfix

firefox58

---

wontfix

firefox60

---

fixed

People

(Reporter: jesup, Assigned: bwc)

References

(Depends on 3 open bugs)

Details

Attachments

(7 files, 2 obsolete files)

don't bypass event queue in RUN_ON_THREAD 7 years ago Randell Jesup [:jesup] (needinfo me) (deleted), patch	drno : review+	Details \| Diff \| Splinter Review
Bug 1411977: dispatch inside SingletonThreadHolder only if needed. 7 years ago Nils Ohlmeier [:drno] (deleted), text/x-review-board-request		Details
Bug 1411977 - Part 1: Stop queue jumping in RUN_ON_THREAD. 7 years ago Byron Campen [:bwc] (deleted), text/x-review-board-request	drno : review+	Details
Bug 1411977 - Part 2: Stop using sync dispatch and queue jumping with SingletonThreadHolder. 7 years ago Byron Campen [:bwc] (deleted), text/x-review-board-request	drno : review+	Details
Bug 1411977 - Part 3: Clear the SingletonThreadHolder _after_ thread shutdowns are finished, not before they are started. 7 years ago Byron Campen [:bwc] (deleted), text/x-review-board-request	drno : review+	Details
Bug 1411977 - Part 4: Only try to dispatch the release of TransportLayers when there is a target thread. 7 years ago Byron Campen [:bwc] (deleted), text/x-review-board-request	drno : review+	Details
Bug 1411977 - Part 5: Don't pass a pointer to a temporary to NotifyDataChannel_m. 7 years ago Byron Campen [:bwc] (deleted), text/x-review-board-request	drno : review+	Details
Bug 1411977 - Part 6: Don't unwind the stack when firing onsignalingstatechange. 7 years ago Byron Campen [:bwc] (deleted), text/x-review-board-request	drno : review+	Details
Bug 1411977 - Part 7: Stop using extra dispatches for ICE candidates and gathering state changes. 7 years ago Byron Campen [:bwc] (deleted), text/x-review-board-request	drno : review+	Details

Randell Jesup [:jesup] (needinfo me)

Reporter

Description

•

7 years ago

RUN_ON_THREAD gained a same-thread "queue jump" property due to bug 824851, which landed in dec 2012. We short-circuit when dispatching to the same thread (which 'hops' to the front of the event queue, basically). That "solved" the bug at the time, but was the wrong fix - the real bug there was "RUN_ON_THREAD(..., WrapRunnableRet(this,...));" -- 'this' didn't hold a ref to the pipeline, so no surprise it crashed in some cases. We avoid that now, and should explicitly document the lifetime guarantee for calls passing a bare 'this' to WrapRunnable. The unexpected queue-jumping has led to a number of bits of code that are fragile or having timing issues, depending on if there's a related runnable already in the queue or not (that'll get jumped). Disabling queue-jumping has exposed a number of bugs or hidden assumptions, like that a runnable will run and take a ref before the caller can drop it's ref, or that a particular state will be exposed to JS. Also, we know that with e10s, negotiation of peerconnections and connection of ICE fails when we remove this. We should remove queue-jumping-by-default, and if there is a need for some calling code to queue-jump, it should explicitly request that (via a different call or an optional parameter). Need for queue jumping should be clearly exposed in the calling source code and justified. The belief is that removing this and cleaning up the accidental dependencies may cut into our unexplained-timeouts in tests, and will reduce the chance of introducing new timing bugs. This will be the master bug for removing queue-jumping, and will have a number of blocking bugs. I expect it will make sense to land it ASAP at the start of 59 Nightly.

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 1

•

7 years ago

Attached patch don't bypass event queue in RUN_ON_THREAD (deleted) — Details — Splinter Review

Attachment #8922337 - Flags: review?(drno)

Randell Jesup [:jesup] (needinfo me)

Reporter

Updated

•

7 years ago

Depends on: 1411982

Nils Ohlmeier [:drno]

Updated

•

7 years ago

Depends on: 1412122

Nils Ohlmeier [:drno]

Comment 2

•

7 years ago

Comment on attachment 8922337 [details] [diff] [review] don't bypass event queue in RUN_ON_THREAD Review of attachment 8922337 [details] [diff] [review]: ----------------------------------------------------------------- LGTM

Attachment #8922337 - Flags: review?(drno) → review+

Nils Ohlmeier [:drno]

Updated

•

7 years ago

Depends on: 1412123

Nils Ohlmeier [:drno]

Updated

•

7 years ago

Depends on: 1413356

Comment hidden (mozreview-request)

Nils Ohlmeier [:drno]

Updated

•

7 years ago

Attachment #8924014 - Attachment is obsolete: true

Attachment #8924014 - Flags: review?(rjesup)

Nils Ohlmeier [:drno]

Updated

•

7 years ago

Attachment #8924014 - Attachment is obsolete: false

Nils Ohlmeier [:drno]

Updated

•

7 years ago

Attachment #8924014 - Attachment is obsolete: true

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 4

•

7 years ago

mozreview-review

Comment on attachment 8924014 [details] Bug 1411977: dispatch inside SingletonThreadHolder only if needed. https://reviewboard.mozilla.org/r/195222/#review200288 r+, though it doesn't seem to give me the option..

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 5

•

7 years ago

reviewed on the right bug...

Byron Campen [:bwc]

Assignee

Comment 6

•

7 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=be7c923dd292

Byron Campen [:bwc]

Assignee

Comment 7

•

7 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b8812587df82

Comment hidden (mozreview-request)

Byron Campen [:bwc]

Assignee

Updated

•

7 years ago

Assignee: nobody → docfaraday

Nils Ohlmeier [:drno]

Comment 14

•

7 years ago

mozreview-review

Comment on attachment 8945583 [details] Bug 1411977 - Part 1: Stop queue jumping in RUN_ON_THREAD. https://reviewboard.mozilla.org/r/215724/#review221492

Attachment #8945583 - Flags: review?(drno) → review+

Nils Ohlmeier [:drno]

Comment 15

•

7 years ago

mozreview-review

Comment on attachment 8945584 [details] Bug 1411977 - Part 2: Stop using sync dispatch and queue jumping with SingletonThreadHolder. https://reviewboard.mozilla.org/r/215726/#review221506 LGMT ::: media/mtransport/nr_socket_prsock.cpp:1134 (Diff revision 1) > - // also guarantees socket_child_ is released from the io_thread, and > - // tells the SingletonThreadHolder we're done with it > - > #if defined(MOZILLA_INTERNAL_API) > // close(), but transfer the socket_child_ reference to die as well > + // dispatches back to STS to call ReleaseUse, to avoid shutting down the IO The comment threw me off for a little bit as it clearly does not dispatch to the STS thread right here. Maybe change to something like "destroy_i() dispatches internally back to STS...".

Attachment #8945584 - Flags: review?(drno) → review+

Nils Ohlmeier [:drno]

Comment 16

•

7 years ago

mozreview-review

Comment on attachment 8945585 [details] Bug 1411977 - Part 3: Clear the SingletonThreadHolder _after_ thread shutdowns are finished, not before they are started. https://reviewboard.mozilla.org/r/215728/#review221510

Attachment #8945585 - Flags: review?(drno) → review+

Nils Ohlmeier [:drno]

Comment 17

•

7 years ago

mozreview-review

Comment on attachment 8945586 [details] Bug 1411977 - Part 4: Only try to dispatch the release of TransportLayers when there is a target thread. https://reviewboard.mozilla.org/r/215730/#review221512

Attachment #8945586 - Flags: review?(drno) → review+

Nils Ohlmeier [:drno]

Comment 18

•

7 years ago

mozreview-review

Comment on attachment 8945587 [details] Bug 1411977 - Part 5: Don't pass a pointer to a temporary to NotifyDataChannel_m. https://reviewboard.mozilla.org/r/215732/#review221528

Attachment #8945587 - Flags: review?(drno) → review+

Nils Ohlmeier [:drno]

Comment 19

•

7 years ago

mozreview-review

Comment on attachment 8945588 [details] Bug 1411977 - Part 6: Don't unwind the stack when firing onsignalingstatechange. https://reviewboard.mozilla.org/r/215734/#review221568 ::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp:3217 (Diff revision 1) > if (!pco) { > return; > } > + > WrappableJSErrorResult rv; > - RUN_ON_THREAD(mThread, > + pco->OnStateChange(PCObserverStateType::IceConnectionState, rv); A couple of lines below in PeerConnectionImpl::IceGatheringStateChange() we call pco->OnStateChange() only via mThread->Dispatch() (to avoid queue jumping no longer RUN_ON_THREAD). Which of the two is right? Can we make them to them same thing either way please?

Attachment #8945588 - Flags: review?(drno) → review+

Byron Campen [:bwc]

Assignee

Comment 20

•

7 years ago

mozreview-review-reply

Comment on attachment 8945588 [details] Bug 1411977 - Part 6: Don't unwind the stack when firing onsignalingstatechange. https://reviewboard.mozilla.org/r/215734/#review221568 > A couple of lines below in PeerConnectionImpl::IceGatheringStateChange() we call pco->OnStateChange() only via mThread->Dispatch() (to avoid queue jumping no longer RUN_ON_THREAD). > > Which of the two is right? Can we make them to them same thing either way please? Maybe we can do this. There's some convoluted logic around making sure that gathering state changes, events for trickle candidates, and callbacks for SLD success happen in the proper order here. Maybe transceivers will let us untangle some of it?

Comment hidden (mozreview-request)

Nils Ohlmeier [:drno]

Comment 22

•

7 years ago

mozreview-review

Comment on attachment 8946310 [details] Bug 1411977 - Part 7: Stop using extra dispatches for ICE candidates and gathering state changes. https://reviewboard.mozilla.org/r/216280/#review222142 ::: dom/media/PeerConnection.js:382 (Diff revision 1) > // canTrickle == null means unknown; when a remote description is received it > // is set to true or false based on the presence of the "trickle" ice-option > this._canTrickle = null; > > // States > - this._iceGatheringState = this._iceConnectionState = "new"; > + this._iceConnectionState = "new"; What is the reason to delete the iceGatheringState, but not the iceConnectionState? ::: dom/media/PeerConnection.js:1798 (Diff revision 1) > // > // complete The ICE agent has completed gathering. Events such as adding > // a new interface or a new TURN server will cause the state to > // go back to gathering. > // > - handleIceGatheringStateChange(gatheringState) { > + handleIceGatheringStateChange() { Lets delete this function all together and call directly notifyIceGatheringStateChange() instead. ::: dom/media/PeerConnection.js:1813 (Diff revision 1) > case "IceConnectionState": > this.handleIceConnectionStateChange(this._dompc._pc.iceConnectionState); > break; > > case "IceGatheringState": > - this.handleIceGatheringStateChange(this._dompc._pc.iceGatheringState); > + this.handleIceGatheringStateChange(); I think this could now directly call this._dompc.notifyIceGatheringStateChange(). ::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp:3239 (Diff revision 1) > - NS_DISPATCH_NORMAL); > > if (mIceGatheringState == PCImplIceGatheringState::Complete) { > SendLocalIceCandidateToContent(0, "", ""); > } > + According to spec https://www.w3.org/TR/webrtc/#update-the-ice-gathering-state we are suppose to first fire the event and then the empty candidate. Let's make that change here while we are at it.

Attachment #8946310 - Flags: review?(drno) → review+

Byron Campen [:bwc]

Assignee

Comment 23

•

7 years ago

mozreview-review-reply

Comment on attachment 8946310 [details] Bug 1411977 - Part 7: Stop using extra dispatches for ICE candidates and gathering state changes. https://reviewboard.mozilla.org/r/216280/#review222142 > What is the reason to delete the iceGatheringState, but not the iceConnectionState? Scope creep. ;) > According to spec https://www.w3.org/TR/webrtc/#update-the-ice-gathering-state we are suppose to first fire the event and then the empty candidate. Let's make that change here while we are at it. This makes our test-cases sad. Let's drop part 7, and instead do this work in a new bug.

Byron Campen [:bwc]

Assignee

Updated

•

7 years ago

Attachment #8946310 - Attachment is obsolete: true

Comment hidden (mozreview-request)

Pulsebot

Comment 29

•

7 years ago

Pushed by bcampen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4480819ddb2b Part 1: Stop queue jumping in RUN_ON_THREAD. r=drno https://hg.mozilla.org/integration/autoland/rev/d6c47acf39b0 Part 2: Stop using sync dispatch and queue jumping with SingletonThreadHolder. r=drno https://hg.mozilla.org/integration/autoland/rev/481ba7607ae2 Part 3: Clear the SingletonThreadHolder _after_ thread shutdowns are finished, not before they are started. r=drno https://hg.mozilla.org/integration/autoland/rev/2a59e9ad8ba0 Part 4: Only try to dispatch the release of TransportLayers when there is a target thread. r=drno https://hg.mozilla.org/integration/autoland/rev/ef56214b5e2c Part 5: Don't pass a pointer to a temporary to NotifyDataChannel_m. r=drno https://hg.mozilla.org/integration/autoland/rev/d0cd0ad9ca5d Part 6: Don't unwind the stack when firing onsignalingstatechange. r=drno

Raul Gurzau (:RaulG)

Comment 30

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/4480819ddb2b https://hg.mozilla.org/mozilla-central/rev/d6c47acf39b0 https://hg.mozilla.org/mozilla-central/rev/481ba7607ae2 https://hg.mozilla.org/mozilla-central/rev/2a59e9ad8ba0 https://hg.mozilla.org/mozilla-central/rev/ef56214b5e2c https://hg.mozilla.org/mozilla-central/rev/d0cd0ad9ca5d

Status: NEW → RESOLVED

Closed: 7 years ago

status-firefox60: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla60

You need to log in before you can comment on or make changes to this bug.