Open Bug 1749090 Opened 3 years ago Updated 3 years ago

Intermittent "failed" iceConnectionState on one peer yet none on the other in deterministic zero-candidate case

Categories

(Core :: WebRTC, defect, P3)

Firefox 96
defect

Tracking Status
firefox-esr91 --- wontfix
firefox95 --- wontfix
firefox96 --- wontfix
firefox97 --- wontfix
firefox98 --- fix-optional

People

(Reporter: jib, Unassigned)

References

(Regression)

Details

(Keywords: regression)

STRs:

  1. Open https://jsfiddle.net/jib1/uq6pk40r/28/ (which blocks all candidates)
  2. Wait 10 seconds.
  3. If you see 4 lines of output instead of 5, then ↻ refresh the page and wait another 10 seconds (try a few times to be sure)

Actual results:

pc1.iceGatheringState = gathering
pc1.iceGatheringState = complete
pc2.iceGatheringState = gathering
pc2.iceGatheringState = complete
pc2.iceConnectionState = failed

Actual results (pre-regression):

pc1.iceGatheringState = gathering
pc1.iceGatheringState = complete
pc2.iceGatheringState = gathering
pc2.iceGatheringState = complete

Expected results after latest spec change:

pc1.iceGatheringState = gathering
pc1.iceGatheringState = complete
pc2.iceGatheringState = gathering
pc2.iceGatheringState = complete
pc1.iceConnectionState = failed
pc2.iceConnectionState = failed
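
To make the three outcomes above easier to tell apart when retrying the STR, here is a minimal sketch of a log classifier. It is not part of the fiddle; the names `failedPeers` and `classifyOutcome` and the outcome labels are hypothetical, and it assumes the output lines have exactly the `pcN.iceConnectionState = failed` shape shown above.

```typescript
// Hypothetical helper, not taken from the fiddle: classify one run's output lines.
type Outcome = "pre-regression" | "inconsistent" | "expected-per-spec";

// Record which peers logged "iceConnectionState = failed".
function failedPeers(lines: string[]): { [peer: string]: boolean } {
  const failed: { [peer: string]: boolean } = {};
  for (const line of lines) {
    const match = /^(pc\d+)\.iceConnectionState = failed$/.exec(line.trim());
    if (match) {
      failed[match[1]] = true;
    }
  }
  return failed;
}

function classifyOutcome(lines: string[]): Outcome {
  const failed = failedPeers(lines);
  const pc1Failed = failed["pc1"] === true;
  const pc2Failed = failed["pc2"] === true;
  if (!pc1Failed && !pc2Failed) {
    return "pre-regression"; // neither peer fails (the 95 behavior)
  }
  if (pc1Failed && pc2Failed) {
    return "expected-per-spec"; // both peers fail
  }
  return "inconsistent"; // only one peer fails (this bug)
}
```

Feeding it the "Actual results" lines yields `"inconsistent"`, since only pc2 reports "failed".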

Regression range:

27:38.45 INFO: No more integration revisions, bisection finished.
27:38.45 INFO: Last good revision: c8fdcf75317d40a3ead539a959f748c940d0a5c9
27:38.45 INFO: First bad revision: 1495ca5ef535f8ad692a3a579ca42eddc14f39a8
27:38.45 INFO: Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=c8fdcf75317d40a3ead539a959f748c940d0a5c9&tochange=1495ca5ef535f8ad692a3a579ca42eddc14f39a8
27:41.59 INFO: ************* Switching to autoland by process of elimination (no branch detected in commit message)
27:42.31 ERROR: Unable to exploit the merge commit. Origin branch is mozilla-central, and the commit message for 1495ca5e was: [unrelated]

In words:

  • In 95, neither peer connection ever went to "failed", which was at least consistent across peer connections and similar to other browsers.
  • In 96, after ~5 seconds, pc2 intermittently goes to "failed" while pc1 never does, so the behavior is both intermittent and inconsistent between the two peers.
  • With the latest spec change, the spec now says to go to "failed" immediately once gathering has completed on both ends with zero candidates.

So instead of restoring the 95 behavior, the correct fix at this point would be to fire "failed" on both pc1 and pc2 as soon as gathering has completed on both ends with zero candidates (which I think happens in <5 seconds).
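
The rule described above can be sketched as a small predicate. This is only a model of the behavior as paraphrased in this bug, not spec text and not Firefox code; `PeerSnapshot`, `candidateCount`, and `shouldFailImmediately` are hypothetical names introduced for illustration.

```typescript
// Hypothetical model of the rule as paraphrased in this bug; not spec text or Firefox code.
interface PeerSnapshot {
  gatheringState: "new" | "gathering" | "complete";
  candidateCount: number; // candidates this end gathered (assumed field)
}

// True once both ends have finished gathering and neither produced a candidate.
function shouldFailImmediately(a: PeerSnapshot, b: PeerSnapshot): boolean {
  const bothComplete =
    a.gatheringState === "complete" && b.gatheringState === "complete";
  const zeroCandidates = a.candidateCount === 0 && b.candidateCount === 0;
  return bothComplete && zeroCandidates;
}
```

Because the predicate is symmetric in its two arguments, pc1 and pc2 evaluating it over the same pair of snapshots always reach the same answer, which is the consistency the fix is after.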

I'm still marking it as a regression, since the introduced inconsistency appears buggy.

There are some ICE stack fixes over in bug 1253706 that might help with this, since new tests for that bug uncovered some wonkiness in ICE failure cases when no candidates were discovered. In particular, see https://phabricator.services.mozilla.com/D135371

Severity: -- → S3
Priority: -- → P3

I bet this was still flaky before the libwebrtc update landed, but I don't have proof. Probably not very important.

Set release status flags based on info from the regressing bug 1654112

Looks like the patches from bug 1253706 fix this, although there is a delay before the ICE failure happens (we have to wait for the grace period timer, since we do not support processing end-of-candidates from the other end yet).

Has Regression Range: --- → yes