Closed Bug 1768809 Opened 2 years ago Closed 2 years ago

Regression: FF100 fails to complete WebRTC ICE for outgoing H264 video to custom WebRTC server

Categories

(Core :: WebRTC: Networking, defect, P2)

Firefox 100
Unspecified
Linux
defect

Tracking


VERIFIED FIXED
103 Branch
Tracking Status
firefox-esr91 --- wontfix
firefox-esr102 --- fixed
firefox101 --- wontfix
firefox102 --- wontfix
firefox103 --- fixed

People

(Reporter: floe, Assigned: jld)

References

Details

(Keywords: regression, regressionwindow-wanted)

Attachments

(8 files)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:100.0) Gecko/20100101 Firefox/100.0

Steps to reproduce:

I'm working with a custom WebRTC server and so far haven't had any issues connecting with Firefox, up to and including FF99. However, FF100 seems to fail or hang indefinitely while completing the ICE process for the video streams. The outgoing audio stream starts, as do the incoming audio/video streams, but not the outgoing video streams.

Actual results:

Looking more closely at about:webrtc on both FF99 and FF100, the issue seems to be connected to the STUN server. FF100 shows a bunch of messages "Skipping STUN server because of address type mis-match", which don't appear on FF99. Obviously, the same STUN server is being used in both cases.

Expected results:

Just like on FF99, I expect outgoing streams for both video and audio to work.

The Bugbug bot thinks this bug should belong to the 'Core::WebRTC: Networking' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → WebRTC: Networking
Product: Firefox → Core

Could you please attach a copy of about:webrtc to this bug (there's a "Save Page" button at the top)?

Flags: needinfo?(floe)
Attached file aboutWebrtc-ff100.html (deleted) —

Sure, FF100 output attached.

Flags: needinfo?(floe)

So, I don't think that the "Skipping STUN server because of address type mis-match" is the problem here, because every m-section has gathered srflx candidates.

Looking at the candidate pair table I see the following, in the order of the m-sections (video, audio, video, application):

succeeded true true 86.52.87.190:57063/udp(srflx) [non-proxied] 176.9.106.24:34749/udp(host) 1 7241260434961991000 1562 201197
succeeded true true 86.52.87.190:37063/udp(srflx) [non-proxied] 176.9.106.24:33044/udp(host) 1 7241260434961991000 89139 163429
succeeded true true 86.52.87.190:51966/udp(srflx) [non-proxied] 176.9.106.24:49828/udp(host) 1 7241260434961991000 1562 130409
succeeded true true 86.52.87.190:56117/udp(srflx) [non-proxied] 176.9.106.24:57343/udp(host) 1 7241260434961991000 1669 2369

So we have ICE success and bidirectional network traffic for every m-section. However, only the audio m-section has a significant amount of bytes transmitted. So something is preventing video frames from being transmitted, but it is not ICE. Looking closer.
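The reasoning above (ICE succeeded for every m-section, yet only audio sends a meaningful number of bytes) can be checked mechanically. A minimal Python sketch, assuming the trailing columns of each about:webrtc candidate-pair row are component ID, priority, bytes sent and bytes received; the 10 kB threshold and the helper name `quiet_senders` are illustrative choices, not something about:webrtc defines.

```python
# Candidate-pair rows copied from the FF100 about:webrtc dump, in m-section order.
ROWS = """\
succeeded true true 86.52.87.190:57063/udp(srflx) [non-proxied] 176.9.106.24:34749/udp(host) 1 7241260434961991000 1562 201197
succeeded true true 86.52.87.190:37063/udp(srflx) [non-proxied] 176.9.106.24:33044/udp(host) 1 7241260434961991000 89139 163429
succeeded true true 86.52.87.190:51966/udp(srflx) [non-proxied] 176.9.106.24:49828/udp(host) 1 7241260434961991000 1562 130409
succeeded true true 86.52.87.190:56117/udp(srflx) [non-proxied] 176.9.106.24:57343/udp(host) 1 7241260434961991000 1669 2369
"""

MSECTIONS = ["video", "audio", "video", "application"]


def quiet_senders(rows: str, labels: list[str], min_sent: int = 10_000) -> list[tuple[str, int]]:
    """Return (label, bytes_sent) for succeeded pairs that sent fewer than min_sent bytes.

    Assumes the second-to-last whitespace-separated field is "bytes sent".
    """
    result = []
    for label, line in zip(labels, rows.splitlines()):
        fields = line.split()
        state, sent = fields[0], int(fields[-2])
        if state == "succeeded" and sent < min_sent:
            result.append((label, sent))
    return result


print(quiet_senders(ROWS, MSECTIONS))
# → [('video', 1562), ('video', 1562), ('application', 1669)]
```

Only the audio pair clears the threshold. The data channel (`application`) is naturally quiet, which leaves the two video m-sections as the suspects.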

Was this failure observed on Windows by any chance? I wonder if this could be a regression from bug 1741244? That changed the H264 encoder we're using on Windows in version 100, and this bug is a failure to transmit H264 frames in version 100.

For completeness, here's the set of webrtc bugs with milestone 100 and no uplift to 99:

https://bugzilla.mozilla.org/buglist.cgi?target_milestone=100%20Branch&list_id=16081067&o1=notequals&component=WebRTC&component=WebRTC%3A%20Audio%2FVideo&component=WebRTC%3A%20Networking&component=WebRTC%3A%20Signaling&resolution=FIXED&query_format=advanced&classification=Client%20Software&classification=Developer%20Infrastructure&classification=Components&classification=Server%20Software&classification=Other&f1=cf_status_firefox99&v1=fixed

^

Flags: needinfo?(floe)

No, running on Linux, pretty much stock Ubuntu 22.04, with Firefox 100 installed from the mozillateams ppa.

Flags: needinfo?(floe)

Is this service accessible via the internet, so I can try to reproduce?

Flags: needinfo?(floe)

Yes, although this isn't exactly production-grade software: https://butterbrot.org:8080/stream.html

Should the server get stuck due to the missing video streams, send a GET request for /quit (i.e. https://butterbrot.org:8080/quit) and it should restart. For reference, source code is at https://github.com/floe/surfacestreams

Flags: needinfo?(floe)

Two more observations:

  • The problem goes away when I switch everything to VP8 server-side, instead of H.264 constrained-baseline, so it is probably related to H.264 encoding somehow. I'll stick with this workaround for now.
  • The problem also appears on FF 99 when I install from https://ftp.mozilla.org/pub/firefox/releases/99.0.1/linux-x86_64/en-US/firefox-99.0.1.tar.bz2 instead of the Ubuntu package, i.e. some Ubuntu or Debian patch mitigates the issue. I'll have a look at the source package.
Attached patch libpixman-disable-vmx.patch (deleted) — Splinter Review
Attached patch upstream-a107df8ae87c.patch (deleted) — Splinter Review
Attached patch upstream-c7ca5d4c890a.patch (deleted) — Splinter Review

I've attached the three patches from the Ubuntu source package that could conceivably have any relation to video encoding. Curious to hear what you think.

Flags: needinfo?(docfaraday)

Yeah, this really does seem like an H264 problem in libwebrtc. I wonder whether libwebrtc simply does not like the profile-level-id you're trying to use. Maybe this is related to (or a duplicate of) bug 1755609? We have h264 tests in CI that are passing on linux.

Is there any way you could try using 42e01f as the profile-level-id?
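For reference, an H.264 profile-level-id is three hex-encoded bytes (profile_idc, the profile-iop constraint flags, and level_idc), as defined for the H.264 RTP payload format in RFC 6184. A small Python sketch decoding it; `decode_profile_level_id` is a hypothetical helper, and its Constrained Baseline check only covers the Baseline (profile_idc 66) case with constraint_set1_flag set, ignoring the other profiles and the special level "1b".

```python
def decode_profile_level_id(plid: str) -> dict:
    """Split an H.264 profile-level-id (6 hex digits) into its three bytes."""
    if len(plid) != 6:
        raise ValueError("profile-level-id must be 6 hex digits")
    profile_idc, profile_iop, level_idc = bytes.fromhex(plid)
    return {
        "profile_idc": profile_idc,
        "profile_iop": profile_iop,
        # constraint_set1_flag is bit 0x40 of profile-iop; Baseline + set1 = Constrained Baseline.
        "constrained_baseline": profile_idc == 0x42 and bool(profile_iop & 0x40),
        # level_idc is ten times the level number (the "1b" special case is ignored here).
        "level": level_idc / 10,
    }


print(decode_profile_level_id("42e01f"))
# → {'profile_idc': 66, 'profile_iop': 224, 'constrained_baseline': True, 'level': 3.1}
```

So 42e01f means Constrained Baseline, level 3.1.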

Flags: needinfo?(docfaraday)

^

Flags: needinfo?(floe)

> Is there any way you could try using 42e01f as the profile-level-id?

I've tried patching the outgoing SDP so that it's definitely announcing 42e01f, but that didn't seem to make any difference.
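Patching the SDP this way amounts to a text substitution on the profile-level-id parameter, which in practice only appears in `a=fmtp` lines. A minimal Python sketch, assuming the server-side SDP is available as a string; `force_profile_level_id` and the example SDP are hypothetical and not taken from the surfacestreams code.

```python
import re


def force_profile_level_id(sdp: str, plid: str = "42e01f") -> str:
    """Replace every profile-level-id=XXXXXX parameter in the SDP with plid."""
    return re.sub(r"(profile-level-id=)[0-9A-Fa-f]{6}", lambda m: m.group(1) + plid, sdp)


example = (
    "m=video 9 UDP/TLS/RTP/SAVPF 96\r\n"
    "a=rtpmap:96 H264/90000\r\n"
    "a=fmtp:96 packetization-mode=1;profile-level-id=42c01f\r\n"
)
print(force_profile_level_id(example))
```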

Flags: needinfo?(floe)

(In reply to Florian Echtler from comment #17)

>> Is there any way you could try using 42e01f as the profile-level-id?
>
> I've tried patching the outgoing SDP so that it's definitely announcing 42e01f, but that didn't seem to make any difference.

Can I see about:webrtc for this? It may have made a difference, just not enough of one.

Attached file aboutWebrtc-ff100-42e01f.html (deleted) —

Certainly; attached.

BTW: I gave Chrome a try in the meantime, and it works there, both with H.264 and VP8 codec.

That about:webrtc is extremely strange. There are several active candidate pairs, but they all seem to be linked to the local candidate with port 52422:

succeeded true true 86.52.87.190:52422/udp(prflx) [non-proxied] 176.9.106.24:43568/udp(host) 1 7961802290480809000 266573 495519
succeeded true true 86.52.87.190:52422/udp(prflx) [non-proxied] 176.9.106.24:40855/udp(host) 1 7961802290480809000 3359 600345
succeeded true true 86.52.87.190:52422/udp(prflx) [non-proxied] 176.9.106.24:44442/udp(host) 1 7961802290480809000 3222 393808

We are not using bundle here, so that port should only show up for one active candidate pair, but we're seeing it three times with remote ports from each of the three audio/video m-sections (40855 is for the first video m-section, 43568 is for the audio m-section, and 44442 is for the second video m-section). This is truly bizarre. I think I'm going to need to packet capture this.
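The anomaly described above, one local candidate reused across unbundled m-sections, is easy to flag programmatically. A small Python sketch over the three pairs quoted above; `shared_local_candidates` is a hypothetical helper that takes the local/remote strings exactly as about:webrtc prints them.

```python
from collections import Counter

# (local candidate, remote candidate) for each active pair in the second dump.
PAIRS = [
    ("86.52.87.190:52422/udp(prflx)", "176.9.106.24:43568/udp(host)"),
    ("86.52.87.190:52422/udp(prflx)", "176.9.106.24:40855/udp(host)"),
    ("86.52.87.190:52422/udp(prflx)", "176.9.106.24:44442/udp(host)"),
]


def shared_local_candidates(pairs):
    """Return {local candidate: count} for local candidates used by more than one pair.

    Without BUNDLE, each m-section has its own transport, so a local
    address:port should appear in at most one active pair.
    """
    counts = Counter(local for local, _remote in pairs)
    return {cand: n for cand, n in counts.items() if n > 1}


print(shared_local_candidates(PAIRS))
# → {'86.52.87.190:52422/udp(prflx)': 3}
```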

Would be happy to help, if you can tell me an appropriate Wireshark filter (alternatively, I can just start an empty session and capture everything).

The severity field is not set for this bug.
:bwc, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(docfaraday)

Hi Florian, thanks for your patience. Since this appears to be a regression and I'm not able to reproduce, would you mind running the mozregression tool to narrow down a regression range for this bug, and post its end result here? We generally prioritize regressions over other bugs, so this would help.

Flags: needinfo?(floe)
OS: Unspecified → Linux
Summary: Regression: FF100 fails to complete WebRTC ICE for video streams → Regression: FF100 fails to complete WebRTC ICE for outgoing H264 video to custom WebRTC server
Severity: -- → S3
Priority: -- → P2

Hi again, some updates about this issue:

I dutifully went through a mozregression run, only to find no regression range, i.e. all tested versions work. As far as I can tell, my previous tests with downloaded nightly builds and empty profiles did not have the OpenH264 plugin installed properly, which explains a large part of the issues I've been having.

In fact, it now looks like this bug is specific to the Ubuntu build of Firefox: it still appears on FF 101.0.1 as installed from the deb package (even with a fresh profile and a verified OpenH264 install). As mentioned before, I'm using the mozillateams PPA, so I'm assuming this should still be built from the same codebase as, e.g., the build downloaded from https://ftp.mozilla.org/pub/firefox/releases/101.0.1/linux-x86_64/en-US/firefox-101.0.1.tar.bz2 ?

I've just tried this again, both times with an empty profile and correctly installed OpenH264 plugin:

So I guess it now boils down to the question of how these builds differ, or are linked differently?

Flags: needinfo?(floe)

It is also possible that the Debian build has some non-standard prefs set? Maybe a copy of about:support (for both the Debian build and the stock build) would help figure this out.

Flags: needinfo?(docfaraday) → needinfo?(floe)
QA Whiteboard: [qa-regression-triage]
Attached file ff-101.0.1-deb.json (deleted) —
Flags: needinfo?(floe)
Attached file ff-101.0.1-tbz.json (deleted) —

From a cursory diff, the only things that set these two apart are a) the language setting, b) some search extensions, and c) the keyMozillaFound value (whatever that is)?

Yeah, I'm not seeing anything that I would expect to alter the H264 behavior in there...

Could you try running this test with the Debian build, with the "Require H.264 video" checkbox checked? The expected behavior is one side with a camera stream, and the other with a fake stream (cycles through colors).

https://mozilla.github.io/webrtc-landing/pc_test.html

Flags: needinfo?(floe)

> https://mozilla.github.io/webrtc-landing/pc_test.html

This only works on the Debian build when I leave "Require H.264" unchecked. Otherwise, I only see the small preview videos that are local to each peer, but not the actual remote stream. As expected, it works on the tarball build in both cases.

Flags: needinfo?(floe)

P.S. Finally found something tangible by launching the Debian build from the command line! The following error messages only appear when trying to start an H.264 stream, not for VP8.

Sandbox: attempt to open unexpected file /usr/lib/firefox/libpthread.so.0
Sandbox: attempt to open unexpected file /lib/x86_64-linux-gnu/libpthread.so.0
Sandbox: attempt to open unexpected file /lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v4/libpthread.so.0
Sandbox: seccomp sandbox violation: pid 189896, tid 189896, syscall 262, args 4294967196 140736547849056 140736547849248 0 4294967295 140736547849056.
Sandbox: attempt to open unexpected file /lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v3/libpthread.so.0
Sandbox: seccomp sandbox violation: pid 189896, tid 189896, syscall 262, args 4294967196 140736547849056 140736547849248 0 4294967295 140736547849056.
[etc. etc. ...]

I'm not entirely sure where this is configured; this is obviously not something internal to Firefox?

Ah, this is a sandboxing issue. Any ideas?

Flags: needinfo?(mfroman)
Flags: needinfo?(jld)

The "unexpected file" messages are not critical AFAIK. The syscall 262 is newfstatat, which we do seem to handle in the file broker, so I'm not clear why that would generate an error. Specifically, this looks similar to bug 1673770, but that was obviously fixed.

If it's indeed this kind of issue then I imagine the glibc that's being linked against might matter.

Looking at the about:support info, I see a lot of pending, unsubmitted crashes. I guess the GMP process is actually crashing in the same way as in bug 1673202, but the crash reports aren't sent by default because that's a process that can't ask for crash reporting opt-in.

FTR, I tried with export MOZ_DISABLE_GMP_SANDBOX=1 and things also started working on the deb build.

Another observation, since [:gcp] mentioned the glibc:

# the deb build
$ ldd /usr/lib/firefox/firefox
	linux-vdso.so.1 (0x00007ffda5bdd000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fb58cd42000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb58cc5b000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fb58cc3b000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb58ca13000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fb58d05d000)

# the tarball build
$ ldd firefox/firefox
	linux-vdso.so.1 (0x00007ffe0130e000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa7c3379000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa7c3374000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fa7c3148000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa7c3061000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa7c3041000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa7c2e17000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fa7c33aa000)
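The difference between the two listings above can be computed directly. A Python sketch that treats the first field of each `ldd` line as the library name (an assumption about the output format that holds for the lines shown here; note that `ldd` lists every library loaded at startup, not only direct DT_NEEDED entries). The tarball build pulls in `libpthread.so.0` and `libdl.so.2`, while the deb build, linked against a glibc where those functions live in libc, does not.

```python
# ldd output copied from the comment above.
DEB_LDD = """\
\tlinux-vdso.so.1 (0x00007ffda5bdd000)
\tlibstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fb58cd42000)
\tlibm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb58cc5b000)
\tlibgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fb58cc3b000)
\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb58ca13000)
\t/lib64/ld-linux-x86-64.so.2 (0x00007fb58d05d000)
"""

TARBALL_LDD = """\
\tlinux-vdso.so.1 (0x00007ffe0130e000)
\tlibpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa7c3379000)
\tlibdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa7c3374000)
\tlibstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fa7c3148000)
\tlibm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa7c3061000)
\tlibgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa7c3041000)
\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa7c2e17000)
\t/lib64/ld-linux-x86-64.so.2 (0x00007fa7c33aa000)
"""


def needed_libs(ldd_output: str) -> set[str]:
    """Return the library names (first field of each non-empty line) from ldd output."""
    return {line.split()[0] for line in ldd_output.splitlines() if line.strip()}


only_in_tarball = sorted(needed_libs(TARBALL_LDD) - needed_libs(DEB_LDD))
only_in_deb = sorted(needed_libs(DEB_LDD) - needed_libs(TARBALL_LDD))
print(only_in_tarball)  # → ['libdl.so.2', 'libpthread.so.0']
print(only_in_deb)      # → []
```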

(In reply to Gian-Carlo Pascutto [:gcp] from comment #35)

> The syscall 262 is newfstatat, which we do seem to handle in the file broker

This is GMP, so there's no file broker, just pre-opened files that are managed within the process (without exposing extra attack surface). And we've never supported stat or lstat (either the syscalls of those names or the fstatat equivalent) in the GMP sandbox.

But the underlying problem is that there's a library that's needed by the plugin and wasn't pre-loaded before the sandbox was started. That's bug 1725828, and the fix for that should have also taken care of OpenH264 (the only plugin that's exempt is clearkey (the EME reference plugin), because that's built and distributed with the browser).

To recap the other bug: glibc moved the contents of libpthread/libdl/librt into libc itself, leaving empty stub .sos for those libraries so that old binaries which DT_NEEDED them will still work, but newly linked binaries will depend only on libc. So with a new firefox and an old plugin, when we would dlopen the plugin (inside the sandbox) and it needs those libraries, they hadn't already been loaded as dependencies of libxul etc., and it failed. But we fixed that by pre-dlopening those libraries with RTLD_GLOBAL, which should make them available to satisfy the dependencies of the plugin when it's loaded later. Or, we thought we fixed it; in this case, something isn't working.
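The preloading trick described above can be illustrated from Python, since `ctypes.CDLL` accepts a `mode` argument that is passed through to dlopen(3). A minimal sketch, assuming a glibc-based Linux system; Gecko's actual fix lives in its own C++ code and runs before the sandbox is enabled, so this only demonstrates the RTLD_GLOBAL mode, not Firefox's implementation. The library list is the one named in the recap above, and `preload_global` is a hypothetical helper.

```python
import ctypes
import os

# Libraries whose contents moved into libc (glibc 2.34+); newer glibc keeps
# them only as stub .so files for binaries that still list them as DT_NEEDED.
STUB_LIBS = ("libpthread.so.0", "libdl.so.2", "librt.so.1")


def preload_global(names=STUB_LIBS):
    """dlopen each library with RTLD_NOW | RTLD_GLOBAL and return the names that loaded.

    Libraries that cannot be found are simply skipped in this sketch.
    """
    mode = getattr(os, "RTLD_NOW", 0) | ctypes.RTLD_GLOBAL
    loaded = []
    for name in names:
        try:
            ctypes.CDLL(name, mode=mode)
        except OSError:
            continue
        loaded.append(name)
    return loaded


print(preload_global())
```

Once these are loaded, a plugin dlopen()ed later that depends on them can be satisfied from the copies already in the process, instead of opening the files again, which is exactly what the sandbox forbids.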

Flags: needinfo?(jld)

> This is GMP, so there's no file broker, just pre-opened files that are managed within the process (without exposing extra attack surface). And we've never supported stat or lstat (either the syscalls of those names or the fstatat equivalent) in the GMP sandbox.

Maybe I'm misunderstanding something but I believe you patched the sandbox specifically so that fstatat works even if there is no file broker, and the fix was specifically for GMP: https://hg.mozilla.org/integration/autoland/rev/086605072f76 Note this code is in SandboxBrokerCommon.

I think I might know what the problem is: the attempted workaround is in a function named ParseChromiumManifest, which appears to be used only for Widevine, not for other media plugins. It would need to be factored out and also used in ReadGMPInfoFile, if I understand correctly.

(I've reproduced this in a VM.)

(In reply to Gian-Carlo Pascutto [:gcp] from comment #41)

>> This is GMP, so there's no file broker, just pre-opened files that are managed within the process (without exposing extra attack surface). And we've never supported stat or lstat (either the syscalls of those names or the fstatat equivalent) in the GMP sandbox.
>
> Maybe I'm misunderstanding something but I believe you patched the sandbox specifically so that fstatat works even if there is no file broker

That was only for the case that's equivalent to fstat (using the nonstandard Linux feature AT_EMPTY_PATH); anything involving a filesystem path was not supported.
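That rule can be applied to the seccomp violation logged earlier. A Python sketch, assuming x86-64 syscall numbering (262 is newfstatat) and the Linux values AT_FDCWD = -100 and AT_EMPTY_PATH = 0x1000; `classify_fstatat` is a hypothetical helper, not part of Firefox. The first logged argument, 4294967196, is AT_FDCWD in its 32-bit two's-complement form, and the flags argument is 0, so these were path-based lookups rather than the fstat-equivalent case.

```python
AT_FDCWD = -100            # Linux: resolve relative paths against the current directory
AT_EMPTY_PATH = 0x1000     # Linux: operate on dirfd itself, like fstat()
NR_NEWFSTATAT_X86_64 = 262


def classify_fstatat(nr: int, args: list[int]) -> str:
    """Classify a logged x86-64 newfstatat call from its raw (unsigned) arguments.

    Argument order: dirfd, pathname, statbuf, flags; any extra values are unused.
    """
    if nr != NR_NEWFSTATAT_X86_64:
        return "not newfstatat"
    # dirfd is a C int: reinterpret the low 32 bits as signed.
    dirfd = args[0] & 0xFFFFFFFF
    if dirfd >= 0x80000000:
        dirfd -= 1 << 32
    flags = args[3] & 0xFFFFFFFF
    if flags & AT_EMPTY_PATH:
        return "fstat-equivalent (AT_EMPTY_PATH)"
    where = "cwd-relative" if dirfd == AT_FDCWD else f"relative to fd {dirfd}"
    return f"path-based stat, {where}"


# Arguments from the "seccomp sandbox violation ... syscall 262" log line.
logged = [4294967196, 140736547849056, 140736547849248, 0, 4294967295, 140736547849056]
print(classify_fstatat(262, logged))
# → path-based stat, cwd-relative
```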

Hey everyone, thanks for your insights and for reproducing this!

Is there a way to temporarily turn the sandbox off through about:config? Or only through envvar MOZ_DISABLE_GMP_SANDBOX?

Flags: needinfo?(mfroman)

The workaround added in bug 1725828 was intended to be applied to any
plugin that may have been linked against a different version of glibc
than the browser; i.e., everything except clearkey, which is built and
shipped along with the browser. Unfortunately, the change was made in a
function used only for the Widevine CDM, so we're now having the same
problem with OpenH264. This patch corrects that oversight and preloads
the (potentially) needed libraries for every applicable plugin.
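The oversight and its correction can be summarized as a toy model. This Python sketch is not Gecko code: the plugin labels and both functions are hypothetical, and the parser mapping only records what the discussion above states (Widevine goes through ParseChromiumManifest, OpenH264 through ReadGMPInfoFile).

```python
# Libraries named in the glibc recap above.
PRELOAD_LIBS = ("libpthread.so.0", "libdl.so.2", "librt.so.1")

# Which metadata parser handles each plugin, per the discussion above.
PARSER = {
    "widevine": "ParseChromiumManifest",
    "openh264": "ReadGMPInfoFile",
}


def preload_before_fix(plugin: str) -> tuple[str, ...]:
    # The workaround only ran in the Chromium-manifest path, i.e. for Widevine.
    return PRELOAD_LIBS if PARSER.get(plugin) == "ParseChromiumManifest" else ()


def preload_after_fix(plugin: str) -> tuple[str, ...]:
    # Every plugin except clearkey, which is built and shipped with the browser.
    return () if plugin == "clearkey" else PRELOAD_LIBS


print(preload_before_fix("openh264"), preload_after_fix("openh264"))
```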

Assignee: nobody → jld
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Pushed by jedavis@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/30cb300b4f1e Fix library preloading for the OpenH264 plugin. r=media-playback-reviewers,alwu
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 103 Branch

The patch landed in nightly and beta is affected.
:jld, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox102 to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(jld)
Flags: qe-verify+

Comment on attachment 9281501 [details]
Bug 1768809 - Fix library preloading for the OpenH264 plugin.

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Breakage of H.264 WebRTC in downstream Linux builds
  • User impact if declined: Without this patch, using WebRTC with H.264 video will be broken on many downstream Linux builds.
  • Fix Landed on Version: 103
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This just loads some libraries, and they're libraries that are already loaded when using Mozilla's builds, and the same code has been used for the Widevine CDM since version 91.
Flags: needinfo?(jld)
Attachment #9281501 - Flags: approval-mozilla-esr102?
QA Whiteboard: [qa-regression-triage] → [qa-regression-triage][qa-triaged]

Comment on attachment 9281501 [details]
Bug 1768809 - Fix library preloading for the OpenH264 plugin.

Approved for ESR102.1, thanks.

Attachment #9281501 - Flags: approval-mozilla-esr102? → approval-mozilla-esr102+

Looked over the bug and found the custom webrtc page at Comment 9, but it seems it isn't working anymore. Are there any other ways to try and reproduce the issue in order to verify it in the latest builds? If not, can you verify that the fix is working on latest beta and ESR builds? Thank you!

Flags: needinfo?(floe)

(In reply to Catalin Sasca, QA [:csasca] from comment #52)

> Looked over the bug and found the custom webrtc page at Comment 9, but it seems it isn't working anymore. Are there any other ways to try and reproduce the issue in order to verify it in the latest builds? If not, can you verify that the fix is working on latest beta and ESR builds? Thank you!

You can use https://mozilla.github.io/webrtc-landing/pc_test.html (requires a camera); check the “Require H.264 video” box before clicking Start. Also make sure the OpenH264 codec is shown as installed in the Plugins section of about:addons; when testing with a new profile, it may take a little while for the plugin to be downloaded.

(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #53)

> (In reply to Catalin Sasca, QA [:csasca] from comment #52)
>
>> Looked over the bug and found the custom webrtc page at Comment 9, but it seems it isn't working anymore. Are there any other ways to try and reproduce the issue in order to verify it in the latest builds? If not, can you verify that the fix is working on latest beta and ESR builds? Thank you!
>
> You can use https://mozilla.github.io/webrtc-landing/pc_test.html (requires a camera); check the “Require H.264 video” box before clicking Start. Also make sure the OpenH264 codec is shown as installed in the Plugins section of about:addons; when testing with a new profile, it may take a little while for the plugin to be downloaded.

I tried reproducing with the link provided on Firefox 102.0a1 and 100.0.2 builds from the archive, and on the snap 102.0.1 build, with no luck. Every time, the OpenH264 plugin was installed in about:addons and the "Require H.264 video" option was selected on the test page. No errors or hangs occurred.

I've verified this with a local build on an Ubuntu 22.04 VM (with ac_add_options --without-sysroot); Hg revision 30cb300b4f1e works and its parent revision fails.

Snap builds won't reproduce it because they're built on (essentially) Ubuntu 20.04, and Mozilla's builds use a relatively old version of Debian if I recall correctly; this needs the PPA package or something similar.

Status: RESOLVED → VERIFIED
Flags: needinfo?(floe)

Thanks, Jed, for looking into this and verifying it. I will remove the qe+ flag.

Flags: qe-verify+