Regression: FF100 fails to complete WebRTC ICE for outgoing H264 video to custom WebRTC server
Categories: Core :: WebRTC: Networking, defect, P2
People: Reporter: floe, Assigned: jld
Keywords: regression, regressionwindow-wanted
Attachments (8 files, all deleted): text/html ×2, patch ×3, application/json ×2, text/x-phabricator-request ×1 (dmeehan: approval-mozilla-esr102+)
User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:100.0) Gecko/20100101 Firefox/100.0
Steps to reproduce:
I'm working with a custom WebRTC server and didn't have any issues so far connecting with Firefox including FF99. However, FF100 seems to fail/hang indefinitely while completing the ICE process for the video streams. The outgoing audio stream starts, and the incoming audio/video streams as well, but no outgoing video streams.
Actual results:
Having a closer look at about:webrtc on both FF99 and FF100, the issue seems connected to the STUN server. FF100 shows a bunch of messages "Skipping STUN server because of address type mis-match", which don't appear on FF99. The same STUN server is used in both cases.
Expected results:
Just like on FF99, I expect outgoing streams for both video and audio to work.
Comment 1•2 years ago
The Bugbug bot thinks this bug should belong to the 'Core::WebRTC: Networking' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Comment 2•2 years ago
Could you please attach a copy of about:webrtc to this bug (there's a "Save Page" button at the top)?
Comment 4•2 years ago
So, I don't think that the "Skipping STUN server because of address type mis-match" is the problem here, because every m-section has gathered srflx candidates.
Looking at the candidate pair table I see the following, in the order of the m-sections (video, audio, video, application):
succeeded true true 86.52.87.190:57063/udp(srflx) [non-proxied] 176.9.106.24:34749/udp(host) 1 7241260434961991000 1562 201197
succeeded true true 86.52.87.190:37063/udp(srflx) [non-proxied] 176.9.106.24:33044/udp(host) 1 7241260434961991000 89139 163429
succeeded true true 86.52.87.190:51966/udp(srflx) [non-proxied] 176.9.106.24:49828/udp(host) 1 7241260434961991000 1562 130409
succeeded true true 86.52.87.190:56117/udp(srflx) [non-proxied] 176.9.106.24:57343/udp(host) 1 7241260434961991000 1669 2369
So we have ICE success and bidirectional network traffic for every m-section. However, only the audio m-section has a significant amount of bytes transmitted. So something is preventing video frames from being transmitted, but it is not ICE. Looking closer.
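The byte-count comparison above can be sketched as a small script. This is only an illustration: it assumes the last two columns of each row are bytes sent and bytes received (consistent with the remark that only the audio m-section transmits a significant amount), and the `threshold` value is an arbitrary, hypothetical cut-off.

```python
# Rows copied from the candidate pair table quoted in this comment.
ROWS = """\
succeeded true true 86.52.87.190:57063/udp(srflx) [non-proxied] 176.9.106.24:34749/udp(host) 1 7241260434961991000 1562 201197
succeeded true true 86.52.87.190:37063/udp(srflx) [non-proxied] 176.9.106.24:33044/udp(host) 1 7241260434961991000 89139 163429
succeeded true true 86.52.87.190:51966/udp(srflx) [non-proxied] 176.9.106.24:49828/udp(host) 1 7241260434961991000 1562 130409
succeeded true true 86.52.87.190:56117/udp(srflx) [non-proxied] 176.9.106.24:57343/udp(host) 1 7241260434961991000 1669 2369
"""

# Order of the m-sections as stated in the comment.
M_SECTIONS = ["video", "audio", "video", "application"]


def traffic_per_section(rows, sections):
    """Return (label, bytes_sent, bytes_received) for each table row."""
    result = []
    for label, line in zip(sections, rows.strip().splitlines()):
        fields = line.split()
        # Assumption: the last two columns are bytes sent / bytes received.
        result.append((label, int(fields[-2]), int(fields[-1])))
    return result


def low_senders(rows, sections, threshold=10_000):
    """List m-sections that sent fewer bytes than the (hypothetical) threshold."""
    return [(label, sent)
            for label, sent, _ in traffic_per_section(rows, sections)
            if sent < threshold]


for label, sent in low_senders(ROWS, M_SECTIONS):
    print(f"{label}: only {sent} bytes sent")
```

The application (data channel) m-section is also flagged here, which is not surprising for an idle channel; the two video m-sections are the interesting ones.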
Comment 5•2 years ago
Was this failure observed on Windows by any chance? I wonder if this could be a regression from bug 1741244? That changed the H264 encoder we're using on Windows in version 100, and this bug is a failure to transmit H264 frames in version 100.
For completeness, here's the set of webrtc bugs with milestone 100 and no uplift to 99:
Comment 7•2 years ago (Reporter)
No, running on Linux, pretty much stock Ubuntu 22.04, with Firefox 100 installed from the mozillateam PPA.
Comment 8•2 years ago
Is this service accessible via the internet, so I can try to reproduce?
Comment 9•2 years ago (Reporter)
Yes, although this isn't exactly production-grade software: https://butterbrot.org:8080/stream.html
Should the server get stuck due to the missing video streams, send a GET request for /quit (i.e. https://butterbrot.org:8080/quit) and it should restart. For reference, source code is at https://github.com/floe/surfacestreams
Comment 10•2 years ago (Reporter)
Two more observations:
- The problem goes away when I switch everything to VP8 server-side, instead of H.264 constrained-baseline, so it is probably related to H.264 encoding somehow. I'll stick with this workaround for now.
- The problem also appears on FF 99 when I install from https://ftp.mozilla.org/pub/firefox/releases/99.0.1/linux-x86_64/en-US/firefox-99.0.1.tar.bz2 instead of the Ubuntu package, i.e. some Ubuntu or Debian patch mitigates the issue. I'll have a look at the source package.
Comment 14•2 years ago (Reporter)
I've attached the three patches from the Ubuntu source package that could conceivably have any relation to video encoding. Curious to hear what you think.
Comment 15•2 years ago
Yeah, this really does seem like an H264 problem in libwebrtc. I wonder whether libwebrtc simply does not like the profile-level-id you're trying to use. Maybe this is related to (or a duplicate of) bug 1755609? We have h264 tests in CI that are passing on linux.
Is there any way you could try using 42e01f as the profile-level-id?
Comment 17•2 years ago (Reporter)
> Is there any way you could try using 42e01f as the profile-level-id?
I've tried patching the outgoing SDP so that it's definitely announcing 42e01f, but that didn't seem to make any difference.
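For readers unfamiliar with this kind of SDP patching, here is a minimal sketch. The function names and the sample SDP fragment are hypothetical (payload type 126 and the original profile-level-id 42c01f are made up for illustration); this is not the reporter's actual patch.

```python
import re

# Matches the six hex digits that follow "profile-level-id=" in an a=fmtp line.
PROFILE_RE = re.compile(r"(profile-level-id=)[0-9A-Fa-f]{6}")


def force_profile_level_id(sdp, value="42e01f"):
    """Rewrite every profile-level-id parameter in the SDP to `value`."""
    return PROFILE_RE.sub(lambda m: m.group(1) + value, sdp)


def decode_profile_level_id(value):
    """Split the three bytes: profile_idc, profile-iop flags, level_idc."""
    raw = bytes.fromhex(value)
    return {"profile_idc": raw[0], "profile_iop": raw[1], "level_idc": raw[2]}


# Hypothetical SDP fragment, for illustration only.
SAMPLE_SDP = (
    "a=rtpmap:126 H264/90000\r\n"
    "a=fmtp:126 profile-level-id=42c01f;packetization-mode=1\r\n"
)

print(force_profile_level_id(SAMPLE_SDP))
# profile_idc 66 is Baseline and level_idc 31 is level 3.1.
print(decode_profile_level_id("42e01f"))
```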
Comment 18•2 years ago
(In reply to Florian Echtler from comment #17)
> > Is there any way you could try using 42e01f as the profile-level-id?
> I've tried patching the outgoing SDP so that it's definitely announcing 42e01f, but that didn't seem to make any difference.
Can I see about:webrtc for this? It may have made a difference, just not enough of one.
Comment 19•2 years ago (Reporter)
Certainly; attached.
Comment 20•2 years ago (Reporter)
BTW: I gave Chrome a try in the meantime, and it works there, both with H.264 and VP8 codec.
Comment 21•2 years ago
That about:webrtc is extremely strange. There are several active candidate pairs, but they all seem to be linked to the local candidate with port 52422:
succeeded true true 86.52.87.190:52422/udp(prflx) [non-proxied] 176.9.106.24:43568/udp(host) 1 7961802290480809000 266573 495519
succeeded true true 86.52.87.190:52422/udp(prflx) [non-proxied] 176.9.106.24:40855/udp(host) 1 7961802290480809000 3359 600345
succeeded true true 86.52.87.190:52422/udp(prflx) [non-proxied] 176.9.106.24:44442/udp(host) 1 7961802290480809000 3222 393808
We are not using bundle here, so that port should only show up for one active candidate pair, but we're seeing it three times with remote ports from each of the three audio/video m-sections (40855 is for the first video m-section, 43568 is for the audio m-section, and 44442 is for the second video m-section). This is truly bizarre. I think I'm going to need to packet capture this.
Comment 22•2 years ago (Reporter)
Would be happy to help, if you can tell me an appropriate Wireshark filter (alternatively, I can just start an empty session and capture everything).
Comment 23•2 years ago
The severity field is not set for this bug.
:bwc, could you have a look please?
For more information, please visit auto_nag documentation.
Comment 24•2 years ago
Hi Florian, thanks for your patience. Since this appears to be a regression and I'm not able to reproduce, would you mind running the mozregression tool to narrow down a regression range for this bug, and post its end result here? We generally prioritize regressions over other bugs, so this would help.
Comment 25•2 years ago (Reporter)
Hi again, some updates about this issue:
I dutifully went through a mozregression run, only to find no regression range, i.e. all tested versions work. As far as I can tell, my previous tests with downloaded nightly builds and empty profiles did not have the OpenH264 plugin installed properly, so that explained a large part of the issues I've been having.
In fact, it now looks like this bug is specific to the Ubuntu build of Firefox: it still appears on FF 101.0.1 as installed from the deb package (even with a fresh profile and a verified OpenH264 install). As mentioned before, I'm using the mozillateam PPA, so I'm assuming this should still be a build from the same codebase as e.g. a build downloaded from https://ftp.mozilla.org/pub/firefox/releases/101.0.1/linux-x86_64/en-US/firefox-101.0.1.tar.bz2 ?
I've just tried this again, both times with an empty profile and correctly installed OpenH264 plugin:
- https://ftp.mozilla.org/pub/firefox/releases/101.0.1/linux-x86_64/en-US/firefox-101.0.1.tar.bz2 -> works
- https://launchpad.net/~mozillateam/+archive/ubuntu/ppa/+build/23832034 -> fails
So I guess it now boils down to the question of how these builds differ, or are linked differently?
Comment 26•2 years ago
It is also possible that the debian build has some non-standard prefs set? Maybe a copy of about:support (for both the debian build and the stock build) would help figure this out.
Comment 29•2 years ago (Reporter)
From a cursory diff, the only things that set these two apart are a) the language setting, b) some search extensions, and c) the keyMozillaFound value (whatever that is)?
Comment 30•2 years ago
Yeah, I'm not seeing anything that I would expect to alter the H264 behavior in there...
Comment 31•2 years ago
Could you try running this test with the Debian build, with the "Require H.264 video" checkbox checked? The expected behavior is one side with a camera stream, and the other with a fake stream (cycles through colors).
Comment 32•2 years ago (Reporter)
This only works on the Debian build when I leave "Require H.264" unchecked. Otherwise, I only see the small preview videos that are local to each peer, but not the actual remote stream. As expected, it works on the tarball build in both cases.
Comment 33•2 years ago (Reporter)
P.S. Finally found something tangible by launching the Debian build from the command line! The following error messages only appear when trying to start an H.264 stream, not for VP8.
Sandbox: attempt to open unexpected file /usr/lib/firefox/libpthread.so.0
Sandbox: attempt to open unexpected file /lib/x86_64-linux-gnu/libpthread.so.0
Sandbox: attempt to open unexpected file /lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v4/libpthread.so.0
Sandbox: seccomp sandbox violation: pid 189896, tid 189896, syscall 262, args 4294967196 140736547849056 140736547849248 0 4294967295 140736547849056.
Sandbox: attempt to open unexpected file /lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v3/libpthread.so.0
Sandbox: seccomp sandbox violation: pid 189896, tid 189896, syscall 262, args 4294967196 140736547849056 140736547849248 0 4294967295 140736547849056.
[etc. etc. ...]
Not entirely sure where this is configured, this is obviously not something internal to Firefox?
Comment 35•2 years ago
The unexpected file things are not critical AFAIK. The syscall 262 is newfstatat, which we do seem to handle in the file broker, so I'm not clear why that would generate an error. Specifically this looks similar to bug 1673770 but that was obviously fixed.
Comment 36•2 years ago
If it's indeed this kind of issue then I imagine the glibc that's being linked against might matter.
Comment 37•2 years ago
Looking at the about:support info, I see a lot of pending, unsubmitted crashes. I guess the GMP process is actually crashing similarly as in bug 1673202 but the crash reports aren't sent by default because that's a process that can't ask for crash reporting opt-in.
Comment 38•2 years ago (Reporter)
FTR, I tried with export MOZ_DISABLE_GMP_SANDBOX=1 and things started working on the deb build as well.
Another observation, since [:gcp] mentioned the glibc:
# the deb build
$ ldd /usr/lib/firefox/firefox
linux-vdso.so.1 (0x00007ffda5bdd000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fb58cd42000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb58cc5b000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fb58cc3b000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb58ca13000)
/lib64/ld-linux-x86-64.so.2 (0x00007fb58d05d000)
# the tarball build
$ ldd firefox/firefox
linux-vdso.so.1 (0x00007ffe0130e000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa7c3379000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa7c3374000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fa7c3148000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa7c3061000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa7c3041000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa7c2e17000)
/lib64/ld-linux-x86-64.so.2 (0x00007fa7c33aa000)
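The difference between the two listings can be made explicit with a short script. This is a sketch that only parses the `name => path` lines of the `ldd` output pasted above; the helper name `sonames` is made up for illustration.

```python
# ldd output copied from the two listings above.
DEB_BUILD = """\
linux-vdso.so.1 (0x00007ffda5bdd000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fb58cd42000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb58cc5b000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fb58cc3b000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb58ca13000)
/lib64/ld-linux-x86-64.so.2 (0x00007fb58d05d000)
"""

TARBALL_BUILD = """\
linux-vdso.so.1 (0x00007ffe0130e000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa7c3379000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa7c3374000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fa7c3148000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa7c3061000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa7c3041000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa7c2e17000)
/lib64/ld-linux-x86-64.so.2 (0x00007fa7c33aa000)
"""


def sonames(ldd_output):
    """Collect the library names from the 'name => path (addr)' lines."""
    names = set()
    for line in ldd_output.strip().splitlines():
        fields = line.split()
        if "=>" in fields:
            names.add(fields[0])
    return names


only_in_tarball = sorted(sonames(TARBALL_BUILD) - sonames(DEB_BUILD))
print("loaded only by the tarball build:", only_in_tarball)
```

Only libpthread.so.0 and libdl.so.2 differ; the deb build does not pull them in as dependencies of the main executable.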
Comment 39•2 years ago (Assignee)
(In reply to Gian-Carlo Pascutto [:gcp] from comment #35)
> The syscall 262 is newfstatat, which we do seem to handle in the file broker
This is GMP, so there's no file broker, just pre-opened files that are managed within the process (without exposing extra attack surface). And we've never supported stat or lstat (either the syscalls of those names or the fstatat equivalent) in the GMP sandbox.
But the underlying problem is that there's a library that's needed by the plugin and wasn't pre-loaded before the sandbox was started. That's bug 1725828, and the fix for that should have also taken care of OpenH264 (the only plugin that's exempt is clearkey, the EME reference plugin, because that's built and distributed with the browser).
Comment 40•2 years ago (Assignee)
To recap the other bug: glibc moved the contents of libpthread/libdl/librt into libc itself, leaving empty stub .so files for those libraries so that old binaries which DT_NEEDED them will still work, but newly linked binaries will depend only on libc. So with a new firefox and an old plugin, when we would dlopen the plugin (inside the sandbox) and it needs those libraries, they hadn't already been loaded as dependencies of libxul etc., and it failed. But we fixed that by pre-dlopening those libraries with RTLD_GLOBAL, which should make them available to satisfy the dependencies of the plugin when it's loaded later. Or, we thought we fixed it; in this case, something isn't working.
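The preloading idea described here can be illustrated from Python with ctypes, which wraps dlopen on Linux. This is only a sketch of the technique, not Firefox's implementation: the function name is hypothetical, and the librt.so.1 soname is an assumption (the other two sonames appear in the ldd output in comment 38).

```python
import ctypes

# Stub libraries named in the comment above; librt.so.1 is an assumed soname.
STUB_LIBS = ["libpthread.so.0", "libdl.so.2", "librt.so.1"]


def preload(names=STUB_LIBS):
    """Load each library with RTLD_GLOBAL before any sandbox would start.

    Libraries that cannot be found are skipped, so the sketch also runs on
    systems that do not ship these stubs.
    """
    handles = {}
    for name in names:
        try:
            handles[name] = ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)
        except OSError:
            pass  # not present on this system
    return handles


loaded = preload()
print("preloaded:", sorted(loaded))
```

Once a library is already mapped into the process, a later dlopen of a plugin that lists it as a dependency does not need to open the file again, which is why the loading has to happen before filesystem access is restricted.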
Comment 41•2 years ago
> This is GMP, so there's no file broker, just pre-opened files that are managed within the process (without exposing extra attack surface). And we've never supported stat or lstat (either the syscalls of those names or the fstatat equivalent) in the GMP sandbox.
Maybe I'm misunderstanding something but I believe you patched the sandbox specifically so that fstatat works even if there is no file broker, and the fix was specifically for GMP: https://hg.mozilla.org/integration/autoland/rev/086605072f76 Note this code is in SandboxBrokerCommon.
Comment 42•2 years ago (Assignee)
I think I might know what the problem is: the attempted workaround is in a function named ParseChromiumManifest, which appears to be used only for Widevine, not for other media plugins. It would need to be factored out and also used in ReadGMPInfoFile, if I understand correctly.
(I've reproduced this in a VM.)
Comment 43•2 years ago (Assignee)
(In reply to Gian-Carlo Pascutto [:gcp] from comment #41)
> > This is GMP, so there's no file broker, just pre-opened files that are managed within the process (without exposing extra attack surface). And we've never supported stat or lstat (either the syscalls of those names or the fstatat equivalent) in the GMP sandbox.
> Maybe I'm misunderstanding something but I believe you patched the sandbox specifically so that fstatat works even if there is no file broker
That was only for the case that's equivalent to fstat (using the nonstandard Linux feature AT_EMPTY_PATH); anything involving a filesystem path was not supported.
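The distinction between the fstat-equivalent form and a path-based lookup can be checked against the violation logged in comment 33. The sketch below assumes the logged values are the raw syscall arguments in order, i.e. newfstatat(dirfd, pathname, statbuf, flags); the helper names are hypothetical, while AT_FDCWD (-100) and AT_EMPTY_PATH (0x1000) are the Linux constants.

```python
import re

# Log line copied from comment 33.
LINE = ("Sandbox: seccomp sandbox violation: pid 189896, tid 189896, syscall 262, "
        "args 4294967196 140736547849056 140736547849248 0 4294967295 140736547849056.")

AT_FDCWD = -100          # Linux: resolve the path relative to the current directory
AT_EMPTY_PATH = 0x1000   # Linux: operate on dirfd itself, with an empty pathname
SYSCALL_NAMES = {262: "newfstatat"}  # x86-64 number, as stated in comment 35


def to_int32(value):
    """Reinterpret the low 32 bits of a logged argument as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def decode_violation(line):
    """Extract the syscall name, dirfd and AT_EMPTY_PATH flag from a log line."""
    match = re.search(r"syscall (\d+), args ([\d ]+)", line)
    number = int(match.group(1))
    args = [int(a) for a in match.group(2).split()]
    # Assumed argument order: newfstatat(dirfd, pathname, statbuf, flags)
    dirfd, flags = to_int32(args[0]), args[3]
    return {
        "name": SYSCALL_NAMES.get(number, str(number)),
        "dirfd": dirfd,
        "uses_cwd": dirfd == AT_FDCWD,
        "fstat_like": bool(flags & AT_EMPTY_PATH),
    }


print(decode_violation(LINE))
```

Under that assumption, the first argument decodes to AT_FDCWD and the flags are 0, so the logged call is a path-based stat rather than the AT_EMPTY_PATH form, which matches the explanation that such calls were never supported in the GMP sandbox.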
Comment 44•2 years ago (Reporter)
Hey everyone, thanks for your insights and for reproducing this!
Is there a way to temporarily turn the sandbox off through about:config? Or only through envvar MOZ_DISABLE_GMP_SANDBOX?
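For the environment-variable route already tried in comment 38, a launcher script could look roughly like the sketch below. The helper name and the `firefox` command in the comment are assumptions, and disabling the sandbox is only suitable as a temporary diagnostic step.

```python
import os


def gmp_sandbox_disabled_env(base=None):
    """Return a copy of the environment with the GMP sandbox switched off."""
    env = dict(os.environ if base is None else base)
    env["MOZ_DISABLE_GMP_SANDBOX"] = "1"
    return env


# Hypothetical launch, left commented out so the sketch has no side effects:
# import subprocess
# subprocess.run(["firefox"], env=gmp_sandbox_disabled_env())
print(gmp_sandbox_disabled_env({})["MOZ_DISABLE_GMP_SANDBOX"])
```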
Comment 45•2 years ago (Assignee)
The workaround added in bug 1725828 was intended to be applied to any
plugin that may have been linked against a different version of glibc
than the browser; i.e., everything except clearkey, which is built and
shipped along with the browser. Unfortunately, the change was made in a
function used only for the Widevine CDM, so we're now having the same
problem with OpenH264. This patch corrects that oversight and preloads
the (potentially) needed libraries for every applicable plugin.
Comment 48•2 years ago
The patch landed in nightly and beta is affected.
:jld, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set status-firefox102 to wontfix.
For more information, please visit auto_nag documentation.
Comment 49•2 years ago (Assignee)
Comment on attachment 9281501 [details]
Bug 1768809 - Fix library preloading for the OpenH264 plugin.
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: Breakage of H.264 WebRTC in downstream Linux builds
- User impact if declined: Without this patch, using WebRTC with H.264 video will be broken on many downstream Linux builds.
- Fix Landed on Version: 103
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): This just loads some libraries, and they're libraries that are already loaded when using Mozilla's builds, and the same code has been used for the Widevine CDM since version 91.
Comment 50•2 years ago
Comment on attachment 9281501 [details]
Bug 1768809 - Fix library preloading for the OpenH264 plugin.
Approved for ESR102.1, thanks.
Comment 52•2 years ago
Looked over the bug and found the custom webrtc page at Comment 9, but it seems it isn't working anymore. Are there any other ways to try and reproduce the issue in order to verify it in the latest builds? If not, can you verify that the fix is working on latest beta and ESR builds? Thank you!
Comment 53•2 years ago (Assignee)
(In reply to Catalin Sasca, QA [:csasca] from comment #52)
> Looked over the bug and found the custom webrtc page at Comment 9, but it seems it isn't working anymore. Are there any other ways to try and reproduce the issue in order to verify it in the latest builds? If not, can you verify that the fix is working on latest beta and ESR builds? Thank you!
You can use https://mozilla.github.io/webrtc-landing/pc_test.html (requires a camera); check the “Require H.264 video” box before clicking Start. Also make sure the OpenH264 codec is shown as installed in the Plugins section of about:addons; when testing with a new profile, it may take a little while for the plugin to be downloaded.
Comment 54•2 years ago
(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #53)
> (In reply to Catalin Sasca, QA [:csasca] from comment #52)
> > Looked over the bug and found the custom webrtc page at Comment 9, but it seems it isn't working anymore. Are there any other ways to try and reproduce the issue in order to verify it in the latest builds? If not, can you verify that the fix is working on latest beta and ESR builds? Thank you!
> You can use https://mozilla.github.io/webrtc-landing/pc_test.html (requires a camera); check the “Require H.264 video” box before clicking Start. Also make sure the OpenH264 codec is shown as installed in the Plugins section of about:addons; when testing with a new profile, it may take a little while for the plugin to be downloaded.
I tried reproducing with the provided link on Firefox 102.0a1 and 100.0.2 builds from the archive and on the snap 102.0.1 build, with no luck. Each time, the H.264 plugin was installed in about:addons and the “Require H.264 video” option was selected on the test page. No errors or hangs occurred.
Comment 55•2 years ago (Assignee)
I've verified this with a local build on an Ubuntu 22.04 VM (with ac_add_options --without-sysroot); Hg revision 30cb300b4f1e works and its parent revision fails.
Snap builds won't reproduce it because they're built on (essentially) Ubuntu 20.04, and Mozilla's builds use a relatively old version of Debian if I recall correctly; this needs the PPA package or something similar.
Comment 56•2 years ago
Thanks, Jed, for looking it over and verifying it. Will remove the qe+ flag.