Closed Bug 1401666 Opened 7 years ago Closed 7 years ago

poor webgl performance

Categories

(Core :: Security: Process Sandboxing, defect, P1)

57 Branch
Unspecified
Linux
defect

Tracking

()

VERIFIED FIXED
mozilla58
Tracking Status
firefox-esr52 --- unaffected
firefox55 --- unaffected
firefox56 --- unaffected
firefox57 --- fixed
firefox58 --- verified

People

(Reporter: hotsauce2998, Assigned: jld)

References

Details

(Keywords: regression, Whiteboard: [gfx-noted][sb+])

Attachments

(6 files)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0 Build ID: 20170914224058 Steps to reproduce: visit http://alteredqualia.com/three/examples/webgl_pasta.html or http://madebyevan.com/webgl-water/ Actual results: Poor performance, webgl is slow and choppy. Only happens in Firefox 57. Works great in Firefox 55 Expected results: webgl rendering should be smooth
Component: Untriaged → Canvas: WebGL
Keywords: regression
Product: Firefox → Core
Thanks for the bug report! Would you have time to give us a bit more information? I'm having trouble reproducing the issue on my hardware, and this is most likely very hardware specific (e.g., I see a difference between Intel and Nvidia graphics card on the same computer, but not between 55 and 57.) 1. The graphics section of your about:support would help us see what kind of hardware configuration you have. 2. What kind of frame rate are you getting in 57 vs. 55? Perhaps a video recording of the good/bad that we can look at? If you set layers.acceleration.draw-fps in about:config, you will be able to see the frame rate and the GPU execution time. 3. If you're showing a regression that can be evaluated quickly, it would really help us to find which particular change caused this problem. Running mozregression tool (http://mozilla.github.io/mozregression/quickstart.html) would give us that information - could you run it?
Priority: -- → P2
Whiteboard: [gfx-noted]
Attached image framerates.png (deleted) —
Attached file troubleshooting_info.txt (deleted) —
Hey Milan I've attached a screenshot of showing the frame rate of Firefox 55 and 57 side by side. Also attached the troubleshooting info from about:support. Heres the bisect info from the mozregression tool: 17:44.81 INFO: Last good revision: 8e1e06adf80f82d3d5cf08eadaf569a107bd1ecf 17:44.81 INFO: First bad revision: b5fa08551d6e74d8300fa94f3161afdffd867764 17:44.81 INFO: Pushlog: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=8e1e06adf80f82d3d5cf08eadaf569a107bd1ecf&tochange=b5fa08551d6e74d8300fa94f3161afdffd867764 Hope that helps (In reply to Milan Sreckovic [:milan] from comment #1) > Thanks for the bug report! > Would you have time to give us a bit more information? I'm having trouble > reproducing the issue on my hardware, and this is most likely very hardware > specific (e.g., I see a difference between Intel and Nvidia graphics card on > the same computer, but not between 55 and 57.) > > 1. The graphics section of your about:support would help us see what kind of > hardware configuration you have. > > 2. What kind of frame rate are you getting in 57 vs. 55? Perhaps a video > recording of the good/bad that we can look at? If you set > layers.acceleration.draw-fps in about:config, you will be able to see the > frame rate and the GPU execution time. > > 3. If you're showing a regression that can be evaluated quickly, it would > really help us to find which particular change caused this problem. Running > mozregression tool (http://mozilla.github.io/mozregression/quickstart.html) > would give us that information - could you run it?
Gian-Carlo, is there a way to run time "disable" some of the changes in bug 1308400 (e.g., by dialing down some sandbox setting) so that Colin can see if this improves the performance back? That's a 3x performance hit we seeing, so we want to figure out what we can do about it. Jeff any more information we can get from Colin that would help here?
Blocks: 1308400
Flags: needinfo?(jmathies)
Flags: needinfo?(jgilbert)
Flags: needinfo?(gpascutto)
Priority: P2 → P1
OS: Unspecified → Linux
Run with MOZ_SANDBOX_LOGGING=1 and see what gets blocked. Might want to attach the log to this bug. https://wiki.mozilla.org/Security/Sandbox#Customization_Settings can whitelist specific files/devices security.sandbox.content.level can turns down the whole sandbox Exactly what version of Ubuntu is this, and exactly what video hardware and drivers do you have on there? I only see a basic iGPU in the about:support log, but maybe we end up using that because we fail to reach the real GPU.
Flags: needinfo?(gpascutto)
Vendor ID: Intel Open Source Technology Center Device ID: Mesa DRI Intel(R) HD Graphics 530 (Skylake GT2) Driver Version: 3.0 Mesa 12.0.6
(In reply to Milan Sreckovic [:milan] from comment #7) > Vendor ID: Intel Open Source Technology Center > Device ID: Mesa DRI Intel(R) HD Graphics 530 (Skylake GT2) > Driver Version: 3.0 Mesa 12.0.6 I saw that, Milan, (as I already pointed out), but there's no guarantee he doesn't have *another* card that the sandbox is masking.
GPU identification seems to be done by the parent process, in which case it's not affected by sandboxing. Also, it uses a subprocess forked very early in startup (bug 639842), so even if I'm wrong and the probing is redone in content processes, it still shouldn't be affected by sandboxing. (If it's trying to fork in a sandboxed content process, then that would've already been broken in 55, so that's not it.)
Ok, then we're probably blocking something Mesa is expecting to be there. MOZ_SANDBOX_LOGGING=1 should reveal what.
Colin, does setting security.sandbox.content.level in about:config to 2 (or even 1), from the default 3, restarting and running again make any difference? What's the frame rate/exec time then? If this makes a difference, then setting the environment variable from comment 10 and running again would be awesome. Actually, to be safe, probably would be useful either way. If you have time for either or both of these, that's be great. If you only have time for one, MOZ_SANDBOX_LOGGING=1 first please :)
(In reply to Milan Sreckovic [:milan] from comment #5) > Gian-Carlo, is there a way to run time "disable" some of the changes in bug > 1308400 (e.g., by dialing down some sandbox setting) so that Colin can see > if this improves the performance back? That's a 3x performance hit we > seeing, so we want to figure out what we can do about it. > > Jeff any more information we can get from Colin that would help here? I don't have anything to add here. This is worrisome for graphics though, since we need to support multiple different drivers. Breaking one driver means we're at-risk for breaking others, and this isn't something one individual can test alone.
Flags: needinfo?(jgilbert)
(In reply to Milan Sreckovic [:milan] from comment #11) > Colin, does setting security.sandbox.content.level in about:config to 2 (or > even 1), from the default 3, restarting and running again make any > difference? What's the frame rate/exec time then? > > If this makes a difference, then setting the environment variable from > comment 10 and running again would be awesome. Actually, to be safe, > probably would be useful either way. > > If you have time for either or both of these, that's be great. If you only > have time for one, MOZ_SANDBOX_LOGGING=1 first please :) I'm getting about 10-11 FPS with security.sandbox.content.level set to 1 and 2, about 1-2 FPS with security.sandbox.content.level = 3, and about 20 FPS on firefox 55 I don't know how to set MOZ_SANDBOX_LOGGING=1. Is this the same thing as security.sandbox.logging.enabled in about:config?
Flags: needinfo?(jmathies)
It seems unclear how 56 is affected but we can keep this open and take a look next week. Right now, this isn't actionable for 56.
(In reply to Colin from comment #13) > I'm getting about 10-11 FPS with security.sandbox.content.level set to 1 and > 2, > about 1-2 FPS with security.sandbox.content.level = 3, > and about 20 FPS on firefox 55 > > I don't know how to set MOZ_SANDBOX_LOGGING=1. Is this the same thing as > security.sandbox.logging.enabled in about:config? You start Firefox with that flag set, i.e. in a terminal you'd type MOZ_SANDBOX_LOGGING=1 firefox and you should get logging in the console window.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #15) > You start Firefox with that flag set, i.e. in a terminal you'd type > > MOZ_SANDBOX_LOGGING=1 firefox > > and you should get logging in the console window. Firefox shouldn't be running when you do this, BTW. Shut it down first, then do the above.
Attached file sandbox_content_level_3.log (deleted) —
Attached file sandbox_content_level_2.log (deleted) —
(In reply to Gian-Carlo Pascutto [:gcp] from comment #16) > (In reply to Gian-Carlo Pascutto [:gcp] from comment #15) > > You start Firefox with that flag set, i.e. in a terminal you'd type > > > > MOZ_SANDBOX_LOGGING=1 firefox > > > > and you should get logging in the console window. > > Firefox shouldn't be running when you do this, BTW. Shut it down first, then > do the above. Ok thanks. I attached logs for security.sandbox.content.level=2 and security.sandbox.content.level=3. This time I got around 20-22 FPS at security.sandbox.content.level=2, which is about what I get in Firefox 55. Still getting 1-2 FPS when security.sandbox.content.level=3
When security.sandbox.content.level=3: libGL error: unable to load driver: i915_bpo_dri.so libGL error: driver pointer missing libGL error: failed to load driver: i915_bpo Can't load the driver without read access?
So these are actual ENOENTs — the broker allowed access, but the file doesn't exist: Sandbox: Failed errno -2 op 0 flags 02000000 path /usr/lib/x86_64-linux-gnu/dri/tls/i915_bpo_dri.so Sandbox: Failed errno -2 op 0 flags 02000000 path /usr/lib/x86_64-linux-gnu/dri/i915_bpo_dri.so Sandbox: Failed errno -2 op 0 flags 02000000 path /usr/lib/x86_64-linux-gnu/mesa/dri/tls/i915_bpo_dri.so Sandbox: Failed errno -2 op 0 flags 02000000 path /usr/lib/x86_64-linux-gnu/mesa/dri/i915_bpo_dri.so Sandbox: Failed errno -2 op 0 flags 02000000 path /usr/lib/dri/tls/i915_bpo_dri.so Sandbox: Failed errno -2 op 0 flags 02000000 path /usr/lib/dri/i915_bpo_dri.so Maybe i915_bpo_dri.so isn't the right driver and it normally loads something else? The lines above that about sysfs, leading up to “MESA-LOADER: could not get parent device”, aren't quite what they look like — it should fall back other methods of detecting the GPU's PCI ID[*]. But I'm wondering if those might not all agree in this case. So here's something to try — corresponding to, if I'm following the library code correctly, the PCI ID detection that's failing (libudev) and the next fallback (sysfs), which should work: grep PCI_ID /sys/dev/char/226:0/../../uevent cat /sys/dev/char/226:0/device/{vendor,device} Those should give the same pair of hexadecimal numbers. Another thing to try is to run with MOZ_PERMISSIVE_CONTENT_SANDBOX=1 MOZ_SANDBOX_LOGGING=1 which will allow all file acccesses and log if they would have been denied. Trying that with Ubuntu 14.04 in a VM it looks like it would normally be reading from file in /run/udev/data, but that seems to just have description strings meant for humans and probably wouldn't be overriding the PCI ID or anything like that. [*] This is, basically, the code removed in https://cgit.freedesktop.org/mesa/mesa/commit/?id=be239326aa4f9317d42ee07f0f51179c8b3d5b22 — Mesa 13 completely changed how this works.
Here's another hypothesis: suppose this Mesa is built with HAVE_LIBUDEV and HAVE_LIBDRM but not HAVE_SYSFS, and drmGetVersion returns "i915_bpo" as the device name for some reason. Without sandboxing, libudev obtains the PCI ID, which is looked up in pci_id_driver_map.h by loader_get_driver_for_fd to find the driver name "i915", and i915_dri.so is loaded. With sandboxing, libudev fails because we're not allowing access to arbitrary things in /sys/devices, then the Mesa 12 version of drm_get_pci_id_for_fd also fails because "i915_bpo" isn't in the hardcoded list of names it understands (Mesa 13 fixes this), so loader_get_pci_id_for_fd fails completely and — this is the important part — loader_get_driver_for_fd falls back on using the name from libdrm as-is. And so it looks for i915_bpo_dri.so, which doesn't exist. So, drmGetVersion in libdrm is DRM_IOCTL_VERSION which winds up at drm_version() in the kernel, and that reads the drm_driver::name field. And indeed, in Ubuntu's kernel source, there's this: ubuntu/i915/i915_drv.c: .name = DRIVER_NAME, ubuntu/i915/i915_drv.h:#define DRIVER_NAME "i915_bpo" Also, the configure script in Mesa 12.0.6 says this: --enable-sysfs enable /sys PCI identification [default=disabled] I don't see Ubuntu's package setup (debian/rules and the debian/* directory in general) explicitly using --enable-sysfs, so that would explain why it would be disabled. (Which is a shame, because it ought to work perfectly with the sandbox policy as it stands.) In conclusion: ignore comment #21; I think I know what's going on here now.
Assignee: nobody → jld
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Component: Canvas: WebGL → Security: Process Sandboxing
Whiteboard: [gfx-noted] → [gfx-noted][sb+]
Comment on attachment 8914490 [details] Bug 1401666 - Adjust sandbox policy to allow Mesa 12 to use libudev for device identification. https://reviewboard.mozilla.org/r/185818/#review191038 ::: security/sandbox/linux/broker/SandboxBrokerPolicyFactory.cpp:131 (Diff revision 1) > policy->AddDir(rdonly, "/nix/store"); > policy->AddDir(rdonly, "/run/host/fonts"); > policy->AddDir(rdonly, "/run/host/user-fonts"); > > // Bug 1384178: Mesa driver loader > policy->AddPrefix(rdonly, "/sys/dev/char/226:"); Is one of the reasons that this is failing that the driver is realpath-ing behind our back? (Thus, we can't resolve the symlink back to a subdir of this one?) ::: security/sandbox/linux/broker/SandboxBrokerPolicyFactory.cpp:133 (Diff revision 1) > policy->AddDir(rdonly, "/run/host/user-fonts"); > > // Bug 1384178: Mesa driver loader > policy->AddPrefix(rdonly, "/sys/dev/char/226:"); > > + // Bug 1401666: Mesa driver loader part 2: Mesa <= 12 using libudev There's enough logic here that this can perhaps go in a helper function. (I know I'm not without fault myself in this function!)
Attachment #8914490 - Flags: review?(gpascutto) → review+
(In reply to Gian-Carlo Pascutto [:gcp] from comment #24) > ::: security/sandbox/linux/broker/SandboxBrokerPolicyFactory.cpp:131 > (Diff revision 1) > > policy->AddDir(rdonly, "/nix/store"); > > policy->AddDir(rdonly, "/run/host/fonts"); > > policy->AddDir(rdonly, "/run/host/user-fonts"); > > > > // Bug 1384178: Mesa driver loader > > policy->AddPrefix(rdonly, "/sys/dev/char/226:"); > > Is one of the reasons that this is failing that the driver is realpath-ing > behind our back? (Thus, we can't resolve the symlink back to a subdir of > this one?) Yes, but it's not quite realpath(), and it's also looking at paths that aren't descendents of what it readlink()ed. From attachment 8912235 [details]: Sandbox: Recording mapping /sys/devices/pci0000:00/0000:00:02.0/drm/card0 -> /sys/dev/char/226:0 Sandbox: SandboxBroker: denied op=stat rflags=400000 perms=0 path=/sys/devices/pci0000:00/0000:00:02.0/drm/card0 for pid=6218 Sandbox: Failed errno -13 op 2 flags 0400000 path /sys/dev/char/../../devices/pci0000:00/0000:00:02.0/drm/card0 Sandbox: SandboxBroker: denied op=readlink rflags=0 perms=0 path=/sys/devices/pci0000:00/0000:00:02.0 for pid=6218 Sandbox: Failed errno -13 op 10 flags 00 path /sys/dev/char/../../devices/pci0000:00/0000:00:02.0 I've adjusted one of the comments in the patch to match reality a little better.
Pushed by jedavis@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e5c54805bcc1 Adjust sandbox policy to allow Mesa 12 to use libudev for device identification. r=gcp
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Can confirm this bug no longer occurs for me as of 58.0a1 (2017-10-04)
Needs an uplift after a day or so.
Flags: needinfo?(jld)
It appears that I suddenly cannot build moz-central on my Gentoo box after this change: > 8:09.11 /media/ssd/mozilla/central/security/sandbox/linux/broker/SandboxBrokerPolicyFactory.cpp: In function ‘void mozilla::AddMesaSysfsPaths(mozilla::SandboxBroker::Policy*)’: > 8:09.11 /media/ssd/mozilla/central/security/sandbox/linux/broker/SandboxBrokerPolicyFactory.cpp:67:53: error: ‘major’ was not declared in this scope > 8:09.11 major(sb.st_rdev), > 8:09.11 ^ > 8:09.12 /media/ssd/mozilla/central/security/sandbox/linux/broker/SandboxBrokerPolicyFactory.cpp:68:53: error: ‘minor’ was not declared in this scope > 8:09.12 minor(sb.st_rdev), > 8:09.12 ^ > 8:09.91 gmake[5]: *** [/media/ssd/mozilla/central/config/rules.mk:1077: SandboxBrokerPolicyFactory.o] Error 1 > 8:09.91 gmake[4]: *** [/media/ssd/mozilla/central/config/recurse.mk:73: security/sandbox/linux/broker/target] Error 2 Any hints?
Hmm, it seems as though adding "#include <sys/sysmacros.h>" to SandboxBrokerPolicyFactory.cpp does the trick for me. Is that something we should add to this patch, jld?
Does #include <sys/types.h> work?
Flags: needinfo?(jld) → needinfo?(wisniewskit)
No, that was already being included by the file.
Flags: needinfo?(wisniewskit)
(In reply to Thomas Wisniewski from comment #34) > No, that was already being included by the file. So it is; I was looking at the wrong version. In any case, it turns out that we've had this problem before: bug 1329798.
Looks like this bug was verified by the original reporter. Please request Beta approval on this when you get a chance.
Status: RESOLVED → VERIFIED
Flags: needinfo?(jld)
Depends on: 1406845
If we uplift it, we will have to take bug 1406845 to fix a resource leak.
Comment on attachment 8914490 [details] Bug 1401666 - Adjust sandbox policy to allow Mesa 12 to use libudev for device identification. Approval Request Comment [Feature/Bug causing the regression]: Bug 1308400 (Linux content sandbox read restrictions) [User impact if declined]: Slow WebGL performance in some configurations [Is this code covered by automated tests?]: Somewhat. The code is run, but the sandbox policy changes aren't relevant unless WebGL is used with a real GPU (including the VMware virtual GPU, but excluding pure software rendering). [Has the fix been verified in Nightly?]: Yes. [Needs manual test from QE? If yes, steps to reproduce]: Maybe. The fix has been verified on nightly by the reporter. I'll post the STR I used as a separate comment, because I need to attach a file. [List of other uplifts needed for the feature/fix]: bug 1406845 (resource leak) and bug 1406233 (downstream build breakage) [Is the change risky?]: Low risk [Why is the change risky/not risky?]: It allows read access to files that otherwise wouldn't be allowed, and which were readable in 56. [String changes made/needed]: None
Flags: needinfo?(jld)
Attachment #8914490 - Flags: approval-mozilla-beta?
Attached file Library shim for portable STR (deleted) —
To reproduce this locally, because I don't have a Skylake I can spare for testing, I used a small preloaded shim library to replace the reported driver name with 'X's, to simulate a kernel driver name that doesn't exactly match the userland driver name. Build this with something like `cc -O1 -Wall -fPIC -shared -o drmxxx.so drmxxx.c -I/usr/include/drm -ldl` and use it by setting LD_PRELOAD=/absolute/path/to/drmxxx.so in Firefox's environment. As written this needs the libdrm-dev package (or equivalent for the distro; it's probably part of Firefox's build dependencies in any case). Changing the "libdrm.so" to "libdrm.so.2" in the source might also work (untested). It also requires Mesa 12 or older, because newer versions aren't affected by this bug; Ubuntu 14.04 works, and CentOS 7 might also work (untested). It also requires a physical GPU (where VMware counts as "physical" but other VM hosts may not), and not using the Nvidia proprietary drivers (which replace Mesa).
Comment on attachment 8914490 [details] Bug 1401666 - Adjust sandbox policy to allow Mesa 12 to use libudev for device identification. Recent regression in webgl performance, beta57+
Attachment #8914490 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Hi Jed, I nominated the two dependent patches for uplift. In future, please nominate the dependencies in the relevant bugs yourself.
Flags: needinfo?(jld)
Flags: needinfo?(jld)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: