Closed Bug 1180942 Opened 9 years ago Closed 9 years ago

Disable Xrender by default

Categories: Core :: Graphics
Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla46
Tracking Status: firefox46 --- fixed

People

(Reporter: jrmuizel, Unassigned)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

(Whiteboard: [gfx-noted])

Attachments

(2 files)

XRender is buggy and unreliable, and drivers are moving to implement it on top of OpenGL in the X server.
Blocks: 1164534
Depends on: 1180971
Depends on: 1183818
GTK3 try push (incl. talos) with failing tests updated and XRender off by default: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ea115f6b650b
Whiteboard: [gfx-noted]
layers.use-image-offscreen-surfaces is preferable to switching off gfx.xrender.enabled because the former keeps XRender for accelerated compositing (if not using GL). With a semi-recent cairo, as with --system-cairo on most systems, cairo provides async SHM upload where appropriate. The SHM code in Gecko depends on XSync, and whether it is suitable for OMTC is questionable.
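For reference, here is how those two prefs look in a user.js (a minimal sketch: the pref names are the ones discussed in this bug, the values just reflect the configuration being proposed here, and defaults vary between releases):

// Paint web content with cairo's image backend into shared-memory surfaces
// instead of XRender pixmaps (the change this bug is about).
user_pref("layers.use-image-offscreen-surfaces", true);
// Left alone here: this pref also covers the compositor's use of XRender.
user_pref("gfx.xrender.enabled", true);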
Try push with today's mozilla-central and layers.use-image-offscreen-surfaces set to true: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e17d285a5db7 For the most part we have reftest failures that just need to be fuzzed, plus timeouts. Since cairo's image backend doesn't use any extra threads, I expect the timeouts can be fixed by splitting the test suites. I haven't tried to run talos, but I expect some of the talos tests to regress since we'll be measuring time spent painting, which is not part of the measurement when using cairo's XRender backend. I can say, however, that some sites paint noticeably faster with cairo's image backend than with XRender (for example treeherder on my computer), but that may depend on the configuration.
Another try push with layers.use-image-offscreen-surfaces set to true plus some fuzzing. I disabled one web-platform test and enabled 2 that were previously marked as failing. The fuzzing values are the same as Windows without D2D, which is what we expect. The try run looks good so far (as good as try runs can look these days): https://treeherder.mozilla.org/#/jobs?repo=try&revision=365c1cd0605a
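For anyone unfamiliar with what "fuzzing" a reftest means: the reftest manifests let us declare a per-condition tolerance for the image comparison. A hypothetical annotation (the test names, condition and numbers are illustrative, not taken from the actual patch) has this shape:

# allow a per-channel difference of up to 2 on at most 140 pixels
# when Direct2D is not in use
fuzzy-if(!d2d,2,140) == some-test.html some-test-ref.html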
Attached patch don't use xrender for painting (deleted) — Splinter Review
Attachment #8708306 - Flags: review?(jmuizelaar)
Attachment #8708306 - Flags: review?(jmuizelaar) → review+
I expect that landing this will make some waves in the talos numbers (for better or worse). The reason the changes are hard to evaluate is that before the patch drawing is done asynchronously on the X server, while after the patch it is done synchronously in Gecko, so we are now measuring things that we didn't measure before.

XRender being unreliable and buggy (at least the way we use it), it turns out that the user experience is noticeably better with XRender disabled (i.e. after the patch). I tested this on 5 different systems, and other folks in the Paris office who tested the pref in the last few days made the same observations. It is possible that on some systems (with some xorg and driver combinations) XRender behaves better than the image backend, but that wouldn't be the common case, and the image backend also has the advantage of having reliable performance characteristics (it's code we ship rather than whatever version of the system we are running on), is much better tested, and corresponds to what we run on Android, B2G and Windows XP.

So in short, if any worrying talos results show up (and I am not around because of timezones), no worries: it's likely that we'll accept the change (not as a regression but as a difference in what is measured), and I'll come back and investigate to be sure. Don't be too quick to back this out.
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/22e333b0a82c for test failures, so you'll get a free chance at seeing what talos "regressions" you'll need to defend against. Those linux64 debug reftest-2/reftest-e10s-1 OOM crashes in 632781-verybig.html in your try run were yours, intermittent at about a 50% failure rate. Not sure yet about the linux32 opt r-e10s failure in scroll-behavior-6.html, but I'm deeply suspicious about how it's there on your try run, and on a linux64 opt run on m-i, and I've never seen it anywhere else.
632781-verybig.html strikes again :( This test allocates a canvas that is so big that there is no way we can run it reliably unless we can guarantee that several very big chunks of contiguous address space are available. Since we are using shmem textures instead of pixmaps, we have an extra allocation on the compositor side because compositing is still done with XRender, so I am not surprised that this test gets worse. I'll look into what I can do to save some memory; otherwise we'll need to disable the test.
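To give a rough sense of scale (the numbers are purely illustrative, not the test's actual dimensions): a 30000 x 10000 pixel canvas at 4 bytes per pixel needs 30000 * 10000 * 4 bytes, a bit over 1.1 GiB, of contiguous address space for a single copy, and keeping a second copy alive on the compositor side doubles that.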
Depends on: 774689
(In reply to Phil Ringnalda (:philor) from comment #8)
> Not sure yet about the linux32 opt r-e10s failure in
> scroll-behavior-6.html, but I'm deeply suspicious about how it's there on
> your try run, and on a linux64 opt run on m-i, and I've never seen it
> anywhere else.

I guess the 100 r-e10s re-triggers in https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=604c30246eeb rule this out? 632781-verybig.html should get to a more "acceptable" failure rate with bug 774689. I am hoping to re-land soon after bug 774689. On the bright side, this patch enables 2 tests that were marked as failing before.
Yeah, dunno what's up with scroll-behavior-6.html, but I don't have any reason to believe it was you.
Big old talos regressions: http://alertmanager.allizom.org:8080/alerts.html?rev=604c30246eeb94ca89e0250e9841b6dc84cf5d19&table=1 and a few improvements as well; those improvements ended up looking like regressions when we backed this out: http://alertmanager.allizom.org:8080/alerts.html?rev=765cf8a44b5a472f99ca93bf16c94761ce458861&showAll=1 :nical, I know there is a plan to reland this - can we try to address, or at least make sure we understand, the perf impact prior to relanding?
Flags: needinfo?(nical.bugzilla)
(In reply to Joel Maher (:jmaher) from comment #12)
> :nical, I know there is a plan to reland this - can we try to address, or
> at least make sure we understand, the perf impact prior to relanding?

I was writing a long and detailed reply and Nightly crashed on me, so I'll just write the short summary; if you want more details, ping me on IRC or ask here.

In short: we don't measure the same thing with XRender and with the image backend. There may be areas where XRender really does perform better, but in general I am confident that the image backend performs noticeably better in terms of what the user perceives (the asynchronous nature of XRender means what we measure does not correspond to the time it takes for frames to actually show up on the screen). XRender (at least the way we use it) is buggy and unreliable, its performance and stability depend on the combination of xorg, hardware and driver versions, and it adds an extra backend to worry about. cairo-image, on the other hand, is our most well-tested drawing backend, has reliable performance characteristics, and makes us actually measure painting time. When we switch to Skia we'll have a more apples-to-apples comparison, since Skia's software renderer is synchronous like cairo-image. Finally, this is just a pref change, which we can easily revert if issues arise.
Flags: needinfo?(nical.bugzilla)
Shouldn't the patch also set gfx.xrender.enabled to false?
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
(In reply to AnAkkk from comment #15)
> Shouldn't the patch also set gfx.xrender.enabled to false?

That's a different story, but I'll look into it. gfx.xrender.enabled controls whether we use XRender in the compositor, which is somewhat less problematic because the compositor doesn't produce complicated drawing command streams the way web content can, so we are less likely to run into issues. But yeah, the bug title says XRender in general and we only disabled it for content here. I'll file a new bug about investigating what to do on the compositor.
Blocks: 1216664
Blocks: 1191775
Depends on: 1244398
It looks like the commit here might have caused an emscripten browser test to fail on linux. This commit looks most relevant out of the range https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f36a4a4ff73f1184f569fd9152b98fe8d8d3866a&tochange=cea2883034cbb4485c1ee0047cd6a7cfe4b9b652 Full details in https://github.com/kripken/emscripten/issues/4069 where there are images of what this used to look like (and still does on chrome), and what it looks like now. If it would help I can create a standalone testcase of this outside of the emscripten test suite.
(In reply to Alon Zakai (:azakai) from comment #19)
> It looks like the commit here might have caused an emscripten browser test
> to fail on linux. This commit looks most relevant out of the range
> https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f36a4a4ff73f1184f569fd9152b98fe8d8d3866a&tochange=cea2883034cbb4485c1ee0047cd6a7cfe4b9b652
>
> Full details in https://github.com/kripken/emscripten/issues/4069 where
> there are images of what this used to look like (and still does on chrome),
> and what it looks like now. If it would help I can create a standalone
> testcase of this outside of the emscripten test suite.

It would definitely help if we can get a standalone testcase. Can you make one?
Flags: needinfo?(azakai)
Sure, here's the combined html+js from that test.
Flags: needinfo?(azakai)
(In reply to Alon Zakai (:azakai) from comment #21)
> Created attachment 8717531 [details]
> emscripten's browser.test_sdl_canvas_alpha combined output
>
> Sure, here's the combined html+js from that test.

I am not able to reproduce your issue through any combination of prefs. The text always shows up streaked orange, no green at all. I tried toggling xrender and image-offscreen-surfaces both on and off, and it is all looking fine on Nightly on Linux. Any other info on how to reproduce?
Flags: needinfo?(azakai)
Sorry for the confusion, I should have been clearer:

1. The difference in output is between this: https://cloud.githubusercontent.com/assets/225351/12718337/b5023576-c8f6-11e5-8409-73b44bb580a2.png (correct) and this: https://cloud.githubusercontent.com/assets/173661/12767737/d1890bf6-c9bf-11e5-87c1-d589b9269862.png (incorrect). (The green stuff is a secondary part of the test that Jukka was verifying as well, but it turned out not to be relevant here.)

2. There aren't any more steps to reproduce beyond running the test and checking whether the output looks like the first of the two links in this comment (the "smooth" one, which matches Chrome) or the second (the "outline-ey" one, which seems wrong).

3. I don't see this on all machines. I see it on one machine but not another. The only differences between them are that one is 64-bit and one is 32-bit (I assume not relevant), and one has an Intel GPU (that's the one that fails) while the other has an Nvidia GPU.
Flags: needinfo?(azakai)
(In reply to Alon Zakai (:azakai) from comment #23)
> Sorry for the confusion, I should have been clearer:
>
> 1. The difference in output is between this:
> https://cloud.githubusercontent.com/assets/225351/12718337/b5023576-c8f6-11e5-8409-73b44bb580a2.png
> (correct) and this:
> https://cloud.githubusercontent.com/assets/173661/12767737/d1890bf6-c9bf-11e5-87c1-d589b9269862.png
> (incorrect). (The green stuff is a secondary part of the test that Jukka
> was verifying as well, but it turned out not to be relevant here.)
>
> 2. There aren't any more steps to reproduce beyond running the test and
> checking whether the output looks like the first of the two links in this
> comment (the "smooth" one, which matches Chrome) or the second (the
> "outline-ey" one, which seems wrong).
>
> 3. I don't see this on all machines. I see it on one machine but not
> another. The only differences between them are that one is 64-bit and one
> is 32-bit (I assume not relevant), and one has an Intel GPU (that's the one
> that fails) while the other has an Nvidia GPU.

Yes, the output is always correct whatever I try. I can't get it to produce the incorrect output.
If you set gfx.xrender.enabled to false, does the issue go away?
Flags: needinfo?(azakai)
Note: this caused the regression in bug 1245421 - the number of simultaneous fd's in use (primarily for shmem segments) spikes, going over 1024 (default on most systems) or even over 4096, depending on timing and other variables when viewing (for example) the treeherder beta page.
Depends on: 1245421
Depends on: 1245241
No longer depends on: 1245421
I still see the problem with that pref set to false and a restart. Here's my graphics info (when that pref is set to false) if that helps:

Graphics
Adapter Description: Intel Open Source Technology Center -- Mesa DRI Intel(R) Ivybridge Mobile
Asynchronous Pan/Zoom: wheel input enabled
Device ID: Mesa DRI Intel(R) Ivybridge Mobile
Driver Version: 3.0 Mesa 10.1.3
GPU Accelerated Windows: 0/1 Basic (OMTC)
Supports Hardware H264 Decoding: No
Vendor ID: Intel Open Source Technology Center
WebGL Renderer: Intel Open Source Technology Center -- Mesa DRI Intel(R) Ivybridge Mobile
windowLayerManagerRemote: true
AzureCanvasBackend: cairo
AzureContentBackend: cairo
AzureFallbackCanvasBackend: none
AzureSkiaAccelerated: 0
CairoUseXRender: 0
Flags: needinfo?(azakai)
I checked another Linux machine of mine, and it also shows the problem. It's using AMD with Gallium drivers, so I guess it's not GPU-specific? Both machines where I see the problem are on Ubuntu 14.04, 64-bit; that's all I can think of that's similar between them.
When I change gfx.canvas.azure.backends from "cairo" to "skia" then the problem goes away.
(In reply to Alon Zakai (:azakai) from comment #30)
> When I change gfx.canvas.azure.backends from "cairo" to "skia" then the
> problem goes away.

We're more or less going to make that official in bug 1244754. Provided that there are no unforeseen blockers/consequences, we're working on making Skia the default canvas backend for Linux in 48. So that's one way to fix it...
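For anyone who wants to try that locally in the meantime, the workaround is a single pref (a sketch only: "skia,cairo" is an assumed fallback ordering, not necessarily what bug 1244754 will ship). The value is an ordered, comma-separated list of canvas backends to try:

user_pref("gfx.canvas.azure.backends", "skia,cairo");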
Blocks: 1251554
Depends on: ship-gtk3
Depends on: 1281993
Blocks: 1100744
Blocks: 1181006
Blocks: 1005501
Depends on: 1297178