Closed
Bug 1180942
Opened 9 years ago
Closed 9 years ago
Disable Xrender by default
Categories
(Core :: Graphics, defect)
Core
Graphics
Tracking
()
RESOLVED
FIXED
mozilla46
Tracking | Status | |
---|---|---|
firefox46 | --- | fixed |
People
(Reporter: jrmuizel, Unassigned)
References
(Depends on 1 open bug, Blocks 2 open bugs)
Details
(Whiteboard: [gfx-noted])
Attachments
(2 files)
(deleted),
patch
|
jrmuizel
:
review+
|
Details | Diff | Splinter Review |
(deleted),
text/html
|
Details |
Xrender is buggy, unreliably and drivers are moving to implement it on top of OpenGL in the xserver.
Comment 1•9 years ago
|
||
GTK3 try push (incl. talos) with failing tests updated and XRender off by default:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ea115f6b650b
Updated•9 years ago
|
Whiteboard: [gfx-noted]
Comment 2•9 years ago
|
||
layers.use-image-offscreen-surfaces is preferable to switching off gfx.xrender.enabled because the former keeps render for accelerated compositing (if not using GL).
With a semi-recent cairo, as with --system-cairo on most systems, cairo provides async SHM upload where appropriate.
The SHM code in Gecko depends on XSync and whether it is suitable for OMTC is questionable.
Comment 3•9 years ago
|
||
try push with today's mozilla-central and layers/use-image-offscreen-surfaces set to true: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e17d285a5db7
For the most part we have reftest failures that just need to be fuzzed and timeouts. Since cairo's image backend doesn't use any extra threads, I expect timouts can be fixed by splitting the test suites. I haven't tried to run talos, but I expect some of the talos tests to regress since we'll be measuring time spent painting which is not part of the measurement when using cairo's xrender backend. I can say however than some sites paint noticeably faster with cairo's image backend than with xrender (for example treeherder on my computer), but that may depend the configuration.
Comment 4•9 years ago
|
||
another try push with layers.use-image-offscreen-surfaces set to true + some fuzzing.
I disabled one web-platform test and enabled 2 which were previously marked as failing. The fuzzing values are the same as windows without d2d, which is what we expect.
The try run looks good so far (as good as try runs can look these days)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=365c1cd0605a
Comment 5•9 years ago
|
||
Attachment #8708306 -
Flags: review?(jmuizelaar)
Reporter | ||
Updated•9 years ago
|
Attachment #8708306 -
Flags: review?(jmuizelaar) → review+
Comment 7•9 years ago
|
||
I expect that landing this will make some waves on the talos numbers (for better or worse). The reason it is hard to evaluate the changes it makes is that drawing before the patch is done asynchronously on the x server and after this patch it is done synchronously in gecko so we are now measuring things that we didn't measure before. xrender being unreliable and buggy (at least the way we use it), it turns out that the user experience is noticeably better with xrender disabled (so after the patch), I tested this on 5 different systems and other folks in the Paris office who tested the pref in the last few days made the same observations.
It is possible that on some systems (with some xorg and driver combinations) xrender behaves better than the image backend, but that wouldn't be the common case and the image backend also has the advantage of having reliable performance characteristics (it's code we ship rather than whatever version of the system we are running on), is much better tested and corresponds to what we run on android, b2g and windows xp.
So in short, if any worrying talos results show up (and I am not around because timezones), no worries, it's likely that we'll accept the change (not as a regression but as a difference in what is measured), and I'll come back and investigate to be sure. Don't be too quick to back this out.
Comment 8•9 years ago
|
||
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/22e333b0a82c for test failures, so you'll get a free chance at seeing what talos "regressions" you'll need to defend against.
Those linux64 debug reftest-2/reftest-e10s-1 OOM crashes in 632781-verybig.html in your try run were yours, intermittent at about a 50% failure rate. Not sure yet about the linux32 opt r-e10s failure in scroll-behavior-6.html, but I'm deeply suspicious about how it's there on your try run, and on a linux64 opt run on m-i, and I've never seen it anywhere else.
Comment 9•9 years ago
|
||
632781-verybig.html strikes again :( this test allocates a canvas that is so big that there is no way we can run it reliably unless we have the guarantee we have several very big chuncks of contiguous address space available. Since we are using shmem textures instead of pixmaps we have an extra allocation on the compositor side because compositing is still done with xrender. So I am not surprised that this test gets worse. I'll look into what I can do to save some memory, otherwise we'll need to disable the test.
Comment 10•9 years ago
|
||
(In reply to Phil Ringnalda (:philor) from comment #8)
> Not sure yet about the linux32 opt r-e10s failure in
> scroll-behavior-6.html, but I'm deeply suspicious about how it's there on
> your try run, and on a linux64 opt run on m-i, and I've never seen it
> anywhere else.
I guess the 100 r-e10s re-triggers in https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=604c30246eeb rule this out?
632781-verybig.html should get to a more "acceptable" failure rate with bug 774689.
I am hoping to re-land soon after bug 774689. On the bright side this patch enables 2 tests that were marked as failing before.
Comment 11•9 years ago
|
||
Yeah, dunno what's up with scroll-behavior-6.html, but I don't have any reason to believe it was you.
Comment 12•9 years ago
|
||
big old talos regressions:
http://alertmanager.allizom.org:8080/alerts.html?rev=604c30246eeb94ca89e0250e9841b6dc84cf5d19&table=1
and a few improvements ^ as well, those ended up looking like regressions when we backed this out:
http://alertmanager.allizom.org:8080/alerts.html?rev=765cf8a44b5a472f99ca93bf16c94761ce458861&showAll=1
:nical, I know there is a plan to reland this- can we try to address or ensure that we understand the perf impact prior to relanding?
Flags: needinfo?(nical.bugzilla)
Comment 13•9 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #12)
> :nical, I know there is a plan to reland this- can we try to address or
> ensure that we understand the perf impact prior to relanding?
I was writing a long and detailed reply and nightly crashed on me, so I'll just write the short summary and if you want more details, ping me on irc or ask here.
In short: we don't measure the same thing with xrender and with the image backend, there may be areas where xrender is really performing better, but in general I am confident that the image backend performs noticeably better in terms of what the user perceives (the asynchronous nature of xrender means what we measure does not correspond to the time it takes for frames to actually show up on the screen).
xrender (at least the way we use it) is buggy, not reliable, its performance and stability are dependent on the combination of xorg, hardware and driver versions, and adds a backend to worry about.
cairo-image on the other hand is our most well-tested drawing backend, has reliable performance characteristics, and makes us actually measure painting time.
When we switch to skia we'll have a more apple-to-apple comparison since skia's software renderer is synchronous like cairo-image.
Finally, this is just a pref change which we can easily revert if issues arise.
Flags: needinfo?(nical.bugzilla)
Comment 14•9 years ago
|
||
Comment 15•9 years ago
|
||
Shouldn't the patch also set gfx.xrender.enabled to false?
Comment 16•9 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox46:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
Comment 17•9 years ago
|
||
(In reply to AnAkkk from comment #15)
> Shouldn't the patch also set gfx.xrender.enabled to false?
That's a different story but I'll look into it. gfx.xrender.enabled controls whether we use xrender on the compositor which is somewhat less problematic because the compositor doesn't produce complicated drawing command streams like web content can, so we are less likely to run into issues. But yeah the bug title says xrender in general and we only disabled it for content, here. I'll file a new bug about investigating what to do on the compositor.
Comment 18•9 years ago
|
||
Filed bug 1241832.
Comment 19•9 years ago
|
||
It looks like the commit here might have caused an emscripten browser test to fail on linux. This commit looks most relevant out of the range https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f36a4a4ff73f1184f569fd9152b98fe8d8d3866a&tochange=cea2883034cbb4485c1ee0047cd6a7cfe4b9b652
Full details in https://github.com/kripken/emscripten/issues/4069 where there are images of what this used to look like (and still does on chrome), and what it looks like now. If it would help I can create a standalone testcase of this outside of the emscripten test suite.
Comment 20•9 years ago
|
||
(In reply to Alon Zakai (:azakai) from comment #19)
> It looks like the commit here might have caused an emscripten browser test
> to fail on linux. This commit looks most relevant out of the range
> https://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=f36a4a4ff73f1184f569fd9152b98fe8d8d3866a&tochange=cea2
> 883034cbb4485c1ee0047cd6a7cfe4b9b652
>
> Full details in https://github.com/kripken/emscripten/issues/4069 where
> there are images of what this used to look like (and still does on chrome),
> and what it looks like now. If it would help I can create a standalone
> testcase of this outside of the emscripten test suite.
It would definitely help if we can get a standalone testcase. Can you make one?
Flags: needinfo?(azakai)
Comment 22•9 years ago
|
||
(In reply to Alon Zakai (:azakai) from comment #21)
> Created attachment 8717531 [details]
> emscripten's browser.test_sdl_canvas_alpha combined output
>
> Sure, here's the combined html+js from that test.
I am not able to reproduce your issue through any combination of prefs. The text always shows up streaked orange, no green at all.
I tried toggling xrender and image-offscreen-surfaces both on and off, and it is all looking fine on nightly in Linux.
Any other info on how to reproduce?
Flags: needinfo?(azakai)
Comment 23•9 years ago
|
||
Sorry for the confusion, I should have been more clear:
1. The difference in output is between this: https://cloud.githubusercontent.com/assets/225351/12718337/b5023576-c8f6-11e5-8409-73b44bb580a2.png (correct) and this https://cloud.githubusercontent.com/assets/173661/12767737/d1890bf6-c9bf-11e5-87c1-d589b9269862.png (incorrect). (The green stuff is a secondary part of the test that Jukka was verifying as well, but it turned out not to be relevant for this.)
2. There aren't more steps to STR aside from running the test and seeing if it's like the first of the two links in this comment (the "smooth" one, which is like chrome) or the second (the "outline-ey" one, which seems wrong).
3. I don't see this on all machines. I see it on one machine but not another. Only differences between them is one is 64-bit and one is 32 (I assume not relevant), and one is intel GPU (that's what fails) and the other nvidia.
Flags: needinfo?(azakai)
Comment 24•9 years ago
|
||
(In reply to Alon Zakai (:azakai) from comment #23)
> Sorry for the confusion, I should have been more clear:
>
> 1. The difference in output is between this:
> https://cloud.githubusercontent.com/assets/225351/12718337/b5023576-c8f6-
> 11e5-8409-73b44bb580a2.png (correct) and this
> https://cloud.githubusercontent.com/assets/173661/12767737/d1890bf6-c9bf-
> 11e5-87c1-d589b9269862.png (incorrect). (The green stuff is a secondary part
> of the test that Jukka was verifying as well, but it turned out not to be
> relevant for this.)
>
> 2. There aren't more steps to STR aside from running the test and seeing if
> it's like the first of the two links in this comment (the "smooth" one,
> which is like chrome) or the second (the "outline-ey" one, which seems
> wrong).
>
> 3. I don't see this on all machines. I see it on one machine but not
> another. Only differences between them is one is 64-bit and one is 32 (I
> assume not relevant), and one is intel GPU (that's what fails) and the other
> nvidia.
Yes, the output is always correct whatever I try. I can't get it to produce the incorrect output.
Comment 25•9 years ago
|
||
If you set gfx.xrender.enabled to false, does the issue go away?
Flags: needinfo?(azakai)
Comment 26•9 years ago
|
||
Note: this caused the regression in bug 1245421 - the number of simultaneous fd's in use (primarily for shmem segments) spikes, going over 1024 (default on most systems) or even over 4096, depending on timing and other variables when viewing (for example) the treeherder beta page.
Depends on: 1245421
Comment 27•9 years ago
|
||
Sorry, bug 1245241
Comment 28•9 years ago
|
||
I still see the problem with that pref set to false and a restart.
Here's my graphics info (when that pref is set to false) if that helps:
Graphics
Adapter Description Intel Open Source Technology Center -- Mesa DRI Intel(R) Ivybridge Mobile
Asynchronous Pan/Zoom wheel input enabled
Device ID Mesa DRI Intel(R) Ivybridge Mobile
Driver Version 3.0 Mesa 10.1.3
GPU Accelerated Windows 0/1 Basic (OMTC)
Supports Hardware H264 Decoding No
Vendor ID Intel Open Source Technology Center
WebGL Renderer Intel Open Source Technology Center -- Mesa DRI Intel(R) Ivybridge Mobile
windowLayerManagerRemote true
AzureCanvasBackend cairo
AzureContentBackend cairo
AzureFallbackCanvasBackend none
AzureSkiaAccelerated 0
CairoUseXRender 0
Flags: needinfo?(azakai)
Comment 29•9 years ago
|
||
I checked another linux machine of mine, and it also shows the problem. It's using AMD with Gallium drivers. So I guess not GPU specific? Both machines where I see the problem are on Ubuntu 14.04, 64-bit, that's all I can think of that's similar between them.
Comment 30•9 years ago
|
||
When I change gfx.canvas.azure.backends from "cairo" to "skia" then the problem goes away.
Comment 31•9 years ago
|
||
(In reply to Alon Zakai (:azakai) from comment #30)
> When I change gfx.canvas.azure.backends from "cairo" to "skia" then the
> problem goes away.
We're more or less going to make that much official in bug 1244754. Provided that there are no unforeseen blockers/consequences, we're working on making Skia the default canvas for Linux in 48. So that's one way to fix it...
You need to log in
before you can comment on or make changes to this bug.
Description
•