Closed
Bug 637828
Opened 14 years ago
Closed 2 years ago
mozilla.org project transitions show graphics corruption
Categories
(Core :: Graphics, defect)
Core
Graphics
Tracking
()
RESOLVED
WORKSFORME
People
(Reporter: stechz, Assigned: jrmuizel)
References
()
Details
Attachments
(4 files, 3 obsolete files)
(deleted),
image/png
|
Details | |
(deleted),
patch
|
joe
:
review+
pavlov
:
approval2.0-
|
Details | Diff | Splinter Review |
(deleted),
patch
|
jrmuizel
:
review+
|
Details | Diff | Splinter Review |
(deleted),
image/png
|
Details |
Seems to only reproduce on Android.
Steps:
1. Navigate to http://mozilla.org
2. Wait for transition from Firefox to Thunderbird
There are graphics corruption issues after the fade is finished. See attachment.
Reporter | ||
Updated•14 years ago
|
tracking-fennec: --- → ?
Reporter | ||
Comment 1•14 years ago
|
||
This only seems to happen for smaller scales? After a certain amount of zooming in, this problem doesn't occur. I'm assuming the magic resolution range for this corruption is <=1.0.
I see this on desktop in portrait orientation.
Assignee: nobody → jones.chris.g
STR in test-ipcbrowser
(1) Load http://mozilla.org
(2) Set resolution to 0.5/0.5
Any resolution <1.0 seems to tickle the bug. Curiously, the corruption is
- ABSENT in a vanilla desktop-firefox build
- ABSENT in a vanilla desktop-firefox build with MOZ_LAYERS_FORCE_SHMEM_SURFACES=1
- ABSENT in a vanilla desktop-fennec build (guessing from comment 1)
- present in desktop-fennec build with --enable-mobile-optimize
So it looks like the secret sauce needed to trigger this is --enable-mobile-optimize. The only significant difference between that and MOZ_LAYERS_FORCE_SHMEM_SURFACES=1 in a vanilla build wrt shadow layers is the image resampling algorithms used. (The only other difference is that slightly different code is used to composite layers to the Gtk2 window, but since this bug appears on android too I think we can rule that out.)
This smells like a pixman bug :/.
This makes the bug go away
diff --git a/gfx/layers/ipc/ShadowLayers.cpp b/gfx/layers/ipc/ShadowLayers.cpp
--- a/gfx/layers/ipc/ShadowLayers.cpp
+++ b/gfx/layers/ipc/ShadowLayers.cpp
@@ -393,21 +393,17 @@ ShadowLayerForwarder::GetParentBackendTy
return mParentBackend;
}
static gfxASurface::gfxImageFormat
OptimalFormatFor(gfxASurface::gfxContentType aContent)
{
switch (aContent) {
case gfxASurface::CONTENT_COLOR:
-#ifdef MOZ_GFX_OPTIMIZE_MOBILE
- return gfxASurface::ImageFormatRGB16_565;
-#else
return gfxASurface::ImageFormatRGB24;
-#endif
case gfxASurface::CONTENT_ALPHA:
Looks like pixman is choking somewhere trying to composite to/from 565 surfaces. I'd guess a problem with rgba32->565. Digging.
OS: Android → All
This bug is gone when I --enable-system-cairo for a local cairo 1.10 build with my distro's pixman (0.18.4).
Halp! Do we cherry-pick upstream patches in this situation?
Judging by the date of the git commit we've got in-tree, it looks we're on pixman ~0.18.2. Is that right?
Assignee | ||
Comment 7•14 years ago
|
||
The pixman in tree should be from between 0.19.4 and 0.19.6. Does the patch on bug 604168 fix it for you?
Yeah, I got confused by an out-of-date README and then sidetracked for a bit. pixman-version.h says 0.19.1, FWIW.
I will give that a shot.
Bad news: I tried a local build of cairo 1.10 with a local build of pixman 0.21.2 (being careful to check LD_DEBUG), and the bug was gone. That's sort of good.
But, with a local build of pixman 0.18.4 running with local cairo 1.10, the bug was also gone. 0.18.4 looks to be older than our in-tree pixman. That means the remaining possible culprits are
- cairo 1.9 vs 1.10 (sigh)
- some patch we have applied to pixman in-tree
- some patch we have applied to cairo in-tree
(The bug also doesn't appear with in-tree cairo+pixman when I force the Xlib backend to use rgb565 surfaces instead of rgb24, but that's not too surprising.)
In-tree cairo claims to be 1.9.5 (cairo-features.h.in), not sure how accurate that is. I tried some cairo versions around there
1.9.2 - no tee backend
1.9.4 - crash
1.9.6 - crash
1.9.8 - OK
So, maybe something was fixed around 1.9.6-1.9.8.
I spent some time looking over the cairo log to see if there was anything that looked related to this bug. Nothing /really/ stood out, though there were a few patches that looked somewhat relevant. I tried applying them and deps, but turns out that some quite substantial cairo changes, many of them to cairo-image-surface, landed the day after the rev we imported last. (Not coincidentally, I imagine! :) .) So attempting to cherry-pick looks to be a lost cause, at least with my cairo-fu.
One of the substantial changes was http://cgit.freedesktop.org/cairo/commit/?id=b9407af6a4bc792c1bcb52c90aa8a618627bb618, referencing the tantalizing goodies
image-rgba firefox-talos-gfx-0 342050.17 (342155.88 0.02%) -> 69412.44 (69702.90 0.21%): 4.93x speedup
███▉
[snip]
image-rgba firefox-planet-gnome-0 217512.61 (217636.80 0.06%) -> 123341.02 (123641.94 0.12%): 1.76x speedup
▊
It's of course unclear how those numbers would translate to m-c tip. This probably wasn't all that appealing in the past because we haven't used the image backend for anything "serious". We are now, on mobile. I don't think this is news to anyone here.
So, I would be remiss to not pursue this: What would it take to land cairo 1.10 for fennec 4, if *only* for fennec 4? We have a patch in bug 562746. We can land cairo 1.10 on a branch if need be. Some folks who can hack on regression fixes are clear of their FF4 blockers (like me).
The alternative, possibly leaving this bug and maybe others unfixed and leaving possibly significant perf wins on the table, because we end up shipping a year-out-of-date cairo, doesn't sound very good.
Let the flames begin!
Tentatively setting dep; a cairo expert could probably directly patch this bug or cherry-pick a fix, but that would take some time for me and I'd rather settle comment 11 first.
Depends on: 562746
Another option which I believe has been bounced around before is shipping a libcairo.so built from scratch and building --enable-system-cairo. This shouldn't be a problem on android since we don't have to worry about conflicting symbols.
That sounds like a reasonable plan
Comment 15•14 years ago
|
||
(In reply to comment #11)
> So, I would be remiss to not pursue this: What would it take to land cairo 1.10
> for fennec 4, if *only* for fennec 4? We have a patch in bug 562746. We can
> land cairo 1.10 on a branch if need be. Some folks who can hack on regression
> fixes are clear of their FF4 blockers (like me).
That's a great idea in general. I think that cairo 1.10.x should be now in a *much* better shape than 1.9 snapshots. Once the release schedule for cairo 1.10 had been announced, a lot of people joined testing efforts and found numerous bugs which are now fixed. Also cairo 1.10 now got into some linux distros where additional testing and bugfixing happened.
Regarding fennec, special care needs to be taken not to forget doing cairo_surface_flush/cairo_surface_mark_dirty calls in places where necessary:
http://cairographics.org/manual-1.0.2/cairo-cairo-surface-t.html#cairo-surface-mark-dirty
http://cairographics.org/manual-1.0.2/cairo-cairo-surface-t.html#cairo-surface-flush
New version of cairo is less forgiving in the cases when this is not done properly (see bug 616700 as an example). Also these flush/dirty calls are only relevant for image backend and are for example noops for xlib, so the bugs may show up in fennec only.
Also in the linux world, the in-tree pixman/cairo copies are irrelevant because everyone is using system libraries anyway.
Comment 16•14 years ago
|
||
(In reply to comment #15)
> Regarding fennec, special care needs to be taken not to forget doing
> cairo_surface_flush/cairo_surface_mark_dirty calls in places where necessary:
> http://cairographics.org/manual-1.0.2/cairo-cairo-surface-t.html#cairo-surface-mark-dirty
> http://cairographics.org/manual-1.0.2/cairo-cairo-surface-t.html#cairo-surface-flush
And I think that in order to make this missing cairo_surface_flush/cairo_surface_mark_dirty calls hunt more efficient, the use of valgrind API could be used. We need to track any accesses to surface memory buffers performed by anything other than cairo/pixman and happening at a wrong time (not between flush and dirty calls). And this is exactly what valgrind can do easily.
Reporter | ||
Comment 17•14 years ago
|
||
I'm not confident about upgrading cairo. We would be adding a lot of unknowns for Fennec's release candidate. Updating cairo is the kind of thing we want to do at the beginning of the release cycle.
Are there any other solutions that can circumvent this bug?
Reporter | ||
Comment 18•14 years ago
|
||
Siarhei: I've found several places where we are not flushing and marking dirty. I could use some help figuring out how to set up valgrind to find the instances we care about.
Comment 19•14 years ago
|
||
It should be possible to patch cairo to make use of VALGRIND_MAKE_MEM_DEFINED and VALGRIND_MAKE_MEM_NOACCESS or VALGRIND_MAKE_MEM_UNDEFINED client requests:
http://valgrind.org/docs/manual/mc-manual.html
So that 'cairo_image_surface_create_for_data' and 'cairo_surface_mark_dirty' would use VALGRIND_MAKE_MEM_NOACCESS on the surface memory buffer and 'cairo_surface_flush' would use VALGRIND_MAKE_MEM_DEFINED. After this is done, valgrind would complain about any attempts to access this buffer when it is not marked as 'defined'. Additionally either every cairo function also should mark buffer as defined on entry and restore its original state on exit in order not to catch itself, or just the valgrind log needs to be filtered to ignore any memory accesses which have cairo functions showing up in a backtrace.
Reporter | ||
Comment 20•14 years ago
|
||
Is there any easy function to call that gives me the number of bytes for an image, given its format, width, height, and stride?
No, need to compute from GetSize() and BytePerPixelFromFormat(). But total size isn't typically useful information on its own for an arbitrary image surface, except for accounting; what are you trying to do?
Reporter | ||
Comment 22•14 years ago
|
||
VALGRIND_MAKE_MEM_DEFINED needs to know the size of the memory region.
Yeah, you can't do that unless the stride is the width*bpp. Need to mark images row-by-row otherwise.
(Probably simplest just to use a single row-by-row impl, since this isn't terribly perf critical.)
Updated•14 years ago
|
Updated•14 years ago
|
tracking-fennec: ? → 2.0+
Assignee | ||
Comment 25•14 years ago
|
||
If someone can come up with a reduced test case, that might help here.
I tried for a bit, but this page is animating using jquery junk and I couldn't reproduce the problem with tests that just animated the opacity of <img>'s and <div>'s and text. In fact, it was after I tried to make a small test case that I asked you about cairo-trace last night ;).
I'm sure it's doable with some work.
Reporter | ||
Comment 27•14 years ago
|
||
I think the key features that tickle this bug are:
1) Something with opacity and gradients
2) Some sort of timed invalidation involving these gradients
Updated•14 years ago
|
Whiteboard: [needs-cairo-update]
Assignee | ||
Comment 28•14 years ago
|
||
Removing the use of slow operation 'SRC with a mask' helps this problem.
Assignee | ||
Comment 29•14 years ago
|
||
I don't see any reason to use SOURCE here. Pixman will already turn OVER with RGB24 source into COPY when it can. Plus if we use SOURCE with a MASK it will turn into a very slow path, including hitting pixman's general code.
Attachment #516979 -
Flags: review?(roc)
Assignee | ||
Comment 30•14 years ago
|
||
Attachment #516979 -
Attachment is obsolete: true
Attachment #516979 -
Flags: review?(roc)
Assignee | ||
Updated•14 years ago
|
Attachment #516983 -
Flags: review?(roc)
This patch sounds good on its own, but is the underlying bug something we're likely to trip on elsewhere?
Comment 32•14 years ago
|
||
Jeff tells me that the ADD composite operation seems to be wrong, but it might be the input to that operation in clip_and_composite_source.
Comment 33•14 years ago
|
||
Comment on attachment 516983 [details] [diff] [review]
Version that builds
This is removing an optimization that was added in bug 418494 - an optimization that Jeff actually questioned in bug 418494 comment 4. Further, Jeff tells me that pixman is already smart enough to optimize to source when it makes sense, like when we have no alpha channel. Finally, Jeff also tells me that using source requires us to draw through a mask, while over can be a lot faster.
So, as long as this doesn't regress us on Try, I'm happy with this patch!
Attachment #516983 -
Flags: review?(roc) → review+
Updated•14 years ago
|
Whiteboard: [needs-cairo-update] → [can-land]
Assignee | ||
Comment 34•14 years ago
|
||
(In reply to comment #32)
> Jeff tells me that the ADD composite operation seems to be wrong, but it might
> be the input to that operation in clip_and_composite_source.
I've confirmed that inputs to the composite operation
at http://mxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-surface-fallback.c#365 are fine and the result is wrong.
This suggests that compositing a scaled RGB24 source with a A8 mask, the ADD operation and a RGB565 destination surface is broken.
Knowing this should make this much easier to fix. Unfortunately, I may not be able to debug this soon (before Monday) So I'd be happy if someone else took a look.
Reporter | ||
Comment 35•14 years ago
|
||
This is exciting news! I'm no expert, so if anyone else wants to look please do. In the mean time, I think I'll poke around...
Reporter | ||
Comment 36•14 years ago
|
||
BTW, there is still another corruption bug, even when all paths to cairo-surface-fallback.c:365 are gone. You can see it on the download button. If these are the same bug, perhaps there's an underlying draw path to blame.
Reporter | ||
Comment 37•14 years ago
|
||
No luck debugging this. We'll need someone else to take a look.
Assignee | ||
Comment 38•14 years ago
|
||
I tried reproducing the problem with a standalone test case. So either, there's something else special going on here or I missed something.
Comment 39•14 years ago
|
||
Note that I haven't seen any evidence of try Talos results, which is what I want before landing this change.
Whiteboard: [can-land] → [can-land][can land only if it passed try]
Assignee | ||
Comment 40•14 years ago
|
||
The problem is:
the destination pixman image has a transform.
Cairo does not expect this.
Assignee | ||
Comment 41•14 years ago
|
||
(In reply to comment #40)
> The problem is:
> the destination pixman image has a transform.
>
> Cairo does not expect this.
It seems like this happens when we use the destination as a transformed source. The solution to this problem, is not obvious. Upstream has completely rewritten this code.
Assignee | ||
Comment 42•14 years ago
|
||
This fixes the problem for me. Hopefully, it fixes all the other problem too.
Assignee: jones.chris.g → jmuizelaar
Attachment #517529 -
Flags: review?
Comment 43•14 years ago
|
||
Comment on attachment 517529 [details] [diff] [review]
Reset the transform on the destination
too much patch
Attachment #517529 -
Flags: review? → review-
Assignee | ||
Comment 44•14 years ago
|
||
This time without the debugging code.
Attachment #517529 -
Attachment is obsolete: true
Attachment #517535 -
Flags: review?
Comment 45•14 years ago
|
||
Comment on attachment 517535 [details] [diff] [review]
Reset the transform on the destination v2
just enough patch
Attachment #517535 -
Flags: review? → review+
Assignee | ||
Comment 46•14 years ago
|
||
It seems like the code at fault here is really pixman. I think that pixman should probably not be using a transformed fetch on the destination image under any circumstances.
For example, in this case we're fetching destination pixels from a different part of the image than we're storing them to. I can't see any reason for wanting this behaviour.
Reporter | ||
Comment 47•14 years ago
|
||
This fixes everything that updating cairo fixed for me! Let's get both patches in!
Assignee | ||
Comment 48•14 years ago
|
||
(In reply to comment #47)
> This fixes everything that updating cairo fixed for me! Let's get both patches
> in!
I'm not in as much of a rush to take the first patch because of any potential risk. If someone can show a measurable difference, it would make it more attractive.
Assignee | ||
Comment 49•14 years ago
|
||
Here's a final version ready for pushing.
Attachment #517535 -
Attachment is obsolete: true
Attachment #517578 -
Flags: review+
Assignee | ||
Updated•14 years ago
|
Whiteboard: [can-land][can land only if it passed try] → [can-land]
Updated•14 years ago
|
Attachment #516983 -
Flags: approval2.0-
Updated•14 years ago
|
Keywords: checkin-needed
Whiteboard: [can-land]
Reporter | ||
Comment 52•14 years ago
|
||
Pushed http://hg.mozilla.org/mozilla-central/rev/6dab3f1dfa0e
Leaving open for other patch.
tracking-fennec: 2.0+ → ---
Keywords: checkin-needed
Comment 53•14 years ago
|
||
(In reply to comment #46)
> It seems like the code at fault here is really pixman. I think that pixman
> should probably not be using a transformed fetch on the destination image under
> any circumstances.
I wonder if this is actually fixed in pixman already: http://cgit.freedesktop.org/pixman/commit/?id=7f4eabbeec92e55fd8f812c0e5d8568eacbb633d
Comment 54•14 years ago
|
||
Comment 55•14 years ago
|
||
Should we also take this on the relbranch?
blocking2.0: --- → ?
Whiteboard: [asking for .x]
Comment 56•14 years ago
|
||
Removing - jrmuizel tells me I'm seeing a different bug!
blocking2.0: ? → ---
Whiteboard: [asking for .x]
Comment 57•14 years ago
|
||
(In reply to comment #54)
> Another use case: "ghost pixels" when panning regular text
I've filed bug 641506 for that issue, which seems to be newer than this bug.
Updated•2 years ago
|
Severity: normal → S3
Comment 58•2 years ago
|
||
Unable to reproduce in current versions.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•