Closed
Bug 900012
Opened 11 years ago
Closed 11 years ago
crash in mozilla::layers::GrallocBufferActor::ActorDestroy(mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::ActorDestroyReason)
Categories
(Core :: Graphics: Layers, defect)
Tracking
()
VERIFIED
FIXED
mozilla25
People
(Reporter: zcampbell, Assigned: nical)
References
Details
(4 keywords, Whiteboard: [b2g-crash][fromAutomation])
Crash Data
Attachments
(1 file, 1 obsolete file)
(deleted),
patch
|
Details | Diff | Splinter Review |
Just came in on today's mc engineering build:
Gecko http://hg.mozilla.org/mozilla-central/rev/c2b375f3a909
Gaia 9bfceaa90e8b92a379432b67121afa3cd3f14c90
BuildID 20130731030205
Version 25.0a1
Steps to replicate:
1. Load camera app
2. Take a photo
3. Tap gallery app
4. Device will crashy crash
Reporter | ||
Comment 1•11 years ago
|
||
An additional crash report with a different STR
https://crash-stats.mozilla.com/report/index/e71dfc17-680c-4398-9ba2-1f4392130731
1. adb push videofile.mp4 sdcard/
2. Load videoplayer app
3. Wait for pushed video file to appear
4. Tap video file to play
5. Device says no.
Replicated manually.
Automation shows it crashes with other video formats too.
Updated•11 years ago
|
Whiteboard: [fromAutomation] → [b2g-crash][fromAutomation]
Updated•11 years ago
|
Component: General → Graphics: Layers
Product: Boot2Gecko → Core
Version: unspecified → Trunk
Updated•11 years ago
|
Keywords: regression,
reproducible
Updated•11 years ago
|
Blocks: b2g-central-dogfood
Comment 2•11 years ago
|
||
Joe - Any ideas on what caused this regression? We just saw this on today's m-c b2g build.
Flags: needinfo?(joe)
Comment 3•11 years ago
|
||
More simple steps:
1. Load camera app
2. Tap on the video icon to switch source
Comment 4•11 years ago
|
||
I don't, but perhaps Nical does.
Flags: needinfo?(joe) → needinfo?(nical.bugzilla)
Comment 5•11 years ago
|
||
Looks like this might be caused by the work going on in bug 858914.
Assignee | ||
Comment 6•11 years ago
|
||
(In reply to Jason Smith [:jsmith] from comment #5)
> Looks like this might be caused by the work going on in bug 858914.
Actually, Bug 858914 is aiming at fixing this kind of bugs.
So far, nothing of the work in 858914 that landed affects B2G (it is preffed off and I am still working on the B2G part of that bug).
Another bug that is worth looking at is Bug 879681 where we want to remove PGrallocBufferActor which is a source off so many problems. But it's not trivial and I have a lot to do with 858914 already so I wouldn't expect a patch for 879681 in the very short term (except if someone wants to take over this bug).
Comment 7•11 years ago
|
||
What device? Also, this implies that it just showed up in 2013-07-31, but can we confirm that it was ok on 2013-07-30 or not?
Flags: needinfo?(zcampbell)
Keywords: regressionwindow-wanted
Reporter | ||
Comment 8•11 years ago
|
||
Unagi device (this is in the crash report too)
This crash was definitely not in yesterday's 2013-07-30 master build.
This showed up in our first test automation run on the 2013-07-31 build today and hit 5-6 of our tests pretty hard; it was quite visible :)
Flags: needinfo?(zcampbell)
Comment 9•11 years ago
|
||
So that makes the regression range between 7/30/2013 - 7/31/2013 on the gecko side.
I see that a large patch landed from graphics layers in bu 866232. I see a smaller patch that landed in bug 899730 as well.
Comment 11•11 years ago
|
||
So, http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=3d40d270c031&tochange=c2b375f3a909 as the regression window?
Comment 12•11 years ago
|
||
Nical's bug 898914 being pref'd off non-withstanding, I'd still like to confirm that didn't break it. Benoit, can you do a quick test, Nical's probably gone for the day. It's this one: http://hg.mozilla.org/mozilla-central/rev/313445f455f3
Flags: needinfo?(bjacob)
Comment 13•11 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #11)
> So,
> http://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=3d40d270c031&tochange=c2b375f3a909 as the regression
> window?
Looks like it.
Keywords: regressionwindow-wanted
Comment 14•11 years ago
|
||
Issue looks similar to issue in bug 880780 + crash; after user launching camera app he see green screen before crash.
Crash ID: bp-0309ef6f-4c90-4c17-9710-4a8c82130731
Comment 15•11 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #12)
> Nical's bug 898914 being pref'd off non-withstanding, I'd still like to
> confirm that didn't break it. Benoit, can you do a quick test, Nical's
> probably gone for the day. It's this one:
> http://hg.mozilla.org/mozilla-central/rev/313445f455f3
Confirmed that this cset is responsible for this regression.
With 313445f455f3, camera app locks up (Camera process still visible in adb shell b2g-ps, though)
With the parent cset 91837985ae91, camera app works fine.
-> regressed by bug 858914
Flags: needinfo?(bjacob)
Comment 16•11 years ago
|
||
http://youtu.be/1Do6SXtKn9M video of issue happening.
Comment 17•11 years ago
|
||
Thanks Benoit. So can we backout the changeset in question here to get this working again?
Comment 18•11 years ago
|
||
(In reply to Jason Smith [:jsmith] from comment #17)
> Thanks Benoit. So can we backout the changeset in question here to get this
> working again?
RyanVM is currently working on backing out the changeset in question.
Comment 19•11 years ago
|
||
This is a huge changeset, I have no idea how hard it will be to back out. Its author is in Europe, too, so I wouldn't him to be available until tomorrow.
Comment 20•11 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #19)
> This is a huge changeset, I have no idea how hard it will be to back out.
> Its author is in Europe, too, so I wouldn't him to be available until
> tomorrow.
Yup. Discussed in IRC - nical is going to look into this first thing tomorrow his time.
Updated•11 years ago
|
Crash Signature: [@ mozilla::layers::GrallocBufferActor::ActorDestroy(mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::ActorDestroyReason)] → [@ mozilla::layers::GrallocBufferActor::ActorDestroy(mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::ActorDestroyReason)]
[@ nsXPConnect::XPConnect()]
Assignee | ||
Comment 21•11 years ago
|
||
The crash happens in some hacky code that was written in order to handle corner cases where the lifetime of Gralloc buffers is hard to track: When we set a gralloc buffer to a TextureHost we register this texture host to the the gralloc buffer so that the latter can ask one last thing (ForgetBuffer()) to the TextureHost before being destroyed.
It looks very much like the registered TextureHost is destroyed when the GrallocBufferActor's destructor tries to call forget buffer on it (dangling pointer crash).
bjacob you understand this code better than me so please let me know if I missed something.
In GrallocDeprecatedTextureHostOGL::SwapTextureImpl, we register to the gralloc buffer we just received, but we don't unregister from the previous one.
Also we don't seem to set the previous gralloc as the back buffer of the TextureHost. if we were doing any of those two i'd probably understand how the thing holds together but right now it's like we have always been lucky that the TextureHost would outlibe the GrallocBufferActor.
The funkiness here is that (I think) double buffering at the texture host level is only used in ImageLayers for async video. So subtle brokenness in GrallocDeprecatedTextureHost's double buffering will not break ThebesLayers (hich is the most exercised code path and probably where we were testing the register gralloc hack thingy).
Also Benoit, why do we need to call ForgetBuffer() in the GrallocBufferActor's destructor?
what harm is there to keep the android::GraphicBuffer alive a bit longer?
Comment 22•11 years ago
|
||
We are looking for a solution today (Aug 1); if not found, we will prepare a backout for tomorrow (Aug 2).
Assignee | ||
Comment 23•11 years ago
|
||
This patch reverts a one-line chunk that should not have been part of the patch part 4 in 858914, that made async-video single buffered. It fixes the crash, although I am not sure why. double buffering for async video was meant to avoid flashing on desktop.
Assignee: nobody → nical.bugzilla
Attachment #784606 -
Flags: review?(matt.woodrow)
Comment 24•11 years ago
|
||
Comment on attachment 784606 [details] [diff] [review]
make async-video double buffered.
Review of attachment 784606 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/thebes/gfxPlatform.h
@@ +564,5 @@
> */
> bool PreferMemoryOverShmem() const;
> + bool UseDeprecatedTextures() const {
> + if (!mLayersUseDeprecated) MOZ_CRASH("SHOULD USE DEPRECATED TEXTURES!");
> + return mLayersUseDeprecated;
Is this intentional? I'm hoping to have this enabled on OSX asap.
Attachment #784606 -
Flags: review?(matt.woodrow) → review+
Assignee | ||
Comment 25•11 years ago
|
||
The patch without the debug assertion
Attachment #784606 -
Attachment is obsolete: true
Assignee | ||
Comment 26•11 years ago
|
||
Assignee | ||
Comment 27•11 years ago
|
||
The image in the screen is still all messed up (green) on my unagi, though. Did we already have that before 858914? If I take pictures or videos, the image/video that is saved is correct, but not the image on the screen while I am filming.
Whiteboard: [b2g-crash][fromAutomation] → [b2g-crash][fromAutomation][leave-open]
Comment 28•11 years ago
|
||
(In reply to Nicolas Silva [:nical] from comment #27)
> The image in the screen is still all messed up (green) on my unagi, though.
> Did we already have that before 858914? If I take pictures or videos, the
> image/video that is saved is correct, but not the image on the screen while
> I am filming.
Yup. That's bug 880780.
Assignee | ||
Comment 29•11 years ago
|
||
With the patch applied, I also can't reproduce the crash described in comment 1.
Whiteboard: [b2g-crash][fromAutomation][leave-open] → [b2g-crash][fromAutomation]
Comment 31•11 years ago
|
||
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
Comment 32•11 years ago
|
||
Works for me on Hamachi now.
Updated•11 years ago
|
Crash Signature: [@ mozilla::layers::GrallocBufferActor::ActorDestroy(mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::ActorDestroyReason)]
[@ nsXPConnect::XPConnect()] → [@ mozilla::layers::GrallocBufferActor::ActorDestroy(mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::ActorDestroyReason)]
[@ mozilla::docshell::POfflineCacheUpdateParent::DestroySubtree(mozilla::ipc::IProtocolManager<mozilla::ipc::…
Keywords: topcrash
Comment 34•11 years ago
|
||
Are you sure it hasn't regressed since because I see crashes in 25.0a1/20130805 (see https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Alayers%3A%3AGrallocBufferActor%3A%3AActorDestroy%28mozilla%3A%3Aipc%3A%3AIProtocolManager%3Cmozilla%3A%3Aipc%3A%3ARPCChannel%3A%3ARPCListener%3E%3A%3AActorDestroyReason%29)?
Flags: needinfo?(nical.bugzilla)
Reporter | ||
Comment 35•11 years ago
|
||
OK Scoobi, you are in luck, I have replicated it! Totally by coincidence I might add..
STR:
1. Push an image to the sdcard/gallery
2. Open contacts app
3. Tap the 'Add picture' to add a pic/avatar
4. Select 'From gallery'
5. Choose the image pushed in step 1
6. Crop the image a bit
7. Tap the tickbox
Device will crash.
Device: Unagi
Build:
Gecko http://hg.mozilla.org/mozilla-central/rev/ad0ae007aa9e
Gaia 0ca0dba246d1372eb815772c8108395ab0abd0a4
BuildID 20130805070203
Version 25.0a1
Assignee | ||
Comment 36•11 years ago
|
||
The crash here happens in some very badly broken code that tries to stitch together even more broken code. It is in a fragile equilibrium and it will break again until we get to fix the architectural badness that got us in this situation. Ironically bug 858914 is largely motivated by fixing the broken ownership model of gralloc buffers (the code that crashed) and it also incidentally modified code that led this to regress (the modification has been backed out by the patch in this bug).
My point is: Ownership model of gralloc buffers is badly broken and IMO there are two things that can really help in the long run:
* Bug 858914 Get the new textures to work with gralloc (what i am working on at the moment) which will remove a portion of the brokenness.
* Bug 879681 remove GrallocBufferActor which turns a safe abstraction (android::GraphicBuffer) into an unsafe segfault machine.
In the short run we need to keep stitching but this crash can be caused by pretty much anything that touches the fragile equilibrim of DeprecatedGrallocTextureHostOGL and GrallocBufferActor
Flags: needinfo?(nical.bugzilla)
Comment 37•11 years ago
|
||
(In reply to Zac C (:zac) from comment #35)
> STR:
> 1. Push an image to the sdcard/gallery
> 2. Open contacts app
> 3. Tap the 'Add picture' to add a pic/avatar
> 4. Select 'From gallery'
> 5. Choose the image pushed in step 1
> 6. Crop the image a bit
> 7. Tap the tickbox
> Device will crash.
It seems bug 901705.
Reporter | ||
Comment 38•11 years ago
|
||
Yes same STR but different crash report.
I filed this one:
https://crash-stats.mozilla.com/report/index/50c5c6bd-97b6-454b-b5c0-834772130806
Updated•11 years ago
|
Comment 39•11 years ago
|
||
We still can reproduce this with the STR described in comment #35
Gecko http://hg.mozilla.org/mozilla-central/rev/fd4cf30428b0
Gaia ae1905a5bf10f064a86731e2bff195c2bccf620f
BuildID 20130808040203
Version 26.0a1
Reopening this so we can xfail the test and track this crash
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 40•11 years ago
|
||
Florin, file a new bug, likely the same underlying issue as bug 901705 and bug 902927 (see comment 36).
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Updated•11 years ago
|
Comment 41•11 years ago
|
||
Attempted STR from Comment 0 and unable to reproduce reported result.
Using below Environmental Variables, device did not crash following STR on Buri device
Environmental Variables
Build ID: 20130830040204
Gecko: http://hg.mozilla.org/mozilla-central/rev/c7459bc8e449
Gaia: 407fbfb6a9de68ec4db2f0f3dc6c67463e293f47
Platform Version: 26.0a1
RIL Version: 01.01.00.019.206
Base Image 20130823
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•