Closed Bug 956325 Opened 11 years ago Closed 11 years ago

crash in mozalloc_abort(char const*) | NS_DebugBreak | mozilla::dom::ContentChild::ProcessingError(mozilla::ipc::HasResultCodes::Result)

Categories

(Core :: IPC, defect)

29 Branch
ARM
Gonk (Firefox OS)
defect
Not set
critical

Tracking

()

VERIFIED FIXED
1.3 C2/1.4 S2(17jan)
blocking-b2g 1.3+
Tracking Status
firefox27 --- wontfix
firefox28 --- verified
firefox29 --- verified
b2g-v1.3 --- fixed
b2g-v1.4 --- fixed

People

(Reporter: nkot, Assigned: gwagner)

References

Details

(Keywords: crash, regression, Whiteboard: [b2g-crash], [systemsfe], [CR 596211])

Crash Data

Attachments

(5 files, 2 obsolete files)

This bug was filed from the Socorro interface and is report bp-a1f9b222-b5aa-4941-a285-240342140103.
=============================================================
Hit this crash today while going through the FTE after manually flashing my Buri to the 20140103040201 build. I am not sure whether it can be reproduced using these STR:

1) Update Buri to BuildID: 20140103040201
2) Reset device from Settings
3) Go through FTU (I also downloaded Facebook and Outlook contacts)
4) Tap on Privacy policy link
5) Tap Everything.me link

Actual: Crash occurs
Expected: No crashes occur during FTE

Environmental Variables:
Device: Buri v1.4 (Master M-C) Mozilla RIL
BuildID: 20140103040201
Gaia: 83cc63f728489a24256731adf558354bb2012a59
Gecko: 49d2fce9a86c
Version: 29.0a1
Firmware Version: v1.2_20131115
blocking-b2g: --- → 1.4?
Component: General → IPC
Keywords: regression
Product: Firefox OS → Core
Version: unspecified → 29 Branch
I hit this crash as well during first run but I got the same stack as in Bug 952170. Will try to reproduce.
I hit this particular crash twice using the same STR, cannot reproduce it 100% though. Bug 952170 is also happening to me.
This crash also reproduces on a 1.3 build.

Device: Buri v1.3 Mozilla RIL
BuildID: 20140106004001
Gaia: 35a60b82f8cf2d759939a350e2dadbb9d8b2f5dc
Gecko: a43cb4b322d3
Version: 28.0a2
Firmware Version: v1.2_20131115

Same STR:
1) Updated Buri to BuildID: 20140103040201
2) Reset device from Settings
3) Go through FTU: Sign in to WiFi, Set Time Zone to America/LA, Download Facebook and Outlook contacts
4) Tap on Your Privacy link
5) Tap FirefoxOS and then Marketplace links
6) Tap Everything.me link
==> device crashes
blocking-b2g: 1.4? → 1.3?
Whiteboard: [b2g-crash]
Andrew - Can you find someone to look into this? We're getting hit by this crash daily in 1.3 testing.
Flags: needinfo?(overholt)
This is
###!!! ABORT: aborting because of MsgRouteError

The most likely explanation for the error here is that we're racing:
* parent is sending a message to an IPDL actor
* child is destroying the actor

The crash stack itself isn't going to be much use. We're going to need to catch this in a debugger or run a debug build with MOZ_IPC_MESSAGE_LOG and capture the log from both processes.
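The suspected race can be illustrated with a small, single-threaded model. This is only a sketch under assumptions: `ChildChannel`, `Message`, `SimulateRace` and the routing-id scheme are hypothetical stand-ins, not Gecko's actual MessageChannel code. It shows why a message that was already queued for an actor ends in a route error once the receiving side has torn that actor down.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <set>

// Hypothetical model of message routing; names are illustrative only.
enum class Result { MsgProcessed, MsgRouteError };

struct Message {
  int32_t routingId;  // identifies the actor the sender is addressing
};

class ChildChannel {
public:
  void RegisterActor(int32_t aId) { mLiveActors.insert(aId); }
  void DestroyActor(int32_t aId) { mLiveActors.erase(aId); }

  // The sender queues a message; it is not routed until it is dequeued.
  void Enqueue(const Message& aMsg) { mPending.push_back(aMsg); }

  // Route the oldest pending message. If the target actor was destroyed
  // in the meantime, there is nobody to deliver to.
  Result DequeueOne() {
    assert(!mPending.empty());
    Message msg = mPending.front();
    mPending.pop_front();
    return mLiveActors.count(msg.routingId) ? Result::MsgProcessed
                                            : Result::MsgRouteError;
  }

private:
  std::set<int32_t> mLiveActors;
  std::deque<Message> mPending;
};

// Replays the two orderings of "parent sends" versus "child destroys".
Result SimulateRace(bool aDestroyBeforeDequeue) {
  const int32_t kActorId = 7;  // arbitrary id for the example
  ChildChannel child;
  child.RegisterActor(kActorId);
  child.Enqueue(Message{kActorId});  // parent's message is in flight
  if (aDestroyBeforeDequeue) {
    child.DestroyActor(kActorId);  // child tears the actor down first
  }
  return child.DequeueOne();
}
```

In this model the ordering alone decides the outcome, which fits the observation that the crash stack says little about the cause: the interesting event (the destroy) happened before the failing dequeue.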
Jason, can whoever runs into this during smoketesting run with a debug build and MOZ_IPC_MESSAGE_LOG=1?
Flags: needinfo?(overholt) → needinfo?(jsmith)
(In reply to Andrew Overholt [:overholt] from comment #6) > Jason, can whoever runs into this during smoketesting run with a debug build > and MOZ_IPC_MESSAGE_LOG=1? On the QA side, we don't have debug device builds, so I don't think we would be able to investigate this unless someone can spin a build for us.
Flags: needinfo?(jsmith)
Andrew, How do we plan to proceed forward with this?
Flags: needinfo?(overholt)
Jason told me he was working with releng to get debug device builds. If that's not happening soon, I suggest we ask Gregor or someone on the Systems FE team to get bsmedberg/bent the requested logs.
Flags: needinfo?(overholt)
Flags: needinfo?(jsmith)
Flags: needinfo?(anygregor)
(In reply to Andrew Overholt [:overholt] from comment #9)
> Jason told me he was working with releng to get debug device builds. If
> that's not happening soon, I suggest we ask Gregor or someone on the Systems
> FE team to get bsmedberg/bent the requested logs.

It's in progress, but I don't expect this to happen in a short period of time.
Flags: needinfo?(jsmith)
blocking-b2g: 1.3? → 1.3+
Regression window for v1.3:

~does not reproduce~
BuildID: 20140102004001
Gaia: 01e9da49be2cc4bc134eeefc434740d572ec2246
Gecko: 61f553e5db49
Version: 28.0a2

~reproduces~
BuildID: 20140103004001
Gaia: ae7d05689b6b9ac4ec6182217dfdef06be28e886
Gecko: d9226a660d52
Version: 28.0a2

Occurred earlier on the master (1.4) build; I can find the regression window there if needed. So far it reproduces on the 01/02 master build but does not reproduce on the 12/23 master build. Used the STR from comment 3 to get the regression range.
I tried with debug build and logging enabled but I can't reproduce this bug :(
Flags: needinfo?(anygregor)
I'm going to record a video; maybe it can help.
Okay, following these STR after resetting device from Settings I can reproduce this crash 100%. I've tried it on 3 different devices. Video : http://youtu.be/esl9cdN51EQ
Thanks. bent's and my guess is that we are running into an OOM situation. I also noticed that the keyboard app got killed while I was entering the password for the Gmail contacts.
Gregor, Can you please find someone to work on this blocker?
Flags: needinfo?(anygregor)
(In reply to Gregor Wagner [:gwagner] from comment #16) > Thanks. > bent and my guess is that we run into an OOM situation. > I also noticed that during entering the password for the gmail contacts the > keyboard app got killed. We already have some similar report on Buri (but for v1.1 as far as I can tell), in bug 945043.
Well not similar, but OOM issues.
(In reply to Preeti Raghunath(:Preeti) from comment #17) > Gregor, > > Can you please find someone to work on this blocker? Alex will take a look.
Flags: needinfo?(anygregor)
Right now I can't take a look because bug 958732 is kicking in before I can do anything in FTU.
Depends on: 958732
Depends on: 958780
Attached file buri.log (obsolete) (deleted) —
This is the adb logcat of the device with a debug build. It looks like I'm running into another crash :(
I'm testing with Inari, my Buri is not able to get WiFi working, I've already spent too much time fighting with this :(
(In reply to Natalya Kot [:nkot] from comment #3)
> this crash also reproduced on 1.3 build
>
> Device: Buri v1.3 Mozilla RIL
> BuildID: 20140106004001
> Gaia: 35a60b82f8cf2d759939a350e2dadbb9d8b2f5dc
> Gecko: a43cb4b322d3
> Version: 28.0a2
> Firmware Version: v1.2_20131115
>
> same STR:
> 1) Updated Buri to BuildID: 20140103040201
> 2) Reset device from Settings
> 3) Go through FTU: Sign in to WiFi, Set Time Zone to America/LA, Download
> Facebook and Outlook contacts
> 4) Tap on Your Privacy link
> 5) Tap FirefoxOS and then Marketplace links
> 6) Tap Everything.me link
> ==> device crashes

Are the time zone and contacts download mandatory?
\o/ reproduced on Inari:
> 1) Reset device from Settings
> 2) Go through FTU: Sign in to WiFi
> 3) Tap on Your Privacy link
> 4) Tap FirefoxOS and then Marketplace links
Attached file Debug build: adb logcat (deleted)
Attachment #8359100 - Attachment is obsolete: true
And now hitting bug 959126 while trying to reproduce.
It seems we have a 'Browser' process that is stuck. Killing it brings my homescreen back.
FYI Browser status was 't'.
Attachment #8359123 - Attachment mime type: text/x-log → text/plain
Attachment #8359124 - Attachment mime type: text/x-log → text/plain
Attachment #8359125 - Attachment mime type: text/x-log → text/plain
(In reply to Alexandre LISSY :gerard-majax from comment #24)
> Are the time zone and contacts download mandatory ?

It was a sure way to repro this crash. I tried going straight to the Privacy link and the crash didn't reproduce 100%, but I could still get it about 3 times out of 5. I didn't mean to make things overcomplicated; thank you for working on this!
Attached patch 956325.diff (obsolete) (deleted) — Splinter Review
bent's patch.
Assignee: nobody → anygregor
Attachment #8359485 - Flags: review?(bugs)
Attachment #8359485 - Flags: review?(bugs) → review+
Comment on attachment 8359485 [details] [diff] [review] 956325.diff Er, no, we have mIsDestroyed checks in TabParent.cpp
Attachment #8359485 - Flags: review+ → review-
(In reply to Olli Pettay [:smaug] from comment #34) > Er, no, we have mIsDestroyed checks in TabParent.cpp Yikes, that is really fragile. http://mxr.mozilla.org/mozilla-central/source/dom/ipc/TabParent.h#218 no longer overrides http://mxr.mozilla.org/mozilla-central/source/dom/ipc/PBrowser.ipdl#387 :(
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #35) > http://mxr.mozilla.org/mozilla-central/source/dom/ipc/TabParent.h#218 no > longer overrides > http://mxr.mozilla.org/mozilla-central/source/dom/ipc/PBrowser.ipdl#387 Is it supposed to? Nobody should be calling [2] except for [1], right? I think we might need another mIsDestroyed check at [3], maybe. But yeah, this is super fragile. Adding some MOZ_OVERRIDE annotations on things would help robustify stuff but probably not completely. [1] http://mxr.mozilla.org/mozilla-central/source/dom/ipc/TabParent.cpp#765 [2] http://mxr.mozilla.org/mozilla-central/source/dom/ipc/PBrowser.ipdl#387 [3] http://mxr.mozilla.org/mozilla-central/source/dom/ipc/TabParent.cpp#807
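To make the fragility concern concrete, here is a minimal sketch of how a method can silently stop overriding a base-class method, and how an override annotation turns that into a compile error. Everything here is hypothetical: `GeneratedParent` stands in for an IPDL-generated base class, the signature mismatch is invented for illustration, and the standard C++11 `override` keyword is used as an assumed equivalent of what a MOZ_OVERRIDE annotation is meant to provide. It does not claim to reproduce what actually changed in TabParent.h.

```cpp
#include <cassert>
#include <string>

// Stand-in for a generated base class with a virtual Send method.
struct GeneratedParent {
  virtual ~GeneratedParent() = default;
  virtual std::string SendEvent(const int& aEvent) {
    (void)aEvent;
    return "generated";  // no safety checks on this path
  }
};

// The parameter type drifted (int& instead of const int&), so this method
// hides the base method instead of overriding it. The compiler accepts it.
struct HidingParent : GeneratedParent {
  std::string SendEvent(int& aEvent) {
    (void)aEvent;
    return "checked";
  }
};

// With `override`, the same drift would fail to compile; a matching
// signature is required, so the checked path is guaranteed to be used.
struct OverridingParent : GeneratedParent {
  std::string SendEvent(const int& aEvent) override {
    (void)aEvent;
    return "checked";
  }
};

// Callers that only know the base type get whatever virtual dispatch picks.
std::string CallThroughBase(GeneratedParent& aParent) {
  const int event = 1;
  return aParent.SendEvent(event);
}
```

Calling `CallThroughBase` on a `HidingParent` reaches the generated method and skips the subclass logic, while `OverridingParent` routes through the checked version. As noted in the comment above, annotations of this kind would probably not catch every case, only signature drift.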
Hrm, I thought so (the other Send[*] messages in nsEventStateManager::DispatchCrossProcessEvent do override the IPDL method), but now I'm not so sure about this. I'll poke around some more tomorrow.
Attached image screenshot (deleted) —
I was unable to repro the crash in today's master, but while scrolling in the E.me Privacy link I hit another issue: lots of overlapping text (see the attached screenshot). Could this be fallout from the recent work done here, or is it a different issue?
filed new bug 959781 for the issue in comment 38
Attached patch 956325.diff (deleted) — Splinter Review
Attachment #8359485 - Attachment is obsolete: true
I still see the crash with the patch attached:

Program received signal SIGSEGV, Segmentation fault.
0xb630419a in mozalloc_abort (msg=<optimized out>) at ../../../memory/mozalloc/mozalloc_abort.cpp:30
30 MOZ_CRASH();
(gdb) bt
#0 0xb630419a in mozalloc_abort (msg=<optimized out>) at ../../../memory/mozalloc/mozalloc_abort.cpp:30
#1 0xb4d170bc in Abort (aMsg=0xbedeb7e4 "[Child 3685] ###!!! ABORT: aborting because of MsgRouteError: file ../../../dom/ipc/ContentChild.cpp, line 1136") at ../../../xpcom/base/nsDebugImpl.cpp:427
#2 NS_DebugBreak (aSeverity=<optimized out>, aStr=0xb6601d59 "aborting because of MsgRouteError", aExpr=0x0, aFile=0xb66019ed "../../../dom/ipc/ContentChild.cpp", aLine=1136) at ../../../xpcom/base/nsDebugImpl.cpp:414
#3 0xb53ff702 in mozilla::dom::ContentChild::ProcessingError (this=<optimized out>, what=<optimized out>) at ../../../dom/ipc/ContentChild.cpp:1136
#4 0xb4f0ac98 in mozilla::dom::PContentChild::OnProcessingError (this=<optimized out>, code=<optimized out>) at PContentChild.cpp:4491
#5 0xb4ee40de in mozilla::ipc::MessageChannel::MaybeHandleError (this=0xb3e44c48, code=mozilla::ipc::HasResultCodes::MsgRouteError, channelName=<optimized out>) at ../../../ipc/glue/MessageChannel.cpp:1493
#6 0xb4ee7060 in mozilla::ipc::MessageChannel::OnMaybeDequeueOne (this=0xb3e44c48) at ../../../ipc/glue/MessageChannel.cpp:1029
#7 0xb4ee3b60 in DispatchToMethod<mozilla::ipc::MessageChannel, void (mozilla::ipc::MessageChannel::*)()> (method= (void (mozilla::ipc::MessageChannel::*)(mozilla::ipc::MessageChannel * const)) 0xb4ee6fcd <mozilla::ipc::MessageChannel::OnMaybeDequeueOne()>, obj=<optimized out>, arg=<optimized out>) at ../../../ipc/chromium/src/base/tuple.h:383
#8 RunnableMethod<mozilla::ipc::MessageChannel, void (mozilla::ipc::MessageChannel::*)(), Tuple0>::Run (this=<optimized out>) at ../../../ipc/chromium/src/base/task.h:307
#9 0xb4ee45c8 in Run (this=<optimized out>) at ../../dist/include/mozilla/ipc/MessageChannel.h:376
#10 mozilla::ipc::MessageChannel::DequeueTask::Run (this=<optimized out>) at ../../dist/include/mozilla/ipc/MessageChannel.h:393
(In reply to Gregor Wagner [:gwagner] from comment #41) > I still see the crash with the patch attached: That is bug 959886.
Depends on: 959886
The patch in bug 959886 + this patch fix the crash for me!
Gregor, is this patch ready for review?
Flags: needinfo?(anygregor)
Attachment #8360050 - Flags: review?(bugs)
Flags: needinfo?(anygregor)
Comment on attachment 8360050 [details] [diff] [review] 956325.diff I don't see how MapEventCoordinatesForChildProcess could cause anything bad, but MaybeForwardEventToRenderFrame might. So move the if to be under MaybeForwardEventToRenderFrame.
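One way to read this review request is sketched below, under stated assumptions: `FakeTabParent` is a hypothetical stand-in, its method bodies are invented, and the idea that forwarding the event can lead to the object being destroyed is an assumption made only to show why the position of the check matters. This is not the patch that landed.

```cpp
#include <cassert>

// Hypothetical stand-in for a parent-side actor that sends input events.
class FakeTabParent {
public:
  // aForwardDestroys simulates a forward step that tears this object down.
  explicit FakeTabParent(bool aForwardDestroys)
      : mForwardDestroys(aForwardDestroys) {}

  void Destroy() { mIsDestroyed = true; }

  // Returns true only if the event was actually sent to the child.
  bool SendMouseEvent(int aX) {
    if (mIsDestroyed) {
      return false;  // nothing to do for an already-destroyed actor
    }
    MaybeForwardEventToRenderFrame();
    // The check sits directly under the call that may have side effects,
    // so a destroy triggered by it is noticed before anything is sent.
    if (mIsDestroyed) {
      return false;
    }
    // Assumed harmless: only adjusts coordinates, no side effects.
    int mapped = MapEventCoordinatesForChildProcess(aX);
    mLastSentX = mapped;
    ++mSentCount;
    return true;
  }

  int SentCount() const { return mSentCount; }
  int LastSentX() const { return mLastSentX; }

private:
  void MaybeForwardEventToRenderFrame() {
    if (mForwardDestroys) {
      Destroy();  // assumption for the sketch, see lead-in
    }
  }
  int MapEventCoordinatesForChildProcess(int aX) const { return aX + 10; }

  bool mForwardDestroys;
  bool mIsDestroyed = false;
  int mSentCount = 0;
  int mLastSentX = 0;
};
```

With the check placed after the forwarding call, an actor destroyed during forwarding never reaches the send, which is the situation the route error in this bug points at.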
Attachment #8360050 - Flags: review?(bugs) → review+
Whiteboard: [b2g-crash] → [b2g-crash], [systemsfe]
Target Milestone: --- → 1.3 C2/1.4 S2(17jan)
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [b2g-crash], [systemsfe] → [b2g-crash], [systemsfe], [CR596211]
Whiteboard: [b2g-crash], [systemsfe], [CR596211] → [b2g-crash], [systemsfe], [CR 596211]
This crash still consistently reproduces on v1.3 (bp-4aa8f907-68a2-4458-9df7-dca512140117); so far I am unable to repro it on master. I will test it next week, or someone else can try it too. We will probably have to reopen the bug.

Buri v1.3
BuildID: 20140117004005
Gaia: a81ccdc53e45a6adeaae423e104e91bcc1e12b0e
Gecko: 2c033140eff4
Version: 28.0a2
Firmware Version: v1.2-device.cfg
(In reply to Natalya Kot [:nkot] from comment #49)
> this crash still consistently reproduces on v1.3
> (bp-4aa8f907-68a2-4458-9df7-dca512140117, so far unable to repro on master..
> will test it next week or if someone else can try it too, will probably have
> to reopen the bug
> [...]
> Gecko: 2c033140eff4

This Gecko revision is a descendant of the one that contains Gregor's patch on Aurora, which means the patch probably didn't fix this bug.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Did it include the fix for bug 959886? Both were needed to pass local testing in SF.
Gregor, setting ni to bring this to your attention and put it on your radar.
Flags: needinfo?(anygregor)
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #51)
> Did it include the fix for bug 959886? Both were needed to pass local
> testing in SF.

Don't think so. That patch landed at 8:46 am PST on Friday, which our daily nightly 1.3 builds wouldn't have included. Looks like we need to retest this next week. Going to reclose on that basis & flagging verifyme to verify the crash no longer reproduces in a build from next week.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Flags: needinfo?(anygregor)
Resolution: --- → FIXED
Keywords: verifyme
Verified fixed. The crash does not reproduce anymore on 01/21 master and v1.3.

BuildID: 20140121040201
Gaia: e218d17ae7d01a81d48f833cd6fafb4e11b26cd8
Gecko: cdc0ab2c0cba
Version: 29.0a1

BuildID: 20140121004137
Gaia: 47049555282a9a01fb60d1e1421b57e2810c96f5
Gecko: 6f7dfe36ab6c
Version: 28.0a2
Firmware Version: v1.2-device.cfg
Status: RESOLVED → VERIFIED
Keywords: verifyme