Closed
Bug 956325
Opened 11 years ago
Closed 11 years ago
crash in mozalloc_abort(char const*) | NS_DebugBreak | mozilla::dom::ContentChild::ProcessingError(mozilla::ipc::HasResultCodes::Result)
Categories
(Core :: IPC, defect)
Tracking
()
People
(Reporter: nkot, Assigned: gwagner)
References
Details
(Keywords: crash, regression, Whiteboard: [b2g-crash], [systemsfe], [CR 596211])
Crash Data
Attachments
(5 files, 2 obsolete files)
This bug was filed from the Socorro interface and is
report bp-a1f9b222-b5aa-4941-a285-240342140103.
=============================================================
Hit this crash today while going through the FTE after manually flashing my Buri to build 20140103040201.
I am not sure if it can be reproduced using these STR:
1) Updated Buri to BuildID: 20140103040201
2) Reset device from Settings
3) Go through FTU (I also downloaded Facebook and Outlook contacts)
4) Tap on Privacy policy link
5) Tap Everything.me link
Actual:
Crash occurs
Expected:
NO crashes occur during FTE
Environmental Variables:
Device: Buri v1.4 (Master M-C) Mozilla RIL
BuildID: 20140103040201
Gaia: 83cc63f728489a24256731adf558354bb2012a59
Gecko: 49d2fce9a86c
Version: 29.0a1
Firmware Version: v1.2_20131115
Updated•11 years ago
blocking-b2g: --- → 1.4?
Component: General → IPC
Keywords: regression
Product: Firefox OS → Core
Version: unspecified → 29 Branch
Comment 1•11 years ago
I hit this crash as well during first run but I got the same stack as in Bug 952170. Will try to reproduce.
Reporter
Comment 2•11 years ago
I hit this particular crash twice using the same STR, cannot reproduce it 100% though. Bug 952170 is also happening to me.
Updated•11 years ago
Keywords: regressionwindow-wanted
Reporter
Comment 3•11 years ago
this crash also reproduced on 1.3 build
Device: Buri v1.3 Mozilla RIL
BuildID: 20140106004001
Gaia: 35a60b82f8cf2d759939a350e2dadbb9d8b2f5dc
Gecko: a43cb4b322d3
Version: 28.0a2
Firmware Version: v1.2_20131115
same STR:
1) Updated Buri to BuildID: 20140103040201
2) Reset device from Settings
3) Go through FTU: Sign in to WiFi, Set Time Zone to America/LA, Download Facebook and Outlook contacts
4) Tap on Your Privacy link
5) Tap FirefoxOS and then Marketplace links
6) Tap Everything.me link
==> device crashes
Updated•11 years ago
blocking-b2g: 1.4? → 1.3?
Updated•11 years ago
Whiteboard: [b2g-crash]
Comment 4•11 years ago
Andrew - Can you find someone to look into this? We're getting hit by this crash daily in 1.3 testing.
Flags: needinfo?(overholt)
Comment 5•11 years ago
This is ###!!! ABORT: aborting because of MsgRouteError
The most likely explanation for the error here is that we're racing:
* parent is sending a message to an IPDL actor
* child is destroying the actor
The crash stack itself isn't going to be much use. We're going to need to catch this in a debugger or run a debug build with MOZ_IPC_MESSAGE_LOG and capture the log from both processes.
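The race described above can be illustrated with a toy model. This is only a sketch and not Gecko's IPC code: `ToyChannel`, the Python `MsgRouteError` exception, and the routing-id table are hypothetical stand-ins for the real IPDL machinery.

```python
from collections import deque


class MsgRouteError(Exception):
    """Raised when a message targets a routing id with no live actor."""


class ToyChannel:
    """Child-side end of a hypothetical channel: a queue plus an actor table."""

    def __init__(self):
        self.actors = {}        # routing id -> handler callable
        self.pending = deque()  # messages sent by the parent, not yet processed

    def register(self, routing_id, handler):
        self.actors[routing_id] = handler

    def destroy(self, routing_id):
        # The child tears down its actor; in-flight messages are NOT recalled.
        del self.actors[routing_id]

    def parent_send(self, routing_id, payload):
        # The parent only enqueues; it cannot see the child's current state.
        self.pending.append((routing_id, payload))

    def dequeue_one(self):
        routing_id, payload = self.pending.popleft()
        handler = self.actors.get(routing_id)
        if handler is None:
            raise MsgRouteError(f"no actor for routing id {routing_id}")
        handler(payload)


received = []
channel = ToyChannel()
channel.register(7, received.append)

channel.parent_send(7, "first event")   # processed while the actor is alive
channel.dequeue_one()

channel.parent_send(7, "second event")  # still in flight...
channel.destroy(7)                      # ...when the child destroys the actor

try:
    channel.dequeue_one()
    outcome = "delivered"
except MsgRouteError:
    outcome = "route error"
```

In this model the second message is lost to a routing failure purely because of ordering, which is why a crash stack alone says little and a message log from both processes is more useful.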
Comment 6•11 years ago
Jason, can whoever runs into this during smoketesting run with a debug build and MOZ_IPC_MESSAGE_LOG=1?
Flags: needinfo?(overholt) → needinfo?(jsmith)
Comment 7•11 years ago
(In reply to Andrew Overholt [:overholt] from comment #6)
> Jason, can whoever runs into this during smoketesting run with a debug build
> and MOZ_IPC_MESSAGE_LOG=1?
On the QA side, we don't have debug device builds, so I don't think we would be able to investigate this unless someone can spin a build for us.
Flags: needinfo?(jsmith)
Comment 8•11 years ago
Andrew,
How do we plan to proceed forward with this?
Flags: needinfo?(overholt)
Comment 9•11 years ago
Jason told me he was working with releng to get debug device builds. If that's not happening soon, I suggest we ask Gregor or someone on the Systems FE team to get bsmedberg/bent the requested logs.
Flags: needinfo?(overholt)
Flags: needinfo?(jsmith)
Flags: needinfo?(anygregor)
Comment 10•11 years ago
(In reply to Andrew Overholt [:overholt] from comment #9)
> Jason told me he was working with releng to get debug device builds. If
> that's not happening soon, I suggest we ask Gregor or someone on the Systems
> FE team to get bsmedberg/bent the requested logs.
It's in progress, but I don't expect this to happen in a short period of time.
Flags: needinfo?(jsmith)
Updated•11 years ago
blocking-b2g: 1.3? → 1.3+
Reporter
Comment 11•11 years ago
Regression window for v1.3:
~does not reproduce~
BuildID: 20140102004001
Gaia: 01e9da49be2cc4bc134eeefc434740d572ec2246
Gecko: 61f553e5db49
Version: 28.0a2
~reproduces~
BuildID: 20140103004001
Gaia: ae7d05689b6b9ac4ec6182217dfdef06be28e886
Gecko: d9226a660d52
Version: 28.0a2
This occurred earlier on the master (1.4) build; I can find the regression window there if needed. So far it reproduces on the 01/02 master build but does not reproduce on the 12/23 master build.
Used STR from comment 3 to get a regression range
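As a rough illustration of how such a window is derived from test results, here is a minimal sketch. The helper `regression_window` is hypothetical, and it assumes BuildIDs are `YYYYMMDDHHMMSS` strings so that string order matches chronological order.

```python
def regression_window(results):
    """Return (last_good, first_bad) from {build_id: reproduces} observations.

    Build IDs are assumed to be YYYYMMDDHHMMSS strings, so sorting them as
    strings also sorts them chronologically.
    """
    last_good = None
    for build_id in sorted(results):
        if results[build_id]:
            return last_good, build_id
        last_good = build_id
    return last_good, None


# v1.3 observations reported in this bug.
v1_3 = {
    "20140102004001": False,  # does not reproduce
    "20140103004001": True,   # reproduces
    "20140106004001": True,   # reproduces (comment 3)
}
window = regression_window(v1_3)
```

The changesets between the two Gecko revisions of the returned builds are then the candidates to inspect.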
Updated•11 years ago
Keywords: regressionwindow-wanted
Comment 12•11 years ago
Assignee
Comment 13•11 years ago
I tried with debug build and logging enabled but I can't reproduce this bug :(
Flags: needinfo?(anygregor)
Reporter
Comment 14•11 years ago
i'm going to record a video, maybe it can help
Reporter
Comment 15•11 years ago
Okay, following these STR after resetting device from Settings I can reproduce this crash 100%. I've tried it on 3 different devices.
Video : http://youtu.be/esl9cdN51EQ
Assignee
Comment 16•11 years ago
Thanks.
bent's and my guess is that we are running into an OOM situation.
I also noticed that the keyboard app got killed while I was entering the password for the gmail contacts.
Comment 17•11 years ago
Gregor,
Can you please find someone to work on this blocker?
Flags: needinfo?(anygregor)
Comment 18•11 years ago
(In reply to Gregor Wagner [:gwagner] from comment #16)
> Thanks.
> bent and my guess is that we run into an OOM situation.
> I also noticed that during entering the password for the gmail contacts the
> keyboard app got killed.
We already have a similar report on Buri (but for v1.1 as far as I can tell), in bug 945043.
Comment 19•11 years ago
Well not similar, but OOM issues.
Assignee
Comment 20•11 years ago
(In reply to Preeti Raghunath(:Preeti) from comment #17)
> Gregor,
>
> Can you please find someone to work on this blocker?
Alex will take a look.
Flags: needinfo?(anygregor)
Assignee
Comment 21•11 years ago
Right now I can't take a look because bug 958732 is kicking in before I can do anything in FTU.
Depends on: 958732
Comment 22•11 years ago
This is the adb logcat of the device with a debug build. It looks like I'm running into another crash :(
Comment 23•11 years ago
I'm testing with Inari, my Buri is not able to get WiFi working, I've already spent too much time fighting with this :(
Comment 24•11 years ago
(In reply to Natalya Kot [:nkot] from comment #3)
> this crash also reproduced on 1.3 build
>
> Device: Buri v1.3 Mozilla RIL
> BuildID: 20140106004001
> Gaia: 35a60b82f8cf2d759939a350e2dadbb9d8b2f5dc
> Gecko: a43cb4b322d3
> Version: 28.0a2
> Firmware Version: v1.2_20131115
>
> same STR:
> 1) Updated Buri to BuildID: 20140103040201
> 2) Reset device from Settings
> 3) Go through FTU: Sign in to WiFi, Set Time Zone to America/LA, Download
> Facebook and Outlook contacts
> 4) Tap on Your Privacy link
> 5) Tap FirefoxOS and then Marketplace links
> 6) Tap Everything.me link
> ==> device crashes
Are the time zone and contacts download mandatory ?
Comment 25•11 years ago
\o/ reproduced on Inari:
> 1) Reset device from Settings
> 2) Go through FTU: Sign in to WiFi
> 3) Tap on Your Privacy link
> 4) Tap FirefoxOS and then Marketplace links
Comment 26•11 years ago
Attachment #8359100 -
Attachment is obsolete: true
Comment 27•11 years ago
Comment 28•11 years ago
Comment 29•11 years ago
And now hitting bug 959126 while trying to reproduce.
Comment 30•11 years ago
It seems we have a 'Browser' process that is stuck. Killing it makes my homescreen come back.
Comment 31•11 years ago
FYI Browser status was 't'.
Updated•11 years ago
Attachment #8359123 -
Attachment mime type: text/x-log → text/plain
Updated•11 years ago
Attachment #8359124 -
Attachment mime type: text/x-log → text/plain
Updated•11 years ago
Attachment #8359125 -
Attachment mime type: text/x-log → text/plain
Reporter
Comment 32•11 years ago
(In reply to Alexandre LISSY :gerard-majax from comment #24)
> Are the time zone and contacts download mandatory ?
It was a sure way to repro this crash. I tried going straight to Privacy link and crash didn't reproduce 100%, still could get it like 3/5... so, didn't mean to make things over complicated, thank you for working on that!
Assignee
Comment 33•11 years ago
bent's patch.
Assignee: nobody → anygregor
Attachment #8359485 -
Flags: review?(bugs)
Updated•11 years ago
Attachment #8359485 -
Flags: review?(bugs) → review+
Comment 34•11 years ago
Comment on attachment 8359485 [details] [diff] [review]
956325.diff
Er, no, we have mIsDestroyed checks in TabParent.cpp
Attachment #8359485 -
Flags: review+ → review-
Comment 35•11 years ago
(In reply to Olli Pettay [:smaug] from comment #34)
> Er, no, we have mIsDestroyed checks in TabParent.cpp
Yikes, that is really fragile.
http://mxr.mozilla.org/mozilla-central/source/dom/ipc/TabParent.h#218 no longer overrides http://mxr.mozilla.org/mozilla-central/source/dom/ipc/PBrowser.ipdl#387
:(
Comment 36•11 years ago
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #35)
> http://mxr.mozilla.org/mozilla-central/source/dom/ipc/TabParent.h#218 no
> longer overrides
> http://mxr.mozilla.org/mozilla-central/source/dom/ipc/PBrowser.ipdl#387
Is it supposed to? Nobody should be calling [2] except for [1], right? I think we might need another mIsDestroyed check at [3], maybe. But yeah, this is super fragile. Adding some MOZ_OVERRIDE annotations on things would help robustify stuff but probably not completely.
[1] http://mxr.mozilla.org/mozilla-central/source/dom/ipc/TabParent.cpp#765
[2] http://mxr.mozilla.org/mozilla-central/source/dom/ipc/PBrowser.ipdl#387
[3] http://mxr.mozilla.org/mozilla-central/source/dom/ipc/TabParent.cpp#807
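The per-method guard being discussed, and why it is easy to miss one, can be sketched with a toy model. This is not the TabParent code: the class names and the `send_touch_event` / `send_key_event` methods are hypothetical, and only the `is_destroyed` flag mirrors the `mIsDestroyed` idea from the comments.

```python
class GeneratedParent:
    """Stand-in for generated send methods, which perform no liveness checks."""

    def __init__(self):
        self.sent = []

    def send_touch_event(self, event):
        self.sent.append(("touch", event))
        return True

    def send_key_event(self, event):
        self.sent.append(("key", event))
        return True


class GuardedParent(GeneratedParent):
    """Hand-written subclass that must remember to guard every send method."""

    def __init__(self):
        super().__init__()
        self.is_destroyed = False

    def send_touch_event(self, event):
        # Guarded: refuse to send once the actor has been torn down.
        if self.is_destroyed:
            return False
        return super().send_touch_event(event)

    # send_key_event is deliberately NOT wrapped here, so it falls through
    # to the unguarded base method. Nothing flags the omission.


parent = GuardedParent()
parent.is_destroyed = True

touch_sent = parent.send_touch_event("tap")
key_sent = parent.send_key_event("enter")
```

In this sketch the unwrapped method still sends after destruction. A wrapper that silently stops overriding the generated method would behave the same way, which is the kind of gap an explicit override annotation is meant to catch at compile time.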
Hrm, I thought so (the other Send[*] messages in nsEventStateManager::DispatchCrossProcessEvent do override the IPDL method), but now I'm not so sure about this. I'll poke around some more tomorrow.
Reporter
Comment 38•11 years ago
I was unable to repro the crash in today's master but scrolling in the E.me Privacy link I hit another issue, lots of overlapping text - see screenshot attached.
Could it be fallout from the recent work done here, or is it a different issue?
Reporter
Comment 39•11 years ago
filed new bug 959781 for the issue in comment 38
Assignee
Comment 40•11 years ago
Attachment #8359485 -
Attachment is obsolete: true
Assignee
Comment 41•11 years ago
I still see the crash with the patch attached:
Program received signal SIGSEGV, Segmentation fault.
0xb630419a in mozalloc_abort (msg=<optimized out>) at ../../../memory/mozalloc/mozalloc_abort.cpp:30
30 MOZ_CRASH();
(gdb) bt
#0 0xb630419a in mozalloc_abort (msg=<optimized out>) at ../../../memory/mozalloc/mozalloc_abort.cpp:30
#1 0xb4d170bc in Abort (aMsg=0xbedeb7e4 "[Child 3685] ###!!! ABORT: aborting because of MsgRouteError: file ../../../dom/ipc/ContentChild.cpp, line 1136")
at ../../../xpcom/base/nsDebugImpl.cpp:427
#2 NS_DebugBreak (aSeverity=<optimized out>, aStr=0xb6601d59 "aborting because of MsgRouteError", aExpr=0x0, aFile=0xb66019ed "../../../dom/ipc/ContentChild.cpp",
aLine=1136) at ../../../xpcom/base/nsDebugImpl.cpp:414
#3 0xb53ff702 in mozilla::dom::ContentChild::ProcessingError (this=<optimized out>, what=<optimized out>) at ../../../dom/ipc/ContentChild.cpp:1136
#4 0xb4f0ac98 in mozilla::dom::PContentChild::OnProcessingError (this=<optimized out>, code=<optimized out>) at PContentChild.cpp:4491
#5 0xb4ee40de in mozilla::ipc::MessageChannel::MaybeHandleError (this=0xb3e44c48, code=mozilla::ipc::HasResultCodes::MsgRouteError, channelName=<optimized out>)
at ../../../ipc/glue/MessageChannel.cpp:1493
#6 0xb4ee7060 in mozilla::ipc::MessageChannel::OnMaybeDequeueOne (this=0xb3e44c48) at ../../../ipc/glue/MessageChannel.cpp:1029
#7 0xb4ee3b60 in DispatchToMethod<mozilla::ipc::MessageChannel, void (mozilla::ipc::MessageChannel::*)()> (method=
(void (mozilla::ipc::MessageChannel::*)(mozilla::ipc::MessageChannel * const)) 0xb4ee6fcd <mozilla::ipc::MessageChannel::OnMaybeDequeueOne()>,
obj=<optimized out>, arg=<optimized out>) at ../../../ipc/chromium/src/base/tuple.h:383
#8 RunnableMethod<mozilla::ipc::MessageChannel, void (mozilla::ipc::MessageChannel::*)(), Tuple0>::Run (this=<optimized out>)
at ../../../ipc/chromium/src/base/task.h:307
#9 0xb4ee45c8 in Run (this=<optimized out>) at ../../dist/include/mozilla/ipc/MessageChannel.h:376
#10 mozilla::ipc::MessageChannel::DequeueTask::Run (this=<optimized out>) at ../../dist/include/mozilla/ipc/MessageChannel.h:393
Comment 42•11 years ago
(In reply to Gregor Wagner [:gwagner] from comment #41)
> I still see the crash with the patch attached:
That is bug 959886.
Depends on: 959886
Assignee
Comment 43•11 years ago
The patch in bug 959886 + this patch fix the crash for me!
Assignee
Updated•11 years ago
Attachment #8360050 -
Flags: review?(bugs)
Assignee
Updated•11 years ago
Flags: needinfo?(anygregor)
Comment 45•11 years ago
Comment on attachment 8360050 [details] [diff] [review]
956325.diff
I don't see how MapEventCoordinatesForChildProcess could
cause anything bad, but MaybeForwardEventToRenderFrame might.
So move the if to be under MaybeForwardEventToRenderFrame.
Attachment #8360050 -
Flags: review?(bugs) → review+
Assignee
Comment 46•11 years ago
Assignee
Updated•11 years ago
Whiteboard: [b2g-crash] → [b2g-crash], [systemsfe]
Target Milestone: --- → 1.3 C2/1.4 S2(17jan)
Comment 47•11 years ago
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 48•11 years ago
status-b2g-v1.3:
--- → fixed
status-b2g-v1.4:
--- → fixed
status-firefox27:
--- → wontfix
status-firefox28:
--- → fixed
status-firefox29:
--- → fixed
Updated•11 years ago
Whiteboard: [b2g-crash], [systemsfe] → [b2g-crash], [systemsfe], [CR596211]
Updated•11 years ago
Whiteboard: [b2g-crash], [systemsfe], [CR596211] → [b2g-crash], [systemsfe], [CR 596211]
Reporter
Comment 49•11 years ago
this crash still consistently reproduces on v1.3 (bp-4aa8f907-68a2-4458-9df7-dca512140117); so far unable to repro on master.
will test it next week or if someone else can try it too, will probably have to reopen the bug
Buri v1.3
BuildID: 20140117004005
Gaia: a81ccdc53e45a6adeaae423e104e91bcc1e12b0e
Gecko: 2c033140eff4
Version: 28.0a2
Firmware Version: v1.2-device.cfg
Comment 50•11 years ago
(In reply to Natalya Kot [:nkot] from comment #49)
> this crash still consistently reproduces on v1.3
> (bp-4aa8f907-68a2-4458-9df7-dca512140117); so far unable to repro on master.
> will test it next week or if someone else can try it too, will probably have
> to reopen the bug
> [...]
> Gecko: 2c033140eff4
This Gecko revision is a descendant of the one containing Gregor's patch on Aurora, which means the patch probably didn't fix this bug.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 51•11 years ago
Did it include the fix for bug 959886? Both were needed to pass local testing in SF.
Comment 53•11 years ago
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #51)
> Did it include the fix for bug 959886? Both were needed to pass local
> testing in SF.
Don't think so. That patch landed at 8:46 am PST on Friday, which our daily nightly 1.3 builds wouldn't have included. Looks like we need to retest this next week.
Going to reclose on that basis & flagging verifyme to verify the crash no longer reproduces in a build from next week.
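The timing argument above can be checked with a small sketch. It rests on stated assumptions: BuildIDs are `YYYYMMDDHHMMSS` timestamps, "Friday" is 2014-01-17, and build times use the same time zone as the landing time. The helper names are hypothetical. Building after a landing is a necessary condition for containing the patch, not a sufficient one.

```python
from datetime import datetime


def build_time(build_id):
    """Parse a BuildID, assumed to be a YYYYMMDDHHMMSS timestamp."""
    return datetime.strptime(build_id, "%Y%m%d%H%M%S")


def could_include(build_id, landed_at):
    """A build can only contain a patch that landed before the build started."""
    return build_time(build_id) > landed_at


# Assumption: "8:46 am PST on Friday" is Friday 2014-01-17, and BuildIDs use
# the same time zone as the landing time.
landed = datetime(2014, 1, 17, 8, 46)

retested = could_include("20140117004005", landed)  # v1.3 build from comment 49
verified = could_include("20140121004137", landed)  # v1.3 nightly from the following week
```

Under these assumptions the build from comment 49 predates the landing by about eight hours, so a retest on a later nightly is the meaningful check.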
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Flags: needinfo?(anygregor)
Resolution: --- → FIXED
Reporter
Comment 54•11 years ago
Verified fixed.
The crash does not reproduce anymore on 01/21 master and v1.3.
BuildID: 20140121040201
Gaia: e218d17ae7d01a81d48f833cd6fafb4e11b26cd8
Gecko: cdc0ab2c0cba
Version: 29.0a1
BuildID: 20140121004137
Gaia: 47049555282a9a01fb60d1e1421b57e2810c96f5
Gecko: 6f7dfe36ab6c
Version: 28.0a2
Firmware Version: v1.2-device.cfg
Status: RESOLVED → VERIFIED
Keywords: verifyme
\o/