Closed Bug 1140978 Opened 10 years ago Closed 10 years ago

b2g crash at IPC::Message::EnsureFileDescriptorSet()

Categories

(Core :: Graphics, defect)

ARM
Gonk (Firefox OS)
defect
Not set
major

Tracking

()

RESOLVED DUPLICATE of bug 1133865
Tracking Status
b2g-master --- affected

People

(Reporter: lixia, Unassigned)

References

Details

(Keywords: qablocker, regression, Whiteboard: [3.0-nexus-5-l] )

Attachments

(11 files, 1 obsolete file)

[1.Description]:
[Flame v3.0][Nexus 5 v3.0][First Time Experience] Sometimes the prompt "Something just crashed" pops up when you open some apps and perform random operations after FTU.
Found time: 14:45.
Attachments: crashed.MP4, something_just_crashed.png and logcat_1445.txt.
Title: B2G 39.0a1 Crash Report [@ IPC::Message::EnsureFileDescriptorSet() ]
Crash report: https://crash-stats.mozilla.com/report/index/f8c4b6ae-b9ff-43ad-872c-768e52150309

[2.Testing Steps]:
1. Flash build (20150308160204).
2. Skip FTU.
3. Launch some apps on the homescreen and perform random operations, such as importing from the SD card and exporting a contact via Bluetooth in Contacts, ending a call in Phone, connecting to Wi-Fi in Settings, or closing an app in card view. (No precise steps.)

[3.Expected Result]:
3. The device should not crash.

[4.Actual Result]:
3. Sometimes the prompt "Something just crashed" pops up.

[5.Reproduction build]:
Flame 3.0 build:
Build ID 20150308160204
Gaia Revision fea83511df9ccba64259346bc02ebf2c417a12c2
Gaia Date 2015-03-08 06:36:28
Gecko Revision https://hg.mozilla.org/mozilla-central/rev/eab4a81e4457
Gecko Version 39.0a1
Device Name flame
Firmware(Release) 4.4.2
Firmware(Incremental) eng.cltbld.20150308.192120
Firmware Date Sun Mar 8 19:21:31 EDT 2015
Bootloader L1TC000118D0

N5 3.0:
Build ID 20150308160204
Gaia Revision fea83511df9ccba64259346bc02ebf2c417a12c2
Gaia Date 2015-03-08 06:36:28
Gecko Revision https://hg.mozilla.org/mozilla-central/rev/eab4a81e4457
Gecko Version 39.0a1
Device Name hammerhead
Firmware(Release) 5.0
Firmware(Incremental) eng.cltbld.20150308.192431
Firmware Date Sun Mar 8 19:24:47 EDT 2015
Bootloader HHZ12d

[6.Reproduction Frequency]: Seldom Recurrence, 5/30
[7.TCID]: Free Test
Attached image something_just_crashed.png (deleted)
Attached file logcat_1445.txt (deleted)
Attached video crashed.MP4 (deleted)
This issue is also occurring on the latest Flame 3.0 nightly. Crash report pages and banners appear intermittently within multiple apps (Gallery, Dialer, FTU, etc.).

Environmental Variables:
Device: Flame 3.0 (319mb)(Kitkat)(Full Flash)
Build ID: 20150309010232
Gaia: fea83511df9ccba64259346bc02ebf2c417a12c2
Gecko: eab4a81e4457
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 39.0a1 (3.0)
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:39.0) Gecko/39.0 Firefox/39.0
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(pbylenga)
Keywords: regression
[Blocking Requested - why for this release]: Functional regression across multiple apps that fail smoke tests. Requesting a window.
blocking-b2g: --- → 3.0?
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(pbylenga)
QA Contact: dharris
I see lots of crash-reports in the logcat, but none are submitted to crash stats. Can we please submit those next time this occurs? Thanks!
Attached file Crash 1 minidump (deleted)
Attached file Crash 2 minidump (deleted)
Attached file Crash 3 (deleted)
I can reproduce pretty reliably only after a clean flash.
STR:
1) Do a full flash
2) Go to Settings, Wi-Fi
3) Add hidden network
4) Something crashes.
Gecko git - 6e5e93073a9e11ff531570b3571f1136fde04255
Gaia - 1279c9ca5a489aa7fc9a7a173ba1dbae0f71b8f2
QA Contact: jmercado
This is a link to the crash encountered when reproducing this issue after speaking with Derek. https://crash-stats.mozilla.com/report/index/64ea84aa-f1be-47a1-bdef-6ccc72150309 Proceeding to find a window now.
Can still reproduce comment 9 w/ bug 1123762 disabled. Might be able to pin it down by setting gfx.vsync.refreshdriver to false.
(In reply to Jayme Mercado [:JMercado] from comment #10)
> This is a link to the crash encountered when reproducing this issue after speaking with Derek.
>
> https://crash-stats.mozilla.com/report/index/64ea84aa-f1be-47a1-bdef-6ccc72150309
>
> Proceeding to find a window now.

Hmm, is there any way to get symbols?
From comment 9, I can still reproduce this on Friday's build, from before bug 1123762 landed. Gecko 7f9a12e9199f37ce2a708dd19f71abcf38ce4668.
Do you also have the link to the crash-stats site? Just want to double check that it is not the same as bug 1137653, since it has a similar STR.
oops, didn't see comment 10. sorry.
Flags: needinfo?(nhirata.bugzilla)
Mason's steps do not work for me, but using automation scripts the way Derek did, I found this central window. We're going deeper into the inbounds now to verify. Is it possible that there are multiple issues here?

Central Regression Window:

Last Working
Environmental Variables:
Device: Flame 3.0
BuildID: 20150307032729
Gaia: eab77fbc95bebc1fbaac1f4f1c163824d924c93d
Gecko: d9b06c673f80
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 39.0a1 (3.0)
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:39.0) Gecko/39.0 Firefox/39.0

First Broken
Environmental Variables:
Device: Flame 3.0
BuildID: 20150307191929
Gaia: 373c9ddb916631facbae3d6f70fb82f3ff501411
Gecko: ae68dca2cda6
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 39.0a1 (3.0)
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:39.0) Gecko/39.0 Firefox/39.0

Last Working gaia / First Broken gecko - Issue DOES occur
Gaia: eab77fbc95bebc1fbaac1f4f1c163824d924c93d
Gecko: ae68dca2cda6

First Broken gaia / Last Working gecko -
Gaia: 373c9ddb916631facbae3d6f70fb82f3ff501411
Gecko: d9b06c673f80

Gaia Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=7d85ac833cff&tochange=ae68dca2cda6
(In reply to Jayme Mercado [:JMercado] from comment #16)
> Mason's steps do not work for me but using automation scripts the way that
> Derek did and I found this as the central window. We're going deeper now
> into the inbounds to verify. Is it possible that there are multiple issues
> here?

If you use automation scripts, then you will definitely hit bug 1137653, which has been happening for more than a week. As nhirata says, I think this is a separate issue though.
(In reply to Jayme Mercado [:JMercado] from comment #16)
> Mason's steps do not work for me but using automation scripts the way that
> Derek did and I found this as the central window. We're going deeper now
> into the inbounds to verify. Is it possible that there are multiple issues
> here?
>
> Central Regression Window:
>
> Last Working
> Environmental Variables:
> Device: Flame 3.0
> BuildID: 20150307032729
> Gaia: eab77fbc95bebc1fbaac1f4f1c163824d924c93d
> Gecko: d9b06c673f80
> Gonk: e7c90613521145db090dd24147afd5ceb5703190
> Version: 39.0a1 (3.0)
> Firmware Version: v18D-1
> User Agent: Mozilla/5.0 (Mobile; rv:39.0) Gecko/39.0 Firefox/39.0
>
> First Broken
> Environmental Variables:
> Device: Flame 3.0
> BuildID: 20150307191929
> Gaia: 373c9ddb916631facbae3d6f70fb82f3ff501411
> Gecko: ae68dca2cda6
> Gonk: e7c90613521145db090dd24147afd5ceb5703190
> Version: 39.0a1 (3.0)
> Firmware Version: v18D-1
> User Agent: Mozilla/5.0 (Mobile; rv:39.0) Gecko/39.0 Firefox/39.0
>
> Last Working gaia / First Broken gecko - Issue DOES occur
> Gaia: eab77fbc95bebc1fbaac1f4f1c163824d924c93d
> Gecko: ae68dca2cda6
>
> First Broken gaia / Last Working gecko -
> Gaia: 373c9ddb916631facbae3d6f70fb82f3ff501411
> Gecko: d9b06c673f80
>
> Gaia Pushlog:
> http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=7d85ac833cff&tochange=ae68dca2cda6

I probably should have specified that after "Add Hidden Network" comes up, you need to start typing in a network. When the keyboard comes up, something crashes.
Attached file omni.ja (obsolete) (deleted)
New omni.ja file:
1. Download from attachments
2. adb remount
3. adb push omni.ja /system/b2g/omni.ja
4. adb reboot
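The adb steps above can be sketched as a small script. This is only an illustration: the helper names `omni_ja_push_commands` and `run_all` are hypothetical, the default paths are the ones from the comment, and it assumes `adb` is on the PATH with a single device attached.

```python
import subprocess


def omni_ja_push_commands(local_path="omni.ja", remote_path="/system/b2g/omni.ja"):
    """Return the adb commands from the comment, in order (hypothetical helper)."""
    return [
        ["adb", "remount"],                         # make /system writable
        ["adb", "push", local_path, remote_path],   # replace omni.ja on the device
        ["adb", "reboot"],                          # restart so the new file is used
    ]


def run_all(commands, runner=subprocess.check_call):
    """Run each command in order; check_call raises and stops on the first failure."""
    for cmd in commands:
        runner(cmd)
```

Calling `run_all(omni_ja_push_commands())` would execute the three steps on a connected device; passing a different `runner` (for example `print`) shows the commands without touching the device.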
Flags: needinfo?(nhirata.bugzilla)
Flags: needinfo?(nhirata.bugzilla)
A lot of our automated UI tests crashed out today; this seems like the probable cause. Marking this qablocker.
Keywords: qablocker
Attached file omni.ja (deleted)
Attachment #8574822 - Attachment is obsolete: true
Flags: needinfo?(nhirata.bugzilla)
We tested this with both automation and manual testing and received the same window. Bug 1123762 seems the likely cause for this issue.

B2g-inbound Regression Window

Last Working
Environmental Variables:
Device: Flame 3.0
BuildID: 20150306132631
Gaia: 4c6cecb14e5cc20b2e217d5783e8dde4a5145d66
Gecko: 716b424d27c0
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 39.0a1 (3.0)
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:39.0) Gecko/39.0 Firefox/39.0

First Broken
Environmental Variables:
Device: Flame 3.0
BuildID: 20150306134530
Gaia: 4c6cecb14e5cc20b2e217d5783e8dde4a5145d66
Gecko: afd91b997c2e
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 39.0a1 (3.0)
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:39.0) Gecko/39.0 Firefox/39.0

Last Working gaia / First Broken gecko - Issue DOES occur
Gaia: 4c6cecb14e5cc20b2e217d5783e8dde4a5145d66
Gecko: afd91b997c2e

First Broken gaia / Last Working gecko - Issue does NOT occur
Gaia: 4c6cecb14e5cc20b2e217d5783e8dde4a5145d66
Gecko: 716b424d27c0

Gecko Pushlog: http://hg.mozilla.org/integration/b2g-inbound/pushloghtml?fromchange=716b424d27c0&tochange=afd91b997c2e
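For anyone narrowing a window further, the pushlog link is just the last-working and first-broken Gecko changesets placed in the `fromchange` and `tochange` query parameters. A minimal sketch of that construction follows; the function name `pushlog_url` is hypothetical, and the host and repository path are copied from the URL in this comment.

```python
from urllib.parse import urlencode


def pushlog_url(repo_path, last_working, first_broken, host="http://hg.mozilla.org"):
    """Build a pushloghtml link bounded by two changesets (hypothetical helper)."""
    # Dict insertion order is preserved, so fromchange comes before tochange.
    query = urlencode({"fromchange": last_working, "tochange": first_broken})
    return f"{host}/{repo_path}/pushloghtml?{query}"


# Changesets from the b2g-inbound regression window reported above.
print(pushlog_url("integration/b2g-inbound", "716b424d27c0", "afd91b997c2e"))
```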
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Mason, can we get the landing for bug 1123762 backed out? It appears to be causing this crash.
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker) → needinfo?(mchang)
Sure, it looks like it. Can you please provide how you're testing via automation and locally? I've been unable to reproduce anything. Thanks!
Flags: needinfo?(mchang) → needinfo?(ktucker)
Backout of 1123762 is in b2g-inbound.
For automation, we ran the test_browser_navigation.py script. Manual testing consisted of adding wifi networks in Settings and using the Browser, Camera, etc.
Flags: needinfo?(ktucker)
Attached file crash_dump.dmp (deleted)
Dump requested by Mason.
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
The crash reason from comment 0 is a SIGSEGV when we dereference the address 0x5a5a5a6e. It seems we access the message object's member "file_descriptor_set_" [1] after the message object has been freed. B2G poisons memory with 0x5a5a5a5a after free, and we checked that the offset of "file_descriptor_set_" is exactly 0x14 (0x5a5a5a5a + 0x14 = 0x5a5a5a6e). We create the message in IPDL-generated code, and we delete it only when the channel is closed or the message has already been processed [2]. Neither the creation nor the destruction is controlled by the user, so it's weird that we have this use-after-free problem.
[1] https://hg.mozilla.org/mozilla-central/annotate/6686aacf006f/ipc/chromium/src/chrome/common/ipc_message.cc#l161
[2] https://hg.mozilla.org/mozilla-central/annotate/6686aacf006f/ipc/chromium/src/chrome/common/ipc_channel_posix.cc#l762
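The address arithmetic behind this reading can be checked with a short sketch. The constants are the ones quoted in the comment; the function name `poison_offset` and the 0x1000 upper bound on plausible member offsets are assumptions made for illustration only.

```python
POISON = 0x5A5A5A5A      # pattern B2G writes over freed memory (from the comment)
FAULT_ADDR = 0x5A5A5A6E  # faulting address from the crash report
MEMBER_OFFSET = 0x14     # reported offset of file_descriptor_set_


def poison_offset(fault_addr, poison=POISON, max_offset=0x1000):
    """Return fault_addr - poison if it looks like a small member offset, else None.

    max_offset is an arbitrary illustrative bound, not a value from the bug.
    """
    delta = fault_addr - poison
    return delta if 0 <= delta < max_offset else None


# The faulting address is the poison pattern plus the member offset.
assert poison_offset(FAULT_ADDR) == MEMBER_OFFSET
```

A match like this is consistent with reading a member at offset 0x14 through a pointer whose value is the poison pattern; on its own it does not show which object was freed.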
I tried to use this patch to check for the use-after-free problem, but I can't reproduce it on my Flame device. I will discuss the STR with Mason. There will be a large amount of log output, and I don't think the logcat buffer is big enough, so I redirect the logcat output to a file.
I can't be sure we call ChooseTimer() [1] before Nuwa cloning. This patch creates the timer in the RefreshDriver constructor, which is called before Nuwa cloning. I think the original one is still available: we still create the vsync-based timer at [2] when we call ChooseTimer() for the first time.
[1] https://hg.mozilla.org/mozilla-central/annotate/6686aacf006f/layout/base/nsRefreshDriver.cpp#l982
[2] https://hg.mozilla.org/mozilla-central/annotate/6686aacf006f/layout/base/nsRefreshDriver.cpp#l879
This is clearly not a First Time Experience / Gaia issue - although that is where we'll first encounter the bug. Can we get this moved to the correct component?
Flags: needinfo?(hshih)
We have the same crash signature as bug 1133865
Status: NEW → RESOLVED
Closed: 10 years ago
Component: Gaia::System → Graphics
Flags: needinfo?(hshih)
Product: Firefox OS → Core
Resolution: --- → DUPLICATE
Summary: [Flame][First Time Experience]Sometimes,the prompt "Something just crashed" will pop up. → b2g crash at IPC::Message::EnsureFileDescriptorSet()
Duplicate of another smoketest blocker.
blocking-b2g: 2.5? → ---
Keywords: smoketest