Closed Bug 871574 Opened 11 years ago Closed 11 years ago

crash in mozilla::dom::indexedDB::PIndexedDBRequestChild::OnMessageReceived

Categories

(Core :: Storage: IndexedDB, defect)

ARM
Gonk (Firefox OS)
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: scoobidiver, Unassigned)

Details

(Keywords: crash, Whiteboard: [b2g-crash])

Crash Data

There are six crashes including recent ones in builds from May 6 and 7. Here is a crash report: bp-88a50569-d82f-4e1d-8b4e-d366b2130507.

Frame  Module  Signature  Source
0  @0x0
1  libxul.so  mozilla::dom::indexedDB::PIndexedDBRequestChild::OnMessageReceived  PIndexedDBRequestChild.cpp:194
2  libxul.so  mozilla::dom::PContentChild::OnMessageReceived  PContentChild.cpp:2302
3  libxul.so  mozilla::ipc::AsyncChannel::OnDispatchMessage  AsyncChannel.cpp:471
4  libxul.so  mozilla::ipc::RPCChannel::OnMaybeDequeueOne  RPCChannel.cpp:402
5  libxul.so  RunnableMethod<IPC::ChannelProxy::Context, void , Tuple0>::Run  tuple.h:383
6  libxul.so  mozilla::ipc::RPCChannel::DequeueTask::Run  RPCChannel.h:425
7  libxul.so  MessageLoop::RunTask  message_loop.cc:337
8  libxul.so  MessageLoop::DeferOrRunPendingTask  message_loop.cc:345
9  libxul.so  MessageLoop::DoWork  message_loop.cc:445
10  libxul.so  mozilla::ipc::DoWorkRunnable::Run  MessagePump.cpp:42
11  libxul.so  nsThread::ProcessNextEvent  nsThread.cpp:620
12  libxul.so  NS_ProcessNextEvent_P  nsThreadUtils.cpp:237
13  libxul.so  mozilla::ipc::MessagePump::Run  MessagePump.cpp:117
14  libxul.so  mozilla::ipc::MessagePumpForChildProcess::Run  MessagePump.cpp:231
15  libxul.so  MessageLoop::RunInternal  message_loop.cc:219
16  libxul.so  MessageLoop::Run  message_loop.cc:212
17  libxul.so  nsBaseAppShell::Run  nsBaseAppShell.cpp:163
18  libxul.so  XRE_RunAppShell  nsEmbedFunctions.cpp:646
19  libxul.so  mozilla::ipc::MessagePumpForChildProcess::Run  MessagePump.cpp:198
20  libxul.so  MessageLoop::RunInternal  message_loop.cc:219
21  libxul.so  MessageLoop::Run  message_loop.cc:212
22  libxul.so  XRE_InitChildProcess  nsEmbedFunctions.cpp:485
23  plugin-container  main  ipc/app/MozillaRuntimeMain.cpp:60
24  libc.so  __libc_init  libc_init_dynamic.c:114
25  @0xb0001dc5

More reports at: https://crash-stats.mozilla.com/report/list?signature=%400x0+|+mozilla%3A%3Adom%3A%3AindexedDB%3A%3APIndexedDBRequestChild%3A%3AOnMessageReceived
According to Bug 863500 comment 25, this one should be marked as a duplicate of Bug 863500.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
There are recent crashes.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
It's #7 crasher in B2G 18.0.
These crashes indicate memory corruption. We'll need some STR and valgrind probably :(
Basically we're crashing on a null deref of 'actor' here:

    PIndexedDBRequestChild::OnMessageReceived(const Message& __msg)
    {
        ...
        PIndexedDBRequestChild* actor;
        if ((!(Read((&(actor)), (&(__msg)), (&(__iter)), false)))) {
            FatalError("Error deserializing 'PIndexedDBRequestChild'");
            return MsgValueError;
        }
        ...
        (actor)->DestroySubtree(Deletion);
        ...
    }

But:

    PIndexedDBRequestChild::Read(
            PIndexedDBRequestChild** __v,
            const Message* __msg,
            void** __iter,
            bool __nullable)
    {
        int32_t id;
        if ((!(Read((&(id)), __msg, __iter)))) {
            FatalError("Error deserializing 'id' for 'PIndexedDBRequestChild'");
            return false;
        }
        if (((1) == (id)) || (((0) == (id)) && ((!(__nullable))))) {
            mozilla::ipc::ProtocolErrorBreakpoint("bad ID for PIndexedDBRequest");
            return false;
        }
        if ((0) == (id)) {
            (*(__v)) = 0;
            return true;
        }
        ...
    }

That can only return a null actor if '__nullable' is true, which it can't be in this case. So somewhere between 'Read(&actor)' and 'actor->DestroySubtree()' our actor pointer is being overwritten.
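To make the argument above easier to check, here is a minimal, self-contained sketch of the ID check in 'Read'. It is not the generated IPDL code: 'Actor', 'ReadActor', 'kNullActorId', 'kFreedActorId' and the 'live' lookup table are hypothetical names introduced for illustration, and the lookup step stands in for the part of the real function that is elided ("...") in the comment above.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for an IPDL actor; not the real PIndexedDBRequestChild.
struct Actor {
  int32_t id;
};

// Hypothetical names for the literals 0 and 1 used in the generated code.
constexpr int32_t kNullActorId = 0;
constexpr int32_t kFreedActorId = 1;

// Simplified model of the generated Read(): returns false on a bad ID and
// stores a null pointer only when `nullable` is true.
bool ReadActor(int32_t id, bool nullable,
               const std::unordered_map<int32_t, Actor*>& live,
               Actor** out) {
  assert(out != nullptr);
  if (id == kFreedActorId || (id == kNullActorId && !nullable)) {
    return false;  // "bad ID" path: the caller is expected to bail out
  }
  if (id == kNullActorId) {
    *out = nullptr;  // reachable only when nullable == true
    return true;
  }
  // Assumption: the elided remainder resolves the ID to a live actor.
  auto it = live.find(id);
  if (it == live.end()) {
    return false;
  }
  *out = it->second;
  return true;
}
```

Under these assumptions, a call with 'nullable' set to false either fails or yields a non-null pointer, which is why a null 'actor' at the 'DestroySubtree' call points to the pointer being overwritten after 'Read' returned.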
It's now #4 top crasher in B2G 18.0.
blocking-b2g: --- → leo?
Keywords: topcrash
Reporter: Can you describe the user impact when this crash occurs? ahuang: Can you analyze what we have already first and provide some insights so we can see the severity?
Flags: needinfo?(ahuang)
(In reply to Wayne Chang [:wchang] from comment #7)
> Reporter: Can you describe the user impact when this crash occurs?
>
> ahuang: Can you analyze what we have already first and provide some insights
> so we can see the severity?

We don't see this at least after 5/15 build, right? I believe the severity is low. According to Ben in comment 4 and comment 5, I think coredump may provide us little help here. Minidump from partner is not enough to solve this bug for sure, but I think it's barely possible to let partners run Valgrind in stress tests as well. Maybe bug 847268, enabling coredump is much more reasonable for partners and us to dig into this bug.
Flags: needinfo?(ahuang)
leo+ (at least temporarily) given comment 6, but comment 8 may lead to a resolved/worksforme.
blocking-b2g: leo? → leo+
Note: these are all Keon or Peak crashes.
(In reply to ben turner [:bent] from comment #5)
> That can only return a null actor if '__nullable' is true, which it can't be
> in this case. So somewhere between 'Read(&actor)' and
> 'actor->DestroySubtree()' our actor pointer is being overwritten.

Let's try whether we can reproduce this on emulator-x86 or not. We can enable hardware watchpoints with gdb 7.4 (or later) (bug 865582) on emulator-x86. Valgrind seems to be a good choice, too.
(In reply to Scoobidiver from comment #3)
> It's #7 crasher in B2G 18.0.

Hi, I want to check this bug using a HW watchpoint on emulator-x86. Can you provide 100% reproducible steps? Thanks.
(In reply to Wayne Chang [:wchang] from comment #7)
> Reporter: Can you describe the user impact when this crash occurs?

(In reply to Alan Huang [:ahuang] from comment #13)
> Can you provide 100% reproducible steps?

I don't have any. This bug was filed from crash stats. In addition, users can't add a comment when crashing, so there is no clue except maybe from URLs if available.
Assignee: nobody → ahuang
Are we still seeing this on more recent builds?
Flags: needinfo?(scoobidiver)
(In reply to Wayne Chang [:wchang] from comment #15)
> Are we still seeing this on more recent builds?

It happens on Peak and Keon up to B2G 18.0/20130613 which seems to be the latest FxOS-1.0.1 build.
Flags: needinfo?(scoobidiver)
(In reply to Scoobidiver from comment #16)
> (In reply to Wayne Chang [:wchang] from comment #15)
> > Are we still seeing this on more recent builds?
> It happens on Peak and Keon up to B2G 18.0/20130613 which seems to be the
> latest FxOS-1.0.1 build.

Have we seen it on 1.1 or trunk/1.2 builds recently as well?
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #17)
> Have we seen it on 1.1 or trunk/1.2 builds recently as well?

ZTE phones don't have symbols so I can't say for 1.1. In trunk/1.2, there are only 3 crashes over the last week, none for this bug, so it's not statistically representative.
(In reply to Scoobidiver from comment #18)
> ZTE phones don't have symbols so I can't say for 1.1.

The shipped ZTE phones are running 1.0.1 - both 1.1 and 1.2 are only in use in internal testing builds/devices (unagi etc.), or for 1.2, on Geeksphone devices with very daring users.
It's #21 crasher in B2G for all versions.
blocking-b2g: leo+ → leo?
Keywords: topcrash
Hello Al, as we discussed before, we may need QA to help us find STR for this. Can Taiwan QA provide some help here? Thanks!
Keywords: qawanted
QA Contact: atsai
Triage- Leo-ing until we can find an STR or the occurrence rate rises.
blocking-b2g: leo? → ---
Hi, Alan, sorry to jump in. I can't tell anything from the provided logs. All we can do is run the scenarios mentioned in Bug 863500 comment 24. Do you think this makes sense? If you know of any specific method to trigger this crash, please feel free to contact us. I will also come to your cubicle to discuss this problem with you after I have run the test. Thanks!
Hi, Alan and all,

I recently automated the test steps mentioned in Bug 863500 comment 24 and ran them on the following V1-TRAIN builds with an unagi device:
* 2013-07-03-07-02-10
* 2013-07-18-23-02-25

I still cannot reproduce it. This bug was reported 2 months ago. I am not sure whether a patch that landed since then has affected the bug and left it as a potential issue. I also wondered whether the crash reports were caused by QA, since we ran the Leo tests during that period, but I have no findings to support that. I will continue to monitor this issue from the automation server, but will not spend too much time on it. If you have further suggestions, comments, or findings, please feel free to contact me. Thanks!
Crash Signature: [@ @0x0 | mozilla::dom::indexedDB::PIndexedDBRequestChild::OnMessageReceived] → [@ @0x0 | mozilla::dom::indexedDB::PIndexedDBRequestChild::OnMessageReceived] [@ @0x0 | mozilla::dom::indexedDB::PIndexedDBRequestChild::OnMessageReceived(IPC::Message const&)]
Based on the comment above I think QA has done what we can to reproduce this. If we get more information later during daily testing, we'll try to action it from there. For now, there's not much we can do here.
Keywords: qawanted
(In reply to William Hsu [:whsu] from comment #24)
> I automated the test steps that Bug 863500 comment 24 mentioned recently and
> run it on the following V1-TRAIN build with unagi device.

It might help to run this series of steps under valgrind and see if it reports anything unusual. Please ping qDot for help on setting it up.
Valgrind unfortunately only runs on >= v1.2 on the nexus 4.
(In reply to Kyle Machulis [:kmachulis] [:qdot] from comment #27)
> Valgrind unfortunately only runs on >= v1.2 on the nexus 4.

Eh? I was able to run it on v1.0.1 unagi before.
bent's original instructions for getting valgrind up and running on v1.0/1.1 are at https://bug854517.bugzilla.mozilla.org/attachment.cgi?id=729283 See if you can work through these. I'm hoping my valgrind patches for v1.2 will land soon, and will try to backport them to 1.0/1.1 when that happens.
Component: General → DOM: IndexedDB
Product: Firefox OS → Core
Alan, this has been a top-crasher for a while with no action on it. Can you please help here?
Flags: needinfo?(ahuang)
I have had no new ideas on this for a while, and I am currently occupied with Tarako. Un-assigning myself for now.
Assignee: ahuang → nobody
Flags: needinfo?(ahuang)
I don't see any recent crash in anything higher than 18. I am not sure if this bug will appear in new Gecko levels. Should we keep this open?
Flags: needinfo?(bbajaj)
(In reply to Naoki Hirata :nhirata (please use needinfo instead of cc) from comment #32)
> I don't see any recent crash in anything higher than 18. I am not sure if
> this bug will appear in new Gecko levels. Should we keep this open?

Let's close it for now; we can reopen if need be.
Flags: needinfo?(bbajaj)
Keywords: topcrash-b2g
WFM for now
Status: REOPENED → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME