Closed
Bug 1258312
Opened 9 years ago
Closed 9 years ago
crash in mozalloc_abort | NS_DebugBreak | mozilla::ipc::FatalError | mozilla::dom::PBrowserChild::FatalError | mozilla::dom::PBrowserChild::OnMessageReceived
Categories
(Core :: DOM: Content Processes, defect, P1)
Tracking
()
RESOLVED
FIXED
People
(Reporter: kanru, Assigned: kanru)
References
Details
(Keywords: crash, Whiteboard: btpp-active)
Crash Data
Attachments
(2 files)
(deleted),
text/x-review-board-request
|
jld
:
review+
ritu
:
approval-mozilla-aurora+
|
Details |
(deleted),
text/x-review-board-request
|
jld
:
review+
ritu
:
approval-mozilla-aurora+
|
Details |
This bug was filed from the Socorro interface and is
report bp-8c873afc-9585-404c-a46f-9bc4d2160320.
=============================================================
This crash is #2, with 296 crashes, +2.09% compared with the control group on the beta46 a/b experiment.
Frame Module Signature Source
0 mozglue.dll mozalloc_abort(char const* const) memory/mozalloc/mozalloc_abort.cpp
1 xul.dll NS_DebugBreak xpcom/base/nsDebugImpl.cpp
2 xul.dll mozilla::ipc::FatalError(char const*, char const*, unsigned long, bool) ipc/glue/ProtocolUtils.cpp
3 xul.dll mozilla::dom::PBrowserChild::FatalError(char const* const) obj-firefox/ipc/ipdl/PBrowserChild.cpp
4 xul.dll mozilla::dom::PBrowserChild::OnMessageReceived(IPC::Message const&) obj-firefox/ipc/ipdl/PBrowserChild.cpp
5 xul.dll mozilla::dom::PContentChild::OnMessageReceived(IPC::Message const&) obj-firefox/ipc/ipdl/PContentChild.cpp
6 xul.dll mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&) ipc/glue/MessageChannel.cpp
7 xul.dll mozilla::ipc::MessageChannel::DispatchMessageW(IPC::Message const&) ipc/glue/MessageChannel.cpp
8 xul.dll mozilla::ipc::MessageChannel::OnMaybeDequeueOne() ipc/glue/MessageChannel.cpp
9 xul.dll MessageLoop::DoWork() ipc/chromium/src/base/message_loop.cc
10 xul.dll mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp
Updated•9 years ago
|
Assignee: nobody → kchen
Comment 1•9 years ago
|
||
1) appear to be content process protocol FatalAborts -
http://hg.mozilla.org/releases/mozilla-beta/annotate/5904e3eb711d/ipc/glue/ProtocolUtils.cpp#l332
2) All appear to be startup crashes
< 1 min 98.67% 446
1-5 min 1.33% 6
3) All signatures have DispatchAsyncMessage on the stack
We need to build beta up to get a look at where this happens in auto generated code.
Updated•9 years ago
|
Assignee | ||
Comment 2•9 years ago
|
||
I checked one report https://crash-stats.mozilla.com/report/index/f221d2f6-e2cf-4002-9336-7df922160317 and a few others.
The error is "Error deserializing nsSizeMode". Maybe we send some initialized value the content process. Since we only assert the validity in debug build we don't catch this on the sender side.
Updated•9 years ago
|
Assignee | ||
Comment 3•9 years ago
|
||
https://dxr.mozilla.org/mozilla-central/rev/63be002b4a803df1122823841ef7633b7561d873/ipc/chromium/src/base/pickle.cc#508-509
If Resize failed the content will crash in the exact same signature so this might be an OOM issue. Note bug 897870 added annotation to initial message construction but not WriteBegin. I'm not sure if writing a single bool could trigger the Resize yet. Our initial capacity_ is 64 on both x86 and x86_64.
Comment 4•9 years ago
|
||
(In reply to Kan-Ru Chen [:kanru] (UTC+8) from comment #3)
> If Resize failed the content will crash in the exact same signature so this
> might be an OOM issue. Note bug 897870 added annotation to initial message
> construction but not WriteBegin. I'm not sure if writing a single bool could
> trigger the Resize yet. Our initial capacity_ is 64 on both x86 and x86_64.
That's a great point. That could explain a number of these deserialization crashes we see (bug 1259192 is another). While the lower level code that calls Pickle::BeginWrite() (like Pickle::WriteBytes()) seems to properly do error checking, I don't think that's happening in the higher-level IPDL-generated code such as
PBrowserParent::SendUpdateDimensions. Do you want to write a patch to check the last Resize() or should I?
Flags: needinfo?(kchen)
Updated•9 years ago
|
Priority: -- → P1
Updated•9 years ago
|
Updated•9 years ago
|
Assignee | ||
Comment 5•9 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #4)
> (In reply to Kan-Ru Chen [:kanru] (UTC+8) from comment #3)
> > If Resize failed the content will crash in the exact same signature so this
> > might be an OOM issue. Note bug 897870 added annotation to initial message
> > construction but not WriteBegin. I'm not sure if writing a single bool could
> > trigger the Resize yet. Our initial capacity_ is 64 on both x86 and x86_64.
>
> That's a great point. That could explain a number of these deserialization
> crashes we see (bug 1259192 is another). While the lower level code that
> calls Pickle::BeginWrite() (like Pickle::WriteBytes()) seems to properly do
> error checking, I don't think that's happening in the higher-level
> IPDL-generated code such as
> PBrowserParent::SendUpdateDimensions. Do you want to write a patch to check
> the last Resize() or should I?
I have a patch that turns Resize() to be infallible. What do you think? Although the Pickle methods is geared towards allocation failure reporting, none of the higher-level IPDL code checks the result.
Flags: needinfo?(kchen)
Assignee | ||
Comment 6•9 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/43123/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/43123/
Attachment #8736137 -
Flags: review?(continuation)
Attachment #8736138 -
Flags: review?(continuation)
Assignee | ||
Comment 7•9 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/43125/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/43125/
Assignee | ||
Comment 8•9 years ago
|
||
Not sure if we need the second patch but if we will want to know the real deserialization error in case it's not OOM.
Comment 9•9 years ago
|
||
The patches looks good to me, but you should get an IPC peer to review it. Maybe jld.
Assignee | ||
Updated•9 years ago
|
Attachment #8736137 -
Flags: review?(continuation) → review?(jld)
Attachment #8736138 -
Flags: review?(continuation) → review?(jld)
Comment 10•9 years ago
|
||
This particular signature does not appear in builds after 46 experiment 1, but that may be because it morphed to a different signature because of inlining or COMDAT-folding. I think we should proceed with getting the better debugging data.
Assignee | ||
Updated•9 years ago
|
Flags: needinfo?(jld)
Assignee | ||
Comment 11•9 years ago
|
||
review ping
Comment 12•9 years ago
|
||
Comment on attachment 8736137 [details]
MozReview Request: Bug 1258312 - Make Pickle::Resize infallible r?mccr8
https://reviewboard.mozilla.org/r/43123/#review40537
Longer-term it could be useful to be able to kill only the child process that caused the parent to fail an allocation, but that's not too meaningful in the immediate future.
Attachment #8736137 -
Flags: review?(jld) → review+
Updated•9 years ago
|
Attachment #8736138 -
Flags: review?(jld) → review+
Comment 13•9 years ago
|
||
Comment on attachment 8736138 [details]
MozReview Request: Bug 1258312 - Add crash annotation to EnumSerializer r?mccr8
https://reviewboard.mozilla.org/r/43125/#review40539
Updated•9 years ago
|
Flags: needinfo?(jld)
Comment 14•9 years ago
|
||
Is this ready to land now? It would be nice to see what happens with these patches.
Flags: needinfo?(kchen)
Assignee | ||
Updated•9 years ago
|
Flags: needinfo?(kchen)
Keywords: leave-open
Comment 15•9 years ago
|
||
Comment 16•9 years ago
|
||
bugherder |
Assignee | ||
Comment 17•9 years ago
|
||
Comment on attachment 8736137 [details]
MozReview Request: Bug 1258312 - Make Pickle::Resize infallible r?mccr8
Approval Request Comment
[Feature/regressing bug #]: N/A
Uplift this to 47 in preparation for the next e10s experiment and diagnose the problem.
[User impact if declined]: no
[Describe test coverage new/current, TreeHerder]: cooked on m-c
[Risks and why]: Little. This turns a potential content crash to a main process OOM crash. The main process wouldn't be able to survive long after that anyway.
[String/UUID change made/needed]: no
Attachment #8736137 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 18•9 years ago
|
||
Comment on attachment 8736138 [details]
MozReview Request: Bug 1258312 - Add crash annotation to EnumSerializer r?mccr8
Approval Request Comment
Same as above
Attachment #8736138 -
Flags: approval-mozilla-aurora?
Updated•9 years ago
|
Assignee | ||
Comment 19•9 years ago
|
||
Comment 20•9 years ago
|
||
I see a couple of deserialization FatalError crashes on April 2 through April 6, but none after this patch landed. These crashes aren't super common on Nightly, so it is hard to say for sure if they are gone entirely, but we should consider duping all of the other deserialization crashes in the "see also" section over to this one.
https://crash-stats.mozilla.com/search/?product=Firefox&version=48.0a1&signature=~mozilla%3A%3Aipc%3A%3AFatalError&build_id=%3E%3D20160311030233&_facets=signature&_columns=date&_columns=signature&_columns=build_id&_columns=ipc_fatal_error_msg&_columns=ipc_fatal_error_protocol#crash-reports
Updated•9 years ago
|
Whiteboard: btpp-active
Updated•9 years ago
|
Flags: needinfo?(rkothari)
Comment 23•9 years ago
|
||
These patches modify IPC OOM errors such that they show up as OOM signatures vs. FatalError signatures. It's really important we get this uplifted for the beta 47 experiment.
status-firefox47:
--- → affected
Flags: needinfo?(rkothari)
Comment on attachment 8736137 [details]
MozReview Request: Bug 1258312 - Make Pickle::Resize infallible r?mccr8
e10s stability related, Aurora47+
Attachment #8736137 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8736138 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
The reason why I didn't prioritize this over the other bugs in the aurora nom queue is the bug resolution is "New". I was under the impression that the fix hasn't landed in Nightly yet and therefore not ready for uplift to Aurora. Sorry about the delay!
Comment 26•9 years ago
|
||
Updated•9 years ago
|
Comment 28•9 years ago
|
||
As expected, these crashes seem to have gone away on Aurora after 4-13, which is about when this landed.
Comment 29•9 years ago
|
||
I do actually see a few in the parent process after the landing, so we may have some lingering bugs in our deserialization code. Hopefully they aren't common enough to impact stability. If so, we can file new bugs for them.
Updated•9 years ago
|
Assignee | ||
Comment 30•9 years ago
|
||
I saw some new signatures on Nightly which are deserialization on nsTArray. One of the messages is PBrowser::Msg_HandleAccessKey. I haven't check the others. Totally disappeared on Aurora.
Comment 32•8 years ago
|
||
(In reply to Kan-Ru Chen [:kanru] (UTC+8) from comment #19)
> Some results are coming in:
>
> https://crash-stats.mozilla.com/search/
> ?product=Firefox&proto_signature=%5Emozalloc_abort+%7C+mozalloc_handle_oom+%7
> C+moz_xrealloc+%7C+Pickle%3A%3AResize&_facets=signature&_columns=date&_column
> s=signature&_columns=product&_columns=version&_columns=build_id&_columns=plat
> form#facet-signature
FWIW, that search is broken because it puts '+' characters before and after the '|' characters. This one is better:
> https://crash-stats.mozilla.com/search/?product=Firefox&proto_signature=%5Emozalloc_abort%20%7C%20mozalloc_handle_oom%20%7C%20moz_xrealloc%20%7C%20Pickle%3A%3AResize&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature
(Socorro inserts the '+' chars sometimes. I haven't worked out exactly why or when.)
You need to log in
before you can comment on or make changes to this bug.
Description
•