Bug 1187408
Opened 9 years ago
Closed 9 years ago
Crash in SetContentProcessSandbox while stability testing
Categories: Core :: DOM: Content Processes (defect)
Tracking: RESOLVED WORKSFORME
blocking-b2g: 2.2?
People: Reporter: ggrisco; Assignee: jld
Keywords: crash
Whiteboard: [b2g-crash][caf-crash 637][caf priority: p3][CR 846198]
Attachments: 4 files
Crash in automated stability testing with the following signature:
[@ mozilla::SetContentProcessSandbox | mozilla::dom::ContentChild::RecvSetProcessSandbox | mozilla::dom::PContentChild::OnMessageReceived | mozilla::ipc::MessageChannel::DispatchAsyncMessage ]
This crash is intermittent: seen once on AU 154, once on AU 170, and now once on AU 214.
cafbot will upload logs.
Updated • 9 years ago (Reporter)
Blocks: CAF-v2.2-metabug
blocking-b2g: --- → 2.2?
Comment 1 • 9 years ago
Comment 2 • 9 years ago
Updated • 9 years ago
Whiteboard: [CR 846198] → [caf priority: p3][CR 846198]
Updated • 9 years ago
Whiteboard: [caf priority: p3][CR 846198] → [b2g-crash][caf-crash 637][caf priority: p3][CR 846198]
Comment 3 • 9 years ago
Observed on:
Device: msm8909
Gonk Version: AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.214
Moz BuildID: 20150606002503
Manifest: https://www.codeaurora.org/cgit/quic/lf/b2g/manifest/tree/caf_AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.214.xml?h=release
B2G Version: v2.2
Gecko Version: 37.0
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=8fc797527a3eca7665bc1d1828848f2fb77ca99f
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=e0045f9c8b7e84fc52ba628141688c8ecb4b7a52
Patches: bug 1133147, bug 1181641
Comment 4 • 9 years ago
Comment 5 • 9 years ago
Comment 6 • 9 years ago (Assignee)
07-23 12:42:44.000 27029 27029 E Sandbox : Thread 27033 unresponsive for 10 seconds. Killing process.
I started to write a lot of text about this, assuming that the thread was actually unresponsive for 10s, but then I noticed that this is the 2.2 / 37 branch, which means it doesn't have the fix for bug 1176085. I'd been thinking of that bug as a false negative for this assertion, because I discovered it in a case where the assertion should have fired and didn't (and looped forever instead)… but it can also produce a false positive.
So what actually happened here is that the thread didn't respond within 10 *milli*seconds (and also didn't exit), and a more or less random number in the range [0, 999999999] (the nanoseconds part of a clock reading) was less than the number of seconds since boot (i.e., the CLOCK_MONOTONIC time in seconds). That combination is relatively unlikely, and the faulty check isn't even reached if the thread handles the signal promptly, but its probability is not zero.
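A minimal Python sketch of this failure mode, for illustration only. The bug does not quote the Gecko watchdog code, so everything here is hypothetical: `Timespec` models a POSIX `struct timespec` as a (seconds, nanoseconds) pair, `correct_expired` is an ordinary deadline check, and `buggy_expired` assumes one plausible field mix-up (the deadline's nanoseconds compared against the current seconds) that reproduces the condition described above. The example uses an uptime of 175,920 s, i.e. the 2932 minutes from the log excerpt in this comment.

```python
from typing import NamedTuple

NSEC_PER_SEC = 1_000_000_000


class Timespec(NamedTuple):
    """(seconds, nanoseconds) pair modeled on POSIX struct timespec."""
    sec: int
    nsec: int  # always kept in [0, 999_999_999]


def add_ns(t: Timespec, ns: int) -> Timespec:
    """Return t + ns nanoseconds, renormalizing the nanoseconds field."""
    total = t.sec * NSEC_PER_SEC + t.nsec + ns
    return Timespec(*divmod(total, NSEC_PER_SEC))


def correct_expired(now: Timespec, deadline: Timespec) -> bool:
    # Seconds decide first; nanoseconds only break a tie.
    return now.sec > deadline.sec or (
        now.sec == deadline.sec and now.nsec >= deadline.nsec
    )


def buggy_expired(now: Timespec, deadline: Timespec) -> bool:
    # HYPOTHETICAL bug: the first comparison uses deadline.nsec instead of
    # deadline.sec, so the check "expires" whenever the deadline's
    # nanoseconds field (effectively random) is below the uptime in seconds.
    return now.sec > deadline.nsec or (
        now.sec == deadline.sec and now.nsec >= deadline.nsec
    )


# Watchdog started at ~2932 minutes of uptime, with a 10-second limit.
start = Timespec(175_920, 100_000)
deadline = add_ns(start, 10 * NSEC_PER_SEC)
now = add_ns(start, 10_000_000)  # first poll, 10 ms later

print(correct_expired(now, deadline))  # False: only 10 ms have passed
print(buggy_expired(now, deadline))    # True: 175920 > 100000
```

With a start time whose nanoseconds field is 500,000,000 instead, `buggy_expired` returns False on the first poll, which is consistent with the crash being intermittent: it depends on where within the second the clock reading happens to land.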
Specifically, the log has this:
07-23 12:42:23.280 266 266 I Gecko : Uptime: 2932m
If that's the host uptime, then the probability is about 1 in 5000, on top of the probability that the timeout case happens at all, but that's applied to every non-main thread in the content process every time an app is started. If that's a typical uptime, and if there are tens or hundreds of test devices, then this starts looking plausible.
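The "about 1 in 5000" figure can be checked with a little arithmetic. This sketch assumes, as the comment does, that the nanoseconds field is uniformly distributed over [0, 999999999] and that the 2932-minute uptime from the log is representative; `false_timeout_probability` is a hypothetical helper name, and the result is only the chance that the faulty comparison fires once a thread has already missed the 10 ms window.

```python
NSEC_RANGE = 1_000_000_000  # number of possible tv_nsec values, 0..999_999_999


def false_timeout_probability(uptime_minutes: int) -> float:
    """Chance that a uniformly random nanoseconds value is below the uptime
    in seconds, i.e. that the faulty comparison fires."""
    uptime_s = uptime_minutes * 60
    return min(uptime_s, NSEC_RANGE) / NSEC_RANGE


p = false_timeout_probability(2932)
print(f"p = {p:.2e}, about 1 in {round(1 / p)}")
```

This prints `p = 1.76e-04, about 1 in 5684`, the same order of magnitude as the estimate above. Because the probability grows linearly with uptime, long-running test devices are the most exposed.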
Updated • 9 years ago (Assignee)
Assignee: nobody → jld
Comment 7 • 9 years ago (Assignee)
For those not following bug 1176085: I could try to uplift it and (hopefully?) fix this bug, but I'd have to warn release management that it caused bug 1185118 to start manifesting as crashes instead of something else (probably hanging the content process indefinitely). I expect that that would be considered excessive risk (even though the code as-is is obviously wrong and causing *these* crashes). That bug seems to occur only on Flame devices, and I strongly suspect a kernel bug, but it's hard to get any farther than that with no STR and only the limited data available in Gecko minidumps.
Updated • 9 years ago
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
Comment 8 • 9 years ago
"Closing issue which has not been seen since 07/15/15 17:25"