Closed
Bug 1017706
Opened 10 years ago
Closed 10 years ago
Intermittent HealthReport.jsm Shutdown | application crashed [@ mozalloc_abort(char const*)] after "ABORT: file resource://gre/modules/HealthReport.jsm, line 4356"
Categories
(Firefox Health Report Graveyard :: Client: Desktop, defect)
Firefox Health Report Graveyard
Client: Desktop
Tracking
(firefox31 unaffected, firefox32 fixed, firefox33 fixed, firefox-esr24 unaffected)
RESOLVED
FIXED
Firefox 33
Tracking | Status | |
---|---|---|
firefox31 | --- | unaffected |
firefox32 | --- | fixed |
firefox33 | --- | fixed |
firefox-esr24 | --- | unaffected |
People
(Reporter: Yoric, Assigned: Yoric)
References
(Blocks 1 open bug, )
Details
(Keywords: intermittent-failure)
Attachments
(2 files, 4 obsolete files)
(deleted),
patch
|
rnewman
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Yoric
:
review+
lmandel
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
No description provided.
Assignee | ||
Comment 1•10 years ago
|
||
Assignee | ||
Updated•10 years ago
|
Summary: At least one completion condition is taking too long to complete. Conditions: [{"name":"FHR: Flushing storage shutdown","state":"(none)"}] Barrier: Metrics Storage Backend – followed by AsyncShutdown.jsm crash → Crash in AsyncShutdown.jsm, with warning « At least one completion condition is taking too long to complete. Conditions: [{"name":"FHR: Flushing storage shutdown","state":"(none)"}] Barrier: Metrics Storage Backend »
Assignee | ||
Comment 2•10 years ago
|
||
Basically, this means that there is a problem in or around healthreporter.jsm, with `this._promiseShutdown` never being resolved.
Comment 3•10 years ago
|
||
Previous JS exception:
System JS : ERROR resource://gre/components/nsUrlClassifierListManager.js:342 - NS_ERROR_NOT_INITIALIZED: Component returned failure code: 0xc1f30001 (NS_ERROR_NOT_INITIALIZED) [nsIUrlClassifierDBService.getTables]
I don't know offhand if FHR has a dependency on url-classifier; gps do you know? Otherwise does anyone know if/how we could get a JS and maybe even c++ stack for that JS exception?
Flags: needinfo?(gps)
Comment 4•10 years ago
|
||
I don't think FHR has a direct dependency on the URL classifier service. Although, something that FHR pulls from (like Places) might.
I wasn't paying attention when bug 917883 finally landed. IIRC we experienced weird shutdown behavior (possibly a crash) the first time it landed. Perhaps it should be backed out?
Flags: needinfo?(gps)
Assignee | ||
Comment 5•10 years ago
|
||
There is clearly a relationship with bug 917883. However, what's not clear is whether landing it added a new bug or just made it more visible.
Assignee | ||
Comment 6•10 years ago
|
||
Unfortunately, we have already landed a few bugs that depend on bug 917883 (obviously, I was too optimistic because I didn't see any errors), so I'd like to give a try to fixing the issue before we go the pains of reverting everything.
This patch does two things:
- it takes advantage of AsyncShutdown logging to tell us at which stage of shutdown we have failed;
- it fixes a (preexistent) typo that might possibly have contributed to intermittent issues during shutdown.
Assignee: nobody → dteller
Attachment #8432430 -
Flags: review?(rnewman)
Attachment #8432430 -
Flags: review?(gps)
Assignee | ||
Comment 7•10 years ago
|
||
(forgot a qref)
Attachment #8432430 -
Attachment is obsolete: true
Attachment #8432430 -
Flags: review?(rnewman)
Attachment #8432430 -
Flags: review?(gps)
Attachment #8432447 -
Flags: review?(rnewman)
Attachment #8432447 -
Flags: review?(gps)
Comment 8•10 years ago
|
||
Comment on attachment 8432447 [details] [diff] [review]
Fix a typo, add logging
Review of attachment 8432447 [details] [diff] [review]:
-----------------------------------------------------------------
::: services/healthreport/healthreporter.jsm
@@ +581,5 @@
>
> if (this._initializeHadError) {
> this._log.warn("Initialization had error. Shutting down immediately.");
> } else {
> + if (this._providerManagerInProgress) {
Oof.
Attachment #8432447 -
Flags: review?(rnewman) → review+
Assignee | ||
Comment 9•10 years ago
|
||
Assignee | ||
Updated•10 years ago
|
Attachment #8432447 -
Flags: review?(gps)
Comment 10•10 years ago
|
||
Comment on attachment 8432447 [details] [diff] [review]
Fix a typo, add logging
Review of attachment 8432447 [details] [diff] [review]:
-----------------------------------------------------------------
::: services/healthreport/healthreporter.jsm
@@ +581,5 @@
>
> if (this._initializeHadError) {
> this._log.warn("Initialization had error. Shutting down immediately.");
> } else {
> + if (this._providerManagerInProgress) {
And this is why I've been screaming for JavaScript code coverage for years.
At one time, I /thought/ I had 100% initialization state test coverage. I was wrong. Too bad I didn't have the data to tell me so.
Assignee | ||
Updated•10 years ago
|
Keywords: checkin-needed
Comment 11•10 years ago
|
||
Looks like Tomcat already pushed this but forgot to mark the bug.
https://hg.mozilla.org/integration/fx-team/rev/57a76b26c953
Keywords: checkin-needed
Whiteboard: [fixed-in-fx-team]
Comment 12•10 years ago
|
||
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: [fixed-in-fx-team]
Target Milestone: --- → Firefox 32
Assignee | ||
Comment 13•10 years ago
|
||
Bug is still present as bug 1022198.
Obviously, the typo was not the reason.
Resolution: FIXED → DUPLICATE
Assignee | ||
Updated•10 years ago
|
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Assignee | ||
Comment 15•10 years ago
|
||
Changing topic as bug 1018895 changed the output of AsyncShutdown.
Summary: Crash in AsyncShutdown.jsm, with warning « At least one completion condition is taking too long to complete. Conditions: [{"name":"FHR: Flushing storage shutdown","state":"(none)"}] Barrier: Metrics Storage Backend » → Intermittent OSX HealthReport.jsm Shutdown | application crashed [@ mozalloc_abort(char const*)] after "ABORT: file resource://gre/modules/HealthReport.jsm, line 4356"
Assignee | ||
Comment 16•10 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=41272501&tree=Mozilla-Inbound
Rev5 MacOSX Mountain Lion 10.8 mozilla-inbound debug test mochitest-other on 2014-06-07 03:14:30 PDT for push 49abfe360b14
slave: talos-mtnlion-r5-014
03:49:05 INFO - System JS : ERROR resource://gre/components/nsUrlClassifierListManager.js:342 - NS_ERROR_NOT_INITIALIZED: Component returned failure code: 0xc1f30001 (NS_ERROR_NOT_INITIALIZED) [nsIUrlClassifierDBService.getTables]
03:49:05 INFO - WARNING: At least one completion condition is taking too long to complete. Conditions: [{"name":"FHR: Flushing storage shutdown","state":{"shutdownInitiated":false,"initialized":true,"shutdownRequested":false,"initializeHadError":false,"providerManagerInProgress":false,"storageInProgress":false,"hasProviderManager":true,"hasStorage":true},"filename":"resource://gre/modules/HealthReport.jsm","lineNumber":4356}] Barrier: Metrics Storage Backend
03:49:05 INFO - WARNING: At least one completion condition is taking too long to complete. Conditions: [{"name":"Metrics Storage: Shutting down","state":"(none)","filename":"resource://gre/modules/HealthReport.jsm","lineNumber":2453}] Barrier: Sqlite.jsm: wait until all clients have completed their task
03:49:05 INFO - WARNING: At least one completion condition is taking too long to complete. Conditions: [{"name":"Sqlite.jsm shutdown blocker","state":{"description":"Waiting for the barrier to be lifted"},"filename":"resource://gre/modules/Sqlite.jsm","lineNumber":84}] Barrier: profile-before-change
03:49:56 INFO - ERROR: undefined
03:49:56 INFO - [1485] ###!!! ABORT: file resource://gre/modules/HealthReport.jsm, line 4356
Assignee | ||
Comment 17•10 years ago
|
||
Fixing a second typo, this time in AsyncShutdown logging. This typo is responsible for the `ERROR: undefined` in the log, but not for the actual bug. Self-reviewing, as this is just a one-char change.
Attachment #8436360 -
Flags: review+
Assignee | ||
Comment 18•10 years ago
|
||
Comment on attachment 8436360 [details] [diff] [review]
Fixing a typo in AsyncShutdown
Ah, it seems that there is a race condition between RyanVM and myself. This has already been landed as part of bug 1022198 before I could mark it obsolete over there.
Attachment #8436360 -
Attachment is obsolete: true
Updated•10 years ago
|
Keywords: intermittent-failure
Assignee | ||
Comment 20•10 years ago
|
||
According to the logs:
1. both `shutdownInitiated` and `shutdownRequested` are `false`, which means that we have never entered `_initiateShutdown`;
2. `storageInProgress` is `false`, `initialized` is `true`, `providerManagerInProgress` is `false`, which means that we are not caught during initialization;
3. `initializeHadError` is `false`, so initialization was satisfying.
...and we do not get the information on `_shutdownComplete` because I forgot an underscore in the logging code. The kind of errors that makes one wish JS were a statically typed language. Fortunately, the information would be redundant in this case.
Barring any error, 1. means that we never receive "quit-application". This is worrying. gps, is my interpretation of 1. correct? ALso, do you have any idea what could be causing this? I imagine that it is related to
System JS : ERROR resource://gre/components/nsUrlClassifierListManager.js:342 - NS_ERROR_NOT_INITIALIZED: Component returned failure code: 0xc1f30001 (NS_ERROR_NOT_INITIALIZED) [nsIUrlClassifierDBService.getTables]
Flags: needinfo?(gps)
Assignee | ||
Comment 21•10 years ago
|
||
My current hypothesis is the following: some client of quit-application sitting before healthreporter.jsm spins the event loop. In some unknown circumstances, this spinning completely messes up with our shutdown sequence, causing nsUrlClassifierListManager to request updates too late during shutdown, and also causes healthreporter.jsm to never receive quit-application (or at least to receive it after profile-before-change).
Assignee | ||
Comment 22•10 years ago
|
||
Attaching a band-aid patch based on my hypothesis above. I will also investigate clients of quit-application, but I don't know how long this will take.
Attachment #8436382 -
Flags: review?(rnewman)
Attachment #8436382 -
Flags: review?(gps)
Assignee | ||
Comment 23•10 years ago
|
||
Benjamin, does comment 21 make sense to you and do you know of any component with such behavior?
Note that the issue is probably not caused by a test, as it has also been reported by a user of Nightly.
Flags: needinfo?(benjamin)
Assignee | ||
Comment 24•10 years ago
|
||
Could this be related to http://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttpConnectionMgr.cpp#175 ? I have the intuition that that this can be called during quit-application by http://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierStreamUpdater.cpp#516 if the URL classifier is being updated.
Comment 25•10 years ago
|
||
Updated•10 years ago
|
Summary: Intermittent OSX HealthReport.jsm Shutdown | application crashed [@ mozalloc_abort(char const*)] after "ABORT: file resource://gre/modules/HealthReport.jsm, line 4356" → Intermittent HealthReport.jsm Shutdown | application crashed [@ mozalloc_abort(char const*)] after "ABORT: file resource://gre/modules/HealthReport.jsm, line 4356"
Assignee | ||
Comment 26•10 years ago
|
||
Let's not add new typos, we have seen enough in this bug.
Attachment #8436382 -
Attachment is obsolete: true
Attachment #8436382 -
Flags: review?(rnewman)
Attachment #8436382 -
Flags: review?(gps)
Attachment #8436541 -
Flags: review?(rnewman)
Attachment #8436541 -
Flags: review?(gps)
Comment 27•10 years ago
|
||
Comment on attachment 8436541 [details] [diff] [review]
2. Bandaid patch, v2
Review of attachment 8436541 [details] [diff] [review]:
-----------------------------------------------------------------
This looks sane.
Attachment #8436541 -
Flags: review?(rnewman)
Attachment #8436541 -
Flags: review?(gps)
Attachment #8436541 -
Flags: review+
Updated•10 years ago
|
Flags: needinfo?(gps)
Assignee | ||
Comment 28•10 years ago
|
||
Assignee | ||
Comment 29•10 years ago
|
||
Added reviewer.
Attachment #8436541 -
Attachment is obsolete: true
Attachment #8437113 -
Flags: review+
Assignee | ||
Updated•10 years ago
|
Keywords: checkin-needed
Updated•10 years ago
|
Target Milestone: Firefox 32 → ---
Comment 30•10 years ago
|
||
Keywords: checkin-needed
Comment 31•10 years ago
|
||
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 33
Comment 32•10 years ago
|
||
Please request Aurora approval for this when you get a chance.
status-firefox31:
--- → unaffected
status-firefox32:
--- → affected
status-firefox33:
--- → fixed
status-firefox-esr24:
--- → unaffected
Flags: needinfo?(dteller)
Comment 33•10 years ago
|
||
I am not aware of a situation where quit-application would not be fired if we entered into nsAppStartup::run, except perhaps if we never ran the event loop at all in some restart scenario.
Flags: needinfo?(benjamin)
Assignee | ||
Comment 34•10 years ago
|
||
Benjamin, what if some client of quit-application sitting before HealthReporter.jsm decided to spin the event loop. Look at our shutdown code, I have the impression that this could cause profile-before-change to be fired before quit-application reaches HealthReporter.jsm, wouldn't it?
Flags: needinfo?(dteller)
Comment 35•10 years ago
|
||
In normal shutdown, quit-application is typically fired here: http://hg.mozilla.org/mozilla-central/annotate/c482c28b35b6/toolkit/components/startup/nsAppStartup.cpp#l456
and the profile notifications don't happen until the stack unwinds all the way to here: http://hg.mozilla.org/mozilla-central/annotate/7297dedf2416/toolkit/xre/nsAppRunner.cpp#l4108
So it sounds unlikely that spinning an event loop would change anything there. A completely missing notification sounds more likely.
Comment 36•10 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM UTC-4][PTO June 19-22] from comment #32)
> Please request Aurora approval for this when you get a chance.
Please? :)
Flags: needinfo?(dteller)
Assignee | ||
Comment 37•10 years ago
|
||
Comment on attachment 8437113 [details] [diff] [review]
2. Bandaid patch, v3
[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 917883 (although I suspected it existed with different symptoms form beforehand).
User impact if declined: Occasional shutdown freeze/crashes.
Testing completed (on m-c, etc.): Has been on m-c for one week.
Risk to taking this patch (and alternatives if risky): Can't think of any.
String or IDL/UUID changes made by this patch: None.
Attachment #8437113 -
Flags: approval-mozilla-aurora?
Flags: needinfo?(dteller)
Comment 38•10 years ago
|
||
Comment 39•10 years ago
|
||
Comment on attachment 8437113 [details] [diff] [review]
2. Bandaid patch, v3
Aurora approval granted.
Attachment #8437113 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 40•10 years ago
|
||
Assignee | ||
Updated•10 years ago
|
Blocks: AsyncShutdown
Updated•6 years ago
|
Product: Firefox Health Report → Firefox Health Report Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•