Closed
Bug 1038342
Opened 10 years ago
Closed 10 years ago
Force Firefox crash if shutdown hangs
Categories
(Core :: General, defect)
Core
General
Tracking
()
RESOLVED
FIXED
mozilla34
People
(Reporter: vladan, Assigned: Yoric)
References
(Depends on 2 open bugs)
Details
(Keywords: perf)
Attachments
(1 file, 7 obsolete files)
(deleted),
patch
|
benjamin
:
superreview+
|
Details | Diff | Splinter Review |
Firefox sometimes deadlocks on shutdown, preventing a restart and causing users frustration. We should have a timer with a high threshold (99th percentile of shutdown time) to kill Firefox with exit(0) if shutdown hasn't made progress in a long time. If the main thread isn't hung, we can also try to force cleanup (e.g. saving Places) before making the exit(0) call.
Assignee | ||
Comment 1•10 years ago
|
||
Note to self: need to check how reliable our Telemetry data on shutdown is. If it is reliable, we can probably use something along the lines of "Longest successful shutdown ever" * 1.05.
Updated•10 years ago
|
OS: Windows 7 → All
Hardware: x86_64 → All
Assignee | ||
Comment 2•10 years ago
|
||
Actually, note to self: if we don't want to break AsyncShutdown, we need to wait for "Longest successful shutdown ever" + 1 minute + \epsilon.
Assignee | ||
Comment 3•10 years ago
|
||
Still brainstorming. 1. What runs the Terminator? A XPCOM/Chromium/PR thread or a process? - XPCOM thread is good for communications, but blocks shutdown, so that's probably a bad idea; - I have no experience with Chromium threads, but I seem to understand that they have their own event loop, which looks a tad overkill – no idea whether they block shutdown; - PR thread might be sufficient, if we are careful with communications; - getting a process right on all platforms sounds nightmarish. So I'm tempted to go with PRThread. 2. When do we spawn? I'd go for "quit-application". 3. When do we kill? a. Start a timer when we reach quit-application. If we haven't reached profile-before-change after N seconds, terminate Firefox. b. Restart the timer. If we haven't reached xpcom-will-shutdown after N seconds, terminate Firefox. c. Restart the timer. If we haven't reached xpcom-shutdown after N seconds, terminate Firefox. d. Restart the timer. After N seconds, terminate Firefox. Hopefully, by then, the process has been killed. 4. How many seconds? We should use AsyncShutdown-style timers to make sure that we we don't kill Firefox simply because the lid of the laptop has been closed. Unfortunately, I don't think that there is a good way to measure the duration of profile-before-change => xpcom-will-shutdown or xpcom-will-shutdown => xpcom-shutdown or xpcom-shutdown => exit. Well, there may be ways, but they strongly sound like overengineering (i.e. write durations to the Registry Base under Windows, I'm not sure where on other platforms). Also, if we make Terminator faster than AsyncShutdown, we will lose all the precious crash data. I there believe that we should make Terminator adopt a "AsyncShutdown + 5 seconds" policy. 5. Crash or exit(0)? exit(0) sounds like the right thing for release, but we may decide to go for crashing in Nightly, Aurora, early Beta, so as to get statistics.
Assignee | ||
Comment 4•10 years ago
|
||
As a side-note, for 5., we have flag EARLY_BETA_OR_EARLIER.
Comment 5•10 years ago
|
||
Should be an unjoinable PRThread, I think. I suggest that we start the timer at http://hg.mozilla.org/mozilla-central/annotate/a74600665875/toolkit/xre/nsAppRunner.cpp#l4014 right after the main event loop exits. This will cover all of profile shutdown and subsequent steps. I'm not sure what AsyncShutdown-style timers means in this context, but yes, we should try to make sure that machine sleep doesn't affect it. Whatever the actual shutdown mechanism is, it must not run c++ static destructors, so exit() is out. I think that _exit() also may be out on Windows, but I'm not sure about other platforms. I think we should start out with MOZ_CRASH and later we can decide to switch it to something more silent like TerminateProcess.
Assignee | ||
Comment 6•10 years ago
|
||
By AsyncShutdown-style timer, I mean "wait for at least one second, then once we are awake, wait again for at least one second, etc." N times.
Assignee | ||
Comment 7•10 years ago
|
||
C++ 11 offers a `quick_exit` function that does not run object destructors. It is not specified whether temporary files are removed and streams are flushed. I *think* that's ok.
Comment 8•10 years ago
|
||
Compiler support may be lacking for that, unfortunately. In particular, no version of Visual Studio (up to and including 2013) supports quick_exit [1]. [1] http://msdn.microsoft.com/en-us/library/hh567368.aspx#concurrencytable
Assignee | ||
Comment 9•10 years ago
|
||
Thanks Emanuel. I'll try and produce a first version that runs `_exit(0)` and see if we can iterate on this. MSDN wasn't clear on whether this runs static destructors, I'll experiment. Also, I have early sketches on how we could trigger Places shutdown, etc, but this looks like lots of work, plus the simplest way I have in mind goes through bug 918317. I believe that we can get a first iteration working without this. With a large enough delay, I don't see how this could create datalosses.
Assignee | ||
Comment 10•10 years ago
|
||
Attaching a first minimal prototype. If this minimal feature set is sufficient, I intend to work on attempts to rescue shutdown (i.e. force Places to be closed as cleanly as possible) as a followup bug. This implementation uses `_exit(0)` for the time being, until I have time to test its behavior under Windows and/or try and come up with something better.
Attachment #8462024 -
Flags: feedback?(vdjeric)
Attachment #8462024 -
Flags: feedback?(nfroyd)
Attachment #8462024 -
Flags: feedback?(benjamin)
Comment 11•10 years ago
|
||
Random suggestion from a someone just happy to finally see this get implemented: append a timestamp to each of the ShutdownProgress crash report annotations
Assignee | ||
Comment 12•10 years ago
|
||
Dave: what's the usecase for this timestamp?
Comment 13•10 years ago
|
||
(In reply to David Rajchenbach Teller [:Yoric] from comment #12) > Dave: what's the usecase for this timestamp? It would give a clear timeline for the shutdown hang/crash in a crash report. Without it, there's no way to tell how long of a hang the report is for. Someone could have a toolkit.asyncshutdown.crash_timeout set too low and be submitting reports for no practical reason. Timestamping each phase would also allow basic telemetry to see if some reports are slow for everything or just hang on one phase.
Assignee | ||
Comment 14•10 years ago
|
||
This sounds like a good plan. Filed as bug 1044020.
Comment 15•10 years ago
|
||
Comment on attachment 8462024 [details] [diff] [review] Implementation of nsTerminator Review of attachment 8462024 [details] [diff] [review]: ----------------------------------------------------------------- Just some small comments. I was going to grouse about yet another watchdog thread, but I guess this one is only active during shutdown, so it's not too bad. ::: toolkit/components/terminator/moz.build @@ +15,5 @@ > +EXTRA_COMPONENTS += [ > + 'terminator.manifest', > +] > + > +FINAL_LIBRARY = 'toolkitcomps' I think this is now supposed to be 'xul' everywhere or somesuch. Check with glandium. ::: toolkit/components/terminator/nsTerminator.cpp @@ +7,5 @@ > +/** > + * A watchdog designed to terminate shutdown if it lasts too long. > + * > + * This watchdog is designed as a worst-case problem container for the > + * common case in which Firefox just won't shutdown. Nit: trailing whitespace here and elsewhere. @@ +10,5 @@ > + * This watchdog is designed as a worst-case problem container for the > + * common case in which Firefox just won't shutdown. > + * > + * We spawn a thread during quit-application. If any of the shutdown > + * steps takes more than n millisecons (63000 by default), kill the Nit: milliseconds. @@ +29,5 @@ > +#include "mozilla/Preferences.h" > +#include "mozilla/Scoped.h" > +#include "mozilla/Services.h" > +#include "mozilla/ArrayUtils.h" > +#include "mozilla/DebugOnly.h" Nit: Sort these? @@ +31,5 @@ > +#include "mozilla/Services.h" > +#include "mozilla/ArrayUtils.h" > +#include "mozilla/DebugOnly.h" > + > +#include "nsTerminator.h" You are cargo-culting includes from this file into your .h file. Maybe #include nsTerminator.h first? @@ +61,5 @@ > + ScopedDeletePtr<Options> options((Options*)arg); > + int32_t crashAfterMS = options->crashAfterMS; > + options.dispose(); > + > + int timeToLive = crashAfterMS; We should probably just make this int32_t, to match crashAfterMS. @@ +105,5 @@ > + "quit-application" > +, "profile-change-teardown" > +, "profile-before-change" > +, "xpcom-will-shutdown" > +, "xpcom-shutdown" Nit: this formatting is unusual. Better might be to just include the trailing comma always: "quit-application", ... "xpcom-shutdown", which preserves the one-line-of-diff-per-addition property. @@ +111,5 @@ > + > +NS_IMPL_ISUPPORTS( > + nsTerminator > +, nsIObserver > +) Just one line for this, please. @@ +156,5 @@ > + &crashAfterMS); > + if (NS_SUCCEEDED(rv)) { > + // Add a little padding, to ensure that we do not crash before > + // AsyncShutdown. > + crashAfterMS *= 1.05; Probably should make sure this doesn't overflow. (Conversion from doubles that are too large is undefined, I believe.) @@ +159,5 @@ > + // AsyncShutdown. > + crashAfterMS *= 1.05; > + } else { > + crashAfterMS = DEFAULT_CRASH_AFTER_MS; > + } Should check that crashAfterMS is positive, too. @@ +204,5 @@ > + > +#if defined(MOZ_CRASHREPORTER) > + // In case of crash, we wish to know where in shutdown we are > + nsCOMPtr<nsICrashReporter> reporter = > + do_GetService("@mozilla.org/toolkit/crash-reporter;1"); NS_CRASHREPORTER_CONTRACTID? ::: toolkit/crashreporter/test/unit/xpcshell.ini @@ +31,5 @@ > skip-if = os=='linux' && bits==32 > > [test_crash_AsyncShutdown.js] > [test_event_files.js] > +[test_crash_terminator.js] Missing hg add?
Attachment #8462024 -
Flags: feedback?(nfroyd) → feedback+
Assignee | ||
Comment 16•10 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #15) > @@ +111,5 @@ > > + > > +NS_IMPL_ISUPPORTS( > > + nsTerminator > > +, nsIObserver > > +) > > Just one line for this, please. Side-note: apparently, this is the style, these days. > Missing hg add? Oops. Let me add this as a differential patch.
Assignee | ||
Comment 17•10 years ago
|
||
Attachment #8462784 -
Flags: feedback?(vdjeric)
Attachment #8462784 -
Flags: feedback?(nfroyd)
Attachment #8462784 -
Flags: feedback?(benjamin)
Comment 18•10 years ago
|
||
(In reply to David Rajchenbach Teller [:Yoric] (PTO + various activities, hard to reach until August 20th - please use "needinfo") from comment #16) > (In reply to Nathan Froyd (:froydnj) from comment #15) > > @@ +111,5 @@ > > > + > > > +NS_IMPL_ISUPPORTS( > > > + nsTerminator > > > +, nsIObserver > > > +) > > > > Just one line for this, please. > > Side-note: apparently, this is the style, these days. I think you have been spending too much time in places/ and storage/ code. Those are the only modules that use that style, according to grep.
Comment 19•10 years ago
|
||
Comment on attachment 8462784 [details] [diff] [review] 2. Forgotten test file Review of attachment 8462784 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/crashreporter/test/unit/test_crash_terminator.js @@ +18,5 @@ > + createInstance(Components.interfaces.nsIObserver); > + terminator.observe(null, "profile-after-change", null); > + > + // Inform the terminator that shutdown has started > + // Pick an arbitray notification Nit: 'arbitrary' @@ +19,5 @@ > + terminator.observe(null, "profile-after-change", null); > + > + // Inform the terminator that shutdown has started > + // Pick an arbitray notification > + terminator.observe(null, "xpcom-will-shutdown", null); Maybe pull this notification name out into: const NOTIFICATION = ...; or something and use it below, too? @@ +22,5 @@ > + // Pick an arbitray notification > + terminator.observe(null, "xpcom-will-shutdown", null); > + > + while(true) { > + dump("Waiting until crash\n"); Maybe pull this out of the loop, so you're not spamming the log if we happen to process many events?
Attachment #8462784 -
Flags: feedback?(nfroyd) → feedback+
Reporter | ||
Comment 20•10 years ago
|
||
Comment on attachment 8462024 [details] [diff] [review] Implementation of nsTerminator Review of attachment 8462024 [details] [diff] [review]: ----------------------------------------------------------------- - Will we be able to measure how often (% of shutdowns) this Terminator triggers using Breakpad? I suspect not.. so can you file another bug to Telemetry reporting similar to failedProfileLockCount? ::: toolkit/components/terminator/nsTerminator.cpp @@ +53,5 @@ > +}; > + > +void run(void* arg) > +{ > + PR_SetCurrentThreadName("Terminator"); Let's give it a more descriptive title.. like "Shutdown hang terminator" @@ +81,5 @@ > + timeToLive = crashAfterMS; > + continue; > + } > + timeToLive -= TICK_DURATION; > + if (timeToLive > 0) { Nit: >= @@ +89,5 @@ > + // Shutdown is apparently dead. Kill the process. > + MOZ_CRASH("Shutdown too long, probably frozen, causing a crash."); > + > + // Oh, we are in release mode. Well, let's not crash then. > + NS_WARNING("Shutdown too long, probably frozen, terminating the process."); MOZ_CRASH doesn't cause a crash on Release? @@ +185,5 @@ > + > +NS_IMETHODIMP > +nsTerminator::Observe(nsISupports *, const char *aTopic, const char16_t *) > +{ > + if (strcmp(aTopic, "profile-after-change") == 0) { What do you think about spawning the terminator thread on first sign of shutdown? ::: toolkit/components/terminator/terminator.manifest @@ +1,1 @@ > +category profile-after-change nsTerminator @mozilla.org/toolkit/shutdown-terminator;1 How much startup overhead does this create?
Attachment #8462024 -
Flags: feedback?(vdjeric) → feedback+
Reporter | ||
Comment 21•10 years ago
|
||
Comment on attachment 8462784 [details] [diff] [review] 2. Forgotten test file Review of attachment 8462784 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/crashreporter/test/unit/test_crash_terminator.js @@ +8,5 @@ > +function setup_crash() { > + Components.utils.import("resource://gre/modules/Services.jsm"); > + > + Services.prefs.setBoolPref("toolkit.terminator.testing", true); > + Services.prefs.setIntPref("toolkit.asyncshutdown.crash_timeout", 10); Why not have a shorter timeout so the test runs quicker?
Attachment #8462784 -
Flags: feedback?(vdjeric) → feedback+
Assignee | ||
Comment 22•10 years ago
|
||
(In reply to Vladan Djeric (:vladan) from comment #21) > Why not have a shorter timeout so the test runs quicker? Shorter than 10 milliseconds? Sure, but that's not going to change much. (In reply to Vladan Djeric (:vladan) from comment #20) > - Will we be able to measure how often (% of shutdowns) this Terminator > triggers using Breakpad? I suspect not.. so can you file another bug to > Telemetry reporting similar to failedProfileLockCount? For each version, we should be able to compute this as # of Terminator Crashes with Telemetry enabled / # of runs with Telemetry enabled I'm not sure whether the CrashReporter knows whether Telemetry is enabled, but that's pretty easy to add. > @@ +89,5 @@ > > + // Shutdown is apparently dead. Kill the process. > > + MOZ_CRASH("Shutdown too long, probably frozen, causing a crash."); > > + > > + // Oh, we are in release mode. Well, let's not crash then. > > + NS_WARNING("Shutdown too long, probably frozen, terminating the process."); > > MOZ_CRASH doesn't cause a crash on Release? My bad, I was confusing the behavior of MOZ_CRASH and that of MOZ_ASSERT. > > @@ +185,5 @@ > > + > > +NS_IMETHODIMP > > +nsTerminator::Observe(nsISupports *, const char *aTopic, const char16_t *) > > +{ > > + if (strcmp(aTopic, "profile-after-change") == 0) { > > What do you think about spawning the terminator thread on first sign of > shutdown? Isn't that what I'm doing? > > ::: toolkit/components/terminator/terminator.manifest > @@ +1,1 @@ > > +category profile-after-change nsTerminator @mozilla.org/toolkit/shutdown-terminator;1 > > How much startup overhead does this create? Good question. I don't think we have any measurement on that topic.
Reporter | ||
Comment 23•10 years ago
|
||
> Shorter than 10 milliseconds? Sure, but that's not going to change much. My mistake, I thought the toolkit.asyncshutdown.crash_timeout pref was in units of seconds. > For each version, we should be able to compute this as > # of Terminator Crashes with Telemetry enabled / # of runs with Telemetry > enabled > > I'm not sure whether the CrashReporter knows whether Telemetry is enabled, > but that's pretty easy to add. That works but then we can't do Telemetry Alerting for it. How hard is it to get the # of Terminator Crashes with Telemetry enabled? > Isn't that what I'm doing? My mistake, I thought SelfInit creates the thread. > Good question. I don't think we have any measurement on that topic. ts_paint can help answer that question (at least partially). Roberto also has the ts_cold_paint test which apparently isn't noisy on SSD systems
Assignee | ||
Comment 24•10 years ago
|
||
(In reply to Vladan Djeric (:vladan) from comment #23) > > For each version, we should be able to compute this as > > # of Terminator Crashes with Telemetry enabled / # of runs with Telemetry > > enabled > > > > I'm not sure whether the CrashReporter knows whether Telemetry is enabled, > > but that's pretty easy to add. > > That works but then we can't do Telemetry Alerting for it. How hard is it to > get the # of Terminator Crashes with Telemetry enabled? Since the scope of the Terminator covers cases that occur after Telemetry (and Profile) shutdown, this won't work without an awful hack, such as writing the data in the registry base, which is a bit annoying. > > Good question. I don't think we have any measurement on that topic. > > ts_paint can help answer that question (at least partially). Roberto also > has the ts_cold_paint test which apparently isn't noisy on SSD systems Note that we are using this mechanism in a few places over the tree, plus most add-ons: http://dxr.mozilla.org/mozilla-central/search?q=profile-after-change+ext%3Amanifest&case=true&redirect=true
Reporter | ||
Comment 25•10 years ago
|
||
> Since the scope of the Terminator covers cases that occur after Telemetry > (and Profile) shutdown, this won't work without an awful hack, such as > writing the data in the registry base, which is a bit annoying. I was suggesting doing something similar to what failedProfileLockCount does: the Terminator could write a file in the profile directory during abnormal exit, and then Telemetry could pick it up during the following session. > Note that we are using this mechanism in a few places over the tree, plus > most add-ons: > http://dxr.mozilla.org/mozilla-central/search?q=profile-after- > change+ext%3Amanifest&case=true&redirect=true Yeah, I saw it was very common, that's why I was curious what the cost might be. I doubt the startup cost consideration would block this feature.
Assignee | ||
Comment 26•10 years ago
|
||
(In reply to Vladan Djeric (:vladan) from comment #25) > I was suggesting doing something similar to what failedProfileLockCount > does: the Terminator could write a file in the profile directory during > abnormal exit, and then Telemetry could pick it up during the following > session. This is possible, but it would violate our invariant that we do not write to the profile directory after profile-before-change2. Before proceeding in this direction, I would like bsmedberg's opinion.
Flags: needinfo?(benjamin)
Assignee | ||
Comment 27•10 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #19) > @@ +19,5 @@ > > + terminator.observe(null, "profile-after-change", null); > > + > > + // Inform the terminator that shutdown has started > > + // Pick an arbitray notification > > + terminator.observe(null, "xpcom-will-shutdown", null); > > Maybe pull this notification name out into: > > const NOTIFICATION = ...; > > or something and use it below, too? This would be a bit tricky as `do_crash` passes `setup_crash` by source. To do this, I would need to handle serialization, which sounds a bit overkill.
Assignee | ||
Comment 28•10 years ago
|
||
Applied feedback.
Attachment #8462024 -
Attachment is obsolete: true
Attachment #8462784 -
Attachment is obsolete: true
Attachment #8462024 -
Flags: feedback?(benjamin)
Attachment #8462784 -
Flags: feedback?(benjamin)
Attachment #8467816 -
Flags: feedback?(benjamin)
Comment 29•10 years ago
|
||
Comment on attachment 8467816 [details] [diff] [review] Implementation of nsTerminator, v2 >diff --git a/toolkit/components/terminator/nsTerminator.cpp b/toolkit/components/terminator/nsTerminator.cpp >new file mode 100644 >--- /dev/null >+++ b/toolkit/components/terminator/nsTerminator.cpp >@@ -0,0 +1,228 @@ >+/* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- should be tab-width: 8 >+ * the Terminator thread, once it has acknolwedged the progress. nit spelling >+void run(void* arg) style nit, void on its own line. >+#if defined(EARLY_BETA_OR_EARLIER) >+ >+ // Shutdown is apparently dead. Crash the process. >+ MOZ_CRASH("Shutdown too long, probably frozen, causing a crash."); >+ >+#else >+ >+ // Shutdown is apparently dead. Kill the process. >+ NS_WARNING("Shutdown too long, probably frozen, terminating the process."); >+ >+ // Exit, running as little cleanup as possible. >+ // Note that this will leave the profile locked. >+ _exit(0); Are the experiments from comment 9 done? I think that we should start by landing this only with MOZ_CRASH. >+static const char* sObserverTopics[] = { should be static char const *const sObserverTopics[]. >+// Actually launch the thread. This takes place at the first sign of shutdown. >+nsresult >+nsTerminator::Start() { nit, brace in column 0. Why does this return an nsresult when the only caller doesn't error-check? Prefer void. >+ // Determine how long we need to wait >+ >+ nsresult rv; >+ nsCOMPtr<nsIPrefService> prefs = >+ do_GetService(NS_PREFSERVICE_CONTRACTID, &rv); >+ if (!NS_SUCCEEDED(rv)) { >+ return rv; >+ } >+ >+ nsCOMPtr<nsIPrefBranch> branch; >+ rv = prefs->GetBranch("", getter_AddRefs(branch)); >+ if (!NS_SUCCEEDED(rv)) { >+ return rv; >+ } >+ >+ int32_t crashAfterMS; >+ rv = branch->GetIntPref("toolkit.asyncshutdown.crash_timeout", >+ &crashAfterMS); Replace all this with Preferences::GetInt >+ if (NS_SUCCEEDED(rv)) { >+ // Add a little padding, to ensure that we do not crash before >+ // AsyncShutdown. >+ crashAfterMS += 3000; >+ MOZ_ASSERT(crashAfterMS > 0); >+ } else { >+ crashAfterMS = DEFAULT_CRASH_AFTER_MS; >+ } >+ >+ ScopedDeletePtr<Options> options(new Options()); >+ options->crashAfterMS = crashAfterMS; >+ >+ // Allocate and start the thread. >+ // By design, it will never finish, nor be deallocated. >+ PRThread* thread = PR_CreateThread( >+ PR_SYSTEM_THREAD, /* This thread will not prevent the process from terminating */ >+ run, >+ options.forget(), >+ PR_PRIORITY_LOW, >+ PR_LOCAL_THREAD /* FIXME: ?*/, This should be PR_GLOBAL_THREAD. Doesn't really matter on any of our configs, but local threads can be cooperatively scheduled by NSPR instead of scheduled by the OS, which is definitely not what we want here. >+ if (reporter) { >+ (void)reporter->AnnotateCrashReport(NS_LITERAL_CSTRING("ShutdownProgress"), >+ nsAutoCString(aTopic)); Prefer unused << foo(); over (void)foo here and below.
Attachment #8467816 -
Flags: feedback?(benjamin) → feedback+
Updated•10 years ago
|
Flags: needinfo?(benjamin)
Assignee | ||
Comment 30•10 years ago
|
||
I finally found the opportunity to run the experiments. Apparently, _exit(0) works as we hoped with VC++. Benjamin, Vladan, can you confirm that this experiment covers all the use cases we have?
Attachment #8475046 -
Flags: feedback?(vdjeric)
Attachment #8475046 -
Flags: feedback?(benjamin)
Comment 31•10 years ago
|
||
The testcase also needs to cover loading a DLL and seeing whether DllMain/static destructors from that external DLL are called with _exit.
Updated•10 years ago
|
Attachment #8475046 -
Flags: feedback?(benjamin)
Assignee | ||
Comment 32•10 years ago
|
||
I think that we can land a version based on MOZ_CRASH. Once we have this, we'll be able to get some data, plus progress with bug 1044020. In parallel, we can work on a binary test that somehow checks that _exit(0) works nicely on all tier 1 desktop platforms (not sure how we can load a DLL in a binary test, though), and move to _exit(0) once we are satisfied with this. Does this plan sound good to everyone?
Flags: needinfo?(vdjeric)
Flags: needinfo?(benjamin)
Comment 33•10 years ago
|
||
Yes, use MOZ_CRASH for now and let's do a followup about a safe termination system for this and for bug 662444.
Flags: needinfo?(benjamin)
Assignee | ||
Updated•10 years ago
|
Summary: Force Firefox exit(0) if shutdown hangs → Force Firefox crash if shutdown hangs
Reporter | ||
Comment 34•10 years ago
|
||
Sounds good to me. Also please talk to aklotz about any possible interactions with his profile unlocker bug 286355
Flags: needinfo?(vdjeric)
Reporter | ||
Comment 35•10 years ago
|
||
Comment on attachment 8475046 [details]
Experiment
Experiment looks complete aside from Benjamin's point about DLLs.
I did some googling and found that "whether C streams are closed and/or flushed, and files open with tmpfile are removed [during _exit(0)] depends on the particular system or library implementation." but I don't think we need to worry about this
Attachment #8475046 -
Flags: feedback?(vdjeric) → feedback+
Assignee | ||
Comment 36•10 years ago
|
||
Here we go.
Attachment #8467816 -
Attachment is obsolete: true
Attachment #8476545 -
Flags: review?(nfroyd)
Comment 37•10 years ago
|
||
Comment on attachment 8476545 [details] [diff] [review] Implementation of nsTerminator, v3 Review of attachment 8476545 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/terminator/nsTerminator.cpp @@ +63,5 @@ > + int32_t crashAfterMS; > +}; > + > +void > +run(void* arg) Nit: "Run"? @@ +69,5 @@ > + PR_SetCurrentThreadName("Shutdown Hang Terminator"); > + > + // Let's copy and deallocate options, that's one less leak to worry > + // about. > + ScopedDeletePtr<Options> options((Options*)arg); Please use UniquePtr.
Attachment #8476545 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 38•10 years ago
|
||
Comment on attachment 8476545 [details] [diff] [review] Implementation of nsTerminator, v3 Ted, can you review the crash test and the build parts?
Attachment #8476545 -
Flags: review?(ted)
Assignee | ||
Comment 39•10 years ago
|
||
Sorry, forgot to upload the updated version. Ted, can you look at build + crash test? Try: https://tbpl.mozilla.org/?tree=Try&rev=5ac2051439b1
Attachment #8476545 -
Attachment is obsolete: true
Attachment #8476545 -
Flags: review?(ted)
Attachment #8477345 -
Flags: review?(ted)
Assignee | ||
Comment 40•10 years ago
|
||
(sorry, that was an empty commit)
Attachment #8477345 -
Attachment is obsolete: true
Attachment #8477345 -
Flags: review?(ted)
Attachment #8477346 -
Flags: review?(ted)
Assignee | ||
Comment 41•10 years ago
|
||
Try: https://tbpl.mozilla.org/?tree=Try&rev=efec017f65dc
Assignee | ||
Updated•10 years ago
|
Attachment #8477346 -
Flags: superreview?(benjamin)
Comment 42•10 years ago
|
||
Comment on attachment 8477346 [details] [diff] [review] Implementation of nsTerminator, v5 Review of attachment 8477346 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/terminator/nsTerminator.cpp @@ +194,5 @@ > + nsCOMPtr<nsICrashReporter> reporter = > + do_GetService(NS_CRASHREPORTER_CONTRACTID); > + if (reporter) { > + unused << reporter->AnnotateCrashReport(NS_LITERAL_CSTRING("ShutdownProgress"), > + nsAutoCString(aTopic)); You can just #include "nsExceptionHandler.h" and call CrashReporter::AnnotateCrashReport without any xpcom goop. ::: toolkit/crashreporter/test/unit/test_crash_terminator.js @@ +8,5 @@ > +function setup_crash() { > + Components.utils.import("resource://gre/modules/Services.jsm"); > + > + Services.prefs.setBoolPref("toolkit.terminator.testing", true); > + Services.prefs.setIntPref("toolkit.asyncshutdown.crash_timeout", 10); Is this 10 seconds? That feels like a long time to make a test wait. Can this be shorter? @@ +12,5 @@ > + Services.prefs.setIntPref("toolkit.asyncshutdown.crash_timeout", 10); > + > + // Initialize the terminator > + // (normally, this is done through the manifest file, but xpcshell > + // doesn't take them into account). This seems odd. Is this just a quirk of xpcshell startup?
Attachment #8477346 -
Flags: review?(ted) → review+
Assignee | ||
Comment 43•10 years ago
|
||
Applied Ted's feedback about the crash manager. Benjamin, do you wish to sr this?
Attachment #8475046 -
Attachment is obsolete: true
Attachment #8477346 -
Attachment is obsolete: true
Attachment #8477346 -
Flags: superreview?(benjamin)
Attachment #8479000 -
Flags: superreview?(benjamin)
Comment 44•10 years ago
|
||
Comment on attachment 8479000 [details] [diff] [review] Implementation of nsTerminator, v6 Only two comments: Is the asyncshutdown timeout small enough that it will always run first before this? Wouldn't it make sense to have a final crash reporter annotation to mark the fact that we are intentionally crashing, rather than having to infer it from the stack/signature? The ShutdownProgress annotation isn't sufficient for that since other crashes which happen during shutdown will have that annotation.
Attachment #8479000 -
Flags: superreview?(benjamin) → superreview+
Assignee | ||
Comment 45•10 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #44) > Comment on attachment 8479000 [details] [diff] [review] > Implementation of nsTerminator, v6 > > Only two comments: > > Is the asyncshutdown timeout small enough that it will always run first > before this? Normally, yes. > Wouldn't it make sense to have a final crash reporter annotation to mark the > fact that we are intentionally crashing, rather than having to infer it from > the stack/signature? The ShutdownProgress annotation isn't sufficient for > that since other crashes which happen during shutdown will have that > annotation. I was counting on the MOZ_CRASH message for this. Otherwise, since `AnnotateCrashReport` is thread-safe only in the chrome process, we would need to complicate the code to handle semi-satisfactorily the content process.
Comment 46•10 years ago
|
||
You can feel free to add a new API that just sets a boolean or something, we have some of those already.
Assignee | ||
Comment 47•10 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #42) > Is this 10 seconds? That feels like a long time to make a test wait. Can > this be shorter? That's 10 milliseconds, actually. Anything > 0 would do. > @@ +12,5 @@ > This seems odd. Is this just a quirk of xpcshell startup? That's my understanding. (In reply to Ted Mielczarek [:ted.mielczarek] from comment #46) > You can feel free to add a new API that just sets a boolean or something, we > have some of those already. That sounds like a followup bug. For the moment, I really have no idea how I can make a thread-safe call to the crash reporter on a content process without assuming that the main thread hasn't frozen. Try: https://tbpl.mozilla.org/?tree=Try&rev=ef25c3727616
Keywords: checkin-needed
Comment 48•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/b1223e628e8f
Flags: in-testsuite+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/b1223e628e8f
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Updated•9 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•