Closed Bug 1288321 Opened 8 years ago Closed 8 years ago

Crash in RunWatchdog in the restart process of upgrading nightly with nsUpdateDriver.cpp in the stack

Categories

(Toolkit :: Application Update, defect)

50 Branch
Unspecified
Linux
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox50 --- affected
firefox51 --- affected

People

(Reporter: Usul, Unassigned)

References

Details

(Keywords: crash, nightly-community, regression, Whiteboard: see comment 8 [fce-active-legacy])

Crash Data

Attachments

(1 obsolete file)

This bug was filed from the Socorro interface and is report bp-2bec1d5c-43ff-428d-b4aa-007ae2160721.
=============================================================
Frame   Module               Signature                     Source
0       libxul.so            RunWatchdog                   toolkit/components/terminator/nsTerminator.cpp:158
1       libnspr4.so          _pt_root                      nsprpub/pr/src/pthreads/ptthread.c:216
Ø 2     libpthread-2.23.so   libpthread-2.23.so@0x75c9
Ø 3     libc-2.23.so         libc-2.23.so@0x102eac
https://crash-stats.mozilla.com/signature/?product=Firefox&_sort=-date&signature=RunWatchdog shows 23 crashes, all on Linux. I think this might be the Linux version of Bug 1272614 since RunWatchdog shows up there as well.
Component: General → Application Update
(In reply to Marcia Knous [:marcia - use ni] from comment #1)
> https://crash-stats.mozilla.com/signature/?product=Firefox&_sort=-date&signature=RunWatchdog
> shows 23 crashes, all on Linux. I think this might be the Linux version of
> Bug 1272614 since RunWatchdog shows up there as well.

I could only find two reports in this list that seemed to be related to updates. This doesn't mean that the others aren't, but I could not find any indication that they were. In [1], the user comment says that the hang occurred while applying an update. In [2], we seem to be clearly stuck in nsUpdateProcessor::WaitForProcess[3]. Matt, this seems to be in line with your theory in bug 1272614 comment 16.

[1] https://crash-stats.mozilla.com/report/index/e6115d49-1b50-41d3-bb9e-c8f612160718#allthreads
[2] https://crash-stats.mozilla.com/report/index/b93f5262-307e-4357-b1db-cab752160716#allthreads
[3] https://hg.mozilla.org/projects/ash/annotate/63cc31d6cc1c/toolkit/xre/nsUpdateDriver.cpp#l1000
Flags: needinfo?(mhowell)
Yeah, I don't know how to tell for sure from that crash dump that WaitForProcess is actually responsible for the timeout, but I think that's enough to say it's causing trouble for at least somebody. I'll start working on a patch to make that function nonblocking. Not sure how I'm going to do that, but hey, how hard can it be, right? Right?
Assignee: nobody → mhowell
Flags: needinfo?(mhowell)
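As an illustration of the "nonblocking" idea discussed above, here is a minimal Python sketch of polling a child process instead of blocking on it. It is not the Firefox patch and does not reflect how nsUpdateProcessor::WaitForProcess is written; the function name `wait_nonblocking`, the `on_idle` callback, and the default timeout and poll interval are all hypothetical choices made for the example.

```python
import subprocess
import sys
import time


def wait_nonblocking(proc, on_idle, poll_interval=0.05, timeout=120.0):
    """Poll a child process instead of blocking in proc.wait().

    on_idle is called between polls so the caller can keep doing other
    work (a stand-in for servicing an event loop). Returns the child's
    exit code, or None if the timeout elapsed first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        code = proc.poll()  # None while the child is still running
        if code is not None:
            return code
        on_idle()
        time.sleep(poll_interval)
    return None
```

A caller could start a stand-in "updater" with `subprocess.Popen([sys.executable, "-c", "pass"])` and pass it to `wait_nonblocking`. The point of the design is only that the waiting thread regains control between polls, so a slow child does not leave it stuck in a single blocking call.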
Comment on attachment 8774911 [details] Bug 1288321 - Avoid blocking where possible while waiting for the updater to stage I am not the most confident about this patch that I have ever been; triggering a try build before requesting review.
One of the things that may be causing this is the thread taking more than 2 minutes, which I believe is what it is limited to in this instance. One thing that might alleviate this is to stage updates without the support for recovering the staged directory (e.g. making backups of existing files, etc.).
Attachment #8774911 - Flags: review?(spohl.mozilla.bugs)
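The timeout behaviour described above can be pictured with a small watchdog sketch. This is a generic Python illustration under stated assumptions, not the code in nsTerminator.cpp: the class name `ShutdownWatchdog` and its methods are hypothetical, and the callback stands in for whatever the real watchdog does when shutdown takes too long.

```python
import threading


class ShutdownWatchdog:
    """Call on_timeout if disarm() is not called within timeout_s seconds."""

    def __init__(self, timeout_s, on_timeout):
        self._timeout_s = timeout_s
        self._on_timeout = on_timeout
        self._done = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def arm(self):
        # Start the watchdog thread; the clock starts running now.
        self._thread.start()

    def disarm(self):
        # Signal that shutdown finished in time and wait for the thread to exit.
        self._done.set()
        self._thread.join()

    def _run(self):
        # Event.wait returns False when the timeout elapses without set().
        if not self._done.wait(self._timeout_s):
            self._on_timeout()
```

With a limit of roughly 2 minutes, as suggested in the comment above, any work on the shutdown path that runs longer than that (for example a blocking wait on the updater) would trip the callback. That would be consistent with RunWatchdog being the top frame of the report.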
Crash volume for signature 'RunWatchdog':
 - nightly (version 50): 54 crashes from 2016-06-06.
 - aurora  (version 49): 0 crash from 2016-06-07.
 - beta    (version 48): 0 crash from 2016-06-06.
 - release (version 47): 0 crash from 2016-05-31.
 - esr     (version 45): 0 crash from 2016-04-07.

Crash volume on the last weeks:
             Week N-1   Week N-2   Week N-3   Week N-4   Week N-5   Week N-6   Week N-7
 - nightly         24         11         11          0          0          0          0
 - aurora           0          0          0          0          0          0          0
 - beta             0          0          0          0          0          0          0
 - release          0          0          0          0          0          0          0
 - esr              0          0          0          0          0          0          0

Affected platform: Linux
Comment on attachment 8774911 [details]
Bug 1288321 - Avoid blocking where possible while waiting for the updater to stage

Moving this patch to bug 1272614. This bug only has one report that obviously relates to app update, whereas bug 1272614 is specifically about app update, so the patch is more relevant there (it addresses all platforms).

After some e-mail discussion between me and spohl, we think the signature this bug is about only exists because of some strangeness with Socorro; I filed bug 1290967 about that. There isn't really anything we can do on this bug specifically until we have some result there.
Attachment #8774911 - Attachment is obsolete: true
Attachment #8774911 - Flags: review?(spohl.mozilla.bugs)
Assignee: mhowell → nobody
Attachment #8774911 - Attachment is obsolete: false
Attachment #8774911 - Attachment is obsolete: true
I just updated Firefox 47.0.1 to Firefox 48 by checking for updates via the about dialog. After the download completed, I clicked the button to restart. After about a minute, the main window disappeared but the about dialog and a private window remained visible. It finally crashed: https://crash-stats.mozilla.com/report/index/5342a435-bc81-4675-a5f3-8793a2160808#allthreads Socorro seems to remove URLs, but here's the link to the pastebin with the sampled process in the hung state (not sure if that adds much to what's available via Socorro directly): http://pastebin.com/VQ9DZ0wL
Crash volume for signature 'RunWatchdog':
 - nightly (version 51): 48 crashes from 2016-08-01.
 - aurora  (version 50): 28 crashes from 2016-08-01.
 - beta    (version 49): 0 crashes from 2016-08-02.
 - release (version 48): 0 crashes from 2016-07-25.
 - esr     (version 45): 0 crashes from 2016-05-02.

Crash volume on the last weeks (Week N is from 08-22 to 08-28):
             W. N-1   W. N-2   W. N-3
 - nightly        9       12       12
 - aurora         4       11        4
 - beta           0        0        0
 - release        0        0        0
 - esr            0        0        0

Affected platform: Linux

Crash rank on the last 7 days:
   Browser Content Plugin
 - nightly #41
 - aurora #87
 - beta
 - release
 - esr
Keywords: regression
Version: unspecified → 50 Branch
Depends on: 1290967
Whiteboard: see comment 8
Whiteboard: see comment 8 → see comment 8 [fce-active]
Matt, it looks like there are still quite a few reports coming in with builds well after your patch in bug 1272614 landed. Do you think this would still be happening in nsUpdateProcessor after your patch landed? Weird there was one crash report where nsUpdateDriver.cpp code was in the stack. :(
Flags: needinfo?(mhowell)
The signature this bug is tracking is just the general shutdownhang watchdog timeout case, so it includes everything that causes shutdown hangs. Socorro isn't recognizing these reports as such; normally it would rewrite the signature to start with "shutdownhang", but that isn't happening with these. I filed bug 1290967 when I realized that was happening, and it was just marked fixed this morning, so I would expect this signature to start disappearing very soon.

Is there a recent report that includes nsUpdateDriver? I don't know a good way to search a bunch of these things for contents of call stacks.
Flags: needinfo?(mhowell) → needinfo?(robert.strong.bugs)
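The signature rewriting mentioned above can be sketched roughly as follows. This is only an illustration of the idea of prefixing watchdog crashes with "shutdownhang"; it is not Socorro's actual rule set. The function name, the " | " separator, the choice to take the remaining frames from the main thread, and the `max_frames` cutoff are assumptions made for the example.

```python
WATCHDOG_FRAME = "RunWatchdog"  # top frame seen in the reports on this bug


def shutdownhang_signature(crashing_frames, main_thread_frames, max_frames=3):
    """Build a signature string from lists of frame names.

    If the crashing thread's top frame is the shutdown watchdog, the
    signature is prefixed with "shutdownhang" and built from the main
    thread's frames (assumed here to be where the hang itself shows up).
    Otherwise the crashing thread's own frames are used unchanged.
    """
    if crashing_frames and crashing_frames[0] == WATCHDOG_FRAME:
        return " | ".join(["shutdownhang"] + main_thread_frames[:max_frames])
    return " | ".join(crashing_frames[:max_frames])
```

Under these assumptions, a report whose crashing thread starts with RunWatchdog would no longer be grouped under the bare "RunWatchdog" signature, which matches the expectation above that the signature should start disappearing once the rewriting works.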
I no longer see RunWatchdog now that bug 1290967 and bug 1272614 were fixed, so resolving wfm. Crashes in app update should include nsUpdateDriver.cpp, and after going through many crash reports I have only found 1 on Linux out of 44 reports with the shutdownhang signature in the last week on 51.0a1, 50.a2, and 49b, and 1 on Mac. I'll file bugs for those after I go through more crash reports.

https://crash-stats.mozilla.com/report/index/e6b3a1e9-53e8-4109-bbaa-1bd2c2160908#allthreads
https://crash-stats.mozilla.com/report/index/6005970a-c1f1-4b5d-83d8-4e5f02160907#allthreads
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(robert.strong.bugs)
Resolution: --- → WORKSFORME
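The manual triage described above (checking whether nsUpdateDriver.cpp appears anywhere in a report's stacks) could be automated along these lines once reports are available locally. This sketch does not call any Socorro API; the report layout (a dict with an "id", a list of "threads", each with "frames" carrying a "source" string) is a hypothetical shape chosen for the example.

```python
def mentions_source(report, needle="nsUpdateDriver.cpp"):
    """Return True if any frame on any thread has needle in its source path."""
    return any(
        needle in (frame.get("source") or "")
        for thread in report.get("threads", [])
        for frame in thread.get("frames", [])
    )


def filter_reports(reports, needle="nsUpdateDriver.cpp"):
    """Return the ids of the reports whose stacks mention needle."""
    return [r["id"] for r in reports if mentions_source(r, needle)]
```

Searching every thread rather than only the crashing one matters here, because in a watchdog crash the crashing thread is the watchdog itself and the update code would show up on a different thread.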
Also filed bug 1301572 which should at the very least fix the Mac crash.
Summary: Crash in RunWatchdog in the restart process of upgrading nightly → Crash in RunWatchdog in the restart process of upgrading nightly with nsUpdateDriver.cpp in the stack
Whiteboard: see comment 8 [fce-active] → see comment 8 [fce-active-legacy]