Background task updater leaves profiles or parts of profiles on disk in TmpD (%APPDATA%/Temp)
Categories
(Toolkit :: Application Update, defect, P1)
Tracking
()
People
(Reporter: Gijs, Assigned: nrishel)
References
Details
(Whiteboard: [fidedi-ope])
Attachments
(7 files, 2 obsolete files)
(deleted),
image/png
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details |
My temp dir appears to have 18 directories of the form MozillaBackgroundTask-{hash}-backgroundupdate[-N]
, where N ranges from 1-17.
Some are empty, some have a bunch of subfolders that have more subfolders but no files, some have something like the attachment, some just have 2 NSS/password files (key4+cert9).
All of these are from the last 9-ish days. Confusingly, most of the timestamps on the files seem not to be close to times that I would have shut off the machine, so I'm not sure why they wouldn't have been cleaned up successfully.
We should make sure these files get tidied up.
Comment 1•3 years ago
|
||
I'll add another data point here. I currently have 146 of these directories. It's true that I have an unusual number of installations, and I do a fair amount of background update testing though. If we narrow it down to directories with the hash of my primary installation though, there are still 42 directories. Those 42 directories are using 1MB total disk space. The timestamps on those directories range from 2021-04-26 to 2021-07-27. I do see a fair number of timestamps that are very close together, which might be from times when I was trying to reproduce bugs on Nightly.
I just tried running the task manually on my primary installation (while Firefox was still running) and did not see a directory get left behind. I then tried running the task while Firefox was not running and with an update pending and a directory got left behind with contents similar to the screenshot in the description. I then ran it again after the update had been installed, and a directory was left behind with only the key4 and cert9 files.
Comment 2•3 years ago
|
||
We have another report of this in the wild: https://twitter.com/ericlaw/status/1431326875687792645. NI to me to see what a periodic cleanup would look like.
Comment 3•3 years ago
|
||
I'm calling this an S4. It can potentially consume some disk space, so we should really fix it. But it shouldn't have much impact on users.
Updated•3 years ago
|
Updated•3 years ago
|
Reporter | ||
Comment 6•3 years ago
|
||
(In reply to Alice0775 White from comment #5)
set regressor bug # to tracking
Is that bug really the regressor? This bug got filed 3 months ago, before bug 1734262 was either filed or fixed... perhaps things got worse or changed somehow?
Comment 7•3 years ago
|
||
(In reply to :Gijs (he/him) from comment #6)
(In reply to Alice0775 White from comment #5)
set regressor bug # to tracking
Is that bug really the regressor? This bug got filed 3 months ago, before bug 1734262 was either filed or fixed... perhaps things got worse or changed somehow?
It's not. This ticket is exacerbated by using the background task apparatus more aggressively, but it predates converting pingsender.
Comment 8•3 years ago
|
||
(In reply to :Gijs (he/him) from comment #6)
(In reply to Alice0775 White from comment #5)
set regressor bug # to tracking
Is that bug really the regressor?
Yes, It actually started after bug 1734262 landed.
I checked autoland Build and I got the regression window in Bug 1736910 comment #1.
This bug got filed 3 months ago, before bug 1734262 was either filed or fixed... perhaps things got worse or changed somehow?
So, This bug seems to be different from bug 1736910
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 9•3 years ago
|
||
Consolidate method implementations into BackgroundTasks.cpp, remove unused headers and replaced with forward declarations where possible, removed redundant mozilla::
namespacing for individual variables.
Assignee | ||
Comment 10•3 years ago
|
||
Depends on D139328
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Updated•3 years ago
|
Comment 11•3 years ago
|
||
Comment on attachment 9264899 [details]
Bug 1722777 - Pre: Cleanup BackgroundTask. r=nalexander
Revision D139328 was moved to bug 1756675. Setting attachment 9264899 [details] to obsolete.
Comment 12•3 years ago
|
||
Comment on attachment 9264900 [details]
Bug 1722777 - Pre: Add logging for BackgroundTask temporary profile setup and teardown. r=nalexander
Revision D139329 was moved to bug 1756675. Setting attachment 9264900 [details] to obsolete.
Comment 13•3 years ago
|
||
I won't file a sub-ticket for this just yet, but we probably want to disable fast shutdown mode entirely in background tasks. We can use toolkit.shutdown.fastShutdownStage=0
in backgroundtasks.js to do so. Curiously, it's not enough for me when I hack it into a PGO build, and I don't yet understand why not.
Comment 14•3 years ago
|
||
I got a chance to dig into this. :nrishel reported that this reproed much more consistently with PGO builds, which was a clue. Focusing on the case where the deletion fails due to cert9.db
and key4.db
, this is what is happening. Due to challenges refcounting NSS objects, NSS is shutdown as part of XPCOM shutdown. In the "fast shutdown" mode, that shutdown simply doesn't occur. That means the SQLite DBs aren't closed, and (waves hands) that somehow leaves them blocking the temporary profile directory deletion. Hence, this bug. (Although, not the only cause of this bug.)
Fast shutdown mode is only enabled when !DEBUG
and !MOZ_PROFILE_GENERATE
, explaining why I don't see this locally. If we were to test against a task that makes network requests in test_backgroundtask_deletes_profile.js, I think we'd witness this problem with opt
builds.)
Here's what I propose for this ticket. First, we need to fill the hole we've dug:
- We need to raise the high-water mark for the number of temporary profile directories immediately.
- We need an approach to clean up existing directories incrementally.
- If we can use a Windows-only feature like
MoveFileEx
to make sure these get cleaned up at shutdown time, great. Perhaps let's only do that if we fail to delete the directory after N attempts (say 2 attempts).
Concurrently, we need to stop digging:
- Let's get a test establishing a background tasks profile contents highwater mark: Bug 1679443. This is tricky because what the test task does actually matters in this case; profile contents are created on-demand. But we can at least perform a network request to see the NSS files, say.
- Let's disable fast shutdown in background tasks mode, so that we are more likely to have an orderly shutdown. It's not great that shutdowns could take longer and/or hang, but I think it's a risk worth taking: we don't have a human around to witness and correct problems. (This doesn't have a ticket yet but should get one.)
- Let's turn of these NSS databases entirely: Bug 1757252. We can easily keep this fixed with the tests added by Bug 1679443.
- Let's turn off recording Telemetry in background tasks mode (since it won't be submitted): Bug 1757229. This should squash another large source of this particular problem.
- As a bonus, over the weekend I dug into not writing cookie databases to disk and it's pretty straightforward: Bug 1675829. I'll get a patch up.
Assignee | ||
Comment 15•3 years ago
|
||
Since we determined the profile directory limit won't be reached for ~6 years we can skip 1.
If we're going to clean up directories incrementally we can skip 3.; in retrospect clean up on shutdown makes more sense for one off file cleanup.
Assignee | ||
Comment 16•3 years ago
|
||
Assignee | ||
Comment 17•3 years ago
|
||
Depends on D140427
Assignee | ||
Comment 18•3 years ago
|
||
Depends on D140428
Assignee | ||
Comment 19•3 years ago
|
||
Depends on D140428
Assignee | ||
Comment 20•3 years ago
|
||
Depends on D140910
Comment 22•3 years ago
|
||
Comment 23•3 years ago
|
||
Comment 24•3 years ago
|
||
Backed out for causing xpcshell failures.
backout link: https://hg.mozilla.org/integration/autoland/rev/9984f7dd94b374f319fbaaede80b10b30c9975b4
Failure log: https://treeherder.mozilla.org/logviewer?job_id=371597578&repo=autoland&lineNumber=3486
Comment 25•3 years ago
|
||
Backed out for causing xpcshell failures.
Backout link: https://hg.mozilla.org/integration/autoland/rev/82e9c155cec38c11e3aecbddb0251d8392328aa2
Failure log: https://treeherder.mozilla.org/logviewer?job_id=371608568&repo=autoland&lineNumber=1833
Updated•3 years ago
|
Comment 26•3 years ago
|
||
Comment 27•3 years ago
|
||
Backed out for causing multiple failures
Backout link: https://hg.mozilla.org/integration/autoland/rev/027f5c773698ffa72b4a175f32b21e9b2cb17e02
Failure log:
https://treeherder.mozilla.org/logviewer?job_id=371628193&repo=autoland&lineNumber=6662
https://treeherder.mozilla.org/logviewer?job_id=371626107&repo=autoland&lineNumber=13762
Comment 28•3 years ago
|
||
Comment 29•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/5c96cf253726
https://hg.mozilla.org/mozilla-central/rev/9232381871c4
https://hg.mozilla.org/mozilla-central/rev/e39d7cb0170b
https://hg.mozilla.org/mozilla-central/rev/bcb41f5b279a
https://hg.mozilla.org/mozilla-central/rev/6c22e1c0fc14
https://hg.mozilla.org/mozilla-central/rev/20ab625c5204
Assignee | ||
Comment 30•3 years ago
|
||
Resolved test issues that caused backouts.
Updated•3 years ago
|
Updated•3 years ago
|
Updated•2 years ago
|
Description
•