Use a multi-instance lock/semaphore to track the number of running instances of an installation so we can prevent updating a running instance out from under it
Categories
(Toolkit :: Application Update, enhancement, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox86 | --- | fixed |
People
(Reporter: robert.strong.bugs, Assigned: mhowell)
References
(Blocks 4 open bugs, Regressed 4 open bugs)
Details
(Whiteboard: [iu_tracking])
Attachments
(7 files)
(deleted),
text/x-phabricator-request
|
Details | |
Bug 1553982 Part 1 - Manage "multi-instance locks" that can be created by any component. r=bytesized
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/plain
|
chutten
:
data-review+
|
Details |
Some time in the past mhowell suggested using a semaphore to determine whether there are other instances running. This would allow us to lessen the issue where launching an additional instance of the same installation will perform an update and thereby require the other instance to restart. Below is a very simple flow of how I think this could work.
- Create or open the semaphore on startup.
- Use the Install hash for the name.
- Prefix with Global\ on Windows so it is available to all users
- The security of the semaphore will need to allow other users to work with it (need to figure out which users should be able to). This would allow other processes to DOS the update.
- Permissions will need to be set on the Linux and OS X semaphores files so other users can use it. This would allow other processes to DOS the update.
- For exits and crashes it will need to release the semaphore and it should close the handle when the instance count is 0. Does anyone know if this will be an issue in our crash handling code?
- After X amount of time has passed (channel specific) based on a user pref (prevents other user processes from preventing this) without updating due to the semaphore force the update. This will likely be done with a new update status and this will mitigate the ability to DOS the update.
- UI will most likely be shown when it is going to force the update.
- Change the update restart UI or possibly display a warning so the user is informed that clicking the button will force the update.
Note: Some people have had issues with users creating semaphores on Windows Terminal server so it should ignore the semaphore when it can't be created or opened.
I think this method should also be sufficient for the Update Agent to determine if there is a running instance for an update.
Reporter | ||
Updated•6 years ago
|
Reporter | ||
Updated•6 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Comment 2•4 years ago
|
||
I did a few minutes digging into what we have in tree that's relevant to this ticket. For context, we want:
- to interact with the new semaphore across all of our platforms (Windows, macOS, Linux);
- from at least two different processes -- Firefox and the background update agent -- that do not have any relationship with each other;
- with one of the processes, the background update agent, being written in Rust and not having the Firefox-y infrastructure to hand.
There are many semaphore implementations in the tree. Most aren't intended to address point 2: they're sem_init
(under POSIX). In particular ipc/glue/CrossProcessSemaphore.cpp
expects to pass shared memory between processes, which isn't pleasant here, since there's no parent/child relationship. NSPR has an implementation but it's unused in-tree (and NSPR is large for this small use case). There are a many Rust implementations but most aren't intended for global sharing (i.e., with sem_open
(under POSIX) semantics).
I conclude that the most sensible thing is to wrap POSIX and Windows semaphores directly. It would be nice to share the code but it's a little awkward to have a little C++ RAII class that can also be used from Rust.
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 3•4 years ago
|
||
Assignee | ||
Comment 4•4 years ago
|
||
Depends on D95623
Assignee | ||
Comment 5•4 years ago
|
||
Depends on D95626
Assignee | ||
Comment 6•4 years ago
|
||
Depends on D95627
Assignee | ||
Comment 7•4 years ago
|
||
Depends on D95628
Assignee | ||
Comment 8•4 years ago
|
||
Depends on D95629
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Assignee | ||
Comment 9•4 years ago
|
||
Comment 10•4 years ago
|
||
Comment on attachment 9189061 [details]
Data review request
DATA COLLECTION REVIEW RESPONSE:
Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?
Yes.
Is there a control mechanism that allows the user to turn the data collection on and off?
Yes. This collection is Telemetry so can be controlled through Firefox's Preferences.
If the request is for permanent data collection, is there someone who will monitor the data over time?
Yes, Molly Howell is responsible.
Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?
Category 2, Interaction.
Is the data collection request for default-on or default-off?
Default on for all channels.
Does the instrumentation include the addition of any new identifiers?
No.
Is the data collection covered by the existing Firefox privacy notice?
Yes.
Does there need to be a check-in in the future to determine whether to renew the data?
No. This collection is permanent.
Result: datareview+
Comment 11•4 years ago
|
||
Comment 12•4 years ago
|
||
Backed out for build bustages.
Log: https://treeherder.mozilla.org/logviewer?job_id=323479356&repo=autoland&lineNumber=32308
Backout: https://hg.mozilla.org/integration/autoland/rev/b6232a49dc3921b1f8bb1956e30b312359fdc46b
Assignee | ||
Comment 13•4 years ago
|
||
The failures are on Android in the global semaphore (part 1); the global semaphore is not used on Android in this stack, and it probably never will be used there, so I'm going to try disabling building it. Will reland if that works.
Assignee | ||
Comment 14•4 years ago
|
||
According to https://treeherder.mozilla.org/jobs?repo=try&revision=d4f0ca4009826d0ef1f398ea7b68c54e070da9eb that attempt has succeeded.
Comment 15•4 years ago
|
||
Comment 16•4 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/5040354e75c2
https://hg.mozilla.org/mozilla-central/rev/01d41760db29
https://hg.mozilla.org/mozilla-central/rev/9dcf78cd576f
https://hg.mozilla.org/mozilla-central/rev/71742fced1ba
https://hg.mozilla.org/mozilla-central/rev/40d67c6dfdf3
https://hg.mozilla.org/mozilla-central/rev/78dce99516dd
Comment 17•4 years ago
|
||
This may have caused bug 1680935.
Seeing multiple reports including one on the #nightly:mozilla.org channel.
Comment 18•4 years ago
|
||
Backed out from central for causing bug 1680935: https://hg.mozilla.org/mozilla-central/rev/7167ab3f6e8d7b77439ac1cc7d9356bbaff02510
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Comment 19•4 years ago
|
||
Comment 20•4 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/f5234182f295
https://hg.mozilla.org/mozilla-central/rev/a65058184c20
https://hg.mozilla.org/mozilla-central/rev/b616a1a15f94
https://hg.mozilla.org/mozilla-central/rev/7fe2fcf9dd6b
https://hg.mozilla.org/mozilla-central/rev/cecc2311e348
https://hg.mozilla.org/mozilla-central/rev/54a9f18d1680
Comment 21•4 years ago
|
||
I'm on Manjaro KDE and this fix does not seem to be doing anything.
I have nightly unpacked in home subfolder and I often have my "main" profile running and then start clean copy for testing: /home/rdk/Apps/firefox-nightly/firefox -profile /tmp/nightly
. When I close this tmp profile, go back to "main" and try open new tab, most of the time I get blank page and sometimes "one small thing" message https://i.imgur.com/KfavQIp.png
Updated•4 years ago
|
Comment 22•4 years ago
|
||
Can someone respond? This still does not work for me.
I'm on Manjaro KDE and this fix does not seem to be doing anything.
I have nightly unpacked in home subfolder and I often have my "main" profile running and then start clean copy for testing:
/home/rdk/Apps/firefox-nightly/firefox -profile /tmp/nightly
. When I close this tmp profile, go back to "main" and try open new tab, most of the time I get blank page and sometimes "one small thing" message https://i.imgur.com/KfavQIp.png
Comment 23•4 years ago
|
||
gwarser, what you're describing sounds like bug 1480452, which this will not fix. If Nightly had downloaded an update, then starting a tmp profile will apply that update, please see the other bug for more discussion.
Comment 24•3 years ago
|
||
Can someone tell me what is the difference between this bug and bug 1480452?
I thought fixing this bug was supposed to prevent opening up another profile from applying updates thereby breaking the prior instance.
Comment 25•3 years ago
|
||
(In reply to Caspy7 from comment #24)
Can someone tell me what is the difference between this bug and bug 1480452?
I thought fixing this bug was supposed to prevent opening up another profile from applying updates thereby breaking the prior instance.
It's a partial solution: it should prevent two running Firefox instances from updating out from each other. What it doesn't prevent is a newly launched Firefox applying a pending update with a running Firefox instance. There's a great deal of context, but basically we don't have a good way to message early in startup that we're not updating because of this lock, and that opens the possibility for malicious actors to prevent Firefox updating. That's a risk we're not willing to take. I think this is covered in the discussion of the linked ticket.
Comment 26•3 years ago
|
||
I have spent a fair amount of time looking into this on multiple occasions because I can never seem to remember exactly how it works or why it does what it does. This question prompted me to do this again today. :nalexander beat me to actually answering the question, but I want to leave a record of what this does and why, mostly to make it easier on myself the next time I need to look into this.
Firefox has two related things going on with regard to checking for other instances. The update mutex and nsIUpdateSyncManager
.
The update mutex:
- Has existed for a long time.
- Works similarly to a normal mutex, but is automatically closed if Firefox crashes.
- Is locked when an installation of Firefox tries to update.
- If it cannot be acquired, it means that another instance of Firefox is handling update, so the update system pretty much lets the other instance handle it.
- Currently this exists only on Windows. We may eventually add it to macOS as part of enabling Background Update.
Whereas nsIUpdateSyncManager
:
- Was created more recently (in this bug).
- Is implemented via a read/write lock system. Each Firefox instance takes a read lock. The write lock can only be taken if there are no other instances with read locks. (Note that I only checked the Windows implementation when I wrote this. I seem to recall that we do slightly different things on other OS's, but I believe they work in similar ways)
- A read lock is taken when Firefox starts. This allows
nsIUpdateSyncManager
to be able to say pretty definitively if any other instances are running, not just if any are updating. nsIUpdateSyncManager
reporting another instance running has a couple of effects.- It causes the update interfaces to indicate that update is being handled by another instance. This message isn't quite correct, but the intention is basically correct since if you have multiple instances open, it's not the best time to update.
- If/when we try check for updates in the background, we wait for the other instances to exit. There is a timeout on this process (measured across application restarts). If the timeout expires, we will check for updates anyways.
- If there are still multiple instances running when we go to show the update notification, we don't show the update badge until the doorhanger (popup) notification is shown. When it is shown, we show different message text.
- Technically there is a small race condition in the implementation of
nsIUpdateSyncManager
. We have to release our read lock to try to take a write lock or else taking the write lock will always fail.
IIRC, we originally wanted nsIUpdateSyncManager
to wait for other instances to close between downloading the update and setting the necessary flags that allow it to be installed on startup. But this would have been much more difficult because that would probably have required adding another state to the update state machine. Which would mean that we would probably have to look at basically every time that we do anything with that state and make sure that it still makes sense in the context of the new state machine. So we went with a different implementation that didn't involve having to do that.
Comment 27•2 years ago
|
||
Hi,
in relation with the rationale here explained for not updating when there are several instances running, I need to mention that some users, like me, do systematically run multiple instances of Firefox (one "basic" instance and another one which is launched with -no-remote
).
I have witnessed that each of them says "Firefox is being updated by another instance", which I now understand is an imprecise formulation of an intended behaviour.
However this behaviour also means that Firefox is never going to get updated, ever.
I was not able to find an about:config
setting to tell one if the instances not to bother with updates and let the other manage them. I think you might want to have this. This would at least let one instance check for and even download an update, and prompt the user to close the others to apply the update.
Comment 28•2 years ago
|
||
(In reply to YS1 from comment #27)
However this behaviour also means that Firefox is never going to get updated, ever.
That is inaccurate. As stated in Comment 26:
There is a timeout on this process (measured across application restarts). If the timeout expires, we will check for updates anyways.
I cannot currently find the code that causes that to persist across application restarts. Perhaps I was mistaken about that detail. But once your installation is sufficiently old, you would hit the logic that causes it to check for updates much earlier in startup. Which would mean that either (a) the two instances would run for over 6 hours and then we would check for updates anyways, or (b) you would eventually start up the two instances and in the brief window where only one of them is running, an update should be initiated.
I was not able to find an
about:config
setting to tell one if the instances not to bother with updates and let the other manage them. I think you might want to have this. This would at least let one instance check for and even download an update, and prompt the user to close the others to apply the update.
Long story short, this would be very difficult to implement in a way that is both useful and safe and, frankly, a solution that requires the user properly configure their two separate profiles in order to update is not ideal. We are working on a way around this problem entirely in Bug 1705217 and Bug 1480452. The short version is that we want to allow Firefox continue running properly even if the files on the disk have been updated. This would allow us to eliminate an entire class of update problems and remove the safeguards that avoid having Firefox update when multiple instances are running simultaneously.
In the meantime, you are welcome to adjust app.update.checkOnlyInstance.timeout
to control how long we wait for other instances to close before proceeding with an update anyways. It should be set to an integer value which will be interpreted as a number of milliseconds. It doesn't have a value by default, but it falls back to 6 hours. But I do feel like I should warn you that it is likely that you are going to hit Bug 1480452 if you do this. As I touched on earlier, Firefox currently does like having its files updated while it is running. It becomes unable to start new processes and eventually you will get a message telling you that you need to restart in order for the browser to keep working.
Oh, one additional warning. Until Bug 1787511 is fixed, you should not attempt to set app.update.checkOnlyInstance.timeout
to 0
. But you can set it to 1
if you want.
Comment 29•2 years ago
|
||
Thank you for this very enlightening comment.
However, I added an integer app.update.checkOnlyInstance.timeout
pref with value 1
yesterday to one of my instances, and it still says "Firefox is being updated by another instance" today (after a whole system restart). This is Firefox 104.0 for information.
Comment 30•2 years ago
|
||
Yeah, at the moment the preference basically affects Firefox's internal update mechanism, not the manual update UI. I added a note to Bug 1787511 to fix that in such a way that it would allow the update UI to work in that case. I think it should be a pretty easy fix. I'm just not sure when I will have time to get around to it. Hopefully soon.
Updated•2 years ago
|
Description
•