Closed Bug 1137388 Opened 10 years ago Closed 10 years ago

Marionette hard-kills the browser for restarts which causes lots of trouble

Categories

(Remote Protocol :: Marionette, defect)

defect
Not set
critical

Tracking

(firefox38 fixed, firefox39 fixed)

RESOLVED FIXED
mozilla39
Tracking Status
firefox38 --- fixed
firefox39 --- fixed

People

(Reporter: whimboo, Assigned: chmanchester)

References

Details

(Keywords: dataloss)

Attachments

(2 files, 2 obsolete files)

As implemented on bug 940954 the restart support for Marionette has a massive disruptive behavior! As Chris and I figured out today it some-what hard-kills the browser, and doesn't let it sanely shutdown. All this is done via runner.stop().

The problems you can face by doing this are massive in terms that lots of things which have to happen on shutdown are NOT executed. We have seen this because preferences we have set in a test were lost after a restart, which never should happen.

Instead of stopping the browser from mozrunner, the shutdown or better restart has to be initiated from within the application itself. Whereby it has to allow Firefox to safely shutdown itself. Only when that is not possible we should issue a hard stop for the process.

Here some links for how Mozmill is handling that at the moment:

Controller's restartApplication() wrapper:
https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/extension/resource/driver/controller.js#L438

Controller's stopApplication() wrapper:
https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/extension/resource/driver/controller.js#L466

Frame's shutdownApplication() method which safely closes the browser and issues a restart by Firefox itself:

https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/extension/resource/modules/frame.js#L48

Python __init__.py restart logic for mozrunner:
https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/__init__.py#L549

Allowing Firefox to restart itself should be the way to go here because it will pass over environment variables which are important for the next process. We should not call mozrunner to start the application again.

This bug is kinda blocking us from having working restart tests, including our update tests. We may somewhat be able to workaround that with sleeps and a custom request for shutdown in our tests. But I cannot tell, if this all would work.
Isn't this a duplicate of bug 1099127? I think we're ending up solving the same problem here, except that "inside application triggered" restart means any restart triggered by mozmill because mozmill runs inside the application.
Flags: needinfo?(hskupin)
No, this is not a duplicate. This is part of the problem to solve for user triggered restarts. For that bug we still have to call marionette.restart(). It's just that we need a clean way to restart/shutdown the browser. This is not done at the moment.
Flags: needinfo?(hskupin)
This looks ok locally, let's see how this looks on other platforms:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b2aa51e49c91
Attached file MozReview Request: bz://1137388/chmanchester (obsolete) (deleted) β€”
/r/4413 - Bug 1137388 - Add a facility to restart firefox from marionette from within the browser for update tests.

Pull down this commit:

hg pull review -r b004519010946edc4c1375cfb5ba42eccfbfb9e3
(In reply to Chris Manchester [:chmanchester] from comment #3)
> This looks ok locally, let's see how this looks on other platforms:
> 
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=b2aa51e49c91

Problems on windows. I'll dust off my windows build tomorrow.
I cannot login to review board, so some comments here...

* It's great to see that you were able to get that far with it yesterday!

* In case of a not forced restart keep in mind that Firefox might disallow a shutdown of itself. In such a case we have to ensure to not wait forever but also kill the process after a given amount of time. So not sure if adding force as parameter is necessary.

* For the restart method you missed to document the new parameters. Maybe for the flags it would be good to add a see link to the MDN documentation? Or at least list the possible values, or create constants in Marionette itself.

* The same logic as you have here we may also need for a normal close of the application. Is Marionette also using mozrunner to hardstop it? There are tests which would also need a clean shutdown of the application. This is most likely a follow-up bug, but wanted to bring this up.
Assignee: nobody → cmanchester
Status: NEW → ASSIGNED
(In reply to Chris Manchester [:chmanchester] from comment #5)
> (In reply to Chris Manchester [:chmanchester] from comment #3)
> > This looks ok locally, let's see how this looks on other platforms:
> > 
> > https://treeherder.mozilla.org/#/jobs?repo=try&revision=b2aa51e49c91
> 
> Problems on windows. I'll dust off my windows build tomorrow.

This looks ok except for win64 (I tried because my local build work fine): https://treeherder.mozilla.org/#/jobs?repo=try&revision=a77e2eb87993

Maybe the browser process isn't re-used after a restart on win64 like it is for other platforms.
(In reply to Henrik Skupin (:whimboo) from comment #6)
> I cannot login to review board, so some comments here...
> 
> * It's great to see that you were able to get that far with it yesterday!
> 
> * In case of a not forced restart keep in mind that Firefox might disallow a
> shutdown of itself. In such a case we have to ensure to not wait forever but
> also kill the process after a given amount of time. So not sure if adding
> force as parameter is necessary.
The implementation of quit seems pretty definitive as long as eForceQuit is set (which should be included here for parity with mozmill but I missed). When is this an issue?
> 
> * For the restart method you missed to document the new parameters. Maybe
> for the flags it would be good to add a see link to the MDN documentation?
> Or at least list the possible values, or create constants in Marionette
> itself.
> 
> * The same logic as you have here we may also need for a normal close of the
> application. Is Marionette also using mozrunner to hardstop it? There are
> tests which would also need a clean shutdown of the application. This is
> most likely a follow-up bug, but wanted to bring this up.
Comment on attachment 8570254 [details]
MozReview Request: bz://1137388/chmanchester

/r/4413 - Bug 1137388 - Add a facility to restart firefox from marionette from within the browser for update tests.

Pull down this commit:

hg pull review -r 5e4deeea5a198134a0e49a7a926badcdb333913f
(In reply to Chris Manchester [:chmanchester] from comment #8)
> (In reply to Henrik Skupin (:whimboo) from comment #6)
> > I cannot login to review board, so some comments here...
> > 
> > * It's great to see that you were able to get that far with it yesterday!
> > 
> > * In case of a not forced restart keep in mind that Firefox might disallow a
> > shutdown of itself. In such a case we have to ensure to not wait forever but
> > also kill the process after a given amount of time. So not sure if adding
> > force as parameter is necessary.
> The implementation of quit seems pretty definitive as long as eForceQuit is
> set (which should be included here for parity with mozmill but I missed).
> When is this an issue?
> > 
> > * For the restart method you missed to document the new parameters. Maybe
> > for the flags it would be good to add a see link to the MDN documentation?
> > Or at least list the possible values, or create constants in Marionette
> > itself.
I noticed we're already using setting enough flags by default that this isn't useful, so I removed the parameter.
> > 
> > * The same logic as you have here we may also need for a normal close of the
> > application. Is Marionette also using mozrunner to hardstop it? There are
> > tests which would also need a clean shutdown of the application. This is
> > most likely a follow-up bug, but wanted to bring this up.
mozrunner probably stops the browser at the end of the run.

My next question is whether getting this working on win64 should block other work on update tests.
Attachment #8570254 - Attachment is obsolete: true
(In reply to Chris Manchester [:chmanchester] from comment #8)
> > * In case of a not forced restart keep in mind that Firefox might disallow a
> > shutdown of itself. In such a case we have to ensure to not wait forever but
> > also kill the process after a given amount of time. So not sure if adding
> > force as parameter is necessary.
> The implementation of quit seems pretty definitive as long as eForceQuit is
> set (which should be included here for parity with mozmill but I missed).
> When is this an issue?

Usually code in Firefox has to request a shutdown first via the "quit-application-requested" observer notification. That allows listeners for this topic to run prepare for shutdown. Whereby each listener can reject the request via the return value.

(In reply to Chris Manchester [:chmanchester] from comment #10)
> > > * For the restart method you missed to document the new parameters. Maybe
> > > for the flags it would be good to add a see link to the MDN documentation?
> > > Or at least list the possible values, or create constants in Marionette
> > > itself.
> I noticed we're already using setting enough flags by default that this
> isn't useful, so I removed the parameter.

IMHO it would be better to call this method shutdownApplication similar to the implementation in Mozmill and allow to set a flag if the application needs to be restarted. This would help to have only a single shutdown method compared to the case below.

> > > * The same logic as you have here we may also need for a normal close of the
> > > application. Is Marionette also using mozrunner to hardstop it? There are
> > > tests which would also need a clean shutdown of the application. This is
> > > most likely a follow-up bug, but wanted to bring this up.
> mozrunner probably stops the browser at the end of the run.

That would be another problem we should fix. Mozrunner should only stop an application if it cannot quit itself.

> My next question is whether getting this working on win64 should block other
> work on update tests.

Our existing Mozmill tests are working perfect with Firefox 32/64bit on Win64. So I think it's just a missing piece in the implementation here.
(In reply to Henrik Skupin (:whimboo) from comment #11)
> (In reply to Chris Manchester [:chmanchester] from comment #8)
> > > * In case of a not forced restart keep in mind that Firefox might disallow a
> > > shutdown of itself. In such a case we have to ensure to not wait forever but
> > > also kill the process after a given amount of time. So not sure if adding
> > > force as parameter is necessary.
> > The implementation of quit seems pretty definitive as long as eForceQuit is
> > set (which should be included here for parity with mozmill but I missed).
> > When is this an issue?
> 
> Usually code in Firefox has to request a shutdown first via the
> "quit-application-requested" observer notification. That allows listeners
> for this topic to run prepare for shutdown. Whereby each listener can reject
> the request via the return value.

The quit is always granted by the time we call nsIAppStartup.quit. There's a case the quit is aborted when new windows are opened while quitting, but that's not honored when eForceQuit is provided.

> 
> (In reply to Chris Manchester [:chmanchester] from comment #10)
> > > > * For the restart method you missed to document the new parameters. Maybe
> > > > for the flags it would be good to add a see link to the MDN documentation?
> > > > Or at least list the possible values, or create constants in Marionette
> > > > itself.
> > I noticed we're already using setting enough flags by default that this
> > isn't useful, so I removed the parameter.
> 
> IMHO it would be better to call this method shutdownApplication similar to
> the implementation in Mozmill and allow to set a flag if the application
> needs to be restarted. This would help to have only a single shutdown method
> compared to the case below.

restarts are the only case I'm targeting here.

> 
> > > > * The same logic as you have here we may also need for a normal close of the
> > > > application. Is Marionette also using mozrunner to hardstop it? There are
> > > > tests which would also need a clean shutdown of the application. This is
> > > > most likely a follow-up bug, but wanted to bring this up.
> > mozrunner probably stops the browser at the end of the run.
> 
> That would be another problem we should fix. Mozrunner should only stop an
> application if it cannot quit itself.

Doesn't sound like a blocker, but file a bug.

> 
> > My next question is whether getting this working on win64 should block other
> > work on update tests.
> 
> Our existing Mozmill tests are working perfect with Firefox 32/64bit on
> Win64. So I think it's just a missing piece in the implementation here.

If it's a requirement, it's a requirement. It's significantly more challenging because it looks like we need to thread thread everything through to a new process. I'll need to figure out how to get a local build to work with.
19:06:18     INFO -  ProcessManager UNABLE to use job objects to manage child processes
19:06:18     INFO -  ProcessManager NOT managing child processes

... from the broken win64 run looks relevant. Now I'm thinking this is a mozprocess problem or something else altogether.
(In reply to Chris Manchester [:chmanchester] from comment #13)
> 19:06:18     INFO -  ProcessManager UNABLE to use job objects to manage
> child processes
> 19:06:18     INFO -  ProcessManager NOT managing child processes
> 
> ... from the broken win64 run looks relevant. Now I'm thinking this is a
> mozprocess problem or something else altogether.

I did a win64 build locally and everything works as expected (the warning isn't printed, either). I pushed to try with mozprocess debug output turned on to see it that yields anything interesting. I guess the next step is to request a loaner and try to debug from there. I don't know how far I'll get with that, but I'll give it a shot.
After some more try debugging and reading through the mozprocess code and windows APIs, the issue appears to be that JOB_OBJECT_LIMIT_BREAKAWAY_OK isn't set on the job object that creates the browser process on win64 (also our only win 8 jobs), meaning we don't use job objects to manage child processes, so when we restart, the child process that becomes the new main firefox process can't be killed by mozprocess.
(In reply to Chris Manchester [:chmanchester] from comment #12)
> > Usually code in Firefox has to request a shutdown first via the
> > "quit-application-requested" observer notification. That allows listeners
> > for this topic to run prepare for shutdown. Whereby each listener can reject
> > the request via the return value.
> 
> The quit is always granted by the time we call nsIAppStartup.quit. There's a
> case the quit is aborted when new windows are opened while quitting, but
> that's not honored when eForceQuit is provided.

The problem you are hitting when directly requesting a shutdown with the aForceQuit flag set is, that there can be unsafed data for windows which is not getting saved and it results in dataloss. Good example here should be sessionrestore, which has a small delay before it saves new data. Here a forced restart would cause that e.g. a tab you opened right before quit will not exist anymore after a restart. So eAttemptQuit should be the correct flag for us, and we should only fallback to eForceQuit, if the first try to shutdown fails.

https://developer.mozilla.org/en-US/docs/How_to_Quit_a_XUL_Application

> > IMHO it would be better to call this method shutdownApplication similar to
> > the implementation in Mozmill and allow to set a flag if the application
> > needs to be restarted. This would help to have only a single shutdown method
> > compared to the case below.
> 
> restarts are the only case I'm targeting here.

Well, for shutdown it's exactly the same method just without the aRestart flag set. But I can file a bug for it for sure.

> If it's a requirement, it's a requirement. It's significantly more

We have to run Firefox on Windows 32 and 64 systems. That's a requirement from QA. The same applies to the upcoming Firefox64 builds.
(In reply to Henrik Skupin (:whimboo) from comment #16)
> (In reply to Chris Manchester [:chmanchester] from comment #12)
> > > Usually code in Firefox has to request a shutdown first via the
> > > "quit-application-requested" observer notification. That allows listeners
> > > for this topic to run prepare for shutdown. Whereby each listener can reject
> > > the request via the return value.
> > 
> > The quit is always granted by the time we call nsIAppStartup.quit. There's a
> > case the quit is aborted when new windows are opened while quitting, but
> > that's not honored when eForceQuit is provided.
> 
> The problem you are hitting when directly requesting a shutdown with the
> aForceQuit flag set is, that there can be unsafed data for windows which is
> not getting saved and it results in dataloss. Good example here should be
> sessionrestore, which has a small delay before it saves new data. Here a
> forced restart would cause that e.g. a tab you opened right before quit will
> not exist anymore after a restart. So eAttemptQuit should be the correct
> flag for us, and we should only fallback to eForceQuit, if the first try to
> shutdown fails.
> 
> https://developer.mozilla.org/en-US/docs/How_to_Quit_a_XUL_Application
The mozmill update tests only use controller.js' restartApplication, which doesn't specify a user shutdown, and therefore goes through frame.js' shutdownApplication, which always specifies eForceQuit. This seems like the realm of bug 1099127.
> 
> > > IMHO it would be better to call this method shutdownApplication similar to
> > > the implementation in Mozmill and allow to set a flag if the application
> > > needs to be restarted. This would help to have only a single shutdown method
> > > compared to the case below.
> > 
> > restarts are the only case I'm targeting here.
> 
> Well, for shutdown it's exactly the same method just without the aRestart
> flag set. But I can file a bug for it for sure.
The client would need to handle things differently, too, but I guess we can expose the parameter so the server won't need to be updated later.
> 
> > If it's a requirement, it's a requirement. It's significantly more
> 
> We have to run Firefox on Windows 32 and 64 systems. That's a requirement
> from QA. The same applies to the upcoming Firefox64 builds.
As you can see from this bug I'm working on it.
(In reply to Chris Manchester [:chmanchester] from comment #15)
> After some more try debugging and reading through the mozprocess code and
> windows APIs, the issue appears to be that JOB_OBJECT_LIMIT_BREAKAWAY_OK
> isn't set on the job object that creates the browser process on win64 (also
> our only win 8 jobs), meaning we don't use job objects to manage child
> processes, so when we restart, the child process that becomes the new main
> firefox process can't be killed by mozprocess.

According to msdn, windows 8 we can create nested job objects, so breaking away from a job isn't required to manage a process and its children in a sub-job (mozprocess assumes we need to breakaway to achieve this, which doesn't appear possible on our Windows 8 slaves). I replicated the failure and tried a mozharness patch locally that works for this. I'll run it through try as soon as it opens.
Results from try with mozprocess patch looking good:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=967dcdfea267

Will land in bug 1139722.
Attached file MozReview Request: bz://1137388/chmanchester (obsolete) (deleted) β€”
/r/4933 - Bug 1137388 - Add a facility to restart firefox from marionette from within the browser for update tests.

Pull down this commit:

hg pull review -r 76d5a3582f390569746acbe1a8449a850e24033f
Comment on attachment 8573666 [details]
MozReview Request: bz://1137388/chmanchester

/r/4933 - Bug 1137388 - Add a facility to restart firefox from marionette from within the browser for update tests.

Pull down this commit:

hg pull review -r 76d5a3582f390569746acbe1a8449a850e24033f
Attachment #8573666 - Flags: review?(jgriffin)
Attachment #8573666 - Flags: review?(dburns)
Comment on attachment 8573666 [details]
MozReview Request: bz://1137388/chmanchester

https://reviewboard.mozilla.org/r/4931/#review3935

Ship It!
Attachment #8573666 - Flags: review?(dburns) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/56083b5a4473
Attachment #8573666 - Flags: review?(jgriffin)
https://hg.mozilla.org/mozilla-central/rev/56083b5a4473
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Depends on: 1130220
No longer depends on: 1130220
We kinda would like to have this support for Firefox 38 to be able to run our Firefox UI tests. Would it be possible to backport this patch?
Flags: needinfo?(dburns)
Flags: needinfo?(cmanchester)
(In reply to Henrik Skupin (:whimboo) from comment #25)
> We kinda would like to have this support for Firefox 38 to be able to run
> our Firefox UI tests. Would it be possible to backport this patch?

If we do, we definitely need to pick up its dependents (transitively).
Flags: needinfo?(cmanchester)
The only other patch we have to backport is for bug 1139722. The general restart logic already landed for Firefox 37.
https://hg.mozilla.org/releases/mozilla-aurora/rev/81306a6804e8
Flags: needinfo?(dburns)
Attachment #8573666 - Attachment is obsolete: true
Attachment #8619605 - Flags: review+
Attachment #8619606 - Flags: review+
Product: Testing → Remote Protocol
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: