Closed Bug 1656962 Opened 4 years ago Closed 3 years ago

(Puppeteer) Firefox process is not handled properly on Windows

Categories

(Remote Protocol :: Agent, defect, P3)

Unspecified
Windows
defect
Points:
2

Tracking

(firefox96 fixed)

RESOLVED FIXED
96 Branch
Tracking Status
firefox96 --- fixed

People

(Reporter: impossibus, Assigned: toshi)

References

Details

There are a number of issues with the Puppeteer launcher for Firefox on Windows. Here are two examples.

Problems with start-up in CI: https://github.com/puppeteer/puppeteer/issues/5673
Processes not being cleaned up: https://github.com/puppeteer/puppeteer/issues/6298

I think this is because Puppeteer's process management does not account for the Firefox Windows launcher process. In my investigation so far, adding the --wait-for-process args doesn't seem to fix the CI issue mentioned above, so we might need to dig into this a bit more.

Severity: -- → S3
Alias: [puppeteer-beta-reserve]
Whiteboard: [puppeteer-beta-reserve]
Whiteboard: [puppeteer-beta-reserve] → [puppeteer-beta2-mvp]

I just created a PR on the Puppeteer Github repository to check if the problem is still around; and it is:
https://github.com/puppeteer/puppeteer/pull/6895

I assume (without having had a look at the code yet) that shutting down the browser will always be done via [Browser.close()[https://chromedevtools.github.io/devtools-protocol/tot/Browser/#method-close]. Here is our implementation:

https://searchfox.org/mozilla-central/rev/0379f315c75a2875d716b4f5e1a18bf27188f1e6/remote/cdp/domains/parent/Browser.jsm#31-33

I tried to build a custom Windows64 build with the launcher process disabled, but as it looks like this doesn't work anymore:

https://treeherder.mozilla.org/jobs?repo=try&revision=c566d6fe18e2840bcc722d9d4363a4425010d526&selectedTaskRun=B_660yb4RySf3KGXbj0CCA.0

Aaron, I assume that this a bug, and should get fixed? Or did I do something wrong?

Flags: needinfo?(aklotz)
Whiteboard: [puppeteer-beta2-mvp] → [bidi-m1-mvp]
Points: --- → 2
No longer blocks: 1693763

It's no longer blocking us from getting Windows CI jobs running for Puppeteer, but I would kind like to understand and fix the underlying problem here.

Priority: P2 → P3
No longer depends on: 1693296

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #3)

Aaron, I assume that this a bug, and should get fixed? Or did I do something wrong?

As discussed, this is a bug that should be fixed.

Flags: needinfo?(aklotz)

(In reply to Aaron Klotz [:aklotz] from comment #5)

As discussed, this is a bug that should be fixed.

I filed bug 1697282 for that.

Also as discussed I will get more information first in how processes are handled by Puppeteer before asking Aaron for advice.

(In reply to Maja Frydrychowicz :impossibus (was :maja_zf) (needinfo me) from comment #0)

Problems with start-up in CI: https://github.com/puppeteer/puppeteer/issues/5673

That was fixed by not killing Firefox anymore as workaround, but causing an in-app shutdown.

Processes not being cleaned up: https://github.com/puppeteer/puppeteer/issues/6298

This is still valid, and even with --wait-for-process, which is present for a while as argument in the Puppeteer Launcher.

When following the lifetime of a process the following is done:

  1. Creating a runner and starting it
  2. The process is spawned via childProcess.spawn() (detached is kept as false)
  3. Finally the process gets killed via this.proc.kill('SIGKILL')

I tried to test the Puppeteer code path with a disabled launcher process but due to bug 1697282 it's not possible right now. As such not sure of the launcher process is the problem here, or something else causes Firefox to not completely shutdown.

Aaron, it would be great to get some help here. Maybe you can already see the problem. If not hints/tips for testing various scenarios would be welcome too. Thanks a lot!

Flags: needinfo?(aklotz)

The launcher process does not start the browser under a job object, so killing the launcher process (which is obviously the one whose handle you have) won't have any effect on the browser itself.

I suppose there are a few options here, but it really depends on effort and whether we want to add this functionality to Puppeteer/node or to the launcher process itself.

If we landed a change to the launcher process to use a job object, we could probably make this all Just Work (TM). I would add the caveat that we would only do this for Windows 8 and higher, so if Windows 7 matters, then this isn't an option.

Other options probably involve adding code (and possibly necessitating the addition of new dependent packages) to Puppeteer itself that call various Windows APIs to discover the browser process and terminate it:

  • Puppeteer could use a different API that supports Windows job objects to start Firefox (again, has the same Windows 7 limitation as above);
  • Using toolhelp for enumerating processes, finding the one whose parent is the launcher, and then terminating that;
  • Using FindWindow for resolving a Firefox window, obtaining its pid, and then terminating that (hard to distinguish between multiple instances of Firefox if that is a concern)
Flags: needinfo?(aklotz)
Whiteboard: [bidi-m1-mvp] → [bidi-m2-mvp]
Priority: P3 → --
Priority: -- → P3
Whiteboard: [bidi-m2-mvp]

Moving outside of the milestone because Puppeteer CI jobs are running on Windows, so there is no need to address this immediately.

(In reply to (No longer employed by Mozilla) Aaron Klotz from comment #8)

If we landed a change to the launcher process to use a job object, we could probably make this all Just Work (TM). I would add the caveat that we would only do this for Windows 8 and higher, so if Windows 7 matters, then this isn't an option.

Having a proper solution as implemented for the launcher process sounds way more helpful to me. If we would outsource that to clients which start and have to handle Firefox processes, each of these would need logic to properly find the right Firefox process based on the launcher process.

Having the limitation for Windows 8 and higher sounds perfectly fine.

With Aaron being no longer with us I wonder who from the Firefox/core team actually could help us with that problem. Philip, could you help us in finding a good person? Thanks a lot.

Flags: needinfo?(pluk)

Molly, do you have an idea to whom we would have to talk to for getting the above feature implemented? Thanks!

Flags: needinfo?(mhowell)

Toshi does just about all of our launcher process work nowadays.

Flags: needinfo?(mhowell) → needinfo?(tkikuchi)

Yes, I'm the launcher process guy now. So, what needs to be done is to make sure terminating the launcher process terminates all the firefox processes, correct? Adding the launcher process to a job object when it's launched with "--wait-for-browser" is doable.

Flags: needinfo?(tkikuchi)

Ah great to know and thanks for the feedback! Yes, we already launch the process with --wait-for-browser on Windows. So getting that particular feature added would be a great help for us!

Flags: needinfo?(pluk)
Depends on: 1740619

Toshi, I assume that there is no workaround that we could apply in the meantime to get a correct handling of the Firefox process implemented? Not using --wait-for-browser is possibly not an option. I tried to check how mozrunner and mozprofile handling that but I actually cannot find any code related to that. Also searchfox doesn't give any results for it except in testing code. So out of interest where is that command line flag handled? Thanks!

Flags: needinfo?(tkikuchi)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #15)

Toshi, I assume that there is no workaround that we could apply in the meantime to get a correct handling of the Firefox process implemented? Not using --wait-for-browser is possibly not an option. I tried to check how mozrunner and mozprofile handling that but I actually cannot find any code related to that. Also searchfox doesn't give any results for it except in testing code. So out of interest where is that command line flag handled? Thanks!

The "--wait-for-browser" option is handled here.

Now that bug 1697282 was resolved, did you confirm this scenario worked on a build without the launcher process?

Flags: needinfo?(tkikuchi)

No I did not. But checking the link to where it's handled I'm wondering about the usage of marionette:
https://searchfox.org/mozilla-central/rev/24fac1ad31fb9c6e9c4c767c6a7ff45d226078f3/browser/app/winlauncher/LauncherProcessWin.cpp#96

There is the --remote-debugging-port missing which actually enables the Remote Agent which serves CDP and WebDriver BiDi. Would we have to add it there too? I assume --wait-for-browser isn't taken into account at all. Or am I wrong?

Flags: needinfo?(tkikuchi)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #17)

There is the --remote-debugging-port missing which actually enables the Remote Agent which serves CDP and WebDriver BiDi. Would we have to add it there too? I assume --wait-for-browser isn't taken into account at all. Or am I wrong?

The --wait-for-browser option is basically needed for automation. Since the launcher process does and will do nothing after it finishes launching the browser process, it's usually a good habit to terminate it to free system resources. I think having --remote-debugging-port alone does not require the launcher process alive, but if there is any broken scenario similar to this one, we can always consider updating that condition.

Flags: needinfo?(tkikuchi)

--remote-debugging-port is used for automation similar to all the other entries. Also there are other clients than Puppeteer which might not set --wait-for-browser on its own. As such I would suggest that we really add it to this list.

This is now fixed by bug 1740619. \o/

Assignee: nobody → tkikuchi
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 96 Branch
You need to log in before you can comment on or make changes to this bug.