Closed Bug 776947 Opened 12 years ago Closed 10 years ago

ignore running processes in talos when you are in --develop mode

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: jmaher, Assigned: ankit.goyal90, Mentored)

References

Details

(Whiteboard: [good next bug][lang=python])

Attachments

(2 files, 4 obsolete files)

allow for solo_process mode to be toggled with --develop (1.0) 12 years ago Joel Maher ( :jmaher ) (UTC -8) (deleted), patch		Details \| Diff \| Splinter Review
Bug 776947 - ignore running processes in talos when you are in --develop mode . r=jmaher 10 years ago Ankit Goyal (deleted), patch	ankit.goyal90 : feedback-	Details \| Diff \| Splinter Review
Bug 776947 - ignore running processes in talos when you are in --develop mode . r=jmaher 10 years ago Ankit Goyal (deleted), patch	jmaher : review+	Details \| Diff \| Splinter Review
Bug 776947 - ignore running processes in talos when you are in --develop mode . r=jmaher 10 years ago Ankit Goyal (deleted), patch		Details \| Diff \| Splinter Review
Bug 776947 - ignore running processes in talos when you are in --develop mode . r=jmaher 10 years ago Ankit Goyal (deleted), patch	jmaher : review-	Details \| Diff \| Splinter Review
Bug 776947 - ignore running processes in talos when you are in --develop mode . r=jmaher 10 years ago Ankit Goyal (deleted), patch	jmaher : review+	Details \| Diff \| Splinter Review

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Description

•

12 years ago

this frustrates everybody who runs talos.  There is no reason that we need to have no other processes running that match our process name (i.e. firefox).

If we are running in --develop mode, this means we are working on a local system and trying to add a test or debug a test.  We really don't care about reproducible numbers at this stage (that is what try server is for).

Jeff Hammel

Comment 1

•

12 years ago

This *should* be pretty easy to ignore.  That said, I have noticed in semi-accidental experimentation that if one does have a browser running, Talos often becomes very confused and can do horrible things like deleting your session history of the browser instance that it did not start.  So while ignoring running processes is pretty easy, I'm guessing there's other process management stuff in talos that should probably be fixed along with this (or we could just move to mozprocess/mozrunner)

Jonathan Watt [:jwatt]

Comment 2

•

12 years ago

(In reply to Jeff Hammel [:jhammel] from comment #1)
> Talos often becomes very confused and can do horrible things like deleting
> your session history of the browser instance that it did not start.

!!! Yikes!

/me reverts his local change to talos/ttest.py and decides to close his main Firefox after all

Jeff Hammel

Comment 3

•

12 years ago

Heh, well I wasn't meaning to sound alarmist, though I have seen this (and haven't had time to track down what exactly is happening).  FWIW, I haven't seen it do anything to the (non-talos) profile other than killing the history and tabs of the active session.

William Lachance (:wlach)

Comment 4

•

12 years ago

(In reply to Joel Maher (:jmaher) from comment #0)
> this frustrates everybody who runs talos.  There is no reason that we need
> to have no other processes running that match our process name (i.e.
> firefox).
> 
> If we are running in --develop mode, this means we are working on a local
> system and trying to add a test or debug a test.  We really don't care about
> reproducible numbers at this stage (that is what try server is for).

Also note that on Android only one instance of a given application may be running at a particular time, so we'll still want to check for that condition on that particular platform (so we can give an informative error).

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 5

•

12 years ago

Attached patch allow for solo_process mode to be toggled with --develop (1.0) (deleted) — Details — Splinter Review

There are a lot of little changes to make this work.  This patch takes care of the remote scenario where we still have the solo process requirement.

Assignee: nobody → jmaher

Status: NEW → ASSIGNED

Attachment #645349 - Flags: review?(jhammel)

Jeff Hammel

Comment 6

•

12 years ago

Comment on attachment 645349 [details] [diff] [review]
allow for solo_process mode to be toggled with --develop (1.0)

I'm not a big fan of passing around solo_process everywhere.  I'd prefer to start getting talos on mozprocess before this bug as so much of this is trying to patch fairly messy code.  We have to change several places: it would be nicer to change one.

+        if not solo_process:
+            extra_args += " -no-remote "
+

Is there a reason we shouldn't add this flag in all cases?

         processes = self._ffprocess.checkAllProcesses(browser_config['process'], browser_config['child_process'])
-        if processes:
+        if processes and browser_config['solo_process']:

Should we even bother checking for processes in this case?

-                if (i > 0) and self._ffprocess.checkAllProcesses(browser_config['process'], browser_config['child_process']):
+                if (i > 0) and \
+                   browser_config['solo_process'] and \
+                   self._ffprocess.checkAllProcesses(browser_config['process'], browser_config['child_process']):
                     raise talosError("previous cycle still running")

So this will lose detection of the previous cycle running in --develop mode.  IMHO, not a great idea.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 7

•

12 years ago

yeah, we lose a lot when we ignore the process state and this patch is much uglier than I thought it would be.

for the -no-remote, the goal of talos is to run as close to what an end user will run as.  I don't know what -no-remote does internally, but no end user runs that way.

I am fine shelving this until mozprocess happens, I know we keep putting that off, but we could realistically do that this quarter as a lot of the hurdles have been tackled.

Jeff Hammel

Comment 8

•

12 years ago

Yeah, I think with the current state of talos, mozprocess should be....well, not easy but unblocked as much as it can be.  We just need to find time to do it.  We should probably make it an unofficial goal to get this added this quarter and, if sept comes and this isn't started, assess what time we have and throw some weight at it

Jeff Hammel

Comment 9

•

12 years ago

Comment on attachment 645349 [details] [diff] [review]
allow for solo_process mode to be toggled with --develop (1.0)

taking this down for review for now.  please let me know if i should reconsider it

Attachment #645349 - Flags: review?(jhammel)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 11

•

10 years ago

this bug can be a reality now that we use mozprocess to manage our browser.  I don't believe the original patch is that useful other than as a way to see what crazy things were done back in the old days.

Solving this would be a good win for developers, I would consider this a good_third_bug or something, it is a bit more complex than many would like.

Dan Minor [:dminor]

Comment 12

•

10 years ago

I'll take a stab at this one to learn a bit more about talos.

Assignee: jmaher → dminor

Dan Minor [:dminor]

Comment 13

•

10 years ago

It's been three months and I haven't had a chance to look at this, so I'm going to unassign myself. Maybe we should make this a mentored "good next bug"?

Assignee: dminor → nobody

Status: ASSIGNED → NEW

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 14

•

10 years ago

this bug could use a better description, if you are picking this up to work on please ask questions

Blocks: 1088251

Mentor: jmaher, dminor

Whiteboard: [good next bug][lang=python]

Ankit Goyal

Assignee

Comment 15

•

10 years ago

I am getting idea here that when we use --develop, we have to ignore the previously running instances of firefox, there should be no interaction with these instances. Is that all or there is more?

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 16

•

10 years ago

Ankit, correct.  We want to monitor our current process that we launch.  We shouldn't care about any previous existing browsers or even new ones launched by the user.

When we switched to mozprocess to launch the process this has made it possible to get the process id and just manage that instance of the process.

Ankit Goyal

Assignee

Comment 17

•

10 years ago

All instances of firefox(whether new tab or window) are running under a single process. I checked PID both on Windows and Linux. I think firefox is forking internally. So, I am stuck here!

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 18

•

10 years ago

This bug is supporting the feature when you have your regular instance of Firefox running, then you run talos in addition to it.

Talos needs to launch firefox with the "--no-remote" cli parameter so that it creates a separate instance.  I did just verify that launching the same instance of firefox-trunk, one normal and the other with --no-remote yields two unique firefox-trunk processes.

Let me know if this information helps you move forward.

Ankit Goyal

Assignee

Comment 19

•

10 years ago

Thanx, that is helpful.

Ankit Goyal

Assignee

Comment 20

•

10 years ago

Attached patch Bug 776947 - ignore running processes in talos when you are in --develop mode . r=jmaher (obsolete) (deleted) — Details — Splinter Review

Attachment #8542845 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 21

•

10 years ago

Comment on attachment 8542845 [details] [diff] [review]
Bug 776947 - ignore running processes in talos when you are in --develop mode . r=jmaher

Review of attachment 8542845 [details] [diff] [review]:
-----------------------------------------------------------------

overall, I think we can make ffprocess.py a much smoother bit of code.  First off we don't need to preserve the old functions, lets get rid of duplicated code.

Second, now that we have pids, we don't need to get all processes, etc. I believe we can just have two functions:
checkProcesses()
cleanupProcesses()

of course these might call internal functions, but those are the two entry points we care about.

I believe you have done the hard work, now it is just making it clean.

::: talos/ffprocess.py
@@ +84,5 @@
> +    def TerminateProcesses(self, timeout, pids, *process_names):
> +        """Helper function to terminate all processes with the given process name and PID
> +
> +        Args:
> +            process_names: String or strings containing the process name, i.e. "firefox"

I would rather edit TerminateAllProcesses instead of duplicating the function.

@@ +91,5 @@
> +        for process_name in process_names:
> +            try:
> +                process_pids = mozpid.get_pids(process_name)
> +                pids_to_terminate = [process_pid for process_pid in process_pids if process_pid in pids]
> +                for p_id in pids_to_terminate:

we just need-
for p in pids:
  terminateprocess()

why do the entire mozpid.get_pids?

::: talos/run_tests.py
@@ +140,4 @@
>  
>      # set browser_config
>      browser_config=configurator.browser_config()
> +    

nit: extra whitespace here

::: talos/ttest.py
@@ +127,2 @@
>                                           browser_config['child_process'],
> +                                         browser_config['browser_wait'], 

nit: you have trailing whitespace here

Attachment #8542845 - Flags: review?(jmaher) → review-

Ankit Goyal

Assignee

Comment 22

•

10 years ago

Yeah, Joel. I was waiting for your approval of my approach. I already saw all these issues.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 23

•

10 years ago

awesome.  If you want to gain consensus on an approach, you can ask for feedback instead of a review.  A neat feature in bugzilla to avoid getting the dreaded r-.

Looking forward to your patch!

Ankit Goyal

Assignee

Comment 24

•

10 years ago

Attached patch Bug 776947 - ignore running processes in talos when you are in --develop mode . r=jmaher (obsolete) (deleted) — Details — Splinter Review

Attachment #8542845 - Attachment is obsolete: true

Attachment #8543132 - Flags: review?(jmaher)

Ankit Goyal

Assignee

Updated

•

10 years ago

Attachment #8542845 - Flags: review- → feedback-

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 25

•

10 years ago

Comment on attachment 8543132 [details] [diff] [review]
Bug 776947 - ignore running processes in talos when you are in --develop mode . r=jmaher

Review of attachment 8543132 [details] [diff] [review]:
-----------------------------------------------------------------

a few small details here.  One question I do have is how do we handle the child_process (i.e. plugin-container?)  Right now we return a single pid from TalosProcess, but in reality firefox has the main process and the plugin-container process.  I do realize that I have overlooked this when discussing this bug originally and giving you feedback on your first patch.

Quite possibly could you comment on how you thing we could solve the plugin-container solution and test if we really do clean that up or not.  Once we resolve the child process quesiton, this patch has good code otherwise!

::: talos/ffprocess.py
@@ +28,5 @@
> +            ret = self._TerminateProcess(pid, timeout)
> +            if ret:
> +                results.append("(%s): %s" % (pid, ret))
> +            else:
> +                pids.remove(pid)

why do you remove pids here?

@@ +33,5 @@
>          return ",".join(results)
>  
> +    def checkProcesses(self, pids):
> +        """Returns a list of browser related PIDs still running
> +        

nit: blank line with whitespace

@@ +40,5 @@
> +        Returns:
> +            A list containing PIDs which are still running
> +        """
> +        return [pid for pid in pids if utils.is_running(pid)]
> +        

nit: blank line with whitespace

::: talos/run_tests.py
@@ +145,5 @@
> +    # set 'remote' in browser_config to False, if --develop is specified to PerfConfigurator.py
> +    if browser_config['develop']:
> +        browser_config['extra_args'] = '--no-remote'
> +        browser_config['remote'] = False
> + 

nit: you have a blank line with whitespace, both on lines 143 and 149

::: talos/ttest.py
@@ +469,3 @@
>                          cleanup = TalosProcess.TalosProcess(['python'] + test_config['cleanup'].split(), env=os.environ.copy())
>                          cleanup.run()
> +                        self._pids.append(cleanup.pid)

I don't believe we need a pid here, in this case we are running an external script which isn't the browser.

Attachment #8543132 - Flags: review?(jmaher) → review-

Ankit Goyal

Assignee

Comment 26

•

10 years ago

I believe `plugin-container` is child process of firefox, so it will be terminated when we terminate firefox.

> +            ret = self._TerminateProcess(pid, timeout)
> +            if ret:
> +                results.append("(%s): %s" % (pid, ret))
> +            else:
> +                pids.remove(pid)

I am removing PIDs here, because if we don't get any result from _TerminateProcess(), that PID is not running. It is unnecessary to keep in PIDs list and iterate over and over during cleanup.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 27

•

10 years ago

in this case we make a deep copy of pids for this loop, I assume this modifies the self._pids variable that we store the list of pids in?

Ankit Goyal

Assignee

Comment 28

•

10 years ago

I already made deepcopy. Look at the loop.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 29

•

10 years ago

got it, lets just figure out the child process and we should be good.

Ankit Goyal

Assignee

Comment 30

•

10 years ago

Attached patch Bug 776947 - ignore running processes in talos when you are in --develop mode . r=jmaher (obsolete) (deleted) — Details — Splinter Review

Wrote a hack to get child PIDs.
Also filed a bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1117651

Attachment #8543132 - Attachment is obsolete: true

Attachment #8543803 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 31

•

10 years ago

Ankit, wow, this is cool.  I got curious how we deal with this in our other harnesses, and it appears that mozprocess (which TalosProcess is subclasses from) does handle child processes.  Take a look here:
http://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozprocess/mozprocess/processhandler.py#115

In our mochitest harness, we use mozprocess to launch a server and here is logic to kill it:
http://dxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.py#371


For the main firefox process, we do something like this:
http://dxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.py#1550

which then calls into mozrunner (we use .start() and .wait() methods):
http://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozrunner/mozrunner/base/runner.py#75

Those reference a processhandler which are part of mozprocess (as referenced in the first link), these effectively call wait() which if you drill down enough you can see that wait() handles child processes in the os specific _wait() definitions.

Long story short, I might have misled you on a patch of wasted effort whereas your first patch would have been more than enough.

Do let me know your thoughts on this.  If you have other concerns lets sort it out, otherwise we should take your first patch and file a cleanup bug to remove the references to child_process in the talos code base :)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Updated

•

10 years ago

Attachment #8543803 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 32

•

10 years ago

Comment on attachment 8543132 [details] [diff] [review]
Bug 776947 - ignore running processes in talos when you are in --develop mode . r=jmaher

Review of attachment 8543132 [details] [diff] [review]:
-----------------------------------------------------------------

can you address the nits in this patch.

::: talos/ffprocess.py
@@ +28,5 @@
> +            ret = self._TerminateProcess(pid, timeout)
> +            if ret:
> +                results.append("(%s): %s" % (pid, ret))
> +            else:
> +                pids.remove(pid)

why do you remove pids here?

@@ +33,5 @@
>          return ",".join(results)
>  
> +    def checkProcesses(self, pids):
> +        """Returns a list of browser related PIDs still running
> +        

nit: blank line with whitespace

@@ +40,5 @@
> +        Returns:
> +            A list containing PIDs which are still running
> +        """
> +        return [pid for pid in pids if utils.is_running(pid)]
> +        

nit: blank line with whitespace

::: talos/run_tests.py
@@ +145,5 @@
> +    # set 'remote' in browser_config to False, if --develop is specified to PerfConfigurator.py
> +    if browser_config['develop']:
> +        browser_config['extra_args'] = '--no-remote'
> +        browser_config['remote'] = False
> + 

nit: you have a blank line with whitespace, both on lines 143 and 149

::: talos/ttest.py
@@ +469,3 @@
>                          cleanup = TalosProcess.TalosProcess(['python'] + test_config['cleanup'].split(), env=os.environ.copy())
>                          cleanup.run()
> +                        self._pids.append(cleanup.pid)

I don't believe we need a pid here, in this case we are running an external script which isn't the browser.

Attachment #8543132 - Flags: review- → review+

Ankit Goyal

Assignee

Comment 33

•

10 years ago

Attached patch Bug 776947 - ignore running processes in talos when you are in --develop mode . r=jmaher (obsolete) (deleted) — Details — Splinter Review

Attachment #8543803 - Attachment is obsolete: true

Attachment #8544367 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 34

•

10 years ago

Comment on attachment 8544367 [details] [diff] [review]
Bug 776947 - ignore running processes in talos when you are in --develop mode . r=jmaher

Review of attachment 8544367 [details] [diff] [review]:
-----------------------------------------------------------------

overall this code is r+ quality, thanks for addressing the nits.  I have ran this in production and on windows we are having some issues:

try push that runs this code:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d1575a927a71

specific example:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jmaher@mozilla.com-d1575a927a71/try-win32/try_win7-ix_test-chromez-bm119-tests1-windows-build56.txt.gz (search for 'Failed tresize')

It appears that it is failing on the 'ts' style test where we launch the browser to measure raw startup times.  Lets see if there is a timing issue with fast startup/shutdown times, or maybe something windows related.  

Marking as r- until we get that sorted out, quite possibly it should be an easy fix!

Attachment #8544367 - Flags: review?(jmaher) → review-

Ankit Goyal

Assignee

Comment 35

•

10 years ago

Apparently, mozprocess requires some upgrade.
In talos/talos/utils.py, function is_running() calls >>  
In https://github.com/mozilla/gecko-dev/blob/master/testing/mozbase/mozprocess/mozprocess/pid.py, function ps() calls >> 
subprocess.Popen(['ps', arg], stdout=subprocess.PIPE)

Now, `ps` is not available on windows.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 36

•

10 years ago

on windows, we run in a unix style shell which allows for ps to be available.  If I get some time today or tomorrow, I will run some tests on a slave and see if I can see what is the root cause.

I am concerned with a couple things:
1) why this seems to happen on the startup only tests
2) why this happens at the end instead of in the middle of the test
3) what type of wait/timeout are we doing
4) if there are any retries we are doing 

looking into those questions might uncover an answer :)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 37

•

10 years ago

one interesting thing here is that the pid list keeps growing, could it be that this works except that the pids get reused over time?

Ankit Goyal

Assignee

Comment 38

•

10 years ago

Attached patch Bug 776947 - ignore running processes in talos when you are in --develop mode . r=jmaher (deleted) — Details — Splinter Review

Apparently, _TerminateProcess() in ffprocess_win32.py, is returning a value in case of everything.
It was not consistent with ffprocess_linux.py and ffprocess_mac.py
I updated the patch.
I hope this works. We have done a lot of work on this so far.

Attachment #8544367 - Attachment is obsolete: true

Attachment #8545702 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 39

•

10 years ago

Comment on attachment 8545702 [details] [diff] [review]
Bug 776947 - ignore running processes in talos when you are in --develop mode . r=jmaher

Review of attachment 8545702 [details] [diff] [review]:
-----------------------------------------------------------------

This seems to solve the problem!  Thanks for finding and fixing this:)

Attachment #8545702 - Flags: review?(jmaher) → review+

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 40

•

10 years ago

https://hg.mozilla.org/build/talos/rev/f15ed512ff2c

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Updated

•

10 years ago

Assignee: nobody → ankit.goyal90

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.