Closed Bug 1236770 Opened 9 years ago Closed 8 years ago

Fix the increase in frequency of psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe') failures in talos-other on Windows 7 PGO caused by bug 1303096

Categories

(Testing :: Talos, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

RESOLVED FIXED
mozilla54
Tracking Status
firefox-esr45 --- unaffected
firefox51 --- unaffected
firefox52 --- unaffected
firefox-esr52 --- unaffected
firefox53 --- fixed
firefox54 --- fixed

People

(Reporter: philor, Assigned: blassey)

Details

(Whiteboard: [stockwell fixed])

Attachments

(1 file)

Summary: Intermittent TEST-UNEXPECTED-ERROR | damp | psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe') → Intermittent TEST-UNEXPECTED-ERROR | damp,ts_paint | psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe')
Even though it won't do any good, I guess we might as well have the name of the most frequent suite in the summary.
Summary: Intermittent TEST-UNEXPECTED-ERROR | damp,ts_paint | psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe') → Intermittent TEST-UNEXPECTED-ERROR | damp,ts_paint,tps | psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe')
Summary: Intermittent TEST-UNEXPECTED-ERROR | damp,ts_paint,tps | psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe') → Intermittent TEST-UNEXPECTED-ERROR | damp,ts_paint,tps,a11yr | psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe')
Summary: Intermittent TEST-UNEXPECTED-ERROR | damp,ts_paint,tps,a11yr | psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe') → Intermittent TEST-UNEXPECTED-ERROR | damp,ts_paint,tps,a11yr,tresize | psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe')
This bug had a recent uptick and a change of pattern: on Feb 4 we started seeing win7-pgo talos-other-e10s failures. This is new, so I did some retriggers: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=win%20pgo%20talos%20other%20e10s&tochange=e58878766438f80b01d3e3cb9f48aaab373b2923&fromchange=ef2f2b1d477388a54be99288cf0fb3e0490f44a0&selectedJob=74334567 Setting needinfo on myself to follow up on the retriggers.
Flags: needinfo?(jmaher)
Oh, getting closer: down to 3 pushes: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=win%20pgo%20talos%20other%20e10s&tochange=e58878766438f80b01d3e3cb9f48aaab373b2923&fromchange=3e555770a90a41e04bbb4ac41b65fa2f1db6977d Since this is PGO, I need to backfill PGO builds and tests, which will take another 6+ hours to get the data I need. Luckily the pattern is very clear: 100% green vs. 70% green :)
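The retrigger arithmetic behind this kind of bisection can be sketched as follows. Assuming each run on a push that carries the regression fails independently at the roughly 30% rate seen here, a bad push still comes up all green on n retriggers with probability 0.7**n, so a few green runs are weak evidence. The helper name `retriggers_needed` and the 5% threshold are illustrative assumptions, not anything Treeherder provides.

```python
def retriggers_needed(fail_rate: float, alpha: float) -> int:
    """Smallest n such that a push with the given per-run failure rate
    shows n consecutive green runs with probability at most alpha.

    Assumes runs are independent, which is an idealization for intermittents.
    """
    if not 0 < fail_rate < 1 or not 0 < alpha < 1:
        raise ValueError("fail_rate and alpha must be in (0, 1)")
    n = 0
    p_all_green = 1.0  # probability that n runs of a bad push are all green
    while p_all_green > alpha:
        p_all_green *= 1 - fail_rate
        n += 1
    return n


if __name__ == "__main__":
    # With a 30% failure rate, 0.7**8 is about 0.058 but 0.7**9 is about 0.040.
    print(retriggers_needed(0.3, 0.05))  # 9
```

Under these assumptions, about nine green retriggers per push are needed before calling a push clean at the 5% level, which is why a clear 100% vs. 70% split still takes several hours of backfills to pin down.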
In bug 1303096 the changes caused very frequent (30%) failures in Windows PGO Talos tests (as seen by some retriggers): https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=win%20pgo%20talos%20other%20e10s&tochange=e58878766438f80b01d3e3cb9f48aaab373b2923&fromchange=3e555770a90a41e04bbb4ac41b65fa2f1db6977d&selectedJob=75592879 :blassey, I see you are the patch author here. Could you fix this error or back out your patch sometime in the next week or two?
Depends on: 1303096
Flags: needinfo?(jmaher) → needinfo?(blassey.bugs)
Attached patch talos_term.patch (deleted)
This looks like a race between the is_running() check and the call to terminate(). Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ddb09114c08a5b1987c0191ea05717a8cbf33618
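The attached patch has been deleted, so the following is only a sketch of the race described above and of the usual fix, not the patch itself. `psutil.Process.is_running()`, `terminate()` and `psutil.NoSuchProcess` are real psutil APIs; the function names and the `ExitsAfterCheck` fake process are hypothetical, and a stand-in exception is defined so the sketch runs without psutil installed.

```python
"""Sketch of the is_running()/terminate() race and an EAFP-style fix."""
try:
    from psutil import NoSuchProcess
except ImportError:
    # Stand-in (assumption) so the sketch runs without psutil installed.
    class NoSuchProcess(Exception):
        pass


def terminate_racy(proc):
    # Check-then-act: the process may exit after is_running() returns True
    # but before terminate() runs, and terminate() then raises NoSuchProcess.
    if proc.is_running():
        proc.terminate()


def terminate_tolerant(proc):
    """Terminate proc; return False if it had already exited."""
    try:
        proc.terminate()
    except NoSuchProcess:
        # The process exiting on its own is the outcome we wanted anyway.
        return False
    return True


class ExitsAfterCheck:
    """Hypothetical fake process that exits right after is_running()."""

    def __init__(self, pid=3740):
        self.pid = pid
        self.alive = True

    def is_running(self):
        running = self.alive
        self.alive = False  # simulate the process exiting in the race window
        return running

    def terminate(self):
        if not self.alive:
            raise NoSuchProcess(self.pid)
        self.alive = False


if __name__ == "__main__":
    try:
        terminate_racy(ExitsAfterCheck())
    except NoSuchProcess:
        print("racy version raised NoSuchProcess")
    proc = ExitsAfterCheck()
    proc.is_running()  # the process "exits" here
    print(terminate_tolerant(proc))  # False
```

Catching the exception instead of checking first closes the window entirely: there is no point between check and action where the process can disappear unobserved.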
Assignee: nobody → blassey.bugs
Flags: needinfo?(blassey.bugs)
Attachment #8835822 - Flags: review?(jmaher)
Comment on attachment 8835822 talos_term.patch
Review of attachment 8835822:

I am fine giving this a try. I am not sure whether the try push is using PGO builds for the tests; we have a confusing story on try for PGO. Overall, this looks like we are catching the right exception and doing the right thing to clean this up.
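"Doing the right thing to clean this up" usually means more than swallowing one exception. The following is a minimal sketch, assuming psutil-style process objects, of a terminate, wait, then kill sequence that treats a process vanishing at any step as success. `terminate()`, `wait(timeout=...)`, `kill()`, `psutil.NoSuchProcess` and `psutil.TimeoutExpired` are real psutil names; `shut_down` and its return values are hypothetical, and stand-in exceptions are defined so the sketch runs without psutil.

```python
"""Sketch: terminate, wait, then kill, treating "already gone" as success."""
try:
    from psutil import NoSuchProcess, TimeoutExpired
except ImportError:
    # Stand-ins (assumption) so the sketch runs without psutil installed.
    class NoSuchProcess(Exception):
        pass

    class TimeoutExpired(Exception):
        pass


def shut_down(proc, timeout=5.0):
    """Stop proc and report how: "gone", "terminated" or "killed".

    proc is anything with psutil.Process-style terminate(), wait(timeout)
    and kill() methods.
    """
    try:
        proc.terminate()
    except NoSuchProcess:
        return "gone"  # it exited before we asked; nothing to clean up
    try:
        # psutil's wait() returns immediately if the process already exited.
        proc.wait(timeout=timeout)
        return "terminated"
    except TimeoutExpired:
        pass  # it ignored the polite request, so escalate
    try:
        proc.kill()
    except NoSuchProcess:
        return "terminated"  # it exited right as the timeout expired
    proc.wait(timeout=timeout)
    return "killed"
```

With psutil installed, a caller would pass a real `psutil.Process(pid)`; for a list of processes, `psutil.wait_procs()` offers a similar terminate-then-wait pattern.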
Attachment #8835822 - Flags: review?(jmaher) → review+
No, this isn't using PGO for the tests. If anyone knows how to do that, I'm all ears.
I follow this: https://wiki.mozilla.org/ReleaseEngineering/TryChooser#What_if_I_want_PGO_for_my_build It builds a win7-opt build according to the display on Treeherder, but it really is a PGO build; then we run tests on that build.
Pushed by blassey@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/f00030db1ddd Intermittent TEST-UNEXPECTED-ERROR | damp,ts_paint,tps,a11yr,tresize | psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe') r=jmaher
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla54 → ---
(In reply to Joel Maher ( :jmaher) from comment #38)
> I follow this:
> https://wiki.mozilla.org/ReleaseEngineering/TryChooser#What_if_I_want_PGO_for_my_build
>
> it builds a win7-opt build according to the display on treeherder, but it
> really is a pgo build, then we run tests on that build.

So that's what I've done (please correct me if I'm wrong), and it's coming up green. https://treeherder.mozilla.org/#/jobs?repo=try&revision=e75f4e6a3d273efaf56335b338840001e6c2c71c
Flags: needinfo?(blassey.bugs)
The patch looks correct; odd that the build was so fast.
:blassey, do you have a plan here, or should we back out the original patch?
Flags: needinfo?(blassey.bugs)
(In reply to Joel Maher ( :jmaher) from comment #47)
> :blassey, do you have a plan here, or should we back out the original patch?

I don't have a plan, because, as I said in comment 44, the try run done with the posted instructions is green (the one red is from a psutil.NoSuchProcess exception thrown in different code for a different test, which perhaps needs the same treatment). Either the try runs show this is fixed, or try needs to be fixed to reproduce this problem. Bottom line: don't back out.
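The comment does not say which code path raised the other psutil.NoSuchProcess, so the following is purely illustrative: it assumes a hypothetical helper that walks a process tree and sums resident memory, and shows the same "tolerate a vanished process" treatment applied there. `memory_info().rss`, `Process.children(recursive=True)` and `psutil.NoSuchProcess` are real psutil APIs; `total_rss` is hypothetical, and a stand-in exception keeps the sketch runnable without psutil.

```python
"""Sketch: the same NoSuchProcess tolerance applied to a process-tree walk."""
from contextlib import suppress

try:
    from psutil import NoSuchProcess
except ImportError:
    # Stand-in (assumption) so the sketch runs without psutil installed.
    class NoSuchProcess(Exception):
        pass


def total_rss(procs):
    """Sum resident memory over procs, skipping any that exit mid-walk."""
    total = 0
    for proc in procs:
        # A child can exit between being listed and being queried.
        with suppress(NoSuchProcess):
            total += proc.memory_info().rss
    return total
```

A caller might pass `[parent] + parent.children(recursive=True)` for `parent = psutil.Process(pid)`; note that `children()` itself can raise NoSuchProcess if the parent is already gone, so that call would need the same guard.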
Flags: needinfo?(blassey.bugs)
(In reply to Sebastian Hengst [:aryx][:archaeopteryx] (needinfo on intermittent or backout) from comment #42)
> Unfortunately, this still hits:
> https://treeherder.mozilla.org/logviewer.html#?job_id=76564933&repo=mozilla-inbound

Actually, now that I look at it, this is the same psutil.NoSuchProcess exception, thrown in different code for a different test. Sebastian, I think you wrongly backed out; please reland.
Flags: needinfo?(aryx.bugmail)
Blocks: 1339594
We don't have any workable system for "we had an intermittent, something caused it to happen a whole lot more, that separate thing was fixed," so I cloned this to bug 1339594 so that can be the intermittent-failure bug for the fact that this failure will continue whether or not the patch here successfully fixes one cause of it.
Summary: Intermittent TEST-UNEXPECTED-ERROR | damp,ts_paint,tps,a11yr,tresize | psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe') → Fix one cause of psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe') failures in Talos
(In reply to Brad Lassey [:blassey] (use needinfo?) from comment #49)
> Actually, now that I look at it, this is that same psutil.NoSuchProcess
> exception is thrown in different code for a different test. Sebastian, I
> think you wrongly backed out, please reland.

Nothing got backed out; the needinfo was just a request to check whether the patch does what it should.
Flags: needinfo?(aryx.bugmail)
I don't know how to evaluate whether the one failure that I would call "this bug" in https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=fc9f6f7e8e86f00af60ff1ecc72eaf854a6b1ddd&filter-searchStr=ee21f8436d5df9f41883c24adff2c6bb51fb6bc5&group_state=expanded&selectedJob=77421724 is something this bug was meant to fix, but the failure rate is certainly below 30%, so let's just call it fixed on that basis and call that what it fixed.
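To put a number on "certainly below the 30% failure rate", a binomial tail gives the chance of seeing at most k failures in n runs if the rate were still 30%. The comment does not state how many runs the linked push had, so the 1-in-20 figure below is hypothetical, and runs are again assumed independent. `prob_at_most` is a hypothetical helper built on the standard library's `math.comb`.

```python
"""Sketch: how surprising is a low failure count if the rate were still 30%?"""
from math import comb


def prob_at_most(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p): chance of k or fewer failures in n runs."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


if __name__ == "__main__":
    # Hypothetical: 1 failure observed in 20 runs. Under a 30% failure rate
    # this outcome has probability well under 1%.
    print(prob_at_most(1, 20, 0.3))
```

A small tail probability like this supports the judgment that the regression from bug 1303096 is gone, while the residual failures are consistent with the older, lower-frequency intermittent.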
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED
Summary: Fix one cause of psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe') failures in Talos → Fix the increase in frequency of psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe') failures in talos-other on Windows 7 PGO caused by bug 1303096
Target Milestone: --- → mozilla54
Whiteboard: [stockwell fixed]