Closed
Bug 1236770
Opened 9 years ago
Closed 8 years ago
Fix the increase in frequency of psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe') failures in talos-other on Windows 7 PGO caused by bug 1303096
Categories
(Testing :: Talos, defect)
Tracking
(firefox-esr45 unaffected, firefox51 unaffected, firefox52 unaffected, firefox-esr52 unaffected, firefox53 fixed, firefox54 fixed)
RESOLVED
FIXED
mozilla54
Version | Tracking | Status |
---|---|---|
firefox-esr45 | --- | unaffected |
firefox51 | --- | unaffected |
firefox52 | --- | unaffected |
firefox-esr52 | --- | unaffected |
firefox53 | --- | fixed |
firefox54 | --- | fixed |
People
(Reporter: philor, Assigned: blassey)
References
Details
(Whiteboard: [stockwell fixed])
Attachments
(1 file)
(deleted), patch | jmaher: review+
Reporter | ||
Updated•9 years ago
|
Summary: Intermittent TEST-UNEXPECTED-ERROR | damp | psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe') → Intermittent TEST-UNEXPECTED-ERROR | damp,ts_paint | psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe')
Reporter | ||
Comment 3•9 years ago
|
||
Even though it won't do any good, I guess we might as well have the name of the most frequent suite in the summary.
Summary: Intermittent TEST-UNEXPECTED-ERROR | damp,ts_paint | psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe') → Intermittent TEST-UNEXPECTED-ERROR | damp,ts_paint,tps | psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe')
Reporter | ||
Updated•8 years ago
|
Summary: Intermittent TEST-UNEXPECTED-ERROR | damp,ts_paint,tps | psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe') → Intermittent TEST-UNEXPECTED-ERROR | damp,ts_paint,tps,a11yr | psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe')
Reporter | ||
Updated•8 years ago
|
Summary: Intermittent TEST-UNEXPECTED-ERROR | damp,ts_paint,tps,a11yr | psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe') → Intermittent TEST-UNEXPECTED-ERROR | damp,ts_paint,tps,a11yr,tresize | psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe')
Comment 31•8 years ago
|
||
This bug had a recent uptick and a change of pattern. On Feb 4 we started seeing win7-pgo talos-other-e10s failures. This is new, so I did some retriggers:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=win%20pgo%20talos%20other%20e10s&tochange=e58878766438f80b01d3e3cb9f48aaab373b2923&fromchange=ef2f2b1d477388a54be99288cf0fb3e0490f44a0&selectedJob=74334567
Setting needinfo on myself to follow up on the retriggers.
Flags: needinfo?(jmaher)
Comment 32•8 years ago
|
||
oh, getting closer, down to 3 pushes:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=win%20pgo%20talos%20other%20e10s&tochange=e58878766438f80b01d3e3cb9f48aaab373b2923&fromchange=3e555770a90a41e04bbb4ac41b65fa2f1db6977d
Since this is PGO, I need to backfill PGO builds and tests, which will take another 6+ hours to get the data I need. Luckily this pattern is very clear: 100% green vs 70% green :)
Comment 33•8 years ago
|
||
The changes in bug 1303096 caused very frequent (30%) failures in Windows PGO talos tests (as seen in some retriggers):
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=win%20pgo%20talos%20other%20e10s&tochange=e58878766438f80b01d3e3cb9f48aaab373b2923&fromchange=3e555770a90a41e04bbb4ac41b65fa2f1db6977d&selectedJob=75592879
:blassey, I see you are the patch author here. Could you fix this error or back out your patch sometime in the next week or two?
Depends on: 1303096
Flags: needinfo?(jmaher) → needinfo?(blassey.bugs)
Assignee | ||
Comment 35•8 years ago
|
||
This looks like a race between the is_running() check and the call to terminate().
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ddb09114c08a5b1987c0191ea05717a8cbf33618
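To illustrate the kind of race described here, a minimal Python sketch follows. It is not the attached patch (which is deleted and not shown on this page). It assumes a psutil-style process object that exposes is_running() and terminate(); the helper name terminate_quietly and the fallback exception class are hypothetical and exist only so the sketch runs without psutil installed.

```python
try:
    import psutil
    NoSuchProcess = psutil.NoSuchProcess
except ImportError:
    # Hypothetical stand-in so the sketch runs without psutil installed.
    class NoSuchProcess(Exception):
        pass


def terminate_quietly(process):
    """Terminate `process`, tolerating the case where it has already exited.

    Returns True if terminate() was delivered, False if the process was gone.
    """
    try:
        if not process.is_running():
            return False
        # The process can still exit between the check above and this call,
        # so the check alone does not make terminate() safe.
        process.terminate()
        return True
    except NoSuchProcess:
        # The process vanished inside the race window; nothing left to clean up.
        return False
```

The is_running() check is kept only as a fast path. The except clause is what closes the window, since no check made before terminate() can guarantee the process is still alive when the call happens.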
Assignee: nobody → blassey.bugs
Flags: needinfo?(blassey.bugs)
Attachment #8835822 -
Flags: review?(jmaher)
Comment 36•8 years ago
|
||
Comment on attachment 8835822 [details] [diff] [review]
talos_term.patch
Review of attachment 8835822 [details] [diff] [review]:
-----------------------------------------------------------------
I am fine giving this a try. I am not sure if the try push is using PGO builds for the tests; we have a confusing story on try for PGO.
Overall this looks like we are catching the right exception and doing the right thing to clean this up.
Attachment #8835822 -
Flags: review?(jmaher) → review+
Assignee | ||
Comment 37•8 years ago
|
||
No, this isn't using PGO for the tests. If anyone knows how to do that, I'm all ears.
Comment 38•8 years ago
|
||
I follow this:
https://wiki.mozilla.org/ReleaseEngineering/TryChooser#What_if_I_want_PGO_for_my_build
Treeherder displays it as a win7-opt build, but it really is a PGO build, and we then run tests on that build.
Comment 39•8 years ago
|
||
Pushed by blassey@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f00030db1ddd
Intermittent TEST-UNEXPECTED-ERROR | damp,ts_paint,tps,a11yr,tresize | psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe') r=jmaher
Comment 40•8 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox54:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Comment 42•8 years ago
|
||
Unfortunately, this still hits: https://treeherder.mozilla.org/logviewer.html#?job_id=76564933&repo=mozilla-inbound
Flags: needinfo?(blassey.bugs)
Updated•8 years ago
|
Status: RESOLVED → REOPENED
status-firefox46:
affected → ---
status-firefox52:
--- → affected
status-firefox53:
--- → affected
Resolution: FIXED → ---
Target Milestone: mozilla54 → ---
Assignee | ||
Comment 44•8 years ago
|
||
(In reply to Joel Maher ( :jmaher) from comment #38)
> I follow this:
> https://wiki.mozilla.org/ReleaseEngineering/TryChooser#What_if_I_want_PGO_for_my_build
>
> it builds a win7-opt build according to the display on treeherder, but it
> really is a pgo build, then we run tests on that build.
So that's what I've done (please correct me if I'm wrong) and it's coming up green. https://treeherder.mozilla.org/#/jobs?repo=try&revision=e75f4e6a3d273efaf56335b338840001e6c2c71c
Flags: needinfo?(blassey.bugs)
Comment 45•8 years ago
|
||
The patch looks correct; it's odd that the build was so fast.
Comment 47•8 years ago
|
||
:blassey, do you have a plan here, or should we back out the original patch?
Flags: needinfo?(blassey.bugs)
Assignee | ||
Comment 48•8 years ago
|
||
(In reply to Joel Maher ( :jmaher) from comment #47)
> :blassey, do you have a plan here, or should we back out the original patch?
I don't have a plan because, as I said in comment 44, the try run done with the posted instructions is green (the one red job is from a psutil.NoSuchProcess exception thrown in different code for a different test, which perhaps needs the same treatment). Either the try runs show this is fixed, or try needs to be fixed to reproduce this problem.
Bottom line, don't back out.
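If other call sites do need "the same treatment", one way to apply it uniformly is a small wrapper around the standard library's contextlib.suppress. This is only a sketch under assumptions: the helper name call_ignoring_exit is hypothetical, it is not taken from the Talos code, and the fallback exception class exists only so the example runs without psutil installed.

```python
import contextlib

try:
    from psutil import NoSuchProcess
except ImportError:
    # Hypothetical stand-in so the sketch runs without psutil installed.
    class NoSuchProcess(Exception):
        pass


def call_ignoring_exit(func, *args, default=None, **kwargs):
    """Call `func`, returning `default` if the target process already exited."""
    with contextlib.suppress(NoSuchProcess):
        return func(*args, **kwargs)
    # Reached only when NoSuchProcess was raised and suppressed above.
    return default
```

A caller could then write something like `call_ignoring_exit(process.terminate)` at each site, rather than repeating the try/except block. Other exceptions still propagate, so unrelated failures are not hidden.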
Flags: needinfo?(blassey.bugs)
Assignee | ||
Comment 49•8 years ago
|
||
(In reply to Sebastian Hengst [:aryx][:archaeopteryx] (needinfo on intermittent or backout) from comment #42)
> Unfortunately, this still hits:
> https://treeherder.mozilla.org/logviewer.html#?job_id=76564933&repo=mozilla-inbound
Actually, now that I look at it, this is the same psutil.NoSuchProcess exception thrown in different code for a different test. Sebastian, I think you wrongly backed out, please reland.
Flags: needinfo?(aryx.bugmail)
Reporter | ||
Comment 50•8 years ago
|
||
We don't have any workable system for "we had an intermittent, something caused it to happen a whole lot more, and that separate thing was fixed," so I cloned this to bug 1339594. That bug can be the intermittent-failure bug, since this failure will continue whether or not the patch here successfully fixes one cause of it.
Keywords: intermittent-failure
Summary: Intermittent TEST-UNEXPECTED-ERROR | damp,ts_paint,tps,a11yr,tresize | psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe') → Fix one cause of psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe') failures in Talos
Comment 51•8 years ago
|
||
(In reply to Brad Lassey [:blassey] (use needinfo?) from comment #49)
> Actually, now that I look at it, this is that same psutil.NoSuchProcess
> exception is thrown in different code for a different test. Sebastian, I
> think you wrongly backed out, please reland.
Nothing got backed out, the needinfo was just a request to check if the patch does what it should.
Flags: needinfo?(aryx.bugmail)
Reporter | ||
Comment 52•8 years ago
|
||
I don't know how to evaluate whether the one failure that I would call "this bug" in https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=fc9f6f7e8e86f00af60ff1ecc72eaf854a6b1ddd&filter-searchStr=ee21f8436d5df9f41883c24adff2c6bb51fb6bc5&group_state=expanded&selectedJob=77421724 is what this bug turned into and decided to fix, but it's certainly below the 30% failure rate, so let's just call it fixed based on that, and say that's what it fixed.
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
status-firefox51:
--- → unaffected
status-firefox-esr45:
--- → unaffected
status-firefox-esr52:
--- → unaffected
Resolution: --- → FIXED
Summary: Fix one cause of psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe') failures in Talos → Fix the increase in frequency of psutil.NoSuchProcess process no longer exists (pid=3740, name=u'firefox.exe') failures in talos-other on Windows 7 PGO caused by bug 1303096
Target Milestone: --- → mozilla54
Updated•8 years ago
|
Whiteboard: [stockwell fixed]
Updated•8 years ago
|
Comment 54•8 years ago
|
||
bugherder uplift |
Flags: in-testsuite+