Closed
Bug 1317189
Opened 8 years ago
Closed 7 years ago
talos --rebuild option stopped working
Categories
(Testing :: General, defect)
Tracking
(firefox55 fixed)
RESOLVED
FIXED
mozilla55
Tracking | Status | |
---|---|---|
firefox55 | --- | fixed |
People
(Reporter: zbraniecki, Assigned: chmanchester)
Attachments
(1 file)
For a couple of months now I've been testing the performance of my branch using the following talos run:
./mach try -b o -p linux64,macosx64,win64 -u none[x64,10.10,Windows\ 8] -t other[x64,10.10,Windows\ 8],other-e10s[x64,10.10,Windows\ 8] --rebuild 20
Historically, this always worked. I got builds like:
- https://treeherder.mozilla.org/#/jobs?repo=try&revision=ab90334d93d8
- https://treeherder.mozilla.org/#/jobs?repo=try&revision=79facb824200
- https://treeherder.mozilla.org/#/jobs?repo=try&revision=248c297a129b
but over the last two days the pushes get 20 rebuilds for linux, but just one for mac and windows:
- https://treeherder.mozilla.org/#/jobs?repo=try&revision=6544a957e60e64fa97a11e293e17af02c1d1fd22
- https://treeherder.mozilla.org/#/jobs?repo=try&revision=7abee73aa6672ef7528ed4d6345138a50239c74c
or, as here, 20 builds for windows, 20 builds for mac e10s, but only 1 for non-e10s mac:
- https://treeherder.mozilla.org/#/jobs?repo=try&revision=d0e752b48e4c499d61f49065ce2c585ec4735d1f
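For readers unfamiliar with try syntax, here is a minimal sketch of how the flags in the command above could be read, including the `--rebuild` count. It is not the in-tree try parser: the function name `parse_try_syntax` and the defaults are assumptions made for illustration, and only the flags that appear in the command above are handled.

```python
import argparse
import shlex


def parse_try_syntax(message):
    """Parse a try-syntax string into an argparse namespace.

    Hypothetical helper, not the real parser; defaults are assumptions.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-b", dest="build_types", default="do")
    parser.add_argument("-p", dest="platforms", default="all")
    parser.add_argument("-u", dest="unittests", default="none")
    parser.add_argument("-t", dest="talos", default="none")
    parser.add_argument("--rebuild", type=int, default=0)
    # shlex.split turns the shell-escaped "Windows\ 8" into "Windows 8".
    args, _unknown = parser.parse_known_args(shlex.split(message))
    return args


args = parse_try_syntax(
    r"-b o -p linux64,macosx64,win64 -u none[x64,10.10,Windows\ 8] "
    r"-t other[x64,10.10,Windows\ 8],other-e10s[x64,10.10,Windows\ 8] "
    r"--rebuild 20"
)
print(args.platforms.split(","))  # ['linux64', 'macosx64', 'win64']
print(args.rebuild)               # 20
```

The point of the sketch is only that a single `--rebuild 20` is expected to apply to every requested platform, which is what the later pushes failed to do.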
Comment 1•8 years ago
Armen, possibly related to bug 1316976?
Reporter
Comment 2•8 years ago
Adding jobs to that build doesn't work either. I tried to add more talos-other jobs and it never happened.
Comment 3•8 years ago
I did respin talos-other today, and it turned out to kick off 1+20 new ones, though there are still pending ones on https://treeherder.mozilla.org/#/jobs?repo=try&author=zbraniecki@mozilla.com
Reporter
Comment 4•8 years ago
More examples:
- https://treeherder.mozilla.org/#/jobs?repo=try&revision=6544a957e60e64fa97a11e293e17af02c1d1fd22 - windows stuck, macos e10s stuck
- https://treeherder.mozilla.org/#/jobs?repo=try&revision=13cbd8a4e42c81516f7a2a3c2887865ad0b1a925 - windows and linux stuck, macos done
etc.
Can we get some help with this? I'm running a lot of perf tests right now and this bug is making it really hard to work.
Comment 5•8 years ago
More recent updates:
Linux stuck, Windows and OSX done:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=584c35d5187b
however, it was able to re-build on Linux a week before:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6c7d834929a76ff701671a9d0474290d188f1132
Comment 6•8 years ago
:bstack, would you be able to help us figure out why this wouldn't be working on linux (i.e. taskcluster) ?
Flags: needinfo?(bstack)
Comment 7•8 years ago
Sorry I've let this languish all day today. Had some other stuff I needed to look into first. Afaict, this isn't related to our recent work in triggering talos from treeherder. This would most likely be an in-tree taskgraph generation issue. I'll look into this a bit and defer to someone more wise in the ways of in-tree stuff if I can't find anything awry.
Assignee: nobody → bstack
Status: NEW → ASSIGNED
Flags: needinfo?(bstack)
Comment 8•8 years ago
I'm at a bit of a loss. I don't think I really have the context here to figure out what's going on. wlach, is this related to the work you're doing now?
Flags: needinfo?(wlachance)
Comment 9•8 years ago
No, this isn't really related to anything I'm doing.
I don't really see why this would be taskcluster related, at least not fully, as apparently the problem goes back 3 months (long before we used buildbotbridge to schedule the linux talos jobs). If the problems were linux-specific and were more recent, :wcosta would be the person I'd ping (he was doing most of the work for linux talos and BBB).
:catlee, do you know who might be able to debug this? They would need to know about buildbot and how try syntax translates into talos jobs being scheduled.
Flags: needinfo?(wlachance) → needinfo?(catlee)
Updated•8 years ago
Assignee: bstack → nobody
Status: ASSIGNED → NEW
Comment 10•8 years ago
I think --rebuild support is something that trigger-bot [1] handles.
Chris, can you help out here?
[1] http://chmanchester.github.io/blog/2015/07/15/automatic-triggering-on-try-server/
Flags: needinfo?(catlee) → needinfo?(cmanchester)
Assignee
Comment 11•8 years ago
It's pretty unclear to me what the issue is here, or which jobs it impacts, so I pushed to try with `--rebuild` for Linux and OS X:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=55dfc7a5b6514581601e5472ea73f880a822cdc3
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a45930154c6a16225bbbe23e9dd7ec7c882f2de9
This is working as expected for buildbot jobs, which are triggered by trigger-bot, and taskcluster jobs, which are triggered by a different mechanism. People reporting this issue refer to jobs being "stuck" -- perhaps this refers to some re-triggered jobs being in pending for an apparently unreasonable amount of time?
Flags: needinfo?(cmanchester)
Comment 12•8 years ago
I think the symptom is more like this: when Talos is pushed for multiple platforms with --rebuild in a single push, tests might get stuck. Pushing with a single platform seems fine.
Assignee
Comment 13•8 years ago
I can't seem to reproduce this; trying again in https://treeherder.mozilla.org/#/jobs?repo=try&revision=123655847133d9f2b757770ad6729effb0753f26
Comment 14•7 years ago
AFAICT this blocks us from evaluating stylo changes on Linux. For example, in a recent try push [1] we saw retriggers for win and mac, but not linux. This seems to imply the feature is broken on taskcluster but not buildbot.
chmanchester or wlach, can you take a closer look at this?
[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=22028266be5e4485a959d44b1619c7e3d3f80dfa
Flags: needinfo?(wlachance)
Flags: needinfo?(cmanchester)
Comment 15•7 years ago
I'm sorry, I don't think I can help (this is even less my area now than it was a few months ago). If Chris doesn't know what's up, I would escalate to :garndt and/or :jmaher.
Updated•7 years ago
Flags: needinfo?(wlachance)
Assignee
Comment 16•7 years ago
I think I figured this out. It's the difference between "--rebuild" and "--rebuild-talos": the former works fine on TC, while the latter, as implemented in bug 1333167, does not seem to work. I think I see the issue.
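To illustrate the kind of failure this describes, here is a minimal sketch, not the in-tree code. It assumes that "--rebuild-talos" is applied by looking for a talos-specific attribute on each task; the function name, the task labels, and every attribute name below are hypothetical. If the lookup consults an attribute the tasks do not carry, nothing matches and no retriggers are requested, with no error raised.

```python
def talos_duplicates(tasks, rebuild_talos, attribute_name):
    """Map task label -> extra copies requested via --rebuild-talos.

    `attribute_name` is the attribute consulted to decide whether a task
    is a talos task. All names here are made up for illustration.
    """
    duplicates = {}
    for label, attributes in tasks.items():
        if rebuild_talos > 0 and attribute_name in attributes:
            duplicates[label] = rebuild_talos
    return duplicates


# Hypothetical tasks with hypothetical attributes.
tasks = {
    "test-linux64/opt-talos-other": {"kind": "test", "talos_name": "other"},
    "test-linux64/opt-mochitest-1": {"kind": "test", "unittest_name": "mochitest"},
}

# Checking an attribute the talos task does not carry silently matches nothing.
print(talos_duplicates(tasks, 20, "talos_suite_flavor"))  # {}

# Checking the attribute the task actually carries requests the retriggers.
print(talos_duplicates(tasks, 20, "talos_name"))  # {'test-linux64/opt-talos-other': 20}
```

Because the wrong lookup fails silently, the push looks healthy apart from the missing retriggers, which would be consistent with the reports above.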
Assignee
Comment 17•7 years ago
Actually, based on the links in comment 0, this bug was filed about "--rebuild", where the issue still doesn't reproduce. I'll repurpose it to fix "--rebuild-talos" unless there are any objections.
Comment hidden (mozreview-request) |
Comment 19•7 years ago
mozreview-review |
Comment on attachment 8866058 [details]
Bug 1317189 - Fix --rebuild-talos for TC try jobs by checking the correct attribute.
https://reviewboard.mozilla.org/r/137654/#review141070
Attachment #8866058 -
Flags: review?(wcosta) → review+
Comment 20•7 years ago
Possibly bug 1352202 is a dup?
Comment 21•7 years ago
Pushed by cmanchester@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d27e83aae737
Fix --rebuild-talos for TC try jobs by checking the correct attribute. r=wcosta
Comment 22•7 years ago
bugherder |
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox55:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55