Closed
Bug 1321513
Opened 8 years ago
Closed 8 years ago
Add nagios checks for pushapk scriptworkers
Categories
(Release Engineering :: General, defect)
Release Engineering
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jlorenzo, Assigned: jlorenzo)
References
Details
Attachments
(2 files, 2 obsolete files)
(deleted),
patch
|
arich
:
review+
jlorenzo
:
checked-in+
|
Details | Diff | Splinter Review |
(deleted),
text/x-review-board-request
|
mozilla
:
review+
jlorenzo
:
checked-in+
|
Details |
Nagios config files have been created on the nodes sides[1] (added in bug 1307826). Pushapkworker doesn't rely on the scriptworker shared module yet (introduced in bug 1316702). If I understand bug 1317783 comment 3 correctly, this may not happen immediately (the scriptworker module enables chain of trust AFAUI).
Therefore, I believe we can turn on nagios just like Aki did in bug 1295196
[1] https://hg.mozilla.org/build/puppet/file/8fbdd01cb464/modules/pushapkworker/templates/nagios.cfg.erb
Assignee | ||
Comment 1•8 years ago
|
||
Things have changed since bug 1295196 comment 1, no more SVN repo is required. Releng folks have access to the git repos listed at [1] (and more precisely, the nagios part of it)
[1] https://mana.mozilla.org/wiki/display/SYSADMIN/Git
Assignee | ||
Comment 2•8 years ago
|
||
The patch is totally similar to bug 1295196. Before asking for a formal review to the folks in #moc, I have a question for you, Aki. I'm not sure I need to add the resource "pushapk-scriptworker-procs", because "scriptworker-procs" already exist. Should I create it anyway, or is it better to just add a hostgroup to "scriptworker-procs"?
Attachment #8816127 -
Flags: feedback?(aki)
Comment 3•8 years ago
|
||
Comment on attachment 8816127 [details] [diff] [review]
add_pushapk_scriptworker_in_nagios.diff
The current idea is to have each scriptworker instance use scriptworker-procs, rather than have one *-procs check per scriptworker instance type, especially since I believe they'll look exactly the same other than the name. If there isn't a strong reason to add another check, let's reuse the check.
Attachment #8816127 -
Flags: feedback?(aki) → feedback+
Assignee | ||
Comment 4•8 years ago
|
||
Here's a new revision where pushapk-scriptworkers is now listed under "scriptworker-procs" instead of a newly created entry. Aki, do you have the rights to review and merge my patch? Or should I ask Amy?
Attachment #8816127 -
Attachment is obsolete: true
Attachment #8823219 -
Flags: review?(aki)
Comment hidden (off-topic) |
Assignee | ||
Comment 6•8 years ago
|
||
Oops, I should've added --no-color when generating the diff.
Attachment #8823219 -
Attachment is obsolete: true
Attachment #8823219 -
Flags: review?(aki)
Attachment #8823221 -
Flags: review?(aki)
Assignee | ||
Comment 7•8 years ago
|
||
For the record, the patch has also been rebased on top of the latest master.
Comment 8•8 years ago
|
||
Comment on attachment 8823221 [details] [diff] [review]
add_pushapk_scriptworker_in_nagios_rev3.diff
I think :arr is a better reviewer here.
Attachment #8823221 -
Flags: review?(aki) → review?(arich)
Comment 9•8 years ago
|
||
Comment on attachment 8823221 [details] [diff] [review]
add_pushapk_scriptworker_in_nagios_rev3.diff
This looks fine to me. I presume that /builds/scriptworker/bin/scriptworker already exists on the client and that the nagios config has been modified in releng puppet to include the nrpe check?
Attachment #8823221 -
Flags: review?(arich) → review+
Assignee | ||
Comment 10•8 years ago
|
||
The releng had been modified to support the nrpe check[1]. However, thanks for calling out `/builds/scriptworker`!
I noticed pushapk-scriptworker and signing-scriptworker don't share the same root[2]. Therefore, something needs to be changed:
a) either on the releng puppet; we make the root pushapk at /build/scriptworker
b) or on the current nagios config; we extend the regex to support other roots.
I see drawbacks in both scenarios:
a) if you're logged on a machine, you may not know what type of scriptworker instance you have, unless you dig into its configuration
b) the regex might become a nightmare after several instances being added.
On the other had, having consistency across scriptworker instances is better, and will help setting up new ones. Moreover, if you're logged on a machine, there's great probability its FQDN refers to what type of scriptworker instance you're dealing with. Then, I'll change the puppet configuration.
[1] https://hg.mozilla.org/build/puppet/file/3e2076f7aedc/modules/pushapkworker/manifests/init.pp#l63
[2] Pushapk being defined as: https://hg.mozilla.org/build/puppet/file/3e2076f7aedc/manifests/moco-config.pp#l453
Signing-scriptworker as: https://hg.mozilla.org/build/puppet/file/3e2076f7aedc/modules/signing_scriptworker/manifests/settings.pp#l2
Comment hidden (mozreview-request) |
Comment 12•8 years ago
|
||
mozreview-review |
Comment on attachment 8823758 [details]
Bug 1321513 - Make pushapkworker live under /build/scriptworker
https://reviewboard.mozilla.org/r/102242/#review102620
At some point, we should get pushapkworker using modules/scriptworker, and we can make sure all workers using modules/scriptworker have a /builds/scriptworker root.
I think this works until that point.
Attachment #8823758 -
Flags: review?(aki) → review+
Comment hidden (mozreview-request) |
Assignee | ||
Comment 14•8 years ago
|
||
Even though the puppet run was passing yesterday, a nit slipped through the patch. I updated it.
The latest revision of that patch passed lint tests there: https://github.com/mozilla/build-puppet/pull/24
I tested the changes against dev-linux64-ec2-jlorenzo.dev.releng.use1.m.c:
* /build/pushapkworker doesn't exist anymore
* /build/scriptworker does and contains all the files
* `supervisorctl status pushapkworker` says the process is running
* /builds/scriptworker/logs/worker.log shows the process does what it needs to do (polling TC's queue)
I think we're good to merge and deploy attachment 8823758 [details] (which is necessary to land attachment 8823221 [details] [diff] [review]).
Assignee | ||
Comment 15•8 years ago
|
||
Landed on default [1] and merged in production [2]. Puppet agent applied the changes restarted the process (which restarted properly)[3].
On the production machine, I see now /builds/scriptworker/ but not /build/pushapkworker/ anymore. This complies with [4].
[1] https://hg.mozilla.org/build/puppet/rev/1a429ca163d9
[2] https://hg.mozilla.org/build/puppet/rev/b4d2f4f777e4
[3] https://papertrailapp.com/systems/526654063/events?focus=753674615796166664
[4] https://papertrailapp.com/events?q=program%3Apuppet-agent&focus=753673798523449394
Assignee | ||
Updated•8 years ago
|
Attachment #8823758 -
Flags: checked-in+
Assignee | ||
Comment 16•8 years ago
|
||
Comment on attachment 8823221 [details] [diff] [review]
add_pushapk_scriptworker_in_nagios_rev3.diff
This patch was landed as revision 568ad3460d032bf498f7bf0e9239938244f64784 on master in the git repo (linked in comment 1).
Attachment #8823221 -
Flags: checked-in+
Assignee | ||
Comment 17•8 years ago
|
||
I connected to the releng nagios instance[1] and looked for pushapkwoker. The entry there shows:
* 7 checks (PING + the 6 new checks added with attachment 8823221 [details] [diff] [review]).
* /builds/scriptworker/bin/scriptworker is declared as running. I can tell it's also working, because this job[2] has just finished and it ran fine.
* Every check is green.
Looks like we're all set. Time to close this bug! Thanks for the help Ami and Aki!
[1] Listed at https://mana.mozilla.org/wiki/display/SYSADMIN/Nagios#Nagios-Hardware
[2] https://tools.taskcluster.net/task-inspector/#Tmlx52SQT9uwVhi9oEqagg/
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 18•8 years ago
|
||
(In reply to Aki Sasaki [:aki] from comment #12)
> At some point, we should get pushapkworker using modules/scriptworker, and
> we can make sure all workers using modules/scriptworker have a
> /builds/scriptworker root.
I agree. I looked more into using this module. I get the feeling I need a greater version than 0.7.1. That is to say, Chain of Trust has to be enabled. Hence, I guess the module can be added alongside bug 1317783.
Updated•7 years ago
|
Component: General Automation → General
You need to log in
before you can comment on or make changes to this bug.
Description
•