Closed Bug 1207229 Opened 9 years ago Closed 9 years ago

b2g_bumper causing alerts in #buildduty

Categories

(Release Engineering :: General, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 985767

People

(Reporter: aselagea, Unassigned)


Noticed several Nagios alerts related to an increased load average on buildbot-master66.bb.releng.usw2.mozilla.com:
    [04:40.35] <nagios-releng> Tue 06:40:32 PDT [4084] buildbot-master66.bb.releng.usw2.mozilla.com:load is WARNING: WARNING - load average: 6.08, 10.37, 9.94 (http://m.mozilla.org/load)
Things seem to return to normal soon after that, but alerts of this kind have been triggered quite often over the last few days.
Also spotted alerts like the following (most likely caused by hanging hg processes):
    <nagios-releng> Tue 08:40:23 PDT [4090] buildbot-master66.bb.releng.usw2.mozilla.com:File Age - /builds/b2g_bumper/b2g_bumper.stamp is CRITICAL: FILE_AGE CRITICAL: /builds/b2g_bumper/b2g_bumper.stamp is 2118 seconds old and 0 bytes (http://m.mozilla.org/File+Age+-+/builds/b2g_bumper/b2g_bumper.stamp)
The one above didn't last long, though, so we didn't need to kill that process.
Running "top" on buildbot-master66.bb.releng.usw2.mozilla.com revealed lots of git-remote-http processes:
     PID USER    PR  NI  VIRT  RES   SHR  S  %CPU  %MEM  TIME+    COMMAND
    8795 cltbld  20   0  185m  8864  4192 R   4.9   0.2  0:00.26  git-remote-http
    8801 cltbld  20   0  184m  8160  4192 R   4.6   0.2  0:00.16  git-remote-http
    8789 cltbld  20   0  186m  9.8m  4200 R   4.3   0.3  0:00.28  git-remote-http
    8794 cltbld  20   0  185m  8784  4192 R   4.3   0.2  0:00.24  git-remote-http
    8802 cltbld  20   0  184m  8088  4192 R   4.3   0.2  0:00.15  git-remote-http
    8796 cltbld  20   0  184m  8484  4192 R   4.0   0.2  0:00.24  git-remote-http
    8804 cltbld  20   0  184m  8240  4192 R   4.0   0.2  0:00.13  git-remote-http
    8798 cltbld  20   0  188m  11m   4200 S   3.7   0.3  0:00.22  git-remote-http
    8777 cltbld  20   0  189m  12m   4200 S   3.4   0.3  0:00.31  git-remote-http
    8807 cltbld  20   0  183m  7096  4192 R   3.4   0.2  0:00.11  git-remote-http
    8808 cltbld  20   0  184m  7884  4192 R   3.4   0.2  0:00.11  git-remote-http
    8786 cltbld  20   0  188m  11m   4200 S   2.8   0.3  0:00.27  git-remote-http
    8810 cltbld  20   0  183m  6708  4192 R   2.8   0.2  0:00.09  git-remote-http
    8812 cltbld  20   0  182m  6736  4192 R   2.5   0.2  0:00.08  git-remote-http
    8776 cltbld  20   0  112m  3468  1108 R   1.9   0.1  0:00.06  git
    8005 cltbld  20   0  948m  24m   4184 S   1.5   0.7  0:05.93  python
:pmoore - since you worked on several improvements to this in the past, could you please take a look?
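For anyone wanting to reproduce the File Age check by hand on the master, here is a minimal sketch of a staleness test in the same spirit. The function name and both thresholds are assumptions for illustration; they are not the values configured in Nagios, and this is not the actual plugin.

```python
import os
import time

# Illustrative thresholds only; the real Nagios settings are not given in this bug.
WARN_SECONDS = 1200
CRIT_SECONDS = 1800


def check_file_age(path, warn=WARN_SECONDS, crit=CRIT_SECONDS, now=None):
    """Return (status, message) based on how long ago `path` was modified."""
    now = time.time() if now is None else now
    try:
        st = os.stat(path)
    except OSError as exc:
        return "UNKNOWN", f"FILE_AGE UNKNOWN: cannot stat {path}: {exc}"

    age = int(now - st.st_mtime)  # whole seconds since last modification
    if age >= crit:
        status = "CRITICAL"
    elif age >= warn:
        status = "WARNING"
    else:
        status = "OK"
    return status, f"FILE_AGE {status}: {path} is {age} seconds old and {st.st_size} bytes"


if __name__ == "__main__":
    print(check_file_age("/builds/b2g_bumper/b2g_bumper.stamp")[1])
```

A stamp file that stops being touched would move from OK to WARNING to CRITICAL as it ages, which matches the pattern of alerts that clear on their own once the bumper finishes a run.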
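To track how many of these processes are alive over time, rather than eyeballing "top", something like the following sketch could be run periodically. It assumes a Linux procps-style `ps` that accepts `-eo comm=`; the helper names are made up for this example.

```python
import subprocess
from collections import Counter


def count_commands(ps_output):
    """Count occurrences of each command name, one name per input line."""
    return Counter(line.strip() for line in ps_output.splitlines() if line.strip())


def snapshot():
    """Return a Counter of command names for all running processes."""
    # `comm=` prints only the command name with no header line.
    result = subprocess.run(
        ["ps", "-eo", "comm="],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_commands(result.stdout)


if __name__ == "__main__":
    counts = snapshot()
    # Note: some ps implementations truncate long command names.
    print("git-remote-http:", counts["git-remote-http"])
    print("git:", counts["git"])
```

Logging these counts alongside the load average would show whether the spikes line up with bursts of git-remote-http activity.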
Flags: needinfo?(pmoore)
Unfortunately I'm not working on this stuff any more. @Hal, is this something you'd be able to support Alin with? Otherwise catlee can probably direct you to somebody. Thanks, Pete
Flags: needinfo?(pmoore) → needinfo?(hwine)
Nope -- I've never touched b2g bumper -- catlee knows the most I believe.
Flags: needinfo?(hwine)
catlee: any reason why load would be spiking recently (and repeatedly)? Is this related to release week, i.e. will it go back to normal soon?
Flags: needinfo?(catlee)
nothing that I'm aware of
Flags: needinfo?(catlee)
Raising to blocker since we have a problem here: it seems the last bumper bot push was at around 8am Pacific yesterday, and there has been no push since then. There are changes in b2g/gaia, so I think it's now a real problem.
Severity: normal → blocker
also closing gaia for this problem
The only thing I can see right now is that the bumper is failing to resolve 'b2g-4.4.2_r1' on https://git.mozilla.org/b2g/device_lge_mako-kernel. I can see that that repo doesn't exist, but I'm not sure what changed. Possibly a manifest change?
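A quick way to tell a missing ref apart from a missing or unreachable repo is `git ls-remote --exit-code`, which exits with status 2 when the remote answers but no ref matches. The sketch below wraps that; the function names and the three result labels are assumptions for illustration and are not part of b2g_bumper.

```python
import os
import subprocess


def classify(returncode):
    """Map a `git ls-remote --exit-code` status to a short label."""
    if returncode == 0:
        return "found"
    if returncode == 2:
        return "ref-missing"  # remote answered, but no matching ref
    return "repo-error"  # e.g. repo does not exist or is unreachable


def ref_status(repo_url, ref, timeout=60):
    """Ask the remote whether `ref` resolves, without cloning anything."""
    env = dict(os.environ, GIT_TERMINAL_PROMPT="0")  # never prompt for credentials
    proc = subprocess.run(
        ["git", "ls-remote", "--exit-code", repo_url, ref],
        capture_output=True,
        text=True,
        timeout=timeout,
        env=env,
    )
    return classify(proc.returncode)


if __name__ == "__main__":
    url = "https://git.mozilla.org/b2g/device_lge_mako-kernel"
    print(ref_status(url, "b2g-4.4.2_r1"))
```

Running the same check against each repo and revision named in the manifest would point at which entry changed.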
(In reply to Carsten Book [:Tomcat] from comment #7)
> also closing gaia for this problem
Filed bug 1208024 for the b2g_bumper problem, since it might be a different problem from this one.
Severity: blocker → normal
coop is adjusting nagios in bug 985767.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → DUPLICATE
Component: Tools → General