Closed
Bug 930383
Opened 11 years ago
Closed 11 years ago
tbpl not successfully updating, all trees closed
Categories
(Infrastructure & Operations Graveyard :: WebOps: Other, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: philor, Assigned: fox2mike)
Both tbpl and tbpl-dev.allizom are failing to process completed builds and tests: https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=592f54e43014 should show several hundred green (and orange and red) letters, but it only shows a dozen (and dropping) grey letters for the jobs that are still running (which are loaded client-side, in the browser, from a different source).
The processing is done by a cronjob running dataimport/import-buildbot-data.py, which might be wedged (though it seems odd that it would be wedged on both tbpl and tbpl-dev at the same time), or they might not be able to fetch http://builddata.pub.build.mozilla.org/buildjson/builds-4hr.js.gz where their data comes from, or... db troubles? Not sure.
At any rate, all trees are closed as a result.
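One of the possibilities above (the importer not being able to fetch its source file) can be checked by hand. A minimal sketch, assuming only that builds-4hr.js.gz is gzip-compressed JSON; decode_feed and check_feed are hypothetical helper names and are not part of tbpl:

```python
import gzip
import json
import urllib.request


def decode_feed(raw: bytes):
    """Gunzip and JSON-decode a payload; raises on a corrupt or truncated file."""
    return json.loads(gzip.decompress(raw).decode("utf-8"))


def check_feed(url: str, timeout_s: float = 30.0) -> str:
    """Return a one-line status for the feed at `url` instead of raising."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            raw = resp.read()
        data = decode_feed(raw)
    except Exception as exc:  # bad URL, network error, bad gzip, bad JSON
        return f"FAIL: {type(exc).__name__}: {exc}"
    return f"OK: {len(raw)} compressed bytes, top-level {type(data).__name__}"
```

Calling check_feed("http://builddata.pub.build.mozilla.org/buildjson/builds-4hr.js.gz") from the webhead would separate "cannot fetch or decode the file" from the other two guesses (wedged cron, db troubles), which this sketch does not cover.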
Assignee
Updated•11 years ago
Assignee: nobody → shyam
Assignee
Comment 1•11 years ago
So my ssh keys aren't on api-dev.community.scl3.mozilla.com; I'm trying to get access to the machine to see what the issue is.
Assignee
Comment 2•11 years ago
The machine is out of space. This is not a supported production machine and has no monitoring on it.
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/unused--10--4--11--236-root
                       14G   14G     0 100% /
Working on freeing up space.
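Since the machine has no monitoring, a check like the following could be run from cron to catch a full disk earlier. This is only a sketch: the 90% threshold, the function names, and the idea of running it from cron are assumptions, not anything that exists on this host.

```python
import shutil


def percent_used(total: int, used: int) -> float:
    """Used space as a percentage of total (0.0 for an empty filesystem)."""
    return 100.0 * used / total if total else 0.0


def disk_alert(path: str = "/", threshold_pct: float = 90.0):
    """Return a warning string if `path` is at or above the threshold, else None.

    Note: this is used/total, so it can differ slightly from df's Use%
    column, which excludes blocks reserved for root.
    """
    usage = shutil.disk_usage(path)
    pct = percent_used(usage.total, usage.used)
    if pct >= threshold_pct:
        return f"{path} is {pct:.0f}% full ({usage.free} bytes free)"
    return None
```

A cron wrapper would print the returned string (cron mails any output) and stay silent when the function returns None.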
Assignee
Comment 3•11 years ago
Deleted a few old logs, this machine now has
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/unused--10--4--11--236-root
                       14G  8.2G  5.2G  62% /
Fired apache back up and seems like it's back online.
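For finding candidates the next time this happens, here is a dry-run sketch that lists old log files without deleting anything. The ".log" suffix and the age cutoff are assumptions; the comment above does not say which logs were removed or where they lived.

```python
import os
import stat
import time


def old_files(directory: str, max_age_days: float, suffix: str = ".log"):
    """Yield (path, size_bytes) for regular files older than the cutoff."""
    cutoff = time.time() - max_age_days * 86400
    for root, _dirs, names in os.walk(directory):
        for name in names:
            if not name.endswith(suffix):
                continue
            path = os.path.join(root, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_mtime < cutoff:
                yield path, st.st_size
```

Summing the sizes from something like old_files("/var/log", 30) gives an idea of how much space a cleanup would recover before anything is removed.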
Assignee
Comment 4•11 years ago
And tbpl's import crons were hosed, had to kill the existing ones and fire off new ones on both dev and prod.
Comment 5•11 years ago
verified prod and dev tbpl are now showing results so far
Comment 6•11 years ago
I'm not sure why bzapi being down meant the TBPL crons got stuck. The backend bzapi calls (as opposed to the UI tooltip meta calls) are only done after data import, by https://hg.mozilla.org/webtools/tbpl/file/16e6a0cc29c0/php/inc/AnnotatedSummaryGenerator.php#l200 , via https://hg.mozilla.org/webtools/tbpl/file/16e6a0cc29c0/php/getLogExcerpt.php . These are done by workers spawned from https://hg.mozilla.org/webtools/tbpl/file/16e6a0cc29c0/dataimport/import-buildbot-data.py#l363
However, after 60s we should hit the timeout at https://hg.mozilla.org/webtools/tbpl/file/16e6a0cc29c0/php/inc/ParallelLogGenerating.php#l37 aborting that getLogExcerpt.php call, so we shouldn't end up with a backlog of workers. (And even if we did, the new jobs are inserted before spawning the workers, so we'd at least see the new jobs on TBPL, unless we starved the webhead of resources).
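To illustrate the behaviour expected here (a worker call that is abandoned after 60s rather than piling up), a minimal Python sketch of a subprocess timeout follows. It is not the TBPL code, which does this in PHP in ParallelLogGenerating.php; run_with_timeout is a hypothetical name.

```python
import subprocess


def run_with_timeout(cmd, timeout_s=60.0):
    """Run `cmd`; return (finished, stdout).

    subprocess.run kills the direct child when the timeout expires.
    Processes that the child itself spawned are not killed.
    """
    try:
        done = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return False, ""
    return True, done.stdout
```

If the real workers hang in a way that survives the timeout (for example on a grandchild process or a blocked write to a full disk), a backlog could still build up, which would be one thing to look at.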
Assignee
Comment 7•11 years ago
Maybe someone else can look into that.
Assignee: shyam → nobody
Severity: blocker → normal
Updated•11 years ago
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 8•11 years ago
(In reply to Shyam Mani [:fox2mike] from comment #7)
> Maybe someone else can look into that.
Filed bug 930945.
Comment 9•11 years ago
Do we also need to get bzapi moved to a production machine? Or will enough of tbpl work that trees can remain open?
Flags: needinfo?(emorley)
Comment 10•11 years ago
bzapi has been replaced by a supported Bugzilla API; we just need to switch to it (it's not identical, so it's not just a case of adjusting the endpoints). See bug 930410.
Flags: needinfo?(emorley)
Assignee
Updated•11 years ago
Assignee: nobody → shyam
Updated•6 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard