Closed Bug 930383 Opened 11 years ago Closed 11 years ago

tbpl not successfully updating, all trees closed

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Assigned: fox2mike)

Both tbpl and tbpl-dev.allizom are failing to process completed builds and tests (https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=592f54e43014 should show several hundred green (and orange and red) letters, where it only shows a dozen (and dropping) grey letters for the jobs that are still running, which are loaded client-side, in the browser, from a different source). The processing is done by a cronjob running dataimport/import-buildbot-data.py, which might be wedged (though it seems odd that it would be wedged on both tbpl and tbpl-dev at the same time), or they might not be able to fetch http://builddata.pub.build.mozilla.org/buildjson/builds-4hr.js.gz, where their data comes from, or... db troubles? Not sure. At any rate, all trees are closed as a result.
Depends on: 930387
Assignee: nobody → shyam
My ssh keys aren't on api-dev.community.scl3.mozilla.com, so I'm trying to get access to the machine to see what the issue is.
The machine is out of space. This is not a supported production machine, has no monitoring on it.
Filesystem                               Size  Used Avail Use% Mounted on
/dev/mapper/unused--10--4--11--236-root   14G   14G     0 100% /
Working on freeing up space.
Deleted a few old logs, this machine now has
Filesystem                               Size  Used Avail Use% Mounted on
/dev/mapper/unused--10--4--11--236-root   14G  8.2G  5.2G  62% /
Fired apache back up and seems like it's back online.
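Since the machine had no monitoring on it, a small check along these lines could flag a filling disk before it takes a service down. This is only a minimal sketch, not something that exists on this host: the function names and the 90% threshold are assumptions made for illustration. The percentage is computed as used / (used + avail), which is how df derives its Use% column (df then rounds up).

```python
import shutil


def use_percent(used, avail):
    """Percentage in use, computed the way df does: used / (used + avail)."""
    denominator = used + avail
    if denominator == 0:
        return 0.0
    return 100.0 * used / denominator


def path_use_percent(path="/"):
    """Use percentage of the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return use_percent(usage.used, usage.free)


def is_nearly_full(path="/", threshold=90.0):
    """True once usage reaches the (assumed, arbitrary) threshold."""
    return path_use_percent(path) >= threshold


if __name__ == "__main__":
    pct = path_use_percent("/")
    state = "WARNING" if is_nearly_full("/") else "ok"
    print(f"/ is {pct:.1f}% full: {state}")
```

Run from cron or a monitoring agent, a check like this would have reported the root filesystem well before it reached 100%.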
And tbpl's import crons were hosed, had to kill the existing ones and fire off new ones on both dev and prod.
verified prod and dev tbpl are now showing results so far
I'm not sure why bzapi being down meant the TBPL crons got stuck. The backend bzapi calls (as opposed to the UI tooltip meta calls) are only done after data import, by https://hg.mozilla.org/webtools/tbpl/file/16e6a0cc29c0/php/inc/AnnotatedSummaryGenerator.php#l200 , via https://hg.mozilla.org/webtools/tbpl/file/16e6a0cc29c0/php/getLogExcerpt.php . These are done by workers spawned from https://hg.mozilla.org/webtools/tbpl/file/16e6a0cc29c0/dataimport/import-buildbot-data.py#l363 . However, after 60s we should hit the timeout at https://hg.mozilla.org/webtools/tbpl/file/16e6a0cc29c0/php/inc/ParallelLogGenerating.php#l37 , aborting that getLogExcerpt.php call, so we shouldn't end up with a backlog of workers. (And even if we did, the new jobs are inserted before spawning the workers, so we'd at least see the new jobs on TBPL, unless we starved the webhead of resources.)
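For illustration, the expectation described above (a worker call that hangs on a dead backend gets aborted after a fixed timeout instead of piling up) can be sketched as follows. This is not TBPL's code: the real timeout lives in ParallelLogGenerating.php, and the helper name run_with_timeout is hypothetical. Only the 60s default is taken from the comment above.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s=60):
    """Run `cmd`; return its exit code, or None if it exceeded the timeout.

    subprocess.run() kills the child and waits for it when the timeout
    expires, so a hung worker does not linger after this returns.
    """
    try:
        completed = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


if __name__ == "__main__":
    # Stand-in for a worker blocked on an unresponsive service.
    hung_worker = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(hung_worker, timeout_s=1))
```

If workers are bounded like this, a backend outage should cost at most one timeout per worker rather than wedging the import, which is why the stuck crons here were surprising.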
Maybe someone else can look into that.
Assignee: shyam → nobody
Severity: blocker → normal
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Depends on: 930945
(In reply to Shyam Mani [:fox2mike] from comment #7) > Maybe someone else can look into that. Filed bug 930945.
Do we also need to get bzapi moved to a production machine? Or will enough of tbpl work that trees can remain open?
Flags: needinfo?(emorley)
bzapi has been replaced by a supported Bugzilla API; we just need to switch to it (the two are not identical, so it's not just a case of adjusting the endpoints). See bug 930410.
Flags: needinfo?(emorley)
Assignee: nobody → shyam
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard