Closed Bug 1274191 Opened 8 years ago Closed 8 years ago

JobsViewSet.create exceptions from builds-4hr ingestion ("KeyError: 'd31974e5eb4ed77615decff56fffb6dd3882909c'")

Categories

(Tree Management :: Treeherder: API, defect, P1)

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1272532

People

(Reporter: emorley, Unassigned)

Last night buildbot builds weren't being ingested for an hour or two - see IRC logs:
http://logs.glob.uno/?c=mozilla%23releng&s=18+May+2016&e=19+May+2016&b=1#c246737
http://logs.glob.uno/?c=mozilla%23treeherder&s=18+May+2016&e=19+May+2016&b=1#c102502

Of note:
* Prod revision was 3442b2f1193651808a4fa7741d72038abcd7c8d5
* There were no recent deployments (last was 7 days ago: https://rpm.newrelic.com/accounts/677903/applications/4180461/deployments)
* There was not a massive backlog of celery jobs (https://rpm.newrelic.com/accounts/677903/dashboard/13318367/page/1?tw%5Bend%5D=1463620893&tw%5Bstart%5D=1463608881)
* However there were a number of exceptions: https://rpm.newrelic.com/accounts/677903/applications/4180461/traced_errors?tw%5Bend%5D=1463620893&tw%5Bstart%5D=1463608881

Specifically HTTP 500s when the builds4hr ingestion task posted the jobs back to the API:
https://rpm.newrelic.com/accounts/677903/applications/4180461/traced_errors/4e92a0-dcff726e-1d5f-11e6-b947-b82a72d22a14

Caused by KeyErrors during treeherder.webapp.api.jobs:JobsViewSet.create:
https://rpm.newrelic.com/accounts/677903/applications/4180461/traced_errors/4eb47b-c1b9801d-1d5f-11e6-b947-b82a72d22a14

The exception happened on this line:
https://github.com/mozilla/treeherder/blame/3442b2f1193651808a4fa7741d72038abcd7c8d5/treeherder/model/derived/jobs.py#L1782

Which was recently changed in bug 1265037:
https://github.com/mozilla/treeherder/commit/104787282d0db965ea948342ce1f2926936f3d27

The guid handling for buildbot retried jobs (which attempts to work around buildbot quirks) has always been pretty hacky (and not well documented), guess we missed an edge case :-)
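For anyone reading along, here is a rough sketch of the general failure mode, not the actual Treeherder code: a lookup keyed by job guid raises a KeyError whose message is the missing guid when one of the submitted jobs isn't in the map. All names here (`attach_ids_strict`, `attach_ids_tolerant`, `guid_to_id`, the `job_guid` key) are hypothetical and only for illustration.

```python
# Hypothetical illustration only; this is NOT the Treeherder implementation.


def attach_ids_strict(jobs, guid_to_id):
    """Look up each job's id by guid; raises KeyError if a guid is unknown."""
    return [(job["job_guid"], guid_to_id[job["job_guid"]]) for job in jobs]


def attach_ids_tolerant(jobs, guid_to_id):
    """Same lookup, but collect unknown guids instead of raising."""
    resolved, missing = [], []
    for job in jobs:
        guid = job["job_guid"]
        if guid in guid_to_id:
            resolved.append((guid, guid_to_id[guid]))
        else:
            missing.append(guid)
    return resolved, missing
```

With the strict version a single unexpected guid fails the whole batch, which would match an HTTP 500 for the entire POST. The tolerant version would let the caller log the missing guids and carry on, though whether that is the right behaviour here is an open question.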
This is bug 1272532, which we should probably prioritize fixing. I'll look into it.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE