Closed Bug 1358593 Opened 8 years ago Closed 7 years ago

[Alert] Cloudamqp: Queue total messages alarm: log_crossreference_error_lines (2017-04-21)

Categories

(Tree Management :: Treeherder: Infrastructure, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: jgraham)

Attachments

(1 file)

The alarm fired at 17:30 UTC+1 today, on both stage and prod.

Name: treeherder-prod
Server: <SNIP>
Vhost: <SNIP>
Queue: log_crossreference_error_lines
Current # messages: 3016
Alarm queue regexp: .*
Alarm threshold: 1000

The queues have since returned to normal.

Example slow transaction traces:
* 299 seconds - https://rpm.newrelic.com/accounts/677903/applications/14179757/transactions?tw%5Bend%5D=1492801739&tw%5Bstart%5D=1492780139#id=5b224f746865725472616e73616374696f6e2f43656c6572792f63726f73737265666572656e63652d6572726f722d6c696e6573222c22225d&app_trace_id=59c02164-26a1-11e7-9a52-0242ac110012_13057_15560
* 186 seconds - https://rpm.newrelic.com/accounts/677903/applications/14179757/transactions?tw%5Bend%5D=1492801739&tw%5Bstart%5D=1492780139#id=5b224f746865725472616e73616374696f6e2f43656c6572792f63726f73737265666572656e63652d6572726f722d6c696e6573222c22225d&app_trace_id=6c9373bd-26a4-11e7-9a52-0242ac110012_8571_11052

99% of the crossreference-error-lines transaction time was spent in Python code.

The job_ids of those two were 93316118 and 93319768 respectively.

Annoyingly, the API requires knowing the repository name (which isn't annotated on these transactions), even though job_id is now unique across all repositories. As such, I've had to look these up in the database by hand:

> select job.id, repository.name from job join repository on job.repository_id = repository.id where job.id in (93316118, 93319768)

+----------+------+
| id       | name |
+----------+------+
| 93316118 | try  |
| 93319768 | try  |
+----------+------+

Which gives:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b4fc508fa426a2a5f3e7476e696f286bba4c8de3&selectedJob=93316118
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b4fc508fa426a2a5f3e7476e696f286bba4c8de3&selectedJob=93319768

The log URLs:

https://treeherder.mozilla.org/api/project/try/job-log-url/?job_id=93316118
-> https://queue.taskcluster.net/v1/task/SW3naVDCRUakztjbz3B3wA/runs/0/artifacts/public/logs/live_backing.log
-> https://queue.taskcluster.net/v1/task/SW3naVDCRUakztjbz3B3wA/runs/0/artifacts/public/test_info//reftest-no-accel_errorsummary.log

https://treeherder.mozilla.org/api/project/try/job-log-url/?job_id=93319768
-> https://queue.taskcluster.net/v1/task/NDUVHJXBSvarGKezxpESBg/runs/0/artifacts/public/logs/live_backing.log
-> https://queue.taskcluster.net/v1/task/NDUVHJXBSvarGKezxpESBg/runs/0/artifacts/public/test_info//reftest_errorsummary.log

James, next week would you mind having a look at why these cases are so slow? I'm guessing it's because they had a fair number of error lines?
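For anyone repeating the by-hand lookup, the join above can be wrapped in a small helper. This is only a sketch: it uses an in-memory SQLite database as a stand-in for the real Treeherder database, the two-table schema contains just the columns named in the query, the helper name repositories_for_jobs is hypothetical, and the repository id of 1 is a placeholder.

```python
import sqlite3


def repositories_for_jobs(conn, job_ids):
    """Map each job id to its repository name (hypothetical helper)."""
    job_ids = list(job_ids)
    if not job_ids:
        return {}
    # One "?" placeholder per id keeps the query parameterised.
    placeholders = ", ".join("?" for _ in job_ids)
    query = (
        "SELECT job.id, repository.name "
        "FROM job JOIN repository ON job.repository_id = repository.id "
        f"WHERE job.id IN ({placeholders})"
    )
    return dict(conn.execute(query, job_ids).fetchall())


# Stand-in schema and data: only the columns used by the query above.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE repository (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE job (
        id INTEGER PRIMARY KEY,
        repository_id INTEGER NOT NULL REFERENCES repository(id)
    );
    """
)
conn.execute("INSERT INTO repository (id, name) VALUES (1, 'try')")  # id 1 is a placeholder
conn.executemany(
    "INSERT INTO job (id, repository_id) VALUES (?, 1)",
    [(93316118,), (93319768,)],
)

lookup = repositories_for_jobs(conn, [93316118, 93319768])
print(lookup)
```

Against the real database the same query would run through whatever client is already in use; only the SQL text is taken from the comment above.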
Flags: needinfo?(james)
Priority: -- → P1
Summary: [Alert] Cloudamqp: Queue total messages alarm: treeherder-prod log_crossreference_error_lines → [Alert] Cloudamqp: Queue total messages alarm: log_crossreference_error_lines (2017-04-21)
So I found that the use of re was mostly unnecessary. It was hard to profile this properly without spending more time importing the relevant data into the local db, but the profile after these changes seemed to spend almost all of its time in db operations, whereas before we were spending a lot of time needlessly compiling regexps, so I think it's an improvement.
Flags: needinfo?(james)
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/adc8db7a74dea2f80a514deb9cea03a89b148fca
Bug 1358593 - Use simple string matching in crossreference_error_lines

The regexp for matching lines started off somewhat complex but ended up just being equivalent to str.endswith(). Therefore using re is just unnecessary overhead.
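To illustrate the equivalence the commit message describes, here is a minimal sketch. It is not the code from the commit: the function names and the sample lines are hypothetical, and the "before" version only assumes a pattern that anchors an escaped literal at the end of the line.

```python
import re


def matches_with_re(line: str, suffix: str) -> bool:
    # Hypothetical "before" shape: a pattern is built for every comparison,
    # even though it only checks that the line ends with a literal string.
    # re.DOTALL lets "." span newlines and \Z anchors at the true end of the string.
    pattern = re.compile(".*" + re.escape(suffix) + r"\Z", re.DOTALL)
    return pattern.match(line) is not None


def matches_with_endswith(line: str, suffix: str) -> bool:
    # "After" shape: plain suffix comparison, no pattern construction at all.
    return line.endswith(suffix)


# Made-up sample lines, loosely in the style of an error summary.
samples = [
    "TEST-UNEXPECTED-FAIL | test_a.html | image comparison (==)",
    "INFO | test_a.html | took 12ms",
    "PROCESS-CRASH | test_b.html | application crashed [@ foo(int)]",
]
suffixes = ["image comparison (==)", "[@ foo(int)]", "took 12ms", ""]

for line in samples:
    for suffix in suffixes:
        assert matches_with_re(line, suffix) == matches_with_endswith(line, suffix)
```

Note that the suffixes contain regexp metacharacters such as "(" and "[", so the re version needs re.escape() to be correct at all, whereas str.endswith() treats the suffix literally with no extra work.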
Thank you for looking into this :-)
Assignee: emorley → james
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED