Closed
Bug 1358593
Opened 8 years ago
Closed 7 years ago
[Alert] Cloudamqp: Queue total messages alarm: log_crossreference_error_lines (2017-04-21)
Categories
(Tree Management :: Treeherder: Infrastructure, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: jgraham)
Attachments
(1 file)
(deleted), text/x-github-pull-request
The alarm fired at 17:30 UTC+1 today, on both stage and prod.
Name treeherder-prod
Server <SNIP>
Vhost <SNIP>
Queue log_crossreference_error_lines
Current # messages 3016
Alarm queue regexp .*
Alarm threshold 1000
The queues have since returned to normal.
Example slow transaction traces:
* 299 seconds - https://rpm.newrelic.com/accounts/677903/applications/14179757/transactions?tw%5Bend%5D=1492801739&tw%5Bstart%5D=1492780139#id=5b224f746865725472616e73616374696f6e2f43656c6572792f63726f73737265666572656e63652d6572726f722d6c696e6573222c22225d&app_trace_id=59c02164-26a1-11e7-9a52-0242ac110012_13057_15560
* 186 seconds - https://rpm.newrelic.com/accounts/677903/applications/14179757/transactions?tw%5Bend%5D=1492801739&tw%5Bstart%5D=1492780139#id=5b224f746865725472616e73616374696f6e2f43656c6572792f63726f73737265666572656e63652d6572726f722d6c696e6573222c22225d&app_trace_id=6c9373bd-26a4-11e7-9a52-0242ac110012_8571_11052
99% of the time in the crossreference-error-lines transaction was spent in Python code.
The job_ids for those two traces were 93316118 and 93319768 respectively.
Annoyingly, the API requires knowing the repository name (which isn't annotated on these transactions), even though job_id is now unique across all repositories.
As such, I've had to look these up in the database by hand:
> select job.id, repository.name from job join repository on job.repository_id = repository.id where job.id in (93316118, 93319768)
+----------+------+
| id       | name |
+----------+------+
| 93316118 | try  |
| 93319768 | try  |
+----------+------+
Which gives:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b4fc508fa426a2a5f3e7476e696f286bba4c8de3&selectedJob=93316118
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b4fc508fa426a2a5f3e7476e696f286bba4c8de3&selectedJob=93319768
The log URLs:
https://treeherder.mozilla.org/api/project/try/job-log-url/?job_id=93316118
-> https://queue.taskcluster.net/v1/task/SW3naVDCRUakztjbz3B3wA/runs/0/artifacts/public/logs/live_backing.log
-> https://queue.taskcluster.net/v1/task/SW3naVDCRUakztjbz3B3wA/runs/0/artifacts/public/test_info//reftest-no-accel_errorsummary.log
https://treeherder.mozilla.org/api/project/try/job-log-url/?job_id=93319768
-> https://queue.taskcluster.net/v1/task/NDUVHJXBSvarGKezxpESBg/runs/0/artifacts/public/logs/live_backing.log
-> https://queue.taskcluster.net/v1/task/NDUVHJXBSvarGKezxpESBg/runs/0/artifacts/public/test_info//reftest_errorsummary.log
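The job-log-url addresses above follow a simple pattern, so they can be built from a repository name and a job_id. A minimal sketch, assuming only the URL shape visible in the links above; the helper name `job_log_url_endpoint` is made up for illustration and is not part of Treeherder:

```python
from urllib.parse import urlencode

# Base URL as it appears in the links above.
TREEHERDER_API = "https://treeherder.mozilla.org/api"


def job_log_url_endpoint(repository: str, job_id: int) -> str:
    """Build the job-log-url API address for one job (hypothetical helper)."""
    query = urlencode({"job_id": job_id})
    return f"{TREEHERDER_API}/project/{repository}/job-log-url/?{query}"


# The repository name still has to come from elsewhere (here: the SQL lookup).
for job_id in (93316118, 93319768):
    print(job_log_url_endpoint("try", job_id))
```

This only formats the address; it does not remove the need to know the repository name first.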
James, next week would you mind having a look at why these cases are so slow? I'm guessing it's because they had a fair number of error lines?
Flags: needinfo?(james)
Reporter
Updated•8 years ago
Priority: -- → P1
Summary: [Alert] Cloudamqp: Queue total messages alarm: treeherder-prod log_crossreference_error_lines → [Alert] Cloudamqp: Queue total messages alarm: log_crossreference_error_lines (2017-04-21)
Comment 1•7 years ago
Assignee
Comment 2•7 years ago
So I found that the use of re was mostly useless. It was hard to profile this properly without spending more time importing the relevant data into the local db, but after these changes the profile seemed to spend almost all of its time in db work, whereas before we were spending a lot of time needlessly compiling regexps, so I think it's an improvement.
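For anyone wanting to see this kind of overhead without the production data, a rough micro-benchmark with timeit is one option. This is only a sketch: the suffixes and the log line are invented, and the functions are not the Treeherder code, they just contrast compiling a fresh pattern per candidate with a plain string check.

```python
import re
import timeit

# Invented data: many distinct suffixes, so each regexp is a different
# pattern and has to be compiled at least once.
SUFFIXES = [f"test_{i}.html | unexpected failure" for i in range(2000)]
LINE = "REFTEST TEST-UNEXPECTED-FAIL | " + SUFFIXES[-1]


def count_with_re() -> int:
    # Builds and compiles a pattern for every suffix before matching.
    return sum(
        1 for s in SUFFIXES if re.compile(re.escape(s) + r"\Z").search(LINE)
    )


def count_with_endswith() -> int:
    # Same check with plain string matching.
    return sum(1 for s in SUFFIXES if LINE.endswith(s))


re_seconds = timeit.timeit(count_with_re, number=3)
str_seconds = timeit.timeit(count_with_endswith, number=3)
print(f"re: {re_seconds:.4f}s  endswith: {str_seconds:.4f}s")
```

The absolute numbers depend on the machine and Python version, so they are best read as a relative comparison only.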
Flags: needinfo?(james)
Comment 3•7 years ago
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/adc8db7a74dea2f80a514deb9cea03a89b148fca
Bug 1358593 - Use simple string matching in crossreference_error_lines
The regexp for matching lines started off somewhat complex but ended
up just being equivalent to str.endswith(). Therefore using re is just
unnecessary overhead.
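
The equivalence described in the commit message can be checked with a small sketch. The actual pattern from the Treeherder code is not shown in this bug, so the regexp below is an assumption: an escaped literal suffix anchored at the end of the string with \Z (plain $ would also match just before a trailing newline, which endswith() does not).

```python
import re


def ends_with_re(line: str, suffix: str) -> bool:
    # re.escape makes the suffix literal; \Z anchors at the very end.
    return re.search(re.escape(suffix) + r"\Z", line) is not None


def ends_with_str(line: str, suffix: str) -> bool:
    return line.endswith(suffix)


# Invented example lines, including regexp metacharacters and empty input.
cases = [
    ("TEST-UNEXPECTED-FAIL | a.html | image comparison", "image comparison"),
    ("TEST-UNEXPECTED-FAIL | a.html | image comparison", "a.html"),
    ("path/to/file(1).js", "file(1).js"),
    ("", ""),
]
for line, suffix in cases:
    assert ends_with_re(line, suffix) == ends_with_str(line, suffix)
```

When both forms agree like this, the string method avoids building and compiling a pattern at all.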
Reporter
Comment 4•7 years ago
Thank you for looking into this :-)
Assignee: emorley → james
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED