Investigate max runtime errors after microsoft.com re-recording
Categories
(Testing :: Raptor, task, P1)
Tracking
(Not tracked)
People
(Reporter: alexandru.irimovici, Assigned: alexandru.irimovici)
Details
Investigate max runtime errors that we get here:
https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=253270045&revision=65cc93e84d6e8203b62d2d6706a3a54446b5c8c3
The PR with the changes is: https://phabricator.services.mozilla.com/D35759
Comment 1•5 years ago
Comment 2•5 years ago
(In reply to Alexandru Irimovici from comment #0)
Investigate max runtime errors that we get here:
https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=253270045&revision=65cc93e84d6e8203b62d2d6706a3a54446b5c8c3
the PR with the changes is: https://phabricator.services.mozilla.com/D35644
I believe the correct patch is https://phabricator.services.mozilla.com/D35759
Comment 3•5 years ago
(In reply to Dave Hunt [:davehunt] [he/him] ⌚️UTC from comment #2)
(In reply to Alexandru Irimovici from comment #0)
Investigate max runtime errors that we get here:
https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=253270045&revision=65cc93e84d6e8203b62d2d6706a3a54446b5c8c3
the PR with the changes is: https://phabricator.services.mozilla.com/D35644
I believe the correct patch is https://phabricator.services.mozilla.com/D35759
You are right :) I will edit my comment
Comment 4•5 years ago
Try push with the errors: https://treeherder.mozilla.org/#/jobs?repo=try&revision=41bbda6e2e12ebd8cb05d34e025fb244939373bd
A similar try push with other sites tested, which is all green: https://treeherder.mozilla.org/#/jobs?repo=try&revision=65cc93e84d6e8203b62d2d6706a3a54446b5c8c3
The failing test covers 2 sites (apple and microsoft).
The jobs fail intermittently. From what I see in the logs, they go well for the first site (apple); when a job fails, it happens during the microsoft recording at about pagecycle 12-16. It just hangs there for ~25 min, until the task times out.
Log sample: https://taskcluster-artifacts.net/TxCDVjthScSotADqvnv_-A/0/public/logs/live_backing.log
I was not able to reproduce it locally, but I recorded the microsoft.com site again and I'm going to switch it.
Robert, did you notice this kind of behavior before?
Comment 5•5 years ago
(In reply to Alexandru Irimovici from comment #4)
they go well for the first site(apple) and when it fails, it happens for the microsoft recording at about 12-16 pagecycle. It just hangs in there for ~25 min, until the task timeouts.
Robert, did you notice this kind of behavior before?
Sites working well and then timing out in future page-cycles? Yes, I saw that before the general fix to the intermittents (Bug 1559798). Hopefully re-recording again will fix it?
Comment 6•5 years ago
(In reply to Robert Wood [:rwood] from comment #5)
(In reply to Alexandru Irimovici from comment #4)
they go well for the first site(apple) and when it fails, it happens for the microsoft recording at about 12-16 pagecycle. It just hangs in there for ~25 min, until the task timeouts.
Robert, did you notice this kind of behavior before?
Sites working well and then timing out in future page-cycles? Yes before the general fix to the intermittents (Bug 1559798). Hopefully re-recording again will fix it?
The new recording still has the same issue (timing out in future page-cycles): https://treeherder.mozilla.org/#/jobs?repo=try&revision=39283025f4280b49c707ce30146fcd09a56f89fd
I'm still not able to reproduce it locally. Robert, can you help me with some advice for this? :)
Comment 7•5 years ago
(In reply to Alexandru Irimovici from comment #6)
I'm still not able to reproduce it locally. Robert, can you help me with some advice for this? :)
Hmmm no I don't know what the issue is here - you and :bebe are much more familiar with recordings etc. than I am - :bebe please have a look, thanks!
Comment 8•5 years ago
I don't think having more time would solve these intermittents. Almost every time the job succeeds, it finishes in ~6 minutes. When it fails, it does so suddenly around the 14th pagecycle, after running normally, and then blocks the taskcluster task for the rest of the remaining time (example from the logs below; notice the timestamps).
13:29:08 INFO - raptor-control-server Info: received webext_status: begin pagecycle 14
13:29:08 INFO - PID 1414 | console.log: "[raptor-runnerjs] begin pagecycle 14"
13:29:08 INFO - PID 1414 | console.log: "[raptor-runnerjs] posting to control server"
13:29:08 INFO - PID 1414 | console.log: "[raptor-runnerjs] begin pagecycle 14"
13:29:08 INFO - PID 1414 | console.log: "[raptor-runnerjs] post success"
13:29:09 INFO - PID 1414 | console.log: "[raptor-runnerjs] update tab: 1"
13:29:09 INFO - PID 1414 | console.log: "[raptor-runnerjs] posting to control server"
13:29:09 INFO - PID 1414 | console.log: "[raptor-runnerjs] update tab: 1"
13:29:09 INFO - PID 1414 | console.log: "[raptor-runnerjs] test tab updated: 1"
13:29:09 INFO - PID 1414 | console.log: "[raptor-runnerjs] posting to control server"
13:29:09 INFO - PID 1414 | console.log: "[raptor-runnerjs] test tab updated: 1"
13:29:09 INFO - raptor-control-server Info: received webext_status: update tab: 1
13:29:09 INFO - raptor-control-server Info: received webext_status: test tab updated: 1
13:29:09 INFO - PID 1414 | console.log: "[raptor-runnerjs] post success"
13:29:09 INFO - PID 1414 | console.log: "[raptor-runnerjs] post success"
[taskcluster:error] Aborting task...
[taskcluster 2019-06-28T13:54:32.698Z] === Task Finished ===
[taskcluster 2019-06-28T13:54:32.699Z] Task Duration: 30m0.009884531s
Bebe and Bob Clary confirmed that the failures are not related to hardware issues, as they take place on different machines.
There is still this mystery: the test fails intermittently in exactly the same spot, on the same pagecycle (14th for OS X and 16th for Linux), and the test is not terminated after the failure, so the taskcluster task is left to time out.
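To compare failing logs quickly, the check described in this comment (last pagecycle started, then how long the task sat idle before being aborted) could be scripted roughly as below. This is only a sketch: the function name `summarize_hang` and the regular expressions are my own, modeled on the log excerpt in this comment, and it assumes the Raptor timestamps and the taskcluster timestamp are in the same time zone. Real `live_backing.log` files may prefix lines differently.

```python
import re
from datetime import datetime, timedelta

# Patterns modeled on the log excerpt in this comment (assumption: other
# Raptor/taskcluster log layouts may need different expressions).
RAPTOR_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+INFO\s+-\s+(.*)$")
PAGECYCLE = re.compile(r"begin pagecycle (\d+)")
TASK_FINISHED = re.compile(
    r"^\[taskcluster \d{4}-\d{2}-\d{2}T(\d{2}:\d{2}:\d{2})(?:\.\d+)?Z\] "
    r"=== Task Finished ==="
)


def summarize_hang(log_text):
    """Return (last_pagecycle, idle_gap) or None if the log lacks the needed lines.

    last_pagecycle: number from the last "begin pagecycle N" message, or None.
    idle_gap: timedelta between the last Raptor INFO line and "Task Finished".
    """
    last_time = None
    last_pagecycle = None
    finished = None
    for raw in log_text.splitlines():
        line = raw.strip()
        info = RAPTOR_LINE.match(line)
        if info:
            last_time = info.group(1)
            cycle = PAGECYCLE.search(info.group(2))
            if cycle:
                last_pagecycle = int(cycle.group(1))
            continue
        done = TASK_FINISHED.match(line)
        if done:
            finished = done.group(1)
    if last_time is None or finished is None:
        return None
    fmt = "%H:%M:%S"
    gap = datetime.strptime(finished, fmt) - datetime.strptime(last_time, fmt)
    if gap < timedelta(0):
        # Only times of day are compared, so handle a run that crosses midnight.
        gap += timedelta(days=1)
    return last_pagecycle, gap


if __name__ == "__main__":
    sample = "\n".join([
        "13:29:08 INFO - raptor-control-server Info: received webext_status: begin pagecycle 14",
        '13:29:09 INFO - PID 1414 | console.log: "[raptor-runnerjs] post success"',
        "[taskcluster:error] Aborting task...",
        "[taskcluster 2019-06-28T13:54:32.698Z] === Task Finished ===",
    ])
    print(summarize_hang(sample))
```

On the excerpt above this reports pagecycle 14 and an idle gap of roughly 25 minutes, which matches what we see by eye. Running it over several failing logs would make it easier to confirm that the hang always starts at the same pagecycle per platform.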
Comment 9•5 years ago
After rebasing, the tests are running fine: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7d99ea2033384cbe9fd53044dd40aa10d277f349