Closed Bug 1408389 Opened 7 years ago Closed 7 years ago

when trying to run tests on m3.large (instead of m1.medium) I get many blue jobs in treeherder

Categories

(Taskcluster :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

Status: RESOLVED FIXED
Target Milestone: mozilla58

People

(Reporter: jmaher, Assigned: jmaher)

References

Details

Attachments

(1 file)

https://treeherder.mozilla.org/#/jobs?repo=try&author=jmaher@mozilla.com&fromchange=f8545b82c78b04af34da0e6d895b48153584a3cc&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=usercancel&filter-resultStatus=running&filter-resultStatus=pending&filter-resultStatus=runnable&filter-resultStatus=retry&selectedJob=136795928

I don't know why we get blue jobs; there is no log or other metadata. In order to switch away from m1.medium we need to have mostly green jobs. I assume this is a system-level error at Amazon and the machine gets yanked. I find it odd that it occurs on specific job types, which suggests it is related to the tests being run, although the lack of logs leaves me confused.
Are these the same tests that we couldn't get to run on anything but m1.mediums before? If I recall correctly, those were failing (orange), not blue. I think the rough consensus was that they were concurrency-related tests that failed on a multi-CPU instance type (which just about everything but m1.medium is). If this is the same, let's find and link to that bug for context. Either way, we should be able to dig up some logging for those instances.
These are the same tests we tried to run on m3.large in the past and identified as too flaky or perma-failing. There were 5 test jobs still running on legacy; 3 of them are OK to move, but the last 2 test suites are where I get a lot of the blue jobs. For many of the other failures I am doing a quick pass to hunt down the failures I see in the logs. If there are other explanations for the blue jobs, that would be good to know as well. I found bug 1281241 (which this blocks) as a reference for previous work done to get off the m1.mediums.
I'll pull the logs for those instances (in a bit..)
Flags: needinfo?(dustin)
Looking at one of the machines, things start crashing (including the worker) because the machine is out of memory:

Oct 13 06:50:01 docker-worker.aws-provisioner.us-east-1e.ami-98a16ee2.m3-large.i-09d72485b13b76e53 docker-worker: Uncaught Exception! Attempting to report to Sentry and crash.
Oct 13 06:50:01 docker-worker.aws-provisioner.us-east-1e.ami-98a16ee2.m3-large.i-09d72485b13b76e53 docker-worker: Error: spawn ENOMEM
Oct 13 06:50:01 docker-worker.aws-provisioner.us-east-1e.ami-98a16ee2.m3-large.i-09d72485b13b76e53 docker-worker: at exports._errnoException (util.js:1026:11)
Oct 13 06:50:01 docker-worker.aws-provisioner.us-east-1e.ami-98a16ee2.m3-large.i-09d72485b13b76e53 docker-worker: at ChildProcess.spawn (internal/child_process.js:313:11)
Oct 13 06:50:01 docker-worker.aws-provisioner.us-east-1e.ami-98a16ee2.m3-large.i-09d72485b13b76e53 docker-worker: at exports.spawn (child_process.js:380:9)
Oct 13 06:50:01 docker-worker.aws-provisioner.us-east-1e.ami-98a16ee2.m3-large.i-09d72485b13b76e53 docker-worker: at Object.exports.execFile (child_process.js:143:15)
Oct 13 06:50:01 docker-worker.aws-provisioner.us-east-1e.ami-98a16ee2.m3-large.i-09d72485b13b76e53 docker-worker: at exports.exec (child_process.js:103:18)
Oct 13 06:50:01 docker-worker.aws-provisioner.us-east-1e.ami-98a16ee2.m3-large.i-09d72485b13b76e53 docker-worker: at Object.check (/home/ubuntu/docker_worker/node_modules/diskspace/diskspace.js:56:3)
Oct 13 06:50:01 docker-worker.aws-provisioner.us-east-1e.ami-98a16ee2.m3-large.i-09d72485b13b76e53 docker-worker: at exports.default (/home/ubuntu/docker_worker/src/lib/stats/host_metrics.js:43:13)
Oct 13 06:50:01 docker-worker.aws-provisioner.us-east-1e.ami-98a16ee2.m3-large.i-09d72485b13b76e53 docker-worker: at ontimeout (timers.js:365:14)
Oct 13 06:50:01 docker-worker.aws-provisioner.us-east-1e.ami-98a16ee2.m3-large.i-09d72485b13b76e53 docker-worker: at tryOnTimeout (timers.js:237:5)
Oct 13 06:50:01 docker-worker.aws-provisioner.us-east-1e.ami-98a16ee2.m3-large.i-09d72485b13b76e53 docker-worker: at Timer.listOnTimeout (timers.js:207:5)
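For illustration, a minimal sketch (not the actual docker-worker code or the eventual fix) of how a periodic host-metrics disk check could be wrapped so that a spawn failure under memory pressure gets logged instead of escaping as an uncaught exception; the df command, the interval, and the log messages are assumptions made up for the example:

// Hypothetical sketch; not taken from docker_worker/src/lib/stats/host_metrics.js.
const { execFile } = require('child_process');

function collectDiskMetrics() {
  try {
    // In the Node version shown in the trace above, ENOMEM is thrown
    // synchronously from ChildProcess.spawn, so the try/catch is what keeps
    // it from becoming an uncaught exception inside the timer callback.
    execFile('df', ['-k', '/'], (err, stdout) => {
      if (err) {
        console.error('disk metrics failed:', err.code || err.message);
        return;
      }
      console.log(stdout.trim());
    });
  } catch (err) {
    console.error('disk metrics skipped, spawn failed:', err.code || err.message);
  }
}

// Poll periodically, mirroring the timer visible at the bottom of the stack trace.
setInterval(collectDiskMetrics, 60 * 1000);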
This is great info! I need to look at the passing jobs and see what their memory usage is.
Flags: needinfo?(dustin)
We fixed a damp test; now we need to run damp somewhere other than the legacy instances. On the default instance type (m3.large) we run out of memory! With 7.5GB of memory, that isn't good, but thanks to the data in this bug I moved to xlarge and it works great:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d4f6786669723bccabf73c864cf3e9342792d9c6
Assignee: nobody → jmaher
Status: NEW → ASSIGNED
Attachment #8920556 - Flags: review?(gbrown)
Comment on attachment 8920556 [details] [diff] [review]
run damp/asan tests on xlarge instead of legacy

Review of attachment 8920556 [details] [diff] [review]:
-----------------------------------------------------------------

I suggest clarifying the comment, maybe: "runs out of memory on default/m3.medium"
Attachment #8920556 - Flags: review?(gbrown) → review+
s/m3.medium/m3.large/
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #6)
> Created attachment 8920556 [details] [diff] [review]
> run damp/asan tests on xlarge instead of legacy
>
> We fixed a damp test; now we need to run damp somewhere other than the
> legacy instances. On the default instance type (m3.large) we run out of
> memory! With 7.5GB of memory, that isn't good, but thanks to the data in
> this bug I moved to xlarge and it works great:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=d4f6786669723bccabf73c864cf3e9342792d9c6

Interesting that we run out of memory on m3.large but not m1.medium; m1.medium has half the memory of an m3.large.
m1.medium is single-core and m3.large is multi-core; I suspect we are chewing up much more memory per process/thread than we would on m1.medium.
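For reference, the published EC2 shapes are roughly 1 vCPU / 3.75 GiB for m1.medium and 2 vCPUs / 7.5 GiB for m3.large, i.e. about the same memory per core. A small hypothetical diagnostic (using Node's standard os module, since docker-worker is a Node process) that a task could log to confirm what an instance actually exposes:

// Hypothetical diagnostic, not part of any existing task: print core count,
// total memory, and memory per core so instance shapes can be compared
// straight from a task log.
const os = require('os');

const cores = os.cpus().length;                        // expected: 1 on m1.medium, 2 on m3.large
const totalGiB = os.totalmem() / (1024 * 1024 * 1024); // expected: ~3.75 vs ~7.5
console.log('cores=' + cores +
            ' totalMem=' + totalGiB.toFixed(2) + 'GiB' +
            ' perCore=' + (totalGiB / cores).toFixed(2) + 'GiB');

Since per-core memory is about the same on both shapes, an OOM on m3.large but not m1.medium would indeed point at memory use growing with concurrency rather than at the bigger instance having less headroom.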
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58