Closed Bug 1512676 Opened 6 years ago Closed 4 years ago

Intermittent Windows [taskcluster:error] exit status 128 | bash: fork: Resource temporarily unavailable

Categories

(Infrastructure & Operations :: RelOps: Windows OS, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: ahal, Unassigned)

This is an intermittent I recently introduced by standing up Windows generic-worker with run-task in bug 1436037. It looks like we are hitting some sort of resource limit. I suspect the fix will involve increasing a number in a file on the host, though we might also be able to hack around it in run-task.

Relevant log:

[task 2018-12-07T01:19:47.298Z] executing ['bash', '-cx', 'cd $GECKO_PATH && ./mach python-test --python 2 --subsuite mozlint']
[task 2018-12-07T01:19:47.379Z] + cd 'Z:\task_1544143488\build\src'
[task 2018-12-07T01:19:47.380Z] + ./mach python-test --python 2 --subsuite mozlint
[task 2018-12-07T01:19:47.392Z] 0 [main] us 0 open_stackdumpfile: Dumping stack trace to us.stackdump
[task 2018-12-07T01:19:47.409Z] 0 [main] bash 8164 sync_with_child: child 6288(0x250) died before initialization with status code 0xC0000005
[task 2018-12-07T01:19:47.410Z] 23 [main] bash 8164 sync_with_child: *** child state waiting for longjmp
[task 2018-12-07T01:19:47.410Z] bash: fork: Resource temporarily unavailable
[taskcluster 2018-12-07T01:19:47.439Z] Exit Code: 128
[taskcluster 2018-12-07T01:19:47.439Z] User Time: 0s
[taskcluster 2018-12-07T01:19:47.439Z] Kernel Time: 31.25ms
[taskcluster 2018-12-07T01:19:47.439Z] Wall Time: 9m1.0519279s
[taskcluster 2018-12-07T01:19:47.439Z] Result: FAILED
[taskcluster 2018-12-07T01:19:47.439Z] === Task Finished ===

https://taskcluster-artifacts.net/FIKV7zqTSt6EYmcH16zntQ/0/public/logs/live_backing.log
I guess Taskcluster :: Generic Worker isn't the right component. Not sure that this is either, but it's probably a bit closer.
Assignee: nobody → relops
Component: Generic-Worker → RelOps: Puppet
Product: Taskcluster → Infrastructure & Operations
QA Contact: pmoore → mcornmesser
Actually, this might be an issue in msys: https://sourceforge.net/p/mingw/bugs/1730/ Apparently 0xC0000005 is an access violation (STATUS_ACCESS_VIOLATION), meaning the child process tried to read, write, or execute memory it was not allowed to access. I'd like someone more familiar with msys/mozilla-build to chime in here. But maybe we should just retry the command if we detect this error.
Assignee: relops → nobody
Component: RelOps: Puppet → RelOps: Windows OS
QA Contact: mcornmesser
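
To make the "retry if we detect this error" idea concrete, here is a minimal Python sketch. It is not run-task's actual code: run_with_retry, is_transient_failure and TRANSIENT_MARKERS are hypothetical names, and the marker strings are simply copied from the log in comment 0. It also assumes the command is safe to re-run, which only holds if bash died before doing any real work.

```python
import subprocess

# Substrings copied from the failing log in this bug. Treating them as
# "safe to retry" is an assumption, not a confirmed diagnosis.
TRANSIENT_MARKERS = (
    "fork: Resource temporarily unavailable",
    "died before initialization with status code 0xC0000005",
)


def is_transient_failure(output):
    """Return True if the captured output contains a known startup failure."""
    return any(marker in output for marker in TRANSIENT_MARKERS)


def run_with_retry(args, max_attempts=3, runner=subprocess.run):
    """Run args, retrying only when the output looks like a bash startup crash."""
    proc = None
    for _ in range(max_attempts):
        # Capture stdout and stderr together so the markers can be searched.
        proc = runner(args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
        if proc.returncode == 0 or not is_transient_failure(proc.stdout):
            break
    return proc
```

Capturing the output keeps the sketch short, but it means nothing is streamed to the task log while the command runs; a real implementation would have to scan the output as it is streamed. The runner parameter exists only so the retry logic can be exercised without spawning a process.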

I also see this sometimes:

[task 2019-01-07T09:43:23.702Z] executing ['bash', '-cx', '$GECKO_PATH/taskcluster/scripts/misc/wrench-windows-tests.sh']
[task 2019-01-07T09:43:24.098Z]       0 [main] us 0 init_cheap: VirtualAlloc pointer is null, Win32 error 487
[task 2019-01-07T09:43:24.098Z] AllocationBase 0x0, BaseAddress 0x607A0000, RegionSize 0x310000, State 0x10000
[task 2019-01-07T09:43:24.098Z] C:\mozilla-build\msys\bin\bash.exe: *** Couldn't reserve space for cygwin's heap, Win32 error 0

e.g. at [1] and [2]. It's not exactly the same problem, but it is in the same class: bash dies on startup.

[1] https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=220290868&repo=try&lineNumber=543
[2] https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=220285704&repo=try&lineNumber=533

I filed https://github.com/mozilla/treeherder/pull/4434 to match the above outputs (comment 0 and comment 3) as errors in TH, so they can be starred by sheriffs more easily.
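
For illustration only, matching these outputs could look roughly like the Python sketch below. This is not the code from the pull request: BASH_STARTUP_ERRORS and find_error_lines are hypothetical names, and the patterns are derived solely from the log lines quoted in this bug.

```python
import re

# Patterns built from the log excerpts in this bug (hypothetical, not
# Treeherder's real error list).
BASH_STARTUP_ERRORS = re.compile(
    r"fork: Resource temporarily unavailable"
    r"|died before initialization with status code 0x[0-9A-Fa-f]+"
    r"|VirtualAlloc pointer is null, Win32 error \d+"
    r"|Couldn't reserve space for cygwin's heap"
)


def find_error_lines(log_text):
    """Return (line_number, line) pairs for lines matching a known error."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if BASH_STARTUP_ERRORS.search(line)
    ]
```

Calling find_error_lines on the text of a live_backing.log returns 1-based line numbers, which is the kind of information a log viewer needs to highlight the failure.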

We discussed a better long-term solution to surfacing these errors on IRC at https://mozilla.logbot.info/treeherder/20190108#c15802795

Guessing this is no longer an issue; I don't see any similar errors in papertrail. Please reopen if it's still a problem.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WORKSFORME