Open Bug 1413823 Opened 7 years ago Updated 1 year ago

Apport should be uninstalled or disabled on the docker-worker host

Categories

(Taskcluster :: Workers, enhancement, P5)

Tracking

(Not tracked)

REOPENED

People

(Reporter: glandium, Unassigned)

Details

Apport, installed on the docker-worker host, sets /proc/sys/kernel/core_pattern to |/usr/share/apport/apport %p %s %c %P. In practice, this means that anything that crashes in a docker container makes the *host* run that command, which then processes the crash, maybe even exfiltrating data to Canonical (pure speculation, and that's not why I'm filing this bug anyway, just a possible additional reason to want this removed).

It also means that there is *absolutely* no way to get a core dump of crashing processes in a docker container. And while one may be able to jump through hoops to attach a debugger (good luck with that, with ptrace being either disabled or prevented for processes not in the ancestry of the crashing process), even when you somehow succeed, running with the debugger attached changes timings. In my case, chasing a high-frequency intermittent crash, attaching a debugger just makes the crash go away...

Uninstalling or disabling apport on the host would leave the default of "core", which would create the core dump in the working directory of the crashing process, which is good enough.
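For readers unfamiliar with the mechanism: a core_pattern value that starts with `|` tells the kernel to pipe the dump to a user-space program, and that program runs in the host's initial namespaces rather than in the container. A value without the leading `|` is treated as a file name template. The sketch below only illustrates that distinction; `classify_core_pattern` is a hypothetical helper, not part of docker-worker or apport.

```python
def classify_core_pattern(pattern: str) -> dict:
    """Classify a kernel.core_pattern value (hypothetical helper)."""
    pattern = pattern.strip()
    if pattern.startswith("|"):
        # A leading '|' means the kernel pipes the core dump to a
        # user-space program instead of writing a core file.
        argv = pattern[1:].split()
        return {
            "mode": "pipe",
            "handler": argv[0] if argv else "",
            "args": argv[1:],
        }
    # Anything else is a file name template, e.g. the default "core".
    return {"mode": "file", "template": pattern}


if __name__ == "__main__":
    print(classify_core_pattern("|/usr/share/apport/apport %p %s %c %P"))
    print(classify_core_pattern("core"))
```

With the value quoted in this bug, the result is in "pipe" mode with /usr/share/apport/apport as the handler; with "core" it is in "file" mode, which is the behavior requested here.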
This seems like low-hanging fruit...
Assignee: nobody → gps
Status: NEW → ASSIGNED
Flags: needinfo?(gps)
https://github.com/taskcluster/docker-worker/pull/356 is up for review. Depends on some other PRs. But it should hopefully land soonish.
Flags: needinfo?(gps)
I'm gonna call this closed since the commit landed in docker-worker. When it will get deployed, I'm not sure.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
This is supposed to be deployed, but apparently didn't work:

# cat /proc/sys/kernel/core_pattern
|/usr/share/apport/apport %p %s %c %d %P

(in one-click loaners on both gecko-t-linux-large and gecko-1-b-linux workers)
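The same check could be scripted so that a deployment can be verified without reading the value by hand. This is a minimal sketch, assuming the check runs somewhere that sees the host's /proc/sys/kernel/core_pattern; `core_pattern_uses_apport` is a hypothetical name, and the substring match on "apport" is an assumption about how to recognize the handler.

```python
from pathlib import Path


def core_pattern_uses_apport(path: str = "/proc/sys/kernel/core_pattern") -> bool:
    """Return True if core dumps are piped to an apport handler.

    Hypothetical helper: `path` is a parameter only so the check can be
    exercised against a regular file in tests.
    """
    value = Path(path).read_text().strip()
    # Piped handlers start with '|'; assume apport's path contains "apport".
    return value.startswith("|") and "apport" in value


if __name__ == "__main__":
    if core_pattern_uses_apport():
        print("apport is still the core dump handler")
    else:
        print("core dumps are not piped to apport")
```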
Status: RESOLVED → REOPENED
Flags: needinfo?(gps)
Resolution: FIXED → ---
:gps, is this worth working on a fix? I see you've been needinfo'd for 4 months :)
I think it is still worth fixing because it makes debugging crashes on TC workers extremely difficult. But I'm not actively working on it. (I stopped hacking on all things docker-worker a bit ago. And I have little desire to go back since apparently docker-worker doesn't have much of a future.)
Assignee: gps → nobody
Flags: needinfo?(gps)
Component: Docker-Worker → Worker
Priority: -- → P5
QA Contact: pmoore
Component: Worker → Workers

This could be added to the monopacker process for docker-workers.

Status: REOPENED → RESOLVED
Closed: 7 years ago → 4 years ago
Resolution: --- → INACTIVE

Reopening inactive bugs, because they may still need attention. Historically, inactive bugs were closed, but this hides the fact there are genuine issues which have not been resolved.

Status: RESOLVED → REOPENED
Resolution: INACTIVE → ---