SLURM+Docker: How to kill docker-created processes using SLURM's scancel - docker

We have set up a GPU computing cluster with SLURM as the resource manager. As this is a cluster for deep learning, we manage dependencies using nvidia-docker images to support different frameworks and CUDA versions.
Our typical use case is to allocate resources with srun and have it run nvidia-docker, which in turn runs the experiment scripts, as in the following:
srun --gres=gpu:[num gpus required] nvidia-docker run --rm -u $(id -u):$(id -g) [docker image] /bin/bash -c "[python scripts etc.]" &
We have discovered an issue: if a SLURM job is cancelled with the scancel command, the docker process on the node is killed, but the experiment scripts that were started inside the container continue to run. As far as we understand, this is not a fault in SLURM; rather, killing the docker client process does not kill the processes it spawned inside the container, and those are only killed by the docker kill command. While there might be some way to execute docker kill in a SLURM epilog script, we were wondering if anyone else has had this problem and solved it somehow. To summarize, we would like to know:
How can we ensure that processes started in an nvidia-docker container, which in turn was started by a SLURM srun, are killed by scancel?

Configuring Slurm to use cgroups might help here. With cgroups enabled, any process belonging to a job is attached to a cgroup and destroyed when the job ends. Destruction is taken care of by the kernel, so there is no way for a regular process to escape it.
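For reference, a minimal sketch of what that configuration usually involves (parameter names should be checked against your Slurm version; treat this as an assumption, not a drop-in config):
# slurm.conf
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
# cgroup.conf
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainDevices=yes
With proctrack/cgroup, the processes a job step spawns are tracked in the step's cgroup, and scancel tears the whole cgroup down when the job is cancelled.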

Related

Is there a way to know what is causing a memory leak on a docker swarm?

We are running a docker swarm and using Monit to see resource utilisation. The process memory for dockerd keeps growing over time. This happens on all nodes that perform at least one docker action, e.g. docker inspect or docker exec. I suspect it might be related to these actions, but I'm not sure how to replicate it. I have a script like
#!/bin/sh
set -eu
containers=$(docker container ls | awk '{if(NR>1) print $NF}')
# Loop forever
while true; do
    for container in $containers; do
        echo "Running Inspect on $container"
        CONTAINER_STATUS="$(docker inspect "$container" -f '{{.State}}')"
    done
done
but I'm open to other suggestions.
Assuming you can run ansible to run a command via ssh on all servers:
ansible swarm -a "docker stats --no-stream"
A more SRE-style solution is containerd + Prometheus + Alertmanager / Grafana to gather metrics from the swarm nodes and then alert when container thresholds are exceeded.
Don't forget you can simply set resource constraints on Swarm services to limit the amount of memory and CPU that service tasks can consume before being killed and restarted. Then just look for services that keep getting OOM-killed.
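If you go the resource-constraint route, a rough sketch of what that looks like on a swarm service (the service and image names here are illustrative, not from the question):
docker service create --name web \
  --limit-cpu 0.5 \
  --limit-memory 256M \
  nginx:alpine
Tasks that exceed the memory limit are OOM-killed and rescheduled, so repeat offenders show up as failed tasks in docker service ps web.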

Avoid docker exec zombie processes when connecting to containers via bash

Like most docker users, I periodically need to connect to a running container and execute various arbitrary commands via bash.
I'm using 17.06-CE with an ubuntu 16.04 image, and as far as I understand, the only way to do this without installing ssh into the container is via docker exec -it <container_name> bash
However, as is well documented, for each bash shell process you spawn, you leave a zombie process behind when your connection is interrupted. If you connect to your container often, you end up with thousands of idle shells, a most undesirable outcome!
How can I ensure these zombie shell processes are killed upon disconnection, as they would be over ssh?
One way is to make sure an init process runs in your container.
In recent versions of Docker there is an --init option to docker run that does this. It uses tini as the init process, and tini can also be used directly with earlier versions.
Another option is something like the phusion-baseimage project, which provides a base docker image with this capability and many others (might be overkill).
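For example, a minimal sketch reusing the ubuntu 16.04 image from the question (for Docker versions without --init you would have to add tini to the image yourself):
docker run --init -it --rm ubuntu:16.04 bash
With --init, tini runs as PID 1 inside the container and reaps the orphaned shell processes left behind by interrupted docker exec sessions.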

Docker container CPU and Memory Utilization

I have a Docker container running with this command in my Jenkins job:
docker run --name="mydoc" reportgeneration:1.0 start=$START end=$END config=$myfile
This works very well. The image is built from a Dockerfile that runs a shell script via ENTRYPOINT.
Now I want to know how much CPU and memory have been utilized by this container. I am using a Jenkins job, where in the "execute shell command" step I run the above docker run command.
I know about the docker stats command. It works very well on my Ubuntu machine, but I want to run it via Jenkins, as my container runs via the Jenkins console. So here are the limitations I have:
I don't know if there is any way to stop the docker stats command. On the Ubuntu command line, we hit Ctrl+C to stop it. How will I do that in Jenkins?
Even if I figure out a way to stop docker stats, once the docker run command has finished, the container will no longer be active and will have exited. For an exited container, CPU and memory utilisation will be zero.
docker run 'image'
docker stats <container id/name>
With the above two lines, docker stats will only see an exited container, and I don't think docker stats will even work from the Jenkins console as it cannot be stopped.
Is there any way that I can get container's resource utilization (CPU, memory) in a better way via Jenkins console?
My suggestion is not to run docker stats interactively, but to use a shell script with a loop like this:
#!/bin/sh
# First, start the container
CONTAINER_ID=$(docker run -d ...)
# Then keep checking that it's running (with inspect)
while [ "$(docker inspect -f '{{.State.Running}}' "$CONTAINER_ID" 2>/dev/null)" = "true" ]; do
    # And while it's running, check stats
    docker stats --no-stream "$CONTAINER_ID"
    sleep 1
done
# When the script reaches this point, the container has stopped.
# For example, let's clean it up (assuming you haven't used --rm in docker run).
docker rm "$CONTAINER_ID"
The condition checks whether the container is still running, and docker stats --no-stream prints the stats once and exits, which makes it suitable for non-interactive use.
I believe you can use a variant of such a shell script (obviously, updated to do something useful rather than just starting the container and watching its stats) as a build step.
But if you do have an interactive process that you want to stop, kill is the command you're looking for. Ctrl+C in a terminal just sends SIGINT to the process.
You need to know its PID, of course. I'm not sure about Jenkins, but if you've just started a child process from a shell script with & (e.g. docker stats &), then its PID is in the $! variable. You could also try to find it using pidof or ps, but that may be error-prone with concurrent jobs (unless they're all isolated).
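As a rough sketch of that approach (STATS_PID is just an illustrative name; a plain kill sends SIGTERM rather than the SIGINT of Ctrl+C, but it stops docker stats all the same):
docker stats "$CONTAINER_ID" &
STATS_PID=$!
# ... run the actual build/experiment steps here ...
kill "$STATS_PID"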
Here I've assumed that your Jenkins jobs are shell scripts that do the actual work. If your setup is different (e.g. if you use plugins so that Jenkins talks to Docker directly), things may be different and more complicated.

Why is there no init / initctl on the docker centos image

Using the public/common docker centos image, I was installing some services that require an /etc/init directory, and I had a failure. I have further noticed that initctl does not exist, meaning that init was not run.
How can the centos image be used with a fully functional init process?
example:
docker run -t -i centos /bin/bash
file /etc/init
/etc/init: cannot open ... no such file or directory ( /etc/init )
initctl
bash: initctl: command not found
A Docker container is more analogous to a process than a VM. That process can spawn other processes though, and the sub-processes will run in the same container. A common pattern is to use a process supervisor like supervisord as described in the Docker documentation. In general though, it's usually recommended to try and run one process per container if you can (so that, for example, you can monitor and cap memory and CPU at the process level).
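A minimal sketch of that supervisord pattern on the centos image (the program name and paths below are illustrative assumptions, not something from the question):
# Dockerfile
FROM centos
RUN yum install -y epel-release && yum install -y supervisor
COPY supervisord.conf /etc/supervisord.conf
CMD ["/usr/bin/supervisord", "-c", "/etc/supervisord.conf"]
# supervisord.conf
[supervisord]
nodaemon=true
[program:myservice]
command=/usr/bin/myservice --foreground
autorestart=true
supervisord then runs as PID 1 and keeps the listed programs running, without needing a full init/initctl inside the container.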

Is it reasonable to run docker processes under runit/daemontools supervision

I have been running docker processes (apps) via
docker run …
but under runit supervision (runit is like daemontools), so runit ensures that the process stays up, passes signals, etc.
Is this reasonable? Docker seems to want to do its own daemonization, but it isn't as thorough as runit. Furthermore, when runit restarts the app, a new container is created each time (fine), but it leaves a trace of the old one around, which seems to imply I am doing this the wrong way.
Should docker not be run this way?
Should I instead set up a container from the image, just once, and then have runit run/supervise that container for all time?
Docker does do some management of daemonized containers: if the system shuts down, then when the Docker daemon starts it will also restart any containers that were running at the time the system shut down. But if the container exits on its own or the kernel (or a user) kills the container while it is running, the Docker daemon won't restart it. In cases where you do want a restart, a process manager makes sense.
I don't know runit, so I can't give specific configuration guidance. But you should probably have the process manager talk to the docker daemon and check whether a given container id is running (docker ps | grep container_id or equivalent, or use the Docker Remote API directly). If the container has stopped, use Docker to restart it (docker start container_id) instead of running a new container. Or, if you do want a new container each time, start it with docker run --rm to automatically clean it up when it exits or stops.
If you don't want your process manager to poll docker, you could instead run something that watches docker events.
You can get the container_id as the return value when you start the container as a daemon, or you can ask Docker to write it out to a file (docker run --cidfile myfilename, like a PID file).
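A rough sketch of how those pieces could fit together in a periodic check (the cidfile path and image name are illustrative assumptions):
# start once, recording the container id
docker run -d --cidfile /var/run/myapp.cid myimage
# later, in the supervisor's check logic
CID=$(cat /var/run/myapp.cid)
if [ "$(docker inspect -f '{{.State.Running}}' "$CID" 2>/dev/null)" != "true" ]; then
    docker start "$CID"
fi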
I hope that helps or helps another runit guru offer more detailed advice.
Yes, I think running docker under runit makes sense. Typically, when you start a process, there is a way to tell it not to daemonize if it does so by default, since the normal way to hand off from the runit run script to a process is via exec on the last line of your run script. For docker this means making sure not to set the -d flag.
For example, with docker you probably want your run script to look something like this:
#!/bin/bash -e
exec 2>&1
exec chpst -u dockeruser docker run -a stdin -a stdout -i ...
Using exec and chpst should resolve most issues with processes not terminating correctly when you bring down a runit service.
