Pulumi does not perform graceful shutdown of kubernetes pods - docker

I'm using pulumi to manage kubernetes deployments. One of the deployments runs an image which intercepts SIGINT and SIGTERM signals to perform a graceful shutdown like so (this example is running in my IDE):
{"level":"info","msg":"trying to activate gcloud service account","time":"2021-06-17T12:19:25-05:00"}
{"level":"info","msg":"env var not found","time":"2021-06-17T12:19:25-05:00"}
{"Namespace":"default","TaskQueue":"main-task-queue","WorkerID":"37574#Paymahns-Air#","level":"error","msg":"Started Worker","time":"2021-06-17T12:19:25-05:00"}
{"Namespace":"default","Signal":"interrupt","TaskQueue":"main-task-queue","WorkerID":"37574#Paymahns-Air#","level":"error","msg":"Worker has been stopped.","time":"2021-06-17T12:19:27-05:00"}
{"Namespace":"default","TaskQueue":"main-task-queue","WorkerID":"37574#Paymahns-Air#","level":"error","msg":"Stopped Worker","time":"2021-06-17T12:19:27-05:00"}
Notice the "Signal":"interrupt" with a message of Worker has been stopped.
I find that when I alter the source code (which alters the docker image) and run pulumi up the pod doesn't gracefully terminate based on what's described in this blog post. Here's a screenshot of logs from GCP:
The highlighted log line in the image above is the first log line emitted by the app. Note that the shutdown messages aren't logged above the highlighted line which suggests to me that the pod isn't given a chance to perform a graceful shutdown.
Why might the pod not go through the graceful shutdown mechanisms that kubernetes offers? Could this be a bug with how pulumi performs updates to deployments?
EDIT: after doing more investigation I found that this problem is happening because starting a docker container with go run /path/to/main.go actually ends up created two processes like so (after execing into the container):
root#worker-ffzpxpdm-78b9797dcd-xsfwr:/gadic# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.3 0.3 2046200 30828 ? Ssl 18:04 0:12 go run src/cmd/worker/main.go --temporal-host temporal-server.temporal.svc.cluster.local --temporal-port 7233 --grpc-port 6789 --grpc-hos
root 3782 0.0 0.5 1640772 43232 ? Sl 18:06 0:00 /tmp/go-build2661472711/b001/exe/main --temporal-host temporal-server.temporal.svc.cluster.local --temporal-port 7233 --grpc-port 6789 --
root 3808 0.1 0.0 4244 3468 pts/0 Ss 19:07 0:00 /bin/bash
root 3817 0.0 0.0 5900 2792 pts/0 R+ 19:07 0:00 ps aux
If run kill -TERM 1 then the signal isn't forwarded to the underlying binary, /tmp/go-build2661472711/b001/exe/main, which means the graceful shutdown of the application isn't executed. However, if I run kill -TERM 3782 then the graceful shutdown logic is executed.
It seems the go run spawns a subprocess and this blog post suggests the signals are only forwarded to PID 1. On top of that, it's unfortunate that go run doesn't forward signals to the subprocess it spawns.

The solution I found is to add RUN go build -o worker /path/to/main.go in my dockerfile and then to start the docker container with ./worker --arg1 --arg2 instead of go run /path/to/main.go --arg1 --arg2.
Doing it this way ensures there aren't any subprocess spawns by go and that ensures signals are handled properly within the docker container.

Related

pause container have pid 1 in the pod?

[root#k8s001 ~]# docker exec -it f72edf025141 /bin/bash
root#b33f3b7c705d:/var/lib/ghost# ps aux`enter code here`
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 1012 4 ? Ss 02:45 0:00 /pause
root 8 0.0 0.0 10648 3400 ? Ss 02:57 0:00 nginx: master process nginx -g daemon off;
101 37 0.0 0.0 11088 1964 ? S 02:57 0:00 nginx: worker process
node 38 0.9 0.0 2006968 116572 ? Ssl 02:58 0:06 node current/index.js
root 108 0.0 0.0 3960 2076 pts/0 Ss 03:09 0:00 /bin/bash
root 439 0.0 0.0 7628 1400 pts/0 R+ 03:10 0:00 ps aux
The display come from internet, it says pause container is the parent process of other containers in the pod, if you attach pod or other containers, do ps aux, you would see that.
Is it correct, I do in my k8s,different, PID 1 is not /pause.
...Is it correct, I do in my k8s,different, PID 1 is not /pause.
This has changed, pause no longer hold PID 1 despite being the first container created by the container runtime to setup the pod (eg. cgroups, namespace etc). Pause is isolated (hidden) from the rest of the containers in the pod regardless of your ENTRYPOINT/CMD. See here for more background information.
By default, Docker will run your entrypoint (or the command, if there is no entrypoint) as PID 1. However, that is not necessarily always the case, since, depending on how you start the container, Docker (or your orchestrator) can also run its custom init process as PID 1:
$ docker run -d --init --name test alpine sleep infinity
849efe38ecec439550738e981065ec4aff55ef5607f03b9fed975e2d3146b9b0
$ with-docker docker exec -ti test ps
PID USER TIME COMMAND
1 root 0:00 /sbin/docker-init -- sleep infinity
7 root 0:00 sleep infinity
8 root 0:00 ps
For more information on why you would want your entrypoint not to be PID 1, you can check this explanation from a tini developer:
Now, unlike other processes, PID 1 has a unique responsibility, which is to reap zombie processes.
Zombie processes are processes that:
Have exited.
Were not waited on by their parent process (wait is the syscall parent processes use to retrieve the exit code of their children).
Have lost their parent (i.e. their parent exited as well), which means they'll never be waited on by their parent.

Where exactly do the logs of kubernetes pods come from (at the container level)?

I'm looking to redirect some logs from a command run with kubectl exec to that pod's logs, so that they can be read with kubectl logs <pod-name> (or really, /var/log/containers/<pod-name>.log). I can see the logs I need as output when running the command, and they're stored inside a separate log directory inside the running container.
Redirecting the output (i.e. >> logfile.log) to the file which I thought was mirroring what is in kubectl logs <pod-name> does not update that container's logs, and neither does redirecting to stdout.
When calling kubectl logs <pod-name>, my understanding is that kubelet gets them from it's internal /var/log/containers/ directory. But what determines which logs are stored there? Is it the same process as the way logs get stored inside any other docker container?
Is there a way to examine/trace the logging process, or determine where these logs are coming from?
Logs from the STDOUT and STDERR of containers in the pod are captured and stored inside files in /var/log/containers. This is what is presented when kubectl log is run.
In order to understand why output from commands run by kubectl exec is not shown when running kubectl log, let's have a look how it all works with an example:
First launch a pod running ubuntu that are sleeping forever:
$> kubectl run test --image=ubuntu --restart=Never -- sleep infinity
Exec into it
$> kubectl exec -it test bash
Seen from inside the container it is the STDOUT and STDERR of PID 1 that are being captured. When you do a kubectl exec into the container a new process is created living alongside PID 1:
root#test:/# ps -auxf
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 7 0.0 0.0 18504 3400 pts/0 Ss 20:04 0:00 bash
root 19 0.0 0.0 34396 2908 pts/0 R+ 20:07 0:00 \_ ps -auxf
root 1 0.0 0.0 4528 836 ? Ss 20:03 0:00 sleep infinity
Redirecting to STDOUT is not working because /dev/stdout is a symlink to the process accessing it (/proc/self/fd/1 rather than /proc/1/fd/1).
root#test:/# ls -lrt /dev/stdout
lrwxrwxrwx 1 root root 15 Nov 5 20:03 /dev/stdout -> /proc/self/fd/1
In order to see the logs from commands run with kubectl exec the logs need to be redirected to the streams that are captured by the kubelet (STDOUT and STDERR of pid 1). This can be done by redirecting output to /proc/1/fd/1.
root#test:/# echo "Hello" > /proc/1/fd/1
Exiting the interactive shell and checking the logs using kubectl logs should now show the output
$> kubectl logs test
Hello

Docker leaving zombie processes (vieux/sshfs)

I have a swarm of few services, and in the compose file there are few volumes created with the vieux/sshfs driver, which are used by the services.
The containers spawned by the services execute a single script, after which the container finishes/exits and a new one is created on its place - basically the services are spawning new containers all the time.
All works smooth, except that there is exceptionally large amount of zombie processes accumulated in the host machine. The zombies go away when the docker daemon is re stared - it must be docker who makes the zombies.
"ps aux | grep 'Z'" is
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 3040 0.0 0.0 0 0 ? Zs 14:13 0:00 [ssh] <defunct>
root 3042 0.0 0.0 0 0 ? Zs 14:13 0:00 [sshfs] <defunct>
root 3052 0.0 0.0 0 0 ? Zs 14:13 0:00 [ssh] <defunct>
root 3055 0.0 0.0 0 0 ? Zs 14:13 0:00 [sshfs] <defunct>
...
As far as I understand, the volumes are created only once, and the services are just using the local copy of the volume - not creating a new ssh connection and reading straight form the remote machine - and this should not be creating another ssh connection process that will become zombie.
I have trouble finding info on the topic, which makes me think that I am doing something fundamentally wrong. Please help.
I have just resolved the issue by enabling Tini for the services in the docker-compose file as follows -
init: true
Few zombies (<10) pop up, but then they get killed in a second - no accumulation.
I still don't get what the zombies had to do with the ssh. If anyone can answer that I would be grateful.
PS: I have checked few days after I have enabled Tini. There are some accumulated zombies (~300, before there ware ~2000). Problem seems mitigated, but it is still there.
I recently read an article about it.
It just said than if you declare your volume directly into the docker-compose.yml in may lead to some issue with zombie sshfs process.
To avoid this, I try to declare the volume as external and run manually the creation of the docker volume.
docker volume create -d vieux/sshfs -o sshcmd="$USER_SSH#$IP:/mysupervolume" -o IdentityFile="/root/.ssh/$SSH_KEY" nameofmyvolume
Hope it will help someone.
Kr,

Container Optimized OS Graceful Shutdown of Celery

Running COS on GCE
Any ideas on how to get COS to do a graceful docker shutdown?
My innermost process is celery, which says he wants a SIGTERM to stop gracefully
http://docs.celeryproject.org/en/latest/userguide/workers.html#stopping-the-worker
My entrypoint is something like
exec celery -A some_app worker -c some_concurrency
On COS I am running my docker a service, something like
write_files:
- path: /etc/systemd/system/servicename.service
permissions: 0644
owner: root
content: |
[Unit]
Description=Some service
[Service]
Environment="HOME=/home/some_home"
RestartSec=10
Restart=always
ExecStartPre=/usr/share/google/dockercfg_update.sh
ExecStart=/usr/bin/docker run -u 2000 --name=somename --restart always some_image param_1 param_2
ExecStopPost=/usr/bin/docker stop servicename
KillMode=processes
KillSignal=SIGTERM
But ultimately when my COS instance it shut down, it just yanks the plug.
Do I need to add a shutdown script to do a docker stop? Do I need to do something more advanced?
What is the expected exit status of your container process when when it receives SIGTERM?
Running systemctl stop <service> then systemctl status -l <service> should show the exit code of the main process. Example:
Main PID: 21799 (code=exited, status=143)
One possibility is that the process does receive SIGTERM and shuts down gracefully, but returns non-zero exit code.
This would make the systemd believe that it didn't shutdown correctly. If that is the case, adding
SuccessExitStatus=143
to your systemd service should help. (Replace 143 with the actual exit code of your main process.)

How to I bypass the login page on Rstudio?

I am trying to bypass the login page on RStudio as we are running it in a Docker container and this step is not necessary as we authenticate before we let users launch the container.
I am using the Rocker implementation of RStudio for Docker. We are running on Centos7.
I'm fairly new to SO, so please let me know what information would be helpful for answering the question.
I figured it out.
When you start rserver, add the flag --auth-none=1, so my final CMD in my Dockerfile looked like:
USER rstudio
CMD ["/usr/lib/rstudio-server/bin/rserver","--server-daemonize=0","--auth-none=1"]
I will caution though, the first time I did it, I ran with sudo -E in front of the command and it logged into RStudio as ROOT! (this is also because I had altered the /etc/rstudio/rserver.conf with the setting auth-minimum-user-id=0 because I was trying to get the error to go away (which it did :)
The above code will change to user 'rstudio' before running the command which will log you straight in as rstudio.
Hope that helps someone out there, I know I spent the better portion of my day finding a work-around!
To bypass the login page you need also to define the environment variable USER.
need to set system environmental variable USER=rstudio in order for --auth-none 1
-- https://github.com/rstudio/rstudio/issues/1663
Here is a snippet of Dockerfile permitting to run the RStudio server and to login as the user rstudio.
ENV USER="rstudio"
CMD ["/usr/lib/rstudio-server/bin/rserver", "--server-daemonize", "0", "--auth-none", "1"]
When it's run the login page is not displayed and we can check that the server and the session are running with the rstudio user.
# Run the container
docker run --name rstudio --rm -p 8787:8787 -d rstudio
# Check processes
docker exec -it rstudio ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
rstudio+ 1 0.1 0.3 210792 13844 ? Ssl 21:10 0:00 /usr/lib/rstudi
rstudio 49 0.7 2.3 555096 82312 ? Sl 21:10 0:03 /usr/lib/rstudi
root 570 0.0 0.1 45836 3744 pts/0 Rs+ 21:18 0:00 ps aux

Resources