pause container have pid 1 in the pod? - docker

[root#k8s001 ~]# docker exec -it f72edf025141 /bin/bash
root#b33f3b7c705d:/var/lib/ghost# ps aux`enter code here`
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 1012 4 ? Ss 02:45 0:00 /pause
root 8 0.0 0.0 10648 3400 ? Ss 02:57 0:00 nginx: master process nginx -g daemon off;
101 37 0.0 0.0 11088 1964 ? S 02:57 0:00 nginx: worker process
node 38 0.9 0.0 2006968 116572 ? Ssl 02:58 0:06 node current/index.js
root 108 0.0 0.0 3960 2076 pts/0 Ss 03:09 0:00 /bin/bash
root 439 0.0 0.0 7628 1400 pts/0 R+ 03:10 0:00 ps aux
The display come from internet, it says pause container is the parent process of other containers in the pod, if you attach pod or other containers, do ps aux, you would see that.
Is it correct, I do in my k8s,different, PID 1 is not /pause.

...Is it correct, I do in my k8s,different, PID 1 is not /pause.
This has changed, pause no longer hold PID 1 despite being the first container created by the container runtime to setup the pod (eg. cgroups, namespace etc). Pause is isolated (hidden) from the rest of the containers in the pod regardless of your ENTRYPOINT/CMD. See here for more background information.

By default, Docker will run your entrypoint (or the command, if there is no entrypoint) as PID 1. However, that is not necessarily always the case, since, depending on how you start the container, Docker (or your orchestrator) can also run its custom init process as PID 1:
$ docker run -d --init --name test alpine sleep infinity
849efe38ecec439550738e981065ec4aff55ef5607f03b9fed975e2d3146b9b0
$ with-docker docker exec -ti test ps
PID USER TIME COMMAND
1 root 0:00 /sbin/docker-init -- sleep infinity
7 root 0:00 sleep infinity
8 root 0:00 ps
For more information on why you would want your entrypoint not to be PID 1, you can check this explanation from a tini developer:
Now, unlike other processes, PID 1 has a unique responsibility, which is to reap zombie processes.
Zombie processes are processes that:
Have exited.
Were not waited on by their parent process (wait is the syscall parent processes use to retrieve the exit code of their children).
Have lost their parent (i.e. their parent exited as well), which means they'll never be waited on by their parent.

Related

Pulumi does not perform graceful shutdown of kubernetes pods

I'm using pulumi to manage kubernetes deployments. One of the deployments runs an image which intercepts SIGINT and SIGTERM signals to perform a graceful shutdown like so (this example is running in my IDE):
{"level":"info","msg":"trying to activate gcloud service account","time":"2021-06-17T12:19:25-05:00"}
{"level":"info","msg":"env var not found","time":"2021-06-17T12:19:25-05:00"}
{"Namespace":"default","TaskQueue":"main-task-queue","WorkerID":"37574#Paymahns-Air#","level":"error","msg":"Started Worker","time":"2021-06-17T12:19:25-05:00"}
{"Namespace":"default","Signal":"interrupt","TaskQueue":"main-task-queue","WorkerID":"37574#Paymahns-Air#","level":"error","msg":"Worker has been stopped.","time":"2021-06-17T12:19:27-05:00"}
{"Namespace":"default","TaskQueue":"main-task-queue","WorkerID":"37574#Paymahns-Air#","level":"error","msg":"Stopped Worker","time":"2021-06-17T12:19:27-05:00"}
Notice the "Signal":"interrupt" with a message of Worker has been stopped.
I find that when I alter the source code (which alters the docker image) and run pulumi up the pod doesn't gracefully terminate based on what's described in this blog post. Here's a screenshot of logs from GCP:
The highlighted log line in the image above is the first log line emitted by the app. Note that the shutdown messages aren't logged above the highlighted line which suggests to me that the pod isn't given a chance to perform a graceful shutdown.
Why might the pod not go through the graceful shutdown mechanisms that kubernetes offers? Could this be a bug with how pulumi performs updates to deployments?
EDIT: after doing more investigation I found that this problem is happening because starting a docker container with go run /path/to/main.go actually ends up created two processes like so (after execing into the container):
root#worker-ffzpxpdm-78b9797dcd-xsfwr:/gadic# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.3 0.3 2046200 30828 ? Ssl 18:04 0:12 go run src/cmd/worker/main.go --temporal-host temporal-server.temporal.svc.cluster.local --temporal-port 7233 --grpc-port 6789 --grpc-hos
root 3782 0.0 0.5 1640772 43232 ? Sl 18:06 0:00 /tmp/go-build2661472711/b001/exe/main --temporal-host temporal-server.temporal.svc.cluster.local --temporal-port 7233 --grpc-port 6789 --
root 3808 0.1 0.0 4244 3468 pts/0 Ss 19:07 0:00 /bin/bash
root 3817 0.0 0.0 5900 2792 pts/0 R+ 19:07 0:00 ps aux
If run kill -TERM 1 then the signal isn't forwarded to the underlying binary, /tmp/go-build2661472711/b001/exe/main, which means the graceful shutdown of the application isn't executed. However, if I run kill -TERM 3782 then the graceful shutdown logic is executed.
It seems the go run spawns a subprocess and this blog post suggests the signals are only forwarded to PID 1. On top of that, it's unfortunate that go run doesn't forward signals to the subprocess it spawns.
The solution I found is to add RUN go build -o worker /path/to/main.go in my dockerfile and then to start the docker container with ./worker --arg1 --arg2 instead of go run /path/to/main.go --arg1 --arg2.
Doing it this way ensures there aren't any subprocess spawns by go and that ensures signals are handled properly within the docker container.

Where exactly do the logs of kubernetes pods come from (at the container level)?

I'm looking to redirect some logs from a command run with kubectl exec to that pod's logs, so that they can be read with kubectl logs <pod-name> (or really, /var/log/containers/<pod-name>.log). I can see the logs I need as output when running the command, and they're stored inside a separate log directory inside the running container.
Redirecting the output (i.e. >> logfile.log) to the file which I thought was mirroring what is in kubectl logs <pod-name> does not update that container's logs, and neither does redirecting to stdout.
When calling kubectl logs <pod-name>, my understanding is that kubelet gets them from it's internal /var/log/containers/ directory. But what determines which logs are stored there? Is it the same process as the way logs get stored inside any other docker container?
Is there a way to examine/trace the logging process, or determine where these logs are coming from?
Logs from the STDOUT and STDERR of containers in the pod are captured and stored inside files in /var/log/containers. This is what is presented when kubectl log is run.
In order to understand why output from commands run by kubectl exec is not shown when running kubectl log, let's have a look how it all works with an example:
First launch a pod running ubuntu that are sleeping forever:
$> kubectl run test --image=ubuntu --restart=Never -- sleep infinity
Exec into it
$> kubectl exec -it test bash
Seen from inside the container it is the STDOUT and STDERR of PID 1 that are being captured. When you do a kubectl exec into the container a new process is created living alongside PID 1:
root#test:/# ps -auxf
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 7 0.0 0.0 18504 3400 pts/0 Ss 20:04 0:00 bash
root 19 0.0 0.0 34396 2908 pts/0 R+ 20:07 0:00 \_ ps -auxf
root 1 0.0 0.0 4528 836 ? Ss 20:03 0:00 sleep infinity
Redirecting to STDOUT is not working because /dev/stdout is a symlink to the process accessing it (/proc/self/fd/1 rather than /proc/1/fd/1).
root#test:/# ls -lrt /dev/stdout
lrwxrwxrwx 1 root root 15 Nov 5 20:03 /dev/stdout -> /proc/self/fd/1
In order to see the logs from commands run with kubectl exec the logs need to be redirected to the streams that are captured by the kubelet (STDOUT and STDERR of pid 1). This can be done by redirecting output to /proc/1/fd/1.
root#test:/# echo "Hello" > /proc/1/fd/1
Exiting the interactive shell and checking the logs using kubectl logs should now show the output
$> kubectl logs test
Hello

Docker container increases ram

I have launched several docker containers and using docker stats, I have verified that one of them increases the consumption of ram memory since it starts until it is restarted.
My question is if there is any way to verify where such consumption comes from within the docker container. There is some way to check the consumption inside the container, something of the docker stats style but for the inside of the container.
Thanks for your cooperation.
Not sure if it's what you are asking for, but here's an example:
(Before your start):
Run a test container docker run --rm -it ubuntu
Install stress by typing apt-get update and apt-get install stress
Run stress --vm-bytes $(awk '/MemAvailable/{printf "%d\n", $2 * 0.9;}' < /proc/meminfo)k --vm-keep -m 1 (it will start consuming memory)
1. with top
If you go to a new terminal you can type docker container exec -it <your container name> top and you will get something like the following:
(notice that the %MEM usage of PID 285 is 68.8%)
docker container exec -it dreamy_jang top
top - 12:46:04 up 22 min, 0 users, load average: 1.48, 1.55, 1.12
Tasks: 4 total, 2 running, 2 sleeping, 0 stopped, 0 zombie
%Cpu(s): 20.8 us, 0.8 sy, 0.0 ni, 78.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 6102828 total, 150212 free, 5396604 used, 556012 buff/cache
KiB Swap: 1942896 total, 1937508 free, 5388 used. 455368 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
285 root 20 0 4209376 4.007g 212 R 100.0 68.8 6:56.90 stress
1 root 20 0 18500 3148 2916 S 0.0 0.1 0:00.09 bash
274 root 20 0 36596 3072 2640 R 0.0 0.1 0:00.21 top
284 root 20 0 8240 1192 1116 S 0.0 0.0 0:00.00 stress
2. with ps aux
Again, from a new terminal you type docker container exec -it <your container name> ps aux
(notice that the %MEM usage of PID 285 is 68.8%)
docker container exec -it dreamy_jang ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 18500 3148 pts/0 Ss 12:25 0:00 /bin/bash
root 284 0.0 0.0 8240 1192 pts/0 S+ 12:39 0:00 stress --vm-byt
root 285 99.8 68.8 4209376 4201300 pts/0 R+ 12:39 8:53 stress --vm-byt
root 286 0.0 0.0 34400 2904 pts/1 Rs+ 12:48 0:00 ps aux
My source for this stress thing is from this question: How to fill 90% of the free memory?

Haproxy reload with different backend server ip

Is it possible to reload haproxy while the backend server ip changed? If, how?
It is essential for docker stack. On every deploy, new containers with different ip will replace the old containers.
In our implementation, services return 503 occasionally as the old haproxy process is not terminated and still accepting request, while the backend server is already gone. httplog show that some requests forward a backend which is gone.
# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 893 0.0 0.0 0 0 ? Zs 19:39 0:01 [haproxy] <defunct>
root 898 0.3 0.0 49416 9640 ? Ss 19:49 0:13 /usr/local/sbin/haproxy -D -f /app/haproxy.cfg -p /var/run/haproxy.pid
root 915 0.2 0.0 0 0 ? Zs 19:49 0:12 [haproxy] <defunct>
root 920 0.2 0.0 49308 10196 ? Ss 20:57 0:01 /usr/local/sbin/haproxy -D -f /app/haproxy.cfg -p /var/run/haproxy.pid
root 937 0.0 0.0 0 0 ? Zs 20:57 0:00 [haproxy] <defunct>
root 942 0.3 0.0 49296 9880 ? Ss 20:58 0:01 /usr/local/sbin/haproxy -D -f /app/haproxy.cfg -p /var/run/haproxy.pid
root 959 0.2 0.0 49296 9852 ? Ss 20:58 0:01 /usr/local/sbin/haproxy -D -f /app/haproxy.cfg -p /var/run/haproxy.pid
[Edit]
I am using docker swarm mode. I did try with publish service's port to the host; however, the performance of the swarm’s internal load balancer is bad, and I try to avoid.
While it should be possible to change the HAProxy configuration to point to a different backend server, it seems like it would be easier to bind the Docker containers' ports to predictable ports on the Docker host, so the HAProxy config does not need to change.
For example:
docker run -d -p 127.0.0.1:80:9999 hello_world
And your HAProxy config could look like
backend something
# Assuming the Docker host's IP address is 192.0.2.123
server some-server 192.0.2.123:9999

docker on upstart on scaleway

I have docker container based on ubuntu 12.04 and wish start it on scaleway This instantApp run on ubuntu 15.04 with systemd. For my container I need upstart. I turn on upstart by this recommendation:
Install the upstart-sysv package, which will remove ubuntu-standard and systemd-sysv (but should not remove anything else -- if it does, yell!), and run sudo update-initramfs -u. After that, grub's "Advanced options" menu will have a corresponding "Ubuntu, with Linux ... (systemd)" entry where you can do an one-time boot with systemd.
Now my server running with upstart:
# ps aux|grep upstart
root 1447 0.0 0.0 2632 1744 ? S 13:44 0:00 upstart-udev-bridge --daemon
root 1598 0.0 0.0 2044 176 ? S 13:44 0:00 upstart-file-bridge --daemon
root 2571 0.0 0.0 2032 1128 ? S 13:44 0:00 upstart-socket-bridge --daemon
root 32408 0.0 0.0 3156 1472 pts/4 S+ 14:27 0:00 grep --color=auto upstart
but docker not running:
# service docker status
* Docker is managed via upstart, try using service docker status
# service docker start
* Docker is managed via upstart, try using service docker start
How I can start docker as daemon?
See answer for this Ask Ubuntu question - it's a workaround to get things running again until the Kernel bug is address: https://askubuntu.com/questions/683462/docker-is-managed-via-upstart-try-using-service-docker

Resources