I am having difficulty getting a script to run when a pod is terminated. I have a Python script and an agent install/registration command running, but when I test by deleting the pod (which sends SIGTERM), I never receive the output of the command, nor does the command / function run.
The Dockerfile and entrypoint are set correctly:
CMD ["/app/entrypoint.sh"]
Here is the entrypoint script (both parts):
## Remove
removal() {
  echo "Removing agent..."
  python3 /app/token_getter.py
  GET_TOKEN=$(cat /app/token.txt)
  cat <<< $GET_TOKEN | login --with-token
}
#Trap SIGTERM
trap 'removal' SIGTERM
## Add Agent
AGENT_NAME=$HOSTNAME-$RANDOM
echo $AGENT_NAME > /app/agent_name.txt
./config.sh --unattended --name $AGENT_NAME --labels $AGENT_NAME
./run.sh --once
The Add Agent commands run fine; the agent registers and runs in the background. When I kill the pod, however, I cannot get the removal function to run. I see the following in the logs each time:
level=info msg="Processing signal 'terminated'"
1 mtail.go:382] Received SIGTERM, exiting...
1 mtail.go:396] Shutdown requested.
1 log_watcher.go:219] Shutting down log watcher.
level=info msg="Daemon shutdown complete"
1 loader.go:381] Shutting down loader.
1 mtail.go:419] END OF LINE
level=info msg="stopping healthcheck following graceful shutdown"
module=libcontainerd
level=info msg="stopping event stream following graceful shutdown" error="context
canceled" module=libcontainerd
Related
How to process SIGTERM in argo or kubeflow stage/node/component?
It's possible to catch SIGTERM if your Python script is launched as PID 1.
But in an argo/kubeflow container, PID 1 is occupied by:
1 root 0:00 /var/run/argo/argoexec emissary -- bash -c set -eo pipefail; touch /tmp/9306d238a1214915a260b696e45390ad.step; sleep 1; echo "
P.S.
I tried to use
container.set_lifecycle(V1Lifecycle(pre_stop=V1Handler(_exec=V1ExecAction([
"pkill", "-15", "python"
]))))
But this setting doesn't lead to correct SIGTERM forwarding.
SIGTERM reaches the python3 process only immediately before the pod is killed, about 30 seconds after pod termination starts.
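One quick way to confirm which process actually holds PID 1 inside the running container (the pod name is a placeholder; the second form assumes ps is available in the image):

kubectl exec <pod> -- cat /proc/1/cmdline | tr '\0' ' '
kubectl exec <pod> -- ps -o pid,comm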
I start docker
sudo service docker start
then I try to run dockerd
sudo dockerd
it shows the following error:
INFO[2021-11-21T19:25:52.804962676+05:30] Starting up
failed to start daemon: pid file found, ensure docker is not running or delete /var/run/docker.pid
it works for me:
sudo chmod 666 /var/run/docker.sock
Delete the PID file. Kill the running docker service and start it again.
ps -ef | grep docker
kill -9 <PIDs>
sudo systemctl start docker.service
Delete the .pid file with the following command:
rm /var/run/docker.pid
Once the pid file is removed, the Docker daemon can be started fresh.
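Per the error message ("ensure docker is not running"), it is worth checking that no dockerd process is still using the pid file before deleting it. A cautious sequence, assuming a systemd host:

# If this prints a running dockerd, stop it first rather than deleting the file.
[ -f /var/run/docker.pid ] && ps -p "$(sudo cat /var/run/docker.pid)" -o pid,comm
sudo rm -f /var/run/docker.pid
sudo systemctl start docker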
I had the same problem. The following worked for me:
Deleted /var/run/docker.pid
Rebooted the computer
I had a similar issue:
`sudo docker ps -a`
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
sudo systemctl status docker
docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: deactivating (stop-sigterm) since Wed 2022-09-07 09:32:11 -05; 5h 55min ago
Docs: https://docs.docker.com
Main PID: <PID_NO> (dockerd)
CGroup: /system.slice/docker.service
└─<PID_NO> /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
time="2022-09-07T09:32:26.-05:00" level=info msg="ccResol...=grpc
time="2022-09-07T09:32:26.-05:00" level=info msg="ClientC...=grpc
time="2022-09-07T09:32:26.-05:00" level=info msg="pickfir...=grpc
time="2022-09-07T09:32:26.-05:00" level=info msg="pickfir...=grpc
time="2022-09-07T09:32:26.-05:00" level=info msg="[graphd...lay2"
time="2022-09-07T09:32:26.-05:00" level=warning msg="moun...ound"
time="2022-09-07T09:32:26.-05:00" level=info msg="Loading...art."
systemd[1]: Dependency failed for Docker Application Container Engine.
systemd[1]: Job docker.service/start failed with result 'dependency'.
dockerd[<PID_NO>]: time="2022-09-07T09:39:52.-05:00" level=info msg="Process...ted'"
Hint: Some lines were ellipsized, use -l to show in full.
`sudo systemctl start docker` -- Gives no output
I deleted the docker.pid file in /var/run, but it didn't help either.
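The "Dependency failed" lines suggest the problem is in something docker.service depends on rather than in the pid file itself; checking the unit's own journal and its dependencies may narrow it down (commands assume a systemd host and that containerd runs as a separate unit):

journalctl -u docker.service -b --no-pager | tail -n 50
systemctl list-dependencies docker.service
systemctl status containerd.service docker.socket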
I'm trying to create a docker container with systemd enabled and install auditd on it.
I'm using the standard centos/systemd image provided on Docker Hub.
But when I try to start auditd, it fails.
Here is the list of commands that I have done to create and get into the docker container:
docker run -d --rm --privileged --name systemd -v /sys/fs/cgroup:/sys/fs/cgroup:ro centos/systemd
docker exec -it systemd bash
Now, inside the docker container:
yum install audit
systemctl start auditd
I'm receiving the following error:
Job for auditd.service failed because the control process exited with error code. See "systemctl status auditd.service" and "journalctl -xe" for details.
Then I run:
systemctl status auditd.service
And I'm getting this info:
auditd[182]: Error sending status request (Operation not permitted)
auditd[182]: Error sending enable request (Operation not permitted)
auditd[182]: Unable to set initial audit startup state to 'enable', exiting
auditd[182]: The audit daemon is exiting.
auditd[181]: Cannot daemonize (Success)
auditd[181]: The audit daemon is exiting.
systemd[1]: auditd.service: control process exited, code=exited status=1
systemd[1]: Failed to start Security Auditing Service.
systemd[1]: Unit auditd.service entered failed state.
systemd[1]: auditd.service failed.
Do you guys have any ideas on why this is happening?
Thank you.
See this discussion:
At the moment, auditd can be used inside a container only for aggregating
logs from other systems. It cannot be used to get events relevant to the
container or the host OS. If you want to aggregate only, then set
local_events=no in auditd.conf.
Container support is still under development.
Also see this:
local_events
This yes/no keyword specifies whether or not to include local events. Normally you want local events so the default value is yes. Cases where you would set this to no is when you want to aggregate events only from the network. At the moment, this is useful if the audit daemon is running in a container. This option can only be set once at daemon start up. Reloading the config file has no effect.
So, at least as of Thu, 19 Jul 2018, this feature was not supported; we have to wait.
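If aggregation-only operation is acceptable, the change described above would look roughly like this inside the container, assuming the stock /etc/audit/auditd.conf path and an audit version that already knows the local_events keyword:

# Switch auditd to aggregation-only mode (append the line if it is missing).
if grep -q '^local_events' /etc/audit/auditd.conf; then
    sed -i 's/^local_events.*/local_events = no/' /etc/audit/auditd.conf
else
    echo 'local_events = no' >> /etc/audit/auditd.conf
fi
systemctl restart auditd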
Running COS on GCE
Any ideas on how to get COS to do a graceful docker shutdown?
My innermost process is celery, whose docs say it wants a SIGTERM to stop gracefully:
http://docs.celeryproject.org/en/latest/userguide/workers.html#stopping-the-worker
My entrypoint is something like
exec celery -A some_app worker -c some_concurrency
On COS I am running my docker container as a systemd service, something like:
write_files:
- path: /etc/systemd/system/servicename.service
  permissions: 0644
  owner: root
  content: |
    [Unit]
    Description=Some service
    [Service]
    Environment="HOME=/home/some_home"
    RestartSec=10
    Restart=always
    ExecStartPre=/usr/share/google/dockercfg_update.sh
    ExecStart=/usr/bin/docker run -u 2000 --name=somename --restart always some_image param_1 param_2
    ExecStopPost=/usr/bin/docker stop servicename
    KillMode=processes
    KillSignal=SIGTERM
But ultimately when my COS instance is shut down, it just yanks the plug.
Do I need to add a shutdown script to do a docker stop? Do I need to do something more advanced?
What is the expected exit status of your container process when it receives SIGTERM?
Running systemctl stop <service> then systemctl status -l <service> should show the exit code of the main process. Example:
Main PID: 21799 (code=exited, status=143)
One possibility is that the process does receive SIGTERM and shuts down gracefully, but returns a non-zero exit code.
This would make systemd believe that it didn't shut down correctly. If that is the case, adding
SuccessExitStatus=143
to your systemd service should help. (Replace 143 with the actual exit code of your main process.)
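Exit status 143 is the conventional encoding for termination by signal 15 (128 + 15 = SIGTERM), so if systemctl status shows that code for the main process, the line would go in the [Service] section of the unit above, roughly:

[Service]
KillSignal=SIGTERM
# 143 = 128 + 15 (SIGTERM); replace with the exit code systemctl actually reports.
SuccessExitStatus=143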
For some reason, when using systemd unit files, my docker containers start but get shut down instantly. I have tried finding logs but cannot see any indication of why this is happening. Does anyone know how to solve this, or where to find the logs that show what is going on?
Note: when I start them manually after boot with docker start containername, it works (also when using systemctl start nginx).
After some more digging I found this error: could not find udev device: No such device. Could it have something to do with this?
Unit Service file:
[Unit]
Description=nginx-container
Requires=docker.service
After=docker.service
[Service]
Restart=always
RestartSec=2
StartLimitInterval=3600
StartLimitBurst=5
TimeoutStartSec=5
ExecStartPre=-/usr/bin/docker kill nginx
ExecStartPre=-/usr/bin/docker rm nginx
ExecStart=/usr/bin/docker run -i -d -t --restart=no --name nginx -p 80:80 -v /projects/frontend/data/nginx/:/var/www -v /projects/frontend:/nginx nginx
ExecStop=/usr/bin/docker stop -t 2 nginx
[Install]
WantedBy=multi-user.target
Journalctl output:
May 28 11:18:15 frontend dockerd[462]: time="2015-05-28T11:18:15Z" level=info msg="-job start(d757f83d4a13f876140ae008da943e8c5c3a0765c1fe5bc4a4e2599b70c30626) = OK (0)"
May 28 11:18:15 frontend dockerd[462]: time="2015-05-28T11:18:15Z" level=info msg="POST /v1.18/containers/nginx/stop?t=2"
May 28 11:18:15 frontend dockerd[462]: time="2015-05-28T11:18:15Z" level=info msg="+job stop(nginx)"
Docker logs: empty (docker logs nginx)
Systemctl output: (systemctl status nginx, nginx.service)
● nginx.service - nginx-container
Loaded: loaded (/etc/systemd/system/multi-user.target.wants/nginx.service)
Active: failed (Result: start-limit) since Thu 2015-05-28 11:18:20 UTC; 12min ago
Process: 3378 ExecStop=/usr/bin/docker stop -t 2 nginx (code=exited, status=0/SUCCESS)
Process: 3281 ExecStart=/usr/bin/docker run -i -d -t --restart=no --name nginx -p 80:80 -v /projects/frontend/data/nginx/:/var/www -v /projects/frontend:/nginx (code=exited, status=0/SUCCESS)
Process: 3258 ExecStartPre=/usr/bin/docker rm nginx (code=exited, status=0/SUCCESS)
Process: 3246 ExecStartPre=/usr/bin/docker kill nginx (code=exited, status=0/SUCCESS)
Main PID: 3281 (code=exited, status=0/SUCCESS)
May 28 11:18:20 frontend systemd[1]: nginx.service holdoff time over, scheduling restart.
May 28 11:18:20 frontend systemd[1]: start request repeated too quickly for nginx.service
May 28 11:18:20 frontend systemd[1]: Failed to start nginx-container.
May 28 11:18:20 frontend systemd[1]: Unit nginx.service entered failed state.
May 28 11:18:20 frontend systemd[1]: nginx.service failed.
Because you have not specified a Type in your systemd unit file, systemd is using the default, simple. From systemd.service:
If set to simple (the default if neither Type= nor BusName=, but
ExecStart= are specified), it is expected that the process
configured with ExecStart= is the main process of the service.
This means that if the process started by ExecStart exits, systemd
will assume your service has exited and will clean everything up.
Because you are running the docker client with -d, it exits
immediately...thus, systemd cleans up the service.
Typically, when starting containers with systemd, you would not use
the -d flag. This means that the client will continue running, and
will allow systemd to collect any output produced by your application.
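Applied to the unit above, that would mean an ExecStart along these lines (a sketch that keeps the original mounts but drops -d, plus the interactive flags a service does not need, so the client stays in the foreground):

ExecStart=/usr/bin/docker run --name nginx -p 80:80 \
    -v /projects/frontend/data/nginx/:/var/www \
    -v /projects/frontend:/nginx \
    nginx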
That said, there are fundamental problems in starting Docker containers with systemd. Because of the way Docker operates, there really is no way for systemd to monitor the status of your container. All it can really do is track the status of the docker client, which is not the same thing (the client can exit/crash/etc without impacting your container). This isn't just relevant to systemd; any sort of process supervisor (upstart, runit, supervisor, etc) will have the same problem.