CoreOS Fleet could not get container - docker

I have 3 containers running on 3 machines. One is called graphite, one is called back and one is called front. The front container needs both of the others to run, so I link them separately like this:
[Unit]
Description=front hystrix
[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill front
ExecStartPre=-/usr/bin/docker rm -v front
ExecStartPre=/usr/bin/docker pull blurio/hystrixfront
ExecStart=/usr/bin/docker run --name front --link graphite:graphite --link back:back -p 8080:8080 blurio/hystrixfront
ExecStop=/usr/bin/docker stop front
I start both the other containers, wait till they're up and running, then start this one with fleetctl and it just instantly fails with this message:
fleetctl status front.service
● front.service - front hystrix
Loaded: loaded (/run/fleet/units/front.service; linked-runtime; vendor preset: disabled)
Active: failed (Result: exit-code) since Tue 2015-05-12 13:46:08 UTC; 24s ago
Process: 922 ExecStop=/usr/bin/docker stop front (code=exited, status=0/SUCCESS)
Process: 912 ExecStart=/usr/bin/docker run --name front --link graphite:graphite --link back:back -p 8080:8080 blurio/hystrixfront (code=exited, status=1/FAILURE)
Process: 902 ExecStartPre=/usr/bin/docker pull blurio/hystrixfront (code=exited, status=0/SUCCESS)
Process: 892 ExecStartPre=/usr/bin/docker rm -v front (code=exited, status=1/FAILURE)
Process: 885 ExecStartPre=/usr/bin/docker kill front (code=exited, status=1/FAILURE)
Main PID: 912 (code=exited, status=1/FAILURE)
May 12 13:46:08 core-04 docker[902]: 8b9853c10955: Download complete
May 12 13:46:08 core-04 docker[902]: 0dc7a355f916: Download complete
May 12 13:46:08 core-04 docker[902]: 0dc7a355f916: Download complete
May 12 13:46:08 core-04 docker[902]: Status: Image is up to date for blurio/hystrixfront:latest
May 12 13:46:08 core-04 systemd[1]: Started front hystrix.
May 12 13:46:08 core-04 docker[912]: time="2015-05-12T13:46:08Z" level="fatal" msg="Error response from daemon: Could not get container for graphite"
May 12 13:46:08 core-04 systemd[1]: front.service: main process exited, code=exited, status=1/FAILURE
May 12 13:46:08 core-04 docker[922]: front
May 12 13:46:08 core-04 systemd[1]: Unit front.service entered failed state.
May 12 13:46:08 core-04 systemd[1]: front.service failed.
I also want to include the fleetctl list-units output, where you can see that the other two are running without problems.
fleetctl list-units
UNIT MACHINE ACTIVE SUB
back.service 0ff08b11.../172.17.8.103 active running
front.service 69ab2600.../172.17.8.104 failed failed
graphite.service 2886cedd.../172.17.8.101 active running

There are a couple of issues here. First, you can't use docker's --link argument across machines. It is a Docker-specific instruction for linking one container to another on the same Docker engine. In your example you have multiple engines, so this technique won't work. If you want to keep using links, you will need to employ the ambassador pattern: coreos ambassador. Alternatively, you can use the X-Fleet directive MachineOf to make all of the docker containers run on the same machine; however, I think that would defeat your goals.
Often with cloud services one service needs another, like in your case. If the other service is not running (yet), then the services that need it should be well behaved and either exit, or wait for the needed service to be ready. So the needed service must be discovered. There are many techniques for the discovery phase, and the waiting phase. For example, you can write a 'wrapper' script in each of your containers. That wrapper can do these duties. In your case, you could have a script in the back.service and graphite.service which writes information to the etcd database, like:
ExecStartPre=/usr/bin/env etcdctl set /graphite/status ready
Then in the startup script in front you can do an etcdctl get /graphite/status to see when the container becomes ready (and not continue until it is). If you like, you can also store the IP address and port from the graphite script so that the front script can pick up the place to connect to.
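The wait step described above could be sketched as a small shell function for the front container's startup script. This is only a minimal sketch, not a tested production wrapper: the function name wait_for_ready, the retry count and the delay are assumptions, it uses the same etcdctl get syntax as above, and it assumes etcdctl is reachable from wherever the script runs.

```shell
#!/bin/sh
# wait_for_ready KEY [TRIES] [DELAY]
# Polls an etcd key until its value is "ready".
# Returns 0 once the key is ready, 1 if it never becomes ready.
# (Hypothetical helper; name and defaults are illustrative only.)
wait_for_ready() {
    key=$1
    tries=${2:-30}   # how many times to poll before giving up
    delay=${3:-2}    # seconds to sleep between polls
    i=0
    while [ "$i" -lt "$tries" ]; do
        # etcdctl exits non-zero while the key does not exist yet,
        # so its errors are ignored and only the value is compared.
        if [ "$(etcdctl get "$key" 2>/dev/null)" = "ready" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

In a wrapper script you might call wait_for_ready /graphite/status 30 2 || exit 1 before starting the application, so the container exits cleanly if graphite never shows up.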
Another technique for discovery is to use registrator. This is a super handy docker container that updates a directory structure in etcd every time a container comes and goes. This makes it easier to use a discovery technique like the one I listed above without each container having to announce itself; it becomes automatic. You still need the 'front' container to have a startup script that waits for the service to appear in the etcd database. I usually start registrator on coreos boot. In fact, I start two copies, one for discovering internal addresses (flannel ones) and one for external (services that are available outside my containers). Here is an example of the database registrator manages on my machines:
core@fo1 ~/prs $ etcdctl ls --recursive /skydns
/skydns/net
/skydns/net/tacodata
/skydns/net/tacodata/services
/skydns/net/tacodata/services/cadvisor-4194
/skydns/net/tacodata/services/cadvisor-4194/fo2:cadvisor:4194
/skydns/net/tacodata/services/cadvisor-4194/fo1:cadvisor:4194
/skydns/net/tacodata/services/cadvisor-4194/fo3:cadvisor:4194
/skydns/net/tacodata/services/internal
/skydns/net/tacodata/services/internal/cadvisor-4194
/skydns/net/tacodata/services/internal/cadvisor-4194/fo2:cadvisor:4194
/skydns/net/tacodata/services/internal/cadvisor-4194/fo1:cadvisor:4194
/skydns/net/tacodata/services/internal/cadvisor-4194/fo3:cadvisor:4194
/skydns/net/tacodata/services/internal/cadvisor-8080
/skydns/net/tacodata/services/internal/cadvisor-8080/fo2:cadvisor:8080
/skydns/net/tacodata/services/internal/cadvisor-8080/fo1:cadvisor:8080
/skydns/net/tacodata/services/internal/cadvisor-8080/fo3:cadvisor:8080
You can see the internal and external available ports for cadvisor. If I get one of the records:
etcdctl get /skydns/net/tacodata/services/internal/cadvisor-4194/fo2:cadvisor:4194
{"host":"10.1.88.3","port":4194}
you get everything you need to connect to that container internally. This technique really starts to shine when coupled with skydns. Skydns presents a DNS service using the information published by registrator. So, long story short, I can simply make my application use the hostname (the hostname defaults to the name of the docker image, but it can be changed). So in this example my application can connect to cadvisor-8080, and DNS will give it one of the 3 IP addresses it has (it is on 3 machines). The DNS also supports SRV records, so if you aren't using a well-known port, the SRV record can give you the port number.
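If you are not using skydns, the record fetched with etcdctl get above can be split into host and port in a startup script. A minimal sketch, assuming the compact JSON form shown above (no whitespace, string host, numeric port); the function names record_host and record_port are made up for illustration, and a real JSON parser such as jq would be more robust:

```shell
#!/bin/sh
# record_host JSON: print the "host" field of a registrator-style record.
# (Hypothetical helper; relies on the compact JSON layout shown above.)
record_host() {
    printf '%s\n' "$1" | sed -n 's/.*"host":"\([^"]*\)".*/\1/p'
}

# record_port JSON: print the numeric "port" field of the same record.
record_port() {
    printf '%s\n' "$1" | sed -n 's/.*"port":\([0-9][0-9]*\).*/\1/p'
}
```

For example, after rec=$(etcdctl get /skydns/net/tacodata/services/internal/cadvisor-4194/fo2:cadvisor:4194), the script can use "$(record_host "$rec")" and "$(record_port "$rec")" as the address to connect to.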
Using coreos and fleet it is difficult not to get the containers themselves involved in the publish/discovery/wait game. At least that's been my experience.
-g

Related

Docker - error after moved storage to second disk and using overlay2

I just moved Docker's default storage location to a second disk by setting up /etc/docker/daemon.json as described in the documentation; so far so good.
The problem is that now I keep getting a bunch of volumes being continuously (re)mounted, and obviously it is really annoying.
So I tried to set up overlay2 in /etc/docker/daemon.json, but now Docker doesn't even start
# sudo systemctl restart docker
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xeu docker.service" for details.
# systemctl status docker.service
× docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2022-12-15 11:06:36 CET; 10s ago
TriggeredBy: × docker.socket
Docs: https://docs.docker.com
Process: 17614 ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock (code=exited, status>
Main PID: 17614 (code=exited, status=1/FAILURE)
CPU: 54ms
dic 15 11:06:36 sgratani-OptiPlex-7060 systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
dic 15 11:06:36 sgratani-OptiPlex-7060 systemd[1]: Stopped Docker Application Container Engine.
dic 15 11:06:36 sgratani-OptiPlex-7060 systemd[1]: docker.service: Start request repeated too quickly.
dic 15 11:06:36 sgratani-OptiPlex-7060 systemd[1]: docker.service: Failed with result 'exit-code'.
dic 15 11:06:36 sgratani-OptiPlex-7060 systemd[1]: Failed to start Docker Application Container Engine.
So, for now I have given up on overlay2, since having all the Docker images and containers on the second disk is more important than getting rid of a bunch of volumes being mounted continuously. But can anyone tell me where the problem is and whether there is a solution?
Update #1: strange permissions behaviour problem
I have a simple docker-compose.yml with a WordPress service (official WP image) and a database service. When the docker storage location is on the second disk instead of the default one, the database (the volume, maybe?) seems inaccessible:
WordPress keeps giving an error on the db connection
trying to run mysql interactively from the db service results in a login error for the root user
Obviously this is related to the docker storage location, but I cannot find out why, since the new location is created by docker itself when it starts.

rootless docker - containers do not start after a power cut but starts again when host reboot

I have a Debian 10.5 host with docker running in rootless mode (I followed this guide: https://docs.docker.com/engine/security/rootless/)
When there is a power cut (I don't have a UPS), my debian 10.5 VM starts automatically when power is restored.
Everything works fine except my docker daemon. There's no error with the service:
systemctl --user status docker
● docker.service - Docker Application Container Engine (Rootless)
Loaded: loaded (/home/dockerprod/.config/systemd/user/docker.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2022-01-08 12:04:40 +04; 4min 43s ago
Docs: https://docs.docker.com
Main PID: 770 (rootlesskit)
CGroup: /user.slice/user-1001.slice/user@1001.service/docker.service
├─770 rootlesskit --net=vpnkit --mtu=1500 --slirp4netns-sandbox=auto --slirp4netns-seccomp=auto --disable-host-loopback --port-driver=builtin --copy-up=/etc --copy-up=/run /
├─805 /proc/self/exe --net=vpnkit --mtu=1500 --slirp4netns-sandbox=auto --slirp4netns-seccomp=auto --disable-host-loopback --port-driver=builtin --copy-up=/etc --copy-up=/ru
├─816 vpnkit --ethernet /tmp/rootlesskit308973386/vpnkit-ethernet.sock --mtu 1500 --host-ip 0.0.0.0
├─896 dockerd --experimental --storage-driver=vfs
└─936 containerd --config /run/user/1001/docker/containerd/containerd.toml --log-level info
But the containers did not start for some reason.
I am not sure what logs to look at.
sudo journalctl -u docker.service
returns nothing
If I restart the host, the containers start as normal. So I always need to restart the host after a power cut which is not ideal when I am not at home.
Any idea what to look at?
Maybe a clue; my docker lib folder (where containers and images are stored) is on another HDD mounted automatically in /etc/fstab
Maybe after a power cut, upon reboot, the docker daemon is started before the HDD that holds the docker lib folder is mounted? I don't know if this makes sense.
edit:
I moved the /etc/fstab entry for the HDD that holds the docker lib folder to the top of the file.
That did not solve the issue.
Another note: /lib/docker/containers/ is empty after the power cut.
If I restart the host, /lib/docker/containers/ contains the containers again.

Trouble starting the redis server

I am using Rails and want to run Sidekiq, which requires a Redis server to be installed. I installed Redis on my KDE Neon by following the instructions from DigitalOcean. Here is the error that is displayed when I try to run sudo systemctl status redis:
redis.service - Redis In-Memory Data Store
Loaded: loaded (/etc/systemd/system/redis.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Wed 2021-03-24 17:24:12 IST; 6s ago
Process: 47334 ExecStart=/usr/local/bin/redis-server /etc/redis/redis.conf (code=exited, status=203/EXEC)
Main PID: 47334 (code=exited, status=203/EXEC)
Mar 24 17:24:12 maxagno3 systemd[1]: redis.service: Scheduled restart job, restart counter is at 5.
Mar 24 17:24:12 maxagno3 systemd[1]: Stopped Redis In-Memory Data Store.
Mar 24 17:24:12 maxagno3 systemd[1]: redis.service: Start request repeated too quickly.
Mar 24 17:24:12 maxagno3 systemd[1]: redis.service: Failed with result 'exit-code'.
Mar 24 17:24:12 maxagno3 systemd[1]: Failed to start Redis In-Memory Data Store.
Using redis-cli works fine. I assume when I first ran the command sudo systemctl disable redis it deleted the redis.service file. Since then I have uninstalled and installed the Redis but still, the error persists.
Quick help would greatly be appreciated.
The error you're showing is hiding your original error. Redis is basically in a restart loop, which your error is alluding to. What you need to do is disable this restart behavior to get at the underlying problem.
This can be done by doing the following:
Edit /etc/systemd/system/redis.service
Edit the restart line to Restart=no. The possible options are no, on-success, on-failure, on-abnormal, on-watchdog, on-abort or always.
Edit the start limit interval to StartLimitInterval=0. This is normally set really high to prevent load from spiking when a service constantly restarts
Lastly, reload your services for your changes to take effect. This is done by running systemctl daemon-reload
Once your service stops looping, you can try to start the service manually to get the actual error. If the error is too large, you can look in your OS's general log, grepping specifically for Redis, or run journalctl: journalctl -u redis.service
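One more thing worth checking while you are at it: status=203/EXEC in the systemctl output from the question is systemd's code for "the ExecStart binary could not be executed", which usually means the file is missing or not executable. A quick check could look like the sketch below; the function name check_exec is hypothetical, and it only tests the path, not whether Redis itself is configured correctly.

```shell
#!/bin/sh
# check_exec PATH: report whether PATH exists and can be executed.
# Prints a one-line verdict; returns 0 only in the "ok" case.
# (Hypothetical helper for illustration.)
check_exec() {
    if [ ! -e "$1" ]; then
        echo "missing: $1"
        return 1
    elif [ ! -x "$1" ] || [ -d "$1" ]; then
        # Directories pass the -x test, so rule them out explicitly.
        echo "not executable: $1"
        return 1
    fi
    echo "ok: $1"
}
```

Running check_exec /usr/local/bin/redis-server (the path from the ExecStart line in the question) shows whether the unit file points at a binary that is actually there.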
Hopefully this helps!
If you want a clean and repeatable approach, I suggest you always use docker, especially for a dev environment.
Starting Redis in docker is then as simple as:
docker run -d -p 6379:6379 --name my-redis redis

Error running auditd inside centos docker container: "Unable to set initial audit startup state to 'enable', exiting"

I'm trying to create a docker container with systemd enabled and install auditd on it.
I'm using the standard centos/systemd image provided in dockerhub.
But when I'm trying to start audit, it fails.
Here is the list of commands that I have done to create and get into the docker container:
docker run -d --rm --privileged --name systemd -v /sys/fs/cgroup:/sys/fs/cgroup:ro centos/systemd
docker exec -it systemd bash
Now, inside the docker container:
yum install audit
systemctl start auditd
I'm receiving the following error:
Job for auditd.service failed because the control process exited with error code. See "systemctl status auditd.service" and "journalctl -xe" for details.
Then I run:
systemctl status auditd.service
And I'm getting this info:
auditd[182]: Error sending status request (Operation not permitted)
auditd[182]: Error sending enable request (Operation not permitted)
auditd[182]: Unable to set initial audit startup state to 'enable', exiting
auditd[182]: The audit daemon is exiting.
auditd[181]: Cannot daemonize (Success)
auditd[181]: The audit daemon is exiting.
systemd[1]: auditd.service: control process exited, code=exited status=1
systemd[1]: Failed to start Security Auditing Service.
systemd[1]: Unit auditd.service entered failed state.
systemd[1]: auditd.service failed.
Do you guys have any ideas on why this is happening?
Thank you.
See this discussion:
At the moment, auditd can be used inside a container only for aggregating
logs from other systems. It cannot be used to get events relevant to the
container or the host OS. If you want to aggregate only, then set
local_events=no in auditd.conf.
Container support is still under development.
Also see this:
local_events
This yes/no keyword specifies whether or not to include local events. Normally you want local events so the default value is yes. Cases where you would set this to no is when you want to aggregate events only from the network. At the moment, this is useful if the audit daemon is running in a container. This option can only be set once at daemon start up. Reloading the config file has no effect.
So at least as of Thu, 19 Jul 2018 14:53:32 -0400, this feature was not supported, and we have to wait.
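If aggregation only is acceptable, the local_events=no setting quoted above could be applied with something like the following. This is a rough sketch under assumptions: the function name set_local_events_no is made up, the file is assumed to use the usual "key = value" layout of auditd.conf, and I have not verified that it makes auditd start inside the centos/systemd container from the question.

```shell
#!/bin/sh
# set_local_events_no FILE: force "local_events = no" in an
# auditd.conf-style file, replacing an existing setting or appending one.
# (Hypothetical helper for illustration.)
set_local_events_no() {
    conf=$1
    tmp="$conf.tmp.$$"
    if grep -q '^[[:space:]]*local_events[[:space:]]*=' "$conf"; then
        # Rewrite the existing line; copy back with cat so the original
        # file keeps its owner and permissions.
        sed 's/^[[:space:]]*local_events[[:space:]]*=.*/local_events = no/' "$conf" > "$tmp" &&
            cat "$tmp" > "$conf"
        rm -f "$tmp"
    else
        echo 'local_events = no' >> "$conf"
    fi
}
```

You would run it as set_local_events_no /etc/audit/auditd.conf and then start auditd again, since the option is only read at daemon start up.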

Docker containers shut down after systemd start

For some reason when using systemd unit files my docker containers start but get shut down instantly. I have tried finding logs but can not see any indication on why this is happening. Is there someone that knows how to solve this / find the logs that show what is happening?
Note: When starting them manually after boot with docker start containername then it works (also when using systemctl start nginx)
After some more digging I found this error: could not find udev device: No such device. Could it have something to do with this?
Unit Service file:
[Unit]
Description=nginx-container
Requires=docker.service
After=docker.service
[Service]
Restart=always
RestartSec=2
StartLimitInterval=3600
StartLimitBurst=5
TimeoutStartSec=5
ExecStartPre=-/usr/bin/docker kill nginx
ExecStartPre=-/usr/bin/docker rm nginx
ExecStart=/usr/bin/docker run -i -d -t --restart=no --name nginx -p 80:80 -v /projects/frontend/data/nginx/:/var/www -v /projects/frontend: nginx
ExecStop=/usr/bin/docker stop -t 2 nginx
[Install]
WantedBy=multi-user.target
Journalctl output:
May 28 11:18:15 frontend dockerd[462]: time="2015-05-28T11:18:15Z" level=info msg="-job start(d757f83d4a13f876140ae008da943e8c5c3a0765c1fe5bc4a4e2599b70c30626) = OK (0)"
May 28 11:18:15 frontend dockerd[462]: time="2015-05-28T11:18:15Z" level=info msg="POST /v1.18/containers/nginx/stop?t=2"
May 28 11:18:15 frontend dockerd[462]: time="2015-05-28T11:18:15Z" level=info msg="+job stop(nginx)"
Docker logs: empty (docker logs nginx)
Systemctl output: (systemctl status nginx, nginx.service)
● nginx.service - nginx-container
Loaded: loaded (/etc/systemd/system/multi-user.target.wants/nginx.service)
Active: failed (Result: start-limit) since Thu 2015-05-28 11:18:20 UTC; 12min ago
Process: 3378 ExecStop=/usr/bin/docker stop -t 2 nginx (code=exited, status=0/SUCCESS)
Process: 3281 ExecStart=/usr/bin/docker run -i -d -t --restart=no --name nginx -p 80:80 -v /projects/frontend/data/nginx/:/var/www -v /projects/frontend:/nginx (code=exited, status=0/SUCCESS)
Process: 3258 ExecStartPre=/usr/bin/docker rm nginx (code=exited, status=0/SUCCESS)
Process: 3246 ExecStartPre=/usr/bin/docker kill nginx (code=exited, status=0/SUCCESS)
Main PID: 3281 (code=exited, status=0/SUCCESS)
May 28 11:18:20 frontend systemd[1]: nginx.service holdoff time over, scheduling restart.
May 28 11:18:20 frontend systemd[1]: start request repeated too quickly for nginx.service
May 28 11:18:20 frontend systemd[1]: Failed to start nginx-container.
May 28 11:18:20 frontend systemd[1]: Unit nginx.service entered failed state.
May 28 11:18:20 frontend systemd[1]: nginx.service failed.
Because you have not specified a Type in your systemd unit file, systemd is using the default, simple. From systemd.service:
If set to simple (the default if neither Type= nor BusName=, but
ExecStart= are specified), it is expected that the process
configured with ExecStart= is the main process of the service.
This means that if the process started by ExecStart exits, systemd
will assume your service has exited and will clean everything up.
Because you are running the docker client with -d, it exits
immediately...thus, systemd cleans up the service.
Typically, when starting containers with systemd, you would not use
the -d flag. This means that the client will continue running, and
will allow systemd to collect any output produced by your application.
That said, there are fundamental problems in starting Docker containers with systemd. Because of the way Docker operates, there really is no way for systemd to monitor the status of your container. All it can really do is track the status of the docker client, which is not the same thing (the client can exit/crash/etc without impacting your container). This isn't just relevant to systemd; any sort of process supervisor (upstart, runit, supervisor, etc) will have the same problem.

Resources