Docker Swarm: Service keeps on getting Ready & Shutdown

I have a couple of Docker Swarm nodes. When I tried to create the service on the leader with the command below, service creation never finished; it has been running for more than 40 minutes now.
docker service create \
--mode global \
--mount type=bind,src=/project/m32/,dst=/root/m32/ \
--publish mode=host,target=310,published=310 \
--publish mode=host,target=311,published=311 \
--publish mode=host,target=312,published=312 \
--publish mode=host,target=313,published=313 \
--constraint "node.labels.m32 == true" \
--name m32 \
local-registry/ubuntu:07
overall progress: 1 out of 2 tasks
ew0edluvz39p: ready [======================================> ]
kzc7jf7irsrh: running [==================================================>]
In the service's task list, the tasks keep cycling between Ready and Shutdown:
$ docker service ps m32
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
s4q0rqrqbpdn m32.ew0edluvz39pazold0wnv2ean local-registry/ubuntu:07 sl-089 Ready Ready 1 second ago
r6vibgptm5oc \_ m32.ew0edluvz39pazold0wnv2ean local-registry/ubuntu:07 sl-089 Shutdown Complete 1 second ago
joq2p6c9jpnx \_ m32.ew0edluvz39pazold0wnv2ean local-registry/ubuntu:07 sl-089 Shutdown Complete 7 seconds ago
a5h8gac02vfx \_ m32.ew0edluvz39pazold0wnv2ean local-registry/ubuntu:07 sl-089 Shutdown Complete 13 seconds ago
f51stfsdlhvp \_ m32.ew0edluvz39pazold0wnv2ean local-registry/ubuntu:07 sl-089 Shutdown Complete 19 seconds ago
zqcbxkm4fwhr m32.kzc7jf7irsrhnx3kurcwqjb2j local-registry/ubuntu:07 sl-090 Ready Ready less than a second ago
za8efvi9x4yw \_ m32.kzc7jf7irsrhnx3kurcwqjb2j local-registry/ubuntu:07 sl-090 Shutdown Complete less than a second ago
$ sudo systemctl status docker.service
Nov 24 19:58:48 svr2 dockerd[2797]: time="2021-11-24T19:58:48.200421563+05:30" level=info msg="ignoring event" container=ea8b76fedb18159ba0cd8f279a9ca4264399c>
Nov 24 20:01:39 svr2 dockerd[2797]: time="2021-11-24T20:01:39.602028420+05:30" level=info msg="NetworkDB stats svr2(00bbf0799aa6) - netID:ubuzyty9mq4tb7xyb>
Nov 24 20:06:39 svr2 dockerd[2797]: time="2021-11-24T20:06:39.802013427+05:30" level=info msg="NetworkDB stats svr2(00bbf0799aa6) - netID:ubuzyty9mq4tb7xyb>
Nov 24 20:11:40 svr2 dockerd[2797]: time="2021-11-24T20:11:40.001992437+05:30" level=info msg="NetworkDB stats svr2(00bbf0799aa6) - netID:ubuzyty9mq4tb7xyb>
Nov 24 20:14:17 svr2 dockerd[2797]: time="2021-11-24T20:14:17.871605342+05:30" level=error msg="Error getting service xkauq9a599iv: service xkauq9a599iv not f>
Nov 24 20:14:52 svr2 dockerd[2797]: time="2021-11-24T20:14:52.833890158+05:30" level=error msg="Error getting service xkauq9a599iv: service xkauq9a599iv not f>
Nov 24 20:15:12 svr2 dockerd[2797]: time="2021-11-24T20:15:12.395692837+05:30" level=error msg="Error getting service pwaa8cvdd683: service pwaa8cvdd683 not f>
Nov 24 20:15:17 svr2 dockerd[2797]: time="2021-11-24T20:15:17.773200054+05:30" level=error msg="Error getting service xk0v0g2roypx: service xk0v0g2roypx not f>
Nov 24 20:16:18 svr2 dockerd[2797]: time="2021-11-24T20:16:18.529344060+05:30" level=error msg="Error getting service xk0v0g2roypx: service xk0v0g2roypx not f>
Nov 24 20:16:40 svr2 dockerd[2797]: time="2021-11-24T20:16:40.201888504+05:30" level=info msg="NetworkDB stats svr2(00bbf0799aa6) - netID:ubuzyty9mq4tb7xyb>
It looks like a loop keeps creating containers. What am I doing wrong? Any help fixing this problem would be highly appreciated. Thanks.

You really need to pass --restart-max-attempts 5 to docker service create so that tasks are not restarted in an endless loop. Such a loop is bad for the stability of Docker and hard to debug. It is better for a task to give up and stop, so you can see that something is wrong and diagnose it.
To see specifically what is wrong, look at the logs of each task. Use the individual task IDs to see why each one failed:
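A minimal sketch of the same service with a bounded restart policy follows. --restart-condition, --restart-delay, and --restart-max-attempts are options of docker service create, but the values chosen here are illustrative assumptions rather than settings from the question, and the DOCKER variable is a hypothetical hook that lets you print the command (DOCKER=echo) before running it for real.

```shell
#!/bin/sh
# Create the m32 service with a restart policy that gives up after a few tries.
# DOCKER is a hypothetical override: set DOCKER=echo to print instead of run.
create_m32_service() {
  ${DOCKER:-docker} service create \
    --mode global \
    --restart-condition on-failure \
    --restart-delay 5s \
    --restart-max-attempts 5 \
    --mount type=bind,src=/project/m32/,dst=/root/m32/ \
    --publish mode=host,target=310,published=310 \
    --publish mode=host,target=311,published=311 \
    --publish mode=host,target=312,published=312 \
    --publish mode=host,target=313,published=313 \
    --constraint "node.labels.m32 == true" \
    --name m32 \
    local-registry/ubuntu:07
}
```

The tasks in the question end in the Complete state, which means the container exited cleanly. With on-failure, a clean exit should not trigger another restart, so the loop stops, but the container still exits; for the service to stay up, the image's command has to keep running in the foreground.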
# The logs for a task
docker service logs s4q0rqrqbpdn
# A general breakdown of a task
docker inspect s4q0rqrqbpdn
Sometimes you need to track down the actual container for the task and inspect that. docker container is not swarm aware, so you have to run it on the node where the task was scheduled:
# List the service's tasks with their full task IDs.
docker service ps <service> --no-trunc
# Then use docker context use <node> or ssh <node> to switch to the node of interest.
# The container name is "NAME"."ID" from the ps list. For example:
docker context use sl-089
docker container inspect m32.ew0edluvz39pazold0wnv2ean.s4q0rqrqbpdnABCDEFGABCDEFG
Inspecting the container can show if it was killed because of an OOM or certain other reasons that don't otherwise show up.
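As a rough sketch of that last step, the helpers below build the container name from a task's NAME and full ID and print the fields of the container state that usually explain an exit. The function names and the DOCKER override are hypothetical; .State.OOMKilled, .State.ExitCode, and .State.Error are fields of docker container inspect output.

```shell
#!/bin/sh
# Build the container name Swarm gives a task: "<task NAME>.<full task ID>".
task_container_name() {
  printf '%s.%s\n' "$1" "$2"
}

# Print the OOM flag, exit code, and error text for a container.
# Run this on the node that hosts the container.
# DOCKER is a hypothetical override used only for dry runs (DOCKER=echo).
container_exit_summary() {
  ${DOCKER:-docker} container inspect \
    --format 'oom={{.State.OOMKilled}} exit={{.State.ExitCode}} error={{.State.Error}}' \
    "$1"
}
```

On the manager, docker service ps m32 --no-trunc --format '{{.Name}}.{{.ID}} {{.Node}}' should print the container names together with the node each task ran on, which saves assembling the names by hand.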

Related

Cannot connect to the Docker daemon after failed pull

When I try to pull a certain Docker image, the pull fails and then prevents me from connecting to the Docker daemon again until I reboot my laptop. The image in question is an official Jupyter image, which works fine on my other machine. Restarting the daemon does not help, but rebooting my laptop does.
I already tried docker system prune -a, which is why there are no images on my laptop anymore. Does somebody have an idea how to fix this problem?
I think the problem might be connected to one of the images not finishing its extraction.
EDIT
I have the same problem with an alpine image, see below.
me@mylaptop $ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
me@mylaptop $ docker pull jupyter/datascience-notebook
Using default tag: latest
latest: Pulling from jupyter/datascience-notebook
e6ca3592b144: Extracting [==================================================>] 28.56MB/28.56MB
534a5505201d: Download complete
990916bd23bb: Download complete
979cd14ae800: Download complete
5e8b9f8fa9e0: Download complete
6f224ed88dc4: Download complete
6ee9ec4a62a8: Download complete
7a1ae22ba760: Download complete
a1602338a8d7: Download complete
fce5135a7ea1: Download complete
e62a1c9017ef: Download complete
a5049ad1c512: Download complete
ec06c1612b0a: Download complete
acceda87b341: Download complete
939052532b6f: Download complete
d2dee4cc07fe: Download complete
4fe5e9dd4fad: Download complete
8fd08517e0c6: Download complete
7105a3ca8c38: Download complete
66c0798f609e: Download complete
94f3fc35ed38: Download complete
aa68263474a3: Download complete
6e7d1433394b: Download complete
f5902e69d9b7: Download complete
490bb991b4de: Download complete
fab6e92b04fa: Download complete
failed to register layer: Error processing tar file(exit status 1): Error cleaning up after pivot: remove /.pivot_root297865553: device or resource busy
me@mylaptop $ docker images
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
me@mylaptop $ sudo systemctl start docker
me@mylaptop $ systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2020-09-30 08:11:12 CEST; 15min ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Main PID: 908 (dockerd)
Tasks: 10
Memory: 140.8M
CGroup: /system.slice/docker.service
└─908 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
sep 30 08:11:11 mylaptop dockerd[908]: time="2020-09-30T08:11:11.992016198+02:00" level=warning msg="Your kernel does not support cgroup rt runtime"
sep 30 08:11:11 mylaptop dockerd[908]: time="2020-09-30T08:11:11.992433459+02:00" level=info msg="Loading containers: start."
sep 30 08:11:12 mylaptop dockerd[908]: time="2020-09-30T08:11:12.227615723+02:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can b>
sep 30 08:11:12 mylaptop dockerd[908]: time="2020-09-30T08:11:12.296603004+02:00" level=info msg="Loading containers: done."
sep 30 08:11:12 mylaptop dockerd[908]: time="2020-09-30T08:11:12.486944893+02:00" level=warning msg="Not using native diff for overlay2, this may cause degraded performance for building images: >
sep 30 08:11:12 mylaptop dockerd[908]: time="2020-09-30T08:11:12.487273874+02:00" level=info msg="Docker daemon" commit=48a66213fe graphdriver(s)=overlay2 version=19.03.12-ce
sep 30 08:11:12 mylaptop dockerd[908]: time="2020-09-30T08:11:12.491959213+02:00" level=info msg="Daemon has completed initialization"
sep 30 08:11:12 mylaptop dockerd[908]: time="2020-09-30T08:11:12.530816090+02:00" level=info msg="API listen on /run/docker.sock"
sep 30 08:11:12 mylaptop systemd[1]: Started Docker Application Container Engine.
sep 30 08:23:36 mylaptop dockerd[908]: time="2020-09-30T08:23:36.941202710+02:00" level=info msg="Attempting next endpoint for pull after error: failed to register layer: Error processing tar fi>
me@mylaptop $ docker images
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
me@mylaptop $ docker pull alpine:3.12.0
3.12.0: Pulling from library/alpine
df20fa9351a1: Extracting [==================================================>] 2.798MB/2.798MB
failed to register layer: Error processing tar file(exit status 1): Error cleaning up after pivot: remove /.pivot_root517304538: device or resource busy
Solved it. The problem was that my kernel was (or had become) too old.
The warning below, shown by systemctl status, led me to a post on forums.docker.com:
me@mylaptop $ systemctl status docker
...
sep 30 08:11:11 mylaptop dockerd[908]: time="2020-09-30T08:11:11.992016198+02:00" level=warning msg="Your kernel does not support cgroup rt runtime"
...
I'm running Manjaro, so I upgraded my kernel with this command:
sudo mhwd-kernel -i linux54
After which docker worked again.
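To check whether a machine is in the same situation, a small sketch like the one below compares the running kernel against a minimum version. MIN_KERNEL is an assumed threshold picked to match the linux54 package above, not a documented Docker requirement, and version_older_than is a hypothetical helper that relies on sort -V.

```shell
#!/bin/sh
# Return 0 if version $1 is strictly older than version $2 (relies on sort -V).
version_older_than() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# MIN_KERNEL is an assumed threshold for illustration only.
MIN_KERNEL="${MIN_KERNEL:-5.4}"
# Keep only the numeric part, e.g. "5.4.67" from "5.4.67-1-MANJARO".
running="$(uname -r | cut -d- -f1)"

if version_older_than "$running" "$MIN_KERNEL"; then
  echo "kernel $running is older than $MIN_KERNEL; consider upgrading"
else
  echo "kernel $running is at least $MIN_KERNEL"
fi
```

On Manjaro, mhwd-kernel -li should list the kernels that are currently installed, which helps confirm that the new kernel is the one actually booted.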

Docker + docker-compose up + Cannot start service

We have a docker-compose.yml that contains the configuration for Kafka, ZooKeeper, and Schema Registry.
When we start Docker Compose, we get the following errors:
docker-compose up -d
Starting kafka-docker-final_zookeeper3_1 ... error
ERROR: for kafka-docker-final_zookeeper3_1 Cannot start service zookeeper3: network dd321821f3cb4a715c31e04b32bff2cf206c85ed5581b01b1c6a94ffa45f330e not found
ERROR: for zookeeper3 Cannot start service zookeeper3: network dd321821f3cb4a715c31e04b32bff2cf206c85ed5581b01b1c6a94ffa45f330e not found
ERROR: Encountered errors while bringing up the project.
and
systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
Active: active (running) since Thu 2020-03-19 07:57:29 UTC; 1h 55min ago
Docs: https://docs.docker.com
Main PID: 12105 (dockerd)
Tasks: 30
Memory: 654.6M
CGroup: /system.slice/docker.service
└─12105 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
Mar 19 07:57:29 master3 dockerd[12105]: time="2020-03-19T07:57:29.610005717Z" level=info msg="Daemon has completed initialization"
Mar 19 07:57:29 master3 dockerd[12105]: time="2020-03-19T07:57:29.631338594Z" level=info msg="API listen on /var/run/docker.sock"
Mar 19 07:57:29 master3 systemd[1]: Started Docker Application Container Engine.
Mar 19 07:58:12 master3 dockerd[12105]: time="2020-03-19T07:58:12.352833676Z" level=warning msg="Error getting v2 registry: Get https://registry-1.docker.io/v2/: net/http: re...ng headers)"
Mar 19 07:58:12 master3 dockerd[12105]: time="2020-03-19T07:58:12.352916724Z" level=info msg="Attempting next endpoint for pull after error: Get https://registry-1.docker.io/...ng headers)"
Mar 19 07:58:12 master3 dockerd[12105]: time="2020-03-19T07:58:12.353019409Z" level=error msg="Handler for POST /v1.22/images/create returned error: Get https://registry-1.do...ng headers)"
Mar 19 08:03:47 master3 dockerd[12105]: time="2020-03-19T08:03:47.255058871Z" level=warning msg="error locating sandbox id 20ce3c5b6383ad92dae848c3de1d91bbfff9306ca86fdc90fae...c not found"
Mar 19 08:03:47 master3 dockerd[12105]: time="2020-03-19T08:03:47.263976715Z" level=error msg="ef808aa411ae0aaef0920397c77b6d9a327bdd1651877402fe1fc142a513af8a cleanup: faile...h container"
Mar 19 09:50:43 master3 dockerd[12105]: time="2020-03-19T09:50:43.920457464Z" level=warning msg="error locating sandbox id 20ce3c5b6383ad92dae848c3de1d91bbfff9306ca86fdc90fae...c not found"
Mar 19 09:50:43 master3 dockerd[12105]: time="2020-03-19T09:50:43.927744636Z" level=error msg="ef808aa411ae0aaef0920397c77b6d9a327bdd1651877402fe1fc142a513af8a cleanup: faile...h container"
Hint: Some lines were ellipsized, use -l to show in full.
Regarding the error
Cannot start service zookeeper3: network dd321821f3cb4a715c31e04b32bff2cf206c85ed5581b01b1c6a94ffa45f330e not found
how can we fix this issue?
docker container ls -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6c729cb0bb2c confluentinc/cp-schema-registry:latest "/etc/confluent/dock…" 3 months ago Exited (255) 2 hours ago 0.0.0.0:8081->8081/tcp kafka-docker-schemaregistry_1
ef808aa411ae confluentinc/cp-zookeeper:latest "/etc/confluent/dock…" 3 months ago Exited (255) 2 hours ago
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
docker network ls
NETWORK ID NAME DRIVER SCOPE
e5566ab8ca6d bridge bridge local
2467d9664593 host host local
c509e32d0d67 kafka-docker-default bridge local
08966157382c none null local
We fixed the issue with the following procedure:
# docker container ls -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6c729cb0bb2c confluentinc/cp-schema-registry:latest "/etc/confluent/dock…" 3 months ago Exited (255) 5 hours ago 0.0.0.0:8081->8081/tcp kafka-docker-schemaregistry_1
ef808aa411ae confluentinc/cp-zookeeper:latest "/etc/confluent/dock…" 3 months ago Exited (255) 5 hours ago kafka-docker-zookeeper3_1
# docker container rm 6c729cb0bb2c
# docker container rm ef808aa411ae
systemctl stop docker
systemctl start docker
docker-compose up -d
Creating kafka-docker-zookeeper3_1 ... done
Creating kafka-docker-kafka3_1 ... done
Creating kafka-docker-schemaregistry_1 ... done
docker-compose ps
Name Command State Ports
------------------------------------------------------------------------------------------------------------------------------------------------
kafka-docker-kafka3_1 /etc/confluent/docker/run Up 0.0.0.0:9092->9092/tcp
kafka-docker-schemaregistry_1 /etc/confluent/docker/run Up 0.0.0.0:8081->8081/tcp
kafka-docker-zookeeper3_1 /etc/confluent/docker/run Up 0.0.0.0:2181->2181/tcp, 0.0.0.0:2888->2888/tcp, 0.0.0.0:3888->3888/tcp
I had the same problem and for me it was enough to:
sudo docker-compose down
sudo docker-compose up
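Both fixes work for the same likely reason: the old containers still reference the ID of a network that no longer exists, and removing them (by hand or with docker-compose down) lets Compose recreate containers and network together. As a hedged sketch, the helpers below pull the missing network ID out of the error text and check whether the daemon still knows it; the function names and the DOCKER override are hypothetical.

```shell
#!/bin/sh
# Extract the network ID from Compose "network <id> not found" error lines
# read on standard input; duplicates are collapsed.
missing_network_id() {
  sed -n 's/.*network \([0-9a-f][0-9a-f]*\) not found.*/\1/p' | sort -u
}

# Return 0 if the given network ID is known to the local daemon.
# DOCKER is a hypothetical override used only for dry runs.
network_exists() {
  ${DOCKER:-docker} network inspect "$1" >/dev/null 2>&1
}
```

For example, docker-compose up -d 2>&1 | missing_network_id prints the stale ID, and network_exists "$id" || echo "stale reference" confirms that the containers point at a network that is gone.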

Docker fails to start due to "volume store metadata database: timeout"

I have followed the installation instructions of Docker CE for CentOS. Initially this worked. At some point the system was restarted and now starting Docker fails. Appreciate expert eyes on this matter...
systemctl start docker produces:
Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
systemctl status docker.service produces:
Apr 21 11:25:23 sec-services-build-1 systemd[1]: Starting Docker Application Container Engine...
Apr 21 11:25:23 sec-services-build-1 dockerd[9693]: time="2017-04-21T11:25:23.370390797+03:00" level=info msg="libcontainerd: previous instance of containerd still alive (8908)"
Apr 21 11:25:23 sec-services-build-1 dockerd[9693]: time="2017-04-21T11:25:23.382492171+03:00" level=warning msg="overlay: the backing xfs filesystem is formatted without d_type support, which leads to incorrect behavior. Reformat the filesystem with ftype=1 to enable d_type support. Running without d_type support will no longer be supported in Docker 17.12."
Apr 21 11:25:23 sec-services-build-1 dockerd[9693]: time="2017-04-21T11:25:23.382547668+03:00" level=info msg="[graphdriver] using prior storage driver: overlay"
Apr 21 11:25:24 sec-services-build-1 dockerd[9693]: Error starting daemon: error while opening volume store metadata database: timeout
Apr 21 11:25:24 sec-services-build-1 systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Apr 21 11:25:24 sec-services-build-1 systemd[1]: Failed to start Docker Application Container Engine.
Apr 21 11:25:24 sec-services-build-1 systemd[1]: Unit docker.service entered failed state.
Apr 21 11:25:24 sec-services-build-1 systemd[1]: docker.service failed.
From here: https://github.com/moby/moby/issues/22507
I ran:
ps axf | grep docker | grep -v grep | awk '{print "kill -9 " $1}' | sudo sh
I was then able to restart docker using:
sudo systemctl start docker
Step 1: Run systemctl status docker to check whether Docker is running.
Step 2: If it is, stop it with systemctl stop docker.
Step 3: Start the daemon directly with dockerd.
I got this message after copying volumes from a production machine, which ended up overwriting metadata.db inside /var/lib/docker/volumes; after that Docker crashed. The fix is simple (note that the prune step deletes all unused volumes):
docker system prune --volumes -f && rm /var/lib/docker/volumes/metadata.db && docker-compose up -d
I encountered the same error.
❶ I tried
sudo kill -9 1452
multiple times, but it didn't work. There was still a dockerd process active:
1452 ? Zsl 127:42 [dockerd] <defunct>
❷ Then I tried what @Artur Mustafin suggested:
sudo mv /var/lib/docker/volumes/metadata.db /var/lib/docker/volumes/metadata.db.bk
It worked.
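A slightly more defensive version of that move, as a sketch: it refuses to run when there is no metadata.db and adds a timestamp so repeated attempts do not overwrite an earlier backup. The function name and the optional directory argument are hypothetical; the default path is the one quoted above.

```shell
#!/bin/sh
# Move the volume metadata database aside instead of deleting it.
# $1 is the volumes directory; it defaults to the path used in the answers.
backup_volume_metadata() {
  dir="${1:-/var/lib/docker/volumes}"
  db="$dir/metadata.db"
  [ -f "$db" ] || { echo "no metadata.db in $dir" >&2; return 1; }
  # Timestamped suffix so an earlier backup is never overwritten.
  mv "$db" "$db.bk.$(date +%Y%m%d%H%M%S)"
}
```

Run it as root with Docker stopped (sudo systemctl stop docker first), then start Docker again with sudo systemctl start docker.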
I tried all of these and nothing worked. What did work was removing all the containers from /var/lib/docker/containers. I then killed all Docker processes (found with ps -ef | grep docker) and restarted Docker and the Docker socket. Once Docker was active again, I added the containers back one at a time, and a single container turned out to be the cause of the issue.

Docker daemon no start

I'm trying to start the Docker daemon:
sudo systemctl start docker
But nothing happens, the cursor just blinks and the process never ends.
Yesterday it was working properly :(
sudo journalctl -fu docker
ago 18 16:05:24 host docker[1602]: time="2016-08-18T16:05:24.467635627-05:00" level=info msg="New containerd process, pid: 1609\n"
ago 18 16:05:24 host docker[1602]: time="2016-08-18T16:05:24.482107319-05:00" level=fatal msg="bad listen address format /var/run/docker/libcontainerd/docker-containerd.sock, expected proto://address"
ago 18 16:05:30 host docker[1602]: time="2016-08-18T16:05:30.470570243-05:00" level=info msg="New containerd process, pid: 1620\n"
ago 18 16:05:30 host docker[1602]: time="2016-08-18T16:05:30.491495106-05:00" level=fatal msg="bad listen address format /var/run/docker/libcontainerd/docker-containerd.sock, expected proto://address"
ago 18 16:08:06 host systemd[1]: Stopped Docker Application Container Engine.
-- Reboot --
ago 18 16:16:52 host systemd[1]: Starting Docker Application Container Engine...
ago 18 16:16:54 host docker[2294]: time="2016-08-18T16:16:54.360878396-05:00" level=info msg="New containerd process, pid: 2327\n"
ago 18 16:16:54 host docker[2294]: time="2016-08-18T16:16:54.686503187-05:00" level=fatal msg="bad listen address format /var/run/docker/libcontainerd/docker-containerd.sock, expected proto://address"
ago 18 16:17:00 host docker[2294]: time="2016-08-18T16:17:00.664023288-05:00" level=info msg="New containerd process, pid: 2368\n"
ago 18 16:17:00 host docker[2294]: time="2016-08-18T16:17:00.67708602-05:00" level=fatal msg="bad listen address format /var/run/docker/libcontainerd/docker-containerd.sock, expected proto://address"
One interesting thing with systemd is that if it thinks that a daemon is running, then the start command does nothing.
I have had to do the following to make sure I cleanly restart certain daemons:
sudo systemctl stop service-name
# wait a little if the service is slow to stop like the Cassandra database
sudo systemctl start service-name
That has worked for me with various services.
One way to know whether the service is considered running, is to check the status like so:
systemctl status service-name
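The stop, wait, start sequence can be sketched as a small function. The name restart_cleanly, the default of 30 one-second waits, and the SYSTEMCTL override are assumptions for illustration; systemctl is-active prints the unit state (for example active, inactive, or failed).

```shell
#!/bin/sh
# Stop a unit, wait until systemd reports it inactive or failed, then start it.
# SYSTEMCTL is a hypothetical override so the function can be tried without systemd.
restart_cleanly() {
  svc="$1"
  tries="${2:-30}"   # assumed upper bound on 1-second waits
  ${SYSTEMCTL:-systemctl} stop "$svc"
  while [ "$tries" -gt 0 ]; do
    # is-active exits non-zero for non-active states, hence "|| true".
    state="$(${SYSTEMCTL:-systemctl} is-active "$svc" || true)"
    case "$state" in
      inactive|failed) break ;;
    esac
    sleep 1
    tries=$((tries - 1))
  done
  ${SYSTEMCTL:-systemctl} start "$svc"
}
```

Usage would be sudo sh -c '. ./restart.sh; restart_cleanly docker', where restart.sh is whatever file holds the function.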

my coreos/fleet deployed service is dying and I can't tell why

I'm trying to deploy nsqlookupd using fleet on a brand shiny new coreos cluster in EC2. Here is my systemd unit file:
[Unit]
Description=nsqlookupd service
After=docker.service
Requires=docker.service
[Service]
EnvironmentFile=/etc/environment
ExecStartPre=-/usr/bin/docker kill nsqlookupd
ExecStartPre=-/usr/bin/docker rm nsqlookupd
ExecStart=/usr/bin/docker run -d --name=nsqlookupd -e BROADCAST_ADDRESS=$COREOS_PUBLIC_IPV4 -p 4160:4160 -p 4161:4161 mikedewar/nsqlookupd
ExecStartPost=/usr/bin/etcdctl set /nsqlookupd_broadcast_address $COREOS_PUBLIC_IPV4
ExecStop=/usr/bin/docker stop -t 1 nsqlookupd
ExecStopPost=/usr/bin/etcdctl rm /nsqlookupd_broadcast_address
I've verified the container works fine if I just run the ExecStart command. My docker logs just look like
~ $ docker logs nsqlookupd
2014/08/08 02:23:58 nsqlookupd v0.2.29-alpha (built w/go1.2.2)
2014/08/08 02:23:58 TCP: listening on [::]:4160
2014/08/08 02:23:58 HTTP: listening on [::]:4161
and my fleetctl journal looks like
$ fleetctl journal nsqlookupd.service
-- Logs begin at Sun 2014-08-03 12:49:00 UTC, end at Fri 2014-08-08 02:30:06 UTC. --
Aug 08 02:23:57 ip-10-147-9-249 systemd[1]: Starting nsqlookupd service...
Aug 08 02:23:57 ip-10-147-9-249 docker[6140]: Error response from daemon: No such container: nsqlookupd
Aug 08 02:23:57 ip-10-147-9-249 docker[6140]: 2014/08/08 02:23:57 Error: failed to kill one or more containers
Aug 08 02:23:57 ip-10-147-9-249 docker[6148]: Error response from daemon: No such container: nsqlookupd
Aug 08 02:23:57 ip-10-147-9-249 docker[6148]: 2014/08/08 02:23:57 Error: failed to remove one or more containers
Aug 08 02:23:57 ip-10-147-9-249 etcdctl[6157]: 54.198.93.169
Aug 08 02:23:57 ip-10-147-9-249 systemd[1]: Started nsqlookupd service.
Aug 08 02:23:57 ip-10-147-9-249 docker[6155]: 0fce4465f61c092541ba9d4c4e89ce13c4d6bedc096519034ed585d7adb5e0d7
Aug 08 02:23:59 ip-10-147-9-249 docker[6194]: nsqlookupd
both of which look just fine. But the container dies quietly, and my fleetctl list-units gives
$ fleetctl list-units
UNIT STATE LOAD ACTIVE SUB DESC MACHINE
nsqlookupd.service launched loaded deactivating stop nsqlookupd service 1320802c.../10.147.9.249
Running docker images is a little worrying:
$ docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
<none> <none> 8ef9d8f9d18d 9 minutes ago 710 MB
mikedewar/nsqadmin latest 432af572bda8 2 days ago 710 MB
mikedewar/nsqd latest 00bd4e474964 2 days ago 710 MB
<none> <none> adf0ed97208e 3 weeks ago 710 MB
mikedewar/nsqlookupd latest 2219c0e783d9 3 weeks ago 710 MB
<none> <none> 35d2212f8932 3 weeks ago 710 MB
mikedewar/nsq latest f9794fe056e1 3 weeks ago 710 MB
busybox latest a9eb17255234 9 weeks ago 2.433 MB
zmarcantel/cassandra latest b1168b45b4f8 4 months ago 738 MB
as I've been updating mikedewar/nsqlookupd quite regularly over the last 3 weeks. Maybe that's the time I first pushed something to docker hub? I'd love to know that the image I'm working with is the up-to-date one. I've tried docker rmi mikedewar/nsqlookupd followed by docker pull mikedewar/nsqlookupd but the CREATED column still says it was created 3 weeks ago.
I don't know if this is useful, but the ExecStopPost=/usr/bin/etcdctl rm /nsqlookupd_broadcast_address command seems to have worked - the etcdctl log line in the fleet journal suggests I managed to set the key to my IP, but after the container dies I can't get that key from etcd.
Any help on where to look next for clues, or any ideas why this is happening would be greatly appreciated! As is probably clear I'm rather new to this sort of thing...
You shouldn't run Docker containers in detached mode in a unit file. Your ExecStart contains it: ExecStart=/usr/bin/docker run -d. This causes systemd to think the process exited immediately, because docker run -d returns as soon as the container has been started in the background.
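As a sketch of the change, the function below writes the unit from the question with the -d flag removed and everything else left as it was. The function name and the output path argument are hypothetical; with fleet you would submit the resulting file the same way as before.

```shell
#!/bin/sh
# Write the nsqlookupd unit with "docker run" kept in the foreground (no -d),
# so systemd tracks the docker client for the lifetime of the container.
# The quoted 'EOF' keeps $COREOS_PUBLIC_IPV4 literal for systemd to expand.
write_nsqlookupd_unit() {
  cat > "$1" <<'EOF'
[Unit]
Description=nsqlookupd service
After=docker.service
Requires=docker.service

[Service]
EnvironmentFile=/etc/environment
ExecStartPre=-/usr/bin/docker kill nsqlookupd
ExecStartPre=-/usr/bin/docker rm nsqlookupd
ExecStart=/usr/bin/docker run --name=nsqlookupd -e BROADCAST_ADDRESS=$COREOS_PUBLIC_IPV4 -p 4160:4160 -p 4161:4161 mikedewar/nsqlookupd
ExecStartPost=/usr/bin/etcdctl set /nsqlookupd_broadcast_address $COREOS_PUBLIC_IPV4
ExecStop=/usr/bin/docker stop -t 1 nsqlookupd
ExecStopPost=/usr/bin/etcdctl rm /nsqlookupd_broadcast_address
EOF
}
```

This would also explain why the etcd key disappears: once systemd believes the main process has exited, it runs ExecStop and ExecStopPost, and the latter removes the key.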
As for managing versions, if you want to be absolutely sure you're getting the latest copy, you should tag your images and then pull a specific tag, for example mikedewar/nsqlookupd:1.2.3. You can increment the version each time in your fleet unit file.
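A minimal sketch of that workflow, assuming the Dockerfile is in the current directory: build and push an explicitly versioned tag, then reference the same tag in the unit file. The function name, the version string, and the DOCKER override are illustrative assumptions.

```shell
#!/bin/sh
# Build and push an explicitly versioned image tag.
# DOCKER is a hypothetical override: set DOCKER=echo to print the commands.
release_image() {
  image="$1"
  version="$2"
  ${DOCKER:-docker} build -t "$image:$version" . &&
    ${DOCKER:-docker} push "$image:$version"
}
```

After release_image mikedewar/nsqlookupd 1.2.3, running docker pull mikedewar/nsqlookupd:1.2.3 on the host followed by docker image inspect --format '{{.Id}} {{.Created}}' mikedewar/nsqlookupd:1.2.3 shows exactly which build was fetched.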

Resources