docker-compose up fails with "Cannot start service": network not found

We have a docker-compose.yml that contains the configuration for Kafka, ZooKeeper and Schema Registry.
When we start Docker Compose, we get the following errors:
docker-compose up -d
Starting kafka-docker-final_zookeeper3_1 ... error
ERROR: for kafka-docker-final_zookeeper3_1 Cannot start service zookeeper3: network dd321821f3cb4a715c31e04b32bff2cf206c85ed5581b01b1c6a94ffa45f330e not found
ERROR: for zookeeper3 Cannot start service zookeeper3: network dd321821f3cb4a715c31e04b32bff2cf206c85ed5581b01b1c6a94ffa45f330e not found
ERROR: Encountered errors while bringing up the project.
and the Docker service status shows:
systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
Active: active (running) since Thu 2020-03-19 07:57:29 UTC; 1h 55min ago
Docs: https://docs.docker.com
Main PID: 12105 (dockerd)
Tasks: 30
Memory: 654.6M
CGroup: /system.slice/docker.service
└─12105 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
Mar 19 07:57:29 master3 dockerd[12105]: time="2020-03-19T07:57:29.610005717Z" level=info msg="Daemon has completed initialization"
Mar 19 07:57:29 master3 dockerd[12105]: time="2020-03-19T07:57:29.631338594Z" level=info msg="API listen on /var/run/docker.sock"
Mar 19 07:57:29 master3 systemd[1]: Started Docker Application Container Engine.
Mar 19 07:58:12 master3 dockerd[12105]: time="2020-03-19T07:58:12.352833676Z" level=warning msg="Error getting v2 registry: Get https://registry-1.docker.io/v2/: net/http: re...ng headers)"
Mar 19 07:58:12 master3 dockerd[12105]: time="2020-03-19T07:58:12.352916724Z" level=info msg="Attempting next endpoint for pull after error: Get https://registry-1.docker.io/...ng headers)"
Mar 19 07:58:12 master3 dockerd[12105]: time="2020-03-19T07:58:12.353019409Z" level=error msg="Handler for POST /v1.22/images/create returned error: Get https://registry-1.do...ng headers)"
Mar 19 08:03:47 master3 dockerd[12105]: time="2020-03-19T08:03:47.255058871Z" level=warning msg="error locating sandbox id 20ce3c5b6383ad92dae848c3de1d91bbfff9306ca86fdc90fae...c not found"
Mar 19 08:03:47 master3 dockerd[12105]: time="2020-03-19T08:03:47.263976715Z" level=error msg="ef808aa411ae0aaef0920397c77b6d9a327bdd1651877402fe1fc142a513af8a cleanup: faile...h container"
Mar 19 09:50:43 master3 dockerd[12105]: time="2020-03-19T09:50:43.920457464Z" level=warning msg="error locating sandbox id 20ce3c5b6383ad92dae848c3de1d91bbfff9306ca86fdc90fae...c not found"
Mar 19 09:50:43 master3 dockerd[12105]: time="2020-03-19T09:50:43.927744636Z" level=error msg="ef808aa411ae0aaef0920397c77b6d9a327bdd1651877402fe1fc142a513af8a cleanup: faile...h container"
Hint: Some lines were ellipsized, use -l to show in full.
Regarding this error:
Cannot start service zookeeper3: network dd321821f3cb4a715c31e04b32bff2cf206c85ed5581b01b1c6a94ffa45f330e not found
How can we fix this issue? For reference, here is the current state of the containers and networks:
docker container ls -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6c729cb0bb2c confluentinc/cp-schema-registry:latest "/etc/confluent/dock…" 3 months ago Exited (255) 2 hours ago 0.0.0.0:8081->8081/tcp kafka-docker-schemaregistry_1
ef808aa411ae confluentinc/cp-zookeeper:latest "/etc/confluent/dock…" 3 months ago Exited (255) 2 hours ago
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
docker network ls
NETWORK ID NAME DRIVER SCOPE
e5566ab8ca6d bridge bridge local
2467d9664593 host host local
c509e32d0d67 kafka-docker-default bridge local
08966157382c none null local

We fixed the issue with the following procedure:
# docker container ls -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6c729cb0bb2c confluentinc/cp-schema-registry:latest "/etc/confluent/dock…" 3 months ago Exited (255) 5 hours ago 0.0.0.0:8081->8081/tcp kafka-docker-schemaregistry_1
ef808aa411ae confluentinc/cp-zookeeper:latest "/etc/confluent/dock…" 3 months ago Exited (255) 5 hours ago kafka-docker-zookeeper3_1
# docker container rm 6c729cb0bb2c
# docker container rm ef808aa411ae
systemctl stop docker
systemctl start docker
docker-compose up -d
Creating kafka-docker-zookeeper3_1 ... done
Creating kafka-docker-kafka3_1 ... done
Creating kafka-docker-schemaregistry_1 ... done
docker-compose ps
Name Command State Ports
------------------------------------------------------------------------------------------------------------------------------------------------
kafka-docker-kafka3_1 /etc/confluent/docker/run Up 0.0.0.0:9092->9092/tcp
kafka-docker-schemaregistry_1 /etc/confluent/docker/run Up 0.0.0.0:8081->8081/tcp
kafka-docker-zookeeper3_1 /etc/confluent/docker/run Up 0.0.0.0:2181->2181/tcp, 0.0.0.0:2888->2888/tcp, 0.0.0.0:3888->3888/tcp

I had the same problem, and for me it was enough to run:
sudo docker-compose down
sudo docker-compose up
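Both fixes most likely work for the same reason: the old, stopped containers still point at a network ID that no longer exists, and removing them (by hand or with docker-compose down) lets Compose create fresh containers on the current network. To confirm that diagnosis before deleting anything, a small sketch like the one below can compare the ID from the error message with the networks Docker actually knows about. The helper names missing_network_id and network_exists are made up for this example.

```shell
# Print the network ID named in a "network <id> not found" error line read from stdin.
missing_network_id() {
  sed -n 's/.*network \([0-9a-f][0-9a-f]*\) not found.*/\1/p' | head -n 1
}

# Succeed if the ID given as $1 appears in the list of full network IDs read from stdin,
# for example the output of: docker network ls --no-trunc -q
network_exists() {
  grep -q -x -F "$1"
}
```

Typical use would be to pipe the error line into missing_network_id, then run docker network ls --no-trunc -q | network_exists "$id"; a failure means the containers reference a network that has been removed.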

Related

Unable to attach to a container on a user-defined bridge network

I tried to attach to a container, but as soon as I attach, it exits.
This is the network:
NETWORK ID NAME DRIVER SCOPE
7ab1dac4166d alpine-net bridge local
Below are my containers
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4f98c3758144 httpd "httpd-foreground" 23 minutes ago Up 3 seconds 80/tcp alpine4
a00e9603190e httpd "httpd-foreground" 23 minutes ago Up 23 minutes 80/tcp alpine3
a02ac156f310 httpd "httpd-foreground" 23 minutes ago Up 23 minutes 80/tcp alpine2
55a0395dc0bc httpd "httpd-foreground" 24 minutes ago Up About a minute 80/tcp alpine1
I am trying to attach to alpine1, but it does not work. After I press Ctrl+C, the container shuts down.
root@ip-172-31-35-240:~# docker network connect alpine-net 4f98c3758144
Error response from daemon: endpoint with name alpine4 already exists in network alpine-net
root@ip-172-31-35-240:~# docker attach alpine1
^C[Wed Sep 28 17:38:25.604430 2022] [mpm_event:notice] [pid 1:tid 140167697374528] AH00491: caught SIGTERM, shutting down
root@ip-172-31-35-240:~# docker attach alpine1
You cannot attach to a stopped container, start it first
root@ip-172-31-35-240:~# docker attach alpine2
^C[Wed Sep 28 17:47:57.649545 2022] [mpm_event:notice] [pid 1:tid 140336080485696] AH00491: caught SIGTERM, shutting down
root@ip-172-31-35-240:~# docker attach a00e9603190e
^C[Wed Sep 28 17:48:56.424446 2022] [mpm_event:notice] [pid 1:tid 140629127331136] AH00491: caught SIGTERM, shutting down
How can I attach to it? It works on the default bridge, but not on a user-defined bridge.
Try 1: I recreated the server so that it has only one user-defined network, and the containers listed below are in this network. When I attach again, the container still shuts down. Please help me fix this.
Network List :
root@ip-172-31-32-242:~# docker network ls
NETWORK ID NAME DRIVER SCOPE
7ee777e7a9d6 bridge bridge local
694493b04e19 gold-net bridge local
75c9930ac3df host host local
6ffae4da13f6 none null local
Listing the containers
root@ip-172-31-32-242:~# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0f275cb17da4 httpd "httpd-foreground" 49 seconds ago Up 48 seconds 80/tcp apache4
d3bf64d7f751 httpd "httpd-foreground" About a minute ago Up About a minute 80/tcp apache3
0ddd638f1c92 httpd "httpd-foreground" 2 minutes ago Up 2 minutes 80/tcp apache2
b09daeeb17b8 httpd "httpd-foreground" 2 minutes ago Up 2 minutes 80/tcp apache1
Failure Result:
root@ip-172-31-32-242:~# docker container attach apache1
^C[Thu Sep 29 06:44:56.692274 2022] [mpm_event:notice] [pid 1:tid 140043293744448] AH00491: caught SIGTERM, shutting down
root@ip-172-31-32-242:~#
root@ip-172-31-32-242:~#
root@ip-172-31-32-242:~#
root@ip-172-31-32-242:~# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0f275cb17da4 httpd "httpd-foreground" 21 minutes ago Up 21 minutes 80/tcp apache4
d3bf64d7f751 httpd "httpd-foreground" 21 minutes ago Up 21 minutes 80/tcp apache3
0ddd638f1c92 httpd "httpd-foreground" 22 minutes ago Up 22 minutes 80/tcp apache2
b09daeeb17b8 httpd "httpd-foreground" 23 minutes ago Exited (0) 8 minutes ago apache1
root@ip-172-31-32-242:~#

Docker Swarm: service keeps cycling between Ready and Shutdown

I have a couple of Docker Swarm nodes. When I tried to create the service on the leader with the command below, service creation was still in progress after more than 40 minutes.
docker service create \
--mode global \
--mount type=bind,src=/project/m32/,dst=/root/m32/ \
--publish mode=host,target=310,published=310 \
--publish mode=host,target=311,published=311 \
--publish mode=host,target=312,published=312 \
--publish mode=host,target=313,published=313 \
--constraint "node.labels.m32 == true" \
--name m32 \
local-registry/ubuntu:07
overall progress: 1 out of 2 tasks
ew0edluvz39p: ready [======================================> ]
kzc7jf7irsrh: running [==================================================>]
According to the service's task list, the tasks keep cycling between Ready and Shutdown:
$ docker service ps m32
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
s4q0rqrqbpdn m32.ew0edluvz39pazold0wnv2ean local-registry/ubuntu:07 sl-089 Ready Ready 1 second ago
r6vibgptm5oc \_ m32.ew0edluvz39pazold0wnv2ean local-registry/ubuntu:07 sl-089 Shutdown Complete 1 second ago
joq2p6c9jpnx \_ m32.ew0edluvz39pazold0wnv2ean local-registry/ubuntu:07 sl-089 Shutdown Complete 7 seconds ago
a5h8gac02vfx \_ m32.ew0edluvz39pazold0wnv2ean local-registry/ubuntu:07 sl-089 Shutdown Complete 13 seconds ago
f51stfsdlhvp \_ m32.ew0edluvz39pazold0wnv2ean local-registry/ubuntu:07 sl-089 Shutdown Complete 19 seconds ago
zqcbxkm4fwhr m32.kzc7jf7irsrhnx3kurcwqjb2j local-registry/ubuntu:07 sl-090 Ready Ready less than a second ago
za8efvi9x4yw \_ m32.kzc7jf7irsrhnx3kurcwqjb2j local-registry/ubuntu:07 sl-090 Shutdown Complete less than a second ago
$ sudo systemctl status docker.service
Nov 24 19:58:48 svr2 dockerd[2797]: time="2021-11-24T19:58:48.200421563+05:30" level=info msg="ignoring event" container=ea8b76fedb18159ba0cd8f279a9ca4264399c>
Nov 24 20:01:39 svr2 dockerd[2797]: time="2021-11-24T20:01:39.602028420+05:30" level=info msg="NetworkDB stats svr2(00bbf0799aa6) - netID:ubuzyty9mq4tb7xyb>
Nov 24 20:06:39 svr2 dockerd[2797]: time="2021-11-24T20:06:39.802013427+05:30" level=info msg="NetworkDB stats svr2(00bbf0799aa6) - netID:ubuzyty9mq4tb7xyb>
Nov 24 20:11:40 svr2 dockerd[2797]: time="2021-11-24T20:11:40.001992437+05:30" level=info msg="NetworkDB stats svr2(00bbf0799aa6) - netID:ubuzyty9mq4tb7xyb>
Nov 24 20:14:17 svr2 dockerd[2797]: time="2021-11-24T20:14:17.871605342+05:30" level=error msg="Error getting service xkauq9a599iv: service xkauq9a599iv not f>
Nov 24 20:14:52 svr2 dockerd[2797]: time="2021-11-24T20:14:52.833890158+05:30" level=error msg="Error getting service xkauq9a599iv: service xkauq9a599iv not f>
Nov 24 20:15:12 svr2 dockerd[2797]: time="2021-11-24T20:15:12.395692837+05:30" level=error msg="Error getting service pwaa8cvdd683: service pwaa8cvdd683 not f>
Nov 24 20:15:17 svr2 dockerd[2797]: time="2021-11-24T20:15:17.773200054+05:30" level=error msg="Error getting service xk0v0g2roypx: service xk0v0g2roypx not f>
Nov 24 20:16:18 svr2 dockerd[2797]: time="2021-11-24T20:16:18.529344060+05:30" level=error msg="Error getting service xk0v0g2roypx: service xk0v0g2roypx not f>
Nov 24 20:16:40 svr2 dockerd[2797]: time="2021-11-24T20:16:40.201888504+05:30" level=info msg="NetworkDB stats svr2(00bbf0799aa6) - netID:ubuzyty9mq4tb7xyb>
It looks like a loop that keeps creating containers. What am I doing wrong? Any help fixing this problem would be highly appreciated. Thanks.
You really should pass --restart-max-attempts 5 to docker service create so that a failing service is not restarted in an endless loop. Endless restarts are bad for the stability of Docker and hard to debug. It is better for a task to give up and stop, so you can see that something is wrong and diagnose it.
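As a sketch of what that could look like, here is the command from the question with the restart cap added. The wrapper function name create_m32_service is only for illustration; every option except --restart-max-attempts is copied from the question.

```shell
# Hypothetical wrapper: the service definition from the question plus a cap on restarts.
create_m32_service() {
  docker service create \
    --mode global \
    --restart-max-attempts 5 \
    --mount type=bind,src=/project/m32/,dst=/root/m32/ \
    --publish mode=host,target=310,published=310 \
    --publish mode=host,target=311,published=311 \
    --publish mode=host,target=312,published=312 \
    --publish mode=host,target=313,published=313 \
    --constraint "node.labels.m32 == true" \
    --name m32 \
    local-registry/ubuntu:07
}
```

With the cap in place, a task that keeps exiting should stop being rescheduled after five attempts, which leaves a stable task list to inspect.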
To see specifically what is wrong, look at the logs of each task. Use the individual task IDs to see why each one failed:
# The logs for a task
docker service logs s4q0rqrqbpdn
# A general breakdown of a task
docker inspect s4q0rqrqbpdn
Sometimes you need to track down the actual container for the task and inspect that. docker container is not swarm aware, so you have to run it on the node where the task ran:
# List the service's tasks, showing the full task IDs.
docker service ps <service> --no-trunc
# Then use docker context use <node> or ssh <node> to switch to the node of interest.
# The container name is "NAME"."ID" from the ps list. For example:
docker context use sl-089
docker container inspect m32.ew0edluvz39pazold0wnv2ean.s4q0rqrqbpdnABCDEFGABCDEFG
Inspecting the container can show if it was killed because of an OOM or certain other reasons that don't otherwise show up.
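A rough sketch of that last step, assuming you are already on (or have switched your Docker context to) the node that ran the task. The function names are made up for this example; .State.ExitCode, .State.OOMKilled and .State.Error are fields in the output of docker container inspect.

```shell
# Build the container name for a swarm task from the NAME and the full task ID
# shown by: docker service ps <service> --no-trunc
task_container_name() {
  printf '%s.%s\n' "$1" "$2"
}

# Print the exit code, OOM flag and error string of the task's container.
task_exit_info() {
  docker container inspect \
    --format 'exit={{.State.ExitCode}} oom={{.State.OOMKilled}} error={{.State.Error}}' \
    "$(task_container_name "$1" "$2")"
}
```

For the first task in the question this would be called as task_exit_info m32.ew0edluvz39pazold0wnv2ean followed by the full task ID from the --no-trunc listing.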

Cannot connect to the Docker daemon after failed pull

When I try to pull a certain Docker image, the pull fails and then prevents me from connecting to the Docker daemon again until I reboot my laptop. The image in question is an official Jupyter image, which works fine on my other machine. Restarting the daemon does not help, but rebooting my laptop does.
I already tried docker system prune -a, which is why there are no images on my laptop anymore. Does somebody have an idea how to fix this problem?
I think the problem might be connected to one of the images not finishing its extraction.
EDIT
I have the same problem with an Alpine image; see below.
me@mylaptop $ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
me@mylaptop $ docker pull jupyter/datascience-notebook
Using default tag: latest
latest: Pulling from jupyter/datascience-notebook
e6ca3592b144: Extracting [==================================================>] 28.56MB/28.56MB
534a5505201d: Download complete
990916bd23bb: Download complete
979cd14ae800: Download complete
5e8b9f8fa9e0: Download complete
6f224ed88dc4: Download complete
6ee9ec4a62a8: Download complete
7a1ae22ba760: Download complete
a1602338a8d7: Download complete
fce5135a7ea1: Download complete
e62a1c9017ef: Download complete
a5049ad1c512: Download complete
ec06c1612b0a: Download complete
acceda87b341: Download complete
939052532b6f: Download complete
d2dee4cc07fe: Download complete
4fe5e9dd4fad: Download complete
8fd08517e0c6: Download complete
7105a3ca8c38: Download complete
66c0798f609e: Download complete
94f3fc35ed38: Download complete
aa68263474a3: Download complete
6e7d1433394b: Download complete
f5902e69d9b7: Download complete
490bb991b4de: Download complete
fab6e92b04fa: Download complete
failed to register layer: Error processing tar file(exit status 1): Error cleaning up after pivot: remove /.pivot_root297865553: device or resource busy
me@mylaptop $ docker images
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
me@mylaptop $ sudo systemctl start docker
me@mylaptop $ systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2020-09-30 08:11:12 CEST; 15min ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Main PID: 908 (dockerd)
Tasks: 10
Memory: 140.8M
CGroup: /system.slice/docker.service
└─908 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
sep 30 08:11:11 mylaptop dockerd[908]: time="2020-09-30T08:11:11.992016198+02:00" level=warning msg="Your kernel does not support cgroup rt runtime"
sep 30 08:11:11 mylaptop dockerd[908]: time="2020-09-30T08:11:11.992433459+02:00" level=info msg="Loading containers: start."
sep 30 08:11:12 mylaptop dockerd[908]: time="2020-09-30T08:11:12.227615723+02:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can b>
sep 30 08:11:12 mylaptop dockerd[908]: time="2020-09-30T08:11:12.296603004+02:00" level=info msg="Loading containers: done."
sep 30 08:11:12 mylaptop dockerd[908]: time="2020-09-30T08:11:12.486944893+02:00" level=warning msg="Not using native diff for overlay2, this may cause degraded performance for building images: >
sep 30 08:11:12 mylaptop dockerd[908]: time="2020-09-30T08:11:12.487273874+02:00" level=info msg="Docker daemon" commit=48a66213fe graphdriver(s)=overlay2 version=19.03.12-ce
sep 30 08:11:12 mylaptop dockerd[908]: time="2020-09-30T08:11:12.491959213+02:00" level=info msg="Daemon has completed initialization"
sep 30 08:11:12 mylaptop dockerd[908]: time="2020-09-30T08:11:12.530816090+02:00" level=info msg="API listen on /run/docker.sock"
sep 30 08:11:12 mylaptop systemd[1]: Started Docker Application Container Engine.
sep 30 08:23:36 mylaptop dockerd[908]: time="2020-09-30T08:23:36.941202710+02:00" level=info msg="Attempting next endpoint for pull after error: failed to register layer: Error processing tar fi>
me@mylaptop $ docker images
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
me@mylaptop $ docker pull alpine:3.12.0
3.12.0: Pulling from library/alpine
df20fa9351a1: Extracting [==================================================>] 2.798MB/2.798MB
failed to register layer: Error processing tar file(exit status 1): Error cleaning up after pivot: remove /.pivot_root517304538: device or resource busy
Solved it. The problem was that my kernel was (or had become) too old.
The warning below in the systemctl output led me to a post on forums.docker.com:
me@mylaptop $ systemctl status docker
...
sep 30 08:11:11 mylaptop dockerd[908]: time="2020-09-30T08:11:11.992016198+02:00" level=warning msg="Your kernel does not support cgroup rt runtime"
...
I'm running Manjaro, so I upgraded my kernel with this command:
sudo mhwd-kernel -i linux54
After which docker worked again.
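If you want to check the running kernel before trying another pull, a small comparison like the one below may help. The 5.4 threshold in the usage note is only an assumption taken from the kernel package installed above, not a documented Docker requirement, and kernel_at_least is a made-up helper. It relies on sort -V from GNU coreutils.

```shell
# Succeed when the kernel release in $2 (default: the running kernel)
# is at least the version given in $1.
kernel_at_least() {
  required=$1
  current=${2:-$(uname -r)}
  # sort -V orders version strings; the smaller of the two comes first.
  lowest=$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)
  [ "$lowest" = "$required" ]
}
```

For example, kernel_at_least 5.4 || echo "kernel older than 5.4" prints the message only on a kernel older than that assumed threshold.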

Docker daemon does not start

I'm trying to start the Docker daemon:
sudo systemctl start docker
But nothing happens: the cursor just blinks and the command never returns.
Yesterday it was working properly :(
sudo journalctl -fu docker
ago 18 16:05:24 host docker[1602]: time="2016-08-18T16:05:24.467635627-05:00" level=info msg="New containerd process, pid: 1609\n"
ago 18 16:05:24 host docker[1602]: time="2016-08-18T16:05:24.482107319-05:00" level=fatal msg="bad listen address format /var/run/docker/libcontainerd/docker-containerd.sock, expected proto://address"
ago 18 16:05:30 host docker[1602]: time="2016-08-18T16:05:30.470570243-05:00" level=info msg="New containerd process, pid: 1620\n"
ago 18 16:05:30 host docker[1602]: time="2016-08-18T16:05:30.491495106-05:00" level=fatal msg="bad listen address format /var/run/docker/libcontainerd/docker-containerd.sock, expected proto://address"
ago 18 16:08:06 host systemd[1]: Stopped Docker Application Container Engine.
-- Reboot --
ago 18 16:16:52 host systemd[1]: Starting Docker Application Container Engine...
ago 18 16:16:54 host docker[2294]: time="2016-08-18T16:16:54.360878396-05:00" level=info msg="New containerd process, pid: 2327\n"
ago 18 16:16:54 host docker[2294]: time="2016-08-18T16:16:54.686503187-05:00" level=fatal msg="bad listen address format /var/run/docker/libcontainerd/docker-containerd.sock, expected proto://address"
ago 18 16:17:00 host docker[2294]: time="2016-08-18T16:17:00.664023288-05:00" level=info msg="New containerd process, pid: 2368\n"
ago 18 16:17:00 host docker[2294]: time="2016-08-18T16:17:00.67708602-05:00" level=fatal msg="bad listen address format /var/run/docker/libcontainerd/docker-containerd.sock, expected proto://address"
One interesting thing with systemd is that if it thinks that a daemon is running, then the start command does nothing.
I have had to do the following to make sure I cleanly restart certain daemons:
sudo systemctl stop service-name
# wait a little if the service is slow to stop like the Cassandra database
sudo systemctl start service-name
That has worked for me with various services.
One way to know whether the service is considered running is to check its status:
systemctl status service-name
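The stop, check, start sequence above could be wrapped up roughly as follows. This is only a sketch: clean_restart is a made-up name, and it assumes that sudo is available and that systemctl stop has returned before the check runs.

```shell
# Hypothetical helper: stop a unit, make sure systemd no longer considers it
# active, then start it again.
clean_restart() {
  service=$1
  sudo systemctl stop "$service" || return 1
  # is-active exits with 0 only while systemd still considers the unit active.
  if systemctl is-active --quiet "$service"; then
    echo "$service is still active, not starting it again" >&2
    return 1
  fi
  sudo systemctl start "$service"
}
```

It would be called as clean_restart docker, or with any other unit name in place of docker.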

My CoreOS/fleet-deployed service is dying and I can't tell why

I'm trying to deploy nsqlookupd using fleet on a shiny new CoreOS cluster in EC2. Here is my systemd unit file:
[Unit]
Description=nsqlookupd service
After=docker.service
Requires=docker.service
[Service]
EnvironmentFile=/etc/environment
ExecStartPre=-/usr/bin/docker kill nsqlookupd
ExecStartPre=-/usr/bin/docker rm nsqlookupd
ExecStart=/usr/bin/docker run -d --name=nsqlookupd -e BROADCAST_ADDRESS=$COREOS_PUBLIC_IPV4 -p 4160:4160 -p 4161:4161 mikedewar/nsqlookupd
ExecStartPost=/usr/bin/etcdctl set /nsqlookupd_broadcast_address $COREOS_PUBLIC_IPV4
ExecStop=/usr/bin/docker stop -t 1 nsqlookupd
ExecStopPost=/usr/bin/etcdctl rm /nsqlookupd_broadcast_address
I've verified the container works fine if I just run the ExecStart command. My docker logs just look like
~ $ docker logs nsqlookupd
2014/08/08 02:23:58 nsqlookupd v0.2.29-alpha (built w/go1.2.2)
2014/08/08 02:23:58 TCP: listening on [::]:4160
2014/08/08 02:23:58 HTTP: listening on [::]:4161
and my fleetctl journal looks like
$ fleetctl journal nsqlookupd.service
-- Logs begin at Sun 2014-08-03 12:49:00 UTC, end at Fri 2014-08-08 02:30:06 UTC. --
Aug 08 02:23:57 ip-10-147-9-249 systemd[1]: Starting nsqlookupd service...
Aug 08 02:23:57 ip-10-147-9-249 docker[6140]: Error response from daemon: No such container: nsqlookupd
Aug 08 02:23:57 ip-10-147-9-249 docker[6140]: 2014/08/08 02:23:57 Error: failed to kill one or more containers
Aug 08 02:23:57 ip-10-147-9-249 docker[6148]: Error response from daemon: No such container: nsqlookupd
Aug 08 02:23:57 ip-10-147-9-249 docker[6148]: 2014/08/08 02:23:57 Error: failed to remove one or more containers
Aug 08 02:23:57 ip-10-147-9-249 etcdctl[6157]: 54.198.93.169
Aug 08 02:23:57 ip-10-147-9-249 systemd[1]: Started nsqlookupd service.
Aug 08 02:23:57 ip-10-147-9-249 docker[6155]: 0fce4465f61c092541ba9d4c4e89ce13c4d6bedc096519034ed585d7adb5e0d7
Aug 08 02:23:59 ip-10-147-9-249 docker[6194]: nsqlookupd
both of which look just fine. But the container dies quietly, and my fleetctl list-units gives
$ fleetctl list-units
UNIT STATE LOAD ACTIVE SUB DESC MACHINE
nsqlookupd.service launched loaded deactivating stop nsqlookupd service 1320802c.../10.147.9.249
Running docker images is a little worrying:
$ docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
<none> <none> 8ef9d8f9d18d 9 minutes ago 710 MB
mikedewar/nsqadmin latest 432af572bda8 2 days ago 710 MB
mikedewar/nsqd latest 00bd4e474964 2 days ago 710 MB
<none> <none> adf0ed97208e 3 weeks ago 710 MB
mikedewar/nsqlookupd latest 2219c0e783d9 3 weeks ago 710 MB
<none> <none> 35d2212f8932 3 weeks ago 710 MB
mikedewar/nsq latest f9794fe056e1 3 weeks ago 710 MB
busybox latest a9eb17255234 9 weeks ago 2.433 MB
zmarcantel/cassandra latest b1168b45b4f8 4 months ago 738 MB
The listing worries me because I've been updating mikedewar/nsqlookupd quite regularly over the last 3 weeks. Maybe the date is when I first pushed something to Docker Hub? I'd love to know that the image I'm working with is the up-to-date one. I've tried docker rmi mikedewar/nsqlookupd followed by docker pull mikedewar/nsqlookupd, but the CREATED column still says it was created 3 weeks ago.
I don't know if this is useful, but the ExecStopPost=/usr/bin/etcdctl rm /nsqlookupd_broadcast_address command seems to have worked - the etcdctl log line in the fleet journal suggests I managed to set the key to my IP, but after the container dies I can't get that key from etcd.
Any help on where to look next for clues, or any ideas why this is happening would be greatly appreciated! As is probably clear I'm rather new to this sort of thing...
You shouldn't run Docker containers in detached mode in a unit file, and your ExecStart does exactly that: ExecStart=/usr/bin/docker run -d. With -d, docker run returns as soon as the container has been started in the background, so systemd sees the ExecStart process exit, treats the service as finished, and runs ExecStop, which stops the container.
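Applied to the unit file from the question, that change might look like the sketch below. Only the -d flag is removed from ExecStart and one comment is added; everything else is copied unchanged from the question.

```ini
[Unit]
Description=nsqlookupd service
After=docker.service
Requires=docker.service

[Service]
EnvironmentFile=/etc/environment
ExecStartPre=-/usr/bin/docker kill nsqlookupd
ExecStartPre=-/usr/bin/docker rm nsqlookupd
# No -d here: docker run stays in the foreground, so systemd can track the process.
ExecStart=/usr/bin/docker run --name=nsqlookupd -e BROADCAST_ADDRESS=$COREOS_PUBLIC_IPV4 -p 4160:4160 -p 4161:4161 mikedewar/nsqlookupd
ExecStartPost=/usr/bin/etcdctl set /nsqlookupd_broadcast_address $COREOS_PUBLIC_IPV4
ExecStop=/usr/bin/docker stop -t 1 nsqlookupd
ExecStopPost=/usr/bin/etcdctl rm /nsqlookupd_broadcast_address
```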
As for managing versions: if you want to be absolutely sure you're getting the latest copy, tag your images with an explicit version and pull that tag, for example mikedewar/nsqlookupd:1.2.3. You can then increment the version in your fleet unit file with each release.
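A minimal sketch of that tagging workflow, assuming the Dockerfile is in the current directory and you are already logged in to Docker Hub. The helper name release_image is made up for this example.

```shell
# Hypothetical helper: build and push one explicitly versioned tag of an image.
release_image() {
  image=$1
  version=$2
  docker build -t "$image:$version" . &&
    docker push "$image:$version"
}
```

After release_image mikedewar/nsqlookupd 1.2.3, the unit file would reference mikedewar/nsqlookupd:1.2.3 in its ExecStart line, so each host runs exactly that version instead of whatever latest happens to point to locally.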
