In my Docker swarm cluster, I am running a Nexus 3 repository as a Docker image registry. This repository is a critical component of my DevOps infrastructure, because we have many Jenkins instances running in the swarm that start all their builds in separate build agent containers. When Nexus goes down (for example because the cluster node running it crashes or is rebooted), my Jenkins instances cannot pull the images for the build agent containers and are therefore unable to start any builds.
Yesterday we had an additional problem: the node Nexus was running on was the only node that had a local copy of the Nexus image itself, so no other host could launch my Nexus service. I had to rebuild the image from the Git repository containing the Dockerfile on another node and then launch Nexus there. All in all, this took about half an hour during which we were not able to start any build jobs.
So I tried to start nexus in replicated mode with two replicas, like this:
deploy:
  mode: replicated
  replicas: 2
My nexus service is using a volume like this:
services:
  nexus:
    volumes:
      - sonatype-work:/opt/sonatype/sonatype-work
    [...]
volumes:
  sonatype-work:
    driver: local
    driver_opts:
      o: bind
      device: /mnt/docker-data/services/nexus3/sonatype-work
      type: none
When I redeploy my nexus stack, one instance starts, but the second one always starts and then exits without an error (state Complete). See the docker service ps output for my nexus service:
docker#master:/mnt/docker-data/services/nexus3$ docker service ps nexus_nexus
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
oxez5kl866ma nexus_nexus.1 localhost:5001/devops/nexus3:0.0.1 node1 Running Running 2 minutes ago
u0s6slqlj0uf nexus_nexus.2 localhost:5001/devops/nexus3:0.0.1 node4 Ready Ready 1 second ago
d1u1btefzf1s \_ nexus_nexus.2 localhost:5001/devops/nexus3:0.0.1 node4 Shutdown Complete 1 second ago
ythgbtrmycon \_ nexus_nexus.2 localhost:5001/devops/nexus3:0.0.1 node3 Shutdown Complete 8 seconds ago
The log (docker service logs -f nexus_nexus) gives no information why my second instance does not stay running but always completes and restarts.
Is there maybe an I/O conflict, because both instances try to use a volume on the node they are deployed on that points to the same host directory (see device in my volume definition)?
Or does someone have another idea what is going wrong here?
Given this simple Docker compose.yaml file:
services:
  test:
    image: node:18
  website:
    image: nginx
After running:
docker compose up
docker ps
I expected to see two running containers/images. Instead I got just the one:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c970ef47fb93 nginx "/docker-entrypoint.…" 51 seconds ago Up 48 seconds 80/tcp myid-website-1
What is happening here? Does Docker expect that a persistent process is kept running within the container image? How does it decide which services persist?
I also noticed that adding restart: always caused the Node image to perpetually restart. What would be a good way to get the Node image to start via Docker Compose, such that I could log into it via docker exec?
My instinct is that this has to do with the distinction between services and images/containers.
For a container to persist, it needs to run a program. When the program ends, the container is stopped.
The Nginx image runs the command nginx -g daemon off; which starts Nginx and then waits for requests to come in. It doesn't end.
The Node image runs the command node. When no arguments are passed to it, it runs in interactive (REPL) mode. But when you run it the way you do, there is no TTY attached to the container, so node sees that there is no way to get any input. node therefore exits, and the container stops.
If you run the command docker ps -a, you'll also see stopped containers. You'll then see that your node container has exited.
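If the goal is just to be able to get into the Node container with docker exec, a minimal sketch of one common approach (reusing the service name test from the compose file above) is to keep STDIN open and allocate a TTY so the interactive node process does not exit:
services:
  test:
    image: node:18
    stdin_open: true   # like docker run -i: keep STDIN open
    tty: true          # like docker run -t: allocate a pseudo-TTY so the node REPL does not exit
  website:
    image: nginx
With that in place, docker compose up -d keeps both containers running and docker compose exec test bash opens a shell inside the Node container. Another option is to override the command with something long-running, e.g. command: sleep infinity.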
A word of warning, this is my first posting, and I am new to docker and Kubernetes with enough knowledge to get me into trouble.
I am confused about where docker container images are being stored and listing images.
To illustrate my confusion I start with the confirmation that "docker images" indicates no image for nginx is present.
Next I create a pod running nginx.
kubectl run nginx --image=nginx is successful in pulling the image "nginx" from GitHub (or that's my assumption):
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 8s default-scheduler Successfully assigned default/nginx to minikube
Normal Pulling 8s kubelet Pulling image "nginx"
Normal Pulled 7s kubelet Successfully pulled image "nginx" in 833.30993ms
Normal Created 7s kubelet Created container nginx
Normal Started 7s kubelet Started container nginx
Even though the above output indicates the image was pulled, issuing "docker images" does not include nginx in the output.
If I understand correctly, when an image is pulled, it is being stored on my local disk. In my case (Linux) in /var/lib/docker.
So my first question is, why doesn't docker images list it in the output, or is the better question where does docker images look for images?
Next, if I issue a docker pull for nginx, it is pulled from what I assume to be GitHub. docker images now includes it in its output.
Just for my clarification, nothing up to this point involves a private local registry, correct?
I purposely create a basic local Docker registry using the docker registry container, thinking things would be clearer since it allows me to explicitly specify a registry, but this only results in another issue:
docker run -d \
-p 5000:5000 \
--restart=always \
--name registry \
-v /registry:/var/lib/registry \
registry
I tag and push the nginx image to my newly created local registry:
docker tag nginx localhost:5000/nginx:latest
docker push localhost:5000/nginx:latest
The push refers to repository [localhost:5000/nginx]
2bed47a66c07: Pushed
82caad489ad7: Pushed
d3e1dca44e82: Pushed
c9fcd9c6ced8: Pushed
0664b7821b60: Pushed
9321ff862abb: Pushed
latest: digest: sha256:4424e31f2c366108433ecca7890ad527b243361577180dfd9a5bb36e828abf47 size: 1570
I now delete the original nginx image:
docker rmi nginx
Untagged: nginx:latest
Untagged: nginx#sha256:9522864dd661dcadfd9958f9e0de192a1fdda2c162a35668ab6ac42b465f0603
... and the newly tagged one:
docker rmi localhost:5000/nginx
Untagged: localhost:5000/nginx:latest
Untagged: localhost:5000/nginx#sha256:4424e31f2c366108433ecca7890ad527b243361577180dfd9a5bb36e828abf47
Deleted: sha256:f652ca386ed135a4cbe356333e08ef0816f81b2ac8d0619af01e2b256837ed3e
... but from where are they being deleted?
Now the image nginx should only be present in localhost:5000/? But docker images doesn't show it in its output.
Moving on, I try to create the nginx pod once more using the image pushed to localhost:5000/nginx:latest.
kubectl run nginx --image=localhost:5000/nginx:latest --image-pull-policy=IfNotPresent
This is the new issue. The connection to localhost:5000 is refused.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulling 1s kubelet Pulling image "localhost:5000/nginx:latest"
Warning Failed 1s kubelet Failed to pull image "localhost:5000/nginx:latest": rpc error: code = Unknown desc = Error response from daemon: Get "http://localhost:5000/v2/": dial tcp 127.0.0.1:5000: connect: connection refused
Warning Failed 1s kubelet Error: ErrImagePull
Normal BackOff 0s kubelet Back-off pulling image "localhost:5000/nginx:latest"
Why is it that I can pull and push to localhost:5000, but pod creation fails with what appears to be an authorization issue? I tried logging in to the registry, but no matter what I use for the username and password, login is successful. This confuses me even more.
I would try creating/specifying an imagePullSecret, but based on the docker login outcome, that doesn't make sense.
Clearly I'm not getting it.
Someone please have pity on me and show where I have lost my way.
I will try to bring some clarity to you despite the fact that your question already contains about 1000 questions (and you'll probably have 1000 more after my answer :D)
Before you can begin to understand any of this, you need to learn a few basic things:
Docker produces images which are used by containers - it's similar to a Virtual Machine, but more lightweight (I'm oversimplifying, but the TL;DR is pretty much that).
Kubernetes is an orchestration tool - it is responsible for starting containers (using already built images) and tracking their state (e.g. if a container has crashed it should be restarted, if it's not started it should be started, etc.)
Docker can run on any machine. To be able to start a container you need to build an image first. The image is essentially a lightweight mini OS (e.g. alpine, ubuntu, windows, etc.) which is configured with only the dependencies you need to run your application. This image is then pushed to a public repository/registry (hub.docker.com) or to a private one, and is afterwards used for starting containers.
Kubernetes builds on top of this and adds the "automation" layer which is responsible for scheduling and monitoring the containers. For example, you have a group of 10 servers all running nginx. One of those servers restarts - the nginx container will be automatically started by k8s.
A kubernetes cluster is the group of physical machines that are dedicated to the mentioned logical cluster. These machines have labels or tags which define the purpose of physical node and work as a constraint for where a container will be scheduled.
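To make the build-then-push flow described above concrete, here is a minimal sketch (the image name, tag and Dockerfile location are illustrative assumptions):
docker build -t myuser/myapp:1.0 .   # build an image from the Dockerfile in the current directory
docker push myuser/myapp:1.0         # push it to a registry (Docker Hub in this case)
docker run myuser/myapp:1.0          # start a container from that image on any machine that can pull it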
Now that I have explained the minimum basics in an oversimplified way, I can move on to answering your questions.
1. When you do docker run nginx - you are instructing docker to pull the nginx image from https://hub.docker.com/_/nginx and then start it on the machine you executed the command on (usually your local machine).
2. When you do kubectl run nginx --image=nginx - you are instructing Kubernetes to do something similar to 1., but in a cluster. The container will be deployed to a random machine somewhere in the cluster unless you add a nodeSelector or configure affinity. If you add a nodeSelector, this container (called a Pod in K8S) will be placed on that specific node.
3. You have started a private registry server on your local machine. It is crucial to know that localhost inside a container points to the container itself.
4. It is worth mentioning that some of the kubernetes commands will create their own container for the execution phase of the command. (Remember this!)
5. When you run kubectl run nginx --image=nginx everything works fine, because the image is downloaded from https://hub.docker.com/_/nginx.
6. When you run kubectl run nginx --image=localhost:5000/nginx you are telling kubernetes to instruct docker to look for the image at localhost, which is ambiguous because you have multiple layers of containers running (check 4.). This means the command that does docker pull localhost:5000/nginx also runs in a docker container -- so there is no registry listening at port :5000 there (the registry is running in a completely different, isolated container!) :D
And this is why you are getting Error: ErrImagePull - it can't resolve localhost because it points to itself.
As for the docker rmi nginx and docker rmi localhost:5000/nginx commands - by running them you removed your local copy of the nginx images.
If you run docker run localhost:5000/nginx on the machine where you started docker run registry you should get a running nginx container.
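A common workaround, sketched here under assumptions (the host's LAN IP is 192.168.1.10, and both the Docker daemon you push from and the container runtime on the cluster nodes are configured to trust that address as an insecure HTTP registry), is to reference the registry by an address that is reachable from inside the cluster instead of localhost:
docker tag nginx 192.168.1.10:5000/nginx:latest
docker push 192.168.1.10:5000/nginx:latest
kubectl run nginx --image=192.168.1.10:5000/nginx:latest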
You should definitely read the Docker Guide BEFORE you try to dig into Kubernetes or nothing will ever make sense.
Your head will stop hurting after that I promise... :D
TL;DR
docker images lists images stored in the docker daemon's data root, by default /var/lib/docker.
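A quick way to check that location on your machine:
docker info --format '{{.DockerRootDir}}'   # prints /var/lib/docker on a default Linux install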
You're deploying images to Kubernetes; the images are pulled onto the node on which the pod is scheduled. For example, using Kubernetes in Docker (kind):
kind create cluster
kubectl run nginx --image=nginx
docker exec -it $(kubectl get pod nginx -o jsonpath={.spec.nodeName}) crictl images
crictl is a command-line interface for CRI-compatible container runtimes.
Docker images are pulled from Docker Hub by default, not GitHub. When using a local Docker registry, images are stored in the registry's data volume. The registry's storage can be customized; by default data is stored in /var/lib/registry (the storage.filesystem.rootdirectory setting).
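For example, assuming the registry container from the question (named registry, with its data mounted at /var/lib/registry) and the default filesystem layout of the registry image, the stored repositories can be listed directly:
docker exec registry ls /var/lib/registry/docker/registry/v2/repositories   # should list nginx after the push above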
You can use tools like skopeo to list images stored in a docker registry, for example:
skopeo list-tags docker://localhost:5000/nginx --tls-verify=false
I am following the Docker Getting Started tutorials and I am stuck at part 3 (Docker Compose). I am working on a fresh Ubuntu 16.04 installation. I followed the tutorials from part 1 and part 2, except for logging in to a Docker account and pushing the newly created image to a remote repository.
The Dockerfile and Python file are the same as in part 2, and the .yml file is the same as in part 3 (I copy-pasted them with vim).
I can apparently deploy a stack with docker compose just fine. However, when I get to the part where I am supposed to send a request via curl, I get the following response:
curl: (7) Failed to connect to localhost port 80: Connection refused
This is the output of docker service ls right after returning from docker stack deploy:
ID NAME MODE REPLICAS IMAGE PORTS
m3ux2u3i6cpv getstartedlab_web replicated 1/5 username/repo:tag *:80->80/tcp
This is the output of docker container ls (fired right after docker service ls):
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
bd870fcb64f4 username/repo:tag "python app.py" 7 seconds ago Up 1 second 80/tcp getstartedlab_web.2.p9v9p34kztmu8rvht3ndg2xtb
db73404d495f username/repo:tag "python app.py" 7 seconds ago Up 1 second 80/tcp getstartedlab_web.1.z3o2t10oiidtzofsonv9cwcvd
While docker ps returns no lines.
And this is the output of docker network ls:
NETWORK ID NAME DRIVER SCOPE
5776b070996c bridge bridge local
47549d9b2e88 docker_gwbridge bridge local
59xa0454g133 getstartedlab_webnet overlay swarm
e27f62ede27d host host local
ramvt1h8ueg7 ingress overlay swarm
f0fe862c5dcc none null local
I can still run the image as a single container and get the expected result, i.e. being able to connect to it via browser or curl and get an error message related to Redis, but I do not understand why it doesn't work when I deploy a stack.
As far as Ubuntu firewall settings are concerned, I have not tinkered with any since the installation. As for Docker and Docker Compose, I have only followed the steps in the "getting started" tutorials in chapters 1 to 3, including downloading the Docker Compose binaries and changing permissions with chmod as described. I also added my user to the docker group so I don't have to sudo every time I need to run a command. I am not behind a proxy server (I am running all tests locally) and I haven't tinkered with any defaults either.
I think this may be a duplicate of this question, though it hasn't been answered nor commented on yet. It is not a duplicate of this question, as I am following a different tutorial.
UPDATE:
As it turns out, I was using EXACTLY the same docker-compose.yml file. The key issue was a name mismatch with the Docker image name, as is visible in the docker service ls output. I thank Janshair Khan for the inspiration. What is strange is that there is a username/repo image apparently created 9 months ago:
REPOSITORY TAG IMAGE ID CREATED SIZE
<my getting-started image>
python 2.7-slim 4fd30fc83117 7 weeks ago 138MB
hello-world latest f2a91732366c 2 months ago 1.85kB
username/repo <none> c7f5ee4d4030 9 months ago 182MB
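For anyone hitting the same mismatch, a minimal sketch of the fix (using the placeholder from the listing above; names are illustrative): either retag the locally built image so it matches the image: line in docker-compose.yml, or edit the compose file to reference the image that actually exists, then redeploy:
docker tag <my getting-started image> username/repo:tag
docker stack deploy -c docker-compose.yml getstartedlab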
Does anybody know whether there is a way to watch changes to a service's tasks in Docker swarm mode?
Are there any examples?
First, list the services:
docker service ls
Then you can list the tasks running for a given service with:
docker service ps <service>
Example:
$ docker service ps redis
ID NAME SERVICE IMAGE LAST STATE DESIRED STATE NODE
0qihejybwf1x5vqi8lgzlgnpq redis.1 redis redis:3.0.6 Running 8 seconds Running manager1
bk658fpbex0d57cqcwoe3jthu redis.2 redis redis:3.0.6 Running 9 seconds Running worker2
5ls5s5fldaqg37s9pwayjecrf redis.3 redis redis:3.0.6 Running 9 seconds Running worker1
After a docker service scale you can see the new tasks being added or removed depending on the number desired.
You can also follow the status change (running/stopped, etc.) for tasks under the LAST STATE column.
All this information about task status changes is also available through the Docker Remote API.
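A couple of simple ways to follow those changes (a sketch; the watch utility, the scope filter, and the default daemon socket path are assumptions about your environment):
watch -n 2 docker service ps redis                                # re-run the task listing every 2 seconds
docker events --filter scope=swarm                                # stream swarm-scoped events from a manager node
curl --unix-socket /var/run/docker.sock http://localhost/tasks    # raw task objects from the Engine API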
I am trying out docker swarm with 1.12 on my Mac. I started 3 VirtualBox VMs and created a swarm cluster of 3 nodes, all fine.
docker#redis1:~$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
2h1m8equ5w5beetbq3go56ebl redis3 Ready Active
8xubu8g7pzjvo34qdtqxeqjlj redis2 Ready Active Reachable
cbi0lyekxmp0o09j5hx48u7vm * redis1 Ready Active Leader
However, when I create a service, I see no errors yet replicas always displays 0/1:
docker#redis1:~$ docker service create --replicas 1 --name hello ubuntu:latest /bin/bash
76kvrcvnz6kdhsmzmug6jgnjv
docker#redis1:~$ docker service ls
ID NAME REPLICAS IMAGE COMMAND
76kvrcvnz6kd hello 0/1 ubuntu:latest /bin/bash
docker#redis1:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
What could be the problem? Where do I look for logs?
Thanks!
The problem is that your task (running /bin/bash) exits quickly since it's not doing anything.
If you look at the tasks for your service, you'll see that one is started and then shut down within seconds. Another one is then started, shut down, and so on, since you've requested that 1 task be running at all times.
docker service ps hello
If you use ubuntu:latest top, for instance, the task will stay running.
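For example, recreating the service with a long-running command should leave the replica count at 1/1:
docker service rm hello
docker service create --replicas 1 --name hello ubuntu:latest top
docker service ls   # REPLICAS should now show 1/1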
This also can happen if you specify a volume in your compose file that is bound to a local directory that does not exist.
If you look at the log (on some Linux systems, this is journalctl -xe), you'll see which volume can't be bound.
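A minimal sketch of the fix (the path is illustrative): make sure the directory exists on every node where a task of the service can be scheduled, before deploying the stack:
sudo mkdir -p /mnt/docker-data/my-service-data   # run this on each node that may host a task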
In my case, the replicas were not working and 0/0 was shown, because I had not built the images beforehand.
As I saw here, when you deploy to swarm with a docker-compose.yml, you need to build the images first.
So I decided to do a full system prune followed by a build and a deploy (here my stack was called demo, and I did not have any previous services or containers running):
docker stack rm demo
docker system prune --all
docker-compose build
docker stack deploy -c ./docker-compose.yml demo
After this, everything was up and running and the service replicas are now up on the swarm:
PS C:\Users\Alejandro\demo> docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
oi0ngcmv0v29 demo_appweb replicated 2/2 webapp:1.0 *:80->4200/tcp
ahuyj0idz5tv demo_express replicated 2/2 backend:1.0 *:3000->3000/tcp
fll3m9p6qyof demo_fileinspector replicated 1/1 fileinspector:1.0 *:8080->8080/tcp
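For reference, this workflow relies on each service in docker-compose.yml having both a build: context and an image: tag, so that docker-compose build produces exactly the image name that docker stack deploy later references. A minimal sketch (paths and names are illustrative, loosely matching the appweb service above):
services:
  appweb:
    build: ./webapp     # used by docker-compose build
    image: webapp:1.0   # the name the swarm service will run
    ports:
      - "80:4200"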
The way I keep the replicas running in dev mode at the moment:
Angular CLI app:
command: >
  bash -c "npm install && ng serve --host 0.0.0.0 --port 4200"
NodeJS Backend (Express):
command: >
  bash -c "npm install && set DEBUG=myapp:* & npm start --host 0.0.0.0 --port 3000"