I am trying to set up a zero-downtime deployment using docker stack deploy on a single-node Docker swarm running on localhost.
After building the image demo:latest, I ran the first deployment with docker stack deploy --compose-file docker-compose.yml demo. I can see 4 replicas running and can access the nginx default home page on port 8080 on my local machine. I then updated index.html, rebuilt the image with the same name and tag, and ran the docker stack deploy command again. This produces the error below, and the changes are not reflected.
Deleting the deployment and recreating it works, but I want to know how updates can be rolled out without downtime. Please help.
Error
Updating service demo_demo (id: wh5jcgirsdw27k0v1u5wla0x8)
image demo:latest could not be accessed on a registry to record
its digest. Each node will access demo:latest independently,
possibly leading to different nodes running different
versions of the image.
Dockerfile
FROM nginx:1.19-alpine
ADD index.html /usr/share/nginx/html/
docker-compose.yml
version: "3.7"
services:
  demo:
    image: demo:latest
    ports:
      - "8080:80"
    deploy:
      replicas: 4
      update_config:
        parallelism: 2
        order: start-first
        failure_action: rollback
        delay: 10s
      rollback_config:
        parallelism: 0
        order: stop-first
TLDR: push your image to a registry after you build it.
Docker swarm doesn't really work without a public or private Docker registry. All the nodes need to get their images from the same place, and the registry is the mechanism by which that information is shared. There are other ways to get images loaded on each node in the swarm, but they involve running the same commands on every node, one at a time, to load the image, which isn't great.
Alternatively, you could use docker configs for your configuration data and not rebuild the image every time. That would work passably well without a registry, and you can swap out the config data with little to no downtime:
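As a rough sketch of that workflow, assuming a local registry reachable at 127.0.0.1:5000 (for example one started with docker service create --name registry --publish published=5000,target=5000 registry:2) and a hypothetical tag v2, something like the following could build, push, and redeploy. The helper names image_ref and release are made up for illustration, and the image: line in docker-compose.yml would need to point at the same reference.

```shell
#!/bin/sh
# Sketch only: REGISTRY and the tag value are assumptions, not from the question.
REGISTRY="${REGISTRY:-127.0.0.1:5000}"
DOCKER="${DOCKER:-docker}"   # set DOCKER=echo for a dry run

# Build the full image reference, e.g. 127.0.0.1:5000/demo:v2
image_ref() {
    printf '%s/%s:%s\n' "$REGISTRY" "$1" "$2"
}

# Build the image, push it to the registry, then redeploy the stack.
# $1 is used as both image name and stack name (both are "demo" in the question).
release() {
    ref="$(image_ref "$1" "$2")"
    $DOCKER build -t "$ref" . &&
    $DOCKER push "$ref" &&
    $DOCKER stack deploy --compose-file docker-compose.yml "$1"
}

# Usage (not run here): release demo v2
```

Using a fresh tag for each build, rather than reusing latest, changes the service definition, which should give the swarm something to roll out with the update_config settings.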
Rotate Configs
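A config rotation could look roughly like this, assuming the service already mounts a config named index_v1 over the nginx index page. The config names and the rotate_config helper are hypothetical; the docker config create, docker service update --config-rm/--config-add, and docker config rm commands are the standard ones.

```shell
#!/bin/sh
DOCKER="${DOCKER:-docker}"   # set DOCKER=echo for a dry run

# rotate_config SERVICE OLD_CONFIG NEW_CONFIG FILE TARGET_PATH
# Creates the new config, swaps it into the service, then removes the old one.
rotate_config() {
    service="$1"; old="$2"; new="$3"; file="$4"; target="$5"
    $DOCKER config create "$new" "$file" &&
    $DOCKER service update \
        --config-rm "$old" \
        --config-add "source=$new,target=$target" \
        "$service" &&
    $DOCKER config rm "$old"
}

# Usage (not run here):
#   rotate_config demo_demo index_v1 index_v2 index.html /usr/share/nginx/html/index.html
```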
Related
TL;DR: I have two almost identical services in my compose file except for the name of the service and the published ports. When deploying with docker stack deploy..., why does the first service fail with a no such image error, while the second service using the same image runs perfectly fine?
Full: I have a docker-compose file with two Apache Tomcat services pulling the same image from my private git repository. The only difference between the two services in my docker-compose.yml is the name of the service (*_dev vs. *_prod) and the published ports. I deploy this docker-compose file on my swarm using GitLab CI with a gitlab-ci.yml. For the deployment, I use two commands in this gitlab-ci.yml:
...
script:
  - docker pull $REGISTRY:$TAG
  - docker stack deploy -c docker-compose.yml webapp1 --with-registry-auth
...
(I use a docker pull [image] command to have the image on the right node, since my --with-registry-auth is not working properly, but this is not my problem currently).
Now the strange thing is that for the first service I get a No such image: error and the service is stopped, while for the second service everything seems to run perfectly fine. Both services are on the same worker node. This is what I get from docker service ps:
:~$ docker service ps webapp1_tomcat_dev
ID    NAME                      IMAGE         NODE          DESIRED STATE  CURRENT STATE            ERROR                               PORTS
xxx1  webapp1_tomcat_dev.1      url/repo:tag  worker1 node  Shutdown       Rejected 10 minutes ago  "No such image: url/repo:tag#xxx…"
xxx2   \_ webapp1_tomcat_dev.1  url/repo:tag  worker1 node  Shutdown       Rejected 10 minutes ago  "No such image: url/repo:tag#xxx…"
:~$ docker service ps webapp1_tomcat_prod
ID    NAME                      IMAGE         NODE          DESIRED STATE  CURRENT STATE            ERROR                               PORTS
xxx3  webapp1_tomcat_prod.1     url/repo:tag  worker1 node  Running        Running 13 minutes ago
I have used the --no-trunc option to confirm that the IMAGE used by *_prod and *_dev is identical.
The restart_policy in my docker-compose explains why the first service fails three minutes after the second service started. Here is my docker-compose:
version: '3.2'
services:
  tomcat_dev:
    image: url/repo:tag
    deploy:
      restart_policy:
        condition: on-failure
        delay: 60s
        window: 120s
        max_attempts: 1
    ports:
      - "8282:8080"
  tomcat_prod:
    image: url/repo:tag
    deploy:
      restart_policy:
        condition: on-failure
        delay: 60s
        window: 120s
        max_attempts: 1
    ports:
      - "8283:8080"
Why does the first service fail with a no such image error? Is it for example just not possible to have two services, that use the same image, work on the same worker node?
(I cannot simply scale-up one service, since I need to upload files to the webapp which are different for production and development - e.g. dev vs prod licenses - and hence I need two distinct services)
EDIT: Second service works because it is created first:
$ docker stack deploy -c docker-compose.yml webapp1 --with-registry-auth
Creating service webapp1_tomcat_prod
Creating service webapp1_tomcat_dev
I found a workaround by separating my services into two different docker compose files (docker-compose-prod.yml and docker-compose-dev.yml) and running the docker stack deploy command twice in my gitlab-ci.yml:
...
script:
  - docker pull $REGISTRY:$TAG
  - docker stack deploy -c docker-compose-prod.yml webapp1 --with-registry-auth
  - docker pull $REGISTRY:$TAG
  - docker stack deploy -c docker-compose-dev.yml webapp1 --with-registry-auth
...
My gut says the restart_policy in my docker-compose file was also too strict (it had max_attempts: 1), and maybe because of this the image could not be used in time, within a single restart (as suggested by @Ludo21South). I have since allowed more attempts, but because I had already separated the services into two files (which already worked), I have not checked whether this hypothesis is true.
I have a docker-compose file that exposes 2 services, a master service and a slave service. I want to be able to scale the slave service to some number of instances using
docker-compose up --scale slave=N
However, one of the options I must specify on command run in the master service is the number of slave instances to expect. E.g. If I scale slave=10, I need to set --num-slaves=10 in the command on the master service.
Is there a way to determine the number of instances of a given service either from the docker-compose file itself, or from a customized entrypoint shellscript?
The problem I'm facing is that since there is no way I've yet found to specify the number of scaled instances from within the docker-compose file format itself, I'm relying on the person running the command to enter the scale factor consistently and to have that value align with the value I need to tell the master node to expect. And trusting users to do the right thing is a recipe for disaster. If I could continue to let the user specify the scale value on the command line, I need a way to determine what that value is at runtime.
The scale service option from the 2.x Compose file format was not carried over to version 3, but you can use replicas under deploy instead:
version: "3.7"
services:
  redis:
    image: redis:latest
    deploy:
      replicas: 1
and run it using:
docker-compose --compatibility up -d
docker-compose 1.20.0 introduces a new --compatibility flag designed
to help developers transition to version 3 more easily. When enabled,
docker-compose reads the deploy section of each service’s definition
and attempts to translate it into the equivalent version 2 parameter.
Currently, the following deploy keys are translated:
resources limits and memory reservations
replicas
restart_policy condition and max_attempts
but:
Do not use this in production!
We recommend against using --compatibility mode in production. Because
the resulting configuration is only an approximate using non-Swarm
mode properties, it may produce unexpected results.
see this
PS:
Docker container names must be unique, so you cannot scale a service beyond 1 container if you have specified a custom name. Attempting to do so results in an error.
Unfortunately, there is no way to define replicas for docker-compose; it only works for Docker Swarm. The documentation states:
Tip: Alternatively, in Compose file version 3.x, you can specify replicas under the deploy key as part of a service configuration for Swarm mode. The deploy key and its sub-options (including replicas) only works with the docker stack deploy command, not docker-compose up or docker-compose run.
So if you have the deploy section in the yaml, but run it with docker-compose, then it will not take any effect.
version: "3.3"
services:
  alpine1:
    image: alpine
    container_name: alpine1
    command: ["/bin/sleep", "10000"]
    deploy:
      replicas: 4
  alpine2:
    image: alpine
    container_name: alpine2
    command: ["/bin/sleep", "10000"]
    deploy:
      replicas: 2
So the only way to scale up in docker compose is by running the scale command manually.
docker-compose scale alpine1=3
Note: I had a job where they loved docker-compose, so we had bash scripts to perform operations like the ones you describe. For example, we would have something like ./controller-app.sh scale test_service=10, and it would run docker-compose scale test_service=10.
UPDATE
To check the number of replicas, you can mount the Docker socket into your container and then run docker ps --format '{{ .Names }}' | grep "$YOUR_CONTAINER_NAME".
Here is how you would mount the socket.
docker run -v /var/run/docker.sock:/var/run/docker.sock -it alpine sh
Then install the Docker CLI inside the container:
apk update
apk add docker
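To turn that listing into a number the master's entrypoint can use, a small helper could count the matching names. This is only a sketch: count_matching is a made-up name, and the _slave_ fragment assumes container names of the form project_slave_N, which depends on the Compose version and project name.

```shell
#!/bin/sh
# Count the lines on stdin (one container name per line) that contain
# the given fixed string. Prints the count.
count_matching() {
    grep -c -F -- "$1"
}

# Usage inside a container with the socket mounted and the CLI installed
# (not run here):
#   NUM_SLAVES="$(docker ps --format '{{.Names}}' | count_matching _slave_)"
#   exec master --num-slaves="$NUM_SLAVES"
```

The count reflects what is running at the moment the command executes, so a master that starts before all slaves are up would see a lower number.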
I have a docker-compose.yml file which works with docker-compose up --build. My app works and everything is fine.
version: '3'
services:
  myapp:
    container_name: myapp
    restart: always
    build: ./myapp
    ports:
      - "8000:8000"
    command: /usr/local/bin/gunicorn -w 2 -b :8000 flaskplot:app
  nginx:
    container_name: nginx
    restart: always
    build: ./nginx
    ports:
      - "80:80"
    depends_on:
      - myapp
But when I use docker stack deploy -c docker-compose.yml myapp, I get the following error:
Ignoring unsupported options: build, restart
Ignoring deprecated options:
container_name: Setting the container name is not supported.
Creating network myapp_default
Creating service myapp_myapp
failed to create service myapp_myapp: Error response from daemon: rpc error: code = InvalidArgument desc = ContainerSpec: image reference must be provided
Any hints on how I should "translate" the docker-compose.yml file to make it compatible with docker stack deploy?
To run containers in swarm mode, you do not build them on each swarm node individually. Instead you build the image once, typically on a CI server, push to a registry server (often locally hosted, or you can use docker hub), and specify the image name inside your compose file with an "image" section for each service.
Doing that will get rid of the hard error. You'll likely remove the build section of the compose file since it no longer applies.
Specifying "container_name" is unsupported because it would break the ability to scale or perform updates (a container name must be unique within the docker engine). Let swarm name the containers and reference your app on the docker network by its service name.
Specifying "depends_on" is not supported because containers may be started on different nodes, and rolling updates/failure recovery may remove some containers providing a service after the app started. Docker can retry the failing app until the other service starts up, or preferably you configure an entrypoint that waits for the dependencies to become available with some kind of ping for a minute or two.
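A waiting entrypoint of that kind might look like the sketch below. The wait_for helper is hypothetical, and the nc -z probe in the usage comment assumes a netcat with -z support is present in the image; the myapp host and port 8000 come from the compose file in the question.

```shell
#!/bin/sh
# wait_for ATTEMPTS DELAY COMMAND...
# Runs COMMAND until it succeeds, up to ATTEMPTS times, sleeping DELAY
# seconds after each failure. Returns 0 on success, 1 if it never succeeded.
wait_for() {
    attempts="$1"; delay="$2"; shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage in an entrypoint script (not run here):
#   wait_for 60 2 nc -z myapp 8000 || exit 1
#   exec "$@"
```

If the wait times out, the entrypoint exits non-zero and swarm can reschedule the task, which matches the retry behaviour described above.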
Without seeing your Dockerfile, I'd also recommend setting up a healthcheck on each image. Swarm mode uses this to control rolling updates and recover from application failures.
Lastly, consider adding a "deploy" section to your compose file. This tells swarm mode how to deploy and update your service, including how many replicas, constraints on where to run, memory and CPU limits and requirements, and how fast to update the service. You can define a restart policy here as well but I recommend against it since I've seen docker engines restarting containers that conflict with swarm mode deploying containers on other nodes, or even a new container on the same node.
You can see the full compose file documentation with all of these options here: https://docs.docker.com/compose/compose-file/
I am working on building an automated CI/CD pipeline for a LAMP application using docker.
I want the image to be spun up into 5 containers, so that 5 different developers can work on their code. Can this be achieved? I tried it using replicas, but it didn't work.
version: '3'
services:
  web:
    build: .
    ports:
      - "8080:80"
deploy:
  mode: replicated
  replicas: 4
The error I get:
#!/bin/bash -eo pipefail
docker-compose up
ERROR: The Compose file './docker-compose.yml' is invalid because:
Additional properties are not allowed ('jobs' was unexpected)
You might be seeing this error because you're using the wrong Compose file version. Either specify a supported version (e.g "2.2" or "3.3") and place your service definitions under the services key, or omit the version key and place your service definitions at the root of the file to use version 1.
For more on the Compose file format versions, see docs.docker.com/compose/compose-file
Exited with code 1
Also, can developers push, pull, and commit to git from the different containers? Will work done in one container be lost if the image is rebuilt or rerun?
What should I take care of while building this pipeline?
First of all, build your image separately from a Dockerfile with docker build -t <image name>:<version/tag> . and then use the following compose file with docker stack deploy to deploy your stack.
version: '3'
services:
  web:
    image: <image name>:<version/tag>
    ports:
      - "8080:80"
    deploy:
      mode: replicated
      replicas: 4
The deploy attribute should be inside a service because it describes the number of replicas that service must have. It is not a global attribute like services. That seems to be the only problem in your compose file, and docker-compose up is complaining about it when run from the pipeline.
Update
You cannot run multiple replicas with a single docker-compose command. To run multiple replicas from a compose.yml, create a swarm by executing docker swarm init on your machine.
Afterward, simply replace docker-compose up with docker stack deploy <stack name>. docker-compose simply ignores the deploy attribute.
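Those two steps could be scripted roughly as follows. This is a sketch: the deploy_stack helper and the stack name lamp are hypothetical, and the swarm state check relies on docker info --format '{{.Swarm.LocalNodeState}}' reporting active once the engine is part of a swarm.

```shell
#!/bin/sh
DOCKER="${DOCKER:-docker}"   # set DOCKER=echo for a dry run

# deploy_stack STACK_NAME
# Initialises a single-node swarm if this engine is not already in one,
# then deploys docker-compose.yml from the current directory as a stack.
deploy_stack() {
    stack="$1"
    state="$($DOCKER info --format '{{.Swarm.LocalNodeState}}')"
    if [ "$state" != "active" ]; then
        $DOCKER swarm init || return 1
    fi
    $DOCKER stack deploy -c docker-compose.yml "$stack"
}

# Usage (not run here): deploy_stack lamp
```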
For details on differences between docker-compose up and docker stack deploy <stack name> refer to this article: https://nickjanetakis.com/blog/docker-tip-23-docker-compose-vs-docker-stack
I started a Flask API service on a Docker swarm cluster with 1 master and 3 worker nodes. I deployed the task using the following docker compose file:
version: '3'
services:
  xgboost-model-api:
    image: xgboost-model-api
    ports:
      - "5000:5000"
    deploy:
      mode: global
    networks:
      - xgboost-net
networks:
  xgboost-net:
I deployed the task using the following docker swarm command,
docker stack deploy --compose-file docker-compose.yml xgboost-swarm
However, the task was started only on my master node and not on any worker node.
$ docker service ls
ID            NAME                             MODE        REPLICAS  IMAGE
pgd8cktr4foz  viz                              replicated  1/1       dockersamples/visualizer
twrpr4av4c7f  xgboost-swarm_xgboost-model-api  global      1/4       xgboost-model-api
xxrfn1w7eqw6  dockercloud-server-proxy         global      1/1       dockercloud/server-proxy
Dockerfile being used is here. Any thoughts on why this behavior occurs would be appreciated.
As stated in this thread (duplicate?):
If you are using a private registry, it's important to share the login credentials with the worker nodes by using
docker stack deploy --with-registry-auth
---- UPDATE
From your compose file it doesn't look like you are using a private registry. Generally speaking, if containers can't start successfully on the workers, they will end up on the manager.
Some possible reasons for this are:
Can't access private registry (fix with --with-registry-auth)
Application requires some change on the host to run (like elasticSearch requires vm.max_map_count=262144)
Healthcheck fails on other nodes because of a poorly written healthcheck
Network setting issues preventing pulling an image
Try removing your stack and deploying it again. Then run docker service ps --no-trunc {serviceName}; this might show you the tasks that were supposed to run the service on other nodes and why they failed.
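To pick out only the tasks that report an error, the output can be narrowed with --format and a small filter. This is a sketch: failed_tasks is a made-up helper, and the | delimiter assumes task names and node names contain no | character.

```shell
#!/bin/sh
# Reads NAME|NODE|ERROR lines on stdin and prints only the tasks
# whose ERROR field is non-empty.
failed_tasks() {
    awk -F '|' '$3 != "" { print $1 " on " $2 ": " $3 }'
}

# Usage (not run here), with the service name from the question:
#   docker service ps --no-trunc \
#       --format '{{.Name}}|{{.Node}}|{{.Error}}' \
#       xgboost-swarm_xgboost-model-api | failed_tasks
```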
Check out this SO thread for more troubleshooting tips.