I have a Docker Swarm cluster that hosts my Rails app and Sidekiq as separate containers.
The API application writes each incoming uploaded file into the public folder and sends the path to a Sidekiq worker, which uploads the file to S3. I used a Docker volume mapping to share the folder between the containers.
Because of this dependency, I need a Sidekiq container running on every node where my API application is running.
Is there any way to tell Swarm to deploy a Sidekiq container whenever it deploys an API container to a new node?
Or is there a workaround that avoids the volume-mapping dependency in the first place?
My docker-stack.yml
version: "3.9"
services:
app:
image: rails_app
command: bundle exec rails s -e production
ports:
- 8000:8000
volumes:
- app-assets:/app/public/assets
networks:
- my-network
deploy:
replicas: 6
placement:
constraints:
- "node.role==worker"
update_config:
parallelism: 2
delay: 10s
restart_policy:
condition: on-failure
delay: 5s
worker:
image: rails_app
command: bundle exec sidekiq -c 2 -e production
networks:
- my-network
volumes:
- app-assets:/app/public/assets
deploy:
replicas: 6
placement:
constraints:
- "node.role==worker"
restart_policy:
condition: on-failure
delay: 5s
networks:
my-network:
volumes:
app-assets:
Even after three days of googling, I was not able to find any such configuration option in Docker Swarm, but I was able to work around this bottleneck by backing the volume with NFS, so every node sees the same files.
More info on NFS: https://www.digitalocean.com/community/tutorials/how-to-set-up-an-nfs-mount-on-ubuntu-16-04
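For reference, here is a minimal sketch of how the shared volume can be backed by NFS directly in the stack file, so every node mounts the same export (the server address 10.0.0.10 and the export path /exports/app-assets are placeholders for your own NFS setup):

volumes:
  app-assets:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=10.0.0.10,rw,nfsvers=4"
      device: ":/exports/app-assets"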
I'm trying to use Traefik with Docker Swarm, but I'm having trouble during service updates. When I run a stack deploy or a service update, the service goes down for a few seconds.
How to reproduce:
1 - Create a Dockerfile:
FROM jwilder/whoami
RUN echo $(date) > daniel.txt
2 - Build 2 demo images:
$ docker build -t whoami:01 .
$ docker build -t whoami:02 .
3 - Create a docker-compose.yml:
version: '3.5'
services:
  app:
    image: whoami:01
    ports:
      - 81:8000
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure
      update_config:
        parallelism: 1
        failure_action: rollback
      labels:
        - traefik.enable=true
        - traefik.backend=app
        - traefik.frontend.rule=Host:localhost
        - traefik.port=8000
        - traefik.docker.network=web
    networks:
      - web
  reverse-proxy:
    image: traefik
    command:
      - "--api"
      - "--docker"
      - "--docker.swarmMode"
      - "--docker.domain=localhost"
      - "--docker.watch"
      - "--docker.exposedbydefault=false"
      - "--docker.network=web"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      update_config:
        parallelism: 1
        failure_action: rollback
      placement:
        constraints:
          - node.role == manager
    networks:
      - web
    ports:
      - 80:80
      - 8080:8080
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
networks:
  web:
    external: true
4 - Deploy the stack:
$ docker stack deploy -c docker-compose.yml stack_name
5 - Curl to get the service response:
$ while true ; do sleep .1; curl localhost; done
You should see something like this:
I'm adc1473258e9
I'm bc82ea92b560
I'm adc1473258e9
I'm bc82ea92b560
That means the load balancing is working.
6 - Update the service
$ docker service update --image whoami:02 stack_name_app
Traefik responds with Bad Gateway when there should be zero downtime.
How to fix it?
Bad Gateway means Traefik is configured to forward requests, but it is not able to reach the container on the IP and port it is configured to use. Common causes of this are:
Traefik and the service are on different Docker networks
the service exists on multiple networks and Traefik picks the wrong one
the wrong port is being used to connect to the container (use the container port, and make sure the app is listening on all interfaces, i.e. 0.0.0.0)
The sketch below shows one way to check the network attachments.
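A quick way to check the first two causes is to inspect networks from the Docker CLI (a sketch; the task container name is a placeholder for one of your own):

# list the networks a running task container is attached to
docker inspect stack_name_app.1.abc123 --format '{{range $name, $net := .NetworkSettings.Networks}}{{$name}} {{end}}'

# list the containers attached to the proxy network
docker network inspect web --format '{{range .Containers}}{{.Name}} {{end}}'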
From the comments, this is only happening during the deploy, which means traefik is hitting containers before they are ready to receive requests, or while they are being stopped.
You can configure the containers with a healthcheck and send requests through swarm mode's VIP using a Dockerfile that looks like:
FROM jwilder/whoami
RUN echo $(date) >/build-date.txt
HEALTHCHECK --start-period=30s --retries=1 CMD wget -O - -q http://localhost:8000
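If you'd rather not bake the healthcheck into the image, an equivalent check can be declared per service in the compose file instead (a sketch; start_period needs compose file format 3.4 or newer):

    healthcheck:
      test: ["CMD", "wget", "-O", "-", "-q", "http://localhost:8000"]
      start_period: 30s
      retries: 1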
And then in the docker-compose.yml:
      labels:
        - traefik.enable=true
        - traefik.backend=app
        - traefik.backend.loadbalancer.swarm=true
        ...
And I would also configure the traefik service with the following options:
- "--retry.attempts=2"
- "--forwardingTimeouts.dialTimeout=1s"
However, traefik will keep a connection open and the VIP will continue to send all requests to the same backend container over that same connection. What you can do instead is have traefik itself perform the healthcheck:
      labels:
        - traefik.enable=true
        - traefik.backend=app
        - traefik.backend.healthcheck.path=/
        ...
I would still leave the healthcheck on the container itself so Docker gives the container time to start before stopping the other container. And leave the retry option on the traefik service so any request to a stopping container, or to one that hasn't yet been detected by the healthcheck, gets a chance to try again.
Here's the resulting compose file that I used in my environment:
version: '3.5'
services:
  app:
    image: test-whoami:1
    ports:
      - 6081:8000
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure
      update_config:
        parallelism: 1
        failure_action: rollback
      labels:
        - traefik.enable=true
        - traefik.backend=app
        - traefik.backend.healthcheck.path=/
        - traefik.frontend.rule=Path:/
        - traefik.port=8000
        - traefik.docker.network=test_web
    networks:
      - web
  reverse-proxy:
    image: traefik
    command:
      - "--api"
      - "--retry.attempts=2"
      - "--forwardingTimeouts.dialTimeout=1s"
      - "--docker"
      - "--docker.swarmMode"
      - "--docker.domain=localhost"
      - "--docker.watch"
      - "--docker.exposedbydefault=false"
      - "--docker.network=test_web"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      update_config:
        parallelism: 1
        failure_action: rollback
      placement:
        constraints:
          - node.role == manager
    networks:
      - web
    ports:
      - 6080:80
      - 6880:8080
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
networks:
  web:
The Dockerfile is as quoted above. Image names, ports, network names, etc. were changed to avoid conflicting with other things in my environment.
As of today (Jun 2021), Traefik can't drain connections during an update.
To achieve a zero-downtime rolling update, you should delegate the load balancing to Docker Swarm itself:
# traefik v2
# docker-compose.yml
services:
  your_service:
    deploy:
      labels:
        - traefik.docker.lbswarm=true
From the docs:
Enables Swarm's inbuilt load balancer (only relevant in Swarm Mode).
If you enable this option, Traefik will use the virtual IP provided by docker swarm instead of the containers IPs. Which means that Traefik will not perform any kind of load balancing and will delegate this task to swarm.
Further info:
https://github.com/traefik/traefik/issues/41
https://github.com/traefik/traefik/issues/1480
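To confirm which virtual IP swarm assigned to a service, you can inspect its endpoint (a sketch; your_service is the placeholder name from the snippet above):

$ docker service inspect -f '{{json .Endpoint.VirtualIPs}}' your_service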
I have two servers to use in a Docker Swarm cluster (test only); one is a manager and the other is a worker. But when I run docker stack deploy --compose-file docker-compose.yml teste2, all the services run on the manager and the worker never receives any containers: for some reason Swarm is not distributing the services across the cluster and runs everything on the manager server.
Could my docker-compose.yml be causing the problem, or might it be a network problem?
Here are some settings:
Servers run CentOS 7 with Docker version 18.09.4;
I executed systemctl stop firewalld && systemctl disable firewalld to disable the firewall;
I executed the command docker swarm join --token ... in the worker;
Result docker node ls:
ID                          HOSTNAME             STATUS  AVAILABILITY  MANAGER STATUS  ENGINE VERSION
993dko0vu6vlxjc0pyecrjeh0 * name.server.manager  Ready   Active        Leader          18.09.4
2fn36s94wjnu3nei75ymeuitr   name.server.worker   Ready   Active                        18.09.4
File docker-compose.yml:
version: "3"
services:
web:
image: testehello
deploy:
replicas: 5
update_config:
parallelism: 2
delay: 10s
restart_policy:
condition: on-failure
# placement:
# constraints: [node.role == worker]
ports:
- 4000:80
networks:
- webnet
visualizer:
image: dockersamples/visualizer:stable
ports:
- 8080:8080
stop_grace_period: 1m30s
volumes:
- "/var/run/docker.sock:/var/run/docker.sock"
deploy:
placement:
constraints: [node.role == manager]
networks:
webnet:
I executed the command docker stack deploy --compose-file docker-compose.yml teste2
In the docker-compose.yml I commented out the placement and constraints parameters because with them the containers did not start on the servers at all; without them, the containers all start on the manager. Through the Visualizer, everything appears on the manager.
I think the image is not accessible from the worker node; that is why it receives no containers. Try this guide by Docker (a sketch of its approach follows below): https://docs.docker.com/engine/swarm/stack-deploy/
P.S. I think you solved it already, but just in case.
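Here is a rough sketch of that guide's approach, using a throwaway registry service so the worker can pull the image (the port and tag are illustrative):

# run a registry inside the swarm
$ docker service create --name registry --publish published=5000,target=5000 registry:2

# tag and push the app image so every node can pull it
$ docker build -t 127.0.0.1:5000/testehello .
$ docker push 127.0.0.1:5000/testehello

# point image: at 127.0.0.1:5000/testehello in docker-compose.yml, then redeploy
$ docker stack deploy --compose-file docker-compose.yml teste2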
I manage a swarm using the compose.yml below.
compose.yml
version: '3'
services:
  web:
    image: nginx
    depends_on:
      - db
      - api
    deploy:
      replicas: 2
      update_config:
        parallelism: 2
        delay: 10s
  db:
    image: mysql
    environment:
      MYSQL_ROOT_PASSWORD: password
      MYSQL_DATABASE: main
      MYSQL_USER: root
      MYSQL_PASSWORD: password
    deploy:
      placement:
        constraints: [node.role == manager]
  api:
    image: node
    depends_on:
      - db
    deploy:
      replicas: 2
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure
nginx.conf
upstream test-web {
    server web:5000 fail_timeout=5s max_fails=5;
}

# inside the relevant server/location block:
proxy_pass http://test-web;
The problem I am having is that the start order of the Docker services is random, as follows.
Unexpected startup order:
docker stack deploy --compose-file compose.yml blue --with-registry-auth
Creating network main_default
Creating service
Creating service main_web
Creating service main_db
Creating service main_api
Startup order as expected:
docker stack deploy --compose-file compose.yml blue --with-registry-auth
Creating network main_default
Creating service
Creating service main_db
Creating service main_api
Creating service main_web
If the web container launches earlier than the api container, nginx fails with a host-not-found error for api, because the web container does not yet know that the api container exists.
So I'm investigating ways to address this at the following layers (one nginx-level approach is sketched after this question).
nginx
Is there an nginx option to retry even if the api domain cannot be resolved at startup?
docker-compose
Is there a way to reliably fix the start order of containers other than links and depends_on?
supervisor
(I currently start the processes inside the containers via supervisor.)
Is there a supervisor option to retry starting nginx even if the api container is not found and an error occurs?
Thank you for reading my question.
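One commonly used workaround at the nginx layer (a sketch, not from the original thread; it assumes the services share a Docker network): let nginx resolve the upstream name at request time via Docker's embedded DNS server (127.0.0.11), using a variable in proxy_pass so nginx can start even while the upstream host is not yet resolvable:

resolver 127.0.0.11 valid=10s;

server {
    listen 80;
    location / {
        # a variable defers DNS resolution from startup time to request time
        set $backend http://web:5000;
        proxy_pass $backend;
    }
}

Note that with a variable in proxy_pass, the upstream block (and its fail_timeout/max_fails tuning) is bypassed, so this trades upstream tuning for startup robustness.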
I am trying to deploy my working docker-compose setup to Docker Swarm. Everything seems OK, except that the only service that gets replicated and produces a running container is the redis one; the three others get stuck and never produce a running container. They don't even download their respective images.
I can't find any debug feature, all the logs are empty, and I'm completely helpless.
Let me show you the current state of my installation.
docker node ls prints =>
ID                          HOSTNAME     STATUS  AVAILABILITY  MANAGER STATUS
oapl4et92vjp6mv67h2vw8boq   boot2docker  Ready   Active
x2fal9iwj6aqt1vgobyp1smv1 * manager1     Ready   Active        Leader
lmtuojednaiqmfmm2izl7npf0   worker1      Ready   Active
The docker-compose file =>
version: '3'
services:
  mongo:
    image: mongo
    container_name: mongo
    restart: always
    volumes:
      - /data/db:/data/db
    deploy:
      placement:
        constraints: [node.role == manager]
    ports:
      - "27017:27017"
  redis:
    image: redis
    container_name: redis
    restart: always
  bnbkeeper:
    image: registry.example.tld/keepers:0.10
    container_name: bnbkeeper
    deploy:
      replicas: 5
      resources:
        limits:
          cpus: "0.1"
          memory: 50M
      restart_policy:
        condition: on-failure
    depends_on:
      - mongo
      - redis
    ports:
      - "8080:8080"
    links:
      - mongo
      - redis
    environment:
      - REDIS_HOST=redis
      - MONGO_HOST=mongo
  bnbkeeper-ws:
    image: registry.example.tld/keepers:0.10
    container_name: bnbkeeper-ws
    restart: unless-stopped
    depends_on:
      - mongo
      - redis
    ports:
      - "3800:3800"
    links:
      - mongo
      - redis
    environment:
      - REDIS_HOST=redis
    command: npm run start:subscription
The current state of my services
ID            NAME                MODE        REPLICAS  IMAGE                              PORTS
tbwfswsxx23f  stack_bnbkeeper     replicated  0/5       registry.example.tld/keepers:0.10
khrqtx28qoia  stack_bnbkeeper-ws  replicated  0/1       registry.example.tld/keepers:0.10
lipa8nvncpxb  stack_mongo         replicated  0/1       mongo:latest
xtz2411htcg7  stack_redis         replicated  1/1       redis:latest
My successful redis service (docker service ps stack_redis):
ID            NAME           IMAGE         NODE     DESIRED STATE  CURRENT STATE           ERROR  PORTS
cqv0njcgsw6f  stack_redis.1  redis:latest  worker1  Running        Running 25 minutes ago
My unsuccessful mongo service (docker service ps stack_mongo):
ID            NAME           IMAGE         NODE  DESIRED STATE  CURRENT STATE       ERROR  PORTS
yipokxxiftqq  stack_mongo.1  mongo:latest        Running        New 25 minutes ago
I'm completely new to Docker Swarm and probably made a silly mistake here, but I couldn't find much documentation on how to set up such a simple stack.
To monitor, try this:
journalctl -f -n10
Then run the docker stack deploy command in a separate session and see what it shows.
Try removing the port publishing and adding --endpoint-mode dnsrr to your service, as sketched below.
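A sketch of what that can look like in the stack file, assuming compose file format 3.3 or newer for endpoint_mode (shown for the bnbkeeper service from the stack above):

  bnbkeeper:
    image: registry.example.tld/keepers:0.10
    deploy:
      # DNS round-robin hands out task IPs directly instead of routing through the ingress VIP
      endpoint_mode: dnsrr
      replicas: 5
    # no ingress "ports:" section here: ingress publishing requires the default vip endpoint mode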
I'm using Docker 1.12.1.
I have a simple docker-compose file.
version: '2'
services:
  jenkins-slave:
    build: ./slave
    image: jenkins-slave:1.0
    restart: always
    ports:
      - "22"
    environment:
      - "constraint:NODE==master1"
  jenkins-master:
    image: jenkins:2.7.1
    container_name: jenkins-master
    restart: always
    ports:
      - "8080:8080"
      - "50000"
    environment:
      - "constraint:NODE==node1"
I run this script with docker-compose -p jenkins up -d.
This creates my two containers, but only on my master (from where I execute the command). I would expect one to be created on the master and one on the node.
I also tried to add
networks:
  jenkins_swarm:
    driver: overlay
and
    networks:
      - jenkins_swarm
after every service, but this fails with:
Cannot create container for service jenkins-master: network jenkins_jenkins_swarm not found
even though the network shows up when I run docker network ls.
Can someone help me deploy two containers across my two nodes with docker-compose? Swarm is definitely working on my "cluster"; I followed this tutorial to verify.
Compose doesn't support Swarm Mode at the moment.
When you run docker-compose up on the master node, Compose issues docker run commands for the services in the Compose file, rather than docker service create, which is why the containers all run on the master. See this answer for options.
On the second point, networks are scoped in 1.12. If you inspect your network you'll find it has been created at the swarm level, but Compose is running engine-level containers, which can't see the swarm network.
We can do this with the compose file format v3 now.
https://docs.docker.com/engine/swarm/#feature-highlights
https://docs.docker.com/compose/compose-file/
You have to initialize the swarm cluster using the command
$ docker swarm init
You can add more nodes as workers or managers:
https://docs.docker.com/engine/swarm/join-nodes/
Once both nodes have been added to the cluster, pass your compose v3 file, i.e. your deployment file, to docker stack deploy to create a stack. The compose file should only reference prebuilt images; you can't build from a Dockerfile when deploying in swarm mode.
$ docker stack deploy -c dev-compose-deploy.yml --with-registry-auth PL
View the status of your stack's services:
$ docker stack services PL
Use labels and placement constraints to put services on different nodes.
Here is an example dev-compose-deploy.yml file for your reference:
version: "3"
services:
nginx:
image: nexus.example.com/pl/nginx-dev:latest
extra_hosts:
- "dev-pldocker-01:10.2.0.42”
- "int-pldocker-01:10.2.100.62”
- "prd-plwebassets-01:10.2.0.62”
ports:
- "80:8003"
- "443:443"
volumes:
- logs:/app/out/
networks:
- pl
deploy:
replicas: 3
labels:
feature.description: “Frontend”
update_config:
parallelism: 1
delay: 10s
restart_policy:
condition: any
placement:
constraints: [node.role == worker]
command: "/usr/sbin/nginx"
viz:
image: dockersamples/visualizer
ports:
- "8085:8080"
networks:
- pl
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
deploy:
replicas: 1
labels:
feature.description: "Visualizer"
restart_policy:
condition: any
placement:
constraints: [node.role == manager]
networks:
pl:
volumes:
logs: