I’m trying to use Traefik with Docker Swarm, but I’m having trouble during service updates. When I run a stack deploy or a service update, the service goes down for a few seconds.
How to reproduce:
1 - Create a Dockerfile:
FROM jwilder/whoami
RUN echo $(date) > daniel.txt
2 - Build 2 demo images:
$ docker build -t whoami:01 .
$ docker build -t whoami:02 .
3 - Create a docker-compose.yml:
version: '3.5'

services:
  app:
    image: whoami:01
    ports:
      - 81:8000
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure
      update_config:
        parallelism: 1
        failure_action: rollback
      labels:
        - traefik.enable=true
        - traefik.backend=app
        - traefik.frontend.rule=Host:localhost
        - traefik.port=8000
        - traefik.docker.network=web
    networks:
      - web

  reverse-proxy:
    image: traefik
    command:
      - "--api"
      - "--docker"
      - "--docker.swarmMode"
      - "--docker.domain=localhost"
      - "--docker.watch"
      - "--docker.exposedbydefault=false"
      - "--docker.network=web"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      update_config:
        parallelism: 1
        failure_action: rollback
      placement:
        constraints:
          - node.role == manager
    networks:
      - web
    ports:
      - 80:80
      - 8080:8080
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

networks:
  web:
    external: true
4 - Deploy the stack:
$ docker stack deploy -c docker-compose.yml stack_name
5 - Curl to get the service response:
$ while true ; do sleep .1; curl localhost; done
You should see something like this:
I'm adc1473258e9
I'm bc82ea92b560
I'm adc1473258e9
I'm bc82ea92b560
That means the load balancing is working.
6 - Update the service
$ docker service update --image whoami:02 got_app
Traefik responds with Bad Gateway when there should be zero downtime.
How to fix it?
Bad Gateway means traefik is configured to forward requests, but it's not able to reach the container on the IP and port it's configured to use. Common causes of this are:
- traefik and the service are on different docker networks
- the service exists on multiple networks and traefik picks the wrong one
- the wrong port is being used to connect to the container (use the container port, and make sure it's listening on all interfaces, a.k.a. 0.0.0.0)
From the comments, this is only happening during the deploy, which means traefik is hitting containers before they are ready to receive requests, or while they are being stopped.
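To rule out the first set of causes, it can help to compare what traefik will use against what the containers actually have. A quick sketch using docker inspect (the service and network names assume the example stack above was deployed as "stack_name"):

```shell
# Show the IP each app container has on each network; traefik must use
# the IP from the "web" network, not another one:
docker ps -q --filter label=com.docker.swarm.service.name=stack_name_app |
  xargs docker inspect --format \
  '{{.Name}}: {{range $net, $cfg := .NetworkSettings.Networks}}{{$net}}={{$cfg.IPAddress}} {{end}}'

# Confirm "web" is an overlay network and the traefik container is attached to it:
docker network inspect web --format '{{.Driver}}: {{range .Containers}}{{.Name}} {{end}}'
```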
You can configure containers with a healthcheck and send request through swarm mode's VIP using a Dockerfile that looks like:
FROM jwilder/whoami
RUN echo $(date) >/build-date.txt
HEALTHCHECK --start-period=30s --retries=1 CMD wget -O - -q http://localhost:8000
And then in the docker-compose.yml:
labels:
  - traefik.enable=true
  - traefik.backend=app
  - traefik.backend.loadbalancer.swarm=true
  ...
And I would also configure the traefik service with the following options:
- "--retry.attempts=2"
- "--forwardingTimeouts.dialTimeout=1s"
However, traefik will keep a connection open and the VIP will continue to send all requests to the same backend container over that same connection. What you can do instead is have traefik itself perform the healthcheck:
labels:
  - traefik.enable=true
  - traefik.backend=app
  - traefik.backend.healthcheck.path=/
  ...
I would still leave the healthcheck on the container itself so Docker gives the container time to start before stopping the other container. And leave the retry option on the traefik service so any request to a stopping container, or one that hasn't yet been detected by the healthcheck, has a chance to try again.
Here's the resulting compose file that I used in my environment:
version: '3.5'

services:
  app:
    image: test-whoami:1
    ports:
      - 6081:8000
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure
      update_config:
        parallelism: 1
        failure_action: rollback
      labels:
        - traefik.enable=true
        - traefik.backend=app
        - traefik.backend.healthcheck.path=/
        - traefik.frontend.rule=Path:/
        - traefik.port=8000
        - traefik.docker.network=test_web
    networks:
      - web

  reverse-proxy:
    image: traefik
    command:
      - "--api"
      - "--retry.attempts=2"
      - "--forwardingTimeouts.dialTimeout=1s"
      - "--docker"
      - "--docker.swarmMode"
      - "--docker.domain=localhost"
      - "--docker.watch"
      - "--docker.exposedbydefault=false"
      - "--docker.network=test_web"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      update_config:
        parallelism: 1
        failure_action: rollback
      placement:
        constraints:
          - node.role == manager
    networks:
      - web
    ports:
      - 6080:80
      - 6880:8080
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

networks:
  web:
The Dockerfile is as quoted above. Image names, ports, network names, etc. were changed to avoid conflicting with other things in my environment.
As of today (June 2021), Traefik can't drain connections during an update.
To achieve a zero-downtime rolling update, you should delegate the load balancing to Docker Swarm itself:
# traefik v2
# docker-compose.yml
services:
  your_service:
    deploy:
      labels:
        - traefik.docker.lbswarm=true
From the docs:
Enables Swarm's inbuilt load balancer (only relevant in Swarm Mode).
If you enable this option, Traefik will use the virtual IP provided by docker swarm instead of the containers IPs. Which means that Traefik will not perform any kind of load balancing and will delegate this task to swarm.
Further info:
https://github.com/traefik/traefik/issues/41
https://github.com/traefik/traefik/issues/1480
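With the label in place, one way to check the behavior is to re-run the curl loop from the original question while triggering an update; this is a sketch, assuming the stack and service names from the question:

```shell
# Terminal 1: hammer the service through traefik
while true; do sleep .1; curl -s localhost; done

# Terminal 2: trigger a rolling update
docker service update --image whoami:02 stack_name_app

# With traefik.docker.lbswarm=true, requests go through the swarm VIP,
# so the loop should keep returning container IDs with no "Bad Gateway".
```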
Related
I am trying to implement sticky sessions on Docker Swarm with Traefik, but I could not achieve session persistence across two replicas on the same machine.
In my docker-compose.yml, I have added the labels for Traefik and added the load balancer as well. Below is my docker-compose.yml (the indentation might not look right here, but it is correct in the actual project):
version: '3'

services:
  web:
    image: php:7.2.11-apache-stretch
    ports:
      - "8080:80"
    volumes:
      - ./code/:/var/www/html/hello/
    stdin_open: true
    tty: true
    deploy:
      mode: replicated
      replicas: 2
      restart_policy:
        condition: any
      update_config:
        delay: 2s
      labels:
        - "traefik.docker.network=docker-test_privnet"
        - "traefik.port=80"
        - "traefik.backend.loadbalancer.sticky=true"
        - "traefik.frontend.rule=PathPrefix:/hello"
    networks:
      - privnet

  loadbalancer:
    image: traefik
    command:
      --docker \
      --docker.swarmmode \
      --docker.watch \
      --web \
      --loglevel=DEBUG
    ports:
      - 80:80
      - 9090:8080
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      restart_policy:
        condition: any
      mode: replicated
      replicas: 1
      update_config:
        delay: 2s
      placement:
        constraints: [node.role == manager]
    networks:
      - privnet

networks:
  privnet:
    external: true
Am I missing anything?
A few things.
- .sticky is deprecated in favor of traefik.backend.loadbalancer.stickiness=true
- I don't think you need to set the network with traefik.docker.network when you only have a single network connected to that service.
- Make sure you're testing with a tool that uses cookies, which is how sticky sessions stay sticky. If using curl, be sure to use -c and -b as in this example.
I used the voting app sample from my test Swarm setup and added sticky sessions to the "vote" service, and it worked for me on a single node. If using a multi-node swarm, you'll need the LB in front of the swarm nodes to also enable sticky sessions.
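For reference, a curl session exercising the cookie would look something like this (the path and cookie file name are just placeholders):

```shell
# First request: -c writes the cookie traefik sets for stickiness
curl -c /tmp/cookies.txt http://localhost/hello

# Follow-up requests: -b sends that cookie back, so they should
# all land on the same replica
curl -b /tmp/cookies.txt http://localhost/hello
curl -b /tmp/cookies.txt http://localhost/hello

# Without -b, each request may hit a different replica
curl http://localhost/hello
```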
I set up a cluster of 3x Raspberry Pi 3 running Raspbian Stretch Lite and Docker 18.06.1-ce. Swarm is initialized and working fine so far. I read the docs on setting up traefik on docker swarm (1, 2), but I can't get the whoami container proxied by traefik.
Here's my stack.yml:
version: '3'

networks:
  proxy:
    external: true

services:
  traefik:
    image: traefik
    command: --api --docker --docker.swarmMode --docker.watch
    deploy:
      placement:
        constraints:
          - node.role == manager
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - proxy
    ports:
      - "80:80"
      - "443:443"
      - "8002:8080"

  whoami:
    image: stefanscherer/whoami
    networks:
      - proxy
    deploy:
      labels:
        - "traefik.port=80"
        - "traefik.docker.network=proxy"
        - "traefik.frontend.rule=Path:/whoami"
Stack is running:
$ docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
tx0npbsb3t0k traefik_traefik replicated 1/1 traefik:latest *:80->80/tcp, *:443->443/tcp, *:8002->8080/tcp
7fqaew880p9p traefik_whoami replicated 1/1 stefanscherer/whoami:latest
The proxy network is set up with the overlay driver and the attachable flag.
The Traefik dashboard is accessible and shows the whoami frontend and backend. But when I open http://pinode1/whoami/ in the browser, I get Error 502 Bad Gateway (with or without the trailing slash).
I have traefik running and serving whoami successfully on another non-swarm machine so I wonder what's wrong in the swarm setup.
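For completeness, creating the proxy network as an attachable overlay (as described above) would be something like:

```shell
# Must be created before deploying the stack, since it is marked "external"
docker network create --driver overlay --attachable proxy
```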
I had (as all of us who used Docker Cloud) to migrate my app to a new environment, so I chose Docker Swarm CE. I am using Traefik as a reverse proxy, and before the migration it worked with segments just as per the documentation, but for some reason it can not deal with them anymore in Swarm.
My service exposes ports 3000 and 3001 for the given path prefixes. Here is the part of docker-compose.yml for the problematic service:
my-service:
  image: my-service-image
  deploy:
    restart_policy:
      condition: on-failure
    labels:
      traefik.port: 80
      traefik.serviceapi.backend: api
      traefik.serviceapi.frontend.entryPoints: "http,https"
      traefik.serviceapi.frontend.rule: "PathPrefixStrip:/service/api"
      traefik.serviceapi.port: 3000
      traefik.servicesocket.backend: socket
      traefik.servicesocket.frontend.entryPoints: "http,https,ws,wss"
      traefik.servicesocket.frontend.rule: "PathPrefixStrip:/service/socket"
      traefik.servicesocket.port: 3001
but for some reason Swarm does not recognise these Traefik segments, or I am missing something.
Did anyone have the same issue?
Thanks!
UPDATE:
traefik:
  image: traefik
  command:
    - "--web"
    - "--docker"
    - "--docker.swarmMode"
    - "--docker.watch"
    - "--docker.domain=my-domain"
    - "--defaultentrypoints=http,https"
    - "--entrypoints=Name:http Address::80 Redirect.EntryPoint:https"
    - "--entrypoints=Name:https Address::443 TLS"
    - "--acme"
    - "--acme.storage=/etc/traefik/acme/acme.json"
    - "--acme.entryPoint=https"
    - "--acme.domains=my-domain"
    - "--acme.httpChallenge.entryPoint=http"
    - "--acme.email=email"
    - "--logLevel=DEBUG"
  deploy:
    restart_policy:
      condition: on-failure
  ports:
    - 80:80
    - 443:443
    - 8080:8080
  volumes:
    - ./var/run/docker.sock:/var/run/docker.sock
    - ./traefik/acme/acme.json:/etc/traefik/acme/acme.json
I have more than 10 services working properly; the only issue is that I cannot reach the my-service endpoints. It seems that Swarm does not recognize the Traefik segments.
I found the solution, so it might be helpful to someone:
for Docker Swarm you must set traefik.port, UNLESS you are using segments. In that case you remove traefik.port and just set the ports of each segment. In my case I just had to remove traefik.port: 80 and both .backend labels; Traefik will create its own.
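Applied to the service above, the labels would reduce to something like this (a sketch; only the segment labels remain, with no top-level traefik.port and no .backend keys):

```yaml
my-service:
  image: my-service-image
  deploy:
    restart_policy:
      condition: on-failure
    labels:
      traefik.serviceapi.frontend.entryPoints: "http,https"
      traefik.serviceapi.frontend.rule: "PathPrefixStrip:/service/api"
      traefik.serviceapi.port: 3000
      traefik.servicesocket.frontend.entryPoints: "http,https,ws,wss"
      traefik.servicesocket.frontend.rule: "PathPrefixStrip:/service/socket"
      traefik.servicesocket.port: 3001
```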
I have an application where we use the subdomain string to specify which team a customer belongs to.
For example:
customer1.ourdomain.com
customer2.ourdomain.com
... etc
Behind the scenes in our application we are parsing the string customerX from the origin URL to retrieve the customer's appropriate data.
When I run a docker container from the cli without compose, like the below, I am able to connect to my container and have the expected behavior:
docker run -d -p 7000:7000 MyApplicationImage:latest
However, when I try to access this same behavior by deploying through the docker-compose.yaml:
docker stack deploy -c docker-compose.yaml MyApplication
My browser will not connect and times out/hangs.
Essentially:
localhost:7070 -> works
teamName.localhost:7070 -> Does not connect
This is what my docker-compose.yaml file looks like as of right now:
version: "3"

services:
  sone:
    image: MyApplicationImage:latest
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
    ports:
      - "7000:7000"
    networks:
      - webnet

  stwo:
    image: myimagetwo:latest
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
    ports:
      - "7001:7001"
    networks:
      - webnet

networks:
  webnet:
I'm using docker 1.12.1
I have an easy docker-compose script.
version: '2'

services:
  jenkins-slave:
    build: ./slave
    image: jenkins-slave:1.0
    restart: always
    ports:
      - "22"
    environment:
      - "constraint:NODE==master1"

  jenkins-master:
    image: jenkins:2.7.1
    container_name: jenkins-master
    restart: always
    ports:
      - "8080:8080"
      - "50000"
    environment:
      - "constraint:NODE==node1"
I run this script with docker-compose -p jenkins up -d.
This creates my 2 containers, but only on my master (from where I execute the command). I would expect one to be created on the master and one on the node.
I also tried to add
networks:
  jenkins_swarm:
    driver: overlay
and
networks:
  - jenkins_swarm
after every service, but this fails with:
Cannot create container for service jenkins-master: network jenkins_jenkins_swarm not found
even though the network shows up when I run docker network ls.
Can someone help me deploy 2 containers across my 2 nodes with docker-compose? Swarm is definitely working on my "cluster"; I followed this tutorial to verify.
Compose doesn't support Swarm Mode at the moment.
When you run docker-compose up on the master node, Compose issues docker run commands for the services in the Compose file, rather than docker service create - which is why the containers all run on the master. See this answer for options.
On the second point, networks are scoped in 1.12. If you inspect your network you'll find it's been created at swarm-level, but Compose is running engine-level containers which can't see the swarm network.
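The scope difference is visible directly in the CLI; a sketch of what to look for:

```shell
# The SCOPE column distinguishes engine-level ("local") from swarm-level networks
docker network ls

# An overlay network created in swarm mode reports "swarm"; in Docker 1.12,
# only swarm services (not plain docker run / docker-compose containers)
# can attach to it
docker network inspect jenkins_jenkins_swarm --format '{{.Scope}}'
```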
We can do this with docker compose v3 now.
https://docs.docker.com/engine/swarm/#feature-highlights
https://docs.docker.com/compose/compose-file/
You have to initialize the swarm cluster using command
$ docker swarm init
You can add more nodes as worker or manager -
https://docs.docker.com/engine/swarm/join-nodes/
Once you have both nodes added to the cluster, pass your compose v3 (i.e. deployment) file to create a stack. The compose file should contain only prebuilt images; you can't give a Dockerfile for deployment in Swarm mode.
$ docker stack deploy -c dev-compose-deploy.yml --with-registry-auth PL
View your stack services status -
$ docker stack services PL
Try to use Labels & Placement constraints to put services on different nodes.
Example "dev-compose-deploy.yml" file for your reference
version: "3"

services:
  nginx:
    image: nexus.example.com/pl/nginx-dev:latest
    extra_hosts:
      - "dev-pldocker-01:10.2.0.42"
      - "int-pldocker-01:10.2.100.62"
      - "prd-plwebassets-01:10.2.0.62"
    ports:
      - "80:8003"
      - "443:443"
    volumes:
      - logs:/app/out/
    networks:
      - pl
    deploy:
      replicas: 3
      labels:
        feature.description: "Frontend"
      update_config:
        parallelism: 1
        delay: 10s
      restart_policy:
        condition: any
      placement:
        constraints: [node.role == worker]
    command: "/usr/sbin/nginx"

  viz:
    image: dockersamples/visualizer
    ports:
      - "8085:8080"
    networks:
      - pl
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    deploy:
      replicas: 1
      labels:
        feature.description: "Visualizer"
      restart_policy:
        condition: any
      placement:
        constraints: [node.role == manager]

networks:
  pl:

volumes:
  logs: