I'm trying to implement sticky sessions on Docker Swarm with Traefik, but I cannot achieve session persistence across two replicas on the same machine.
In my docker-compose.yml I have added the Traefik labels and the load balancer service as well. Below is my docker-compose.yml (the indentation may look off here, but it is correct in the actual project):
version: '3'
services:
  web:
    image: php:7.2.11-apache-stretch
    ports:
      - "8080:80"
    volumes:
      - ./code/:/var/www/html/hello/
    stdin_open: true
    tty: true
    deploy:
      mode: replicated
      replicas: 2
      restart_policy:
        condition: any
      update_config:
        delay: 2s
      labels:
        - "traefik.docker.network=docker-test_privnet"
        - "traefik.port=80"
        - "traefik.backend.loadbalancer.sticky=true"
        - "traefik.frontend.rule=PathPrefix:/hello"
    networks:
      - privnet
  loadbalancer:
    image: traefik
    command:
      --docker \
      --docker.swarmmode \
      --docker.watch \
      --web \
      --loglevel=DEBUG
    ports:
      - 80:80
      - 9090:8080
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      restart_policy:
        condition: any
      mode: replicated
      replicas: 1
      update_config:
        delay: 2s
      placement:
        constraints: [node.role == manager]
    networks:
      - privnet
networks:
  privnet:
    external: true
Am I missing anything?
A few things.
.sticky is deprecated in favor of traefik.backend.loadbalancer.stickiness=true
I don't think you need to set the network with traefik.docker.network when you only have a single network connected to that service.
Make sure you're testing with a tool that uses cookies, which is how sticky sessions stay sticky. If using curl, be sure to use -c and -b so the cookie is saved and sent back (see the example below).
I used the voting app sample from my test Swarm setup, added sticky sessions to the "vote" service, and it worked for me on a single node. If using a multi-node swarm, you'll need the LB in front of the swarm nodes to enable stickiness as well.
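For example, a minimal sketch of the updated label plus a cookie-aware curl test (the /hello path matches the PathPrefix rule above; cookies.txt is just a scratch file):
      labels:
        - "traefik.port=80"
        - "traefik.frontend.rule=PathPrefix:/hello"
        - "traefik.backend.loadbalancer.stickiness=true"
# -c writes the cookie jar on the first response, -b sends it back on later requests,
# so repeated calls should keep hitting the same replica
curl -c cookies.txt -b cookies.txt http://localhost/hello
curl -c cookies.txt -b cookies.txt http://localhost/hello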
Related
I want to deploy HA PostgreSQL with Patroni for failover and HAProxy (as a single entrypoint) in Docker Swarm.
I have this docker-compose.yml:
version: "3.7"
services:
  etcd1:
    image: patroni
    networks:
      - test
    env_file:
      - docker/etcd.env
    container_name: test-etcd1
    hostname: etcd1
    command: etcd -name etcd1 -initial-advertise-peer-urls http://etcd1:2380
  etcd2:
    image: patroni
    networks:
      - test
    env_file:
      - docker/etcd.env
    container_name: test-etcd2
    hostname: etcd2
    command: etcd -name etcd2 -initial-advertise-peer-urls http://etcd2:2380
  etcd3:
    image: patroni
    networks:
      - test
    env_file:
      - docker/etcd.env
    container_name: test-etcd3
    hostname: etcd3
    command: etcd -name etcd3 -initial-advertise-peer-urls http://etcd3:2380
  patroni1:
    image: patroni
    networks:
      - test
    env_file:
      - docker/patroni.env
    hostname: patroni1
    container_name: test-patroni1
    environment:
      PATRONI_NAME: patroni1
    deploy:
      placement:
        constraints: [node.role == manager]
        # - node.labels.type == primary
        # - node.role == manager
  patroni2:
    image: patroni
    networks:
      - test
    env_file:
      - docker/patroni.env
    hostname: patroni2
    container_name: test-patroni2
    environment:
      PATRONI_NAME: patroni2
    deploy:
      placement:
        constraints: [node.role == worker]
        # - node.labels.type != primary
        # - node.role == worker
  patroni3:
    image: patroni
    networks:
      - test
    env_file:
      - docker/patroni.env
    hostname: patroni3
    container_name: test-patroni3
    environment:
      PATRONI_NAME: patroni3
    deploy:
      placement:
        constraints: [node.role == worker]
        # - node.labels.type != primary
        # - node.role == worker
  haproxy:
    image: patroni
    networks:
      - test
    env_file:
      - docker/patroni.env
    hostname: haproxy
    container_name: test-haproxy
    ports:
      - "5000:5000"
      - "5001:5001"
    command: haproxy
networks:
  test:
    driver: overlay
    attachable: true
And I deploy these services to Docker Swarm with this command:
docker stack deploy --compose-file docker-compose.yml test
When I use this command, my services are created, but patroni2 and patroni3 don't start on the other nodes, the ones with the worker role. They don't start at all!
I want to see my services deployed across all the nodes in the swarm (3 nodes: one manager and two workers).
But if I delete the constraints, all my services start on one node when I deploy the docker-compose.yml to the swarm.
Maybe these services can't see my network, although I deployed it following the official Docker documentation.
With different service names, docker will not attempt to spread containers across multiple nodes, and will fall back to the least used node that satisfies the requirements, where least used is measured by the number of scheduled containers.
You could attempt to solve this by using the same service name with 3 replicas. This would require that the replicas be defined identically. To make this work, you can leverage a few features: first, etcd.tasks will resolve to the individual IP addresses of each etcd service container; second, service templates can be used to inject values like {{.Task.Slot}} into settings such as hostname, volume mounts, and env variables (a sketch of this single-service approach follows below). The challenge is that this still may not give you what you want, which is a way to uniquely address each replica from the other replicas. Hostname seems like it would work, but it unfortunately does not resolve in docker's DNS implementation (and wouldn't be easy to implement, since it's possible to create a container with the capabilities to change the hostname after docker has deployed it).
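A rough sketch of that single-service approach, reusing the image and env file from the question ({{.Task.Slot}} expands to 1, 2 and 3 at deploy time; treat this as a starting point, not a drop-in replacement):
  patroni:
    image: patroni
    hostname: "patroni{{.Task.Slot}}"   # templated, but note it won't resolve via Docker DNS
    env_file:
      - docker/patroni.env
    environment:
      PATRONI_NAME: "patroni{{.Task.Slot}}"
    networks:
      - test
    deploy:
      mode: replicated
      replicas: 3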
The option you are left with is configuring constraints on each service to run on specific nodes. That's less than ideal, and reduces the fault tolerance of these services. If you have lots of nodes that can be separated into 3 groups then using node labels would solve the issue.
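If you do split your nodes into groups, a hedged sketch of the node-label route (the label name pg is made up):
# label each node once, from a manager
docker node update --label-add pg=1 node1
docker node update --label-add pg=2 node2
docker node update --label-add pg=3 node3
Then constrain each service to its group in the compose file:
    deploy:
      placement:
        constraints: [node.labels.pg == 1]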
I have a Swarm cluster with a Manager and a Worker node.
All the containers running on the manager are accessible through Traefik and working fine.
I just deployed a new Worker node and joined my swarm on the node.
Then I started scaling some services and realized they were timing out on the worker node.
So I setup a simple example using the whoami container, and cannot figure out why I cannot access it. Here are my configs (all deployed on the MANAGER node):
version: '3.6'
networks:
  traefik-net:
    driver: overlay
    attachable: true
    external: true
services:
  whoami:
    image: jwilder/whoami
    networks:
      - traefik-net
    deploy:
      labels:
        - "traefik.port=8000"
        - "traefik.frontend.rule=Host:whoami.myhost.com"
        - "traefik.docker.network=traefik-net"
      replicas: 2
      placement:
        constraints: [node.role != manager]
My traefik:
version: '3.6'
networks:
  traefik-net:
    driver: overlay
    attachable: true
    external: true
services:
  reverse-proxy:
    image: traefik # The official Traefik docker image
    command: --docker --docker.swarmmode --docker.domain=myhost.com --docker.watch --api
    ports:
      - "80:80" # The HTTP port
      # - "8080:8080" # The Web UI (enabled by --api)
      - "443:443"
    networks:
      - traefik-net
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock # So that Traefik can listen
      - /home/ubuntu/docker-configs/traefik/traefik.toml:/traefik.toml
      - /home/ubuntu/docker-configs/traefik/acme.json:/acme.json
    deploy:
      labels:
        traefik.port: 8080
        traefik.frontend.rule: "Host:traefik.myhost.com"
        traefik.docker.network: traefik-net
      replicas: 1
      placement:
        constraints: [node.role == manager]
My worker docker ps output:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b825f95b0366 jwilder/whoami:latest "/app/http" 4 hours ago Up 4 hours 8000/tcp whoami_whoami.2.tqbh4csbqxvsu6z5i7vizc312
50cc04b7f0f4 jwilder/whoami:latest "/app/http" 4 hours ago Up 4 hours 8000/tcp whoami_whoami.1.rapnozs650mxtyu970isda3y4
I tried opening firewall ports, and even disabling the firewall completely, but nothing seems to work. Any help is appreciated.
I had to use --advertise-addr y.y.y.y to make it work
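For anyone hitting the same thing: that flag belongs to swarm setup, not the compose file. A sketch with placeholder addresses:
# on the manager
docker swarm init --advertise-addr <manager-ip>
# on the worker, using the join token printed by the manager
docker swarm join --token <worker-token> --advertise-addr <worker-ip> <manager-ip>:2377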
I'm trying to use Traefik with Docker Swarm, but I'm having trouble during service updates. When I run a stack deploy or a service update, the service goes down for a few seconds.
How to reproduce:
1 - Create a Dockerfile:
FROM jwilder/whoami
RUN echo $(date) > daniel.txt
2 - Build 2 demo images:
$ docker build -t whoami:01 .
$ docker build -t whoami:02 .
3 - Create a docker-compose.yml:
version: '3.5'
services:
  app:
    image: whoami:01
    ports:
      - 81:8000
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure
      update_config:
        parallelism: 1
        failure_action: rollback
      labels:
        - traefik.enable=true
        - traefik.backend=app
        - traefik.frontend.rule=Host:localhost
        - traefik.port=8000
        - traefik.docker.network=web
    networks:
      - web
  reverse-proxy:
    image: traefik
    command:
      - "--api"
      - "--docker"
      - "--docker.swarmMode"
      - "--docker.domain=localhost"
      - "--docker.watch"
      - "--docker.exposedbydefault=false"
      - "--docker.network=web"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      update_config:
        parallelism: 1
        failure_action: rollback
      placement:
        constraints:
          - node.role == manager
    networks:
      - web
    ports:
      - 80:80
      - 8080:8080
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
networks:
  web:
    external: true
4 - Deploy the stack:
$ docker stack deploy -c docker-compose.yml stack_name
5 - Curl to get the service response:
$ while true ; do sleep .1; curl localhost; done
You should see something like this:
I'm adc1473258e9
I'm bc82ea92b560
I'm adc1473258e9
I'm bc82ea92b560
That means the load balancing is working.
6 - Update the service
$ docker service update --image whoami:02 got_app
Traefik responds with Bad Gateway when there should be zero downtime.
How to fix it?
Bad gateway means traefik is configured to forward requests, but it's not able to reach the container on the ip and port that it's configured to use. Common issues causing this are:
traefik and the service on different docker networks
service exists in multiple networks and traefik picks the wrong one
wrong port being used to connect to the container (use the container port and make sure it's listening on all interfaces, aka 0.0.0.0)
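One way to check the first two of these is to inspect which networks (and IPs) the task's container actually joined; a sketch, where the container name is whatever docker ps shows on that node:
docker ps
docker inspect --format '{{json .NetworkSettings.Networks}}' <container-name>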
From the comments, this is only happening during the deploy, which means traefik is hitting containers before they are ready to receive requests, or while they are being stopped.
You can configure containers with a healthcheck and send requests through swarm mode's VIP, using a Dockerfile that looks like:
FROM jwilder/whoami
RUN echo $(date) >/build-date.txt
HEALTHCHECK --start-period=30s --retries=1 CMD wget -O - -q http://localhost:8000
And then in the docker-compose.yml:
labels:
  - traefik.enable=true
  - traefik.backend=app
  - traefik.backend.loadbalancer.swarm=true
  ...
And I would also configure the traefik service with the following options:
- "--retry.attempts=2"
- "--forwardingTimeouts.dialTimeout=1s"
However, traefik will keep a connection open and the VIP will continue to send all requests to the same backend container over that same connection. What you can do instead is have traefik itself perform the healthcheck:
labels:
  - traefik.enable=true
  - traefik.backend=app
  - traefik.backend.healthcheck.path=/
  ...
I would still leave the healthcheck on the container itself so Docker gives the container time to start before stopping the other container. And leave the retry option on the traefik service so any request to a stopping container, or to one that hasn't yet been detected by the healthcheck, has a chance to be retried.
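If you'd rather not bake the check into the image, the same container healthcheck can be declared at the compose level instead; a sketch mirroring the HEALTHCHECK above (requires compose file version 3.4+):
  app:
    # ... image, ports, deploy as above ...
    healthcheck:
      test: ["CMD", "wget", "-q", "-O", "-", "http://localhost:8000"]
      start_period: 30s
      retries: 1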
Here's the resulting compose file that I used in my environment:
version: '3.5'
services:
  app:
    image: test-whoami:1
    ports:
      - 6081:8000
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure
      update_config:
        parallelism: 1
        failure_action: rollback
      labels:
        - traefik.enable=true
        - traefik.backend=app
        - traefik.backend.healthcheck.path=/
        - traefik.frontend.rule=Path:/
        - traefik.port=8000
        - traefik.docker.network=test_web
    networks:
      - web
  reverse-proxy:
    image: traefik
    command:
      - "--api"
      - "--retry.attempts=2"
      - "--forwardingTimeouts.dialTimeout=1s"
      - "--docker"
      - "--docker.swarmMode"
      - "--docker.domain=localhost"
      - "--docker.watch"
      - "--docker.exposedbydefault=false"
      - "--docker.network=test_web"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      update_config:
        parallelism: 1
        failure_action: rollback
      placement:
        constraints:
          - node.role == manager
    networks:
      - web
    ports:
      - 6080:80
      - 6880:8080
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
networks:
  web:
The Dockerfile is as quoted above. Image names, ports, network names, etc. were changed to avoid conflicting with other things in my environment.
As of today (June 2021), Traefik can't drain connections during an update.
To achieve a zero-downtime rolling update you should delegate the load-balancing to docker swarm itself:
# traefik v2
# docker-compose.yml
services:
  your_service:
    deploy:
      labels:
        - traefik.docker.lbswarm=true
From the docs:
Enables Swarm's inbuilt load balancer (only relevant in Swarm Mode).
If you enable this option, Traefik will use the virtual IP provided by docker swarm instead of the containers IPs. Which means that Traefik will not perform any kind of load balancing and will delegate this task to swarm.
Further info:
https://github.com/traefik/traefik/issues/41
https://github.com/traefik/traefik/issues/1480
I created a docker-compose.yml file containing two services that are run on two different nodes. The two services are meant to communicate on the same port as client and server. Below is my docker-compose.yml file.
version: "3"
services:
  service1:
    image: localrepo/image1
    deploy:
      placement:
        constraints: [node.hostname == node1]
      replicas: 1
      resources:
        limits:
          cpus: "1"
          memory: 1000M
      restart_policy:
        condition: on-failure
    ports:
      - 8000:8000
    networks:
      - webnet
  service2:
    image: localrepo/image2
    deploy:
      placement:
        constraints: [node.hostname == node2]
      replicas: 1
      resources:
        limits:
          cpus: "1"
          memory: 500M
      restart_policy:
        condition: on-failure
    ports:
      - "8000:8000"
    networks:
      - webnet
networks:
  webnet:
When I issue docker stack deploy -c, I get an error reading:
> Error response from daemon: rpc error: code = 3 desc = port '8000' is already in use by service.
In this thread I read that deploying a service in a swarm makes the port accessible on every node in the swarm. If I understand correctly, that means the port is occupied on every node in the cluster. In the same thread, it was suggested to use mode=host publishing, which only exposes the port on the actual host the container runs on. I applied that to the ports as:
ports:
  - "mode=host, target=8000, published=8000"
Making that change in both services and trying to issue docker stack deploy gives another error:
> 1 error(s) decoding:
* Invalid containerPort: mode=host, target=8000, published=8000
Does anyone know how to fix this issue?
P.S.: I tried both version "3" and version "3.2", but that didn't solve the issue.
I don't know how you specified host mode, since your docker-compose.yml doesn't show host mode anywhere.
Try the long syntax, which can specify host mode in the docker-compose.yml file.
The long syntax is new in v3.2, and below is an example (I checked that it works).
(Check which Docker Engine version is compatible with which docker-compose syntax version.)
version: '3.4' # version: '3.2' also works
networks:
  swarm_network:
    driver: overlay
services:
  service1:
    image: asleea/test
    command: ["nc", "-vlkp", "8000"]
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.hostname == node1
    ports:
      - published: 8000
        target: 8000
        mode: host
    networks:
      swarm_network:
  service2:
    image: asleea/test
    command: ["nc", "service1", "8000"]
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.hostname == node2
    ports:
      - published: 8000
        target: 8000
        mode: host
    networks:
      swarm_network:
The problem was fixed after upgrading to the latest Docker version, 18.01.0-ce.
I have an application where we use the subdomain string to specify which team a customer belongs to.
For example:
customer1.ourdomain.com
customer2.ourdomain.com
... etc
Behind the scenes in our application we are parsing the string customerX from the origin URL to retrieve the customer's appropriate data.
When I run a docker container from the CLI without compose, like below, I am able to connect to my container and get the expected behavior:
docker run -d -p 7000:7000 MyApplicationImage:latest
However, when I try to get the same behavior by deploying through the docker-compose.yaml:
docker stack deploy -c docker-compose.yaml MyApplication
My browser will not connect and times out/hangs.
Essentially:
localhost:7070 -> works
teamName.localhost:7070 -> Does not connect
This is what my docker-compose.yaml file looks like as of right now:
version: "3"
services:
  sone:
    image: MyApplicationImage:latest
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
    ports:
      - "7000:7000"
    networks:
      - webnet
  stwo:
    image: myimagetwo:latest
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
    ports:
      - "7001:7001"
    networks:
      - webnet
networks:
  webnet: