Using Docker Swarm, I am trying to deploy N instances of my app on N nodes such that each instance is deployed on the node with the corresponding index, e.g. app1 must be deployed on node1, app2 on node2, and so on.
The below is not working, as it complains: Error response from daemon: rpc error: code = Unknown desc = value 'node{{.Task.Slot}}' is invalid.
Any suggestion on how to achieve this?
I also have the impression, as a long shot, that I could use something with labels, but I cannot wrap my head around it yet. Anyhow, please advise.
version: "3.8"
services:
app:
image: app:latest
hostname: "app{{.Task.Slot}}"
networks:
- app-net
volumes:
- "/data/shared/app{{.Task.Slot}}/config:/app/config"
deploy:
replicas: 5
update_config:
parallelism: 1
delay: 10s
restart_policy:
condition: any
placement:
constraints:
- "node.hostname==node{{.Task.Slot}}" <========= ERROR
Service template parameters are documented as only resolving in:
the hostname: directive
volume definitions
labels
environment variables
Placement preferences/constraints are not supported, but that would be brilliant, as it would allow simple deployments of MinIO, etcd, Consul and other clustered services where you need to pin replicas to nodes.
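If per-index pinning is a hard requirement, a workaround that stays within the supported features (a sketch, not an official pattern: one explicitly named service per index instead of a single replicated service, with a YAML anchor to reduce repetition; extension fields need compose file format 3.4+) could look like:

version: "3.8"

x-app: &app-defaults
  image: app:latest
  networks:
    - app-net

services:
  app1:
    <<: *app-defaults
    hostname: app1
    volumes:
      - "/data/shared/app1/config:/app/config"
    deploy:
      placement:
        constraints:
          - node.hostname == node1
  app2:
    <<: *app-defaults
    hostname: app2
    volumes:
      - "/data/shared/app2/config:/app/config"
    deploy:
      placement:
        constraints:
          - node.hostname == node2
  # ...repeat for app3..app5

The cost is one stanza per node, but each stanza uses only constraints the scheduler actually understands.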
Related
I have a small network of computers, bundled into a docker swarm. I'm running a service that needs host networking, which is fine when it runs on a dedicated node.
But I want to let it run on any node - yet maintaining the ability to access its web UI by connecting to a fixed hostname/IP address, regardless of which node the service is actually running on.
This is normally handled by docker's ingress network, which allows me to connect to a published port on any node's IP address, and routes the connection to the proper node. However, apparently this doesn't work with host networking, and if I specify the ingress network explicitly, it gets rejected.
So, is there a way to both have host networking, while keeping ingress routing? Or what would be the recommended way to let me connect to the service without worrying about which node it's running on at any given moment?
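For reference, whether a published port goes through the routing mesh or binds directly on the node is chosen per port with the compose long port syntax (compose file 3.2+); a minimal, illustrative sketch (the port number is just an example):

services:
  app:
    ports:
      - target: 8123
        published: 8123
        protocol: tcp
        mode: ingress   # routing mesh (default); "host" binds only on the node running the task

This only applies to published ports, though; it does not by itself answer the fixed-address question when the service uses the host network.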
EDIT:
My stack file is the following:
version: '3'
services:
  app:
    image: ghcr.io/home-assistant/home-assistant:stable
    volumes:
      - ...
    privileged: true
    deploy:
      replicas: 1
      restart_policy:
        condition: any
      placement:
        constraints:
          - node.hostname==nas
    networks:
      - host
networks:
  host:
    external: true
I execute
sudo docker node ls
And this is my output
ID                          HOSTNAME          STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
smlsbj3r6qjq7s22cl9cgi1f1 * ip-172-30-0-94    Ready    Active         Leader           20.10.2
b5e3w8nvrw3kw8q3sg1188439   ip-172-30-0-107   Ready    Active                          20.10.2
phjkfj09ydvgzztaib2zxcfv9   ip-172-30-0-131   Ready    Active                          20.10.2
m73z9ikte16klds06upruifji   ip-172-30-0-193   Ready    Active                          20.10.2
So far, I understand that I have one manager and 3 workers. So, if I have a service with a constraint that matches the node.role property of worker nodes, some of them will be elected by Docker Swarm to run the containers of the service itself.
The info of my current service is this:
ID: 5p4hpxmvru9kbwz9y5oymoeq0
Name: elasbit_relay1
Service Mode: Replicated
Replicas: 1
Placement:
Constraints: [node.role!=manager]
UpdateConfig:
Parallelism: 1
On failure: pause
Monitoring Period: 5s
Max failure ratio: 0
Update order: stop-first
RollbackConfig:
Parallelism: 1
On failure: pause
Monitoring Period: 5s
Max failure ratio: 0
Rollback order: stop-first
ContainerSpec:
Image: inputoutput/cardano-node:latest#sha256:02779484dc23731cdbea6388920acc6ddd8e40c03285bc4f9c7572a91fe2ee08
Args: run --topology /configuration/testnet-topology.json --database-path /db --socket-path /db/node.socket --host-addr 0.0.0.0 --port 3001 --config /configuration/testnet-config.json
Init: false
Mounts:
Target: /configuration
Source: /home/ubuntu/cardano-docker-run/testnet
ReadOnly: true
Type: bind
Target: /db
Source: db
ReadOnly: false
Type: volume
Resources:
Endpoint Mode: vip
Ports:
PublishedPort = 12798
Protocol = tcp
TargetPort = 12798
PublishMode = ingress
The key part is [node.role!=manager]. It gives me no suitable node (unsupported platform on 3 nodes; scheduling constraints …).
I tried a lot of ways:
Using the docker-compose format (yml) with a constraints list:
deploy:
  replicas: 1
  placement:
    constraints: [node.role==worker]
  restart_policy:
    condition: on-failure
Using labels on nodes.
All of them failed. The funny part is that if I point some constraint at the manager node, it works! Do I have a typo somewhere? Well, I don't see it.
I'm using Docker version 20.10.2, build 20.10.2-0ubuntu1~18.04.2.
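The "unsupported platform" part of that message suggests the scheduler thinks the image has no build for the worker nodes' OS/architecture, rather than a constraint typo. One way to check what each node reports (standard docker node inspect fields; the node names come from the docker node ls output above):

docker node inspect -f '{{ .Description.Platform.OS }}/{{ .Description.Platform.Architecture }}' \
    ip-172-30-0-107 ip-172-30-0-131 ip-172-30-0-193

If the workers report a different OS/architecture than the manager, that would explain why only the manager can run the image.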
I'm a bit confused, as I'm used to using docker-compose in a single-server environment. Now I have the idea of using a Docker Swarm cluster with docker-compose (as it's what I know best), but I'm a bit confused about how to make it work for my app's needs. For instance:
My app is made up of a manager app and multiple workers. My idea is to have the manager app run on the Docker Swarm manager's server (is that possible?) and then use docker-compose to replicate the workers only across the rest of the Swarm cluster nodes.
A small map would be something like:
Server A -> manager
Server B -> worker1, worker2, worker3
Server C -> worker4, worker5
The workers connect to the manager through a defined IP & port in the environment section in the docker-compose.yml file.
My question is: How do I start up the manager only on a single server, and how do I replicate the workers only in the other nodes, without having a manager per cluster node? (as I don't want/need that). Thanks in advance!
You can define this with placement constraints:
version: '3.8'
services:
  manager:
    hostname: 'manager'
    image: traefik
    deploy:
      placement:
        max_replicas_per_node: 1
        constraints: [node.role == manager]
  service:
    image: service
    deploy:
      mode: replicated
      replicas: 5
      placement:
        constraints: [node.role == worker]
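If the workers also need to land on specific servers (worker1..3 on Server B, worker4..5 on Server C, as in the question) rather than just "any worker", one option is to label the nodes and split the workers into two services. A sketch with hypothetical node names and an arbitrary label key:

# run once on a manager node; "tier" is an arbitrary label key,
# serverB/serverC are placeholders for your real node hostnames
docker node update --label-add tier=b serverB
docker node update --label-add tier=c serverC

# then, under services: alongside manager
  worker-b:
    image: service
    deploy:
      replicas: 3
      placement:
        constraints: [node.labels.tier == b]
  worker-c:
    image: service
    deploy:
      replicas: 2
      placement:
        constraints: [node.labels.tier == c]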
I am running a service on Docker Swarm. This is what I did to deploy the service:
docker swarm init
docker stack deploy -c docker-compose.yml MyApplication
Content of docker-compose.yml:
version: "3"
services:
web:
image: myimage:1.0
ports:
- "9000:80"
- "9001:443"
deploy:
replicas: 3
resources:
limits:
cpus: "0.5"
memory: 256M
restart_policy:
condition: on-failure
Let's say that I update the application and build a new image, myimage:2.0. What is the proper way to deploy the new version of the image to the service without downtime?
A way to achieve this is:
Provide a healthcheck, so Docker knows whether your new deployment succeeded:
https://docs.docker.com/engine/reference/builder/#healthcheck
https://docs.docker.com/compose/compose-file/#healthcheck
Control how Docker updates your service with update_config:
https://docs.docker.com/compose/compose-file/#update_config
Pay attention to order and parallelism: for example, if you choose order: stop-first with parallelism: 2 and your replica count equals the parallelism, your app will stop completely while updating.
If your update doesn't succeed, you probably want to roll back with rollback_config:
https://docs.docker.com/compose/compose-file/#rollback_config
Don't forget the restart_policy too. A combined sketch is shown below.
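A minimal sketch combining those options (illustrative values; the healthcheck assumes curl is available in the image and that the app answers on port 80 inside the container, so adjust the test to a real endpoint):

version: "3.7"
services:
  web:
    image: myimage:2.0
    ports:
      - "9000:80"
      - "9001:443"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/"]
      interval: 10s
      timeout: 5s
      retries: 3
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        order: start-first        # start new tasks before stopping old ones
        failure_action: rollback
        delay: 10s
      rollback_config:
        parallelism: 1
        order: stop-first
      restart_policy:
        condition: on-failure

With order: start-first, new tasks must become healthy before old ones are stopped, which is what avoids downtime; failure_action: rollback reverts the update automatically if they never do.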
I have some examples on that subject:
Docker Swarm Mode Replicated Example with Flask and Caddy
https://github.com/douglasmiranda/lab/tree/master/caddy-healthcheck-of-caddy-itself
With this you can simply run docker stack deploy ... again. If there were changes in the service, it will be updated.
You can also use the command docker service update --image, but it will start a new container with an implicit scale of 0/1.
The downtime depends on your application.
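For the stack above, that command form would be (stack services are named <stack>_<service>, so the web service deployed as MyApplication becomes MyApplication_web):

docker service update --image myimage:2.0 MyApplication_web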
I have a Prometheus setup that monitors metrics exposed by my own services. This works fine for a single instance, but once I start scaling them, Prometheus gets completely confused and starts tracking incorrect values.
All services are running on a single node, through docker-compose.
This is the job in the scrape_configs:
- job_name: 'wowanalyzer'
  static_configs:
    - targets: ['prod:8000']
Each instance of prod tracks metrics in its own memory and serves them at /metrics. I'm guessing Prometheus picks a random container each time it scrapes, which leads to the huge increase in recorded counts, building up over time. Instead, I'd like Prometheus to read /metrics on all instances simultaneously, regardless of the number of instances active at that time.
docker-gen (https://github.com/jwilder/docker-gen) was developed for this purpose.
You would need to create a sidecar container running docker-gen that generates a new set of targets.
If I remember correctly, the hostnames generated are prod_1, prod_2, prod_X, etc.
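The idea would be for that sidecar to write a Prometheus file_sd target list and have Prometheus read it. A sketch of the generated file and the matching scrape config (file path is illustrative; target names follow the prod_N pattern above):

targets.json, written by the sidecar:

[
  { "targets": ["prod_1:8000", "prod_2:8000", "prod_3:8000"] }
]

prometheus.yml job reading that file:

- job_name: 'wowanalyzer'
  file_sd_configs:
    - files: ['/etc/prometheus/targets.json']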
I tried hard to find something to help us with this issue, but it looks like an unsolved problem. So, I decided to create this tool that helps with this service discovery:
https://github.com/juliofalbo/docker-compose-prometheus-service-discovery
Feel free to contribute and open issues!
You can use the DNS service discovery feature. For example:
docker-compose.yml:
version: "3"
services:
myapp:
image: appimage:v1
restart: always
networks:
- back
prometheus:
image: "prom/prometheus:v2.32.1"
container_name: "prometheus"
restart: "always"
ports: [ "9090:9090" ]
volumes:
- "./prometheus.yml:/etc/prometheus/prometheus.yml"
- "prometheus_data:/prometheus"
networks:
- back
prometheus.yml sample:
global:
  scrape_interval: 15s
  evaluation_interval: 60s

scrape_configs:
  - job_name: 'monitoringjob'
    dns_sd_configs:
      - names: [ 'myapp' ]   # service name from docker-compose
        type: 'A'
        port: 8080
    metrics_path: '/actuator/prometheus'
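Applied to the original question, the same pattern would replace the static prod:8000 target with something like this (port taken from the question; /metrics is Prometheus's default path, so no metrics_path is needed):

- job_name: 'wowanalyzer'
  dns_sd_configs:
    - names: [ 'prod' ]
      type: 'A'
      port: 8000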
You can check your DNS records using nslookup util from any container in this network:
docker exec -it myapp bash
bash-4.2# yum install bind-utils
bash-4.2# nslookup myapp
Server: 127.0.0.11
Address: 127.0.0.11#53
Non-authoritative answer:
Name: myapp
Address: 172.22.0.2
Name: myapp
Address: 172.22.0.7