Services don't start on docker swarm nodes - docker

I want to deploy highly available PostgreSQL, with Patroni for failover and HAProxy as a single entry point, in Docker Swarm.
I have this docker-compose.yml:
version: "3.7"
services:
  etcd1:
    image: patroni
    networks:
      - test
    env_file:
      - docker/etcd.env
    container_name: test-etcd1
    hostname: etcd1
    command: etcd -name etcd1 -initial-advertise-peer-urls http://etcd1:2380
  etcd2:
    image: patroni
    networks:
      - test
    env_file:
      - docker/etcd.env
    container_name: test-etcd2
    hostname: etcd2
    command: etcd -name etcd2 -initial-advertise-peer-urls http://etcd2:2380
  etcd3:
    image: patroni
    networks:
      - test
    env_file:
      - docker/etcd.env
    container_name: test-etcd3
    hostname: etcd3
    command: etcd -name etcd3 -initial-advertise-peer-urls http://etcd3:2380
  patroni1:
    image: patroni
    networks:
      - test
    env_file:
      - docker/patroni.env
    hostname: patroni1
    container_name: test-patroni1
    environment:
      PATRONI_NAME: patroni1
    deploy:
      placement:
        constraints: [node.role == manager]
        # - node.labels.type == primary
        # - node.role == manager
  patroni2:
    image: patroni
    networks:
      - test
    env_file:
      - docker/patroni.env
    hostname: patroni2
    container_name: test-patroni2
    environment:
      PATRONI_NAME: patroni2
    deploy:
      placement:
        constraints: [node.role == worker]
        # - node.labels.type != primary
        # - node.role == worker
  patroni3:
    image: patroni
    networks:
      - test
    env_file:
      - docker/patroni.env
    hostname: patroni3
    container_name: test-patroni3
    environment:
      PATRONI_NAME: patroni3
    deploy:
      placement:
        constraints: [node.role == worker]
        # - node.labels.type != primary
        # - node.role == worker
  haproxy:
    image: patroni
    networks:
      - test
    env_file:
      - docker/patroni.env
    hostname: haproxy
    container_name: test-haproxy
    ports:
      - "5000:5000"
      - "5001:5001"
    command: haproxy
networks:
  test:
    driver: overlay
    attachable: true
I deploy these services to Docker Swarm with this command:
docker stack deploy --compose-file docker-compose.yml test
When I run this command, the services are created, but patroni2 and patroni3 don't start on the other nodes, which have the worker role. They don't start at all!
I want my services deployed across all the nodes in the swarm (3: one manager and two workers).
But if I delete the constraints, all my services start on one node when I deploy docker-compose.yml to the swarm.
Maybe these services can't see my network, though I deployed it following the official Docker documentation.

Because each of these is a separate service with its own name, Docker will not attempt to spread the containers across multiple nodes. For each service it falls back to the least used node that satisfies the requirements, where least used is measured by the number of scheduled containers.
You could attempt to solve this by using a single service name with 3 replicas. This requires that the replicas be defined identically. To make this work, you can leverage two features. The first is that tasks.etcd will resolve to the individual IP addresses of each etcd service container. The second is service templates, which can be used to inject values like {{.Task.Slot}} into the settings for hostname, volume mounts, and env variables. The challenge is that the end result will likely not give you what you want, which is a way to uniquely address each replica from the other replicas. Hostname seems like it would work, but unfortunately it does not resolve in Docker's DNS implementation (and that wouldn't be easy to implement, since it's possible to create a container with the capabilities to change the hostname after Docker has deployed it).
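For illustration, the single-service form with templates might look roughly like the fragment below. This is only a sketch under assumptions: it reuses the image, network, and env file from the question, and it assumes etcd reads its member name from the ETCD_NAME environment variable instead of the -name flag used in the question. The limitation described above still applies: the templated hostnames are not resolvable by the other replicas.

```yaml
version: "3.7"
services:
  etcd:
    image: patroni
    networks:
      - test
    env_file:
      - docker/etcd.env
    # Templates are expanded per task: replica 1 gets hostname etcd1, and so on.
    hostname: "etcd{{.Task.Slot}}"
    environment:
      # Assumption: the member name is taken from ETCD_NAME rather than -name.
      ETCD_NAME: "etcd{{.Task.Slot}}"
    deploy:
      replicas: 3
networks:
  test:
    driver: overlay
    attachable: true
```

From another container on the same network, a lookup of tasks.etcd would return one address per replica, but not a stable name for any individual replica.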
The option you are left with is configuring constraints on each service to run on specific nodes. That's less than ideal, and reduces the fault tolerance of these services. If you have lots of nodes that can be separated into 3 groups then using node labels would solve the issue.
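A minimal sketch of the node-label variant follows. The label name pg_group and its values a and b are hypothetical and do not come from the question; the labels would have to be applied to the nodes beforehand, for example with docker node update --label-add pg_group=a <node>.

```yaml
version: "3.7"
services:
  patroni1:
    image: patroni
    networks:
      - test
    environment:
      PATRONI_NAME: patroni1
    deploy:
      placement:
        constraints:
          # pg_group is a hypothetical node label used only for illustration
          - node.labels.pg_group == a
  patroni2:
    image: patroni
    networks:
      - test
    environment:
      PATRONI_NAME: patroni2
    deploy:
      placement:
        constraints:
          - node.labels.pg_group == b
networks:
  test:
    driver: overlay
```

With several nodes sharing each label value, a task can be rescheduled onto another node in the same group if its current node fails, which a node.hostname constraint would not allow.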

Related

Running Services on Specific Nodes with Docker Swarm

I'm new to docker swarm and looking to set containers to run on a specific node in the swarm.
For example, I have the following nodes:
Manager
Worker1
Worker2
And I have a couple services listed in a compose yml similar to:
services:
  my_service:
    image: my_image
    container_name: my_container_name
    networks:
      - my_network
  my_service2:
    image: my_image2
    container_name: my_container_name2
    networks:
      - my_network
How can I make it so that my_service only runs on Worker1 and my_service2 only runs on Worker2?
UPDATE:
I managed to find the solution. You can specify deployment constraints as shown below.
  my_service:
    image: my_image
    container_name: my_container_name
    networks:
      - my_network
    deploy:
      placement:
        constraints:
          - node.hostname == Worker1
  my_service2:
    image: my_image2
    container_name: my_container_name2
    networks:
      - my_network
    deploy:
      placement:
        constraints:
          - node.hostname == Worker2

docker stack not finding volumes on worker node

I am moving from docker-compose to docker stack.
I am trying to launch my database on my worker node from my manager node, so I am using a docker-stack.yml file.
On the manager I run: docker stack deploy -c docker-stack.yml mr
I get this error:
* error decoding 'Volumes[0]': invalid spec: :/docker-entrypoint-initdb.d: empty section between colons
* error decoding 'Volumes[1]': invalid spec: db_data:: empty section between colons
* error decoding 'Volumes[2]': invalid spec: db_logs:: empty section between colons
Is there a way to make docker stack look for those volumes locally on the worker node? And likewise for .env files?
Here is my docker-stack.yml:
version: "3.8"
services:
  nginx:
    container_name: nginx
    image: "${NGINX_IMAGE}"
    build: build/nginx
    deploy:
      placement:
        constraints: [node.role == manager]
    restart: always
    env_file: .env
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - "${APP_HOST_DIR}/public:/var/www/app/public:ro"
      - "${APP_HOST_LETSENCRYPT}:${APP_CONTAINER_LETSENCRYPT}"
      - "${APP_HOST_NGINX_CONF}:${APP_CONTAINER_NGINX_CONF}"
    networks:
      - central_mr
    depends_on:
      - app
  app:
    container_name: app
    image: "${APP_IMAGE}"
    deploy:
      placement:
        constraints: [node.role == manager]
    restart: always
    build: build/app
    env_file: .env
    networks:
      - central_mr
    volumes:
      - "${APP_HOST_DIR}:${APP_CONTAINER_DIR}"
  dbmr:
    container_name: database
    image: "${MARIADB_VERSION}"
    restart: always
    deploy:
      placement:
        constraints: [node.role == worker]
    env_file: .env
    volumes:
      - "${SQL_INIT}:/docker-entrypoint-initdb.d"
      - "db_data:${MARIADB_DATA_DIR}"
      - "db_logs:${MARIADB_LOG_DIR}"
    environment:
      MYSQL_ROOT_PASSWORD: "${MYSQL_ROOT_PASSWORD}"
      MYSQL_DATABASE: "${MYSQL_DATABASE}"
      MYSQL_USER: "${MYSQL_USER}"
      MYSQL_PASSWORD: "${MYSQL_PASSWORD}"
    ports:
      - "3306:3306"
    networks:
      - central_mr
volumes:
  db_data:
  db_logs:
networks:
  central_mr:
The .env is on the worker node; I have a different .env on my manager. I need the service to look for the .env on the machine it is running on. The same goes for volumes.
This isn't supported. First, I'm not sure the .env file will be parsed (it wasn't last time I checked). So to expand variables inside of the compose file, you need to source those variables yourself where you run the stack deploy command:
set -a; . ./.env; set +a
docker stack deploy -c docker-compose.yml stack_name
That does not expand values from files on the workers. A service has only one definition, and it applies to every container deployed on the workers, with one exception.
A few of the fields support service templates, which let you set values such as an env variable or a volume mount using templates like {{.Node.Hostname}}.
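As a rough sketch of what such templates could look like in a stack file, the fragment below sets an env variable and a volume name per node. The image tag, the NODE_NAME variable, the volume naming scheme, and the /var/lib/mysql path are illustrative assumptions, not taken from the question.

```yaml
version: "3.8"
services:
  dbmr:
    # Assumption: a literal image name instead of ${MARIADB_VERSION}
    image: mariadb:10.5
    environment:
      # Expanded by swarm when the task starts on a node, not at deploy time
      NODE_NAME: "{{.Node.Hostname}}"
    volumes:
      - db_data:/var/lib/mysql
    deploy:
      placement:
        constraints: [node.role == worker]
volumes:
  db_data:
    # Each node gets its own named volume, e.g. db_data_<hostname>
    name: "db_data_{{.Node.Hostname}}"
```

Templates only substitute values that swarm knows about the node, service, or task; they do not read a .env file that exists on the worker.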

Control distribution of docker swarm services across different computers?

Is there a way to control the distribution of services across different computers? I have one master with two workers and 5 services:
web server
database
redis
celery
s3 storage connection
I only want to outsource the celery workers and run everything else on the master. Is there a way to control that with docker swarm? I have not created a registry yet, because I am not sure if that is still necessary.
Here is my current experimental docker-compose file.
version: "3.8"
volumes:
  s3data:
    driver: local
services:
  web:
    image: localhost:5000/web
    build: .
    env_file:
      - ./.env
    environment:
      - ENVIRONMENT=develop
    command: python manage.py runserver 0.0.0.0:8000
    volumes:
      - ./app/:/app/
      - ./lib/lrg_omics/:/lrg-omics/
      - s3data:/datalake/
      - /data/media/:/appmedia/
      - /data/static/:/static/
    ports:
      - "8000:8000"
    depends_on:
      - db
      - redis
      - s3vol
    links:
      - redis:redis
    restart: always
  db:
    image: postgres
    volumes:
      - /data/db/:/var/lib/postgresql/data
    environment:
      - POSTGRES_DB=postgres
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
  redis:
    restart: always
    image: redis:alpine
    ports:
      - "6379:6379"
  celery:
    restart: on-failure
    image: pp-celery-worker
    build:
      context: .
      dockerfile: Dockerfile
    command: bash -c "celery -A main worker -l info --concurrency 8"
    env_file:
      - ./.env
    volumes:
      - ./app/:/app/
      - ./lib/lrg_omics/:/lrg-omics/
      - s3data:/datalake/
    environment:
      - DB_HOST=db
      - DB_NAME=app
      - DB_USER=postgres
      - DB_PASS=postgres
    depends_on:
      - db
      - redis
      - web
      - s3vol
    deploy:
      replicas: 2
      placement:
        max_replicas_per_node: 1
  s3vol:
    image: elementar/s3-volume
    command: /data s3://PQC
    environment:
      - BACKUP_INTERVAL=2
      - AWS_ACCESS_KEY_ID=...
      - AWS_SECRET_ACCESS_KEY=...
      - ENDPOINT_URL=https://example.com
    volumes:
      - s3data:/data
When I deploy this with sudo docker stack deploy --compose-file docker-compose-distributed.yml QC
And then look at the services I get something like this:
sudo docker stack services QC
>>>
ID             NAME        MODE         REPLICAS               IMAGE                        PORTS
xx5hkbswipoz   QC_celery   replicated   0/2 (max 1 per node)   celery-worker:latest
natb3trv9ngi   QC_db       replicated   0/1                    postgres:latest
1bxpkb18ojay   QC_redis    replicated   1/1                    redis:alpine                 *:6379->6379/tcp
6rsl5gfpd0oa   QC_s3vol    replicated   1/1                    elementar/s3-volume:latest
aszkle6msmqr   QC_web      replicated   0/1                    localhost:5000/web:latest    *:8000->8000/tcp
For some reason only redis and the S3 containers run. And both of them on the master. Nothing runs on the workers.
I am quite new to docker swarm so there is probably more than one thing wrong here. Any comments on best practices are welcome.
To determine why the services are not starting, run docker service ps QC_celery --no-trunc. It shows the state of the service's tasks and a message from Docker.
To control placement, consult the Compose file version 3 reference on placement constraints. Basically, it entails adding constraints to the deploy: node:
    deploy:
      replicas: 2
      placement:
        max_replicas_per_node: 1
        constraints:
          - node.role==worker
While compose.yml and stack.yml files nominally share a format, they support different feature subsets, and for complex deployments it helps to split the deployment into discrete compose.yml files for docker compose and stack.yml files for swarm deployments.
docker stack deploy -c docker-compose.yml -c docker-stack.yml QC can merge a docker-compose.yml base file with stack-specific settings, and you can keep docker compose artifacts in your docker-compose.override.yml. These artifacts include:
build: - docker swarm needs the image to be already built and available in a registry, either local (swarm hosted?) or Docker Hub.
depends_on:, links: - not supported by swarm, which assumes services can be restarted at any time and will find each other using docker networks.
restart: - replaced in swarm by restart_policy: under deploy:
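A stack-specific override file along these lines might look like the sketch below. The file name docker-stack.yml, the registry prefix on the celery image, and the choice to pin web to the manager are assumptions for illustration; the celery image would need to be built and pushed to a registry that all nodes can reach.

```yaml
# docker-stack.yml (hypothetical override merged on top of docker-compose.yml)
version: "3.8"
services:
  web:
    deploy:
      placement:
        constraints:
          - node.role == manager
  celery:
    # Assumption: the image was pushed to the same registry used for web
    image: localhost:5000/pp-celery-worker
    deploy:
      replicas: 2
      # Swarm equivalent of the compose-only "restart: on-failure"
      restart_policy:
        condition: on-failure
      placement:
        max_replicas_per_node: 1
        constraints:
          - node.role == worker
```

Deployed with docker stack deploy -c docker-compose.yml -c docker-stack.yml QC, the deploy: settings here would be merged onto the base service definitions.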

Kibana can't reach elasticsearch

Using docker-compose v3 and deploying to a swarm:
version: '3'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.4.1
    deploy:
      replicas: 1
    ports:
      - "9200:9200"
    tty: true
  kibana:
    image: docker.elastic.co/kibana/kibana:5.4.1
    deploy:
      mode: global
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch
    tty: true
I see this in the kibana service log:
Unable to revive connection: http://elasticsearch:9200/
Elasticsearch service is running and can be reached.
Swarm consists of 3 nodes.
What am I missing?
Update:
It turns out that if I try to access Kibana on the same swarm node where Elasticsearch is running, it works. All other nodes either have a network problem or cannot resolve the elasticsearch name.
I found the reason, and the solution.
My swarm is running on AWS. All nodes are placed in the same security group, and I assumed all ports were open internally in that security group. That's not the case.
I explicitly configured the security group to allow inbound traffic as per Docker's routing mesh specs (port 7946 TCP/UDP for node communication and port 4789 UDP for overlay network traffic) here: https://docs.docker.com/engine/swarm/ingress/
Docker Compose by default creates a network and puts all services in it, but I do not know whether that changes in Docker Swarm. To define the network explicitly you can do this:
version: '3'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.4.1
    deploy:
      replicas: 1
    ports:
      - "9200:9200"
    tty: true
    networks:
      - some-name
  kibana:
    image: docker.elastic.co/kibana/kibana:5.4.1
    deploy:
      mode: global
    ports:
      - "5601:5601"
    links:
      - elasticsearch
    depends_on:
      - elasticsearch
    tty: true
    networks:
      - some-name
networks:
  some-name:
    driver: overlay
I hope this helps; let me know how it goes.

Use docker-compose with docker swarm

I'm using docker 1.12.1
I have an easy docker-compose script.
version: '2'
services:
  jenkins-slave:
    build: ./slave
    image: jenkins-slave:1.0
    restart: always
    ports:
      - "22"
    environment:
      - "constraint:NODE==master1"
  jenkins-master:
    image: jenkins:2.7.1
    container_name: jenkins-master
    restart: always
    ports:
      - "8080:8080"
      - "50000"
    environment:
      - "constraint:NODE==node1"
I run this script with docker-compose -p jenkins up -d.
This creates my 2 containers, but only on my master (where I execute the command). I expected one to be created on the master and one on the node.
I also tried to add
networks:
  jenkins_swarm:
    driver: overlay
and
    networks:
      - jenkins_swarm
after every service, but this fails with:
Cannot create container for service jenkins-master: network jenkins_jenkins_swarm not found
even though the network is listed when I run docker network ls.
Can someone help me deploy 2 containers on my 2 nodes with docker-compose? Swarm is definitely working on my "cluster". I followed this tutorial to verify.
Compose doesn't support Swarm Mode at the moment.
When you run docker compose up on the master node, Compose issues docker run commands for the services in the Compose file, rather than docker service create - which is why the containers all run on the master. See this answer for options.
On the second point, networks are scoped in 1.12. If you inspect your network you'll find it's been created at swarm-level, but Compose is running engine-level containers which can't see the swarm network.
We can do this with the Compose file format v3 now.
https://docs.docker.com/engine/swarm/#feature-highlights
https://docs.docker.com/compose/compose-file/
You have to initialize the swarm cluster using command
$ docker swarm init
You can add more nodes as worker or manager -
https://docs.docker.com/engine/swarm/join-nodes/
Once both nodes have been added to the cluster, pass your Compose v3 file, i.e. the deployment file, to create a stack. The Compose file should reference only prebuilt images; you can't build from a Dockerfile when deploying in Swarm mode.
$ docker stack deploy -c dev-compose-deploy.yml --with-registry-auth PL
View your stack services status -
$ docker stack services PL
Try using labels and placement constraints to put services on different nodes.
Here is an example "dev-compose-deploy.yml" file for reference:
version: "3"
services:
  nginx:
    image: nexus.example.com/pl/nginx-dev:latest
    extra_hosts:
      - "dev-pldocker-01:10.2.0.42"
      - "int-pldocker-01:10.2.100.62"
      - "prd-plwebassets-01:10.2.0.62"
    ports:
      - "80:8003"
      - "443:443"
    volumes:
      - logs:/app/out/
    networks:
      - pl
    deploy:
      replicas: 3
      labels:
        feature.description: "Frontend"
      update_config:
        parallelism: 1
        delay: 10s
      restart_policy:
        condition: any
      placement:
        constraints: [node.role == worker]
    command: "/usr/sbin/nginx"
  viz:
    image: dockersamples/visualizer
    ports:
      - "8085:8080"
    networks:
      - pl
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    deploy:
      replicas: 1
      labels:
        feature.description: "Visualizer"
      restart_policy:
        condition: any
      placement:
        constraints: [node.role == manager]
networks:
  pl:
volumes:
  logs:
