I execute
sudo docker node ls
And this is my output
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
smlsbj3r6qjq7s22cl9cgi1f1 * ip-172-30-0-94 Ready Active Leader 20.10.2
b5e3w8nvrw3kw8q3sg1188439 ip-172-30-0-107 Ready Active 20.10.2
phjkfj09ydvgzztaib2zxcfv9 ip-172-30-0-131 Ready Active 20.10.2
m73z9ikte16klds06upruifji ip-172-30-0-193 Ready Active 20.10.2
So far, so good: I know I have one manager and 3 workers. So, if I have a service with a constraint that matches the node.role property of the worker nodes, some of them should be selected by Docker Swarm to run the containers belonging to that service.
The info of my current service is this:
ID: 5p4hpxmvru9kbwz9y5oymoeq0
Name: elasbit_relay1
Service Mode: Replicated
Replicas: 1
Placement:
Constraints: [node.role!=manager]
UpdateConfig:
Parallelism: 1
On failure: pause
Monitoring Period: 5s
Max failure ratio: 0
Update order: stop-first
RollbackConfig:
Parallelism: 1
On failure: pause
Monitoring Period: 5s
Max failure ratio: 0
Rollback order: stop-first
ContainerSpec:
Image: inputoutput/cardano-node:latest#sha256:02779484dc23731cdbea6388920acc6ddd8e40c03285bc4f9c7572a91fe2ee08
Args: run --topology /configuration/testnet-topology.json --database-path /db --socket-path /db/node.socket --host-addr 0.0.0.0 --port 3001 --config /configuration/testnet-config.json
Init: false
Mounts:
Target: /configuration
Source: /home/ubuntu/cardano-docker-run/testnet
ReadOnly: true
Type: bind
Target: /db
Source: db
ReadOnly: false
Type: volume
Resources:
Endpoint Mode: vip
Ports:
PublishedPort = 12798
Protocol = tcp
TargetPort = 12798
PublishMode = ingress
The key part is [node.role!=manager]. It gives me no suitable node (unsupported platform on 3 nodes; scheduling constraints …).
I tried a lot of ways:
Using the docker-compose format (yml) with a constraints list:
deploy:
  replicas: 1
  placement:
    constraints: [node.role==worker]
  restart_policy:
    condition: on-failure
Using labels on nodes.
All of them failed. The funny part is that if I point a constraint at the manager node, it works! Do I have a typo somewhere? Well, I don't see one.
I'm using Docker version 20.10.2, build 20.10.2-0ubuntu1~18.04.2.
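In case it helps, here is how each node's platform can be checked; the "unsupported platform" wording made me suspect an OS/architecture mismatch between the image and the worker nodes rather than a constraint typo (this is a diagnostic sketch, not a confirmed cause):

```shell
# List every node's OS/architecture; "unsupported platform" usually means the
# image has no build matching a node's platform, not a bad constraint.
docker node ls --format '{{.Hostname}}' | while read -r node; do
  docker node inspect "$node" \
    --format '{{.Description.Hostname}}: {{.Description.Platform.OS}}/{{.Description.Platform.Architecture}}'
done
```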
Using docker swarm, I am trying to deploy N instances of my app on N nodes in a way that each app is deployed on the node with the corresponding index. E.g.: app1 must be deployed on node1, app2 on node2, ...
The below is not working; it complains: Error response from daemon: rpc error: code = Unknown desc = value 'node{{.Task.Slot}}' is invalid.
Any suggestions on how to achieve this?
I also have the impression, as a long shot, that I could use something with labels, but I cannot wrap my head around it yet. Anyhow, please advise.
version: "3.8"
services:
  app:
    image: app:latest
    hostname: "app{{.Task.Slot}}"
    networks:
      - app-net
    volumes:
      - "/data/shared/app{{.Task.Slot}}/config:/app/config"
    deploy:
      replicas: 5
      update_config:
        parallelism: 1
        delay: 10s
      restart_policy:
        condition: any
      placement:
        constraints:
          - "node.hostname==node{{.Task.Slot}}" <========= ERROR
Service template parameters are documented as only resolving in:
- the hostname: directive
- volume definitions
- labels
- environment variables
Placement preferences / constraints are not supported, but that would be brilliant, as it would allow simple deployments of Minio, etcd, Consul and other clustered services where you need to pin replicas to nodes.
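Since templates don't resolve in placement constraints, a common workaround is to label each node once and declare one single-replica service per slot. A sketch, where the label name app_slot and the per-slot service names are hypothetical:

```yaml
# First, label each node once from a manager, e.g.:
#   docker node update --label-add app_slot=1 node1
#   docker node update --label-add app_slot=2 node2
version: "3.8"
services:
  app1:
    image: app:latest
    hostname: app1
    volumes:
      - "/data/shared/app1/config:/app/config"
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.app_slot == 1
  app2:
    image: app:latest
    hostname: app2
    volumes:
      - "/data/shared/app2/config:/app/config"
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.app_slot == 2
  # ...repeat for the remaining slots
```

This trades the single templated service for N nearly identical services, but each replica is deterministically pinned to its node.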
I'm a bit confused, as I'm used to using docker-compose in a single-server environment. Now I have the idea of using a Docker Swarm cluster with docker-compose (as it's what I know best), but I'm confused about how to make it work for my app's needs. For instance:
My app is made up of a manager app and multiple workers. My idea is to have the manager app run on the Docker Swarm manager's server (is that possible?) and then use docker-compose to replicate the workers only across the rest of the Swarm cluster nodes.
A small map would be something like:
Server A -> manager
Server B -> worker1, worker2, worker3
Server C -> worker4, worker5
The workers connect to the manager through a defined IP & port in the environment section in the docker-compose.yml file.
My question is: how do I start the manager on a single server only, and how do I replicate the workers only on the other nodes, without having a manager per cluster node? (I don't want/need that.) Thanks in advance!
You can define this with placement constraints:
version: '3.8'
services:
  manager:
    hostname: 'manager'
    image: traefik
    deploy:
      placement:
        max_replicas_per_node: 1
        constraints: [node.role == manager]
  service:
    image: service
    deploy:
      mode: replicated
      replicas: 5
      placement:
        constraints: [node.role == worker]
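A possible deploy-and-verify sequence for a stack like the one above (the stack name mystack is an assumption):

```shell
# Deploy the stack from a manager node, then check which node each task landed on.
docker stack deploy -c docker-compose.yml mystack
docker stack ps mystack --format 'table {{.Name}}\t{{.Node}}\t{{.CurrentState}}'
```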
Docker updates the container, but network registration takes 10 minutes to complete, so while the new container is being registered the page returns a 502 because the internal network is still pointing at the old container. How can I delay the removal of the old container by 10 minutes or so after the update to the new one? Ideally I would like to push this config with docker stack, but I'll do whatever it takes. I should also note that I am unable to use replicas right now due to certain limitations of a security package I'm being forced to use.
version: '3.7'
services:
  xxx:
    image: ${xxx}/com.xxx:${xxx}
    environment:
      - SERVICE_NAME=xxx
      - xxx
      - _xxx
      - SPRING_PROFILES_ACTIVE=${xxx}
    networks:
      - xxx${xxx}
    healthcheck:
      interval: 1m
    deploy:
      mode: replicated
      replicas: 1
      resources:
        limits:
          cpus: '3'
          memory: 1024M
        reservations:
          cpus: '0.50'
          memory: 256M
      labels:
        - com.docker.lb.hosts=xxx${_xxx}.xxx.com
        - jenkins.url=${xxx}
        - com.docker.ucp.access.label=/${xxx}/xxx
        - com.docker.lb.network=xxx${_xxx}
        - com.docker.lb.port=8080
        - com.docker.lb.service_cluster=${xxx}
        - com.docker.lb.ssl_cert=xxx.cert
        - com.docker.lb.ssl_key=xxx.key
        - com.docker.lb.redirects=http://xxx${_xxx}.xxx.com/xxx,https://xxx${_xxx}.xxx.com/xxx
      restart_policy:
        condition: any
        delay: 5s
        max_attempts: 3
        window: 120s
      update_config:
        parallelism: 1
        delay: 10s
        order: start-first
        failure_action: rollback
      rollback_config:
        parallelism: 0
        order: stop-first
    secrets:
      - ${xxx}
networks:
  xxx${_xxx}:
    external: true
secrets:
  ${xxx}:
    external: true
  xxx.cert:
    external: true
  xxx.key:
    external: true
Use a proper healthcheck - see the reference here: https://docs.docker.com/compose/compose-file/#healthcheck
So:
You need to define a proper test so Docker knows when your new container is fully up (that goes inside the test instruction of your healthcheck).
Use the start_period instruction to specify your 10 (or so) minute wait - otherwise, Docker Swarm would just kill your new container and never let it start.
Basically, once you get healthcheck right, this should solve your issue.
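A sketch of what such a healthcheck could look like for the service above; the curl endpoint is an assumption about the Spring app (an actuator-style health URL), and it presumes curl exists in the image:

```yaml
healthcheck:
  # Hypothetical endpoint - replace with whatever URL reliably reports readiness.
  test: ["CMD", "curl", "-f", "http://localhost:8080/actuator/health"]
  interval: 1m
  timeout: 10s
  retries: 3
  start_period: 10m   # failures during the first 10 minutes don't count against retries
```

Combined with the order: start-first already in the update_config, the old task should only be stopped once the new one reports healthy.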
I followed the Docker tutorial to set up a swarm.
I used Docker Toolbox, because I'm on Windows 10 Family.
I followed all the steps, but at the end the statement "curl ip_address" doesn't work. I also get an error when accessing the URL in a browser.
$ docker --version
Docker version 18.03.0-ce, build 0520e24302
docker-compose.yml, located in /home/docker of virtual machine called "myvm1" :
version: "3"
services:
  web:
    # replace username/repo:tag with your name and image details
    image: 12081981/friendlyhello:part1
    deploy:
      replicas: 5
      resources:
        limits:
          cpus: "0.1"
          memory: 50M
      restart_policy:
        condition: on-failure
    ports:
      - "80:80"
    networks:
      - webnet
networks:
  webnet:
The swarm:
$ docker-machine ssh myvm1 "docker stack ps getstartedlab"
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
blmx8mldam52 getstartedlab_web.1 12081981/friendlyhello:part1 myvm1 Running Running 9 seconds ago
04ctl86chp6o getstartedlab_web.2 12081981/friendlyhello:part1 myvm3 Running Running 6 seconds ago
r3qyznllno9j getstartedlab_web.3 12081981/friendlyhello:part1 myvm3 Running Running 6 seconds ago
2twwicjssie9 getstartedlab_web.4 12081981/friendlyhello:part1 myvm1 Running Running 9 seconds ago
o4rk4x7bb3vm getstartedlab_web.5 12081981/friendlyhello:part1 myvm3 Running Running 6 seconds ago
result of "docker-machine ls" :
NAME ACTIVE DRIVER STATE URL SWARM DOCKER ERRORS
default - virtualbox Running tcp://192.168.99.100:2376 v18.09.0
myvm1 * virtualbox Running tcp://192.168.99.102:2376 v18.09.0
myvm3 - virtualbox Running tcp://192.168.99.103:2376 v18.09.0
test with curl
$ curl 192.168.99.102
curl: (7) Failed to connect to 192.168.99.102 port 80: Connection refused
How do I debug this?
I can give more information, if you want.
Thanks in advance.
Use of the routing mesh on Windows appears to be an EE-only feature right now. You can monitor this docker-for-windows issue for more details. The current workaround is to use DNSRR internally and publish ports to the host directly instead of through the routing mesh. If you want your application to be reachable from any node in the cluster, this means you'd need a service on every host in the cluster, scheduled globally, listening on the requested port. E.g.:
version: "3.2"
services:
  web:
    # replace username/repo:tag with your name and image details
    image: 12081981/friendlyhello:part1
    deploy:
      # global runs 1 on every node, instead of the replicated variant
      mode: global
      # DNSRR skips the VIP normally assigned to services
      endpoint_mode: dnsrr
      resources:
        limits:
          cpus: "0.1"
          memory: 50M
      restart_policy:
        condition: on-failure
    ports:
      - target: 80
        published: 80
        protocol: tcp
        # host publishes the port directly from the container without the routing mesh
        mode: host
    networks:
      - webnet
networks:
  webnet:
I had a small working docker swarm on google cloud platform.
There are just two nodes, one with nginx and php, the other one with mysql.
Right now it seems that from master node I can't connect to the mysql on the worker node.
SQLSTATE[HY000] [2002] php_network_getaddresses: getaddrinfo failed: Name or service not known
Same problem also with ping from a shell inside the container.
I've used --advertise-addr flag when init the swarm:
docker swarm init --advertise-addr 10.156.0.3
Then I've successfully joined the swarm from the 2nd node:
docker swarm join --token my-token 10.156.0.3:2377
Also the deploy is successful
docker stack deploy --compose-file docker-compose.yml test
Creating network test_default
Creating service test_mysql
Creating service test_web
Creating service test_app
(in docker-compose.yml there is no network definition, I'm using the docker default)
Nodes:
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
oz1ebgrp1a68brxi0nd1gdr2k mysql-001 Ready Active 18.03.1-ce
ndy11zyxi0wym8mjmgh8op1ni * app-001 Ready Active Leader 18.03.1-ce
docker stack ps test
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
9afwjgtpy8lc test_app.1 127.0.0.1:5000/app:latest app-001 Running Running 8 minutes ago
mgajupmcai0t test_web.1 127.0.0.1:5000/web:latest app-001 Running Running 8 minutes ago
s17jvkukahl7 test_mysql.1 mysql:5.7 mysql-001 Running Running 8 minutes ago
docker networks:
NETWORK ID NAME DRIVER SCOPE
9084b39892f4 bridge bridge local
ofqtewx039fl test_default overlay swarm
5cc9d4554bea docker_gwbridge bridge local
97fbd06a23b5 host host local
x8f408klk2ms ingress overlay swarm
ca1b849ea73a none null local
Here is my docker info
Containers: 12
Running: 3
Paused: 0
Stopped: 9
Images: 35
Server Version: 18.03.1-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: ndy11zyxi0wym8mjmgh8op1ni
Is Manager: true
ClusterID: q23l1v6dav3u4anqqu51nwx0r
Managers: 1
Nodes: 2
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 10
Dispatcher:
Heartbeat Period: 5 seconds
Total Memory: 14.09GiB
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 10.156.0.3
Manager Addresses:
10.156.0.3:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.13.0-1019-gcp
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 14.09GiB
Name: app-001
ID: IWKK:NWRJ:HKAQ:3JSQ:7H3L:2WXC:IIJ7:OEKB:4ARR:T7FY:VAWR:HOPL
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
This swarm was working fine a few weeks ago. I didn't need this application for a few weeks, so I turned off all the machines. Meanwhile swarm-node.crt expired, so today when I turned the machines back on I had to remove the service and the swarm and recreate them from scratch. The result is that I can't connect from a container on one node to a container on the other node.
Any help will be appreciated.
UPDATE:
here is docker-compose.yml
version: '3'
services:
  web:
    image: 127.0.0.1:5000/web
    build:
      context: ./web
    volumes:
      - ./test:/var/www
    ports:
      - 80:80
    links:
      - app
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.hostname == app-001
  app:
    image: 127.0.0.1:5000/app
    build:
      context: ./app
    volumes:
      - ./test:/var/www
    depends_on:
      - mysql
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.hostname == app-001
  mysql:
    image: mysql:5.7
    volumes:
      - /mnt/disks/ssd-001/mysql:/var/lib/mysql
      - /mnt/disks/buckets/common-storage-001/backup/mysql:/backup
    environment:
      - "MYSQL_DATABASE=test"
      - "MYSQL_USER=test"
      - "MYSQL_PASSWORD=*****"
      - "MYSQL_ROOT_PASSWORD=*****"
    command: mysqld --key-buffer-size=32M --max-allowed-packet=16M --myisam-recover-options=FORCE,BACKUP --tmp-table-size=32M --query-cache-type=0 --query-cache-size=0 --max-heap-table-size=32M --max-connections=500 --thread-cache-size=50 --innodb-flush-method=O_DIRECT --innodb-log-file-size=512M --innodb-buffer-pool-size=16G --open-files-limit=65535
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.hostname == mysql-001
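One thing worth checking when cross-node overlay DNS breaks after recreating a swarm is the firewall between the nodes: Docker's overlay networking needs TCP 2377 (cluster management), TCP/UDP 7946 (node-to-node communication) and UDP 4789 (VXLAN traffic) open. A sketch of how that could be allowed on GCP, where the rule name and source range are assumptions for this setup:

```shell
# Hypothetical firewall rule allowing swarm traffic between the nodes' subnet.
gcloud compute firewall-rules create swarm-internal \
  --allow tcp:2377,tcp:7946,udp:7946,udp:4789 \
  --source-ranges 10.156.0.0/20
```

If the ports are open, comparing `docker network inspect test_default` on both nodes (while a task is running on each) can show whether the overlay network is actually being extended to the worker.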