Docker swarm node unable to detect service from another host in swarm

My goal is to set up a docker swarm on a group of 3 linux (ubuntu) physical workstations and run a dask cluster on that.
$ docker --version
Docker version 17.06.0-ce, build 02c1d87
I am able to init the docker swarm and add all of the machines to the swarm.
cordoba$ docker node ls
ID                           HOSTNAME   STATUS   AVAILABILITY   MANAGER STATUS
j8k3hm87w1vxizfv7f1bu3nfg    box1       Ready    Active
twg112y4m5tkeyi5s5vtlgrap    box2       Ready    Active
upkr459m75au0vnq64v5k5euh *  box3       Ready    Active         Leader
I then run docker stack deploy -c docker-compose.yml dask-cluster on the Leader box.
Here is docker-compose.yml:
version: "3"
services:
  dscheduler:
    image: richardbrks/dask-cluster
    ports:
      - "8786:8786"
      - "9786:9786"
      - "8787:8787"
    command: dask-scheduler
    networks:
      - distributed
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints: [node.role == manager]
  dworker:
    image: richardbrks/dask-cluster
    command: dask-worker dscheduler:8786
    environment:
      - "affinity:container!=dworker*"
    networks:
      - distributed
    depends_on:
      - dscheduler
    deploy:
      replicas: 3
      restart_policy:
        condition: on-failure
networks:
  distributed:
and here is richardbrks/dask-cluster:
# Official python base image
FROM python:2.7
# update apt repository
RUN apt-get update
# only install enough libraries to run dask on a cluster (with monitoring)
RUN pip install --no-cache-dir \
    psutil \
    dask[complete]==0.15.2 \
    bokeh
When I deploy the stack, the dworker tasks that are not on the same machine as dscheduler do not know what dscheduler is. I ssh'd into one of those nodes and looked in env, and dscheduler was not there. I also tried to ping dscheduler and got "ping: unknown host". I thought docker was supposed to provide internal DNS-based service discovery, so that calling dscheduler would resolve to the address of the dscheduler service. Is there some setup on my machines that I am missing, or is something missing from my files?
All of this code is also located at https://github.com/MentalMasochist/dask-swarm

According to this issue in swarm:
Because of some networking limitations (I think related to virtual IPs), the ping tool will not work with overlay networking. Are your service names resolvable with other tools like dig?
Personally I could always connect from one service to the other using curl. Your setup seems correct and your services should be able to communicate.
FYI: depends_on is not supported in swarm mode.
Update 2: I think you are not using the port. The service name is no replacement for the port; you need to use the port the container listens on internally.
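For example, a quick way to check both name resolution and connectivity from inside a running dworker container (a sketch: the container ID is a placeholder, and the name dask-cluster_dworker assumes the stack was deployed as dask-cluster):
# find a dworker container running on this node
docker ps --filter name=dask-cluster_dworker
# resolve the service name through swarm's internal DNS
docker exec -it <container-id> getent hosts dscheduler
# test the scheduler on its internal port (8786), not a published one
docker exec -it <container-id> python -c "import socket; socket.create_connection(('dscheduler', 8786), 5); print('ok')"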

There was nothing wrong with dask or docker swarm. The problem was bad router firmware. After I went back to a prior version of the router firmware, the cluster worked fine.

Related

mediasoup v3 with Docker

I'm trying to run a WebRTC example (using mediasoup) in Docker.
I want to run two servers, as I am working on video calling across a set of instances!
My error:
createProducerTransport null Error: port bind failed due to address not available [transport:udp, ip:'172.17.0.1', port:50517, attempt:1/50000]
I think it's something to do with the docker network settings?
docker-compose.yml
version: "3"
services:
  db:
    image: mysql
    restart: always
  app:
    image: app
    build: .
    ports:
      - "1440:443"
      - "2000-2020"
      - "80:8080"
    depends_on:
      - db
  app2:
    image: app
    build: .
    ports:
      - "1441:443"
      - "2000-2020"
      - "81:8080"
    depends_on:
      - db
Dockerfile
FROM node:12
WORKDIR /app
COPY . .
CMD npm start
It says it couldn't bind the address, so it could be the IP or the port that causes the problem.
The IP seems to be the IP of the Docker instance. Since the Docker instances are on two different machines, it should be the IP of the server, not the Docker instance (in the mediasoup settings).
There are also RTC ports that have to be opened in the Docker instance. They are normally set in the mediasoup config file, usually a range of a few hundred ports.
You should set your RTC min and max ports to 2000 and 2020 for testing purposes. Also, I guess you are not forwarding these ports; in docker-compose use 2000-2020:2000-2020, as sketched below. Also make sure to set your listenIps properly.
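For reference, a sketch of what the ports section of one app service might look like with the RTC range forwarded explicitly (assuming rtcMinPort/rtcMaxPort are set to 2000/2020 in the mediasoup config; the UDP mapping is the one RTP traffic actually uses):
services:
  app:
    image: app
    build: .
    ports:
      - "1440:443"
      - "80:8080"
      # forward the RTC range host:container, UDP and TCP
      - "2000-2020:2000-2020/udp"
      - "2000-2020:2000-2020/tcp"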
If you are running mediasoup in Docker, the container where mediasoup is installed should be run in host network mode.
This is explained here:
How to use host network for docker compose?
and official docs
https://docs.docker.com/network/host/
Also pay attention to the mediasoup configuration settings webRtcTransport.listenIps and plainRtpTransport.listenIp: they should tell the client which IP address your mediasoup server is listening on.
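A minimal sketch of the host-network variant (with network_mode: host the ports section is unnecessary, since the container shares the host's network stack; the announcedIp shown is a placeholder for your server's real IP):
services:
  app:
    image: app
    build: .
    # the container binds directly on the host's interfaces
    network_mode: host
    # mediasoup's listenIps should then point at the host's real address,
    # e.g. { ip: '0.0.0.0', announcedIp: '203.0.113.10' }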

Why are my services not showing up on the network after started in docker swarm?

After fiddling around for a couple of days with what was completely new to me a week ago, I'm kind of stuck and would like your help. I've created a docker swarm with some Pis running Ubuntu Server 20.04 LTS, and when I use the command:
$ docker stack deploy --compose-file docker-compose.visualizer.yml visualizer
The terminal feedback is:
Creating network visualizer_default
Creating service visualizer_visualizersvc
Practically the same output when I run:
$ docker stack deploy --compose-file docker-compose.home-assistant.yml home-assistant
Checking the stacks:
$ docker stack ls
NAME             SERVICES   ORCHESTRATOR
home-assistant   1          Swarm
visualizer       1          Swarm
Checking services in stacks:
$ docker stack services visualizer
ID             NAME                       MODE         REPLICAS   IMAGE                             PORTS
t5nz28hzbzma   visualizer_visualizersvc   replicated   0/1        dockersamples/visualizer:latest   *:8000->8080/tcp
$ docker stack services home-assistant
ID             NAME                           MODE         REPLICAS   IMAGE                                 PORTS
olj1nbx5vj40   home-assistant_homeassistant   replicated   0/1        homeassistant/home-assistant:stable   *:8123->8123/tcp
When I then browse to the ports specified in docker-compose.visualizer.yml or docker-compose.home-assistant.yml there is no response on the server side ("can't connect"). Identical for both the manager and worker IP. This is inside a home network, in a single subnet with no traffic rules set for LAN traffic.
EDIT: a portscan reveals no open ports in the specified range on either host.
Any comments on my work are welcome as I'm learning, but I would very much like to see some containers 'operational'.
As a reference I included the docker-compose files:
docker-compose.home-assistant.yml
version: "3"
services:
  homeassistant:
    image: homeassistant/home-assistant:stable
    ports:
      - "8123:8123"
    volumes:
      - './home-assistant:/config'
    environment:
      TZ: 'Madrid'
    restart: unless-stopped
    network_mode: host
docker-compose.visualizer.yml
version: "3"
services:
  visualizersvc:
    image: alexellis2/visualizer-arm:latest
    deploy:
      placement:
        constraints:
          - 'node.role==manager'
    ports:
      - '8000:8080'
    volumes:
      - '/var/run/docker.sock:/var/run/docker.sock'
Bonus points for telling me if I should always approach the manager through the specified ports or if I have to approach the machine running the service (or any good documentation on the subject.)
Not long after you post a question you happen to find the answer yourself of course:
I never scaled the services (to 1 in my case)
docker service scale [SERVICE_ID]=1
EDIT: The services were not scaling to 1 because of another error, I think in the visualizer, but this brought me to the final answer.
Now I'm getting a mountain of new error messages, but at least those are verbose :)
Any feedback is still welcome.
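As a general debugging note for this situation: when a stack service sits at 0/1 replicas, the scheduler's reason is usually visible in the task list, for example:
# show the task history, including the error that keeps replicas at 0
docker service ps --no-trunc visualizer_visualizersvc
# and the service logs once a task has at least started
docker service logs visualizer_visualizersvc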

Traefik with docker swarm and mode: global: frontend rule to substitute hostname

I have a Docker Swarm cluster (currently 5 machines) where I run everything as a Docker stack, deployed from the host manager1 like so:
$ docker stack deploy -c docker-compose.yml mystack
But I use Traefik as reverse proxy.
I wanted to add a Syncthing container to share some data between nodes, so I want it to run on each node. This is achieved thanks to the option:
deploy:
  mode: global
This properly creates the containers I want, one per node.
I then want to access each Syncthing instance, thanks to Traefik, with unique urls like this:
frontend: manager1.syncthing.mydomain.com --> backend: syncthing container on host manager1
frontend: worker1.syncthing.mydomain.com --> backend: syncthing container on host worker1
frontend: worker2.syncthing.mydomain.com --> backend: syncthing container on host worker2
...
I fail to find the proper configuration for this (is it even possible?).
I thought about substituting a variable in the docker-compose file like so:
deploy:
  ...
  labels:
    ...
    - "traefik.frontend.rule=Host:${HOSTNAME}.syncthing.mydomain.com"
Even if $HOSTNAME is defined on all nodes (including the manager), this fails: Traefik creates a useless route, ".syncthing.mydomain.com". Some research suggests that ${HOSTNAME} should at least be substituted with "manager1" rather than "" (pull/30781). Anyway, I think this can safely be expected not to work, as the substitution would probably happen on the manager1 node where the docker stack deploy command is run.
As a workaround, I can deploy one service per node, pinned with a placement constraint (sketched below); but this does not scale, as new nodes would have to be added manually.
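For illustration, that workaround would look roughly like this, with the hostname hard-coded into each service (the service names and domain are illustrative, and the remaining Traefik labels, such as the backend port, are omitted):
services:
  syncthing_manager1:
    image: syncthing/syncthing
    deploy:
      placement:
        constraints: [node.hostname == manager1]
      labels:
        - "traefik.frontend.rule=Host:manager1.syncthing.mydomain.com"
  syncthing_worker1:
    image: syncthing/syncthing
    deploy:
      placement:
        constraints: [node.hostname == worker1]
      labels:
        - "traefik.frontend.rule=Host:worker1.syncthing.mydomain.com"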
Any help would be greatly appreciated.
PS:
I run everything as arm on raspberry pi.
Docker version 17.05.0-ce, build 89658be
docker-compose version 1.9.0, build 2585387
traefik:cancoillotte

Docker - Run app on swarm manager (can't connect)

TLDR version:
How can I verify/set ports 7946 & 4789 on my swarm node so that I can view my application running from my docker-machine?
Complete question:
I am going through the docker tutorials and am on step 4
https://docs.docker.com/get-started/part4/#accessing-your-cluster
When I get to the "accessing your cluster" section, it says I should just be able to grab the IP address from one of my nodes displayed using docker-machine ls. I run that command, see the IP, grab it, and put it into my browser (or alternatively use curl), and I receive the error:
This site can’t be reached
192.168.99.100 refused to connect.
Try:
Checking the connection
Checking the proxy and the firewall
ERR_CONNECTION_REFUSED
Below this step there is a note about port settings to check, which I assume applies before you run:
docker-machine ssh myvm1 "docker swarm init --advertise-addr <myvm1 ip>"
The note says:
Having connectivity trouble?
Keep in mind that in order to use the ingress network in the swarm, you need to have the following ports open between the swarm nodes before you enable swarm mode:
Port 7946 TCP/UDP for container network discovery.
Port 4789 UDP for the container ingress network.
I've spent the last few days going through documentation, redoing the steps and trying everything I can to have this work but nothing has been successful.
Can anyone explain/provide documentation to show me how to view/set these ports, or explain if I am missing some other important information?
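For reference, on Ubuntu hosts using ufw, opening the swarm ports looks roughly like this (a sketch; boot2docker-based docker-machine VMs usually ship with these ports open already, so check your actual hosts and firewall):
# container network discovery (gossip)
sudo ufw allow 7946/tcp
sudo ufw allow 7946/udp
# container ingress (VXLAN overlay) network
sudo ufw allow 4789/udp
# cluster management traffic to the manager
sudo ufw allow 2377/tcp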
UPDATE
I wasn't able to get swarm working, so I decided to just run everything from a docker-compose.yml file. Here is the code I used below:
docker-compose.yml file:
version: '3'
services:
  www:
    build: .
    ports:
      - "80:80"
    links:
      - db
    depends_on:
      - db
    volumes:
      - .:/opt/www
  db:
    image: mysql:5.7
    volumes:
      - /var/lib/mysql
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: supersecure
      MYSQL_DATABASE: test_db
      MYSQL_USER: jake
      MYSQL_PASSWORD: supersecure
and a Dockerfile located in the same directory containing the following:
# A simple Flask app container.
FROM python:2.7
LABEL maintainer="your name here"
# Place app in container.
ADD . /opt/www
WORKDIR /opt/www
# Install dependencies.
RUN pip install -r requirements.txt
EXPOSE 80
ENV FLASK_APP index.py
ENV FLASK_DEBUG 1
CMD python index.py
You'll need to create any other files referenced in these two files (for example requirements.txt and index.py); they all live in the same directory as the Dockerfile and docker-compose.yml. Please comment if anyone has questions.
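For illustration, a hypothetical minimal pair of files that would satisfy the Dockerfile above (the real app contents are not in the original post; this stand-in just serves on port 80 to match EXPOSE 80 and CMD python index.py):
# requirements.txt
flask
# index.py
from flask import Flask

app = Flask(__name__)

@app.route('/')
def index():
    return 'Hello from the www service'

if __name__ == '__main__':
    # bind on all interfaces, port 80, to match the Dockerfile
    app.run(host='0.0.0.0', port=80, debug=True)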

Deploy a docker stack on one node (co-schedule containers like docker swarm)

I'm aware that docker-compose with the standalone docker-swarm (which is now legacy) is able to co-schedule some services on one node (using dependency filters such as link).
I was wondering if this kind of co-scheduling is possible using the modern docker engine swarm mode and the stack deployment introduced in Docker 1.13.
In docker-compose file version 3, links are said to be ignored while deploying a stack in a swarm, so obviously links aren't the solution.
We have a bunch of servers for running short batch jobs, and the network between them is not very fast. We want to run each batch job (which consists of multiple containers) on one server to avoid networking overhead. Is this feature implemented in docker stack or docker swarm mode, or should we use the legacy docker-swarm?
Also, I couldn't find co-scheduling with another container among the placement policies.
@Roman: You are right.
To deploy to a specific node you need to use a placement constraint:
version: '3'
services:
  job1:
    image: example/job1
    deploy:
      placement:
        constraints:
          - node.hostname == node-1
    networks:
      - example
  job2:
    image: example/job2
    deploy:
      placement:
        constraints:
          - node.hostname == node-1
    networks:
      - example
networks:
  example:
    driver: overlay
You can still keep depends_on in the file, though docker stack deploy ignores it.
It's worth having a look at dockerize too.
