Docker swarm redis connectivity issue

I have the following docker-compose.yml redis config:
version: '3.5'
services:
  db:
    image: redis:latest
    command: redis-server --bind 0.0.0.0 --appendonly yes --protected-mode no
    ports:
      - target: 6379
        published: 6379
        protocol: tcp
        mode: ingress
There are two hosts: leader-0 (manager) and redis-0 (worker).
> root@leader-0:~# docker node ls
ID                           HOSTNAME   STATUS
46tmallxr4l8xr7i90vlwntjq *  leader-0   Ready
mofbedj4sqlxgnyatbxhlokc7    redis-0    Ready
The Redis host redis-0 exposes port 6379 on localhost as expected:
> root@redis-0:~# redis-cli -h 127.0.0.1 ping
PONG
but 6379 is not available on the manager (although it should be):
> root@leader-0:~# redis-cli -h 127.0.0.1 ping
Could not connect to Redis at 127.0.0.1:6379: Connection timed out
The interesting part is:
Connection timed out (not refused).
redis-cli -h 127.0.0.1 ping on the other worker hosts works as expected (returns PONG).
Docker's overlay mesh network should expose port 6379 on the local interface of each host, but it looks like something went wrong, and I've had no luck figuring out what exactly.
Other services on the manager host work properly (I can
curl http://localhost:${SERVICE_PORT}/).
The manager host has the same firewall rules as the worker hosts (plus additional ports opened).
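One thing worth checking (a sketch, not from the original post): the ingress routing mesh depends on the swarm's own ports being open between every node, and a timeout (rather than a refusal) on a published port is consistent with the overlay data plane being dropped by a firewall:

# Ports the routing mesh needs open between ALL nodes:
#   2377/tcp - cluster management
#   7946/tcp and 7946/udp - node-to-node gossip
#   4789/udp - VXLAN traffic carrying the overlay/ingress data plane
# Example using ufw (assuming ufw is the firewall in use):
sudo ufw allow 7946/tcp
sudo ufw allow 7946/udp
sudo ufw allow 4789/udp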

Related

Docker Swarm Networking - no communication to some exposed ports

I have the following docker-compose file:
version: "3.9"
services:
pihole:
container_name: pihole
image: pihole/pihole:latest
ports:
- target: 53
published: 53
protocol: tcp
mode: host
- target: 53
published: 53
protocol: udp
mode: host
# - target: 80
# published: 80
# protocol: tcp
# mode: host
environment:
TZ: 'Europe/Warsaw'
DNS1: 1.1.1.1
DNS2: 8.8.8.8
VIRTUAL_HOST: 'pihole.local'
volumes:
- ./etc/pihole/:/etc/pihole
- ./etc-dnsmasq.d:/etc/dnsmasq.d
dns:
- 1.1.1.1
- 8.8.8.8
cap_add:
- NET_ADMIN
restart: unless-stopped
networks:
- public
networks:
public:
Working solution with docker-compose
Running this with:
docker-compose --file docker-compose-pihole.yml up -d
exposes ports 53 tcp/udp on the host IP address
$ nmap 172.30.0.100 -Pn
Starting Nmap 7.80 ( https://nmap.org ) at 2022-01-02 10:42 CET
Nmap scan report for 172.30.0.100
Host is up (0.0038s latency).
Not shown: 998 filtered ports
PORT STATE SERVICE
22/tcp open ssh
53/tcp open domain
and DNS resolution is working
$ nslookup google.pl 172.30.0.100
Server: 172.30.0.100
Address: 172.30.0.100#53
Non-authoritative answer:
Name: google.pl
Address: 172.217.16.3
Name: google.pl
Address: 2a00:1450:401b:804::2003
and I'm able to telnet to port 53
$ telnet 172.30.0.100 53
Trying 172.30.0.100...
Connected to 172.30.0.100.
Escape character is '^]'.
NOT Working solution with docker stack deploy
Running the same docker-compose file with
docker stack deploy -c docker-compose-pihole.yml pihole
also exposes port 53 tcp/udp on the host IP address
$ nmap 172.30.0.100 -Pn
Starting Nmap 7.80 ( https://nmap.org ) at 2022-01-02 10:46 CET
Nmap scan report for 172.30.0.100
Host is up (0.0022s latency).
Not shown: 998 filtered ports
PORT STATE SERVICE
22/tcp open ssh
53/tcp open domain
however name resolution is not working
nslookup google.pl 172.30.0.100
;; connection timed out; no servers could be reached
and a telnet connection to port 53 is closed by the remote host
$ telnet 172.30.0.100 53
Trying 172.30.0.100...
Connected to 172.30.0.100.
Escape character is '^]'.
Connection closed by foreign host.
Another strange thing is when port 80 is exposed: in both cases I can access the web UI on port 80 by connecting to the host IP.
I have no idea what's going on and how to fix communication on port 53.
Fixed.
One ENV was missing for pihole:
DNSMASQ_LISTENING: all
Two days to figure this out!
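For reference, a sketch of where that variable goes in the mapping-style environment block used above:

environment:
  TZ: 'Europe/Warsaw'
  DNS1: 1.1.1.1
  DNS2: 8.8.8.8
  VIRTUAL_HOST: 'pihole.local'
  DNSMASQ_LISTENING: 'all'   # accept DNS queries arriving on any interface,
                             # including the overlay/ingress interfaces that
                             # swarm routes traffic through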

Consul agent. Check socket connection failed: error="dial tcp 172.19.0.6:50044: connect: connection refused"

I am having trouble with microservice health checks in my Consul docker setup, which I believe is a symptom of a failure in service discovery, as I only have one server in my registry.
Below is the consul member list from inside the docker container.
/ # consul members
Node          Address          Status  Type    Build  Protocol  DC   Segment
7b1edb14a647  172.19.0.6:8301  alive   server  1.7.4  2         dc1  <all>
/ #
Consul container logs repeat the same error below for all the microservices:
consul | 2020-06-16T12:19:11.087Z [WARN] agent: Check socket connection failed: check=service:ffa44b66c4869601c04abdbea6dc5be5 error="dial tcp 172.19.0.6:50044: connect: connection refused"
I am using a docker-compose file (version 3.2) to create a network for the containers.
This is the consul service definition:
consul:
  container_name: consul
  ports:
    - '8400:8400'
    - '8500:8500'
    - '8600:53/udp'
  image: consul
  command: ['agent', '-server', '-bootstrap', '-ui', '-client', '0.0.0.0']
Microservice definition:
service-notification:
  build:
    context: .
    dockerfile: apps/service-notification/Dockerfile
    args:
      NODE_ENV: development
  depends_on:
    - consul
  image: 'service-notification:latest'
  restart: always
  environment:
    - CONSUL_HOST=consul
  ports:
    - '50044:50044'
I am using the CONSUL_HOST env variable to pass in the correct host URL.
Consul config for the microservice:
consul:
  host: ${{CONSUL_HOST}}
  port: 8500
service:
  discoveryHost: ${{CONSUL_HOST}}
  healthCheck:
    timeout: 1s
    interval: 10s
    tcp: ${{ service.discoveryHost }}:${{ service.port }}
  maxRetry: 5
  retryInterval: 5000
  tags: ["v1.0.0", "microservice"]
  name: io.ultimatebackend.srv.notification
  port: 50044
My conclusion so far is that the consul server container somehow fails to reach the agents. But I don't know why, and I feel like I am missing some obvious piece of consul's structure. Please advise.
I was incorrectly configuring my service. The discoveryHost should be the IP and port of a micro-service inside the docker network.
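A sketch of the corrected check config, assuming the service names from the compose file above (inside the compose network, the name service-notification resolves to the microservice container):

consul:
  host: consul                          # the consul container, for API calls
  port: 8500
service:
  discoveryHost: service-notification   # the microservice itself, not consul,
                                        # so the TCP check hits the right target
  healthCheck:
    timeout: 1s
    interval: 10s
    tcp: ${{ service.discoveryHost }}:${{ service.port }}
  name: io.ultimatebackend.srv.notification
  port: 50044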

Traefik 2.2 cannot connect to Docker Swarm API over TCP

Running Docker 18.09.7ce with Docker API v1.39 on Ubuntu 18.04 LTS.
I'm trying to set up Traefik 2.2 as a reverse proxy for some swarm services but for some reason Traefik can't connect to the Docker daemon via the TCP port given in the Traefik documentation. These three error messages keep repeating.
level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" status_code=-1 module=api
level=error msg="Failed to retrieve information of the docker client and server host: Cannot connect to the Docker daemon at tcp://127.0.0.1:2377. Is the docker daemon running?" providerName=docker
level=error msg="Provider connection error Cannot connect to the Docker daemon at tcp://127.0.0.1:2377. Is the docker daemon running?, retrying in 1.461723532s" providerName=docker
It's running on a manager node (I only have one node) and the swarm is working fine, with the API exposed via that TCP port, as shown by the output of the following command.
$ sudo ss --tcp --listening --processes --numeric | grep ":2377"
LISTEN 0 128 *:2377 *:* users:(("dockerd",pid=30747,fd=23))
My architecture is based on this blog post, with a shared overlay network called proxy created with docker network create --driver=overlay proxy.
I tried this but it didn't work, and I can't really find any other related questions. Here are my configuration files:
traefik.toml
[providers.docker]
  endpoint = "tcp://127.0.0.1:2377"
  swarmMode = true
  network = "proxy"

[entryPoints]
  [entryPoints.web]
    address = ":80"
  [entryPoints.web-secure]
    address = ":443"

[certificatesResolvers.le.acme]
  email = "my-email@email.com"
  storage = "/letsencrypt/acme.json"
  caserver = "https://acme-staging-v02.api.letsencrypt.org/directory" # For testing
  [certificatesResolvers.le.acme.httpChallenge]
    entryPoint = "web"

[log]
  level = "DEBUG"
traefik.yml
version: "3.7"
services:
reverse-proxy:
deploy:
placement:
constraints:
- node.role == manager
image: "traefik:v2.2"
ports:
- 80:80
- 443:443
volumes:
- "/var/run/docker.sock:/var/run/docker.sock"
- "/path/to/traefik.toml:/etc/traefik/traefik.toml"
- "letsencrypt:/letsencrypt"
networks:
- "proxy"
networks:
proxy:
external: true
volumes:
letsencrypt:
The only difference I can see is that the blog does not explicitly define an endpoint for the docker provider. Maybe try removing that?
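For what it's worth, 2377/tcp is Swarm's cluster-management port, not the Engine API, so a likely fix (a sketch, assuming the socket mount from the stack file above) is to point the provider at the socket, which is also the provider's default endpoint:

[providers.docker]
  # unix:///var/run/docker.sock is the docker provider's default endpoint;
  # the stack file already bind-mounts the socket into the container
  endpoint = "unix:///var/run/docker.sock"
  swarmMode = true
  network = "proxy"

This requires the container to run on a manager node, which the placement constraint in the stack file already ensures.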

JetBrains/Teamtools in docker container "Could not listen on address 0.0.0.0 and port 443"

Problem
I'm trying to set up JetBrains Hub, YouTrack, Upsource and TeamCity in docker containers and configure each to be available on its own IP (macvlan) at the default ports: 80 redirected to 443, and 443 for HTTPS (so the port numbers do not show up in the browser).
However if I do that I get:
Could not listen on address 0.0.0.0 and port 443
Leaving the teamtools on their default ports 8080 and 8443 works, and giving them ports above 2000 seems to work as well.
I checked with fuser 443/tcp and netstat -tulpn, but there is nothing running on port 80 or 443 (I had to install the packages for those in the container).
I tried setting the listening address to the NIC's IP or 172.0.0.1, but this is refused as well:
root@teamtools [ /opt/teamtools ]# docker run --rm -it \
-v /opt/hub/data:/opt/hub/data \
-v /opt/hub/conf:/opt/hub/conf \
-v /opt/hub/logs:/opt/hub/logs \
-v /opt/hub/backups:/opt/hub/backups \
jetbrains/hub:2018.2.9840 \
configure --listen-address=192.168.1.211
* Configuring JetBrains Hub 2018.2
* Setting property 'listen-address' to '192.168.1.211' from arguments
[APP-WRAPPER] Failed to configure Hub: java.util.concurrent.ExecutionException: com.jetbrains.bundle.exceptions.BadConfigurationException: Could not listen on address {192.168.1.211} . Please specify another listen address in property listen-address
Question:
Why can I not set ports 80 and 443?
Why does it work for ports over 2000?
How can I make this work without a reverse proxy?
(reverse-proxy comes with a whole bunch of other issues, that I'm trying to avoid with this setup)
Setup
ESXi 6.7 Host
- vSwitch0 (Allow promiscuous mode: Yes)
  - port group: VM Network (Allow promiscuous mode: No)
    - other VMs
  - port group: Promiscuous Ports (Allow promiscuous mode: Yes)
    - Teamtools VM (Photon OS 2.0, IP: 192.168.1.210)
      - firewall based on: https://unrouted.io/2017/08/15/docker-firewall/
      - docker/docker-compose
        - hub (IP: 192.168.1.211:80/443)
        - youtrack (IP: 192.168.1.212:80/443)
        - upsource (IP: 192.168.1.213:80/443)
        - teamcity-server (IP: 192.168.1.214:80/443)
        - teamcity_db (MariaDB 10.3) (IP: 192.168.1.215:3306)
docker-compose.yml
version: '2'
networks:
  macnet:
    driver: macvlan
    driver_opts:
      parent: eth0
    ipam:
      config:
        - subnet: 192.168.1.0/24
          gateway: 192.168.1.1
services:
  hub:
    # set a custom container name so no more than one container can be created from this config
    container_name: hub
    image: "jetbrains/hub:2018.2.9840"
    restart: unless-stopped
    volumes:
      - /opt/hub/data:/opt/hub/data
      - /opt/hub/conf:/opt/hub/conf
      - /opt/hub/logs:/opt/hub/logs
      - /opt/hub/backups:/opt/hub/backups
      - /opt/teamtools:/opt/teamtools
    expose:
      - "80"
      - "443"
      - "8080"
      - "8443"
    networks:
      macnet:
        ipv4_address: 192.168.1.211
    domainname: office.mydomain.com
    hostname: hub
    environment:
      - "JAVA_OPTS=-J-Djavax.net.ssl.trustStore=/opt/teamtools/certs/keyStore.p12 -J-Djavax.net.ssl.trustStorePassword=xxxxxxxxxxxxxx"
...
Upsource is run by the user jetbrains, which is non-root, and non-root processes cannot bind privileged ports:
https://www.w3.org/Daemon/User/Installation/PrivilegedPorts.html
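In other words, on Linux only root (or a process with CAP_NET_BIND_SERVICE) may bind ports below 1024, which is why the defaults 8080/8443 and anything above 2000 work while 80/443 fail. Two common workarounds, sketched on the assumption of a recent kernel; the Java binary path is hypothetical:

# grant the capability to the binary that actually binds the socket
# (path below is a guess; find the real one with `readlink -f $(which java)`):
setcap 'cap_net_bind_service=+ep' /opt/hub/internal/java/bin/java
# or lower the privileged-port threshold (kernel >= 4.11):
sysctl net.ipv4.ip_unprivileged_port_start=80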

How does service discovery work with modern docker/docker-compose?

I'm using Docker 1.11.1 and docker-compose 1.8.0-rc2.
In the good old days (so, last year), you could set up a docker-compose.yml file like this:
app:
  image: myapp
frontend:
  image: myfrontend
  links:
    - app
And then start up the environment like this:
docker-compose scale app=3 frontend=1
And your frontend container could inspect the environment variables
for variables named APP_1_PORT, APP_2_PORT, etc to discover the
available backend hosts and configure itself accordingly.
Times have changed. Now, we do this...
version: '2'
services:
  app:
    image: myapp
  frontend:
    image: myfrontend
    links:
      - app
...and instead of environment variables, we get DNS. So inside the frontend container, I can ask for app_app_1 or app_app_2 or app_app_3 and get the corresponding IP address. I can also ask for app and get the address of app_app_1.
But how do I discover all of the available backend containers? I
guess I could loop over getent hosts ... until it fails:
counter=1
while :; do
    getent hosts app_$counter || break
    backends="$backends app_$counter"
    let counter++
done
But that seems ugly and fragile.
I've heard rumors about round-robin DNS, but (a) that doesn't seem to be happening in my test environment, and (b) that doesn't necessarily help if your frontend needs simultaneous connections to the backends.
How is simple container and service discovery meant to work in the
modern Docker world?
Docker's built-in Nameserver & Loadbalancer
Docker comes with a built-in nameserver. The server is, by default, reachable via 127.0.0.11:53.
Every container has by default a nameserver entry in /etc/resolv.conf, so it is not required to specify the address of the nameserver from within the container. That is why you can find your service from within the network with service or task_service_n.
If you do task_service_n then you will get the address of the corresponding service replica.
If you only ask for the service, docker will perform internal load balancing between containers in the same network, and external load balancing to handle requests from outside.
When swarm is used, docker will additionally use two special networks.
The ingress network, which is actually an overlay network, handles incoming traffic to the swarm. It allows querying any service from any node in the swarm.
The docker_gwbridge, a bridge network, connects the overlay networks of the individual hosts to their physical network (including ingress).
When using swarm to deploy services, the behavior described in the examples below will not work unless endpoint_mode is set to dnsrr (DNS round-robin) instead of vip.
endpoint_mode: vip - Docker assigns the service a virtual IP (VIP) that acts as the front end for clients to reach the service on a network. Docker routes requests between the client and available worker nodes for the service, without client knowledge of how many nodes are participating in the service or their IP addresses or ports. (This is the default.)
endpoint_mode: dnsrr - DNS round-robin (DNSRR) service discovery does not use a single virtual IP. Docker sets up DNS entries for the service such that a DNS query for the service name returns a list of IP addresses, and the client connects directly to one of these. DNS round-robin is useful in cases where you want to use your own load balancer, or for Hybrid Windows and Linux applications.
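In a compose/stack file, the mode is set under deploy; a minimal sketch (endpoint_mode only takes effect when deployed to a swarm):

version: '3.8'
services:
  whoami:
    image: "traefik/whoami"
    deploy:
      replicas: 3
      endpoint_mode: dnsrr   # DNS round-robin instead of the default VIP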
Example
For example, deploy three replicas from dig/docker-compose.yml:
version: '3.8'
services:
  whoami:
    image: "traefik/whoami"
    deploy:
      replicas: 3
DNS Lookup
You can use tools such as dig or nslookup to do a DNS lookup against the nameserver in the same network.
docker run --rm --network dig_default tutum/dnsutils dig whoami
; <<>> DiG 9.9.5-3ubuntu0.2-Ubuntu <<>> whoami
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 58433
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;whoami. IN A
;; ANSWER SECTION:
whoami. 600 IN A 172.28.0.3
whoami. 600 IN A 172.28.0.2
whoami. 600 IN A 172.28.0.4
;; Query time: 0 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Mon Nov 16 22:36:37 UTC 2020
;; MSG SIZE rcvd: 90
If you are only interested in the IP, you can provide the +short option
docker run --rm --network dig_default tutum/dnsutils dig +short whoami
172.28.0.3
172.28.0.4
172.28.0.2
Or look for a specific service:
docker run --rm --network dig_default tutum/dnsutils dig +short dig_whoami_2
172.28.0.4
Load balancing
The default load balancing happens on the transport layer, or layer 4 of the OSI model, so it is TCP/UDP based. That means it is not possible to inspect and manipulate HTTP headers with this method. In the enterprise edition it is apparently possible to use labels similar to the ones Traefik is using in the example a bit further down.
docker run --rm --network dig_default curlimages/curl -Ls http://whoami
Hostname: eedc94d45bf4
IP: 127.0.0.1
IP: 172.28.0.3
RemoteAddr: 172.28.0.5:43910
GET / HTTP/1.1
Host: whoami
User-Agent: curl/7.73.0-DEV
Accept: */*
Here is the hostname from 10 curl invocations:
Hostname: eedc94d45bf4
Hostname: 42312c03a825
Hostname: 42312c03a825
Hostname: 42312c03a825
Hostname: eedc94d45bf4
Hostname: d922d86eccc6
Hostname: d922d86eccc6
Hostname: eedc94d45bf4
Hostname: 42312c03a825
Hostname: d922d86eccc6
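Those responses can be reproduced with a small loop in the same network (a sketch using the same images and network name as above):

for i in $(seq 10); do
  docker run --rm --network dig_default curlimages/curl -Ls http://whoami \
    | grep Hostname
done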
Health Checks
Health checks, by default, are done by checking the process id (PID) of the container on the host kernel. If the process is running successfully, the container is considered healthy.
Oftentimes other health checks are required. The container may be running but the application inside has crashed. In many cases a TCP or HTTP check is preferred.
It is possible to bake custom health checks into images. For example, using curl to perform L7 health checks:
FROM traefik/whoami
HEALTHCHECK CMD curl --fail http://localhost || exit 1
It is also possible to specify the health check via the CLI when starting the container.
docker run \
    --health-cmd "curl --fail http://localhost || exit 1" \
    --health-interval=5s \
    --health-timeout=3s \
    traefik/whoami
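The compose equivalent would be a healthcheck block on the service, for example:

services:
  whoami:
    image: traefik/whoami
    healthcheck:
      # assumes curl exists in the image, as in the Dockerfile example above
      test: ["CMD", "curl", "--fail", "http://localhost"]
      interval: 5s
      timeout: 3s
      retries: 3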
Example with Swarm
As initially mentioned, swarm's behavior is different in that it assigns a virtual IP to services by default. Actually, it's not that swarm is different; it's that docker or docker-compose doesn't create real services, it just imitates the behavior of swarm while still running the containers normally, since services can, in fact, only be created by manager nodes.
Keeping in mind that we are on a swarm manager, and thus the default mode is VIP:
Create an overlay network that can be used by regular containers too
$ docker network create --driver overlay --attachable testnet
Create a service with 2 replicas
$ docker service create --network testnet --replicas 2 --name digme nginx
Now let's use dig again, making sure we attach the container to the same network
$ docker run --network testnet --rm tutum/dnsutils dig digme
digme. 600 IN A 10.0.18.6
We see that we indeed only got one IP address back, so this appears to be the virtual IP assigned by docker.
Swarm actually allows getting the individual IPs in this case without explicitly setting the endpoint mode.
We can query for tasks.<servicename>, in this case tasks.digme:
$ docker run --network testnet --rm tutum/dnsutils dig tasks.digme
tasks.digme. 600 IN A 10.0.18.7
tasks.digme. 600 IN A 10.0.18.8
This has brought us 2 A records pointing to the individual replicas.
Now let's create another service with endpoint mode set to DNS round-robin:
docker service create --endpoint-mode dnsrr --network testnet --replicas 2 --name digme2 nginx
$ docker run --network testnet --rm tutum/dnsutils dig digme2
digme2. 600 IN A 10.0.18.21
digme2. 600 IN A 10.0.18.20
This way we get both IPs without adding the tasks. prefix.
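The mode a service uses can be verified with docker service inspect; a quick check, which for the second service should report dnsrr:

$ docker service inspect digme2 --format '{{ .Spec.EndpointSpec.Mode }}'
dnsrr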
Service Discovery & Loadbalancing Strategies
If the built-in features are not sufficient, some strategies can be implemented to achieve better control. Below are some examples.
HAProxy
HAProxy can use the docker nameserver in combination with dynamic server templates to discover the running containers. Traditional proxy features can then be leveraged to achieve powerful layer 7 load balancing with HTTP header manipulation and chaos engineering such as retries.
version: '3.8'
services:
  loadbalancer:
    image: haproxy
    volumes:
      - ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
    ports:
      - 80:80
      - 443:443
  whoami:
    image: "traefik/whoami"
    deploy:
      replicas: 3
...
resolvers docker
    nameserver dns1 127.0.0.11:53
    resolve_retries 3
    timeout resolve 1s
    timeout retry   1s
    hold other      10s
    hold refused    10s
    hold nx         10s
    hold timeout    10s
    hold valid      10s
    hold obsolete   10s
...
backend whoami
    balance leastconn
    option httpchk
    option redispatch 1
    retry-on all-retryable-errors
    retries 2
    http-request disable-l7-retry if METH_POST
    dynamic-cookie-key MY_SERVICES_HASHED_ADDRESS
    cookie MY_SERVICES_HASHED_ADDRESS insert dynamic
    server-template whoami- 6 whoami:80 check resolvers docker init-addr libc,none
...
Traefik
The previous method is already pretty decent. However, you may have noticed that it requires knowing which services should be discovered, and that the number of replicas to discover is hard-coded. Traefik, a container-native edge router, solves both problems: as long as we enable Traefik via a label, the service will be discovered. This decentralizes the configuration; it is as if each service registers itself.
Labels can also be used to inspect and manipulate HTTP headers.
version: "3.8"
services:
traefik:
image: "traefik:v2.3"
command:
- "--log.level=DEBUG"
- "--api.insecure=true"
- "--providers.docker=true"
- "--providers.docker.exposedbydefault=false"
- "--entrypoints.web.address=:80"
ports:
- "80:80"
- "8080:8080"
volumes:
- "/var/run/docker.sock:/var/run/docker.sock:ro"
whoami:
image: "traefik/whoami"
labels:
- "traefik.enable=true"
- "traefik.port=80"
- "traefik.http.routers.whoami.entrypoints=web"
- "traefik.http.routers.whoami.rule=PathPrefix(`/`)"
- "traefik.http.services.whoami.loadbalancer.sticky=true"
- "traefik.http.services.whoami.loadbalancer.sticky.cookie.name=MY_SERVICE_ADDRESS"
deploy:
replicas: 3
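Once the stack is up, requests to Traefik on port 80 are spread across the replicas; a quick check (sticky sessions only apply once the client returns the cookie):

for i in $(seq 5); do
  curl -s http://localhost/ | grep Hostname
done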
Consul
Consul is a tool for service discovery and configuration management. Services have to be registered via API request. It is a more complex solution that probably only makes sense in bigger clusters, but can be very powerful. Usually it is recommended to run it on bare metal and not in a container; you could install it alongside the docker host on each server in your cluster.
In this example it has been paired with the registrator image, which takes care of registering the docker services in consul's catalog.
The catalog can be leveraged in many ways. One of them is to use consul-template.
Note that consul comes with its own DNS resolver, so in this instance the docker DNS resolver is somewhat neglected.
version: '3.8'
services:
  consul:
    image: gliderlabs/consul-server:latest
    command: "-advertise=${MYHOST} -server -bootstrap"
    container_name: consul
    hostname: ${MYHOST}
    ports:
      - 8500:8500
  registrator:
    image: gliderlabs/registrator:latest
    command: "-ip ${MYHOST} consul://${MYHOST}:8500"
    container_name: registrator
    hostname: ${MYHOST}
    depends_on:
      - consul
    volumes:
      - /var/run/docker.sock:/tmp/docker.sock
  proxy:
    build: .
    ports:
      - 80:80
    depends_on:
      - consul
  whoami:
    image: "traefik/whoami"
    deploy:
      replicas: 3
    ports:
      - "80"
Dockerfile for the custom proxy image with consul-template baked in:
FROM nginx
RUN curl https://releases.hashicorp.com/consul-template/0.25.1/consul-template_0.25.1_linux_amd64.tgz \
> consul-template_0.25.1_linux_amd64.tgz
RUN gunzip -c consul-template_0.25.1_linux_amd64.tgz | tar xvf -
RUN mv consul-template /usr/sbin/consul-template
RUN rm /etc/nginx/conf.d/default.conf
ADD proxy.conf.ctmpl /etc/nginx/conf.d/
ADD consul-template.hcl /
CMD [ "/bin/bash", "-c", "/etc/init.d/nginx start && consul-template -config=consul-template.hcl" ]
Consul-template takes a template file and renders it according to the content of consul's catalog.
upstream whoami {
{{ range service "whoami" }}
    server {{ .Address }}:{{ .Port }};
{{ end }}
}

server {
    listen 80;
    location / {
        proxy_pass http://whoami;
    }
}
After the rendered template changes, the reload command configured below is executed.
consul {
  address = "consul:8500"
  retry {
    enabled  = true
    attempts = 12
    backoff  = "250ms"
  }
}
template {
  source          = "/etc/nginx/conf.d/proxy.conf.ctmpl"
  destination     = "/etc/nginx/conf.d/proxy.conf"
  perms           = 0600
  command         = "/etc/init.d/nginx reload"
  command_timeout = "60s"
}
Feature Table

                   Built In   HAProxy           Traefik       Consul-Template
Resolver           Docker     Docker            Docker        Consul
Service Discovery  Automatic  Server Templates  Label System  KV Store + Template
Health Checks      Yes        Yes               Yes           Yes
Load Balancing     L4         L4, L7            L4, L7        L4, L7
Sticky Session     No         Yes               Yes           Depends on proxy
Metrics            No         Stats Page        Dashboard     Dashboard
You can view some of the code samples in more detail on GitHub.
