Not resovalbe hostnames of RabbitMQ cluster when deployed on Docker Swarm using the replica parameter - docker

I use Docker Swarm to deploy 3 instances of RabbitMQ and Consul for the peer discovery.
version: '3.7'
services:
rabbitmq-1:
image: rabbitmq:3.8.9-alpine
hostname: "rabbitmq-1"
environment:
- RABBITMQ_ERLANG_COOKIE="SECRET_COOKIE"
networks:
- prod-net
configs:
- source: rabbitmq-config
target: /etc/rabbitmq/rabbitmq.conf
- source: rabbitmq-plugins-config
target: /etc/rabbitmq/enabled_plugins
volumes:
- /shared/rabbitmq/1/config:/etc/rabbitmq
- /shared/rabbitmq/1/data:/var/lib/rabbitmq
deploy:
replicas: 1
rabbitmq-2:
image: rabbitmq:3.8.9-alpine
hostname: "rabbitmq-2"
environment:
- RABBITMQ_ERLANG_COOKIE="SECRET_COOKIE"
networks:
- prod-net
configs:
- source: rabbitmq-config
target: /etc/rabbitmq/rabbitmq.conf
- source: rabbitmq-plugins-config
target: /etc/rabbitmq/enabled_plugins
volumes:
- /shared/rabbitmq/2/config:/etc/rabbitmq
- /shared/rabbitmq/2/data:/var/lib/rabbitmq
deploy:
replicas: 1
rabbitmq-3:
image: rabbitmq:3.8.9-alpine
hostname: "rabbitmq-3"
environment:
- RABBITMQ_ERLANG_COOKIE="SECRET_COOKIE"
networks:
- prod-net
configs:
- source: rabbitmq-config
target: /etc/rabbitmq/rabbitmq.conf
- source: rabbitmq-plugins-config
target: /etc/rabbitmq/enabled_plugins
volumes:
- /shared/rabbitmq/3/config:/etc/rabbitmq
- /shared/rabbitmq/3/data:/var/lib/rabbitmq
deploy:
replicas: 1
networks:
consul-net:
external: true
name: prod-net
configs:
rabbitmq-config:
external: true
rabbitmq-plugins-config:
external: true
RabbitMQ registers itself with Consul by its hostname so that the nodes can discover each other. So all the hostnames must resolve on all nodes. In the example above I'm using the same value for both the hostname and the service name to achieve this. But I would like to use a more compact way to represent a cluster by using a replicas: 3 parameter.
version: '3.7'
services:
rabbitmq:
image: rabbitmq:3.8.9-alpine
hostname: "rabbitmq-{{.Task.Slot}}"
networks:
- prod-net
configs:
- source: rabbitmq-config
target: /etc/rabbitmq/rabbitmq.conf
- source: rabbitmq-plugins-config
target: /etc/rabbitmq/enabled_plugins
deploy:
replicas: 3
restart_policy:
condition: any
networks:
prod-net:
external: true
configs:
rabbitmq-config:
external: true
rabbitmq-plugins-config:
external: true
Sadly in this case the hostnames are not anymore resolvable by docker DNS and the nodes cannot see each other. It there a way to achieve this ?
Edit.1:
Using these in rabbitmq-config gives the errors bellow:
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_dns
cluster_formation.dns.hostname = tasks.rabbitmq
2020-10-15 22:06:01.573 [error] <0.272.0> attempted to contact: ['rabbit#rabbitmq-3','rabbit#rabbitmq-2']
2020-10-15 22:06:01.574 [error] <0.272.0> rabbit#rabbitmq-3:
2020-10-15 22:06:01.574 [error] <0.272.0> * unable to connect to epmd (port 4369) on rabbitmq-3: couldn't resolve hostname
2020-10-15 22:06:01.574 [error] <0.272.0> rabbit#rabbitmq-2:
2020-10-15 22:06:01.575 [error] <0.272.0> * unable to connect to epmd (port 4369) on rabbitmq-2: couldn't resolve hostname
Edit.2:
rabbitmq.conf using Consul peer discovery
# Credentials
default_user = admin
default_pass = Pa$$w0rd1
loopback_users.admin = false
vm_memory_high_watermark.absolute = 1024MiB
disk_free_limit.absolute = 5GB
loopback_users.guest = false
# TLS Support in RabbitMQ
listeners.ssl.default = 5671
# Disables non-TLS listeners, only TLS-enabled clients will be able to connect
listeners.tcp = none
ssl_options.cacertfile = /etc/rabbitmq/ca_certificate.pem
ssl_options.certfile = /etc/rabbitmq/server_certificate.pem
ssl_options.keyfile = /etc/rabbitmq/server_key.pem
ssl_options.password = Pa$$phr#se
ssl_options.verify = verify_peer
ssl_options.fail_if_no_peer_cert = true
# TLS Support in RabbitMQ UI
management.ssl.port = 15671
management.ssl.cacertfile = /etc/rabbitmq/ca_certificate.pem
management.ssl.certfile = /etc/rabbitmq/server_certificate.pem
management.ssl.keyfile = /etc/rabbitmq/server_key.pem
management.ssl.password = Pa$$phr#se
management.ssl.verify = verify_none
management.ssl.fail_if_no_peer_cert = false
management.ssl.honor_cipher_order = true
management.ssl.honor_ecc_order = true
management.ssl.client_renegotiation = false
management.ssl.secure_renegotiate = true
cluster_partition_handling = autoheal
# Consul peer discovery
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul
# Consul host (hostname or IP address)
cluster_formation.consul.host = consul_client
# Service name (as registered in Consul) defaults to "rabbitmq"
cluster_formation.consul.svc = rabbitmq
# Compute service address (if not specify it below)
cluster_formation.consul.svc_addr_auto = true
cluster_formation.consul.use_longname = true
cluster_formation.consul.svc_ttl = 50
cluster_formation.consul.deregister_after = 100
cluster_formation.node_cleanup.only_log_warning = true
Or using DNS peer discovery:
#DNS Peer Discovery
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_dns
cluster_formation.dns.hostname = tasks.rabbitmq

To achieve DNS resolve between RabbitMQ replicas, need to modify /etc/hosts of each RabbitMQ container.
This will allow to form RabbitMQ cluster via rabbtimq configuration file.
See demo on how to create RabbitMQ cluster using single docker swarm service. I removed references to Consul and TLS from rabbitmq.conf for simplicity.

Related

Bad gateway on Portainer on Docker swarm

noob here: I am trying to run portainer in a docker swarm.
I've been searching for a solution but I had no luck so far...
The traefik gui loads fine and I can see the portainer service running.Any ideas?
Please see config and compose files:
traefik.toml
[global]
checkNewVersion = true
# Dashboard
[api]
dashboard = true
# Traefik logs
[log]
level = "DEBUG"
filePath = "/traefik.log"
[entryPoints.http]
address = ":80"
[entryPoints.http.http.redirections.entryPoint]
to = "https"
scheme = "https"
[entryPoints.https]
address = ":443"
[entryPoints.https.http.tls]
certResolver = "main"
# Let's Encrypt
[certificatesResolvers.main.acme]
email = "name#gmail.com"
storage = "acme.json"
[certificatesResolvers.main.acme.dnsChallenge]
provider = "route53"
# Docker Traefik provider
[providers.docker]
endpoint = "unix:///var/run/docker.sock"
swarmMode = true
watch = true
exposedByDefault = false
[tls.options]
[tls.options.default]
minVersion = "VersionTLS13"
sniStrict = true
[tls.options.tls12]
minVersion = "VersionTLS12"
sinStrict = true
cipherSuites = [
"TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384",
"TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
"TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
"TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305",
"TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305"
]
docker-compose.yml
version: '3.2'
services:
agent:
image: portainer/agent:latest
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- /var/lib/docker/volumes:/var/lib/docker/volumes
networks:
- traefik_public
deploy:
mode: global
placement:
constraints: [node.platform.os == linux]
portainer:
image: portainer/portainer-ee:latest
#command: -H tcp://tasks.agent:9001 --tlsskipverify
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- /home/user/webapp/portainer/portainer_data:/data
networks:
- traefik_public
deploy:
labels:
- "traefik.enable=true"
- "traefik.http.routers.portainer.tls.certresolver=main"
- "traefik.http.routers.portainer.rule=Host(`portainer.mydomain.com`)"
- "traefik.http.services.portainer.loadbalancer.server.port=9000"
- "traefik.http.routers.portainer.entrypoints=https"
- "traefik.http.services.portainer.loadbalancer.server.scheme=https"
- "traefik.http.routers.portainer.tls=true"
mode: replicated
placement:
constraints: [node.role == manager]
networks:
traefik_public:
external: true
and this is the output of the traefik debug log :
error msg="Could not define the service name for the router: too many services" providerName=docker routerName=traefik-secure
debug msg="Filtering disabled container" container=portainer-agent-bn1xdlt8xbnzapa0fdjxadjiq providerName=docker
debug msg="Configuration received: {\"http\":{\"routers\":{\"api\":{\"entryPoints\":[\"https\"],\"service\":\"api#internal\",\"rule\":\"Host(`traefik.example.com`)\",\"tls\":{\"certResolver\":\"main\"}},\"portainer\":{\"entryPoints\":[\"https\"],\"service\":\"portainer\",\"rule\":\"Host(`portainer.example.com`)\"}},\"services\":{\"dummy\":{\"loadBalancer\":{\"servers\":[{\"url\":\"http://10.0.1.6:9999\"}],\"passHostHeader\":true}},\"portainer\":{\"loadBalancer\":{\"servers\":[{\"url\":\"http://10.0.1.12:9000\"}],\"passHostHeader\":true}},\"traefik\":{\"loadBalancer\":{\"servers\":[{\"url\":\"http://10.0.1.6:8080\"}],\"passHostHeader\":true}}}},\"tcp\":{},\"udp\":{}}" providerName=docker
debug msg="Skipping unchanged configuration." providerName=docker

Traefik V2 docker provider path /ws routing to external websocket server

Stack:
serverX
docker
traefik
x-site.com
redirect :80->:443
https://x-site.com/* -> x-site docker container
wss://x-site.com/ws proxy -> ws://external.websocket.com:8083/ws HOW TO?
y-site.com
...
external.websocket.com - works perfectly with all kinds of mqtt,ws clients, except web
software: emqx
listens: ws:8083 mqtt:1883
protocol: ws, mqtt
proxy protocol support: on
docker-compose.yml
version: '3.7'
services:
router:
container_name: router
image: traefik:v2.6
restart: on-failure:5
ports:
- target: 80
published: 80
mode: host
- target: 443
published: 443
mode: host
- target: 8080
published: 8080
mode: host
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- ./router:/etc/traefik
x-site:
container_name: x-site
image: backend
restart: on-failure:5
env_file:
- /.default.env
- /.server.env
- /root/env/x-site.env
labels:
traefik.enable: true
traefik.http.routers.x-site.rule: Host(`x-site.com`)
traefik.http.routers.x-site.entrypoints: web,websecure
traefik.http.routers.x-site.tls: true
# tried service loadbalancer and etc
# traefik.http.routers.ws_x-site.rule: Host(`x-site.com`) && PathPrefix(`/ws`)
# traefik.http.routers.ws_x-site.entrypoints: websecure
# traefik.http.routers.ws_x-site.service: websocket_x-site
# traefik.http.services.websocket_x-site.loadBalancer.server.url: ws://external.websocket.com:8083/ws
traefik.yaml
api:
insecure: true
dashboard: true
entryPoints:
web:
address: ":80"
http:
redirections:
entryPoint:
to: websecure
scheme: https
permanent: true
transport:
respondingTimeouts:
readTimeout: 10
writeTimeout: 10
idleTimeout: 10
lifeCycle:
requestAcceptGraceTimeout: 5
graceTimeOut: 5
websecure:
address: ":443"
transport:
respondingTimeouts:
readTimeout: 300
writeTimeout: 3600
idleTimeout: 180
lifeCycle:
requestAcceptGraceTimeout: 30
graceTimeOut: 30
providers:
docker:
endpoint: unix:///var/run/docker.sock
exposedByDefault: false
file:
filename: /etc/traefik/cert.yaml
log:
level: ERROR

Docker swarm : can't curl to a service container

I ve a service running under a stack swarm :
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
de74ba4d48c1 myregistry/myApi:1.0 "java -Dfile.encodin…" 3 minutes ago Up 3 minutes 8300/tcp myApiCtn
As you can see , my service is running on the 8300 port.
The probleme is that when i run curl ; it seems to not reply:
[user#server home]$ curl http://localhost:8300/api/elk/batch
curl: (52) Empty reply from server
In another side if i ran my container manually (without stack and without swarm services )
(docker run ...)
-> curl works well
My docker-compose is the following :
---
version: '3.4'
services:
api-batch:
image: myRegistry/myImageApi
networks:
- net_common
- default
stdin_open: true
volumes:
- /opt/application/current/logs:/opt/application/current/logs
- /var/opt/data/flat/flf/:/var/opt/data/flat/flf/
tty: true
ports:
- target: 8300
published: 8300
protocol: tcp
deploy:
mode: global
resources:
limits:
memory: 1024M
placement:
constraints:
- node.labels.type == test
healthcheck:
disable: true
networks:
net_common:
external: true
Where my networks list is the following :
NETWORK ID NAME DRIVER SCOPE
17795bfee9ca bridge bridge local
0faecb070730 docker_gwbridge bridge local
51c34d251495 host host local
j2nnf26asn3k ingress overlay swarm
3all3tmn3qn9 net_common overlay swarm
b7alw2yi5fk9 srcd-current_default overlay swarm
Any suggestion to make it work under swarm service ?

raspberry / docker swarm / traefik / portainer and no HTTPS

I spent the last 3 days trying to use traefik for HTTPS, load balancer, and to connect portainer and other docker containers in swarm mode. It is a home-server cluster made with 4 raspberrys, and what I want is the SSL auto-certificate function, and the HTTP to HTTPS redirection. For that purpose I've created a traefik.toml file:
logLevel = "DEBUG"
defaultEntryPoints = ["http", "https"]
[web]
[entryPoints]
[entryPoints.http]
address = ":80"
[entryPoints.http.redirect]
entryPoint = "https"
[entryPoints.https]
address = ":443"
[entryPoints.https.tls]
[acme]
email = "xxx#xxx.com"
storage = "acme.json"
entryPoint = "https"
OnHostRule = true
[acme.httpChallenge]
entryPoint = "http"
[docker]
domain = "traefik" #<---- WHAT SHOULD I WRITE HERE?
watch = true
swarmmode = true
I don't know what should I write in the DOMAIN variable. I use NoIP as my dynamic DNS provider. Should I write the domain I get from them? and that should work inside my network? i.e. accesing from a computer inside my network with: 192.168.11.100
And I also have a docker-compose.yml file:
version: "3.4"
services:
proxy:
image: traefik:latest
command:
- "--api"
- "--entrypoints=Name:http Address::80 Redirect.EntryPoint:https"
- "--entrypoints=Name:https Address::443 TLS"
- "--defaultentrypoints=http,https"
- "--acme"
- "--acme.storage=/etc/traefik/acme/acme.json"
- "--acme.entryPoint=https"
- "--acme.httpChallenge.entryPoint=http"
- "--acme.onHostRule=true"
- "--acme.onDemand=false"
- "--acme.email=xxx#xxx.com"
- "--docker"
- "--docker.swarmMode"
- "--docker.domain=traefik.localhost" <- WHAT SHOULD I PUT IN HERE??
- "--docker.watch"
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- /mnt/traefik/acme.json:/etc/traefik/acme/acme.json
networks:
- appnet
ports:
- target: 80
published: 80
mode: host
- target: 443
published: 443
mode: host
- target: 8080
published: 8080
mode: host
deploy:
mode: global
placement:
constraints:
- node.role == manager
update_config:
parallelism: 1
delay: 10s
restart_policy:
condition: on-failure
networks:
appnet:
external: true
Deploy the stack, then I write in firefox in another computer 192.168.11.100, and I can see the "Welcome to nginx page". No HTTPS by the way. Try 192.168.11.100:8080 for the traefik dashboard. It is there, but again only HTTP.
If I deploy portainer, looks like it connects with traefik (at least appear in the dashboard), but again only HTTP.
Here's the logs for the traefik container after deploying portainer:
time="2019-02-19T11:32:52Z" level=error msg="Unable to obtain ACME certificate for domains \"portainer.com\" detected thanks to rule \"Host:portainer.com\" : unable to generate a certificate for the domains [portainer.com]: acme: Error -> One or more domains had a problem:\n[portainer.com] acme: error: 403 :: urn:ietf:params:acme:error:unauthorized :: Invalid response from http://portainer.com/.well-known/acme-challenge/eDN0Z2VJRzuZm9wiAbar1BOVHLPJ5qPYKBpwfuJOtdY: \"<!doctype html><html><head><meta charset=\\\"utf-8\\\"><meta http-equiv=\\\"x-ua-compatible\\\" content=\\\"ie=edge\\\"><meta name=\\\"viewport\\\" cont\", url: \n"
time="2019-02-19T11:33:15Z" level=error msg="Unable to obtain ACME certificate for domains \"portainer.com\" detected thanks to rule \"Host:portainer.com\" : unable to generate a certificate for the domains [portainer.com]: acme: Error -> One or more domains had a problem:\n[portainer.com] acme: error: 403 :: urn:ietf:params:acme:error:unauthorized :: Invalid response from http://portainer.com/.well-known/acme-challenge/Of6CWm4zvCdPo0BFPTxapEVXPU-qf7hhl1f6NCUTmQw: \"<!doctype html><html><head><meta charset=\\\"utf-8\\\"><meta http-equiv=\\\"x-ua-compatible\\\" content=\\\"ie=edge\\\"><meta name=\\\"viewport\\\" cont\", url: \n"
Am I missing something?

Traefik ACME DNS challenge not working with docker

I'm trying to configure Traefik as a proxy for docker containers running on DigitalOcean servers.
Here's my Traefik container configuration:
version: '2'
services:
traefik:
image: traefik
restart: always
command: --docker
ports:
- 80:80
- 443:443
networks:
- proxy
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- $PWD/traefik.toml:/traefik.toml
- $PWD/acme.json:/acme.json
container_name: traefik
environment:
DO_AUTH_TOKEN: abcd
labels:
- traefik.frontend.rule=Host:monitor.example.com
- traefik.port=8080
networks:
proxy:
external: true
And traefik.toml,
defaultEntryPoints = ["http", "https"]
[web]
address = ":8080"
[web.auth.basic]
users = ["admin:secretpassword"]
[entryPoints]
[entryPoints.http]
address = ":80"
[entryPoints.http.redirect]
entryPoint = "https"
[entryPoints.https]
address = ":443"
[entryPoints.https.tls]
[acme]
email = "lakshmi#example.com"
storage = "acme.json"
entryPoint = "https"
onHostRule = true
onDemand = false
[acme.dnsChallenge]
provider = "digitalocean"
delayBeforeCheck = 0
When I try to access https://monitor.example.com, I get this error:
traefik | time="2018-05-29T15:35:32Z" level=error msg="Unable to obtain ACME certificate for domains \"monitor.example.com\" detected thanks to rule \"Host:monitor.example.com\" : cannot obtain certificates: acme: Error -> One or more domains had a problem:\n[monitor.example.com] Error presenting token: HTTP 403: forbidden: You do not have access for the attempted action.\n"
I have given a valid DO token and pointed monitor.example.com to the VM running Traefik. Am I missing any step?
I was getting a 403 because Traefik was trying to write a TXT entry for ACME DNS challenge in my DigitalOcean domain using a read-only token. I changed it to a read-write token and it worked fine.
For anyone else having this issue, make sure acme.json has 600 permissions. Don't create or touch acme.json yourself. Let Traefik create it. After the pod is created, check permissions on acme.json.
The problem I found is Traefik creates acme.json and sets it to 600. After running upgrade, acme.json changed to 660 and starting giving the 'unknown resolver letsencrypt' error. The fix was having to uncomment the 'initContainers' lines in the values.yml in the Traefik Helm chart. Basically it sets permissions to 600 before startup. Hacky but works.
deployment:
enabled: true
# Can be either Deployment or DaemonSet
kind: Deployment
replicas: 1
annotations: {}
labels: {}
podAnnotations: {}
podLabels: {}
additionalContainers: []
volumeMounts:
- name: csi-pvc
initContainers:
- name: volume-permissions
image: busybox:1.31.1
command: ["sh", "-c", "chmod -Rv 600 /data/*"]
volumeMounts:
- name: csi-pvc
mountPath: /data
dnsPolicy: ClusterFirstWithHostNet
imagePullSecrets: []

Resources