Bad gateway on Portainer on Docker swarm

Noob here: I am trying to run Portainer in a Docker swarm behind Traefik, but I get a Bad Gateway error.
I've been searching for a solution, but I've had no luck so far.
The Traefik GUI loads fine and I can see the Portainer service running. Any ideas?
Please see the config and compose files:
traefik.toml
[global]
checkNewVersion = true
# Dashboard
[api]
dashboard = true
# Traefik logs
[log]
level = "DEBUG"
filePath = "/traefik.log"
[entryPoints.http]
address = ":80"
[entryPoints.http.http.redirections.entryPoint]
to = "https"
scheme = "https"
[entryPoints.https]
address = ":443"
[entryPoints.https.http.tls]
certResolver = "main"
# Let's Encrypt
[certificatesResolvers.main.acme]
email = "name@gmail.com"
storage = "acme.json"
[certificatesResolvers.main.acme.dnsChallenge]
provider = "route53"
# Docker Traefik provider
[providers.docker]
endpoint = "unix:///var/run/docker.sock"
swarmMode = true
watch = true
exposedByDefault = false
[tls.options]
[tls.options.default]
minVersion = "VersionTLS13"
sniStrict = true
[tls.options.tls12]
minVersion = "VersionTLS12"
sniStrict = true
cipherSuites = [
"TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384",
"TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
"TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
"TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305",
"TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305"
]
docker-compose.yml
version: '3.2'
services:
  agent:
    image: portainer/agent:latest
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/lib/docker/volumes:/var/lib/docker/volumes
    networks:
      - traefik_public
    deploy:
      mode: global
      placement:
        constraints: [node.platform.os == linux]
  portainer:
    image: portainer/portainer-ee:latest
    #command: -H tcp://tasks.agent:9001 --tlsskipverify
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /home/user/webapp/portainer/portainer_data:/data
    networks:
      - traefik_public
    deploy:
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.portainer.tls.certresolver=main"
        - "traefik.http.routers.portainer.rule=Host(`portainer.mydomain.com`)"
        - "traefik.http.services.portainer.loadbalancer.server.port=9000"
        - "traefik.http.routers.portainer.entrypoints=https"
        - "traefik.http.services.portainer.loadbalancer.server.scheme=https"
        - "traefik.http.routers.portainer.tls=true"
      mode: replicated
      placement:
        constraints: [node.role == manager]
networks:
  traefik_public:
    external: true
And this is the output of the Traefik debug log:
error msg="Could not define the service name for the router: too many services" providerName=docker routerName=traefik-secure
debug msg="Filtering disabled container" container=portainer-agent-bn1xdlt8xbnzapa0fdjxadjiq providerName=docker
debug msg="Configuration received: {\"http\":{\"routers\":{\"api\":{\"entryPoints\":[\"https\"],\"service\":\"api@internal\",\"rule\":\"Host(`traefik.example.com`)\",\"tls\":{\"certResolver\":\"main\"}},\"portainer\":{\"entryPoints\":[\"https\"],\"service\":\"portainer\",\"rule\":\"Host(`portainer.example.com`)\"}},\"services\":{\"dummy\":{\"loadBalancer\":{\"servers\":[{\"url\":\"http://10.0.1.6:9999\"}],\"passHostHeader\":true}},\"portainer\":{\"loadBalancer\":{\"servers\":[{\"url\":\"http://10.0.1.12:9000\"}],\"passHostHeader\":true}},\"traefik\":{\"loadBalancer\":{\"servers\":[{\"url\":\"http://10.0.1.6:8080\"}],\"passHostHeader\":true}}}},\"tcp\":{},\"udp\":{}}" providerName=docker
debug msg="Skipping unchanged configuration." providerName=docker

Related

Hostnames of RabbitMQ cluster not resolvable when deployed on Docker Swarm using the replicas parameter

I use Docker Swarm to deploy 3 instances of RabbitMQ and Consul for the peer discovery.
version: '3.7'
services:
  rabbitmq-1:
    image: rabbitmq:3.8.9-alpine
    hostname: "rabbitmq-1"
    environment:
      - RABBITMQ_ERLANG_COOKIE="SECRET_COOKIE"
    networks:
      - prod-net
    configs:
      - source: rabbitmq-config
        target: /etc/rabbitmq/rabbitmq.conf
      - source: rabbitmq-plugins-config
        target: /etc/rabbitmq/enabled_plugins
    volumes:
      - /shared/rabbitmq/1/config:/etc/rabbitmq
      - /shared/rabbitmq/1/data:/var/lib/rabbitmq
    deploy:
      replicas: 1
  rabbitmq-2:
    image: rabbitmq:3.8.9-alpine
    hostname: "rabbitmq-2"
    environment:
      - RABBITMQ_ERLANG_COOKIE="SECRET_COOKIE"
    networks:
      - prod-net
    configs:
      - source: rabbitmq-config
        target: /etc/rabbitmq/rabbitmq.conf
      - source: rabbitmq-plugins-config
        target: /etc/rabbitmq/enabled_plugins
    volumes:
      - /shared/rabbitmq/2/config:/etc/rabbitmq
      - /shared/rabbitmq/2/data:/var/lib/rabbitmq
    deploy:
      replicas: 1
  rabbitmq-3:
    image: rabbitmq:3.8.9-alpine
    hostname: "rabbitmq-3"
    environment:
      - RABBITMQ_ERLANG_COOKIE="SECRET_COOKIE"
    networks:
      - prod-net
    configs:
      - source: rabbitmq-config
        target: /etc/rabbitmq/rabbitmq.conf
      - source: rabbitmq-plugins-config
        target: /etc/rabbitmq/enabled_plugins
    volumes:
      - /shared/rabbitmq/3/config:/etc/rabbitmq
      - /shared/rabbitmq/3/data:/var/lib/rabbitmq
    deploy:
      replicas: 1
networks:
  consul-net:
    external: true
    name: prod-net
configs:
  rabbitmq-config:
    external: true
  rabbitmq-plugins-config:
    external: true
RabbitMQ registers itself with Consul by its hostname so that the nodes can discover each other. So all the hostnames must resolve on all nodes. In the example above I'm using the same value for both the hostname and the service name to achieve this. But I would like to use a more compact way to represent a cluster by using a replicas: 3 parameter.
version: '3.7'
services:
  rabbitmq:
    image: rabbitmq:3.8.9-alpine
    hostname: "rabbitmq-{{.Task.Slot}}"
    networks:
      - prod-net
    configs:
      - source: rabbitmq-config
        target: /etc/rabbitmq/rabbitmq.conf
      - source: rabbitmq-plugins-config
        target: /etc/rabbitmq/enabled_plugins
    deploy:
      replicas: 3
      restart_policy:
        condition: any
networks:
  prod-net:
    external: true
configs:
  rabbitmq-config:
    external: true
  rabbitmq-plugins-config:
    external: true
Sadly, in this case the hostnames are no longer resolvable by Docker DNS and the nodes cannot see each other. Is there a way to achieve this?
Edit.1:
Using these settings in rabbitmq-config gives the errors below:
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_dns
cluster_formation.dns.hostname = tasks.rabbitmq
2020-10-15 22:06:01.573 [error] <0.272.0> attempted to contact: ['rabbit@rabbitmq-3','rabbit@rabbitmq-2']
2020-10-15 22:06:01.574 [error] <0.272.0> rabbit@rabbitmq-3:
2020-10-15 22:06:01.574 [error] <0.272.0> * unable to connect to epmd (port 4369) on rabbitmq-3: couldn't resolve hostname
2020-10-15 22:06:01.574 [error] <0.272.0> rabbit@rabbitmq-2:
2020-10-15 22:06:01.575 [error] <0.272.0> * unable to connect to epmd (port 4369) on rabbitmq-2: couldn't resolve hostname
Edit.2:
rabbitmq.conf using Consul peer discovery
# Credentials
default_user = admin
default_pass = Pa$$w0rd1
loopback_users.admin = false
vm_memory_high_watermark.absolute = 1024MiB
disk_free_limit.absolute = 5GB
loopback_users.guest = false
# TLS Support in RabbitMQ
listeners.ssl.default = 5671
# Disables non-TLS listeners, only TLS-enabled clients will be able to connect
listeners.tcp = none
ssl_options.cacertfile = /etc/rabbitmq/ca_certificate.pem
ssl_options.certfile = /etc/rabbitmq/server_certificate.pem
ssl_options.keyfile = /etc/rabbitmq/server_key.pem
ssl_options.password = Pa$$phr#se
ssl_options.verify = verify_peer
ssl_options.fail_if_no_peer_cert = true
# TLS Support in RabbitMQ UI
management.ssl.port = 15671
management.ssl.cacertfile = /etc/rabbitmq/ca_certificate.pem
management.ssl.certfile = /etc/rabbitmq/server_certificate.pem
management.ssl.keyfile = /etc/rabbitmq/server_key.pem
management.ssl.password = Pa$$phr#se
management.ssl.verify = verify_none
management.ssl.fail_if_no_peer_cert = false
management.ssl.honor_cipher_order = true
management.ssl.honor_ecc_order = true
management.ssl.client_renegotiation = false
management.ssl.secure_renegotiate = true
cluster_partition_handling = autoheal
# Consul peer discovery
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul
# Consul host (hostname or IP address)
cluster_formation.consul.host = consul_client
# Service name (as registered in Consul) defaults to "rabbitmq"
cluster_formation.consul.svc = rabbitmq
# Compute service address (if not specify it below)
cluster_formation.consul.svc_addr_auto = true
cluster_formation.consul.use_longname = true
cluster_formation.consul.svc_ttl = 50
cluster_formation.consul.deregister_after = 100
cluster_formation.node_cleanup.only_log_warning = true
Or using DNS peer discovery:
#DNS Peer Discovery
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_dns
cluster_formation.dns.hostname = tasks.rabbitmq
To make the RabbitMQ replicas resolve each other's hostnames, you need to modify /etc/hosts in each RabbitMQ container.
This allows the RabbitMQ cluster to be formed via the RabbitMQ configuration file.
See the demo on how to create a RabbitMQ cluster using a single Docker Swarm service. I removed references to Consul and TLS from rabbitmq.conf for simplicity.
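A minimal sketch of the /etc/hosts idea, in Python. It only shows how task slots could be mapped to the rabbitmq-{{.Task.Slot}} hostnames from the question. The function name, the sample IP addresses, and the way the addresses are discovered are assumptions made for illustration, not taken from the demo.

```python
from typing import Dict


def render_hosts_entries(task_ips: Dict[int, str], prefix: str = "rabbitmq") -> str:
    """Build /etc/hosts lines that map each replica's IP to '<prefix>-<slot>'.

    task_ips is a hypothetical mapping of Swarm task slot -> container IP;
    how those IPs are collected is outside the scope of this sketch.
    """
    lines = [f"{ip}\t{prefix}-{slot}" for slot, ip in sorted(task_ips.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Sample addresses are made up for the example.
    print(render_hosts_entries({1: "10.0.1.7", 2: "10.0.1.8", 3: "10.0.1.9"}), end="")
```

The output would have to be appended to /etc/hosts inside every replica before RabbitMQ starts, so that each node can resolve the names of its peers.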

raspberry / docker swarm / traefik / portainer and no HTTPS

I spent the last 3 days trying to use Traefik for HTTPS and load balancing, and to connect Portainer and other Docker containers in swarm mode. It is a home-server cluster made of 4 Raspberry Pis, and what I want is automatic SSL certificates and HTTP-to-HTTPS redirection. For that purpose I've created a traefik.toml file:
logLevel = "DEBUG"
defaultEntryPoints = ["http", "https"]
[web]
[entryPoints]
[entryPoints.http]
address = ":80"
[entryPoints.http.redirect]
entryPoint = "https"
[entryPoints.https]
address = ":443"
[entryPoints.https.tls]
[acme]
email = "xxx@xxx.com"
storage = "acme.json"
entryPoint = "https"
OnHostRule = true
[acme.httpChallenge]
entryPoint = "http"
[docker]
domain = "traefik" #<---- WHAT SHOULD I WRITE HERE?
watch = true
swarmmode = true
I don't know what I should write in the domain variable. I use NoIP as my dynamic DNS provider. Should I write the domain I get from them? And would that work inside my network, i.e. when accessing from a computer inside my network with 192.168.11.100?
And I also have a docker-compose.yml file:
version: "3.4"
services:
  proxy:
    image: traefik:latest
    command:
      - "--api"
      - "--entrypoints=Name:http Address::80 Redirect.EntryPoint:https"
      - "--entrypoints=Name:https Address::443 TLS"
      - "--defaultentrypoints=http,https"
      - "--acme"
      - "--acme.storage=/etc/traefik/acme/acme.json"
      - "--acme.entryPoint=https"
      - "--acme.httpChallenge.entryPoint=http"
      - "--acme.onHostRule=true"
      - "--acme.onDemand=false"
      - "--acme.email=xxx@xxx.com"
      - "--docker"
      - "--docker.swarmMode"
      - "--docker.domain=traefik.localhost"  # <- WHAT SHOULD I PUT IN HERE??
      - "--docker.watch"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /mnt/traefik/acme.json:/etc/traefik/acme/acme.json
    networks:
      - appnet
    ports:
      - target: 80
        published: 80
        mode: host
      - target: 443
        published: 443
        mode: host
      - target: 8080
        published: 8080
        mode: host
    deploy:
      mode: global
      placement:
        constraints:
          - node.role == manager
      update_config:
        parallelism: 1
        delay: 10s
      restart_policy:
        condition: on-failure
networks:
  appnet:
    external: true
I deploy the stack, then open 192.168.11.100 in Firefox on another computer, and I see the "Welcome to nginx" page. No HTTPS, by the way. I try 192.168.11.100:8080 for the Traefik dashboard: it is there, but again only over HTTP.
If I deploy Portainer, it looks like it connects with Traefik (at least it appears in the dashboard), but again only over HTTP.
Here are the logs for the Traefik container after deploying Portainer:
time="2019-02-19T11:32:52Z" level=error msg="Unable to obtain ACME certificate for domains \"portainer.com\" detected thanks to rule \"Host:portainer.com\" : unable to generate a certificate for the domains [portainer.com]: acme: Error -> One or more domains had a problem:\n[portainer.com] acme: error: 403 :: urn:ietf:params:acme:error:unauthorized :: Invalid response from http://portainer.com/.well-known/acme-challenge/eDN0Z2VJRzuZm9wiAbar1BOVHLPJ5qPYKBpwfuJOtdY: \"<!doctype html><html><head><meta charset=\\\"utf-8\\\"><meta http-equiv=\\\"x-ua-compatible\\\" content=\\\"ie=edge\\\"><meta name=\\\"viewport\\\" cont\", url: \n"
time="2019-02-19T11:33:15Z" level=error msg="Unable to obtain ACME certificate for domains \"portainer.com\" detected thanks to rule \"Host:portainer.com\" : unable to generate a certificate for the domains [portainer.com]: acme: Error -> One or more domains had a problem:\n[portainer.com] acme: error: 403 :: urn:ietf:params:acme:error:unauthorized :: Invalid response from http://portainer.com/.well-known/acme-challenge/Of6CWm4zvCdPo0BFPTxapEVXPU-qf7hhl1f6NCUTmQw: \"<!doctype html><html><head><meta charset=\\\"utf-8\\\"><meta http-equiv=\\\"x-ua-compatible\\\" content=\\\"ie=edge\\\"><meta name=\\\"viewport\\\" cont\", url: \n"
Am I missing something?

I'm trying to configure traefik + docker, but the browser loads the https url forever, do you know why?

I'm trying to configure Traefik + Docker but I'm having trouble: the browser loads the URL forever.
This is my current configuration:
traefik.toml
debug = false
logLevel = "ERROR"
defaultEntryPoints = ["https","http"]
[entryPoints]
[entryPoints.http]
address = ":80"
[entryPoints.http.redirect]
entryPoint = "https"
# https is the default
[entryPoints.https]
address = ":443"
[entryPoints.https.tls]
[docker]
endpoint = "unix:///var/run/docker.sock"
domain = "cloud.castignoli.it"
watch = true
exposedByDefault = false
[acme]
email = "marco.castignoli@gmail.com"
storage = "acme.json"
entryPoint = "https"
onHostRule = true
[acme.httpChallenge]
entryPoint = "http"
Then I have acme.json, which Traefik has filled with the correct values.
I'm trying to activate HTTPS for the container foo; the domain is hello.cloud.castignoli.it.
foo has only this label:
traefik.frontend.rule=Host:hello.cloud.castignoli.it
These are traefik's logs
time="2018-10-11T08:04:50Z" level=error msg="Unable to obtain ACME certificate for domains \"reverse-proxy.traefik.\" detected thanks to rule \"Host:reverse-proxy.traefik.\" : unable to generate a certificate for the domains [reverse-proxy.traefik.]: acme: Error 400 - urn:ietf:params:acme:error:malformed - Error creating new order :: DNS name ends in a period"
This is the Traefik dashboard: [image: traefik's dashboard]
The problem is that Traefik is trying to generate a certificate for its own container under a non-existent domain (reverse-proxy.traefik. in the log above).
In docker-compose.yml, set labels with your own domain on the Traefik service, or do not use --api. For example:
image: traefik
command: --api --docker
ports:
  - "80:80"
  - "443:443"
  - "8080:8080"
networks:
  - web
volumes:
  - /var/run/docker.sock:/var/run/docker.sock
  - /opt/traefik/traefik.toml:/traefik.toml
  - /opt/traefik/acme.json:/acme.json
labels:
  - "traefik.docker.network=web"
  - "traefik.port=8081"
  - "traefik.enable=true"
  - "traefik.frontend.rule=Host:your-awesome-host.com"
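The trailing period in the log can be reproduced with a small Python sketch. It assumes that Traefik 1.x falls back to a default rule of the form service.project.domain for Compose services that carry no traefik.frontend.rule label, and that the service and project were named reverse-proxy and traefik (inferred from the log, not confirmed by the question). The function name is hypothetical.

```python
def default_compose_host(service: str, project: str, domain: str) -> str:
    """Compose the fallback host name as '<service>.<project>.<domain>'.

    This mirrors the assumed Traefik 1.x default rule; it is an
    illustration, not Traefik's actual implementation.
    """
    return f"{service}.{project}.{domain}"


if __name__ == "__main__":
    # With an empty domain, the name ends in a period, which the ACME
    # server in the question's log rejected.
    print(default_compose_host("reverse-proxy", "traefik", ""))
    # With a domain set, the name is well-formed.
    print(default_compose_host("reverse-proxy", "traefik", "cloud.castignoli.it"))
```

Setting an explicit traefik.frontend.rule label, as in the example above, avoids relying on the fallback name at all.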

Traefik ACME DNS challenge not working with docker

I'm trying to configure Traefik as a proxy for docker containers running on DigitalOcean servers.
Here's my Traefik container configuration:
version: '2'
services:
  traefik:
    image: traefik
    restart: always
    command: --docker
    ports:
      - 80:80
      - 443:443
    networks:
      - proxy
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - $PWD/traefik.toml:/traefik.toml
      - $PWD/acme.json:/acme.json
    container_name: traefik
    environment:
      DO_AUTH_TOKEN: abcd
    labels:
      - traefik.frontend.rule=Host:monitor.example.com
      - traefik.port=8080
networks:
  proxy:
    external: true
And traefik.toml,
defaultEntryPoints = ["http", "https"]
[web]
address = ":8080"
[web.auth.basic]
users = ["admin:secretpassword"]
[entryPoints]
[entryPoints.http]
address = ":80"
[entryPoints.http.redirect]
entryPoint = "https"
[entryPoints.https]
address = ":443"
[entryPoints.https.tls]
[acme]
email = "lakshmi@example.com"
storage = "acme.json"
entryPoint = "https"
onHostRule = true
onDemand = false
[acme.dnsChallenge]
provider = "digitalocean"
delayBeforeCheck = 0
When I try to access https://monitor.example.com, I get this error:
traefik | time="2018-05-29T15:35:32Z" level=error msg="Unable to obtain ACME certificate for domains \"monitor.example.com\" detected thanks to rule \"Host:monitor.example.com\" : cannot obtain certificates: acme: Error -> One or more domains had a problem:\n[monitor.example.com] Error presenting token: HTTP 403: forbidden: You do not have access for the attempted action.\n"
I have given a valid DO token and pointed monitor.example.com to the VM running Traefik. Am I missing any step?
I was getting a 403 because Traefik was trying to write a TXT entry for ACME DNS challenge in my DigitalOcean domain using a read-only token. I changed it to a read-write token and it worked fine.
For anyone else having this issue, make sure acme.json has 600 permissions. Don't create or touch acme.json yourself. Let Traefik create it. After the pod is created, check permissions on acme.json.
The problem I found is that Traefik creates acme.json and sets it to 600. After running an upgrade, acme.json changed to 660 and Traefik started giving the 'unknown resolver letsencrypt' error. The fix was to uncomment the 'initContainers' lines in values.yml in the Traefik Helm chart. Basically, it sets permissions to 600 before startup. Hacky, but it works.
deployment:
  enabled: true
  # Can be either Deployment or DaemonSet
  kind: Deployment
  replicas: 1
  annotations: {}
  labels: {}
  podAnnotations: {}
  podLabels: {}
  additionalContainers: []
  volumeMounts:
    - name: csi-pvc
  initContainers:
    - name: volume-permissions
      image: busybox:1.31.1
      command: ["sh", "-c", "chmod -Rv 600 /data/*"]
      volumeMounts:
        - name: csi-pvc
          mountPath: /data
  dnsPolicy: ClusterFirstWithHostNet
  imagePullSecrets: []
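The 600 requirement on acme.json can also be checked from a script before Traefik starts. The following is a minimal Python sketch; the function name is hypothetical and the path you pass in depends on where acme.json is mounted in your setup.

```python
import os
import stat


def has_owner_only_mode(path: str) -> bool:
    """Return True if the file's permission bits are exactly 600 (rw-------)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == 0o600


if __name__ == "__main__":
    # "acme.json" is a placeholder path for this example.
    target = "acme.json"
    if os.path.exists(target):
        print(target, "ok" if has_owner_only_mode(target) else "needs chmod 600")
```

This only reports the mode; it does not change it, which leaves the decision to fix permissions (as the init container does with chmod) to the operator.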

Docker Traefik and letsencrypt wildcard

I've been trying to get Traefik to install a wildcard cert on my domain, which requires the DNS challenge.
From reading the logs, it seems it was able to issue the cert but not install it correctly:
time="2018-04-07T19:10:35Z" level=debug msg="Unable to marshal provider conf *acme.Provider with error: json: unsupported type: chan *acme.StoredData"
legolog: 2018/04/07 19:10:57 [INFO][example.tld] The server validated our request
legolog: 2018/04/07 19:10:58 [INFO][*.example.tld] acme: Validations succeeded; requesting certificates
legolog: 2018/04/07 19:11:01 [INFO][*.example.tld] Server responded with a certificate.
time="2018-04-07T19:11:01Z" level=error msg="Error loading new configuration, aborted unable to generate TLS certificate : tls: failed to find any PEM data in certificate input"
time="2018-04-07T19:12:33Z" level=debug msg="http2: server: error reading preface from client ******omitted***: remote error: tls: unknown certificate authority"
My domain's DNS provider is Cloudflare.
Here's my docker-compose.yml:
version: '2'
services:
  traefik:
    image: traefik:1.6.0-rc4
    command: --api --docker
    restart: always
    ports:
      - 80:80
      - 443:443
      - 8080:8080
    networks:
      - web
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /opt/traefik/traefik.toml:/traefik.toml
      - /opt/traefik/acme.json:/acme.json
    environment:
      - CLOUDFLARE_EMAIL=admin@example.tld
      - CLOUDFLARE_API_KEY=
    container_name: traefik
networks:
  web:
    external: true
And my traefik.toml
debug = true
logLevel = "DEBUG"
defaultEntryPoints = ["https","http"]
[entryPoints]
[entryPoints.http]
address = ":80"
[entryPoints.http.redirect]
entryPoint = "https"
[entryPoints.https]
address = ":443"
[entryPoints.https.tls]
[retry]
[docker]
endpoint = "unix:///var/run/docker.sock"
domain = "example.tld"
watch = true
exposedbydefault = false
[acme]
email = "admin@example.tld"
storage = "acme.json"
entryPoint = "https"
OnHostRule = true
acmeLogging = true
[acme.dnsChallenge]
provider = "cloudflare"
delayBeforeCheck = 0
[[acme.domains]]
main = "example.tld"
[[acme.domains]]
main = "*.example.tld"
I was able to fix the issue; it was a mistake on my part.
In traefik.toml, you cannot use OnHostRule = true for wildcard certs.
Read more:
docs.traefik.io/v1.7/configuration/acme/#onhostrule
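As a rough illustration of that rule, here is a small Python sketch that flags wildcard entries when onHostRule is enabled. The function name and the way the settings are passed in are assumptions made for the example; it does not read traefik.toml itself.

```python
from typing import List


def wildcard_conflicts(domains: List[str], on_host_rule: bool) -> List[str]:
    """Return the wildcard domains that clash with onHostRule = true.

    domains holds the 'main' values from the [[acme.domains]] entries.
    """
    if not on_host_rule:
        return []
    return [d for d in domains if d.startswith("*.")]


if __name__ == "__main__":
    # Values taken from the traefik.toml in the question.
    print(wildcard_conflicts(["example.tld", "*.example.tld"], on_host_rule=True))
```

An empty result means the combination described in the answer is not present in the given settings.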

Resources