Traefik ACME DNS challenge not working with docker

I'm trying to configure Traefik as a proxy for docker containers running on DigitalOcean servers.
Here's my Traefik container configuration:
version: '2'
services:
  traefik:
    image: traefik
    restart: always
    command: --docker
    ports:
      - 80:80
      - 443:443
    networks:
      - proxy
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - $PWD/traefik.toml:/traefik.toml
      - $PWD/acme.json:/acme.json
    container_name: traefik
    environment:
      DO_AUTH_TOKEN: abcd
    labels:
      - traefik.frontend.rule=Host:monitor.example.com
      - traefik.port=8080
networks:
  proxy:
    external: true
And traefik.toml:
defaultEntryPoints = ["http", "https"]
[web]
address = ":8080"
[web.auth.basic]
users = ["admin:secretpassword"]
[entryPoints]
[entryPoints.http]
address = ":80"
[entryPoints.http.redirect]
entryPoint = "https"
[entryPoints.https]
address = ":443"
[entryPoints.https.tls]
[acme]
email = "lakshmi@example.com"
storage = "acme.json"
entryPoint = "https"
onHostRule = true
onDemand = false
[acme.dnsChallenge]
provider = "digitalocean"
delayBeforeCheck = 0
When I try to access https://monitor.example.com, I get this error:
traefik | time="2018-05-29T15:35:32Z" level=error msg="Unable to obtain ACME certificate for domains \"monitor.example.com\" detected thanks to rule \"Host:monitor.example.com\" : cannot obtain certificates: acme: Error -> One or more domains had a problem:\n[monitor.example.com] Error presenting token: HTTP 403: forbidden: You do not have access for the attempted action.\n"
I have given a valid DO token and pointed monitor.example.com to the VM running Traefik. Am I missing any step?

I was getting a 403 because Traefik was trying to write a TXT entry for ACME DNS challenge in my DigitalOcean domain using a read-only token. I changed it to a read-write token and it worked fine.
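One way to confirm a token's scope before handing it to Traefik is to try creating (and then deleting) a throwaway TXT record through the DigitalOcean API v2 yourself. The following is only a sketch, assuming curl and sed are installed; the function names and the record name `_acme-probe` are made up for illustration and are not part of Traefik.

```shell
# Hypothetical helpers (not part of Traefik): probe whether a DigitalOcean
# token may write DNS records, which the ACME DNS challenge requires.

# Map the HTTP status of the create-record call to a short diagnosis.
explain_status() {
  case "$1" in
    201) echo "ok: token can create DNS records" ;;
    401) echo "unauthorized: token is invalid or revoked" ;;
    403) echo "forbidden: token is probably read-only" ;;
    404) echo "not found: domain is not managed in this account" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Usage: probe_do_token example.com "$DO_AUTH_TOKEN"
probe_do_token() {
  domain=$1
  token=$2
  api="https://api.digitalocean.com/v2/domains/$domain/records"
  body=$(mktemp)
  # Try to create a short-lived TXT record; only the status code goes to stdout.
  status=$(curl -s -o "$body" -w '%{http_code}' -X POST "$api" \
    -H "Authorization: Bearer $token" \
    -H "Content-Type: application/json" \
    -d '{"type":"TXT","name":"_acme-probe","data":"probe","ttl":30}')
  explain_status "$status"
  if [ "$status" = "201" ]; then
    # Best-effort cleanup: pull the numeric record id out of the JSON reply.
    id=$(sed -n 's/.*"id":[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$body" | head -n 1)
    [ -n "$id" ] && curl -s -o /dev/null -X DELETE "$api/$id" \
      -H "Authorization: Bearer $token"
  fi
  rm -f "$body"
}
```

A 403 from this probe would match the "You do not have access for the attempted action" message in the Traefik log above. If the cleanup step fails, the `_acme-probe` record has to be removed by hand in the DigitalOcean control panel.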

For anyone else having this issue, make sure acme.json has 600 permissions. Don't create or touch acme.json yourself; let Traefik create it. After the pod is created, check the permissions on acme.json.
The problem I found is that Traefik creates acme.json and sets it to 600. After running an upgrade, acme.json changed to 660 and Traefik started giving the 'unknown resolver letsencrypt' error. The fix was to uncomment the 'initContainers' lines in values.yml in the Traefik Helm chart. Basically, it sets the permissions to 600 before startup. Hacky, but it works.
deployment:
  enabled: true
  # Can be either Deployment or DaemonSet
  kind: Deployment
  replicas: 1
  annotations: {}
  labels: {}
  podAnnotations: {}
  podLabels: {}
  additionalContainers: []
  volumeMounts:
    - name: csi-pvc
  initContainers:
    - name: volume-permissions
      image: busybox:1.31.1
      command: ["sh", "-c", "chmod -Rv 600 /data/*"]
      volumeMounts:
        - name: csi-pvc
          mountPath: /data
  dnsPolicy: ClusterFirstWithHostNet
  imagePullSecrets: []
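Outside of Helm (for example with a docker-compose setup that bind-mounts acme.json from the host), the same idea as the initContainers step above can be applied by hand before starting Traefik. A minimal sketch, assuming GNU stat on Linux; `ensure_acme_perms` is a hypothetical helper name:

```shell
# Hypothetical helper: make sure the ACME storage file is readable and
# writable by its owner only (mode 600).
# Assumes GNU stat (Linux); on macOS/BSD the equivalent is: stat -f '%Lp' FILE

ensure_acme_perms() {
  file=$1
  if [ ! -f "$file" ]; then
    echo "missing: $file" >&2
    return 1
  fi
  mode=$(stat -c '%a' "$file")
  if [ "$mode" != "600" ]; then
    echo "fixing $file: mode $mode -> 600"
    chmod 600 "$file"
  else
    echo "ok: $file is already 600"
  fi
}
```

For example, `ensure_acme_perms ./acme.json` could be run right before `docker-compose up -d`.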

Related

raspberry / docker swarm / traefik / portainer and no HTTPS

I spent the last 3 days trying to use traefik for HTTPS, load balancing, and connecting portainer and other docker containers in swarm mode. It is a home-server cluster made of 4 Raspberry Pis, and what I want is the automatic SSL certificate function and HTTP to HTTPS redirection. For that purpose I've created a traefik.toml file:
logLevel = "DEBUG"
defaultEntryPoints = ["http", "https"]
[web]
[entryPoints]
[entryPoints.http]
address = ":80"
[entryPoints.http.redirect]
entryPoint = "https"
[entryPoints.https]
address = ":443"
[entryPoints.https.tls]
[acme]
email = "xxx@xxx.com"
storage = "acme.json"
entryPoint = "https"
OnHostRule = true
[acme.httpChallenge]
entryPoint = "http"
[docker]
domain = "traefik" #<---- WHAT SHOULD I WRITE HERE?
watch = true
swarmmode = true
I don't know what I should write in the domain variable. I use NoIP as my dynamic DNS provider. Should I write the domain I get from them? And would that work inside my network, i.e. accessing from a computer inside my network with 192.168.11.100?
And I also have a docker-compose.yml file:
version: "3.4"
services:
  proxy:
    image: traefik:latest
    command:
      - "--api"
      - "--entrypoints=Name:http Address::80 Redirect.EntryPoint:https"
      - "--entrypoints=Name:https Address::443 TLS"
      - "--defaultentrypoints=http,https"
      - "--acme"
      - "--acme.storage=/etc/traefik/acme/acme.json"
      - "--acme.entryPoint=https"
      - "--acme.httpChallenge.entryPoint=http"
      - "--acme.onHostRule=true"
      - "--acme.onDemand=false"
      - "--acme.email=xxx@xxx.com"
      - "--docker"
      - "--docker.swarmMode"
      - "--docker.domain=traefik.localhost" # <- WHAT SHOULD I PUT IN HERE??
      - "--docker.watch"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /mnt/traefik/acme.json:/etc/traefik/acme/acme.json
    networks:
      - appnet
    ports:
      - target: 80
        published: 80
        mode: host
      - target: 443
        published: 443
        mode: host
      - target: 8080
        published: 8080
        mode: host
    deploy:
      mode: global
      placement:
        constraints:
          - node.role == manager
      update_config:
        parallelism: 1
        delay: 10s
      restart_policy:
        condition: on-failure
networks:
  appnet:
    external: true
After deploying the stack, I type 192.168.11.100 into Firefox on another computer and I can see the "Welcome to nginx" page, but without HTTPS. I then try 192.168.11.100:8080 for the traefik dashboard. It is there, but again only over HTTP.
If I deploy portainer, it looks like it connects with traefik (at least it appears in the dashboard), but again only over HTTP.
Here are the logs for the traefik container after deploying portainer:
time="2019-02-19T11:32:52Z" level=error msg="Unable to obtain ACME certificate for domains \"portainer.com\" detected thanks to rule \"Host:portainer.com\" : unable to generate a certificate for the domains [portainer.com]: acme: Error -> One or more domains had a problem:\n[portainer.com] acme: error: 403 :: urn:ietf:params:acme:error:unauthorized :: Invalid response from http://portainer.com/.well-known/acme-challenge/eDN0Z2VJRzuZm9wiAbar1BOVHLPJ5qPYKBpwfuJOtdY: \"<!doctype html><html><head><meta charset=\\\"utf-8\\\"><meta http-equiv=\\\"x-ua-compatible\\\" content=\\\"ie=edge\\\"><meta name=\\\"viewport\\\" cont\", url: \n"
time="2019-02-19T11:33:15Z" level=error msg="Unable to obtain ACME certificate for domains \"portainer.com\" detected thanks to rule \"Host:portainer.com\" : unable to generate a certificate for the domains [portainer.com]: acme: Error -> One or more domains had a problem:\n[portainer.com] acme: error: 403 :: urn:ietf:params:acme:error:unauthorized :: Invalid response from http://portainer.com/.well-known/acme-challenge/Of6CWm4zvCdPo0BFPTxapEVXPU-qf7hhl1f6NCUTmQw: \"<!doctype html><html><head><meta charset=\\\"utf-8\\\"><meta http-equiv=\\\"x-ua-compatible\\\" content=\\\"ie=edge\\\"><meta name=\\\"viewport\\\" cont\", url: \n"
Am I missing something?

Traefik SSL configuration

So, I'm trying to deploy my docker swarm with traefik into a cluster of DigitalOcean droplets. I'm using traefik as my reverse proxy and load balancer, so I must get the SSL certificate using traefik. The documentation seems simple enough, so I don't really understand what's going wrong with my config. I hoped you guys could shed some light on what I'm doing wrong. I'm using a wildcard domain to have most of my services running as subdomains of my root domain. So here's my toml:
debug = true
logLevel = "DEBUG"
defaultEntryPoints = ["https","http"]
[entryPoints]
[entryPoints.http]
address = ":80"
[entryPoints.http.redirect]
entryPoint = "https"
[entryPoints.https]
address = ":443"
[entryPoints.https.tls]
[retry]
[docker]
endpoint="unix:///var/run/docker.sock"
exposedByDefault=true
watch=true
swarmmode=true
domain="mouv.com"
[acme]
email = "leonardo@mouv.com"
storage = "acme.json"
entryPoint = "https"
acmeLogging = true
# caServer = "https://acme-v02.api.letsencrypt.org/directory"
caServer = "https://acme-staging-v02.api.letsencrypt.org/directory"
[acme.dnsChallenge]
provider = "digitalocean"
delayBeforeCheck = 0
[[acme.domains]]
main = "*.mouv.com"
sans = ["mouv.com"]
And here's my docker-stack.yml:
version: '3.6'
services:
  traefik:
    image: traefik:latest
    networks:
      - mouv-net
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./traefik.toml:/traefik.toml
    ports:
      - "80:80"
      - "443:443"
      - "8080:8080"
    command: --api
    environment:
      DO_AUTH_TOKEN: "xxxxxxxxxxxxxxxx"
    deploy:
      placement:
        constraints: [node.role==manager]
  user:
    image: hollarves/users-mouv:latest
    networks:
      - mouv-net
    deploy:
      labels:
        - "traefik.port=8500"
        - "traefik.backend=user"
        - "traefik.docker.network=mouv-stack_mouv-net"
        - "traefik.enable=true"
        - "traefik.protocol=http"
        - "traefik.frontend.entryPoints=https"
        - "traefik.frontend.rule=Host:user.mouv.com"
  balances:
    image: hollarves/balances-mouv:latest
    networks:
      - mouv-net
    deploy:
      labels:
        - "traefik.port=8010"
        - "traefik.backend=balance"
        - "traefik.docker.network=mouv-stack_mouv-net"
        - "traefik.enable=true"
        - "traefik.protocol=http"
        - "traefik.frontend.entryPoints=https"
        - "traefik.frontend.rule=Host:balance.mouv.com"
  # this container is not part of traefik's network.
  firebase:
    image: hollarves/firebase-mouv:latest
    networks:
      - firebase-net
  [ ..... more containers ..... ]
networks:
  mouv-net:
    driver: overlay
  [ .... more networks .... ]
I also saw this error in the logs:
mueve-stack_traefik.1.ndgfhj96lymx@node-1 | time="2019-02-19T13:15:46Z" level=debug msg="http2: server: error reading preface from client 10.255.0.2:50668: remote error: tls: unknown certificate authority"
And this:
mueve-stack_traefik.1.igy1ilch6wl1@node-1 | time="2019-02-19T13:22:00Z" level=info msg="legolog: [WARN] [mueve.com] acme: error cleaning up: digitalocean: unknown record ID for '_acme-challenge.mueve.com.' "
When I try to navigate to one of my subdomain services I get
subdomain.mouv.com uses an invalid security certificate. The certificate is not trusted because it is self-signed. The certificate is only valid for 9a11926d7857657613b65578dfebc69f.8066eec25224a58acabd968e285babdf.traefik.default.
In my digital ocean domain configuration I'm pretty much just adding an A record pointing to my manager node's IP and a CNAME record as *.mouv.com
The certificates provided by the Let's Encrypt staging environment (caServer = "https://acme-staging-v02.api.letsencrypt.org/directory") are not trusted by browsers; that is normal.
https://letsencrypt.org/docs/staging-environment/
The staging environment intermediate certificate (“Fake LE Intermediate X1”) is issued by a root certificate not present in browser/client trust stores. If you wish to modify a test-only client to trust the staging environment for testing purposes you can do so by adding the “Fake LE Root X1” certificate to your testing trust store. Important: Do not add the staging root or intermediate to a trust store that you use for ordinary browsing or other activities, since they are not audited or held to the same standards as our production roots, and so are not safe to use for anything other than testing.
To get valid certificates, you have to use the Let's Encrypt production endpoint (caServer = "https://acme-v02.api.letsencrypt.org/directory").
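To see which certificate is actually being served before and after switching caServer, the issuer can be read from the command line. This is a rough sketch, assuming the openssl CLI is installed; both function names are hypothetical.

```shell
# Hypothetical helpers: tell whether a served certificate comes from the
# Let's Encrypt staging CA. Assumes the openssl CLI is installed.

# Print the issuer line of the certificate served for HOST on port 443.
cert_issuer() {
  host=$1
  openssl s_client -connect "$host:443" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -issuer
}

# Succeed if the issuer string looks like a staging certificate. Staging
# issuers were named "Fake LE ..." at the time of the question; newer ones
# carry a "(STAGING)" prefix.
is_staging_issuer() {
  case "$1" in
    *"Fake LE"*|*"(STAGING)"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage could look like `is_staging_issuer "$(cert_issuer user.mouv.com)" && echo "still on staging"`. An issuer mentioning "TRAEFIK DEFAULT CERT" would instead suggest that Traefik is serving its built-in self-signed certificate, meaning no ACME certificate has been loaded for that host yet.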

I'm trying to configure traefik + docker, but the browser loads the https url forever, do you know why?

I'm trying to configure traefik + docker but I'm having trouble: the browser loads the URL forever.
This is my current configuration:
traefik.toml
debug = false
logLevel = "ERROR"
defaultEntryPoints = ["https","http"]
[entryPoints]
[entryPoints.http]
address = ":80"
[entryPoints.http.redirect]
entryPoint = "https"
# https is the default
[entryPoints.https]
address = ":443"
[entryPoints.https.tls]
[docker]
endpoint = "unix:///var/run/docker.sock"
domain = "cloud.castignoli.it"
watch = true
exposedByDefault = false
[acme]
email = "marco.castignoli@gmail.com"
storage = "acme.json"
entryPoint = "https"
onHostRule = true
[acme.httpChallenge]
entryPoint = "http"
Then I have acme.json, which is currently filled by Traefik with the correct values.
I'm trying to activate https for the container foo; the domain is hello.cloud.castignoli.it
foo has only this label:
traefik.frontend.rule=Host:hello.cloud.castignoli.it
These are traefik's logs
time="2018-10-11T08:04:50Z" level=error msg="Unable to obtain ACME certificate for domains \"reverse-proxy.traefik.\" detected thanks to rule \"Host:reverse-proxy.traefik.\" : unable to generate a certificate for the domains [reverse-proxy.traefik.]: acme: Error 400 - urn:ietf:params:acme:error:malformed - Error creating new order :: DNS name ends in a period"
This is the traefik dashboard: [image: traefik's dashboard]
The problem is with the domain for the Traefik container itself: Traefik is trying to generate a certificate for a non-existent domain.
In docker-compose.yml, set labels with your own domain on the Traefik service, or do not use --api. For example:
image: traefik
command: --api --docker
ports:
  - "80:80"
  - "443:443"
  - "8080:8080"
networks:
  - web
volumes:
  - /var/run/docker.sock:/var/run/docker.sock
  - /opt/traefik/traefik.toml:/traefik.toml
  - /opt/traefik/acme.json:/acme.json
labels:
  - "traefik.docker.network=web"
  - "traefik.port=8081"
  - "traefik.enable=true"
  - "traefik.frontend.rule=Host:your-awesome-host.com"
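Before restarting Traefik, the Host: rules can be sanity-checked for names that Let's Encrypt would reject outright, such as the trailing-period name in the log above. A minimal sketch; `check_host_rule` is a hypothetical helper and only catches the obvious cases:

```shell
# Hypothetical pre-flight check: would Let's Encrypt plausibly accept the
# host in a Traefik v1 "Host:" rule? Only catches obvious problems.

check_host_rule() {
  host=${1#Host:}   # strip the "Host:" prefix if present
  case "$host" in
    "")  echo "empty host"; return 1 ;;
    *.)  echo "$host ends in a period"; return 1 ;;
    *.*) echo "$host looks like a public name"; return 0 ;;
    *)   echo "$host has no dot, not a public domain"; return 1 ;;
  esac
}
```

With the values from this question, `check_host_rule 'Host:reverse-proxy.traefik.'` fails while `check_host_rule 'Host:hello.cloud.castignoli.it'` passes. It does not check that the name resolves to the Traefik host, which the HTTP challenge also needs.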

Traefik and https private repository - tls error

I'm trying to deploy a private repository on my docker swarm.
I'm following the official docker repository guide to deploy it as a service. I want to be able to use it over https from outside, with a simple URL such as https://myregistry.mysite.com.
To do so I use the following traefik labels in my stack yml file:
traefik.backend: "privateregistry"
traefik.docker.network: "webgateway" # docker overlay external
traefik.enable: "true"
traefik.frontend.entryPoint: "https"
traefik.frontend.redirect.entryPoint: "https"
traefik.frontend.rule: "Host:myregistry.mysite.com"
traefik.port: "5000"
I can see my frontend and backend in the traefik UI, but when I access https://myregistry.mysite.com/v2/ (for example) I get a 500 fatal error. The service log output is:
http: TLS handshake error from 10.0.0.68:47796: tls: first record does not look like a TLS handshake
I think I misunderstood something, probably on the certs side.
Any idea how to do that without errors?
Thanks
I suppose you are missing the certificate of the (registry-) server on your client machine. I assume you have two certificate files (used on the server):
myregistry.mysite.com.crt
myregistry.mysite.com.key
Copy myregistry.mysite.com.crt on your client machine to /etc/docker/certs.d/myregistry.mysite.com/ca.crt on Linux or
~/.docker/certs.d/myregistry.mysite.com/ca.crt on Mac. Now you should be able to login from the client:
docker login myregistry.mysite.com
Appendix - Server Setup
Your server setup might look like this:
~/certs/myregistry.mysite.com.crt
~/certs/myregistry.mysite.com.key
~/docker-compose.yml
~/traefik.toml
docker-compose.yml
version: '3'
services:
  frontproxy:
    image: traefik
    command: --api --docker --docker.swarmmode
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./certs:/etc/ssl:ro
      - ./traefik.toml:/etc/traefik/traefik.toml:ro
      - /var/run/docker.sock:/var/run/docker.sock # So that Traefik can listen to the Docker events
  docker-registry:
    image: registry:2
    deploy:
      labels:
        - traefik.port=5000 # default port exposed by the registry
        - traefik.frontend.rule=Host:myregistry.mysite.com
traefik.toml
defaultEntryPoints = ["http", "https"]
# Redirect HTTP to HTTPS and use certificate, see https://docs.traefik.io/configuration/entrypoints/
[entryPoints]
[entryPoints.http]
address = ":80"
[entryPoints.http.redirect]
entryPoint = "https"
[entryPoints.https]
address = ":443"
[entryPoints.https.tls]
[[entryPoints.https.tls.certificates]]
certFile = "/etc/ssl/myregistry.mysite.com.crt"
keyFile = "/etc/ssl/myregistry.mysite.com.key"
# Docker Swarm Mode Provider, see https://docs.traefik.io/configuration/backends/docker/#docker-swarm-mode
[docker]
endpoint = "tcp://127.0.0.1:2375"
domain = "docker.localhost"
watch = true
swarmMode = true
To deploy your registry run:
docker stack deploy myregistry -c ~/docker-compose.yml

Docker Traefik and letsencrypt wildcard

I've been trying to get traefik to install a wildcard cert on my domain, which requires the DNS challenge.
From reading the logs, it seems it was able to issue the cert but not install it correctly:
time="2018-04-07T19:10:35Z" level=debug msg="Unable to marshal provider conf *acme.Provider with error: json: unsupported type: chan *acme.StoredData"
legolog: 2018/04/07 19:10:57 [INFO][example.tld] The server validated our request
legolog: 2018/04/07 19:10:58 [INFO][*.example.tld] acme: Validations succeeded; requesting certificates
legolog: 2018/04/07 19:11:01 [INFO][*.example.tld] Server responded with a certificate.
time="2018-04-07T19:11:01Z" level=error msg="Error loading new configuration, aborted unable to generate TLS certificate : tls: failed to find any PEM data in certificate input"
time="2018-04-07T19:12:33Z" level=debug msg="http2: server: error reading preface from client ******omitted***: remote error: tls: unknown certificate authority"
My domain's DNS provider is Cloudflare.
Here's my docker-compose.yml:
version: '2'
services:
  traefik:
    image: traefik:1.6.0-rc4
    command: --api --docker
    restart: always
    ports:
      - 80:80
      - 443:443
      - 8080:8080
    networks:
      - web
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /opt/traefik/traefik.toml:/traefik.toml
      - /opt/traefik/acme.json:/acme.json
    environment:
      - CLOUDFLARE_EMAIL=admin@example.tld
      - CLOUDFLARE_API_KEY=
    container_name: traefik
networks:
  web:
    external: true
And my traefik.toml
debug = true
logLevel = "DEBUG"
defaultEntryPoints = ["https","http"]
[entryPoints]
[entryPoints.http]
address = ":80"
[entryPoints.http.redirect]
entryPoint = "https"
[entryPoints.https]
address = ":443"
[entryPoints.https.tls]
[retry]
[docker]
endpoint = "unix:///var/run/docker.sock"
domain = "example.tld"
watch = true
exposedbydefault = false
[acme]
email = "admin@example.tld"
storage = "acme.json"
entryPoint = "https"
OnHostRule = true
acmeLogging = true
[acme.dnsChallenge]
provider = "cloudflare"
delayBeforeCheck = 0
[[acme.domains]]
main = "example.tld"
[[acme.domains]]
main = "*.example.tld"
I was able to fix the issue; it was a mistake on my part.
In traefik.toml, you cannot use OnHostRule = true for wildcard certs.
Read more:
docs.traefik.io/v1.7/configuration/acme/#onhostrule
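The combination described in this answer can be spotted with a quick grep over traefik.toml before deploying. The helper below is a hypothetical lint written for the Traefik v1 file layout shown in the question, not something Traefik ships:

```shell
# Hypothetical lint for a Traefik v1 traefik.toml: flag the combination the
# answer above describes (onHostRule = true together with a wildcard domain).

lint_acme_toml() {
  file=$1
  # The key is matched case-insensitively because the file above spells it OnHostRule.
  if grep -Eiq '^[[:space:]]*onhostrule[[:space:]]*=[[:space:]]*true' "$file" &&
     grep -Eq '^[[:space:]]*(main|sans)[[:space:]]*=.*\*\.' "$file"; then
    echo "warning: onHostRule = true is set together with a wildcard domain"
    return 1
  fi
  echo "ok"
}
```

Running `lint_acme_toml /opt/traefik/traefik.toml` against the file from the question would print the warning; after removing the OnHostRule line it prints `ok`.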
