Traefik query alive/dead backends - docker-swarm

We have a Traefik installation on Docker Swarm with several services balanced through Traefik. Each service has at least two backends balanced with wrr (weighted round-robin) and a healthcheck.
Is there a way (API, REST endpoint, log file, whatever) to find out which frontends have dead backends? By dead I mean backends that Traefik's healthcheck has detected as not eligible for balancing.
What is the best practice for this?

I see two ways of getting that info:
Traefik log
Look at the Traefik log, which provides traces for the healthchecks:
time="2019-03-05T22:19:35Z" level=debug msg="Refreshing health check for backend: backend-web-so-55004614",
time="2019-03-05T22:19:35Z" level=warning msg="Health check still failing. Backend: \"backend-web-so-55004614\" URL: \"http://192.168.80.2:80\" Reason: received error status code: 404",
time="2019-03-05T22:19:36Z" level=debug msg="Refreshing health check for backend: backend-web-so-55004614",
time="2019-03-05T22:19:36Z" level=warning msg="Health check still failing. Backend: \"backend-web-so-55004614\" URL: \"http://192.168.80.2:80\" Reason: received error status code: 404",
Traefik /metrics
If parsing the Traefik logs is not convenient, you could activate Traefik's Prometheus metrics (which are enabled by default):
docker run -d -v /var/run/docker.sock:/var/run/docker.sock -p "80:80" -p "8080:8080" traefik --api --docker
Then you can make an HTTP query on http://localhost:8080/metrics and look for lines containing _backend_server_up. Each of these lines tells you that your backend is up and healthy. If a backend is missing, that means it is unhealthy or stopped:
traefik_backend_server_up{backend="backend-robots",url="http://172.23.0.3:80"} 1
traefik_backend_server_up{backend="backend-smtp-ui",url="http://172.25.0.3:8025"} 1
traefik_backend_server_up{backend="backend-varnish-admin",url="http://172.23.0.8:6085"} 1
traefik_backend_server_up{backend="backend-varnish-http",url="http://172.23.0.8:6081"} 1
traefik_backend_server_up{backend="backend-web-apps",url="http://172.21.0.2:80"} 1
traefik_backend_server_up{backend="backend-web-report",url="http://172.19.0.6:80"} 1
You could have a script query this URL, or you could install Prometheus, which has alerting rules.
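For example, a small shell check against that endpoint (assuming the dashboard/metrics port is published on 8080 as in the command above):
curl -s http://localhost:8080/metrics | grep _backend_server_up
Backends that are missing from this output (or, depending on the Traefik version, reported with a value of 0) are the ones Traefik is not sending traffic to.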

Related

Why is Portainer ignoring my certificate even though I have specified --sslcert and --sslkey?

I have Portainer CE 2.9.2 running in a docker container. I'm starting it with the --sslcert and --sslkey options to specify my own certificate, but the browser keeps showing the built-in certificate, self-signed by localhost, instead of mine.
I'm starting Portainer with Ansible's Community Docker module. The syntax is nearly identical to Docker Compose. Here is the task in the Ansible playbook:
- name: Run Portainer
  docker_container:
    image: portainer/portainer-ce
    name: portainer
    hostname: portainer
    state: started
    restart: yes
    restart_policy: unless-stopped
    ports:
      - 8000:8000
      - 9000:9000
      - 9443:9443
    volumes:
      - /opt/docker/portainer/certs:/certs
      - /opt/docker/portainer/data:/data
      - /var/run/docker.sock:/var/run/docker.sock
    command:
      --sslcert /certs/uno.home.crt --sslkey /certs/uno.home.key
Using docker inspect, I can see it has picked up the command-line arguments and that the /certs bind mount is there.
"Args": [
"--sslcert",
"/certs/uno.home.crt",
"--sslkey",
"/certs/uno.home.key"
]
...
"HostConfig": {
"Binds": [
"/opt/docker/portainer/certs:/certs:rw",
"/opt/docker/portainer/data:/data:rw",
"/var/run/docker.sock:/var/run/docker.sock:rw"
]
I can also verify the presence of the certificate files inside the container.
$ docker cp portainer:/certs .
$ ls certs
uno.home.crt uno.home.key
But, when I open up a browser on port 9443, I get a certificate that is signed by localhost, not the cert I have placed in the /opt/docker/portainer/certs directory.
I don't believe it is a problem with my certificate, as I have used the very same cert with an Nginx reverse proxy setup and it works as expected. My best guess is that Portainer is ignoring my certificate in favor of its built-in one, because the certificate displayed by the browser is the same whether or not I use the --sslcert / --sslkey options. But I can't figure out where I've gone wrong.
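A quick way to double-check which certificate is actually being served on that port, independent of the browser, is something like this (host and port are the ones from my setup above):
echo | openssl s_client -connect localhost:9443 2>/dev/null | openssl x509 -noout -subject -issuer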
The log file shows no errors:
$ docker logs portainer
level=info msg="2021/11/05 00:12:36 [INFO] [main,compose] [message: binary is missing, falling-back to compose plugin] [error: docker-compose binary not found]"
2021/11/05 00:12:36 server: Reverse tunnelling enabled
2021/11/05 00:12:36 server: Fingerprint 79:94:35:05:71:59:7a:eb:e9:03:a2:61:ad:1a:c5:11
2021/11/05 00:12:36 server: Listening on 0.0.0.0:8000...
level=info msg="2021/11/05 00:12:36 [INFO] [cmd,main] Starting Portainer version 2.9.2"
level=info msg="2021/11/05 00:12:36 [DEBUG] [chisel, monitoring] [check_interval_seconds: 10.000000] [message: starting tunnel management process]"
level=info msg="2021/11/05 00:12:36 [DEBUG] [internal,init] [message: start initialization monitor ]"
level=info msg="2021/11/05 00:12:36 [INFO] [http,server] [message: starting HTTPS server on port :9443]"
level=info msg="2021/11/05 00:12:36 [INFO] [http,server] [message: starting HTTP server on port :9000]"
All the examples I've found on the web say the Docker Compose-style configuration should be done like this:
command:
  --ssl
  --sslcert /certs/portainer.crt
  --sslkey /certs/portainer.key
Besides the file names and the --ssl, that's what I've got. I removed the --ssl after seeing a message in the Portainer log saying it was a deprecated option and only accepted for backward compatibility.
I suppose the fact that it ignores my cert could be a bug, though I don't want to file a bug report if it's just user error on my part. Can anyone see where I've gone wrong in the configuration of this thing?
This was indeed a bug and was fixed by the Portainer team. https://github.com/portainer/portainer/issues/6021

Connect the Cassandra container to application web container failed - Error: 202 Connecting to Node

So, I created two Docker images and I want to connect one to the other with Docker Compose. The first image is Cassandra 3.11.11 (from the official Docker Hub) and the other I built myself, with Tomcat 9.0.54 and my Spring Boot application.
I ran the docker-compose.yml below to connect the two containers, where cassandra:latest is the Cassandra image and centos7-tomcat9-myapp is my web app's image.
version: '3'
services:
  casandra:
    image: cassandra:latest
  myapp:
    image: centos7-tomcat9-myapp
    depends_on:
      - casandra
    environment:
      - CASSANDRA_HOST=cassandra
I ran this command to start the web app's image: docker run -it --rm --name fe3c2f120e01 -p 8888:8080 centos7-tomcat9-app .
In the console log, Spring Boot shows me the error below. It happened because the myapp container could not connect to the Cassandra container.
2021-10-15 15:12:14.240 WARN 1 --- [ s0-admin-1]
c.d.o.d.i.c.control.ControlConnection : [s0] Error connecting to
Node(endPoint=127.0.0.1:9042, hostId=null, hashCode=47889c49), trying
next node (ConnectionInitException: [s0|control|connecting...]
Protocol initialization request, step 1 (OPTIONS): failed to send
request (io.netty.channel.StacklessClosedChannelException))
What am I doing wrong?
EDIT
This is the nodetool status output for the Cassandra container:
[root@GDBDEV04 cassandradb]# docker exec 552d359d177e nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.18.0.3 84.76 KiB 16 100.0% 685b6e0a-13c2-4d41-ba99-f3b0fa94477c rack1
EDIT 2
I need to connect the Cassandra DB image with the web application image. This is different from connecting two microservices. I tried changing the 127.0.0.0 (inside cassandra.yaml) to 0.0.0.0 (only to test) and the error persists. I think something is missing in my docker-compose.yml for sure. However, I did not know what.
Finally I found the error. In my case, I needed to fix the docker-compose.yml file by adding the Cassandra and Tomcat ports. And in my application.properties (the Spring Boot config file), I changed the cluster's name.
Docker-compose.yml:
version: '3'
services:
  cassandra:
    image: cassandra:latest
    ports:
      - "9044:9042"
  myapp:
    image: centos7-tomcat9-myapp
    ports:
      - "8086:8080"
    depends_on:
      - cassandra
    environment:
      - CASSANDRA_HOST=cassandra
application.properties:
# CASSANDRA (CassandraProperties)
cassandra.cluster = Test Cluster
cassandra.contactpoints=${CASSANDRA_HOST}
This question helped me resolve my problem: Accessing docker container mysql databases
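As a quick sanity check of the fixed setup (these commands assume the Compose project above is running; cqlsh ships with the official cassandra image, and getent happens to be available in my CentOS-based app image):
docker-compose exec cassandra cqlsh -e "DESCRIBE CLUSTER"
docker-compose exec myapp getent hosts cassandra
The first confirms the node is up, the second confirms that the service name cassandra resolves from the app container on the shared Compose network.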

Traefik delivering two different ports on same docker

I have a problem with Traefik. The situation:
I have a Docker GitLab with the GitLab registry activated; no problem with that.
My GitLab answers on port 80 and the registry on port 80 as well, and this is not a problem either (I tried with port mapping and it worked well).
Now I try to configure these guys in Traefik, and that's when the problems occur. When there is just one router and one service, all is good, but when I try to add the registry I get an error.
The relevant part of the docker-compose file:
labels:
  - traefik.http.routers.gitlab.rule=Host(`git.example.com`)
  - traefik.http.routers.gitlab.tls=true
  - traefik.http.routers.gitlab.tls.certresolver=lets-encrypt
  - traefik.http.routers.gitlab.service=gitlabservice
  - traefik.http.services.giltabservice.loadbalancer.server.port=80
  - traefik.http.routers.gitlab.entrypoints=websecure
  - traefik.http.routers.gitlabregistry.rule=Host(`registry.example.com`)
  - traefik.http.routers.gitlabregistry.tls=true
  - traefik.http.routers.gitlabregistry.tls.certresolver=lets-encrypt
  - traefik.http.routers.gitlabregistry.service=gitlabregistryservice
  - traefik.http.services.giltabregistryservice.loadbalancer.server.port=80
  - traefik.http.routers.gitlabregistry.entrypoints=websecure
And the errors I get in Traefik:
time="2021-03-27T08:48:20Z" level=error msg="the service "gitlabservice#docker" does not exist" entryPointName=websecure routerName=gitlab#docker
time="2021-03-27T08:48:20Z" level=error msg="the service "gitlabregistryservice#docker" does not exist" entryPointName=websecure routerName=gitlabregistry#docker
Thanks in advance if you have any idea.

JaegerTracing : Jaeger Ingester unable to read from Kafka Queue and store into ElasticSearch

I am new to Jaeger and Kafka, and I am trying to use Kafka as an intermediate buffer.
I am using OpenTelemetry to send data to the Jaeger collector directly using -Dotel.exporter.jaeger.endpoint.
The Jaeger collector is deployed on Kubernetes, and Kafka is on another network but is reachable. I can confirm that the traces are sent to the Jaeger collector.
Hitting the collector's /metrics endpoint shows that spans were written successfully to Kafka:
jaeger_kafka_spans_written_total{status="success"} 21
The collector logs indicate which topic I am writing to:
{"Brokers":["myKafkaBroker......"}},"topic":"tp6"}
I want to get this (span) data from the Kafka queue into Elasticsearch. To do this I am starting the Jaeger ingester as follows:
docker run -e "SPAN_STORAGE_TYPE=elasticsearch" jaegertracing/jaeger-ingester:1.22 --kafka.consumer.topic=tp6 --kafka.consumer.brokers='myKafkaBroker' --es.tls.skip-host-verify
But the container stops after a fatal error:
{"level":"fatal","ts":1615546463.7784193,"caller":"command-line-arguments/main.go:64","msg":"Failed to init storage factory","error":"failed to create primary Elasticsearch client: health check timeout: Head \"http://127.0.0.1:9200\": dial tcp 127.0.0.1:9200: connect: connection refused: no Elasticsearch node available","stacktrace":"main.main.func1\n\tcommand-line-arguments/main.go:64\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra#v0.0.7/command.go:838\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra#v0.0.7/command.go:943\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra#v0.0.7/command.go:883\nmain.main\n\tcommand-line-arguments/main.go:113\nruntime.main\n\truntime/proc.go:204"}
Elasticsearch and the ingester are running on the same machine using Docker. Elasticsearch is started with:
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.11.2
I have disabled TLS so that shouldn't be a problem. I am unable to get this to work.
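For context, the ingester's Elasticsearch client defaults to http://127.0.0.1:9200, which is localhost inside the ingester container, not the Elasticsearch container. A sketch of what I understand should make it reachable (the network name here is made up; --es.server-urls is the ingester's flag for the Elasticsearch endpoint):
docker network create jaeger-net
docker run -d --name elasticsearch --network jaeger-net -p 9200:9200 \
  -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.11.2
docker run --network jaeger-net -e "SPAN_STORAGE_TYPE=elasticsearch" \
  jaegertracing/jaeger-ingester:1.22 --kafka.consumer.topic=tp6 \
  --kafka.consumer.brokers='myKafkaBroker' --es.tls.skip-host-verify \
  --es.server-urls=http://elasticsearch:9200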

How to run a redis cluster on a docker cluster?

Context
I am trying to set up a Redis cluster so that it runs on top of a Docker cluster, to achieve maximum auto-healing.
More precisely, I have a docker-compose file which defines a service with 3 replicas. Each service replica has a redis-server running on it.
Then I have a program inside each replica that listens for changes on the Docker cluster and starts the Redis cluster when the conditions are met (all 3 redis-servers know each other).
Setting up the Redis cluster works as expected: the cluster is formed and all the redis-servers communicate well, but that communication stays inside the Docker cluster.
The Problem
When I try to communicate from outside the Docker cluster, the ingress mode lets me talk to a redis-server. However, when I try to add data (e.g. set foo bar) and the client is redirected to another redis-server, the communication hangs and eventually times out.
Code
This is the docker-compose file.
version: "3.3"
services:
redis-cluster:
image: redis-srv-instance
volumes:
- /var/run/:/var/run
deploy:
mode: replicated
#endpoint_mode: dnsrr
replicas: 3
resources:
limits:
cpus: '0.5'
memory: 512M
ports:
- target: 6379
published: 30000
protocol: tcp
mode: ingress
The flow of commands that shows the problem:
Client
~ ./redis-cli -c -p 30000
127.0.0.1:30000>
Redis-server
OK
1506533095.032738 [0 10.255.0.2:59700] "COMMAND"
1506533098.335858 [0 10.255.0.2:59700] "info"
Client
127.0.0.1:30000> set ghb fki
OK
Redis-server
1506533566.481334 [0 10.255.0.2:59718] "COMMAND"
1506533571.315238 [0 10.255.0.2:59718] "set" "ghb" "fki"
Client
127.0.0.1:30000> set rte fgh
-> Redirected to slot [3830] located at 10.0.0.3:6379
Could not connect to Redis at 10.0.0.3:6379: Operation timed out
Could not connect to Redis at 10.0.0.3:6379: Operation timed out
(150.31s)
not connected>
Any ideas? I have also tried building my own proxy/load balancer, but it didn't work.
Thank you! Have a nice day.
For this use case, Sentinel might help. Redis on its own is not capable of high availability. Sentinel, on the other hand, is a distributed system which can do the following for you:
Route the ingress traffic to the current Redis master.
Elect a new Redis master should the current one fail.
While I have previously done research on this topic, I have not yet managed to pull together a working example.
redis-cli gets the Redis server's IP inside the ingress network and tries to reach that remote Redis server by this IP directly. That is why redis-cli shows Redirected to slot [3830] located at 10.0.0.3:6379. But this internal 10.0.0.3 is not reachable by redis-cli.
One solution is to run an additional proxy service attached to the same network as the Redis cluster. The application sends all requests to that proxy service, and the proxy service talks to the Redis cluster.
Or you could create 3 swarm services that use the bridge network and expose the Redis port on the node. Your internal program would need to change accordingly.
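If the external client really has to go through the published ports, another option worth looking at (my addition, untested in swarm) is the cluster-announce-* settings that Redis 4+ added specifically for NAT/Docker setups, so each node advertises an address and port the outside client can actually reach. A redis.conf fragment for one node, with purely illustrative values:
cluster-enabled yes
cluster-announce-ip 203.0.113.10
cluster-announce-port 30001
cluster-announce-bus-port 31001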
