HAProxy not load balancing the test app in Docker Swarm

I have 3 VMs (VirtualBox), all set up to share a single VIP (192.168.100.200) with keepalived. Each VM runs one HAProxy instance and one test app instance. (I am testing a high-availability scenario where losing one or two nodes keeps the setup going.) Keepalived is working correctly. The problem is that requests are not load balanced; they always go to the same instance.
What is going wrong?
version: "3.8"
services:
# HAproxy
haproxy :
image : haproxy:2.3.2
container_name : haproxy
networks :
- app-net
ports :
- 80:80
volumes :
- /etc/haproxy/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg
deploy :
mode : global
restart_policy :
condition : on-failure
delay : 5s
max_attempts : 3
window : 120s
# Nginx test site
wwwsite :
image : nginxdemos/hello
networks :
- app-net
ports :
- 8080:80
deploy :
mode : global
networks :
app-net :
driver : overlay
name : app-net
attachable: true
haproxy.cfg
global
    stats socket /var/run/haproxy.stat mode 660 level admin
    stats timeout 30s
    user root
    group root

resolvers docker
    nameserver dns1 127.0.0.11:53
    resolve_retries 3
    timeout resolve 1s
    timeout retry 1s
    hold other 10s
    hold refused 10s
    hold nx 10s
    hold timeout 10s
    hold valid 10s
    hold obsolete 10s

defaults
    timeout connect 10s
    timeout client 30s
    timeout server 30s
    mode http

frontend fe_web
    mode http
    bind *:80
    default_backend nodes

backend nodes
    balance roundrobin
    server node1 192.168.100.201:8080 check
    server node2 192.168.100.202:8080 check
    server node3 192.168.100.203:8080 check

listen stats
    bind *:8081
    mode http
    stats enable
    stats uri /
    stats hide-version
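One thing stands out in this configuration: the resolvers docker section is defined but never referenced, and the backend pins the three node IPs instead. Since haproxy and wwwsite share the app-net overlay network, an alternative worth trying is to let HAProxy discover the task IPs through Docker's embedded DNS (tasks.<service> resolves to the individual task IPs, whereas the bare service name resolves to a single Swarm VIP, which would hide the replicas from HAProxy). A minimal sketch, assuming HAProxy 1.8+ (2.3.2 qualifies) and the wwwsite service name from the compose file:

backend nodes
    balance roundrobin
    # tasks.wwwsite resolves to each task's IP on the overlay network;
    # port 80 is the container port, not the published 8080
    server-template www- 3 tasks.wwwsite:80 check resolvers docker init-addr libc,none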

Related

Connect the Cassandra container to application web container failed - Error: 202 Connecting to Node

So, I created two Docker images and I want to connect one to the other with Docker Compose. The first image is Cassandra 3.11.11 (from the official Docker Hub) and the other I created myself, with Tomcat 9.0.54 and my Spring Boot application.
I ran the docker-compose.yml below to connect the two containers, where cassandra:latest is the Cassandra image and centos7-tomcat9-myapp is my web app's image.
version: '3'
services:
  casandra:
    image: cassandra:latest
  myapp:
    image: centos7-tomcat9-myapp
    depends_on:
      - casandra
    environment:
      - CASSANDRA_HOST=cassandra
I ran the following command to start the web app's image: docker run -it --rm --name fe3c2f120e01 -p 8888:8080 centos7-tomcat9-app
In the console log, Spring Boot shows me the error below. It happened because the myapp container could not connect to the Cassandra container.
2021-10-15 15:12:14.240 WARN 1 --- [s0-admin-1] c.d.o.d.i.c.control.ControlConnection : [s0] Error connecting to Node(endPoint=127.0.0.1:9042, hostId=null, hashCode=47889c49), trying next node (ConnectionInitException: [s0|control|connecting...] Protocol initialization request, step 1 (OPTIONS): failed to send request (io.netty.channel.StacklessClosedChannelException))
What am I doing wrong?
EDIT
This is the nodetool status about the cassandra's image:
[root@GDBDEV04 cassandradb]# docker exec 552d359d177e nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.18.0.3  84.76 KiB  16      100.0%            685b6e0a-13c2-4d41-ba99-f3b0fa94477c  rack1
EDIT 2
I need to connect the Cassandra DB image with the web application image. It is different from connecting microservices. I tried to change the 127.0.0.0 (inside cassandra.yaml) to 0.0.0.0 (only to test) and the error persists. I think something is missing in my docker-compose.yml for sure. However, I do not know what.
Finally I found the error. In my case, I needed to fix the docker-compose.yml file by adding the Cassandra and Tomcat ports. And in my application.properties (Spring Boot config file), I changed the cluster's name.
Docker-compose.yml:
version: '3'
services:
  cassandra:
    image: cassandra:latest
    ports:
      - "9044:9042"
  myapp:
    image: centos7-tomcat9-myapp
    ports:
      - "8086:8080"
    depends_on:
      - cassandra
    environment:
      - CASSANDRA_HOST=cassandra
application.properties:
# CASSANDRA (CassandraProperties)
cassandra.cluster = Test Cluster
cassandra.contactpoints=${CASSANDRA_HOST}
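Worth noting: ${CASSANDRA_HOST} only points at the Cassandra container if both containers sit on the same Compose network. Starting the web app with a standalone docker run (as in the question) puts it on the default bridge, where the service name cassandra does not resolve, and the driver then likely falls back to its default contact point 127.0.0.1:9042, which matches the error above. A quick check, assuming the compose file shown here:

# start both services together so they join the Compose-created network
docker-compose up -d
# verify that the service name resolves from inside myapp
# (getent should be present in the CentOS 7 base image)
docker-compose exec myapp getent hosts cassandra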
This question helped me resolve my problem: Accessing docker container mysql databases

Prometheus with Dockerfile

I have the following Dockerfile:
FROM prom/prometheus
ADD prometheus.yml /etc/prometheus/
with prometheus.yml:
global:
  scrape_interval: 15s
  external_labels:
    monitor: 'codelab-monitor'
scrape_configs:
  - job_name: 'prometheus'
    metrics_path: /metrics
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'auth-service'
    scrape_interval: 15s
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ['localhost:8080']
And run it with the following command:
docker build -t prometheus .
docker run -d -p 9090:9090 --rm prometheus
prometheus has status up
auth-service has status down (Get "http://localhost:8080/actuator/prometheus": dial tcp 127.0.0.1:8080: connect: connection refused)
How can I solve the problem with auth-service? From my local machine I can get metrics from http://localhost:8080/actuator/prometheus:
v.balun@macbook-vbalun Trainter-Prometheus % curl -X GET http://localhost:8080/actuator/prometheus
# HELP jvm_memory_committed_bytes The amount of memory in bytes that is committed for the Java virtual machine to use
# TYPE jvm_memory_committed_bytes gauge
jvm_memory_committed_bytes{area="heap",id="G1 Survivor Space",} 4194304.0
jvm_memory_committed_bytes{area="heap",id="G1 Old Gen",} 3.145728E7
jvm_memory_committed_bytes{area="nonheap",id="Metaspace",} 3.0982144E7
jvm_memory_committed_bytes{area="nonheap",id="CodeHeap 'non-nmethods'",} 2555904.0
jvm_memory_committed_bytes{area="heap",id="G1 Eden Space",} 2.7262976E7
jvm_memory_committed_bytes{area="nonheap",id="Compressed Class Space",} 4325376.0
jvm_memory_committed_bytes{area="nonheap",id="CodeHeap 'non-profiled nmethods'",} 6291456.0
The issue you are having does not seem related to Prometheus; it is at the Docker network level.
Inside your prometheus container you are saying this:
static_configs:
  - targets: ['localhost:8080']
But remember that localhost is now NOT your physical host (as it was when you ran the service locally outside Docker); it is now inside the container, and inside that same container you most likely do not have your service running.
With the information provided, I suggest the following:
1. Instead of localhost, first try your real host IP; depending on the network configuration you are using for your container, that may be enough.
2. Instead of localhost, you can use the IP address of your auth-service container, the one assigned by Docker; you can run docker inspect ... to get it.
3. If #1 and #2 didn't work, and auth-service is running in another container on the same physical host, you can use a bridge network to make communication between the containers possible; more details here: https://docs.docker.com/network/bridge/
Once both containers are running in the same network, you can use the container name to reference it instead of localhost, something like:
static_configs:
  - targets: ['auth-service:8080']
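If both services are managed together, a minimal compose sketch of that setup could look like this (the auth-service image name here is hypothetical):

version: '3'
services:
  prometheus:
    build: .                      # the Dockerfile from the question
    ports:
      - "9090:9090"
  auth-service:
    image: auth-service:latest    # hypothetical image name
    ports:
      - "8080:8080"

Compose puts both services on one default network, so the auth-service hostname in prometheus.yml resolves to that container.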

Docker Swarm: traffic in assigned state

When I scale a service up from 1 node (Node A) to 2 nodes (Node A and Node B), I see traffic immediately being routed to both nodes (including the new Node B even though it isn't ready).
As a result, an Nginx proxy will return 502s half the time (until Node B is ready).
Any suggestions how you can delay this traffic?
Note: this isn't waiting for another container to come up as mentioned here: Docker Compose wait for container X before starting Y
This is about delaying the network connection until the container is ready.
If you do not configure a healthcheck section, Docker will assume that the container is available as soon as it is started.
Note that the initial healthcheck is only done after the set interval.
So you could add something extremely basic, like testing whether port 80 is connectable (you need nc in your Docker image; see the note after the snippet):
healthcheck:
  test: nc -w 1 127.0.0.1 80 < /dev/null
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 5s
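Note that nc is not included in many minimal images. If your image is Debian-based, something like this in the Dockerfile would provide it (package names differ elsewhere, e.g. netcat-openbsd via apk on Alpine or nmap-ncat via yum on CentOS):

# Debian/Ubuntu-based image
RUN apt-get update && apt-get install -y --no-install-recommends netcat-openbsd \
    && rm -rf /var/lib/apt/lists/*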

Traefik query alive/dead backends

We have a Traefik installation on Docker Swarm with several services balanced through Traefik. Each service has at least two backends balanced with wrr and a healthcheck.
Is there a way (API, REST endpoint, logfile, whatever) to find out which frontends have dead backends? By dead I mean backends for which Traefik's healthcheck has determined that they are not eligible for balancing.
What is the best practice for this?
I see two ways of getting that info:
Traefik log
Look at the Traefik log, which provides traces for healthchecks:
time="2019-03-05T22:19:35Z" level=debug msg="Refreshing health check for backend: backend-web-so-55004614",
time="2019-03-05T22:19:35Z" level=warning msg="Health check still failing. Backend: \"backend-web-so-55004614\" URL: \"http://192.168.80.2:80\" Reason: received error status code: 404",
time="2019-03-05T22:19:36Z" level=debug msg="Refreshing health check for backend: backend-web-so-55004614",
time="2019-03-05T22:19:36Z" level=warning msg="Health check still failing. Backend: \"backend-web-so-55004614\" URL: \"http://192.168.80.2:80\" Reason: received error status code: 404",
Traefik /metrics
If it is not convenient to parse Traefik logs, you can activate Traefik's Prometheus metrics (which are enabled by default):
docker run -d -v /var/run/docker.sock:/var/run/docker.sock -p "80:80" -p "8080:8080" traefik --api --docker
Then you can make an HTTP query on http://localhost:8080/metrics and look for lines containing _backend_server_up. Each of these lines tells you that your backend is up and healthy. If a backend is missing, that means it is unhealthy or stopped:
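For example, a quick way to filter those lines (assuming the metrics endpoint from the run command above):

curl -s http://localhost:8080/metrics | grep _backend_server_up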
traefik_backend_server_up{backend="backend-robots",url="http://172.23.0.3:80"} 1
traefik_backend_server_up{backend="backend-smtp-ui",url="http://172.25.0.3:8025"} 1
traefik_backend_server_up{backend="backend-varnish-admin",url="http://172.23.0.8:6085"} 1
traefik_backend_server_up{backend="backend-varnish-http",url="http://172.23.0.8:6081"} 1
traefik_backend_server_up{backend="backend-web-apps",url="http://172.21.0.2:80"} 1
traefik_backend_server_up{backend="backend-web-report",url="http://172.19.0.6:80"} 1
You could have a script querying this URL, or you could install Prometheus, which has alerting rules.
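For example, a sketch of a Prometheus alerting rule on that metric (note that depending on the Traefik version an unhealthy backend may report 0 or drop out of the output entirely, in which case an absent()-based rule is needed as well):

groups:
  - name: traefik
    rules:
      - alert: TraefikBackendDown
        expr: traefik_backend_server_up == 0
        for: 1m
        annotations:
          summary: "Backend {{ $labels.backend }} server {{ $labels.url }} is down"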

How to run a redis cluster on a docker cluster?

Context
I am trying to set up a Redis cluster so that it runs on top of a Docker cluster, to achieve maximum auto-healing.
More precisely, I have a Docker Compose file which defines a service with 3 replicas. Each service replica has a redis-server running on it.
Then I have a program inside each replica that listens for changes on the Docker cluster and starts the Redis cluster when conditions are met (all 3 redis-servers know each other).
Setting up the Redis cluster works as expected; the cluster is formed and all the redis-servers communicate well, but that communication between redis-servers stays inside the Docker cluster.
The Problem
When I try to communicate from outside the Docker cluster, because of the ingress mode I am able to talk to a redis-server; however, when I try to add data (e.g. set foo bar) and the client is redirected to another redis-server, the communication hangs and eventually times out.
Code
This is the docker-compose file.
version: "3.3"
services:
redis-cluster:
image: redis-srv-instance
volumes:
- /var/run/:/var/run
deploy:
mode: replicated
#endpoint_mode: dnsrr
replicas: 3
resources:
limits:
cpus: '0.5'
memory: 512M
ports:
- target: 6379
published: 30000
protocol: tcp
mode: ingress
The sequence of commands that shows the problem.
Client
~ ./redis-cli -c -p 30000
127.0.0.1:30000>
Redis-server
OK
1506533095.032738 [0 10.255.0.2:59700] "COMMAND"
1506533098.335858 [0 10.255.0.2:59700] "info"
Client
127.0.0.1:30000> set ghb fki
OK
Redis-server
1506533566.481334 [0 10.255.0.2:59718] "COMMAND"
1506533571.315238 [0 10.255.0.2:59718] "set" "ghb" "fki"
Client
127.0.0.1:30000> set rte fgh
-> Redirected to slot [3830] located at 10.0.0.3:6379
Could not connect to Redis at 10.0.0.3:6379: Operation timed out
Could not connect to Redis at 10.0.0.3:6379: Operation timed out
(150.31s)
not connected>
Any ideas? I have also tried making my own proxy/load balancer, but it didn't work.
Thank you! Have a nice day.
For this use case, Sentinel might help. Redis on its own is not capable of high availability. Sentinel, on the other hand, is a distributed system which can do the following for you:
Route the ingress traffic to the current Redis master.
Elect a new Redis master should the current one fail.
While I have previously done research on this topic, I have not yet managed to pull together a working example.
redis-cli gets the Redis server's IP inside the ingress network and tries to access that remote Redis server by its IP directly. That is why redis-cli shows Redirected to slot [3830] located at 10.0.0.3:6379. But this internal 10.0.0.3 is not reachable by redis-cli.
One solution is to run another proxy service which attaches to the same network as the Redis cluster. The application sends all requests to that proxy service, and the proxy service talks to the Redis cluster.
Or you could create 3 Swarm services that use the bridge network and expose the Redis port on the node (a sketch follows). Your internal program would need to change accordingly.
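A hedged sketch of that last option, using Swarm's host-mode port publishing so each Redis node gets a fixed, externally reachable address instead of going through the ingress mesh (one service shown; repeat for redis-2/redis-3 with different published ports, and consider Redis's cluster-announce-ip setting so redirects carry a reachable address):

version: "3.3"
services:
  redis-1:
    image: redis-srv-instance    # image name from the question
    ports:
      - target: 6379
        published: 30001
        protocol: tcp
        mode: host               # bypasses the ingress routing mesh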
