redis-sentinel logs flooded with +sentinel-address-switch and +sentinel-address-update - docker

Recently we updated docker swarm to redis-6.2 image. There are master-slave cache and queue set up on 2 swarm nodes, and 3 sentinel services set up to watch them. With 6.2 we use hostname resolution in the redis configuration with "sentinel resolve-hostnames yes" and "--replica-announce-ip" in the cache and queue services command line.
This was working fine in development swarm, but in production it is emitting several log messages per second with messages like
+sentinel-address-switch master cache 10.0.1.185 6379 ip redis-sentinel3 port 5000 for 1a21dc3b66fdd1d205e2dbd872d5726e48e07208
and
+sentinel-address-update sentinel 1a21dc3b66fdd1d205e2dbd872d5726e48e07208 10.0.1.195 5000 # cache 10.0.1.185 6379 1 additional matching instances
The redis services are working, but the excessive logging is a nuisance. Any clue what could cause these repeated log messages?

shutdown sentinel,and add sentinel announce-ip x.x.x.x
refer from: https://rtfm.co.ua/en/redis-sentinel-bind-0-0-0-0-the-localhost-issue-and-the-announce-ip-option/

Related

Error: unable to perform an operation on node 'rabbit#localhost'

So I have an issue with docker-compose and rabbitmq.
I run docker-compose up. Everything spins up. Docker-compose:
services:
rabbitmq3:
image: "rabbitmq:3-management"
hostname: "localhost"
command: rabbitmq-server
ports:
- 5672:5672
- 15672:15672
Then I do sudo rabbitmqctl status to check connection with node. I get this error:
Error: unable to perform an operation on node 'rabbit#localhost'. Please see diagnostics information and suggestions below.
Most common reasons for this are:
* Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)
* CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)
* Target node is not running
In addition to the diagnostics info below:
* See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more
* Consult server logs on node rabbit#localhost
* If target node is configured to use long node names, don't forget to use --longnames with CLI tools
DIAGNOSTICS
===========
attempted to contact: [rabbit#localhost]
rabbit#localhost:
* connected to epmd (port 4369) on localhost
* epmd reports: node 'rabbit' not running at all
no other nodes on localhost
* suggestion: start the node
Current node details:
* node name: 'rabbitmqcli-25456-rabbit#localhost'
* effective user's home directory: /Users/olof.grund
* Erlang cookie hash: d1oONiVA/qogGxkf6vs9Rw==
When I do it in the container docker-compose exec -T rabbitmq3 rabbitmqctl status it works.
Do I need to expose something from docker somehow? Some rabbitmq client or node maybe?
I used all the tips that I have found in other sources. (adding IP to /etc/hosts/, restarts of containers, services). Took me a day to finally get this to work and it boils down to this.
<wait for 60secs since the rabbit container has been started>
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl force_boot
rabbitmqctl start_app
Rabbitmq uses Erlang's distribution protocol, which requires port 4369 open for the EPMD (Erlang Port Mapper Daemon), expose it in the docker-compose and stop the EPMD running in your host.

Cannot connect to GCP Memorystore from GCP Dataflow

I'm attempting to use GCP Memorystore to handle session ids for a event streaming job running on GCP Dataflow. The job fails with a timeout when trying to connect to Memorystore:
redis.clients.jedis.exceptions.JedisConnectionException: Failed connecting to host 10.0.0.4:6379
at redis.clients.jedis.Connection.connect(Connection.java:207)
at redis.clients.jedis.BinaryClient.connect(BinaryClient.java:101)
at redis.clients.jedis.Connection.sendCommand(Connection.java:126)
at redis.clients.jedis.Connection.sendCommand(Connection.java:117)
at redis.clients.jedis.Jedis.get(Jedis.java:155)
My Memorystore instance has these properties:
Version is 4.0
Authorized network is default-auto
Master is in us-central1-b. Replica is in us-central1-a.
Connection properties: IP address: 10.0.0.4, Port number: 6379
> gcloud redis instances list --region us-central1
INSTANCE_NAME VERSION REGION TIER SIZE_GB HOST PORT NETWORK RESERVED_IP STATUS CREATE_TIME
memorystore REDIS_4_0 us-central1 STANDARD_HA 1 10.0.0.4 6379 default-auto 10.0.0.0/29 READY 2019-07-15T11:43:14
My Dataflow job has these properties:
runner: org.apache.beam.runners.dataflow.DataflowRunner
zone: us-central1-b
network: default-auto
> gcloud dataflow jobs list
JOB_ID NAME TYPE CREATION_TIME STATE REGION
2019-06-17_02_01_36-3308621933676080017 eventflow Streaming 2019-06-17 09:01:37 Running us-central1
My "default" network could not be used since it is a legacy network, which Memorystore would not accept. I failed to find a way to upgrade the default network from legacy to auto and did not want to delete the existing default network since this would require messing with production services. Instead I created a new network "default-auto" of type auto, with the same firewall rules as the default network. The one I believe is relevant for my Dataflow job is this:
Name: default-auto-internal
Type: Ingress
Targets: Apply to all
Filters: IP ranges: 10.0.0.0/20
Protocols/ports:
tcp:0-65535
udp:0-65535
icmp
Action: Allow
Priority: 65534
I can connect to Memorystore using "telnet 10.0.0.4 6379" from a Compute Engine instance.
Things I have tried, which did not change anything:
- Switched Redis library, from Jedis 2.9.3 to Lettuce 5.1.7
- Deleted and re-created the Memorystore instance
Is Dataflow not supposed to be able to connect to Memorystore, or am I missing something?
Figured it out. I was trying to connect to Memorystore from code called directly from the main method of my Dataflow job. Connecting from code running in a Dataflow step worked. On second though (well, actually more like 1002nd thought) this makes sense because main() is running on the driver machine (my desktop in this case) whereas the steps of the Dataflow graph will run on GCP. I have confirmed this theory by connecting to Memorystore on localhost:6379 in my main(). This works since I have an SSH tunnel to Memorystore running on port 6379 (using this trick).

nginx + uwsgi running on Amazon ECS returns socket: Too many open files (24) ( changing Ulimit did not help )

I have a UWSGI running behind Nginx proxy server. I tried to benchmark my backend instance using Apache bench. At one point in time, I get Too many open files (24) error when I run the command ab -c 1100 -n 2000 https://example.com/test.
I changed the ulimits of my ECS Instance as well as the docker containers and confirmed it by typing ulimit -n which returns 100000 in both the locations.
I cross checked the Individual NGINX, Uwsgi processes limits by opening the /proc/PID where the Max open files is set to 100000.
The worker_connections and worker_rlimit_nofile parameters in /etc/nginx/nginx.conf are also set to highest limit possible.

Starting Redis cluster hangs when calling redis-trib

I have tried to setup a Redis cluster running docker but it hangs when I try to join them. My docker ps gives me this:
Notice the port mapping.
All containers have this basic redis.conf file
port 6379
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
cluster-announce-ip 127.0.0.1
cluster-announce-port [7001, 7002, 7003, 7004, 7005 or 7006]
cluster-announce-bus-port [7101, 7102, 7103, 7104, 7105 or 7106]
Where the only change is the cluster-announce-port and cluster-announce-bus-port for each docker container. I hope you get the point.
I try to join the nodes with ./redis-trib.rb create --replicas 1 127.0.0.1:7001 127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 127.0.0.1:7006
And it discovers it perfectly and asking if the config should be accepted:
But then redis-trib hangs indefinitely with "Waiting for the cluster to join". I can see through docker logs r_1 to r_6, that the epoch is getting set:
1:M 15 Jul 10:38:08.493 # configEpoch set to 1 via CLUSTER SET-CONFIG-EPOCH
So redis-trib does call the different nodes.
I cant really find anything about the cluster-announce variables anywhere. Does anyone here know how to do this? I think my problems lies in this part.
The redis version I am using is 4.0.10.
Ok so I figured it out. I needed to
set my cluster-announce-ip to the Ethernet adapter that has been created when installing docker (open up a terminal and do ipconfig)
update redis-trib.rb to reflect this IP
map the 16379 port when the docker image is created

How to run a redis cluster on a docker cluster?

Context
I am trying to setup a redis cluster so that it runs on top off a docker cluster, to achieve maximum auto-healing.
More precisely, I have a docker compose file, which defines a service that has 3 replicas. Each service replica has a redis-server running on.
Then I have a program inside each replica that listens to changes on the docker cluster and that starts the cluster when conditions are met (each 3 redis-servers know each other).
Setting up the redis cluster works has expected, the cluster is formed and all the redis-servers communicate well, but the communication between redis-servers is inside the docker cluster.
The Problem
When I try to communicate from outside the docker cluster, because of the ingress mode I am able to talk to a redis-server, however when I try to add info (eg: set foo bar) and the client is moved to another redis-server the communication hangs and eventually times out.
Code
This is the docker-compose file.
version: "3.3"
services:
redis-cluster:
image: redis-srv-instance
volumes:
- /var/run/:/var/run
deploy:
mode: replicated
#endpoint_mode: dnsrr
replicas: 3
resources:
limits:
cpus: '0.5'
memory: 512M
ports:
- target: 6379
published: 30000
protocol: tcp
mode: ingress
The flux of commands that show the problem.
Client
~ ./redis-cli -c -p 30000
127.0.0.1:30000>
Redis-server
OK
1506533095.032738 [0 10.255.0.2:59700] "COMMAND"
1506533098.335858 [0 10.255.0.2:59700] "info"
Client
127.0.0.1:30000> set ghb fki
OK
Redis-server
1506533566.481334 [0 10.255.0.2:59718] "COMMAND"
1506533571.315238 [0 10.255.0.2:59718] "set" "ghb" "fki"
Client
127.0.0.1:30000> set rte fgh
-> Redirected to slot [3830] located at 10.0.0.3:6379
Could not connect to Redis at 10.0.0.3:6379: Operation timed out
Could not connect to Redis at 10.0.0.3:6379: Operation timed out
(150.31s)
not connected>
Any ideas? I have also tried making my one proxy/load balancer but didn't work.
Thank you! Have a nice day.
For this use case, sentinel might help. Redis on its own is not capably of high availability. Sentinel on the other side is a distributed system which can do the following for you:
Route the ingress trafic to the current Redis master.
Elect a new Redis master should the current one fail.
While I have previously done research on this topic, I have not yet managed to pull to getter a working example.
redis-cli would get the redis server ip inside the ingress network, and try to access the remote redis server by that ip directly. That is why redis-cli shows Redirected to slot [3830] located at 10.0.0.3:6379. But this internal 10.0.0.3 is not accessible to redis-cli.
One solution is to run another proxy service which attaches to the same network with redis cluster. The application sends all requests to that proxy service, and the proxy service talks with redis cluster.
Or you could create 3 swarm services that uses the bridge network and exposes the redis port to node. Your internal program needs to change accordingly.

Resources