sentinel cannot promote initial master back to MASTER mode after failover - docker

I am running redis in sentinel mode with 1 master node, 2 replica nodes and 3 sentinel nodes. I am running all the nodes in docker swarm environment.
All nodes starts fine. At start we have the following IPs for nodes
master 10.0.20.2
replica-1 10.0.20.5
replica-2 10.0.20.10
Next I stop the master container to bring master node down so that sentinel should pick one of replica nodes as new master.
This goes fine and replica-1 node is selected as new master.
In meantime, docker swarm spin up new container for masterand it joins as slave in the redis sentinel network.
Next, I bring the replica-1 node down for another failover. Now the actual issue happens when sentinel tries to upgrade master node from slave to master.
Below is the masternode redis config file when sentinel tries to make it master. I am wondering why the file is updated with replicaof 10.0.20.2 6379 when this node is the new master and IP is of same node.
master node redis.conf
root#0fd67f6ceb37:/data# tail -f /etc/redis/redis.conf
replica-announce-ip "redis-master"
#replica-announce-port 6379
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error no
rdbchecksum yes
# Generated by CONFIG REWRITE
replicaof 10.0.20.2 6379
This is wrong configuration so it fails in sometime and sentinel picks replica-2 node as new master
This is the error I see when masternode logs ( below is the detailed log file)
Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
And in the end replica-2 acts as masterand replica-1 and master as two slaves.
master node logs (this is after master joins as slave and sentinel tries to promote it to master mode)
[docker#chopswarm1 redis-failover]$ d logs 0fd67f6ceb37
1:C 05 Nov 2019 06:43:49.360 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 05 Nov 2019 06:43:49.360 # Redis version=5.0.5, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 05 Nov 2019 06:43:49.360 # Configuration loaded
1:M 05 Nov 2019 06:43:49.361 * Running mode=standalone, port=6379.
1:M 05 Nov 2019 06:43:49.361 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 05 Nov 2019 06:43:49.361 # Server initialized
1:M 05 Nov 2019 06:43:49.361 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 05 Nov 2019 06:43:49.361 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 05 Nov 2019 06:43:49.361 * DB loaded from disk: 0.000 seconds
1:M 05 Nov 2019 06:43:49.361 * Ready to accept connections
1:S 05 Nov 2019 06:43:59.817 * Before turning into a replica, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1:S 05 Nov 2019 06:43:59.817 * REPLICAOF 10.0.20.5:6379 enabled (user request from 'id=5 addr=10.0.20.7:60534 fd=10 name=sentinel-38a1e461-cmd age=10 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=148 qbuf-free=32620 obl=36 oll=0 omem=0 events=r cmd=exec')
1:S 05 Nov 2019 06:43:59.817 # CONFIG REWRITE executed with success.
1:S 05 Nov 2019 06:44:00.386 * Connecting to MASTER 10.0.20.5:6379
1:S 05 Nov 2019 06:44:00.387 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:44:00.387 * Non blocking connect for SYNC fired the event.
1:S 05 Nov 2019 06:44:00.387 * Master replied to PING, replication can continue...
1:S 05 Nov 2019 06:44:00.387 * Trying a partial resynchronization (request 0b1ed09c8d497744632c93cab960c4ca4ee9a11e:1).
1:S 05 Nov 2019 06:44:00.388 * Full resync from master: f3c311652d8860c93048eba075521df7033cab2f:38645
1:S 05 Nov 2019 06:44:00.388 * Discarding previously cached master state.
1:S 05 Nov 2019 06:44:00.486 * MASTER <-> REPLICA sync: receiving 178 bytes from master
1:S 05 Nov 2019 06:44:00.486 * MASTER <-> REPLICA sync: Flushing old data
1:S 05 Nov 2019 06:44:00.486 * MASTER <-> REPLICA sync: Loading DB in memory
1:S 05 Nov 2019 06:44:00.486 * MASTER <-> REPLICA sync: Finished with success
1:S 05 Nov 2019 06:44:35.367 # Connection with master lost.
1:S 05 Nov 2019 06:44:35.367 * Caching the disconnected master state.
1:S 05 Nov 2019 06:44:35.464 * Connecting to MASTER 10.0.20.5:6379
1:S 05 Nov 2019 06:44:35.465 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:44:35.465 # Error condition on socket for SYNC: Connection refused
1:S 05 Nov 2019 06:44:36.466 * Connecting to MASTER 10.0.20.5:6379
1:S 05 Nov 2019 06:44:36.466 * MASTER <-> REPLICA sync started
1:M 05 Nov 2019 06:44:40.748 # Setting secondary replication ID to f3c311652d8860c93048eba075521df7033cab2f, valid up to offset: 46004. New replication ID is 77213f07383dd307e4b6d917b6a8789de42cad20
1:M 05 Nov 2019 06:44:40.748 * Discarding previously cached master state.
1:M 05 Nov 2019 06:44:40.748 * MASTER MODE enabled (user request from 'id=16 addr=10.0.20.7:60576 fd=17 name=sentinel-38a1e461-cmd age=31 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=140 qbuf-free=32628 obl=36 oll=0 omem=0 events=r cmd=exec')
1:M 05 Nov 2019 06:44:40.748 # CONFIG REWRITE executed with success.
1:M 05 Nov 2019 06:44:41.881 * Replica redis-replica-2:6379 asks for synchronization
1:M 05 Nov 2019 06:44:41.881 * Partial resynchronization request from redis-replica-2:6379 accepted. Sending 881 bytes of backlog starting from offset 46004.
1:S 05 Nov 2019 06:44:43.132 # Connection with replica redis-replica-2:6379 lost.
1:S 05 Nov 2019 06:44:43.132 * Before turning into a replica, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1:S 05 Nov 2019 06:44:43.132 * REPLICAOF 10.0.20.2:6379 enabled (user request from 'id=24 addr=10.0.20.7:60636 fd=15 name=sentinel-38a1e461-cmd age=3 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=291 qbuf-free=32477 obl=36 oll=0 omem=0 events=r cmd=exec')
1:S 05 Nov 2019 06:44:43.133 # CONFIG REWRITE executed with success.
1:S 05 Nov 2019 06:44:43.484 * Connecting to MASTER 10.0.20.2:6379
1:S 05 Nov 2019 06:44:43.484 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:44:43.484 * Non blocking connect for SYNC fired the event.
1:S 05 Nov 2019 06:44:43.484 * Master replied to PING, replication can continue...
1:S 05 Nov 2019 06:44:43.484 * Trying a partial resynchronization (request 77213f07383dd307e4b6d917b6a8789de42cad20:46885).
1:S 05 Nov 2019 06:44:43.484 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 05 Nov 2019 06:44:44.489 * Connecting to MASTER 10.0.20.2:6379
1:S 05 Nov 2019 06:44:44.489 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:44:44.489 * Non blocking connect for SYNC fired the event.
1:S 05 Nov 2019 06:44:44.489 * Master replied to PING, replication can continue...
1:S 05 Nov 2019 06:44:44.490 * Trying a partial resynchronization (request 77213f07383dd307e4b6d917b6a8789de42cad20:46885).
1:S 05 Nov 2019 06:44:44.490 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 05 Nov 2019 06:44:45.489 * Connecting to MASTER 10.0.20.2:6379
1:S 05 Nov 2019 06:44:45.490 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:44:45.490 * Non blocking connect for SYNC fired the event.
1:S 05 Nov 2019 06:44:45.490 * Master replied to PING, replication can continue...
1:S 05 Nov 2019 06:44:45.490 * Trying a partial resynchronization (request 77213f07383dd307e4b6d917b6a8789de42cad20:46885).
1:S 05 Nov 2019 06:44:45.490 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 05 Nov 2019 06:44:46.493 * Connecting to MASTER 10.0.20.2:6379
1:S 05 Nov 2019 06:44:46.493 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:44:46.493 * Non blocking connect for SYNC fired the event.
1:S 05 Nov 2019 06:44:46.493 * Master replied to PING, replication can continue...
1:S 05 Nov 2019 06:44:46.493 * Trying a partial resynchronization (request 77213f07383dd307e4b6d917b6a8789de42cad20:46885).
1:S 05 Nov 2019 06:44:46.494 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 05 Nov 2019 06:44:47.493 * Connecting to MASTER 10.0.20.2:6379
1:S 05 Nov 2019 06:44:47.494 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:44:47.494 * Non blocking connect for SYNC fired the event.
1:S 05 Nov 2019 06:44:47.494 * Master replied to PING, replication can continue...
1:S 05 Nov 2019 06:44:47.494 * Trying a partial resynchronization (request 77213f07383dd307e4b6d917b6a8789de42cad20:46885).
<-- omitted few entries for the same errors as above for better readability -->
1:S 05 Nov 2019 06:45:21.575 * Connecting to MASTER 10.0.20.2:6379
1:S 05 Nov 2019 06:45:21.575 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:45:21.575 * Non blocking connect for SYNC fired the event.
1:S 05 Nov 2019 06:45:21.575 * Master replied to PING, replication can continue...
1:S 05 Nov 2019 06:45:21.575 * Trying a partial resynchronization (request 77213f07383dd307e4b6d917b6a8789de42cad20:46885).
1:S 05 Nov 2019 06:45:21.575 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 05 Nov 2019 06:45:22.456 * REPLICAOF 10.0.20.10:6379 enabled (user request from 'id=113 addr=10.0.20.7:60950 fd=12 name=sentinel-38a1e461-cmd age=5 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=150 qbuf-free=32618 obl=36 oll=0 omem=0 events=r cmd=exec')
1:S 05 Nov 2019 06:45:22.456 # CONFIG REWRITE executed with success.
1:S 05 Nov 2019 06:45:22.577 * Connecting to MASTER 10.0.20.10:6379
1:S 05 Nov 2019 06:45:22.577 * MASTER <-> REPLICA sync started
1:S 05 Nov 2019 06:45:22.577 * Non blocking connect for SYNC fired the event.
1:S 05 Nov 2019 06:45:22.577 * Master replied to PING, replication can continue...
1:S 05 Nov 2019 06:45:22.577 * Trying a partial resynchronization (request 77213f07383dd307e4b6d917b6a8789de42cad20:46885).
1:S 05 Nov 2019 06:45:22.577 * Successful partial resynchronization with master.
1:S 05 Nov 2019 06:45:22.577 # Master replication ID changed to 3235720aad34423d6f82f9db4a953042c1f16d58
1:S 05 Nov 2019 06:45:22.577 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
sentinel log file ( have added additional line breaks when failover starts)
root#3708cf05eca4:/data# cat sentinel.log
1:X 05 Nov 2019 06:40:49.116 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 05 Nov 2019 06:40:49.116 # Redis version=5.0.5, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 05 Nov 2019 06:40:49.116 # Configuration loaded
1:X 05 Nov 2019 06:40:49.117 * Running mode=sentinel, port=26379.
1:X 05 Nov 2019 06:40:49.117 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:X 05 Nov 2019 06:40:49.119 # Sentinel ID is 38a1e461910e17fb7be79e695040074df2dde2df
1:X 05 Nov 2019 06:40:49.119 # +monitor master eaas-redis-master 10.0.20.2 6379 quorum 2
1:X 05 Nov 2019 06:40:49.120 * +slave slave redis-replica-1:6379 10.0.20.5 6379 # eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:40:51.183 * +sentinel sentinel 3b0831ce9f6aff70f9bf45f4211d66ebfd1c6a21 10.0.20.33 26379 # eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:40:59.150 * +slave slave redis-replica-2:6379 10.0.20.10 6379 # eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:40:59.202 * +fix-slave-config slave redis-replica-1:6379 10.0.20.5 6379 # eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:41:01.362 * +sentinel sentinel 464f3750404b419fccf513784f40baf7f6622cba 10.0.20.41 26379 # eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:41:09.249 * +fix-slave-config slave redis-replica-2:6379 10.0.20.10 6379 # eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:43:48.513 # +sdown master eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:43:48.594 # +new-epoch 1
1:X 05 Nov 2019 06:43:48.595 # +vote-for-leader 464f3750404b419fccf513784f40baf7f6622cba 1
1:X 05 Nov 2019 06:43:48.613 # +odown master eaas-redis-master 10.0.20.2 6379 #quorum 2/2
1:X 05 Nov 2019 06:43:48.613 # Next failover delay: I will not start a failover before Tue Nov 5 06:43:59 2019
1:X 05 Nov 2019 06:43:49.732 # +config-update-from sentinel 464f3750404b419fccf513784f40baf7f6622cba 10.0.20.41 26379 # eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:43:49.732 # +switch-master eaas-redis-master 10.0.20.2 6379 10.0.20.5 6379
1:X 05 Nov 2019 06:43:49.732 * +slave slave 10.0.20.10:6379 10.0.20.10 6379 # eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:43:49.732 * +slave slave 10.0.20.2:6379 10.0.20.2 6379 # eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:43:49.785 * +slave slave redis-replica-2:6379 10.0.20.10 6379 # eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:43:59.816 * +convert-to-slave slave 10.0.20.2:6379 10.0.20.2 6379 # eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:09.832 * +slave slave redis-master:6379 10.0.20.2 6379 # eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:40.453 # +sdown master eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:40.524 # +odown master eaas-redis-master 10.0.20.5 6379 #quorum 2/2
1:X 05 Nov 2019 06:44:40.524 # +new-epoch 2
1:X 05 Nov 2019 06:44:40.524 # +try-failover master eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:40.525 # +vote-for-leader 38a1e461910e17fb7be79e695040074df2dde2df 2
1:X 05 Nov 2019 06:44:40.525 # 3b0831ce9f6aff70f9bf45f4211d66ebfd1c6a21 voted for 3b0831ce9f6aff70f9bf45f4211d66ebfd1c6a21 2
1:X 05 Nov 2019 06:44:40.528 # 464f3750404b419fccf513784f40baf7f6622cba voted for 38a1e461910e17fb7be79e695040074df2dde2df 2
1:X 05 Nov 2019 06:44:40.580 # +elected-leader master eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:40.580 # +failover-state-select-slave master eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:40.681 # +selected-slave slave redis-master:6379 10.0.20.2 6379 # eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:40.681 * +failover-state-send-slaveof-noone slave redis-master:6379 10.0.20.2 6379 # eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:40.748 * +failover-state-wait-promotion slave redis-master:6379 10.0.20.2 6379 # eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:41.003 # +promoted-slave slave redis-master:6379 10.0.20.2 6379 # eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:41.003 # +failover-state-reconf-slaves master eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:41.101 * +slave-reconf-sent slave 10.0.20.10:6379 10.0.20.10 6379 # eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:41.598 # -odown master eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:42.050 * +slave-reconf-inprog slave 10.0.20.10:6379 10.0.20.10 6379 # eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:42.050 * +slave-reconf-done slave 10.0.20.10:6379 10.0.20.10 6379 # eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:42.107 * +slave-reconf-sent slave redis-replica-2:6379 10.0.20.10 6379 # eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:43.056 * +slave-reconf-inprog slave redis-replica-2:6379 10.0.20.10 6379 # eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:43.056 * +slave-reconf-done slave redis-replica-2:6379 10.0.20.10 6379 # eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:43.132 * +slave-reconf-sent slave 10.0.20.2:6379 10.0.20.2 6379 # eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:44.111 * +slave-reconf-inprog slave 10.0.20.2:6379 10.0.20.2 6379 # eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:46.056 # +failover-end-for-timeout master eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:46.056 # +failover-end master eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:46.056 * +slave-reconf-sent-be slave redis-master:6379 10.0.20.2 6379 # eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:46.056 * +slave-reconf-sent-be slave 10.0.20.2:6379 10.0.20.2 6379 # eaas-redis-master 10.0.20.5 6379
1:X 05 Nov 2019 06:44:46.056 # +switch-master eaas-redis-master 10.0.20.5 6379 10.0.20.2 6379
1:X 05 Nov 2019 06:44:46.057 * +slave slave 10.0.20.10:6379 10.0.20.10 6379 # eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:44:46.057 * +slave slave 10.0.20.5:6379 10.0.20.5 6379 # eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:44:51.062 # +sdown slave 10.0.20.5:6379 10.0.20.5 6379 # eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:45:11.226 # +sdown master eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:45:16.233 # +new-epoch 3
1:X 05 Nov 2019 06:45:16.234 # +vote-for-leader 464f3750404b419fccf513784f40baf7f6622cba 3
1:X 05 Nov 2019 06:45:16.535 # +odown master eaas-redis-master 10.0.20.2 6379 #quorum 3/2
1:X 05 Nov 2019 06:45:16.535 # Next failover delay: I will not start a failover before Tue Nov 5 06:45:26 2019
1:X 05 Nov 2019 06:45:17.285 # +config-update-from sentinel 464f3750404b419fccf513784f40baf7f6622cba 10.0.20.41 26379 # eaas-redis-master 10.0.20.2 6379
1:X 05 Nov 2019 06:45:17.285 # +switch-master eaas-redis-master 10.0.20.2 6379 10.0.20.10 6379
1:X 05 Nov 2019 06:45:17.285 * +slave slave 10.0.20.5:6379 10.0.20.5 6379 # eaas-redis-master 10.0.20.10 6379
1:X 05 Nov 2019 06:45:17.285 * +slave slave 10.0.20.2:6379 10.0.20.2 6379 # eaas-redis-master 10.0.20.10 6379
1:X 05 Nov 2019 06:45:22.456 * +fix-slave-config slave 10.0.20.2:6379 10.0.20.2 6379 # eaas-redis-master 10.0.20.10 6379
1:X 05 Nov 2019 06:45:26.347 * +slave slave redis-replica-1:6379 10.0.20.5 6379 # eaas-redis-master 10.0.20.10 6379
1:X 05 Nov 2019 06:45:26.348 * +slave slave redis-master:6379 10.0.20.2 6379 # eaas-redis-master 10.0.20.10 6379
root#3708cf05eca4:/data#
So I want to know why sentinel rewrites the configuration file with replicaof for master node only(it happens only for master node and not for replica nodes when they are promoted to MASTER mode)
How can I improve this scenario so that master node can run in MASTER mode again if sentinel picks it up during failover.
Please let me know if any more information is required.
Below are redis configuration files for master and replica nodes when I start the docker swarm stack.
redis.conf(master)
dir /data/
replica-announce-ip {{REDIS_MASTER}}
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error no
rdbchecksum yes
redis.conf(replica)
replicaof {{REDIS_MASTER}} 6379
dir /data/
replica-announce-ip {{REDIS_REPLICA}}
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error no
rdbchecksum yes

There is some specific issue with docker service vs docker container IPs: https://github.com/moby/moby/issues/30963
So, when you set REDIS_MASTER_HOST: redis-master (for example) environment it points to docker service IP, but not actual redis-master container IP - and this behavior breaks sentinel logic.
I used this configurations in docker-compose (note dnsrr endpoint_mode):
redis-master:
image: bitnami/redis:5.0.9
environment:
REDIS_REPLICATION_MODE: master
ALLOW_EMPTY_PASSWORD: 'yes'
REDIS_AOF_ENABLED: 'no'
deploy:
endpoint_mode: dnsrr
This configuration provides one IP for redis-master DNS service record and container itself. But in this case IP-address after redis-master container restart will change, so after failure I recreate sentinel configuration with commands:
# on redis-master
redis-cli SLAVEOF <new IP master> 6379
# on all sentinel nodes
redis-cli -p 26379 SENTINEL REMOVE mymaster
redis-cli -p 26379 SENTINEL monitor mymaster <new IP redis-master> 6379 2
After this manipulation redis-master will change role correctly.
Another option:
I asked for function to get correct IP for redis in docker swarm environments: https://github.com/bitnami/bitnami-docker-redis/issues/174
And I created fork: https://github.com/tartemov/bitnami-docker-redis (There is one change with dns_lookup function)
In this case, docker-compose file will look like:
services:
redis-master:
image: ndocker-registry/redis:5.0.9
hostname: "redis-master"
environment:
REDIS_REPLICATION_MODE: master
ALLOW_EMPTY_PASSWORD: 'yes'
REDIS_AOF_ENABLED: 'no'
redis-slave-1:
image: docker-registry/redis:5.0.9
environment:
REDIS_REPLICATION_MODE: slave
REDIS_MASTER_HOST: redis-master
ALLOW_EMPTY_PASSWORD: 'yes'
REDIS_AOF_ENABLED: 'no'
sentinel:
image: bitnami/redis-sentinel:5.0.9-debian-10-r49
environment:
REDIS_MASTER_HOST: redis-master
ALLOW_EMPTY_PASSWORD: 'yes'
REDIS_AOF_ENABLED: 'no'
REDIS_SENTINEL_DOWN_AFTER_MILLISECONDS: 5000
REDIS_SENTINEL_FAILOVER_TIMEOUT: 60000
deploy:
mode: replicated
replicas: 3
Note hostname property - with this property and new dns_lookup function redis can annotate IP address
And you don't need to run any manual actions - service address will not change and sentinel will correctly set roles for any of redis nodes

Related

Redis config doesn't setup on docker

I tried to specify Redis config in the Dockerfile:
FROM redis:7.0.0
EXPOSE 6379
COPY redis.conf /usr/local/etc/redis/redis.conf
CMD ["redis-server", "--include /usr/local/etc/redis/redis.conf"]
But in logs:
redis_1 | 1:C 14 Nov 2022 12:32:28.045 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1 | 1:C 14 Nov 2022 12:32:28.045 # Redis version=7.0.5, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1 | 1:C 14 Nov 2022 12:32:28.045 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
redis_1 | 1:M 14 Nov 2022 12:32:28.046 * monotonic clock: POSIX clock_gettime
redis_1 | 1:M 14 Nov 2022 12:32:28.050 * Running mode=standalone, port=6379.
redis_1 | 1:M 14 Nov 2022 12:32:28.051 # Server initialized
And redis config doesn't work
With config I want to disable replica-read-only, which produces "Can't write against read-only replica" errors.

Docker container fails to initalize from docker-compose

I am new to Docker and in the process of learning Docker Compose, using CentOS 8 as the host system. All of these files are in the same directory and marked executable with mode 0755.
My main application is in webapp.py:
import time
import redis
from flask import Flask
app = Flask(__name__)
cache = redis.Redis(host='redis', port=6379)
def get_hit_count():
retries = 5
while True:
try:
return cache.incr('hits')
except redis.exceptions.ConnectionError as exc:
if retries == 0:
raise exc
retries -= 1
time.sleep(0.5)
#app.route('/')
def hello():
count = get_hit_count()
return 'Hello World! I have been seen {} times.\n'.format(count)
There is a matching requirements.txt file:
flask
redis
A Dockerfile:
FROM python:3.4-alpine
ADD . /code
WORKDIR /code
RUN pip install -r requirements.txt
CMD ["python", "webapp.py"]
And a docker-compose.yml file:
version: "3"
services:
web:
build: .
ports:
- "5000:5000"
stdin_open: true
tty: true
redis:
image: "redis:alpine"
However, when I run docker-compose up, it fails by exiting the web app. Compose outputs:
[root#localhost webappl]# docker-compose up
Starting webappl_redis_1 ... done
Starting webappl_web_1 ... done
Attaching to webappl_web_1, webappl_redis_1
redis_1 | 1:C 27 Dec 2021 09:03:35.861 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1 | 1:C 27 Dec 2021 09:03:35.861 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1 | 1:C 27 Dec 2021 09:03:35.861 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
redis_1 | 1:M 27 Dec 2021 09:03:35.863 * monotonic clock: POSIX clock_gettime
redis_1 | 1:M 27 Dec 2021 09:03:35.864 * Running mode=standalone, port=6379.
redis_1 | 1:M 27 Dec 2021 09:03:35.864 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
redis_1 | 1:M 27 Dec 2021 09:03:35.864 # Server initialized
redis_1 | 1:M 27 Dec 2021 09:03:35.864 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
redis_1 | 1:M 27 Dec 2021 09:03:35.865 * Loading RDB produced by version 6.2.6
redis_1 | 1:M 27 Dec 2021 09:03:35.865 * RDB age 4147 seconds
redis_1 | 1:M 27 Dec 2021 09:03:35.865 * RDB memory usage when created 0.77 Mb
redis_1 | 1:M 27 Dec 2021 09:03:35.865 # Done loading RDB, keys loaded: 0, keys expired: 0.
redis_1 | 1:M 27 Dec 2021 09:03:35.865 * DB loaded from disk: 0.000 seconds
redis_1 | 1:M 27 Dec 2021 09:03:35.865 * Ready to accept connections
webappl_web_1 exited with code 0
Why does it exit?
can you trying adding following to your web: after ports:? if stdin_open doesn't work, try stdin
stdin_open: true
tty: true

Connection refused when using redis with docker-compose

So this is my current docker-compose.yml:
version: "2.0"
services:
redis:
image: redis
container_name: framework-redis
ports:
- "127.0.0.1:6379:6379"
web:
image: myContainer:v1
container_name: framework-web
depends_on:
- redis
volumes:
- /var/www/myApp:/app
environment:
LOG_STDOUT: /var/log/docker.access.log
LOG_STDERR: /var/log/docker.error.log
ports:
- "8100:80"
I've tried different settings; for example: not using a port value for redis, using 0.0.0.0, switching to the expose option.
If I try to connect using 127.0.0.1 from the host machine it works, but it fails with a connection refused message for my app container.
Any thoughts?
If you're accessing framework-redis from framework-web, then you need to access it using ip (or container name, i.e., framework-redis) and port of framework-redis. Since, its going to be behind a docker bridge, an ip in the range of 172.17.0.0/16 will be allocated to framework-redis. You can use that IP or better just give the container name along with 6379 port.
$ cat docker-compose.yml
version: "2.0"
services:
redis:
image: redis
container_name: framework-redis
web:
image: redis
container_name: framework-web
depends_on:
- redis
command: [ "redis-cli", "-h", "framework-redis", "ping" ]
$
$ docker-compose up
Recreating framework-redis ... done
Recreating framework-web ... done
Attaching to framework-redis, framework-web
framework-redis | 1:C 09 Dec 2019 19:25:52.798 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
framework-redis | 1:C 09 Dec 2019 19:25:52.798 # Redis version=5.0.6, bits=64, commit=00000000, modified=0, pid=1, just started
framework-redis | 1:C 09 Dec 2019 19:25:52.798 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
framework-redis | 1:M 09 Dec 2019 19:25:52.799 * Running mode=standalone, port=6379.
framework-redis | 1:M 09 Dec 2019 19:25:52.800 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
framework-redis | 1:M 09 Dec 2019 19:25:52.800 # Server initialized
framework-redis | 1:M 09 Dec 2019 19:25:52.800 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
framework-redis | 1:M 09 Dec 2019 19:25:52.800 * DB loaded from disk: 0.000 seconds
framework-redis | 1:M 09 Dec 2019 19:25:52.800 * Ready to accept connections
framework-web | PONG
framework-web exited with code 0
As you can see above, I received a PONG for the PING command.
Some additional points:
ports are written in the form HOST_PORT:CONTAINER_PORT. You don't need to give IP (as pointed out by #coulburton in the comments).
If you're only accessing framework-redis from framework-web, then you don't need to publish ports (i.e., 6379:6379 in the ports section). We only need to publish ports when we want to access an application running in the container network (which as far as I know by default is 172.17.0.0/16) from some other network (ex, the host machine or some other physical machine).

Adding Sentinel ANNOUNCE_IP after container is up via entrypoint.sh

Docker swarm-mode is tricky when it comes to IP addresses since the IP it shows not necessarily the one in use. This confuse Sentinel in swarm-mode. To get around this Im adding steps to sentinel-entrypoint to ANNOUNCE_IP after container in up and has its IP assigned.
I added the following to sentienl-entrypoint.sh:
ANNOUNCE_IP=$(awk "END{print $1}" /etc/hosts)
echo $ANNOUNCE_IP
if [ -n "$ANNOUNCE_IP" ]; then
echo "sentinel announce-ip $ANNOUNCE_IP" >> /usr/local/bin/sentinel.conf
fi
When containers start I see the following in log:
port 26379
dir /tmp
sentinel monitor docker-cluster redis_1 6379 3
sentinel down-after-milliseconds docker-cluster 5000
sentinel parallel-syncs docker-cluster 1
sentinel failover-timeout docker-cluster 5000
0
1:X 10 Oct 2019 16:26:55.426 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 10 Oct 2019 16:26:55.426 # Redis version=5.0.6, bits=64, commit=00000000, modified=0, pid=1,
just started
1:X 10 Oct 2019 16:26:55.426 # Configuration loaded
1:X 10 Oct 2019 16:26:55.427 * Running mode=sentinel, port=26379.
1:X 10 Oct 2019 16:26:55.427 # WARNING: The TCP backlog setting of 511 cannot be enforced because
/proc/sys/net/core/somaxconn is set to the lower value of 128.
1:X 10 Oct 2019 16:26:55.432 # Sentinel ID is 6c81c967eeed77e513166491ee0e6b989f261c74
1:X 10 Oct 2019 16:26:55.432 # +monitor master docker-cluster 10.0.0.159 6379 quorum 3
1:X 10 Oct 2019 16:26:55.433 * +slave slave 10.0.0.13:6379 10.0.0.13 6379 # docker-cluster
10.0.0.159 6379
1:X 10 Oct 2019 16:27:00.450 # +sdown slave 10.0.0.13:6379 10.0.0.13 6379 # docker-cluster
10.0.0.159 6379
For some unknown reason the IP address shows as 0, but when I log into the container and run the same command I get the IP of the container. Any ideas?

Redis cluster in docker-swarm mode updated recipe

Anyone has a working recipe of Redis cluster in swarm mode? I tried everything I know and searched the internet but seems like an impossible task.
Here is what I have so far:
version: '3.4'
services:
redis-master:
image: redis
networks:
- redisdb
ports:
- 6379:6379
volumes:
- redis-master:/data
redis-slave:
image: redis
networks:
- redisdb
command: redis-server --slaveof redis-master 6379
volumes:
- redis-slave:/data
sentinel:
image: redis
networks:
- redisdb
ports:
- 26379:26379
command: >
bash -c "echo 'port 26379' > sentinel.conf &&
echo 'dir /tmp' >> sentinel.conf &&
echo 'sentinel monitor redis-master redis-master 6379 2' >> sentinel.conf &&
echo 'sentinel down-after-milliseconds redis-master 5000' >> sentinel.conf &&
echo 'sentinel parallel-syncs redis-master 1' >> sentinel.conf &&
echo 'sentinel failover-timeout redis-master 5000' >> sentinel.conf &&
cat sentinel.conf &&
redis-server sentinel.conf --sentinel"
links:
- redis-master
- redis-slave
volumes:
redis-master:
driver: local
redis-slave:
driver: local
networks:
redisdb:
attachable: true
driver: overlay
I use the following command to deploy as a service:
docker stack deploy --compose-file docker-compose-test.yml redis
The result is deployed services, were redis-master and redis-slave are connecting and I can see the synchronization processes happening as follows:
redis-master log:
1:C 16 Oct 2019 04:19:42.720 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 16 Oct 2019 04:19:42.720 # Redis version=5.0.6, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 16 Oct 2019 04:19:42.720 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
1:M 16 Oct 2019 04:19:42.723 * Running mode=standalone, port=6379.
1:M 16 Oct 2019 04:19:42.723 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 16 Oct 2019 04:19:42.723 # Server initialized
1:M 16 Oct 2019 04:19:42.723 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 16 Oct 2019 04:19:42.723 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 16 Oct 2019 04:19:42.723 * Ready to accept connections
1:M 16 Oct 2019 04:19:43.976 * Replica 10.0.27.2:6379 asks for synchronization
1:M 16 Oct 2019 04:19:43.976 * Full resync requested by replica 10.0.27.2:6379
1:M 16 Oct 2019 04:19:43.976 * Starting BGSAVE for SYNC with target: disk
1:M 16 Oct 2019 04:19:43.976 * Background saving started by pid 15
15:C 16 Oct 2019 04:19:43.982 * DB saved on disk
15:C 16 Oct 2019 04:19:43.982 * RDB: 0 MB of memory used by copy-on-write
1:M 16 Oct 2019 04:19:44.053 * Background saving terminated with success
1:M 16 Oct 2019 04:19:44.053 * Synchronization with replica 10.0.27.2:6379 succeeded
Redis-slave log:
1:C 16 Oct 2019 04:19:40.776 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 16 Oct 2019 04:19:40.776 # Redis version=5.0.6, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 16 Oct 2019 04:19:40.776 # Configuration loaded
1:S 16 Oct 2019 04:19:40.779 * Running mode=standalone, port=6379.
1:S 16 Oct 2019 04:19:40.779 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:S 16 Oct 2019 04:19:40.779 # Server initialized
1:S 16 Oct 2019 04:19:40.779 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:S 16 Oct 2019 04:19:40.779 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:S 16 Oct 2019 04:19:40.779 * Ready to accept connections
1:S 16 Oct 2019 04:19:40.779 * Connecting to MASTER redis-master:6379
1:S 16 Oct 2019 04:19:40.817 # Unable to connect to MASTER: Invalid argument
1:S 16 Oct 2019 04:19:41.834 * Connecting to MASTER redis-master:6379
1:S 16 Oct 2019 04:19:41.851 # Unable to connect to MASTER: Invalid argument
1:S 16 Oct 2019 04:19:42.866 * Connecting to MASTER redis-master:6379
1:S 16 Oct 2019 04:19:42.942 # Unable to connect to MASTER: Invalid argument
1:S 16 Oct 2019 04:19:43.970 * Connecting to MASTER redis-master:6379
1:S 16 Oct 2019 04:19:43.975 * MASTER <-> REPLICA sync started
1:S 16 Oct 2019 04:19:43.975 * Non blocking connect for SYNC fired the event.
1:S 16 Oct 2019 04:19:43.975 * Master replied to PING, replication can continue...
1:S 16 Oct 2019 04:19:43.976 * Partial resynchronization not possible (no cached master)
1:S 16 Oct 2019 04:19:43.977 * Full resync from master: 39bb36f74ef0cdefdc08a2dc8d4a86112ea69f12:0
1:S 16 Oct 2019 04:19:44.053 * MASTER <-> REPLICA sync: receiving 175 bytes from master
1:S 16 Oct 2019 04:19:44.054 * MASTER <-> REPLICA sync: Flushing old data
1:S 16 Oct 2019 04:19:44.054 * MASTER <-> REPLICA sync: Loading DB in memory
1:S 16 Oct 2019 04:19:44.054 * MASTER <-> REPLICA sync: Finished with success
Redis-sentinel log:
port 26379
port 26379
dir /tmp
dir /tmp
sentinel monitor redis-master redis-master 6379 2
sentinel monitor redis-master redis-master 6379 2
sentinel down-after-milliseconds redis-master 5000
sentinel down-after-milliseconds redis-master 5000
sentinel parallel-syncs redis-master 1
sentinel parallel-syncs redis-master 1
sentinel failover-timeout redis-master 5000
sentinel failover-timeout redis-master 5000
1:X 16 Oct 2019 04:19:49.506 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
*** FATAL CONFIG FILE ERROR ***
1:X 16 Oct 2019 04:19:49.506 # Redis version=5.0.6, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 16 Oct 2019 04:19:49.506 # Configuration loaded
Reading the configuration file, at line 3
>>> 'sentinel monitor redis-master redis-master 6379 2'
1:X 16 Oct 2019 04:19:49.508 * Running mode=sentinel, port=26379.
1:X 16 Oct 2019 04:19:49.508 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
Can't resolve master instance hostname.
1:X 16 Oct 2019 04:19:49.511 # Sentinel ID is aad9553d4999951f3b37eede5968b4aa262c07a9
1:X 16 Oct 2019 04:19:49.511 # +monitor master redis-master 10.0.27.5 6379 quorum 2
1:X 16 Oct 2019 04:19:49.512 * +slave slave 10.0.27.2:6379 10.0.27.2 6379 # redis-master 10.0.27.5 6379
1:X 16 Oct 2019 04:19:54.514 # +sdown slave 10.0.27.2:6379 10.0.27.2 6379 # redis-master 10.0.27.5 6379
So the slave-announce-ip, replica-announce-ip etc are having issues:
1- because the ips keep changing
2- Overlay networks ips are not always what they show they are
So failover dose not work and if master went down slave dont kick in!

Resources