I've been trying for two or three days now to set up a Consul server and connect an agent to it. I'm using docker-compose.
But after performing a join operation, the agent gets the message "Agent not live or unreachable".
Here are the logs:
root@e33a6127103f:/app# consul agent -join 10.1.30.91 -data-dir=/tmp/consul
==> Starting Consul agent...
==> Joining cluster...
Join completed. Synced with 1 initial agents
==> Consul agent running!
Version: 'v1.0.1'
Node ID: '0e1adf74-462d-45a4-1927-95ed123f1526'
Node name: 'e33a6127103f'
Datacenter: 'dc1' (Segment: '')
Server: false (Bootstrap: false)
Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, DNS: 8600)
Cluster Addr: 172.17.0.2 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
==> Log data will now stream in as it occurs:
2017/12/06 10:44:43 [INFO] serf: EventMemberJoin: e33a6127103f 172.17.0.2
2017/12/06 10:44:43 [INFO] agent: Started DNS server 127.0.0.1:8600 (udp)
2017/12/06 10:44:43 [INFO] agent: Started DNS server 127.0.0.1:8600 (tcp)
2017/12/06 10:44:43 [INFO] agent: Started HTTP server on 127.0.0.1:8500 (tcp)
2017/12/06 10:44:43 [INFO] agent: (LAN) joining: [10.1.30.91]
2017/12/06 10:44:43 [INFO] serf: EventMemberJoin: consul1 172.19.0.2
2017/12/06 10:44:43 [INFO] consul: adding server consul1 (Addr: tcp/172.19.0.2:8300) (DC: dc1)
2017/12/06 10:44:43 [INFO] agent: (LAN) joined: 1 Err: <nil>
2017/12/06 10:44:43 [INFO] agent: started state syncer
2017/12/06 10:44:43 [WARN] manager: No servers available
2017/12/06 10:44:43 [ERR] agent: failed to sync remote state: No known Consul servers
2017/12/06 10:44:54 [INFO] memberlist: Suspect consul1 has failed, no acks received
2017/12/06 10:44:55 [ERR] consul: "Catalog.NodeServices" RPC failed to server 172.19.0.2:8300: rpc error getting client: failed to get conn: dial tcp <nil>->172.19.0.2:8300: i/o timeout
2017/12/06 10:44:55 [ERR] agent: failed to sync remote state: rpc error getting client: failed to get conn: dial tcp <nil>->172.19.0.2:8300: i/o timeout
2017/12/06 10:44:58 [INFO] memberlist: Marking consul1 as failed, suspect timeout reached (0 peer confirmations)
2017/12/06 10:44:58 [INFO] serf: EventMemberFailed: consul1 172.19.0.2
2017/12/06 10:44:58 [INFO] consul: removing server consul1 (Addr: tcp/172.19.0.2:8300) (DC: dc1)
2017/12/06 10:45:05 [INFO] memberlist: Suspect consul1 has failed, no acks received
2017/12/06 10:45:06 [WARN] manager: No servers available
2017/12/06 10:45:06 [ERR] agent: Coordinate update error: No known Consul servers
2017/12/06 10:45:12 [WARN] manager: No servers available
2017/12/06 10:45:12 [ERR] agent: failed to sync remote state: No known Consul servers
2017/12/06 10:45:13 [INFO] serf: attempting reconnect to consul1 172.19.0.2:8301
2017/12/06 10:45:28 [WARN] manager: No servers available
2017/12/06 10:45:28 [ERR] agent: failed to sync remote state: No known Consul servers
2017/12/06 10:45:32 [WARN] manager: No servers available
My settings are:
docker-compose (server):
consul1:
  image: "consul.1.0.1"
  container_name: "consul1"
  hostname: "consul1"
  volumes:
    - ./consul/config:/config/
  ports:
    - "8400:8400"
    - "8500:8500"
    - "8600:53"
    - "8300:8300"
    - "8301:8301"
  command: "agent -config-dir=/config -ui -server -bootstrap-expect 1"
Please help me solve this problem.
I think you are using the wrong IP address for the Consul server:
"consul agent -join 10.1.30.91 -data-dir=/tmp/consul"
10.1.30.91 is not the Docker container's IP; it is probably your host/VirtualBox address.
Get the Consul container's IP and use that in the consul agent join command.
For more information about how Consul servers and agents work, follow this link:
https://dzone.com/articles/service-discovery-with-docker-and-consul-part-1
Try to get the right IP address by executing this command:
docker inspect <container id> | grep "IPAddress"
where <container id> is the container ID of the Consul server.
Then use the obtained address instead of "10.1.30.91" in the command:
consul agent -join <IP ADDRESS OF CONSUL SERVER> -data-dir=/tmp/consul
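As a sketch, the two steps can be combined (this assumes the server container is named consul1, as in the compose file above, and that you run this where the Docker CLI can see the daemon):

```shell
# Get the server container's IP on its Docker network, then join it.
# The container name "consul1" comes from the compose file; adjust as needed.
SERVER_IP=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' consul1)
consul agent -join "$SERVER_IP" -data-dir=/tmp/consul
```

Note that the agent container must actually be able to reach that address, i.e. it should be on the same Docker network as the server.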
Environment:
Openstack Train, deployed by kolla-ansible
RabbitMQ 3.7.10 on Erlang 20.2.2
three control nodes (which also run other components)
Problem:
The node-34 RabbitMQ process consumed a large amount of memory (30 GB) between 04-20 16:31 and 04-20 16:46. We restarted the RabbitMQ process manually; otherwise it keeps consuming memory until it triggers the OOM killer, even with vm_memory_high_watermark set to 0.1 (observed on another cluster with the same environment).
The node-33 RabbitMQ process consumed 15 GB of virtual memory, but only a little physical memory, between 04-20 16:26 and 04-20 16:28.
Restarting the node-34 RabbitMQ process was enough to fix the problem.
Question:
What is the root cause of this issue?
How can I fix it permanently, rather than restarting RabbitMQ whenever the issue happens?
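To narrow down the root cause the next time this happens, it may help to capture a memory breakdown from the affected node before restarting it. A sketch (run inside the rabbitmq container; exact command availability depends on your 3.7.x build):

```shell
# Overall node status, which includes a memory section on RabbitMQ 3.7.x:
rabbitmqctl status
# Per-category memory use (binary heap, queue processes, connections, ...),
# if the diagnostics tool is present in your image:
rabbitmq-diagnostics memory_breakdown
```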
Component logs:
node-33 rabbitmq
2022-04-20 16:20:00.731 [info] <0.30576.499> connection <0.30576.499> (1.1.1.45:33314 -> 1.1.1.33:5672 - nova-compute:7:1dab7694-168e-491c-8aa5-5e5a9f993750): user 'openstack' authenticated and granted access to vhost '/'
2022-04-20 16:25:25.678 [info] <0.14459.449> closing AMQP connection <0.14459.449> (1.1.1.32:53356 -> 1.1.1.33:5672 - nova-compute:7:facf1224-83df-4e48-8189-d78213ee5bc2, vhost: '/', user: 'openstack')
2022-04-20 16:25:25.679 [info] <0.21656.462> closing AMQP connection <0.21656.462> (1.1.1.32:58944 -> 1.1.1.33:5672 - nova-compute:7:9c706aca-9db6-4e61-bebd-568a6f282307, vhost: '/', user: 'openstack')
2022-04-20 16:25:25.679 [error] <0.3679.462> Supervisor {<0.3679.462>,rabbit_channel_sup_sup} had child channel_sup started with rabbit_channel_sup:start_link() at undefined exit with reason shutdown in context shutdown_error
2022-04-20 16:25:25.683 [info] <0.13987.330> closing AMQP connection <0.13987.330> (1.1.1.32:35890 -> 1.1.1.33:5672 - nova-compute:7:5fdd2029-8f50-4a81-b861-06b071fffc98, vhost: '/', user: 'openstack')
2022-04-20 16:25:41.101 [info] <0.1613.508> accepting AMQP connection <0.1613.508> (1.1.1.33:54246 -> 1.1.1.33:5672)
2022-04-20 16:25:41.104 [info] <0.1613.508> Connection <0.1613.508> (1.1.1.33:54246 -> 1.1.1.33:5672) has a client-provided name: nova-conductor:24:71983386-ad60-4608-8186-a4aef8644d9d
2022-04-20 16:25:41.104 [info] <0.1613.508> connection <0.1613.508> (1.1.1.33:54246 -> 1.1.1.33:5672 - nova-conductor:24:71983386-ad60-4608-8186-a4aef8644d9d): user 'openstack' authenticated and granted access to vhost '/'
2022-04-20 16:25:42.000 [warning] <0.32.0> lager_error_logger_h dropped 2 messages in the last second that exceeded the limit of 1000 messages/sec
2022-04-20 16:27:36.137 [info] <0.24964.510> accepting AMQP connection <0.24964.510> (1.1.1.33:38314 -> 1.1.1.33:5672)
2022-04-20 16:27:36.141 [info] <0.24964.510> Connection <0.24964.510> (1.1.1.33:38314 -> 1.1.1.33:5672) has a client-provided name: nova-compute:7:be0a8525-b04c-465d-a938-e90599bd54d3
2022-04-20 16:27:36.142 [info] <0.24964.510> connection <0.24964.510> (1.1.1.33:38314 -> 1.1.1.33:5672 - nova-compute:7:be0a8525-b04c-465d-a938-e90599bd54d3): user 'openstack' authenticated and granted access to vhost '/'
2022-04-20 16:34:29.549 [error] <0.2946.6153> closing AMQP connection <0.2946.6153> (1.1.1.35:58822 -> 1.1.1.33:5672 - nova-conductor:21:e037e12d-2911-47d1-90ef-d00c3c288380):
missed heartbeats from client, timeout: 60s
2022-04-20 16:34:30.557 [info] <0.414.521> accepting AMQP connection <0.414.521> (1.1.1.35:38810 -> 1.1.1.33:5672)
2022-04-20 16:34:30.558 [info] <0.414.521> Connection <0.414.521> (1.1.1.35:38810 -> 1.1.1.33:5672) has a client-provided name: nova-conductor:21:e037e12d-2911-47d1-90ef-d00c3c288380
2022-04-20 16:34:30.559 [info] <0.414.521> connection <0.414.521> (1.1.1.35:38810 -> 1.1.1.33:5672 - nova-conductor:21:e037e12d-2911-47d1-90ef-d00c3c288380): user 'openstack' authenticated and granted access to vhost '/'
2022-04-20 16:34:32.117 [error] <0.13587.486> closing AMQP connection <0.13587.486> (1.1.1.31:36248 -> 1.1.1.33:5672 - nova-compute:7:e592f063-ff69-4387-86a5-d552bb43572e):
missed heartbeats from client, timeout: 60s
2022-04-20 16:40:36.440 [error] <0.13109.8083> closing AMQP connection <0.13109.8083> (1.1.1.35:47356 -> 1.1.1.33:5672 - cinder-volume:32:2a7ba690-3b2a-486d-888b-bc4bb19962ee):
missed heartbeats from client, timeout: 60s
2022-04-20 16:40:36.537 [error] <0.31800.280> closing AMQP connection <0.31800.280> (1.1.1.33:60648 -> 1.1.1.33:5672 - cinder-volume:32:c7fddb16-bb8c-4646-8535-980ba5900508):
missed heartbeats from client, timeout: 60s
2022-04-20 16:40:43.139 [error] <0.5296.525> closing AMQP connection <0.5296.525> (1.1.1.33:47884 -> 1.1.1.33:5672 - nova-conductor:24:71983386-ad60-4608-8186-a4aef8644d9d):
missed heartbeats from client, timeout: 60s
========== many more of the above "[info]" / "[error]" "missed heartbeats" log lines, until the node-34 rabbitmq process was restarted
2022-04-20 16:46:19.528 [info] <0.16487.538> connection <0.16487.538> (1.1.1.34:51432 -> 1.1.1.33:5672 - nova-scheduler:59:d19d2307-c177-490f-9f36-9709f6f86345): user 'openstack' authenticated and granted access to vhost '/'
2022-04-20 16:46:19.562 [info] <0.20786.0> Mirrored queue 'q-l3-plugin.node-35' in vhost '/': Master <rabbit@node-33.1.1271.0> saw deaths of mirrors <rabbit@node-34.1.1245.0>
2022-04-20 16:46:19.563 [info] <0.23194.0> Mirrored queue 'q-plugin_fanout_3e006483c59744de91c4607550a2ea75' in vhost '/': Master <rabbit@node-33.1.3305.0> saw deaths of mirrors <rabbit@node-34.1.3374.0>
node-34 rabbitmq
2022-04-20 16:20:48.095 [info] <0.20912.3993> Connection <0.20912.3993> (1.1.1.31:55326 -> 1.1.1.34:5672) has a client-provided name: nova-compute:7:a480ff3a-1e36-4797-b1dc-cc1d7eff8d8f
2022-04-20 16:20:49.000 [warning] lager_file_backend dropped 1 messages in the last second that exceeded the limit of 50 messages/sec
2022-04-20 16:25:25.676 [info] <0.22127.3978> closing AMQP connection <0.22127.3978> (1.1.1.32:52030 -> 1.1.1.34:5672 - nova-compute:7:310dcd66-11e7-485e-8099-1e4ab9e1c05d, vhost: '/', user: 'openstack')
2022-04-20 16:27:36.116 [info] <0.19371.3997> accepting AMQP connection <0.19371.3997> (1.1.1.33:58880 -> 1.1.1.34:5672)
2022-04-20 16:27:36.138 [info] <0.19371.3997> Connection <0.19371.3997> (1.1.1.33:58880 -> 1.1.1.34:5672) has a client-provided name: nova-compute:7:012d81c4-a5eb-4b38-8a39-d577cef8c12a
2022-04-20 16:27:36.142 [info] <0.19371.3997> connection <0.19371.3997> (1.1.1.33:58880 -> 1.1.1.34:5672 - nova-compute:7:012d81c4-a5eb-4b38-8a39-d577cef8c12a): user 'openstack' authenticated and granted access to vhost '/'
2022-04-20 16:34:32.645 [error] <0.25171.3412> closing AMQP connection <0.25171.3412> (1.1.1.35:60358 -> 1.1.1.34:5672 - nova-conductor:23:7f5f17b1-e86e-476a-9047-b57317c02723):
missed heartbeats from client, timeout: 60s
2022-04-20 16:34:33.653 [info] <0.23357.4653> accepting AMQP connection <0.23357.4653> (1.1.1.35:44456 -> 1.1.1.34:5672)
2022-04-20 16:34:33.657 [info] <0.23357.4653> Connection <0.23357.4653> (1.1.1.35:44456 -> 1.1.1.34:5672) has a client-provided name: nova-conductor:23:7f5f17b1-e86e-476a-9047-b57317c02723
2022-04-20 16:34:33.658 [info] <0.23357.4653> connection <0.23357.4653> (1.1.1.35:44456 -> 1.1.1.34:5672 - nova-conductor:23:7f5f17b1-e86e-476a-9047-b57317c02723): user 'openstack' authenticated and granted access to vhost '/'
2022-04-20 16:34:34.484 [error] <0.3180.3713> closing AMQP connection <0.3180.3713> (1.1.1.33:41126 -> 1.1.1.34:5672 - nova-conductor:22:6de6e8f9-10c8-48c6-8aac-55489aa24d9b):
missed heartbeats from client, timeout: 60s
2022-04-20 16:34:35.492 [info] <0.19068.4664> accepting AMQP connection <0.19068.4664> (1.1.1.33:48246 -> 1.1.1.34:5672)
2022-04-20 16:34:35.493 [info] <0.19068.4664> Connection <0.19068.4664> (1.1.1.33:48246 -> 1.1.1.34:5672) has a client-provided name: nova-conductor:22:6de6e8f9-10c8-48c6-8aac-55489aa24d9b
2022-04-20 16:34:35.494 [info] <0.19068.4664> connection <0.19068.4664> (1.1.1.33:48246 -> 1.1.1.34:5672 - nova-conductor:22:6de6e8f9-10c8-48c6-8aac-55489aa24d9b): user 'openstack' authenticated and granted access to vhost '/'
2022-04-20 16:34:37.617 [error] <0.19797.3640> closing AMQP connection <0.19797.3640> (1.1.1.34:38380 -> 1.1.1.34:5672 - nova-conductor:24:af6de2d2-5fb4-43b8-aac7-eb363d60315c):
missed heartbeats from client, timeout: 60s
========== many more of the above "[info]" and "[error]" "missed heartbeats" log lines, until this (node-34) rabbitmq process was restarted
2022-04-20 16:45:54.306 [info] <0.7671.7632> accepting AMQP connection <0.7671.7632> (1.1.1.31:38548 -> 1.1.1.34:5672)
2022-04-20 16:45:54.307 [info] <0.7671.7632> Connection <0.7671.7632> (1.1.1.31:38548 -> 1.1.1.34:5672) has a client-provided name: nova-compute:7:355e2e03-3d83-4d95-a8bf-0165643a40fd
2022-04-20 16:45:54.307 [info] <0.7671.7632> connection <0.7671.7632> (1.1.1.31:38548 -> 1.1.1.34:5672 - nova-compute:7:355e2e03-3d83-4d95-a8bf-0165643a40fd): user 'openstack' authenticated and granted access to vhost '/'
2022-04-20 16:45:55.359 [error] <0.22496.6560> closing AMQP connection <0.22496.6560> (1.1.1.33:46352 -> 1.1.1.34:5672 - nova-conductor:24:6341a486-5f31-4180-865e-49d9e6fef1fd):
missed heartbeats from client, timeout: 60s
2022-04-20 16:45:56.367 [info] <0.31031.7657> accepting AMQP connection <0.31031.7657> (1.1.1.33:38992 -> 1.1.1.34:5672)
2022-04-20 16:45:56.368 [info] <0.31031.7657> Connection <0.31031.7657> (1.1.1.33:38992 -> 1.1.1.34:5672) has a client-provided name: nova-conductor:24:6341a486-5f31-4180-865e-49d9e6fef1fd
2022-04-20 16:45:56.368 [info] <0.31031.7657> connection <0.31031.7657> (1.1.1.33:38992 -> 1.1.1.34:5672 - nova-conductor:24:6341a486-5f31-4180-865e-49d9e6fef1fd): user 'openstack' authenticated and granted access to vhost '/'
2022-04-20 16:45:58.115 [warning] <0.32550.7440> closing AMQP connection <0.32550.7440> (1.1.1.35:58740 -> 1.1.1.34:5672 - nova-conductor:22:bbe54f83-b7c4-452f-a120-9a220afc6c59, vhost: '/', user: 'openstack'):
client unexpectedly closed TCP connection
2022-04-20 16:45:59.123 [info] <0.5691.7687> accepting AMQP connection <0.5691.7687> (1.1.1.35:38818 -> 1.1.1.34:5672)
2022-04-20 16:45:59.124 [info] <0.5691.7687> Connection <0.5691.7687> (1.1.1.35:38818 -> 1.1.1.34:5672) has a client-provided name: nova-conductor:22:bbe54f83-b7c4-452f-a120-9a220afc6c59
2022-04-20 16:45:59.125 [info] <0.5691.7687> connection <0.5691.7687> (1.1.1.35:38818 -> 1.1.1.34:5672 - nova-conductor:22:bbe54f83-b7c4-452f-a120-9a220afc6c59): user 'openstack' authenticated and granted access to vhost '/'
2022-04-20 16:46:05.456 [warning] <0.1973.7449> closing AMQP connection <0.1973.7449> (1.1.1.36:52852 -> 1.1.1.34:5672 - nova-compute:7:37b137ea-7671-4ea9-ae86-f243e9a13606, vhost: '/', user: 'openstack'):
client unexpectedly closed TCP connection
2022-04-20 16:46:06.643 [warning] <0.2825.7450> closing AMQP connection <0.2825.7450> (1.1.1.33:33106 -> 1.1.1.34:5672 - nova-conductor:21:7bd7d6dc-7367-4690-aed8-81c3989f5c74, vhost: '/', user: 'openstack'):
client unexpectedly closed TCP connection
2022-04-20 16:46:08.968 [warning] <0.16541.7452> closing AMQP connection <0.16541.7452> (1.1.1.35:59814 -> 1.1.1.34:5672 - nova-conductor:25:2e3dd6d2-13a8-4661-96ad-d4fa6bbd2e72, vhost: '/', user: 'openstack'):
client unexpectedly closed TCP connection
2022-04-20 16:46:13.000 [warning] lager_file_backend dropped 13 messages in the last second that exceeded the limit of 50 messages/sec
2022-04-20 16:46:13.038 [info] <0.19552.7774> RabbitMQ is asked to stop...
2022-04-20 16:46:13.806 [info] <0.27176.7775> accepting AMQP connection <0.27176.7775> (1.1.1.35:40350 -> 1.1.1.34:5672)
2022-04-20 16:46:13.807 [info] <0.27176.7775> Connection <0.27176.7775> (1.1.1.35:40350 -> 1.1.1.34:5672) has a client-provided name: nova-conductor:22:99215d08-ba56-44a1-be0e-2cfa9935a4c7
2022-04-20 16:46:13.808 [info] <0.27176.7775> connection <0.27176.7775> (1.1.1.35:40350 -> 1.1.1.34:5672 - nova-conductor:22:99215d08-ba56-44a1-be0e-2cfa9935a4c7): user 'openstack' authenticated and granted access to vhost '/'
2022-04-20 16:46:14.112 [info] <0.19552.7774> Stopping RabbitMQ applications and their dependencies in the following order:
rabbitmq_management
amqp_client
rabbitmq_web_dispatch
cowboy
cowlib
rabbitmq_management_agent
rabbit
mnesia
rabbit_common
os_mon
2022-04-20 16:46:14.113 [info] <0.19552.7774> Stopping application 'rabbitmq_management'
2022-04-20 16:46:14.223 [warning] <0.8143.0> RabbitMQ HTTP listener registry could not find context rabbitmq_management_tls
2022-04-20 16:46:14.237 [info] <0.33.0> Application rabbitmq_management exited with reason: stopped
2022-04-20 16:46:14.237 [info] <0.19552.7774> Stopping application 'amqp_client'
2022-04-20 16:46:14.265 [info] <0.33.0> Application amqp_client exited with reason: stopped
2022-04-20 16:46:14.265 [info] <0.19552.7774> Stopping application 'rabbitmq_web_dispatch'
2022-04-20 16:46:14.282 [info] <0.33.0> Application rabbitmq_web_dispatch exited with reason: stopped
2022-04-20 16:46:14.282 [info] <0.19552.7774> Stopping application 'cowboy'
2022-04-20 16:46:14.293 [warning] <0.25152.7431> closing AMQP connection <0.25152.7431> (1.1.1.34:42738 -> 1.1.1.34:5672 - nova-conductor:25:b31b82a2-e738-4cd9-805e-36c7b520531e, vhost: '/', user: 'openstack'):
client unexpectedly closed TCP connection
2022-04-20 16:46:14.301 [info] <0.19552.7774> Stopping application 'cowlib'
2022-04-20 16:46:14.301 [info] <0.19552.7774> Stopping application 'rabbitmq_management_agent'
2022-04-20 16:46:14.301 [info] <0.33.0> Application cowboy exited with reason: stopped
2022-04-20 16:46:14.302 [info] <0.33.0> Application cowlib exited with reason: stopped
2022-04-20 16:46:14.324 [info] <0.19552.7774> Stopping application 'rabbit'
2022-04-20 16:46:14.324 [info] <0.33.0> Application rabbitmq_management_agent exited with reason: stopped
2022-04-20 16:46:14.326 [info] <0.260.0> Peer discovery backend rabbit_peer_discovery_classic_config does not support registration, skipping unregistration.
2022-04-20 16:46:14.327 [info] <0.8135.0> stopped TCP listener on 1.1.1.34:5672
2022-04-20 16:46:14.337 [error] <0.19192.0> Error on AMQP connection <0.19192.0> (1.1.1.34:53718 -> 1.1.1.34:5672 - barbican-keystone-listener:7:b59ea871-cbb6-462e-b8e7-3454536978dd, vhost: '/', user: 'openstack', state: running), channel 0:
operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
========== many more of these "[error]" "operation none caused" log lines
2022-04-20 16:46:14.338 [error] <0.18651.0> Error on AMQP connection <0.18651.0> (1.1.1.35:49664 -> 1.1.1.34:5672 - magnum-conductor:112:0a1e307e-90cb-4e5f-bc1c-a721cdb7f83e, vhost: '/', user: 'openstack', state: running), channel 0:
operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
2022-04-20 16:46:21.680 [info] <0.33.0> Application lager started on node 'rabbit@node-34'
2022-04-20 16:46:21.685 [info] <0.5.0> Log file opened with Lager
2022-04-20 16:46:25.645 [info] <0.33.0> Application mnesia started on node 'rabbit@node-34'
2022-04-20 16:46:25.649 [info] <0.33.0> Application mnesia exited with reason: stopped
2022-04-20 16:46:25.988 [info] <0.33.0> Application recon started on node 'rabbit@node-34'
2022-04-20 16:46:25.989 [info] <0.33.0> Application inets started on node 'rabbit@node-34'
2022-04-20 16:46:25.989 [info] <0.33.0> Application jsx started on node 'rabbit@node-34'
2022-04-20 16:46:25.989 [info] <0.33.0> Application os_mon started on node 'rabbit@node-34'
2022-04-20 16:46:25.989 [info] <0.33.0> Application crypto started on node 'rabbit@node-34'
2022-04-20 16:46:25.989 [info] <0.33.0> Application cowlib started on node 'rabbit@node-34'
2022-04-20 16:46:26.078 [info] <0.33.0> Application mnesia started on node 'rabbit@node-34'
2022-04-20 16:46:26.078 [info] <0.33.0> Application xmerl started on node 'rabbit@node-34'
2022-04-20 16:46:26.078 [info] <0.33.0> Application asn1 started on node 'rabbit@node-34'
2022-04-20 16:46:26.078 [info] <0.33.0> Application public_key started on node 'rabbit@node-34'
2022-04-20 16:46:26.078 [info] <0.33.0> Application ssl started on node 'rabbit@node-34'
2022-04-20 16:46:26.078 [info] <0.33.0> Application ranch started on node 'rabbit@node-34'
2022-04-20 16:46:26.085 [info] <0.33.0> Application cowboy started on node 'rabbit@node-34'
2022-04-20 16:46:26.085 [info] <0.33.0> Application rabbit_common started on node 'rabbit@node-34'
2022-04-20 16:46:26.088 [info] <0.33.0> Application amqp_client started on node 'rabbit@node-34'
2022-04-20 16:46:26.088 [info] <0.247.0>
Starting RabbitMQ 3.7.10 on Erlang 20.2.2
Copyright (C) 2007-2018 Pivotal Software, Inc.
Licensed under the MPL. See http://www.rabbitmq.com/
2022-04-20 16:46:26.089 [info] <0.247.0>
node : rabbit@node-34
home dir : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.conf
cookie hash : i***Q==
log(s) : /var/log/kolla/rabbitmq/rabbit@node-34.log
: /var/log/kolla/rabbitmq/rabbit@node-34_upgrade.log
database dir : /var/lib/rabbitmq/mnesia/rabbit@node-34
2022-04-20 16:46:26.385 [info] <0.331.0> Memory high watermark set to 618721 MiB (648776025702 bytes) of 1546802 MiB (1621940064256 bytes) total
2022-04-20 16:46:26.389 [info] <0.333.0> Enabling free disk space monitoring
2022-04-20 16:46:26.389 [info] <0.333.0> Disk free limit set to 50MB
2022-04-20 16:46:26.392 [info] <0.336.0> Limiting to approx 1048476 file handles (943626 sockets)
2022-04-20 16:46:26.392 [info] <0.337.0> FHC read buffering: OFF
2022-04-20 16:46:26.392 [info] <0.337.0> FHC write buffering: ON
2022-04-20 16:46:26.400 [info] <0.247.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2022-04-20 16:46:26.410 [info] <0.247.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2022-04-20 16:46:26.450 [info] <0.247.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2022-04-20 16:46:26.450 [info] <0.247.0> Peer discovery backend rabbit_peer_discovery_classic_config does not support registration, skipping registration.
2022-04-20 16:46:26.451 [info] <0.247.0> Priority queues enabled, real BQ is rabbit_variable_queue
2022-04-20 16:46:26.477 [info] <0.454.0> Starting rabbit_node_monitor
2022-04-20 16:46:26.504 [info] <0.247.0> Management plugin: using rates mode 'basic'
2022-04-20 16:46:26.556 [info] <0.486.0> Making sure data directory '/var/lib/rabbitmq/mnesia/rabbit@node-34/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L' for vhost '/' exists
2022-04-20 16:46:26.574 [info] <0.486.0> Starting message stores for vhost '/'
2022-04-20 16:46:26.574 [info] <0.490.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_transient": using rabbit_msg_store_ets_index to provide index
2022-04-20 16:46:26.616 [info] <0.486.0> Started message store of type transient for vhost '/'
2022-04-20 16:46:26.617 [info] <0.493.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_persistent": using rabbit_msg_store_ets_index to provide index
2022-04-20 16:46:26.617 [warning] <0.493.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_persistent": rebuilding indices from scratch
2022-04-20 16:46:26.618 [info] <0.486.0> Started message store of type persistent for vhost '/'
2022-04-20 16:46:26.627 [info] <0.486.0> Mirrored queue 'q-agent-notifier-port-update_fanout_9407d5931f8a498cb6c0268d585ed732' in vhost '/': Adding mirror on node 'rabbit@node-34': <0.512.0>
2022-04-20 16:46:26.627 [info] <0.486.0> Mirrored queue 'magnum-conductor_fanout_dd3536ae0b8e4efe8329be0454ba75b6' in vhost '/': Adding mirror on node 'rabbit@node-34': <0.516.0>
========== many more "Mirrored queue" log lines for different queues
node-35 rabbitmq
2022-04-20 16:34:17.295 [error] <0.13948.7714> closing AMQP connection <0.13948.7714> (1.1.1.34:43322 -> 1.1.1.35:5672 - nova-conductor:23:3ca11891-5442-48ef-9b0f-f616ba13c1e3):
missed heartbeats from client, timeout: 60s
========== many more of the above "[info]" / "[error]" "missed heartbeats" log lines, until the node-34 rabbitmq process was restarted
2022-04-20 16:34:57.656 [info] <0.12329.2581> accepting AMQP connection <0.12329.2581> (1.1.1.33:42474 -> 1.1.1.35:5672)
2022-04-20 16:34:57.657 [info] <0.12329.2581> Connection <0.12329.2581> (1.1.1.33:42474 -> 1.1.1.35:5672) has a client-provided name: nova-conductor:25:53cc4527-fa4b-41c0-b9e1-f3d24da7f31b
2022-04-20 16:34:57.658 [info] <0.12329.2581> connection <0.12329.2581> (1.1.1.33:42474 -> 1.1.1.35:5672 - nova-conductor:25:53cc4527-fa4b-41c0-b9e1-f3d24da7f31b): user 'openstack' authenticated and granted access to vhost '/'
2022-04-20 16:34:58.874 [error] <0.7604.103> closing AMQP connection <0.7604.103> (1.1.1.33:38718 -> 1.1.1.35:5672 - nova-conductor:21:b701bc54-c826-47bc-8c7d-803e27265e5f):
missed heartbeats from client, timeout: 60s
2022-04-20 16:34:59.531 [error] <0.24920.2100> closing AMQP connection <0.24920.2100> (1.1.1.35:41208 -> 1.1.1.35:5672 - nova-conductor:23:615ea986-452b-4943-ba4f-7d36d3b1536c):
missed heartbeats from client, timeout: 60s
========== fewer "[error]" but many "[info]" log lines, until the node-34 rabbitmq process was restarted
2022-04-20 16:46:19.344 [info] <0.17721.2593> connection <0.17721.2593> (1.1.1.33:34162 -> 1.1.1.35:5672 - mod_wsgi:32:06e32cf4-2d5f-468e-9a50-a6c16b5f16bb): user 'openstack' authenticated and granted access to vhost '/'
2022-04-20 16:46:19.489 [info] <0.17996.2593> accepting AMQP connection <0.17996.2593> (1.1.1.33:34172 -> 1.1.1.35:5672)
2022-04-20 16:46:19.511 [info] <0.4731.0> Mirrored queue 'magnum-conductor_fanout_7af2012136fe49e88f5e561d2f03650f' in vhost '/': Slave <rabbit@node-35.3.4731.0> saw deaths of mirrors <rabbit@node-34.1.2887.0>
2022-04-20 16:46:19.511 [info] <0.25105.9> Mirrored queue 'scheduler_fanout_0a2d4e8a46b249018178164758b6736d' in vhost '/': Slave <rabbit@node-35.3.25105.9> saw deaths of mirrors <rabbit@node-34.1.14025.203>
node-33 nova-conductor
2022-04-20 16:34:20.689 23 ERROR oslo.messaging._drivers.impl_rabbit [req-30ed34ff-70e5-4e6f-a09f-a00de95385a3 - - - - -] [5b4dd218-31d1-4eb6-bab1-2b5bc8f1bc11] AMQP server on 1.1.1.35:5672 is unreachable: Server unexpectedly closed connection. Trying again in 1 seconds.: OSError: Server unexpectedly closed connection
2022-04-20 16:34:21.700 23 INFO oslo.messaging._drivers.impl_rabbit [req-30ed34ff-70e5-4e6f-a09f-a00de95385a3 - - - - -] [5b4dd218-31d1-4eb6-bab1-2b5bc8f1bc11] Reconnected to AMQP server on 1.1.1.35:5672 via [amqp] client with port 38842.
========== the above log lines repeat
2022-04-20 16:46:12.294 21 INFO oslo.messaging._drivers.impl_rabbit [req-24421232-3a85-4ff4-a6fe-e2bb58188e65 - - - - -] [7bd7d6dc-7367-4690-aed8-81c3989f5c74] Reconnected to AMQP server on 1.1.1.34:5672 via [amqp] client with port 40340.
2022-04-20 16:46:14.356 25 ERROR oslo.messaging._drivers.impl_rabbit [req-beeac432-5509-4d6f-8709-9a8ea9b3d0ad - - - - -] [58b08ac8-7061-4969-8fae-de5acad3c23b] AMQP server on 1.1.1.34:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: ConnectionResetError: [Errno 104] Connection reset by peer
2022-04-20 16:46:14.357 23 ERROR oslo.messaging._drivers.impl_rabbit [req-bd979e87-5a7f-4f6a-af79-9d1ccb99f944 - - - - -] [d5bff161-b9df-44e8-9fe3-292aec5b13f7] AMQP server on 1.1.1.34:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: ConnectionResetError: [Errno 104] Connection reset by peer
2022-04-20 16:46:14.357 21 ERROR oslo.messaging._drivers.impl_rabbit [req-ca57569e-3328-48f8-9469-e78d6def839c - - - - -] [98444783-6694-4688-873e-066abf61932c] AMQP server on 1.1.1.34:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: ConnectionResetError: [Errno 104] Connection reset by peer
2022-04-20 16:46:14.358 21 ERROR oslo.messaging._drivers.impl_rabbit [req-24421232-3a85-4ff4-a6fe-e2bb58188e65 - - - - -] [7bd7d6dc-7367-4690-aed8-81c3989f5c74] AMQP server on 1.1.1.34:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: ConnectionResetError: [Errno 104] Connection reset by peer
2022-04-20 16:46:14.359 23 ERROR oslo.messaging._drivers.impl_rabbit [req-a69ce54a-1739-44b2-b169-12f030a744b1 - - - - -] [84c7fa9d-14d9-4664-be6c-7e6d43ae7e83] AMQP server on 1.1.1.34:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: ConnectionResetError: [Errno 104] Connection reset by peer
2022-04-20 16:46:14.360 22 ERROR oslo.messaging._drivers.impl_rabbit [-] [09269999-5a4f-419a-899c-69deb49689fb] AMQP server on 1.1.1.34:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: ConnectionResetError: [Errno 104] Connection reset by peer
========== the above log lines repeat
2022-04-20 16:46:16.422 23 INFO oslo.messaging._drivers.impl_rabbit [req-a69ce54a-1739-44b2-b169-12f030a744b1 - - - - -] [84c7fa9d-14d9-4664-be6c-7e6d43ae7e83] Reconnected to AMQP server on 1.1.1.33:5672 via [amqp] client with port 49066.
2022-04-20 16:46:16.425 25 ERROR oslo.messaging._drivers.impl_rabbit [req-50d0d15a-96a4-48a4-b60e-f2103ca9aa59 - - - - -] Connection failed: [Errno 111] ECONNREFUSED (retrying in 0 seconds): ConnectionRefusedError: [Errno 111] ECONNREFUSED
2022-04-20 16:46:16.556 21 INFO oslo.messaging._drivers.impl_rabbit [req-beeac432-5509-4d6f-8709-9a8ea9b3d0ad - - - - -] [3d76c8aa-8e05-4ce0-a26e-db670ff2e48c] Reconnected to AMQP server on 1.1.1.35:5672 via [amqp] client with port 33400.
2022-04-20 16:46:16.571 24 INFO oslo.messaging._drivers.impl_rabbit [req-beeac432-5509-4d6f-8709-9a8ea9b3d0ad - - - - -] [c7a51125-c7b5-4439-b5d5-36e82309b943] Reconnected to AMQP server on 1.1.1.35:5672 via [amqp] client with port 33418.
2022-04-20 16:46:16.573 22 INFO oslo.messaging._drivers.impl_rabbit [req-30ed34ff-70e5-4e6f-a09f-a00de95385a3 - - - - -] [6de6e8f9-10c8-48c6-8aac-55489aa24d9b] Reconnected to AMQP server on 1.1.1.35:5672 via [amqp] client with port 33410.
2022-04-20 16:46:16.634 21 INFO oslo.messaging._drivers.impl_rabbit [req-24421232-3a85-4ff4-a6fe-e2bb58188e65 - - - - -] [7bd7d6dc-7367-4690-aed8-81c3989f5c74] Reconnected to AMQP server on 1.1.1.35:5672 via [amqp] client with port 33406.
2022-04-20 16:46:19.829 23 INFO oslo.messaging._drivers.impl_rabbit [-] [3132b985-89a4-4c5a-82e2-a0050d14fddf] Reconnected to AMQP server on 1.1.1.35:5672 via [amqp] client with port 33412.
2022-04-20 16:46:20.373 22 INFO oslo.messaging._drivers.impl_rabbit [-] [09269999-5a4f-419a-899c-69deb49689fb] Reconnected to AMQP server on 1.1.1.35:5672 via [amqp] client with port 33376.
2022-04-20 16:46:20.409 24 INFO oslo.messaging._drivers.impl_rabbit [-] [cd614243-a1db-4e57-996d-b17e9b3aea28] Reconnected to AMQP server on 1.1.1.35:5672 via [amqp] client with port 33426.
2022-04-20 16:46:26.326 21 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2022-04-20 16:46:26.334 21 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
2022-04-20 16:46:26.340 21 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 111] ECONNREFUSED (retrying in 0 seconds): ConnectionRefusedError: [Errno 111] ECONNREFUSED
2022-04-20 16:46:26.347 21 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 111] ECONNREFUSED (retrying in 0 seconds): ConnectionRefusedError: [Errno 111] ECONNREFUSED
2022-04-20 16:46:32.784 23 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: Server unexpectedly closed connection
2022-04-20 16:46:47.707 24 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: Server unexpectedly closed connection
2022-04-20 16:46:47.851 25 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: Server unexpectedly closed connection
2022-04-20 16:47:02.833 25 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: Server unexpectedly closed connection
2022-04-20 16:49:16.486 25 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to consume message from queue: Server unexpectedly closed connection: kombu.exceptions.OperationalError: Server unexpectedly closed connection
2022-04-20 16:49:16.494 25 ERROR oslo.messaging._drivers.impl_rabbit [-] Unable to connect to AMQP server on 1.1.1.35:5672 after inf tries: Server unexpectedly closed connection: kombu.exceptions.OperationalError: Server unexpectedly closed connection
2022-04-20 16:49:16.495 25 ERROR oslo_messaging._drivers.amqpdriver [-] Failed to process incoming message, retrying..: oslo_messaging.exceptions.MessageDeliveryFailure: Unable to connect to AMQP server on 1.1.1.35:5672 after inf tries: Server unexpectedly closed connection
I saw a similar issue with RabbitMQ 3.7.9. After upgrading to 3.7.19 and Erlang 21.x, the issue went away.
If the upgrade doesn't solve it, you can use quorum queues instead of classic mirrored queues in RabbitMQ.
They are recommended for large-scale deployments.
You can get a deeper understanding at the link below:
https://www.rabbitmq.com/quorum-queues.html
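Note that quorum queues are selected per queue at declaration time via the x-queue-type argument, not via a policy. A minimal sketch of such a declaration with the pika client (the broker/channel setup is assumed and not shown here):

```python
# Quorum queues must be durable and are chosen via the x-queue-type
# argument at declaration time (a policy cannot convert a classic queue).
QUORUM_ARGS = {"x-queue-type": "quorum"}

def declare_quorum_queue(channel, name):
    """Declare a quorum queue on an open pika channel (sketch only)."""
    return channel.queue_declare(
        queue=name,
        durable=True,        # quorum queues are always durable
        exclusive=False,     # exclusive/auto-delete are not supported
        auto_delete=False,
        arguments=QUORUM_ARGS,
    )

print(QUORUM_ARGS["x-queue-type"])  # quorum
```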
I'm trying to expose the Portainer agent port 9001 on a swarm cluster, in order to reach it from an external Portainer instance; the agent is deployed in 'global' mode.
The following docker-compose file works:
version: "3.2"
services:
  agent:
    image: "portainer/agent:1.1.2"
    environment:
      AGENT_CLUSTER_ADDR: tasks.agent
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/lib/docker/volumes:/var/lib/docker/volumes
    networks:
      - priv_portainer
    deploy:
      mode: global
networks:
  priv_portainer:
    driver: overlay
Then, when I try to expose port 9001, the stack starts, but there are errors in the logs and Portainer fails to connect to the agents:
version: "3.2"
services:
  agent:
    image: "portainer/agent:1.1.2"
    environment:
      AGENT_CLUSTER_ADDR: tasks.agent
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/lib/docker/volumes:/var/lib/docker/volumes
    ports:
      - "9001:9001"
    networks:
      - priv_portainer
    deploy:
      mode: global
networks:
  priv_portainer:
    driver: overlay
Even with another port:
ports:
  - "19001:9001"
And even with a port mapping that has nothing to do with the agent:
ports:
  - "12345:54321"
EDIT
Logs from stack :
portainer_agent_agent.0.13cjb851d9me@ignochtulelk02d | 2018/11/26 05:28:50 [INFO] serf: EventMemberJoin: b6040a1ccc2a 10.255.0.13
portainer_agent_agent.0.13cjb851d9me@ignochtulelk02d | 2018/11/26 05:28:50 [INFO] - Starting Portainer agent version 1.1.2 on 0.0.0.0:9001 (cluster mode: true)
portainer_agent_agent.0.13cjb851d9me@ignochtulelk02d | 2018/11/26 05:28:50 [INFO] serf: EventMemberJoin: c6c277e3f60b 10.255.0.11
portainer_agent_agent.0.13cjb851d9me@ignochtulelk02d | 2018/11/26 05:28:51 [ERR] memberlist: Failed to send gossip to 10.255.0.11:7946: write udp [::]:7946->10.255.0.11:7946: sendto: operation not permitted
portainer_agent_agent.0.13cjb851d9me@ignochtulelk02d | 2018/11/26 05:28:51 [ERR] memberlist: Failed to send gossip to 10.255.0.11:7946: write udp [::]:7946->10.255.0.11:7946: sendto: operation not permitted
portainer_agent_agent.0.13cjb851d9me@ignochtulelk02d | 2018/11/26 05:28:51 [INFO] serf: EventMemberJoin: 3e290151a5eb 10.255.0.12
portainer_agent_agent.0.13cjb851d9me@ignochtulelk02d | 2018/11/26 05:28:51 [ERR] memberlist: Failed to send gossip to 10.255.0.11:7946: write udp [::]:7946->10.255.0.11:7946: sendto: operation not permitted
portainer_agent_agent.0.13cjb851d9me@ignochtulelk02d | 2018/11/26 05:28:51 [ERR] memberlist: Failed to send gossip to 10.255.0.12:7946: write udp [::]:7946->10.255.0.12:7946: sendto: operation not permitted
portainer_agent_agent.0.13cjb851d9me@ignochtulelk02d | 2018/11/26 05:28:51 [ERR] memberlist: Failed to send gossip to 10.255.0.12:7946: write udp [::]:7946->10.255.0.12:7946: sendto: operation not permitted
portainer_agent_agent.0.13cjb851d9me@ignochtulelk02d | 2018/11/26 05:28:51 [ERR] memberlist: Failed to send gossip to 10.255.0.11:7946: write udp [::]:7946->10.255.0.11:7946: sendto: operation not permitted
portainer_agent_agent.0.985h7xcfkux0@ignopotulelk03d | 2018/11/26 05:28:51 [INFO] serf: EventMemberJoin: 3e290151a5eb 10.255.0.12
portainer_agent_agent.0.985h7xcfkux0@ignopotulelk03d | 2018/11/26 05:28:51 [INFO] serf: EventMemberJoin: b6040a1ccc2a 10.255.0.13
portainer_agent_agent.0.985h7xcfkux0@ignopotulelk03d | 2018/11/26 05:28:51 [INFO] serf: EventMemberJoin: c6c277e3f60b 10.255.0.11
portainer_agent_agent.0.985h7xcfkux0@ignopotulelk03d | 2018/11/26 05:28:51 [INFO] - Starting Portainer agent version 1.1.2 on 0.0.0.0:9001 (cluster mode: true)
portainer_agent_agent.0.985h7xcfkux0@ignopotulelk03d | 2018/11/26 05:28:51 [ERR] memberlist: Failed to send gossip to 10.255.0.13:7946: write udp [::]:7946->10.255.0.13:7946: sendto: operation not permitted
portainer_agent_agent.0.985h7xcfkux0@ignopotulelk03d | 2018/11/26 05:28:51 [ERR] memberlist: Failed to send gossip to 10.255.0.11:7946: write udp [::]:7946->10.255.0.11:7946: sendto: operation not permitted
portainer_agent_agent.0.mljirysir6px@ignopotulelk01d | 2018/11/26 05:28:50 [INFO] serf: EventMemberJoin: c6c277e3f60b 10.255.0.11
portainer_agent_agent.0.mljirysir6px@ignopotulelk01d | 2018/11/26 05:28:50 [INFO] serf: EventMemberJoin: b6040a1ccc2a 10.255.0.13
portainer_agent_agent.0.mljirysir6px@ignopotulelk01d | 2018/11/26 05:28:50 [INFO] - Starting Portainer agent version 1.1.2 on 0.0.0.0:9001 (cluster mode: true)
portainer_agent_agent.0.mljirysir6px@ignopotulelk01d | 2018/11/26 05:28:51 [ERR] memberlist: Failed to send gossip to 10.255.0.13:7946: write udp [::]:7946->10.255.0.13:7946: sendto: operation not permitted
portainer_agent_agent.0.mljirysir6px@ignopotulelk01d | 2018/11/26 05:28:51 [ERR] memberlist: Failed to send gossip to 10.255.0.13:7946: write udp [::]:7946->10.255.0.13:7946: sendto: operation not permitted
portainer_agent_agent.0.mljirysir6px@ignopotulelk01d | 2018/11/26 05:28:51 [INFO] serf: EventMemberJoin: 3e290151a5eb 10.255.0.12
portainer_agent_agent.0.mljirysir6px@ignopotulelk01d | 2018/11/26 05:28:51 [ERR] memberlist: Failed to send gossip to 10.255.0.13:7946: write udp [::]:7946->10.255.0.13:7946: sendto: operation not permitted
portainer_agent_agent.0.mljirysir6px@ignopotulelk01d | 2018/11/26 05:28:51 [ERR] memberlist: Failed to send gossip to 10.255.0.12:7946: write udp [::]:7946->10.255.0.12:7946: sendto: operation not permitted
When I replace:
ports:
  - "9001:9001"
with:
ports:
  - target: 9001
    published: 9001
    protocol: tcp
    mode: host
it works. Why does host mode solve this problem?
I'm trying to get a consul server cluster up and running. I have 3 dockerized consul servers running, but I can't access the Web UI, the HTTP API nor the DNS.
$ docker logs net-sci_discovery-service_consul_1
==> WARNING: Expect Mode enabled, expecting 3 servers
==> Starting Consul agent...
==> Consul agent running!
Version: 'v0.8.5'
Node ID: 'ccd38897-6047-f8b6-be1c-2aa0022a1483'
Node name: 'consul1'
Datacenter: 'dc1'
Server: true (bootstrap: false)
Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600)
Cluster Addr: 172.20.0.2 (LAN: 8301, WAN: 8302)
Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
==> Log data will now stream in as it occurs:
2017/07/07 23:24:07 [INFO] raft: Initial configuration (index=0): []
2017/07/07 23:24:07 [INFO] raft: Node at 172.20.0.2:8300 [Follower] entering Follower state (Leader: "")
2017/07/07 23:24:07 [INFO] serf: EventMemberJoin: consul1 172.20.0.2
2017/07/07 23:24:07 [INFO] consul: Adding LAN server consul1 (Addr: tcp/172.20.0.2:8300) (DC: dc1)
2017/07/07 23:24:07 [INFO] serf: EventMemberJoin: consul1.dc1 172.20.0.2
2017/07/07 23:24:07 [INFO] consul: Handled member-join event for server "consul1.dc1" in area "wan"
2017/07/07 23:24:07 [INFO] agent: Started DNS server 127.0.0.1:8600 (tcp)
2017/07/07 23:24:07 [INFO] agent: Started DNS server 127.0.0.1:8600 (udp)
2017/07/07 23:24:07 [INFO] agent: Started HTTP server on 127.0.0.1:8500
2017/07/07 23:24:09 [INFO] serf: EventMemberJoin: consul2 172.20.0.3
2017/07/07 23:24:09 [INFO] consul: Adding LAN server consul2 (Addr: tcp/172.20.0.3:8300) (DC: dc1)
2017/07/07 23:24:09 [INFO] serf: EventMemberJoin: consul2.dc1 172.20.0.3
2017/07/07 23:24:09 [INFO] consul: Handled member-join event for server "consul2.dc1" in area "wan"
2017/07/07 23:24:10 [INFO] serf: EventMemberJoin: consul3 172.20.0.4
2017/07/07 23:24:10 [INFO] consul: Adding LAN server consul3 (Addr: tcp/172.20.0.4:8300) (DC: dc1)
2017/07/07 23:24:10 [INFO] consul: Found expected number of peers, attempting bootstrap: 172.20.0.2:8300,172.20.0.3:8300,172.20.0.4:8300
2017/07/07 23:24:10 [INFO] serf: EventMemberJoin: consul3.dc1 172.20.0.4
2017/07/07 23:24:10 [INFO] consul: Handled member-join event for server "consul3.dc1" in area "wan"
2017/07/07 23:24:14 [ERR] agent: failed to sync remote state: No cluster leader
2017/07/07 23:24:17 [WARN] raft: Heartbeat timeout from "" reached, starting election
2017/07/07 23:24:17 [INFO] raft: Node at 172.20.0.2:8300 [Candidate] entering Candidate state in term 2
2017/07/07 23:24:17 [INFO] raft: Election won. Tally: 2
2017/07/07 23:24:17 [INFO] raft: Node at 172.20.0.2:8300 [Leader] entering Leader state
2017/07/07 23:24:17 [INFO] raft: Added peer 172.20.0.3:8300, starting replication
2017/07/07 23:24:17 [INFO] raft: Added peer 172.20.0.4:8300, starting replication
2017/07/07 23:24:17 [INFO] consul: cluster leadership acquired
2017/07/07 23:24:17 [INFO] consul: New leader elected: consul1
2017/07/07 23:24:17 [WARN] raft: AppendEntries to {Voter 172.20.0.3:8300 172.20.0.3:8300} rejected, sending older logs (next: 1)
2017/07/07 23:24:17 [WARN] raft: AppendEntries to {Voter 172.20.0.4:8300 172.20.0.4:8300} rejected, sending older logs (next: 1)
2017/07/07 23:24:17 [INFO] raft: pipelining replication to peer {Voter 172.20.0.3:8300 172.20.0.3:8300}
2017/07/07 23:24:17 [INFO] raft: pipelining replication to peer {Voter 172.20.0.4:8300 172.20.0.4:8300}
2017/07/07 23:24:18 [INFO] consul: member 'consul1' joined, marking health alive
2017/07/07 23:24:18 [INFO] consul: member 'consul2' joined, marking health alive
2017/07/07 23:24:18 [INFO] consul: member 'consul3' joined, marking health alive
2017/07/07 23:24:20 [INFO] agent: Synced service 'consul'
2017/07/07 23:24:20 [INFO] agent: Synced service 'messaging-service-kafka'
2017/07/07 23:24:20 [INFO] agent: Synced service 'messaging-service-zookeeper'
$ curl http://127.0.0.1:8500/v1/catalog/service/consul
curl: (52) Empty reply from server
$ dig @127.0.0.1 -p 8600 consul.service.consul
; <<>> DiG 9.8.3-P1 <<>> @127.0.0.1 -p 8600 consul.service.consul
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
$ dig @127.0.0.1 -p 8600 messaging-service-kafka.service.consul
; <<>> DiG 9.8.3-P1 <<>> @127.0.0.1 -p 8600 messaging-service-kafka.service.consul
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
I can't get my services to register via the HTTP API either; those shown above are registered using a config script when the container launches.
Here's my docker-compose.yml:
version: '2'
services:
  consul1:
    image: "consul:latest"
    container_name: "net-sci_discovery-service_consul_1"
    hostname: "consul1"
    ports:
      - "8400:8400"
      - "8500:8500"
      - "8600:8600"
    volumes:
      - ./etc/consul.d:/etc/consul.d
    command: "agent -server -ui -bootstrap-expect 3 -config-dir=/etc/consul.d -bind=0.0.0.0"
  consul2:
    image: "consul:latest"
    container_name: "net-sci_discovery-service_consul_2"
    hostname: "consul2"
    command: "agent -server -join=consul1"
    links:
      - "consul1"
  consul3:
    image: "consul:latest"
    container_name: "net-sci_discovery-service_consul_3"
    hostname: "consul3"
    command: "agent -server -join=consul1"
    links:
      - "consul1"
I'm relatively new to both docker and consul. I've had a look around the web and the above options are my understanding of what is required. Any suggestions on the way forward would be very welcome.
Edit:
Result of docker container ps --all:
$ docker container ps --all
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e0a1c3bba165 consul:latest "docker-entrypoint..." 38 seconds ago Up 36 seconds 8300-8302/tcp, 8500/tcp, 8301-8302/udp, 8600/tcp, 8600/udp net-sci_discovery-service_consul_3
7f05555e81e0 consul:latest "docker-entrypoint..." 38 seconds ago Up 36 seconds 8300-8302/tcp, 8500/tcp, 8301-8302/udp, 8600/tcp, 8600/udp net-sci_discovery-service_consul_2
9e2dedaa224b consul:latest "docker-entrypoint..." 39 seconds ago Up 38 seconds 0.0.0.0:8400->8400/tcp, 8301-8302/udp, 0.0.0.0:8500->8500/tcp, 8300-8302/tcp, 8600/udp, 0.0.0.0:8600->8600/tcp net-sci_discovery-service_consul_1
27b34c5dacb7 messagingservice_kafka "start-kafka.sh" 3 hours ago Up 3 hours 0.0.0.0:9092->9092/tcp net-sci_messaging-service_kafka
0389797b0b8f wurstmeister/zookeeper "/bin/sh -c '/usr/..." 3 hours ago Up 3 hours 22/tcp, 2888/tcp, 3888/tcp, 0.0.0.0:2181->2181/tcp net-sci_messaging-service_zookeeper
Edit:
Updated docker-compose.yml to include long format for ports:
version: '3.2'
services:
  consul1:
    image: "consul:latest"
    container_name: "net-sci_discovery-service_consul_1"
    hostname: "consul1"
    ports:
      - target: 8400
        published: 8400
        mode: host
      - target: 8500
        published: 8500
        mode: host
      - target: 8600
        published: 8600
        mode: host
    volumes:
      - ./etc/consul.d:/etc/consul.d
    command: "agent -server -ui -bootstrap-expect 3 -config-dir=/etc/consul.d -bind=0.0.0.0 -client=127.0.0.1"
  consul2:
    image: "consul:latest"
    container_name: "net-sci_discovery-service_consul_2"
    hostname: "consul2"
    command: "agent -server -join=consul1"
    links:
      - "consul1"
  consul3:
    image: "consul:latest"
    container_name: "net-sci_discovery-service_consul_3"
    hostname: "consul3"
    command: "agent -server -join=consul1"
    links:
      - "consul1"
For the Consul Web GUI, make sure you have launched the agent with the -ui parameter.
The UI is available at the /ui path on the same port as the HTTP API.
By default this is http://localhost:8500/ui.
I do see 8500 mapped to your host on all interfaces (0.0.0.0).
Check also (as in this answer) whether setting client_addr can help (at least for testing).
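The symptom matches the log line "Started HTTP server on 127.0.0.1:8500": inside the container, the API only listens on the container's loopback, so the published port has nothing reachable to forward to. A possible variant of the consul1 service (illustrative, not tested here) binds the client address to all interfaces instead:

```yaml
# Hypothetical consul1 service: -client=0.0.0.0 makes the HTTP API and
# DNS listen on all container interfaces, so the published ports work.
consul1:
  image: "consul:latest"
  ports:
    - "8500:8500"
    - "8600:8600"
    - "8600:8600/udp"
  command: "agent -server -ui -bootstrap-expect 3 -config-dir=/etc/consul.d -bind=0.0.0.0 -client=0.0.0.0"
```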
Hi, I have configured a cluster with two nodes (two VMs in VirtualBox). The cluster starts correctly, but the advertise flag seems to be ignored by Consul.
vm1 (app) ip 192.168.20.10
vm2 (web) ip 192.168.20.11
docker-compose vm1 (app)
version: '2'
services:
  appconsul:
    build: consul/
    ports:
      - 192.168.20.10:8300:8300
      - 192.168.20.10:8301:8301
      - 192.168.20.10:8301:8301/udp
      - 192.168.20.10:8302:8302
      - 192.168.20.10:8302:8302/udp
      - 192.168.20.10:8400:8400
      - 192.168.20.10:8500:8500
      - 172.32.0.1:53:53/udp
    hostname: node_1
    command: -server -advertise 192.168.20.10 -bootstrap-expect 2 -ui-dir /ui
    networks:
      net-app:
  appregistrator:
    build: registrator/
    hostname: app
    command: consul://192.168.20.10:8500
    volumes:
      - /var/run/docker.sock:/tmp/docker.sock
    depends_on:
      - appconsul
    networks:
      net-app:
networks:
  net-app:
    driver: bridge
    ipam:
      config:
        - subnet: 172.32.0.0/24
docker-compose vm2 (web)
version: '2'
services:
  webconsul:
    build: consul/
    ports:
      - 192.168.20.11:8300:8300
      - 192.168.20.11:8301:8301
      - 192.168.20.11:8301:8301/udp
      - 192.168.20.11:8302:8302
      - 192.168.20.11:8302:8302/udp
      - 192.168.20.11:8400:8400
      - 192.168.20.11:8500:8500
      - 172.33.0.1:53:53/udp
    hostname: node_2
    command: -server -advertise 192.168.20.11 -join 192.168.20.10
    networks:
      net-web:
  webregistrator:
    build: registrator/
    hostname: web
    command: consul://192.168.20.11:8500
    volumes:
      - /var/run/docker.sock:/tmp/docker.sock
    depends_on:
      - webconsul
    networks:
      net-web:
networks:
  net-web:
    driver: bridge
    ipam:
      config:
        - subnet: 172.33.0.0/24
After startup there is no error about the advertise flag, but the services are registered with the private IPs of the internal networks instead of the IPs declared in advertise (192.168.20.10 and 192.168.20.11). Any idea?
Attached is the log of node_1; node_2's log is essentially the same:
appconsul_1 | ==> WARNING: Expect Mode enabled, expecting 2 servers
appconsul_1 | ==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
appconsul_1 | ==> Starting raft data migration...
appconsul_1 | ==> Starting Consul agent...
appconsul_1 | ==> Starting Consul agent RPC...
appconsul_1 | ==> Consul agent running!
appconsul_1 | Node name: 'node_1'
appconsul_1 | Datacenter: 'dc1'
appconsul_1 | Server: true (bootstrap: false)
appconsul_1 | Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 53, RPC: 8400)
appconsul_1 | Cluster Addr: 192.168.20.10 (LAN: 8301, WAN: 8302)
appconsul_1 | Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
appconsul_1 | Atlas: <disabled>
appconsul_1 |
appconsul_1 | ==> Log data will now stream in as it occurs:
appconsul_1 |
appconsul_1 | 2017/06/13 14:57:24 [INFO] raft: Node at 192.168.20.10:8300 [Follower] entering Follower state
appconsul_1 | 2017/06/13 14:57:24 [INFO] serf: EventMemberJoin: node_1 192.168.20.10
appconsul_1 | 2017/06/13 14:57:24 [INFO] serf: EventMemberJoin: node_1.dc1 192.168.20.10
appconsul_1 | 2017/06/13 14:57:24 [INFO] consul: adding server node_1 (Addr: 192.168.20.10:8300) (DC: dc1)
appconsul_1 | 2017/06/13 14:57:24 [INFO] consul: adding server node_1.dc1 (Addr: 192.168.20.10:8300) (DC: dc1)
appconsul_1 | 2017/06/13 14:57:25 [ERR] agent: failed to sync remote state: No cluster leader
appconsul_1 | 2017/06/13 14:57:25 [ERR] agent: failed to sync changes: No cluster leader
appconsul_1 | 2017/06/13 14:57:26 [WARN] raft: EnableSingleNode disabled, and no known peers. Aborting election.
appconsul_1 | 2017/06/13 14:57:48 [ERR] agent: failed to sync remote state: No cluster leader
appconsul_1 | 2017/06/13 14:58:13 [ERR] agent: failed to sync remote state: No cluster leader
appconsul_1 | 2017/06/13 14:58:22 [INFO] serf: EventMemberJoin: node_2 192.168.20.11
appconsul_1 | 2017/06/13 14:58:22 [INFO] consul: adding server node_2 (Addr: 192.168.20.11:8300) (DC: dc1)
appconsul_1 | 2017/06/13 14:58:22 [INFO] consul: Attempting bootstrap with nodes: [192.168.20.10:8300 192.168.20.11:8300]
appconsul_1 | 2017/06/13 14:58:23 [WARN] raft: Heartbeat timeout reached, starting election
appconsul_1 | 2017/06/13 14:58:23 [INFO] raft: Node at 192.168.20.10:8300 [Candidate] entering Candidate state
appconsul_1 | 2017/06/13 14:58:23 [WARN] raft: Remote peer 192.168.20.11:8300 does not have local node 192.168.20.10:8300 as a peer
appconsul_1 | 2017/06/13 14:58:23 [INFO] raft: Election won. Tally: 2
appconsul_1 | 2017/06/13 14:58:23 [INFO] raft: Node at 192.168.20.10:8300 [Leader] entering Leader state
appconsul_1 | 2017/06/13 14:58:23 [INFO] consul: cluster leadership acquired
appconsul_1 | 2017/06/13 14:58:23 [INFO] consul: New leader elected: node_1
appconsul_1 | 2017/06/13 14:58:23 [INFO] raft: pipelining replication to peer 192.168.20.11:8300
appconsul_1 | 2017/06/13 14:58:23 [INFO] consul: member 'node_1' joined, marking health alive
appconsul_1 | 2017/06/13 14:58:23 [INFO] consul: member 'node_2' joined, marking health alive
appconsul_1 | 2017/06/13 14:58:26 [INFO] agent: Synced service 'app:dockerdata_solr_1:8983'
appconsul_1 | 2017/06/13 14:58:26 [INFO] agent: Synced service 'app:dockerdata_appconsul_1:8302'
appconsul_1 | 2017/06/13 14:58:26 [INFO] agent: Synced service 'app:dockerdata_appconsul_1:8302:udp'
appconsul_1 | 2017/06/13 14:58:26 [INFO] agent: Synced service 'app:dockerdata_appconsul_1:8301'
appconsul_1 | 2017/06/13 14:58:26 [INFO] agent: Synced service 'app:dockerdata_appconsul_1:8500'
appconsul_1 | 2017/06/13 14:58:26 [INFO] agent: Synced service 'app:dockerdata_appconsul_1:8300'
appconsul_1 | 2017/06/13 14:58:26 [INFO] agent: Synced service 'consul'
appconsul_1 | 2017/06/13 14:58:26 [INFO] agent: Synced service 'app:dockerdata_mysql_1:3306'
appconsul_1 | 2017/06/13 14:58:26 [INFO] agent: Synced service 'app:dockerdata_appconsul_1:8400'
appconsul_1 | 2017/06/13 14:58:26 [INFO] agent: Synced service 'app:dockerdata_appconsul_1:53:udp'
appconsul_1 | 2017/06/13 14:58:26 [INFO] agent: Synced service 'app:dockerdata_appconsul_1:8301:udp'
Thanks for any reply
UPDATE:
I tried removing the networks section from the compose file, but the problem remained. I resolved it by using compose file v1; this configuration works:
compose vm1 (app)
appconsul:
  build: consul/
  ports:
    - 192.168.20.10:8300:8300
    - 192.168.20.10:8301:8301
    - 192.168.20.10:8301:8301/udp
    - 192.168.20.10:8302:8302
    - 192.168.20.10:8302:8302/udp
    - 192.168.20.10:8400:8400
    - 192.168.20.10:8500:8500
    - 172.32.0.1:53:53/udp
  hostname: node_1
  command: -server -advertise 192.168.20.10 -bootstrap-expect 2 -ui-dir /ui
appregistrator:
  build: registrator/
  hostname: app
  command: consul://192.168.20.10:8500
  volumes:
    - /var/run/docker.sock:/tmp/docker.sock
  links:
    - appconsul
compose vm2 (web)
webconsul:
  build: consul/
  ports:
    - 192.168.20.11:8300:8300
    - 192.168.20.11:8301:8301
    - 192.168.20.11:8301:8301/udp
    - 192.168.20.11:8302:8302
    - 192.168.20.11:8302:8302/udp
    - 192.168.20.11:8400:8400
    - 192.168.20.11:8500:8500
    - 172.33.0.1:53:53/udp
  hostname: node_2
  command: -server -advertise 192.168.20.11 -join 192.168.20.10
webregistrator:
  build: registrator/
  hostname: web
  command: consul://192.168.20.11:8500
  volumes:
    - /var/run/docker.sock:/tmp/docker.sock
  links:
    - webconsul
The problem is the compose file version: v2 and v3 show the same problem; it works only with compose file v1.
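One likely reason (an assumption, not verified here) is that with v2/v3 compose files the services sit on a user-defined network, and registrator registers the container IPs it sees on that network. Registrator's -ip flag can force the address used for registration; a hypothetical v2 adaptation of the appregistrator service:

```yaml
appregistrator:
  build: registrator/
  hostname: app
  # -ip forces the registered address for host-mapped ports,
  # instead of the container's internal network IP.
  command: -ip 192.168.20.10 consul://192.168.20.10:8500
  volumes:
    - /var/run/docker.sock:/tmp/docker.sock
```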
I have Consul running in a Docker container.
When I start another Consul agent (not in Docker), it says:
[WARN] memberlist: Was able to reach container_name via TCP but not UDP, network may be misconfigured and not allowing bidirectional UDP
I am trying to form a cluster here, but leader election keeps failing.
How can I fix this?
My port specification in docker-compose.yml (docker-compose version: 1)
ports:
  - "8300:8300"
  - "8301:8301"
  - "8301:8301/udp"
  - "8302:8302"
  - "8302:8302/udp"
  - "8400:8400"
  - "8500:8500"
  - "8600:8600"
  - "8600:8600/udp"
Log of Consul1 running in Docker Container:
Node name: '<host>'
Datacenter: 'dc1'
Server: true (bootstrap: true)
Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
Cluster Addr: <host_ip> (LAN: 8301, WAN: 8302)
Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
Atlas: <disabled>
==> Log data will now stream in as it occurs:
2017/06/08 03:39:44 [INFO] raft: Restored from snapshot 13-35418-1496826625488
2017/06/08 03:39:44 [INFO] serf: EventMemberJoin: <host> <host_ip>
2017/06/08 03:39:44 [INFO] raft: Node at <host_ip>:8300 [Follower] entering Follower state
2017/06/08 03:39:44 [INFO] consul: adding LAN server <host> (Addr: <host_ip>:8300) (DC: dc1)
2017/06/08 03:39:44 [INFO] serf: EventMemberJoin: <host>.dc1 <host_ip>
2017/06/08 03:39:44 [INFO] consul: adding WAN server <host>.dc1 (Addr: <host_ip>:8300) (DC: dc1)
2017/06/08 03:39:44 [ERR] agent: failed to sync remote state: No cluster leader
2017/06/08 03:39:45 [WARN] raft: Heartbeat timeout reached, starting election
2017/06/08 03:39:45 [INFO] raft: Node at <host_ip>:8300 [Candidate] entering Candidate state
2017/06/08 03:39:45 [INFO] raft: Election won. Tally: 1
2017/06/08 03:39:45 [INFO] raft: Node at <host_ip>:8300 [Leader] entering Leader state
2017/06/08 03:39:45 [INFO] consul: cluster leadership acquired
2017/06/08 03:39:45 [INFO] consul: New leader elected: <host>
2017/06/08 03:39:45 [INFO] raft: Disabling EnableSingleNode (bootstrap)
2017/06/08 03:39:45 [INFO] raft: Added peer <host_ip>:9300, starting replication
2017/06/08 03:39:45 [INFO] raft: Removed peer <host_ip>:9300, stopping replication (Index: 36201)
2017/06/08 03:39:45 [INFO] raft: Added peer <host_ip>:9300, starting replication
2017/06/08 03:39:45 [INFO] raft: Added peer <host_ip>:10300, starting replication
2017/06/08 03:39:45 [INFO] raft: Removed peer <host_ip>:10300, stopping replication (Index: 36228)
2017/06/08 03:39:45 [INFO] raft: Removed peer <host_ip>:9300, stopping replication (Index: 36230)
2017/06/08 03:39:45 [ERR] raft: Failed to AppendEntries to <host_ip>:10300: dial tcp <host_ip>:10300: getsockopt: connection refused
2017/06/08 03:39:45 [ERR] raft: Failed to AppendEntries to <host_ip>:9300: dial tcp <host_ip>:9300: getsockopt: connection refused
2017/06/08 03:39:45 [ERR] raft: Failed to AppendEntries to <host_ip>:9300: dial tcp <host_ip>:9300: getsockopt: connection refused
2017/06/08 03:39:45 [ERR] raft: Failed to AppendEntries to <host_ip>:10300: dial tcp <host_ip>:10300: getsockopt: connection refused
2017/06/08 03:39:45 [ERR] raft: Failed to AppendEntries to <host_ip>:9300: dial tcp <host_ip>:9300: getsockopt: connection refused
2017/06/08 03:39:49 [WARN] agent: Check 'vault::8200:vault-sealed-check' missed TTL, is now critical
2017/06/08 03:39:50 [INFO] serf: EventMemberJoin: server2 <host_ip>
2017/06/08 03:39:50 [INFO] consul: adding LAN server server2 (Addr: <host_ip>:9300) (DC: dc1)
2017/06/08 03:39:50 [INFO] raft: Added peer <host_ip>:9300, starting replication
2017/06/08 03:39:50 [WARN] raft: AppendEntries to <host_ip>:9300 rejected, sending older logs (next: 36231)
2017/06/08 03:39:50 [INFO] raft: pipelining replication to peer <host_ip>:9300
2017/06/08 03:39:50 [INFO] consul: member 'server2' joined, marking health alive
2017/06/08 03:39:52 [INFO] agent: Synced service 'vault::8200'
2017/06/08 03:39:52 [INFO] agent: Synced check 'vault::8200:vault-sealed-check'
2017/06/08 03:40:06 [INFO] agent: Synced check 'vault::8200:vault-sealed-check'
2017/06/08 03:40:18 [ERR] raft: Failed to heartbeat to <host_ip>:9300: EOF
2017/06/08 03:40:18 [INFO] raft: aborting pipeline replication to peer <host_ip>:9300
2017/06/08 03:40:19 [ERR] raft: Failed to AppendEntries to <host_ip>:9300: dial tcp <host_ip>:9300: getsockopt: connection refused
2017/06/08 03:40:19 [ERR] raft: Failed to heartbeat to <host_ip>:9300: dial tcp <host_ip>:9300: getsockopt: connection refused
2017/06/08 03:40:19 [ERR] raft: Failed to AppendEntries to <host_ip>:9300: dial tcp <host_ip>:9300: getsockopt: connection refused
2017/06/08 03:40:19 [ERR] raft: Failed to heartbeat to <host_ip>:9300: dial tcp <host_ip>:9300: getsockopt: connection refused
2017/06/08 03:40:19 [ERR] raft: Failed to AppendEntries to <host_ip>:9300: dial tcp <host_ip>:9300: getsockopt: connection refused
2017/06/08 03:40:19 [ERR] raft: Failed to AppendEntries to <host_ip>:9300: dial tcp <host_ip>:9300: getsockopt: connection refused
2017/06/08 03:40:19 [ERR] raft: Failed to heartbeat to <host_ip>:9300: dial tcp <host_ip>:9300: getsockopt: connection refused
2017/06/08 03:40:19 [WARN] raft: Failed to contact <host_ip>:9300 in 501.593114ms
2017/06/08 03:40:19 [WARN] raft: Failed to contact quorum of nodes, stepping down
2017/06/08 03:40:19 [INFO] raft: Node at <host_ip>:8300 [Follower] entering Follower state
2017/06/08 03:40:19 [INFO] consul: cluster leadership lost
2017/06/08 03:40:19 [ERR] raft: Failed to AppendEntries to <host_ip>:9300: dial tcp <host_ip>:9300: getsockopt: connection refused
2017/06/08 03:40:20 [WARN] raft: Heartbeat timeout reached, starting election
2017/06/08 03:40:20 [INFO] raft: Node at <host_ip>:8300 [Candidate] entering Candidate state
2017/06/08 03:40:20 [ERR] raft: Failed to make RequestVote RPC to <host_ip>:9300: dial tcp <host_ip>:9300: getsockopt: connection refused
2017/06/08 03:40:21 [INFO] memberlist: Suspect server2 has failed, no acks received
2017/06/08 03:40:22 [WARN] raft: Election timeout reached, restarting election
2017/06/08 03:40:22 [INFO] raft: Node at <host_ip>:8300 [Candidate] entering Candidate state
2017/06/08 03:40:22 [ERR] raft: Failed to make RequestVote RPC to <host_ip>:9300: dial tcp <host_ip>:9300: getsockopt: connection refused
2017/06/08 03:40:23 [INFO] memberlist: Suspect server2 has failed, no acks received
2017/06/08 03:40:23 [WARN] dns: Query results too stale, re-requesting
2017/06/08 03:40:23 [ERR] dns: rpc error: No cluster leader
2017/06/08 03:40:23 [WARN] raft: Election timeout reached, restarting election
2017/06/08 03:40:23 [INFO] raft: Node at <host_ip>:8300 [Candidate] entering Candidate state
2017/06/08 03:40:23 [ERR] raft: Failed to make RequestVote RPC to <host_ip>:9300: dial tcp <host_ip>:9300: getsockopt: connection refused
2017/06/08 03:40:24 [WARN] raft: Election timeout reached, restarting election
2017/06/08 03:40:24 [INFO] raft: Node at <host_ip>:8300 [Candidate] entering Candidate state
2017/06/08 03:40:24 [ERR] raft: Failed to make RequestVote RPC to <host_ip>:9300: dial tcp <host_ip>:9300: getsockopt: connection refused
2017/06/08 03:40:24 [ERR] http: Request PUT /v1/session/renew/8c4efe65-07c3-f93e-6679-f2bc95f8e92c, error: No cluster leader from=172.17.0.4:57031
2017/06/08 03:40:25 [INFO] memberlist: Suspect server2 has failed, no acks received
2017/06/08 03:40:25 [ERR] http: Request PUT /v1/session/renew/8c4efe65-07c3-f93e-6679-f2bc95f8e92c, error: No cluster leader from=172.17.0.4:57061
2017/06/08 03:40:26 [INFO] memberlist: Suspect server2 has failed, no acks received
2017/06/08 03:40:26 [INFO] memberlist: Marking server2 as failed, suspect timeout reached
2017/06/08 03:40:26 [INFO] serf: EventMemberFailed: server2 <host_ip>
2017/06/08 03:40:26 [INFO] consul: removing LAN server server2 (Addr: <host_ip>:9300) (DC: dc1)
2017/06/08 03:40:26 [WARN] raft: Election timeout reached, restarting election
2017/06/08 03:40:26 [INFO] raft: Node at <host_ip>:8300 [Candidate] entering Candidate state
2017/06/08 03:40:26 [ERR] raft: Failed to make RequestVote RPC to <host_ip>:9300: dial tcp <host_ip>:9300: getsockopt: connection refused
2017/06/08 03:40:26 [ERR] agent: coordinate update error: No cluster leader
2017/06/08 03:40:26 [ERR] http: Request PUT /v1/session/renew/8c4efe65-07c3-f93e-6679-f2bc95f8e92c, error: No cluster leader from=172.17.0.4:57064
2017/06/08 03:40:27 [WARN] dns: Query results too stale, re-requesting
2017/06/08 03:40:27 [ERR] dns: rpc error: No cluster leader
2017/06/08 03:40:27 [WARN] raft: Election timeout reached, restarting election
Log of consul2:
==> WARNING: Expect Mode enabled, expecting 2 servers
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
Node name: 'server2'
Datacenter: 'dc1'
Server: true (bootstrap: false)
Client Addr: 0.0.0.0 (HTTP: 9500, HTTPS: -1, DNS: 9600, RPC: 9400)
Cluster Addr: <host_ip> (LAN: 9301, WAN: 9302)
Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
Atlas: <disabled>
==> Log data will now stream in as it occurs:
2017/06/08 09:09:50 [INFO] raft: Restored from snapshot 13-35418-1496892834061
2017/06/08 09:09:50 [INFO] serf: EventMemberJoin: server2 <host_ip>
2017/06/08 09:09:50 [INFO] serf: EventMemberJoin: server2.dc1 <host_ip>
2017/06/08 09:09:50 [INFO] raft: Node at <host_ip>:9300 [Follower] entering Follower state
2017/06/08 09:09:50 [INFO] consul: adding LAN server server2 (Addr: <host_ip>:9300) (DC: dc1)
2017/06/08 09:09:50 [INFO] consul: adding WAN server server2.dc1 (Addr: <host_ip>:9300) (DC: dc1)
2017/06/08 09:09:50 [ERR] agent: failed to sync remote state: No cluster leader
2017/06/08 09:09:50 [INFO] agent: Joining cluster...
2017/06/08 09:09:50 [INFO] agent: (LAN) joining: [<host_ip>:8301 <host_ip>:10301]
2017/06/08 09:09:50 [INFO] serf: EventMemberJoin: <host> <host_ip>
2017/06/08 09:09:50 [INFO] consul: adding LAN server <host> (Addr: <host_ip>:8300) (DC: dc1)
2017/06/08 09:09:50 [INFO] agent: (LAN) joined: 1 Err: <nil>
2017/06/08 09:09:50 [INFO] agent: Join completed. Synced with 1 initial agents
2017/06/08 09:09:50 [WARN] raft: Failed to get previous log: 36233 log not found (last: 36230)
2017/06/08 09:09:50 [INFO] raft: Removed ourself, transitioning to follower
2017/06/08 09:09:50 [INFO] raft: Removed ourself, transitioning to follower
2017/06/08 09:09:52 [WARN] memberlist: Was able to reach <host> via TCP but not UDP, network may be misconfigured and not allowing bidirectional UDP
==> Newer Consul version available: 0.8.3
2017/06/08 09:09:54 [WARN] memberlist: Was able to reach <host> via TCP but not UDP, network may be misconfigured and not allowing bidirectional UDP
2017/06/08 09:09:56 [WARN] memberlist: Was able to reach <host> via TCP but not UDP, network may be misconfigured and not allowing bidirectional UDP
2017/06/08 09:09:57 [WARN] memberlist: Was able to reach <host> via TCP but not UDP, network may be misconfigured and not allowing bidirectional UDP
What Consul means by bidirectional UDP is that the Consul agent needs to reach its Consul server, and vice versa: the Consul server needs to reach its agent.
Consul agent -- [UDP] --> Consul server
Consul agent <-- [UDP] -- Consul server
These are two separate communications, unlike TCP, where replies reuse the connection the agent already initiated.
So, if your Consul agent and server are not on the same network (e.g. the same Docker network), you need to expose the UDP ports on both ends. Also take into account the concept of advertise, which is the address the agent announces so that others can contact it.
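The two directions can be illustrated with plain sockets (a toy sketch, not Consul's actual gossip protocol): each side binds its own UDP port, and a datagram must get through in each direction independently.

```python
# Minimal illustration of bidirectional UDP: two sockets, each bound to
# its own port, each of which must be able to receive from the other.
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))   # stand-in for the server's gossip port
agent = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
agent.bind(("127.0.0.1", 0))    # stand-in for the agent's gossip port
server.settimeout(2)
agent.settimeout(2)

# Direction 1: agent -> server.
agent.sendto(b"ping", server.getsockname())
data, agent_addr = server.recvfrom(1024)
assert data == b"ping"

# Direction 2: server -> agent. If the agent's UDP port were not
# reachable (e.g. not published), this datagram would be dropped and
# memberlist would log the "TCP but not UDP" warning.
server.sendto(b"ack", agent_addr)
reply, _ = agent.recvfrom(1024)
print(reply.decode())  # ack
```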