unable to run consul in docker host network mode - docker

I am trying to run consul docker container in host network mode as suggested on docker hub. I am unable to access the UI at port 8500
My docker host IP address: 192.168.30.12
network interface which is used by host: ens192
Here is my docker run command:
docker run -d --net=host -v /home/docker/conf.json:/consul/config/config.json -v /home/docker/data/:/consul/data/ -e CONSUL_BIND_INTERFACE=ens192 -e CONSUL_CLIENT_INTERFACE=ens192 --name=consulserver1 -d consul agent -server -bootstrap-expect=1 -client 0.0.0.0 -bind=192.168.30.12
I also see following error in docker logs
==> Found address '192.168.30.12' for interface 'ens192', setting bind option...
==> Found address '192.168.30.12' for interface 'ens192', setting client option...
==> Starting Consul agent...
Version: '1.14.4'
Build Date: '2023-01-26 15:47:10 +0000 UTC'
Node ID: 'd8e91718-dcf3-70be-dd29-c558158959f0'
Node name: 'docker-try1'
Datacenter: 'dc1' (Segment: '<all>')
Server: true (Bootstrap: true)
Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: -1, gRPC-TLS: 8503, DNS: 8600)
Cluster Addr: 192.168.30.12 (LAN: 8301, WAN: 8302)
Gossip Encryption: false
Auto-Encrypt-TLS: false
HTTPS TLS: Verify Incoming: false, Verify Outgoing: false, Min Version: TLSv1_2
gRPC TLS: Verify Incoming: false, Min Version: TLSv1_2
Internal RPC TLS: Verify Incoming: false, Verify Outgoing: false (Verify Hostname: false), Min Version: TLSv1_2
==> Log data will now stream in as it occurs:
2023-02-17T15:18:30.052Z [WARN] agent: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
2023-02-17T15:18:30.052Z [WARN] agent: Node name "docker-try1" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
2023-02-17T15:18:30.052Z [WARN] agent: bootstrap = true: do not enable unless necessary
2023-02-17T15:18:30.057Z [WARN] agent.auto_config: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
2023-02-17T15:18:30.057Z [WARN] agent.auto_config: Node name "docker-try1" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
2023-02-17T15:18:30.057Z [WARN] agent.auto_config: bootstrap = true: do not enable unless necessary
2023-02-17T15:18:30.061Z [INFO] agent.server.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:d8e91718-dcf3-70be-dd29-c558158959f0 Address:192.168.30.12:8300}]"
2023-02-17T15:18:30.061Z [INFO] agent.server.raft: entering follower state: follower="Node at 192.168.30.12:8300 [Follower]" leader-address= leader-id=
2023-02-17T15:18:30.062Z [INFO] agent.server.serf.wan: serf: EventMemberJoin: docker-try1.dc1 192.168.30.12
2023-02-17T15:18:30.062Z [WARN] agent.server.serf.wan: serf: Failed to re-join any previously known node
2023-02-17T15:18:30.062Z [INFO] agent.server.serf.lan: serf: EventMemberJoin: docker-try1 192.168.30.12
2023-02-17T15:18:30.063Z [INFO] agent.router: Initializing LAN area manager
2023-02-17T15:18:30.063Z [WARN] agent.server.serf.lan: serf: Failed to re-join any previously known node
2023-02-17T15:18:30.063Z [INFO] agent.server: Adding LAN server: server="docker-try1 (Addr: tcp/192.168.30.12:8300) (DC: dc1)"
2023-02-17T15:18:30.063Z [INFO] agent.server.autopilot: reconciliation now disabled
2023-02-17T15:18:30.064Z [INFO] agent.server: Handled event for server in area: event=member-join server=docker-try1.dc1 area=wan
2023-02-17T15:18:30.064Z [INFO] agent.server.cert-manager: initialized server certificate management
2023-02-17T15:18:30.064Z [INFO] agent: Started DNS server: address=0.0.0.0:8600 network=udp
2023-02-17T15:18:30.065Z [INFO] agent: Started DNS server: address=0.0.0.0:8600 network=tcp
2023-02-17T15:18:30.065Z [INFO] agent: Starting server: address=[::]:8500 network=tcp protocol=http
2023-02-17T15:18:30.065Z [INFO] agent: Started gRPC listeners: port_name=grpc_tls address=[::]:8503 network=tcp
2023-02-17T15:18:30.065Z [INFO] agent: started state syncer
2023-02-17T15:18:30.065Z [INFO] agent: Consul agent running!
2023-02-17T15:18:37.152Z [WARN] agent.cache: handling error in Cache.Notify: cache-type=connect-ca-leaf error="No cluster leader" index=0
2023-02-17T15:18:37.152Z [ERROR] agent.server.cert-manager: failed to handle cache update event: error="leaf cert watch returned an error: No cluster leader"
2023-02-17T15:18:37.248Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2023-02-17T15:18:39.483Z [WARN] agent.server.raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
2023-02-17T15:18:39.483Z [INFO] agent.server.raft: entering candidate state: node="Node at 192.168.30.12:8300 [Candidate]" term=7
2023-02-17T15:18:39.486Z [INFO] agent.server.raft: election won: term=7 tally=1
2023-02-17T15:18:39.486Z [INFO] agent.server.raft: entering leader state: leader="Node at 192.168.30.12:8300 [Leader]"
2023-02-17T15:18:39.486Z [INFO] agent.server: cluster leadership acquired
2023-02-17T15:18:39.487Z [INFO] agent.server: New leader elected: payload=docker-try1
2023-02-17T15:18:39.493Z [INFO] agent.server.autopilot: reconciliation now enabled
2023-02-17T15:18:39.493Z [INFO] agent.leader: started routine: routine="federation state anti-entropy"
2023-02-17T15:18:39.493Z [INFO] agent.leader: started routine: routine="federation state pruning"
2023-02-17T15:18:39.493Z [INFO] agent.leader: started routine: routine="streaming peering resources"
2023-02-17T15:18:39.493Z [INFO] agent.leader: started routine: routine="metrics for streaming peering resources"
2023-02-17T15:18:39.493Z [INFO] agent.leader: started routine: routine="peering deferred deletion"
2023-02-17T15:18:39.493Z [INFO] connect.ca: initialized primary datacenter CA from existing CARoot with provider: provider=consul
2023-02-17T15:18:39.493Z [INFO] agent.leader: started routine: routine="intermediate cert renew watch"
2023-02-17T15:18:39.493Z [INFO] agent.leader: started routine: routine="CA root pruning"
2023-02-17T15:18:39.493Z [INFO] agent.leader: started routine: routine="CA root expiration metric"
2023-02-17T15:18:39.493Z [INFO] agent.leader: started routine: routine="CA signing expiration metric"
2023-02-17T15:18:39.493Z [INFO] agent.leader: started routine: routine="virtual IP version check"
2023-02-17T15:18:39.493Z [INFO] agent.leader: stopping routine: routine="virtual IP version check"
2023-02-17T15:18:39.493Z [INFO] agent.leader: stopped routine: routine="virtual IP version check"
2023-02-17T15:18:40.065Z [ERROR] agent.server.autopilot: Failed to reconcile current state with the desired state
2023-02-17T15:18:41.061Z [INFO] agent: Synced node info

I think I figured it out.
There was a firewall blocking tcp ports. As soon as I opened all ports recommended in Consul documentation Consul Ports, it started working

Related

single node ejabberd on kubernetes -ejabberdctl status shows node down

I'm trying to deploy ejabberd docker image in kubernetes with the following folders are mounted from a persistent volume,
/home/ejabberd/logs
/home/ejabberd/conf
/home/ejabberd/database
populated the database,and conf directory with our configuration files and the database folder
from the docker image using an init container .Upon setting the permissions, we could able to
start the ejabberd service , the logs says that the services (on ports 5222,5269,5280) are ready .
when I check the xmpp server status in the container using " ejabberdctl status " , the output says "node down"
===========ejabberd.log===================================================
2020-12-16 09:18:58.477630+00:00 [info] <0.3406.0>#mod_mqtt:init_topic_cache/2:611 Building MQTT cache for mydomain this may take a while
2020-12-16 09:18:59.087380+00:00 [info] <0.483.0>#ejabberd_mnesia:create/2:267 Creating Mnesia ram table 'bytestream'
2020-12-16 09:19:01.193203+00:00 [info] <0.126.0>#ejabberd_cluster_mnesia:wait_for_sync/1:123 Waiting for Mnesia synchronization to complete
2020-12-16 09:19:02.401537+00:00 [info] <0.126.0>#ejabberd_app:start/2:62 ejabberd 20.4.0 is started in the node 'ejabberd#mydomain' in 49.77s
2020-12-16 09:19:02.403414+00:00 [info] <0.601.0>#ejabberd_listener:init/4:159 Start accepting TCP connections at [::]:5222 for ejabberd_c2s
2020-12-16 09:19:02.403479+00:00 [info] <0.602.0>#ejabberd_listener:init/4:159 Start accepting TCP connections at [::]:5269 for ejabberd_s2s_in
2020-12-16 09:19:02.403956+00:00 [info] <0.603.0>#ejabberd_listener:init/4:159 Start accepting TLS connections at [::]:5443 for ejabberd_http
2020-12-16 09:19:02.403999+00:00 [info] <0.604.0>#ejabberd_listener:init/4:159 Start accepting TCP connections at [::]:5280 for ejabberd_http
2020-12-16 09:19:02.404098+00:00 [info] <0.605.0>#ejabberd_listener:init/4:159 Start accepting TCP connections at [::]:1883 for mod_mqtt
2020-12-16 09:19:02.404345+00:00 [info] <0.3418.0>#ejabberd_listener:init/4:159 Start accepting TCP connections at 10.42.8.15:7777 for mod_proxy65_stream
========================================ejabberdctl status===========================
~ $ ./bin/ejabberdctl status
Failed RPC connection to the node 'ejabberd#mydomain': nodedown
Commands to start an ejabberd node:
start - Start an ejabberd node in server mode
debug - Attach an interactive Erlang shell to a running ejabberd node
iexdebug - Attach an interactive Elixir shell to a running ejabberd node
live - Start an ejabberd node in live (interactive) mode
iexlive - Start an ejabberd node in live (interactive) mode, within an Elixir shell
foreground - Start an ejabberd node in server mode (attached)
Optional parameters when starting an ejabberd node:
--config-dir dir Config ejabberd: /home/ejabberd/conf
--config file Config ejabberd: /home/ejabberd/conf/ejabberd.yml
--ctl-config file Config ejabberdctl: /home/ejabberd/conf/ejabberdctl.cfg
--logs dir Directory for logs: /home/ejabberd/logs
--spool dir Database spool dir: /home/ejabberd/database/ejabberd#mydomain
--node nodename ejabberd node name: ejabberd#mydomain
If anyone has tried ejabberd on kubernetes, Please share your thought on this issue
Thanks in advance

Management page won't load when using RabbitMQ docker container

I'm running RabbitMQ locally using:
docker run -it --rm --name rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:3-management
Some log:
narley#brittes ~ $ docker run -it --rm --name rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:3-management
2020-01-08 22:31:52.079 [info] <0.8.0> Feature flags: list of feature flags found:
2020-01-08 22:31:52.079 [info] <0.8.0> Feature flags: [ ] drop_unroutable_metric
2020-01-08 22:31:52.079 [info] <0.8.0> Feature flags: [ ] empty_basic_get_metric
2020-01-08 22:31:52.079 [info] <0.8.0> Feature flags: [ ] implicit_default_bindings
2020-01-08 22:31:52.080 [info] <0.8.0> Feature flags: [ ] quorum_queue
2020-01-08 22:31:52.080 [info] <0.8.0> Feature flags: [ ] virtual_host_metadata
2020-01-08 22:31:52.080 [info] <0.8.0> Feature flags: feature flag states written to disk: yes
2020-01-08 22:31:52.160 [info] <0.268.0> ra: meta data store initialised. 0 record(s) recovered
2020-01-08 22:31:52.162 [info] <0.273.0> WAL: recovering []
2020-01-08 22:31:52.164 [info] <0.277.0>
Starting RabbitMQ 3.8.2 on Erlang 22.2.1
Copyright (c) 2007-2019 Pivotal Software, Inc.
Licensed under the MPL 1.1. Website: https://rabbitmq.com
## ## RabbitMQ 3.8.2
## ##
########## Copyright (c) 2007-2019 Pivotal Software, Inc.
###### ##
########## Licensed under the MPL 1.1. Website: https://rabbitmq.com
Doc guides: https://rabbitmq.com/documentation.html
Support: https://rabbitmq.com/contact.html
Tutorials: https://rabbitmq.com/getstarted.html
Monitoring: https://rabbitmq.com/monitoring.html
Logs: <stdout>
Config file(s): /etc/rabbitmq/rabbitmq.conf
Starting broker...2020-01-08 22:31:52.166 [info] <0.277.0>
node : rabbit#1586b4698736
home dir : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.conf
cookie hash : bwlnCFiUchzEkgAOsZwQ1w==
log(s) : <stdout>
database dir : /var/lib/rabbitmq/mnesia/rabbit#1586b4698736
2020-01-08 22:31:52.210 [info] <0.277.0> Running boot step pre_boot defined by app rabbit
...
...
...
2020-01-08 22:31:53.817 [info] <0.277.0> Setting up a table for connection tracking on this node: tracked_connection_on_node_rabbit#1586b4698736
2020-01-08 22:31:53.827 [info] <0.277.0> Setting up a table for per-vhost connection counting on this node: tracked_connection_per_vhost_on_node_rabbit#1586b4698736
2020-01-08 22:31:53.828 [info] <0.277.0> Running boot step routing_ready defined by app rabbit
2020-01-08 22:31:53.828 [info] <0.277.0> Running boot step pre_flight defined by app rabbit
2020-01-08 22:31:53.828 [info] <0.277.0> Running boot step notify_cluster defined by app rabbit
2020-01-08 22:31:53.829 [info] <0.277.0> Running boot step networking defined by app rabbit
2020-01-08 22:31:53.833 [info] <0.624.0> started TCP listener on [::]:5672
2020-01-08 22:31:53.833 [info] <0.277.0> Running boot step cluster_name defined by app rabbit
2020-01-08 22:31:53.833 [info] <0.277.0> Running boot step direct_client defined by app rabbit
2020-01-08 22:31:53.922 [info] <0.674.0> Management plugin: HTTP (non-TLS) listener started on port 15672
2020-01-08 22:31:53.922 [info] <0.780.0> Statistics database started.
2020-01-08 22:31:53.923 [info] <0.779.0> Starting worker pool 'management_worker_pool' with 3 processes in it
completed with 3 plugins.
2020-01-08 22:31:54.316 [info] <0.8.0> Server startup complete; 3 plugins started.
* rabbitmq_management
* rabbitmq_management_agent
* rabbitmq_web_dispatch
Then I go to http:localhost:15672 and page doesn't load. No error is displayed.
Interesting thing is that it worked last time I used it (about 3 weeks ago).
Can anyone give me some help?
Cheers!
have a try:
step 1, going into docker container
docker exec -it rabbitmq bash
step 2, run it in docker container
rabbitmq-plugins enable rabbitmq_management
is work for me
I got it working by simply upgrading docker.
Was running docker 18.09.7 and upgrade to 19.03.5.
In my case, clearing the cookies up has fixed this issue instantly.

ECS exit with code 0, stopped and restarted docker container many times

I am deploying a docker with ECS. I can run the docker ok on local, but when i use ECS, it always stops it and then restart a new one. I checked the docker log, there is no error either. I also checked docker stats, things looks ok. Below is what the ecs log shows:
2018-02-16T07:24:01Z [INFO] Managed task [arn:aws:ecs:us-west-2:035804961478:task/c13ba3f3-6ac8-49c5-a649-3d90e363ce4d]: Cgroup resource set up for task complete
2018-02-16T07:24:01Z [INFO] Task engine [arn:aws:ecs:us-west-2:035804961478:task/c13ba3f3-6ac8-49c5-a649-3d90e363ce4d]: pulling container openimage-http concurrently
2018-02-16T07:24:01Z [INFO] Task engine [arn:aws:ecs:us-west-2:035804961478:task/c13ba3f3-6ac8-49c5-a649-3d90e363ce4d]: Recording timestamp for starting image pulltime: 2018-02-16 07:24:01.209507829 +0000 UTC m=+30.870498308
2018-02-16T07:27:29Z [INFO] Adding image name- 035804961478.dkr.ecr.us-west-2.amazonaws.com/adobesearch/openimage:classifierscpu to Image state- sha256:b0b930a705e7df39404de30c58da2e573a2deb519b212c0d0a7ab74cf9d85192
2018-02-16T07:27:29Z [INFO] Updating container reference openimage-http in Image State - sha256:b0b930a705e7df39404de30c58da2e573a2deb519b212c0d0a7ab74cf9d85192
2018-02-16T07:27:29Z [INFO] Saving state! module="statemanager"
2018-02-16T07:27:29Z [INFO] Saving state! module="statemanager"
2018-02-16T07:27:29Z [INFO] Task engine [arn:aws:ecs:us-west-2:035804961478:task/c13ba3f3-6ac8-49c5-a649-3d90e363ce4d]: Finished pulling container 035804961478.dkr.ecr.us-west-2.amazonaws.com/adobesearch/openimage:classifierscpu in 3m27.982214992s
2018-02-16T07:27:29Z [INFO] Managed task [arn:aws:ecs:us-west-2:035804961478:task/c13ba3f3-6ac8-49c5-a649-3d90e363ce4d]: Cgroup resource set up for task complete
2018-02-16T07:27:29Z [INFO] Task engine [arn:aws:ecs:us-west-2:035804961478:task/c13ba3f3-6ac8-49c5-a649-3d90e363ce4d]: creating container: openimage-http
2018-02-16T07:27:29Z [INFO] Task engine [arn:aws:ecs:us-west-2:035804961478:task/c13ba3f3-6ac8-49c5-a649-3d90e363ce4d]: created container name mapping for task: openimage-http -> ecs-classifierscpu-http-TaskDefinition-1AYVW69U0MPZX-1-openimage-http-b2cbd59b9dd7f4f7c501
2018-02-16T07:27:29Z [INFO] Saving state! module="statemanager"
2018-02-16T07:27:29Z [INFO] Task engine [arn:aws:ecs:us-west-2:035804961478:task/c13ba3f3-6ac8-49c5-a649-3d90e363ce4d]: created docker container for task: openimage-http -> e9710d91b36113d319dc04a274fe0fdf2908e102c25767ab2fab14e822277049
2018-02-16T07:27:29Z [INFO] Managed task [arn:aws:ecs:us-west-2:035804961478:task/c13ba3f3-6ac8-49c5-a649-3d90e363ce4d]: unable to create task state change event []: create task state change event api: status not recognized by ECS: CREATED
2018-02-16T07:27:29Z [INFO] Managed task [arn:aws:ecs:us-west-2:035804961478:task/c13ba3f3-6ac8-49c5-a649-3d90e363ce4d]: Cgroup resource set up for task complete
2018-02-16T07:27:29Z [INFO] Managed task [arn:aws:ecs:us-west-2:035804961478:task/c13ba3f3-6ac8-49c5-a649-3d90e363ce4d]: redundant container state change. openimage-http to CREATED, but already CREATED
2018-02-16T07:27:29Z [INFO] Task engine [arn:aws:ecs:us-west-2:035804961478:task/c13ba3f3-6ac8-49c5-a649-3d90e363ce4d]: starting container: openimage-http
2018-02-16T07:27:30Z [INFO] Managed task [arn:aws:ecs:us-west-2:035804961478:task/c13ba3f3-6ac8-49c5-a649-3d90e363ce4d]: sending container change event [openimage-http]: arn:aws:ecs:us-west-2:035804961478:task/c13ba3f3-6ac8-49c5-a649-3d90e363ce4d openimage-http -> RUNNING, Ports [{5000 8080 0.0.0.0 0}], Known Sent: NONE
2018-02-16T07:27:30Z [INFO] Managed task [arn:aws:ecs:us-west-2:035804961478:task/c13ba3f3-6ac8-49c5-a649-3d90e363ce4d]: sent container change event [openimage-http]: arn:aws:ecs:us-west-2:035804961478:task/c13ba3f3-6ac8-49c5-a649-3d90e363ce4d openimage-http -> RUNNING, Ports [{5000 8080 0.0.0.0 0}], Known Sent: NONE
2018-02-16T07:27:30Z [INFO] Managed task [arn:aws:ecs:us-west-2:035804961478:task/c13ba3f3-6ac8-49c5-a649-3d90e363ce4d]: sending task change event [arn:aws:ecs:us-west-2:035804961478:task/c13ba3f3-6ac8-49c5-a649-3d90e363ce4d -> RUNNING, Known Sent: NONE, PullStartedAt: 2018-02-16 07:24:01.209507829 +0000 UTC m=+30.870498308, PullStoppedAt: 2018-02-16 07:27:29.191733309 +0000 UTC m=+238.852723707, ExecutionStoppedAt: 0001-01-01 00:00:00 +0000 UTC]
2018-02-16T07:27:30Z [INFO] TaskHandler: batching container event: arn:aws:ecs:us-west-2:035804961478:task/c13ba3f3-6ac8-49c5-a649-3d90e363ce4d openimage-http -> RUNNING, Ports [{5000 8080 0.0.0.0 0}], Known Sent: NONE
2018-02-16T07:27:30Z [INFO] TaskHandler: Adding event: TaskChange: [arn:aws:ecs:us-west-2:035804961478:task/c13ba3f3-6ac8-49c5-a649-3d90e363ce4d -> RUNNING, Known Sent: NONE, PullStartedAt: 2018-02-16 07:24:01.209507829 +0000 UTC m=+30.870498308, PullStoppedAt: 2018-02-16 07:27:29.191733309 +0000 UTC m=+238.852723707, ExecutionStoppedAt: 0001-01-01 00:00:00 +0000 UTC, arn:aws:ecs:us-west-2:035804961478:task/c13ba3f3-6ac8-49c5-a649-3d90e363ce4d openimage-http -> RUNNING, Ports [{5000 8080 0.0.0.0 0}], Known Sent: NONE] sent: false
2018-02-16T07:27:30Z [INFO] TaskHandler: Sending task change: TaskChange: [arn:aws:ecs:us-west-2:035804961478:task/c13ba3f3-6ac8-49c5-a649-3d90e363ce4d -> RUNNING, Known Sent: NONE, PullStartedAt: 2018-02-16 07:24:01.209507829 +0000 UTC m=+30.870498308, PullStoppedAt: 2018-02-16 07:27:29.191733309 +0000 UTC m=+238.852723707, ExecutionStoppedAt: 0001-01-01 00:00:00 +0000 UTC, arn:aws:ecs:us-west-2:035804961478:task/c13ba3f3-6ac8-49c5-a649-3d90e363ce4d openimage-http -> RUNNING, Ports [{5000 8080 0.0.0.0 0}], Known Sent: NONE] sent: false
2018-02-16T07:27:30Z [INFO] Managed task [arn:aws:ecs:us-west-2:035804961478:task/c13ba3f3-6ac8-49c5-a649-3d90e363ce4d]: sent task change event [arn:aws:ecs:us-west-2:035804961478:task/c13ba3f3-6ac8-49c5-a649-3d90e363ce4d -> RUNNING, Known Sent: NONE, PullStartedAt: 2018-02-16 07:24:01.209507829 +0000 UTC m=+30.870498308, PullStoppedAt: 2018-02-16 07:27:29.191733309 +0000 UTC m=+238.852723707, ExecutionStoppedAt: 0001-01-01 00:00:00 +0000 UTC]
2018-02-16T07:27:30Z [INFO] Managed task [arn:aws:ecs:us-west-2:035804961478:task/c13ba3f3-6ac8-49c5-a649-3d90e363ce4d]: redundant container state change. openimage-http to RUNNING, but already RUNNING
2018-02-16T07:27:30Z [WARN] Error publishing metrics: stats engine: no task metrics to report
I also deployed another simple docker test image on ecs, i ONLY changed the tag name (the tag of the other docker image in ECS) in the template, and it works perfectly. But the previous failed case, the docker works perfectly well on local as well in instance if i manually do docker run. I really don't know where else to look for the error.

Issue when joining serf nodes located in different Docker containers

Context: Host is AWS-EC2 / Ubuntu 14.04.5 with Docker version 17.05.0-ce. Containers are built from publicly available repo image cbhihe/serf-alpine-bash. All containers are located on the same EC2 instance and share the same default bridge network with net-interface "docker0".
Trying to join nodes serfDC1 (id d4fd90692e18) and serfDC2 (id 6353e7f6134d), by passing cmds from the host's shell:
$ docker exec serfDC1 serf agent -node=Node1 -bind=0.0.0.0:7946
==> Starting Serf agent…
==> Starting Serf agent RPC...
==> Serf agent running!
Node name: 'd4fd90692e18'
Bind addr: '0.0.0.0:7946'
RPC addr: '127.0.0.1:7373'
Encrypted: false
Snapshot: false
Profile: lan
==> Log data will now stream in as it occurs:
2017/06/04 00:01:10 [INFO] agent: Serf agent starting
2017/06/04 00:01:10 [INFO] serf: EventMemberJoin: d4fd90692e18 127.0.0.1
2017/06/04 00:01:11 [INFO] agent: Received event: member-join
^C
After discovering Node1's container's IP=172.17.0.4, I can issue the serf agent -join cmd to Node2:
$ docker exec serfDC2 serf agent -node=Node2 -join=172.17.0.4
==> Starting Serf agent...
==> Starting Serf agent RPC...
==> Serf agent running!
Node name: '6353e7f6134d'
Bind addr: '0.0.0.0:7946'
RPC addr: '127.0.0.1:7373'
Encrypted: false
Snapshot: false
Profile: lan
==> Joining cluster...(replay: false)
Join completed. Synced with 1 initial agents
==> Log data will now stream in as it occurs:
2017/06/04 00:18:35 [INFO] agent: Serf agent starting
2017/06/04 00:18:35 [INFO] serf: EventMemberJoin: 6353e7f6134d 127.0.0.1
2017/06/04 00:18:35 [INFO] agent: joining: [172.17.0.4] replay: false
2017/06/04 00:18:35 [INFO] serf: EventMemberJoin: d4fd90692e18 127.0.0.1
2017/06/04 00:18:35 [INFO] agent: joined: 1 nodes
2017/06/04 00:18:36 [WARN] memberlist: Got ping for unexpected node 'd4fd90692e18' from=127.0.0.1:7946
2017/06/04 00:18:36 [INFO] agent: Received event: member-join
2017/06/04 00:18:37 [WARN] memberlist: Got ping for unexpected node d4fd90692e18 from=127.0.0.1:34876
2017/06/04 00:18:37 [ERR] memberlist: Failed TCP fallback ping: EOF
2017/06/04 00:18:37 [INFO] memberlist: Suspect d4fd90692e18 has failed, no acks received
2017/06/04 00:18:38 [WARN] memberlist: Got ping for unexpected node 'd4fd90692e18' from=127.0.0.1:7946
2017/06/04 00:18:39 [WARN] memberlist: Got ping for unexpected node d4fd90692e18 from=127.0.0.1:34879
2017/06/04 00:18:39 [ERR] memberlist: Failed TCP fallback ping: EOF
2017/06/04 00:18:40 [INFO] memberlist: Suspect d4fd90692e18 has failed, no acks received
2017/06/04 00:18:41 [WARN] memberlist: Got ping for unexpected node 'd4fd90692e18' from=127.0.0.1:7946
2017/06/04 00:18:42 [WARN] memberlist: Got ping for unexpected node d4fd90692e18 from=127.0.0.1:34881
2017/06/04 00:18:42 [ERR] memberlist: Failed TCP fallback ping: EOF
2017/06/04 00:18:42 [INFO] memberlist: Marking d4fd90692e18 as failed, suspect timeout reached (0 peer confirmations)
2017/06/04 00:18:42 [INFO] serf: EventMemberFailed: d4fd90692e18 127.0.0.1
2017/06/04 00:18:43 [INFO] agent: Received event: member-failed
2017/06/04 00:18:44 [INFO] memberlist: Suspect d4fd90692e18 has failed, no acks received
2017/06/04 00:19:05 [INFO] serf: attempting reconnect to d4fd90692e18 127.0.0.1:7946
^C
Resulted in failure to join as shown by:
$ docker exec serfDC2 serf members
6353e7f6134d 127.0.0.1:7946 alive
d4fd90692e18 127.0.0.1:7946 failed
$ docker exec serfDC1 serf members
d4fd90692e18 127.0.0.1:7946 alive
6353e7f6134d 127.0.0.1:7946 failed
I have been at this for quite some time now and am at my wit's end as to where I should turn. Hashicorp's and Docker's documentation do not seem to cover this aspect of the initial handshake between two serf agents in different containers.
Could somebody show me where I took a wrong turn ? Any answer would be great, really. Tx.
Serf nodes need to 'announce' themselves with a routable address. In your case they are telling to each other: 'hi, I'm localhost:...', so each one tries to answer to localhost, which is something wrong because each container has its own localhost.
There is an option to configure the agent to use the eth0 ip to advertise to the others nodes in the network: -iface. Then you need to discard the -bind option. Those ports are default so there is no need to customize.
So, for the node1:
serf agent -node=Node1 -iface=eth0
And for the node2:
serf agent -node=Node2 -join=172.17.0.2 -iface=eth0
From docs:
-iface - This flag can be used to provide a binding interface. It can be used instead of -bind if the interface is known but not the address.
It's working properly for me:
Node1:
==> Log data will now stream in as it occurs:
2017/06/04 01:56:40 [INFO] agent: Serf agent starting
2017/06/04 01:56:40 [INFO] serf: EventMemberJoin: Node1 172.17.0.2
2017/06/04 01:56:41 [INFO] agent: Received event: member-join
2017/06/04 01:57:02 [INFO] serf: EventMemberJoin: Node2 172.17.0.3
2017/06/04 01:57:03 [INFO] agent: Received event: member-join
Node2:
==> Log data will now stream in as it occurs:
2017/06/04 01:57:02 [INFO] agent: Serf agent starting
2017/06/04 01:57:02 [INFO] serf: EventMemberJoin: Node2 172.17.0.3
2017/06/04 01:57:02 [INFO] agent: joining: [172.17.0.2] replay: false
2017/06/04 01:57:02 [INFO] serf: EventMemberJoin: Node1 172.17.0.2
2017/06/04 01:57:02 [INFO] agent: joined: 1 nodes
2017/06/04 01:57:03 [INFO] agent: Received event: member-join
Edit:
In the case that each container is in its own VM (EC2 instance), as each instance has its own docker network and not interconnected, you have to provide the EC2 instance IP and expose the corresponding ports. Use -advertise
-advertise - The advertise flag is used to change the address that we advertise to other nodes in the cluster.
Node1:
serf agent -node=Node1 -iface=eth0 -advertise=INSTANCE_IP
Node2:
serf agent -node=Node2 -join=NODE1_INSTANCE_IP -iface=eth0
And remember to expose the serf port in docker run
docker run -p 7946:7946 (...rest of the command...)

No nodes connecting to host in docker swarm

I just followed this tutorial step by step for setting up a docker swarm in EC2 -- https://docs.docker.com/swarm/install-manual/
I created 4 Amazon Servers using the Amazon Linux AMI.
manager + consul
manager
node1
node2
I followed the instructions to start the swarm and everything seems to go ok regarding making the docker instances.
Server 1
Running docker ps gives:
The Consul logs show this
2016/07/05 20:18:47 [INFO] serf: EventMemberJoin: 729a440e5d0d 172.17.0.2
2016/07/05 20:18:47 [INFO] serf: EventMemberJoin: 729a440e5d0d.dc1 172.17.0.2
2016/07/05 20:18:48 [INFO] raft: Node at 172.17.0.2:8300 [Follower] entering Follower state
2016/07/05 20:18:48 [INFO] consul: adding server 729a440e5d0d (Addr: 172.17.0.2:8300) (DC: dc1)
2016/07/05 20:18:48 [INFO] consul: adding server 729a440e5d0d.dc1 (Addr: 172.17.0.2:8300) (DC: dc1)
2016/07/05 20:18:48 [ERR] agent: failed to sync remote state: No cluster leader
2016/07/05 20:18:49 [WARN] raft: Heartbeat timeout reached, starting election
2016/07/05 20:18:49 [INFO] raft: Node at 172.17.0.2:8300 [Candidate] entering Candidate state
2016/07/05 20:18:49 [INFO] raft: Election won. Tally: 1
2016/07/05 20:18:49 [INFO] raft: Node at 172.17.0.2:8300 [Leader] entering Leader state
2016/07/05 20:18:49 [INFO] consul: cluster leadership acquired
2016/07/05 20:18:49 [INFO] consul: New leader elected: 729a440e5d0d
2016/07/05 20:18:49 [INFO] raft: Disabling EnableSingleNode (bootstrap)
2016/07/05 20:18:49 [INFO] consul: member '729a440e5d0d' joined, marking health alive
2016/07/05 20:18:50 [INFO] agent: Synced service 'consul'
I registered each node using the following command with appropriate IP's
docker run -d swarm join --advertise=x-x-x-x:2375 consul://x-x-x-x:8500
Each of those created a docker instance
Node1
Running docker ps gives:
With logs that suggest there's a problem:
time="2016-07-05T21:33:50Z" level=info msg="Registering on the discovery service every 1m0s..." addr="172.31.17.35:2375" discovery="consul://172.31.3.233:8500"
time="2016-07-05T21:36:20Z" level=error msg="cannot set or renew session for ttl, unable to operate on sessions"
time="2016-07-05T21:37:20Z" level=info msg="Registering on the discovery service every 1m0s..." addr="172.31.17.35:2375" discovery="consul://172.31.3.233:8500"
time="2016-07-05T21:39:50Z" level=error msg="cannot set or renew session for ttl, unable to operate on sessions"
time="2016-07-05T21:40:50Z" level=info msg="Registering on the discovery service every 1m0s..." addr="172.31.17.35:2375" discovery="consul://172.31.3.233:8500"
...
And lastly when I get to the last step of trying to get host information like so on my Consul machine,
docker -H :4000 info
I see no nodes. Lastly when I try the step of running an app, I get the obvious error:
[ec2-user#ip-172-31-3-233 ~]$ docker -H :4000 run hello-world
docker: Error response from daemon: No healthy node available in the cluster.
See 'docker run --help'.
[ec2-user#ip-172-31-3-233 ~]$
Thanks for any insight on this. I'm still pretty confused by much of the swarm model and not sure where to go from here to diagnose.
It looks like Consul is either not binding to a public IP address, or is not accessible on the public IP due to security group or VPC settings. You are setting the discovery URL to consul://172.31.3.233:8500 on the Docker nodes, so I would sugest trying to connect to that address from an external IP, either in your browser or via curl like this:
% curl http://172.31.3.233:8500/ui/dist/
HTML
If you cannot connect (connection refused or timeout) then add a TCP port 8500 ingress rule to your AWS VMs, and try again.
After investigating your issue, I see that you forgot open port 2375 for Docker Engine in all four nodes.
Before starting Swarm Manager or Swarm Node, you have to open a TCP Port for Docker Engine, so Swarm will work with Docker Engine via that Port.
With Docker on Ubuntu 14.04, you can open the port by change file /etc/default/docker and add -H tcp://0.0.0.0:2375 to DOCKER_OPTS. For example:
DOCKER_OPTS="-H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock"
After that, you restart Docker Engine
service docker restart
If you are using CentOS, the solution is same, you can read my blog article https://sonnguyen.ws/install-docker-docker-swarm-centos7/
And the other thing, I think that you should install and run Consul in all nodes (4 servers). So your Swarm can work with Consul on its node

Resources