Hyperledger Sawtooth with Docker (test network tutorial): connectivity problem between the nodes of the network

I am trying to set up a sawtooth network like in the following tutorial.
I use the following docker-compose.yaml file, as instructed in the tutorial, to create a sawtooth network of 5 nodes using the PBFT consensus engine.
The problem is that once I try to check whether peering has occurred on the network by submitting a peers query to the REST API on the first node from the shell container, I get a connection refused answer:
curl: (7) Failed to connect to sawtooth-rest-api-default-0 port 8008: Connection refused
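(For reference, the tutorial's peering check is a plain GET against the REST API's /peers endpoint, something along these lines:
curl http://sawtooth-rest-api-default-0:8008/peers
)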
Connectivity among the containers seems to be working fine (I have checked with ping from inside the containers).
I suspect that the problem stems from the following line of the docker-compose.yaml file:
sawtooth-validator -vv \
--endpoint tcp://validator-0:8800 \
--bind component:tcp://eth0:4004 \
--bind consensus:tcp://eth0:5050 \
--bind network:tcp://eth0:8800 \
--scheduler parallel \
--peering static \
--maximum-peer-connectivity 10000
and more specifically the --bind option. I noticed that eth0 is not resolved properly to the IP of the container network, but instead to the loopback:
(screenshot: terminal output for validator 0)
Do you believe that this could be the problem or is there something else I might have overlooked?
Thank you.

Looks like the moment I post something here, the answer magically reveals itself.
The backslash characters were not interpreted correctly, so the --bind options were not taken into account and the default (the loopback) was used instead.
What I did to fix it was either put the whole command on one line or use a double backslash.
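For illustration, the single-line variant of the validator command in docker-compose.yaml would look roughly like this (same values as in the tutorial's compose file):
sawtooth-validator -vv --endpoint tcp://validator-0:8800 --bind component:tcp://eth0:4004 --bind consensus:tcp://eth0:5050 --bind network:tcp://eth0:8800 --scheduler parallel --peering static --maximum-peer-connectivity 10000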

Related

Docker Container Refuses to NOT use Proxy for Docker Network

I'm having issues trying to get networking to work correctly in my container inside a corp domain/behind a proxy.
I've correctly configured (I think) Docker to get around the proxy for downloading images, but now my container is having trouble talking to another container inside the same docker-compose network.
So far, the only resolution is to manually append the docker-compose network to the no_proxy variable in the docker config, but this seems wrong: it would need to be configured for each docker-compose network and requires a restart of docker.
Here is how I configured the docker proxy settings on host:
cat << "EOF" >docker_proxy_setup.sh
#!/bin/bash
#Proxy
#ActiveProxyVar=127.0.0.1:80
#Domain
corpdom=domain.org
httpproxyvar=http://$ActiveProxyVar/
httpsproxyvar=http://$ActiveProxyVar/
mkdir ~/.docker
cat << EOL >~/.docker/config.json
{
"proxies":
{
"default":
{
"httpProxy": "$httpproxyvar",
"httpsProxy": "$httpsproxyvar",
"noProxy": ".$corpdom,127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
}
}
}
EOL
mkdir -p /etc/systemd/system/docker.service.d
cat << EOL >/etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=$httpproxyvar"
Environment="HTTPS_PROXY=$httpsproxyvar"
Environment="NO_PROXY=.$corpdom,127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
EOL
systemctl daemon-reload
systemctl restart docker
#systemctl show --property Environment docker
docker run hello-world
EOF
chmod +x docker_proxy_setup.sh
./docker_proxy_setup.sh
and basically if I change to this:
#Domain
corpdom=domain.org,icinga_icinga-net
I am able to use curl to test the network and it works correctly, but ONLY when using container_name.icinga_icinga-net
Eg:
This fails: curl -k -u root:c54854140704eafc https://icinga2-api:5665/v1/objects/hosts
While this succeeds: curl -k -u root:c54854140704eafc https://icinga2-api.icinga_icinga-net:5665/v1/objects/hosts
Note that using curl --noproxy seems to have no effect.
Here is some output from the container for reference. Any ideas what I can do to have containers NOT use the proxy for Docker networks (private IPv4)?
root@icinga2-web:/# ping icinga2-api
PING icinga2-api (172.30.0.5) 56(84) bytes of data.
64 bytes from icinga2-api.icinga_icinga-net (172.30.0.5): icmp_seq=1 ttl=64 time=0.138 ms
64 bytes from icinga2-api.icinga_icinga-net (172.30.0.5): icmp_seq=2 ttl=64 time=0.077 ms
^C
--- icinga2-api ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1025ms
rtt min/avg/max/mdev = 0.077/0.107/0.138/0.030 ms
root@icinga2-web:/# curl --noproxy -k -u root:c54854140704eafc https://172.30.0.5:5665/v1/objects/hosts
curl: (56) Received HTTP code 503 from proxy after CONNECT
root@icinga2-web:/# curl -k -u root:c54854140704eafc https://172.30.0.5:5665/v1/objects/hosts
curl: (56) Received HTTP code 503 from proxy after CONNECT
root@icinga2-web:/# curl -k -u root:c54854140704eafc https://icinga2-api:5665/v1/objects/hosts
curl: (56) Received HTTP code 503 from proxy after CONNECT
root@icinga2-web:/# curl -k -u root:c54854140704eafc https://icinga2-api.icinga_icinga-net:5665/v1/objects/hosts
{"results":[{"attrs":{"__name":"icinga2-api","acknowledgement":0,"acknowledgement_expiry":0,"acknowledgement_last_change":0,"action_url":"","active":true,"address":"127.0.0.1","address6":"::1","check_attempt":1,"check_command":"hostalive","check_interval":60,"check_period":"","check_timeout":null,"command_endpoint":"","display_name":"icinga2-api","downtime_depth":0,"enable_active_checks":true,"enable_event_handler":true,"enable_flapping":false,"enable_notifications":true,"enable_passive_checks":true,"enable_perfdata":true,"event_command":"","executions":null,"flapping":false,"flapping_current":0,"flapping_ignore_states":null,"flapping_last_change":0,"flapping_threshold":0,"flapping_threshold_high":30,"flapping_threshold_low":25,"force_next_check":false,"force_next_notification":false,"groups":["linux-servers"],"ha_mode":0,"handled":false,"icon_image":"","icon_image_alt":"","last_check":1663091644.161905,"last_check_result":{"active":true,"check_source":"icinga2-api","command":["/usr/lib/nagios/plugins/check_ping","-H","127.0.0.1","-c","5000,100%","-w","3000,80%"],"execution_end":1663091644.161787,"execution_start":1663091640.088944,"exit_status":0,"output":"PING OK - Packet loss = 0%, RTA = 0.05 ms","performance_data":["rta=0.055000ms;3000.000000;5000.000000;0.000000","pl=0%;80;100;0"],"previous_hard_state":99,"schedule_end":1663091644.161905,"schedule_start":1663091640.087908,"scheduling_source":"icinga2-api","state":0,"ttl":0,"type":"CheckResult","vars_after":{"attempt":1,"reachable":true,"state":0,"state_type":1},"vars_before":{"attempt":1,"reachable":true,"state":0,"state_type":1}},"last_hard_state":0,"last_hard_state_change":1663028345.921676,"last_reachable":true,"last_state":0,"last_state_change":1663028345.921676,"last_state_down":0,"last_state_type":1,"last_state_unreachable":0,"last_state_up":1663091644.161787,"max_check_attempts":3,"name":"icinga2-api","next_check":1663091703.191943,"next_update":1663091771.339701,"notes":"","notes_url":"","original_attributes":null,"package":"_etc","paused":false,"previous_state_change":1663028345.921676,"problem":false,"retry_interval":30,"severity":0,"source_location":{"first_column":1,"first_line":18,"last_column":20,"last_line":18,"path":"/etc/icinga2/conf.d/hosts.conf"},"state":0,"state_type":1,"templates":["icinga2-api","generic-host"],"type":"Host","vars":{"disks":{"disk":{},"disk /":{"disk_partitions":"/"}},"http_vhosts":{"http":{"http_uri":"/"}},"notification":{"mail":{"groups":["icingaadmins"]}},"os":"Linux"},"version":0,"volatile":false,"zone":""},"joins":{},"meta":{},"name":"icinga2-api","type":"Host"}]}
root@icinga2-web:/#
PS: I'm fairly certain this is not an issue specific to icinga, as I've had some random proxy issues with other containers. But I can say I've tested this icinga compose setup outside the corp domain and it worked 100% fine.
Partial Resolution!
I would still prefer to use CIDR to have no_proxy work via container name, without having to adjust docker-compose/.env, but I got it to work.
A few things I did:
Added lowercase variants to the docker service config:
cat << EOL >/etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=$httpproxyvar"
Environment="HTTPS_PROXY=$httpsproxyvar"
Environment="NO_PROXY=.$corpdom,127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
Environment="http_proxy=$httpproxyvar"
Environment="https_proy=$httpsproxyvar"
Environment="no_proxy=.$corpdom,127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
EOL
Added no_proxy, in caps and lowercase, to the docker-compose containers and set it in .env.
Note: both lowercase and CAPS should be used.
environment:
  - 'NO_PROXY=${NO_PROXY}'
  - 'no_proxy=${NO_PROXY}'
and in .env:
NO_PROXY=.domain.org,127.0.0.0/8,172.16.0.0/12,icinga_icinga-net
I would prefer to at least append to the existing variable, but I tried the following and it left the variable as no_proxy=,icinga_icinga-net:
NO_PROXY=$NO_PROXY,icinga_icinga-net
NO_PROXY=${NO_PROXY},icinga_icinga-net
Note: NO_PROXY was set on host via export
I still don't understand why it fails when using:
curl --noproxy -k -u root:c54854140704eafc https://172.30.0.4:5665/v1/objects/hosts
when I have 172.16.0.0/12 in no_proxy, which should cover 172.16.0.0 – 172.31.255.255, but it doesn't work.
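One likely reason the --noproxy attempts had no effect: curl's --noproxy flag expects a comma-separated host list (or '*') as its argument, so in the command above -k ends up being consumed as that list. Something like this bypasses the proxy explicitly (same credentials as above):
curl --noproxy '*' -k -u root:c54854140704eafc https://172.30.0.4:5665/v1/objects/hosts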
Update:
I tried setting no_proxy to the IP explicitly (no CIDR) and that worked, but it still failed with just the container name as host (no .icinga-net suffix).
This is all related to this great post:
https://about.gitlab.com/blog/2021/01/27/we-need-to-talk-no-proxy/
This is the best I can come up with, happy to reward better answers!
Docker Setup (Global):
#!/bin/bash
#Proxy
ActiveProxyVar=127.0.0.7
#Domain
corpdom=domain.org
#NoProxy
NOT_PROXY=127.0.0.0/8,172.16.0.0/12,192.168.0.0/16,10.0.0.0/8,.$corpdom
httpproxyvar=http://$ActiveProxyVar/
httpsproxyvar=http://$ActiveProxyVar/
mkdir ~/.docker
cat << EOL >~/.docker/config.json
{
"proxies":
{
"default":
{
"httpProxy": "$httpproxyvar",
"httpsProxy": "$httpsproxyvar",
"noProxy": "$NOT_PROXY"
}
}
}
EOL
mkdir -p /etc/systemd/system/docker.service.d
cat << EOL >/etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=$httpproxyvar"
Environment="HTTPS_PROXY=$httpsproxyvar"
Environment="NO_PROXY=$NOT_PROXY"
Environment="http_proxy=$httpproxyvar"
Environment="https_proy=$httpsproxyvar"
Environment="no_proxy=$NOT_PROXY"
EOL
systemctl daemon-reload
systemctl restart docker
#systemctl show --property Environment docker
#docker run hello-world
docker-compose.yaml:
environment:
  - 'NO_PROXY=${NO_PROXY}'
  - 'no_proxy=${NO_PROXY}'
.env:
# Basically, add the docker-compose network, then each container name...
NO_PROXY=127.0.0.0/8,172.16.0.0/12,192.168.0.0/16,10.0.0.0/8,.icinga_icinga-net,icinga2-api,icinga2-web,icinga2-db,icinga2-webdb,icinga2-redis,icinga2-directordb,icinga2-icingadb,icinga2-web_director

curl fails when run inside a script

Trying to communicate with a running docker container by running a simple curl:
curl -v -s -X POST http://localhost:4873/_session -d \'name=some\&password=thing\'
Which works fine from any shell (login/interactive), but miserably fails when doing it in a script:
temp=$(curl -v -s -X POST http://localhost:4873/_session -d \'name=some\&password=thing\')
echo $temp
With error output suggesting a connection reset:
* Trying 127.0.0.1:4873...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 4873 (#0)
> POST /_session HTTP/1.1
> Host: localhost:4873
> User-Agent: curl/7.68.0
> Accept: */*
> Content-Length: 29
> Content-Type: application/x-www-form-urlencoded
>
} [29 bytes data]
* upload completely sent off: 29 out of 29 bytes
* Recv failure: Connection reset by peer <-- this! why?
* Closing connection 0
I'm lost and any hint is appreciated.
PS: tried without subshell and same happens so it's something with the script or the way it's executed.
Edit 1
Added the docker compose file. I don't see why a regular shell works but the script does not. Note that the script is not run inside docker; it is also run from the host.
version: "2.1"
services:
verdaccio:
image: verdaccio/verdaccio:4
container_name: verdaccio-docker-local-storage-vol
ports:
- "4873:4873"
volumes:
- "./storage:/verdaccio/storage"
- "./conf:/verdaccio/conf"
volumes:
verdaccio:
driver: local
Edit 2
So doing temp=$(curl -v -s http://www.google.com) works fine in the script. It's some kind of networking issue, but I still haven't managed to figure out why.
Edit 3
Lots of people suggested reformatting the payload data, but even without a payload the same error is thrown. Also note I'm on Linux, so I'm not sure whether any permissions could play a role here.
If you are using a bash script, can you update it with the change below and try running it again?
address="http://127.0.0.1:4873/_session"
cred="{\"name\":\"some\", \"password\":\"thing\"}"
temp=$(curl -v -s -X POST "$address" -d "$cred")
echo "$temp"
I suspect the issue is within the script and not with docker.
If you run your container in the default mode, the docker daemon will place it on a separate bridge network, so 'localhost' on your host machine and 'localhost' inside your container are different.
If you want to see the host machine's ports from your container, try running it with the --network="host" flag (a detailed description can be found here).
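A minimal sketch of that flag with the image from the compose file above (for a quick test only; with host networking the ports: mapping is not used and the service listens directly on the host's port 4873):
docker run --rm --network="host" verdaccio/verdaccio:4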

In Jupyter docker, cannot connect to kernel

When installing Jupyter docker, for example this one:
docker run -d \
--hostname jupyterhub-ds \
--log-opt max-size=50m \
-p 8000:8000 \
-p 5006:5006 \
-e DOCKER_USER=$(id -un) \
-e DOCKER_USER_ID=$(id -u) \
-e DOCKER_PASSWORD=$(id -un) \
-e DOCKER_GROUP_ID=$(id -g) \
-e DOCKER_ADMIN_USER=$(id -un) \
-v "$(pwd)":/workdir \
-v "$(dirname $HOME)":/home_host \
dclong/jupyterhub-ds /scripts/sys/init.sh
JupyterLab starts well and I can enter the lab through the URL and port.
However, it is not possible to connect to the internal python kernel (the connection hangs).
What kind of security am I facing?
Is this related to socket communication security?
After investigation, I have these messages:
[D 16:01:39.488 NotebookApp] Starting kernel: ['/usr/local/bin/python', '-m', 'ipykernel_launcher', '-f', '/root/.local/share/jupyter/runtime/kernel-f0420fbf-12e918f-20df7d3e804a.json']
[D 16:01:39.491 NotebookApp] Connecting to: tcp://127.0.0.1:51775
[D 16:01:39.491 NotebookApp] Connecting to: tcp://127.0.0.1:38609
[I 16:01:39.492 NotebookApp] Kernel started: f0420fbf-12ef-403e-918f-20df7d3e804a
[D 16:01:39.492 NotebookApp] Kernel args: {'kernel_name': 'python3', 'cwd': '/'}
[D 16:01:39.493 NotebookApp] Clearing buffer for 5e93046f-aa3e-4edd-a018-66b9d4c752e5
[I 16:01:39.493 NotebookApp] Kernel shutdown: 5e93046f-aa3e-4edd-a018-66b9d4c752e5
It seems linked to this one:
https://jupyter-notebook.readthedocs.io/en/stable/public_server.html
Firewall Setup
To function correctly, the firewall on the computer running the jupyter notebook server must be configured to allow connections from client machines
on the access port c.NotebookApp.port set in jupyter_notebook_config.py to allow connections to the web interface.
The firewall must also allow connections from 127.0.0.1 (localhost) on ports from 49152 to 65535. These ports are used by the server to communicate with the notebook kernels.
The kernel communication ports are chosen randomly by ZeroMQ,
and may require multiple connections per kernel, so a large range of ports must be accessible.
I'm not sure how you built the docker command, or why you chose that particular Docker image dclong/jupyterhub.
If it is designed to run jupyterhub (multi-user), then it doesn't sound like what you need when you're just trying to run your own Jupyter server in docker, for yourself.
I would suggest using something like jupyter/scipy-notebook instead, which is designed to run just one Jupyter server.
Otherwise, please describe what you actually want to get running, or why you believe you need to use that image etc.
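For example, a single-user server can be started with something like this (the work-directory mount is just an illustration; the image serves on port 8888 by default):
docker run -p 8888:8888 -v "$(pwd)":/home/jovyan/work jupyter/scipy-notebook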

Testing minimal docker containers with healthcheck

I have 5 containers running one after another. The first 3 (ABC) are very minimal. The ABC containers need to be health checked, but curl and wget cannot be run on them, so currently I just run test: ["CMD-SHELL", "whoami || exit 1"] in docker-compose.yml, which seems to bring them to a healthy state. The other 2 (DE), which depend on ABC being healthy, are checked using a test: ["CMD-SHELL", "curl --fail http://localhost"] command. My question is: how can I properly check the health of those minimal containers without using curl, wget, etc.?
If you can live with a TCP connection test to your internal service's port, you could use /dev/tcp for this:
HEALTHCHECK CMD bash -c 'echo -n > /dev/tcp/127.0.0.1/<port>'
Like this:
# PASS (webserver is serving on 8099)
root@ab7470ea0c8b:/app# echo -n > /dev/tcp/127.0.0.1/8099
root@ab7470ea0c8b:/app# echo $?
0
# FAIL (webserver is NOT serving on 9000)
root@ab7470ea0c8b:/app# echo -n > /dev/tcp/127.0.0.1/9000
bash: connect: Connection refused
bash: /dev/tcp/127.0.0.1/9000: Connection refused
root@ab7470ea0c8b:/app# echo $?
1
Unfortunately, I think this is the best that can be done without installing curl or wget.
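In docker-compose terms, that check might look like the following sketch (the port is illustrative, and it assumes the image ships bash, since /dev/tcp is a bash feature rather than a real device):
healthcheck:
  test: ["CMD-SHELL", "bash -c 'echo -n > /dev/tcp/127.0.0.1/8099'"]
  interval: 30s
  timeout: 5s
  retries: 3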

Distributed RabbitMQ Nodes don't recognize each other

I'm working on a RabbitMQ distributed POC and I'm stuck at the basics of clustering the nodes.
I'm trying to follow RabbitMQ's tutorial on clustering, so this is my reference.
After installing erlang (R14B04) and rabbit (2.8.2-1) I've copied the .erlang.cookie file contents from one node to the other two.
I wasn't sure how to get erlang to notice this change, so I had to restart the machines themselves (pretty brute force, but I don't know erlang at all).
In addition I opened port 4369 and 5 additional ports for communication in iptables, and placed the following config under /usr/lib64/erlang/bin/sys.config:
[{kernel,[{inet_dist_listen_min, XX00},{inet_dist_listen_max, XX05}]}].
Then another restart (dumb, I know) to verify erlang takes these into consideration, but still when I run:
rabbitmqctl cluster rabbit@HostName1
I get:
Clustering node rabbit@HostName2 with [rabbit@HostName1] ...
Error: {no_running_cluster_nodes,[rabbit@HostName1],
[rabbit@HostName1]}
There is a chance my fiddling with the erlang.cookie or with the ports did not succeed but I don't know how to check them. I tried typing erl in the cmd and then erl_epmd:names() or other commands to get more information but I'm probably way off in erlang land.
Would truly appreciate any help
Update:
I tried pinging two erlang nodes manually and got pang back.
I did the following:
Connected to two nodes, stopped rabbitmq (wasn't sure if needed, but just to be sure), started erlang like so (erl -sname dilbert and erl -sname dilbert2). When the erlang command line started I ran node(). on each of them and got dilbert@HostName1 and dilbert2@HostName2 respectively. I then tried to run net_adm:ping('dilbert'). and net_adm:ping('dilbert@HostName1'). with the single quotes and without them from both nodes (changed names of course) and got pang in all 8 cases.
When I ran nodes(). on one of the machines I got back an empty array.
I've also tried to allow all traffic in the firewall (script) and then try to run the above commands (don't worry they're back on now) and still got back pang.
Update2:
For some reason I had a cookie mismatch which I needed to resolve (thanks @kjw0188 for the suggestion [I ran erlang:get_cookie(). in the erlang command line]).
This did not help, and I needed to stop iptables completely (not sure why, but I'll figure it out soon) and load the erlang node with -name dilbert@my-ip, because my rackspace servers have no dns-name. This finally enabled me to get a pong and see the nodes see each other (nodes(). returns a non-empty array after the ping).
The problem I'm facing now is how to instruct RabbitMQ to use -name instead of -sname when starting erlang.
So I had multiple issues with connecting my two RabbitMQ nodes:
I'll add that my nodes are hosted on rackspace, and so don't have a default exposable hostname, and require iptables since there is no DMZ or built-in security group concept like Amazon's.
Problems:
1. Cookie - Not sure how or why, but I had multiple instances of .erlang.cookie (in /root, in my home directory and in /var/lib/rabbitmq/). I kept only the one in rabbitmq and verified all nodes have the same cookie.
2. IPTables - In order for the nodes to communicate I needed to open the epmd port and the range of ports for the actual communication (inet_dist_listen_min to inet_dist_listen_max).
/sbin/iptables -A INPUT -i eth1 -p tcp --dport ${epmd} -s ${otherNode} -j ACCEPT
/sbin/iptables -A INPUT -i eth1 -p tcp --dport ${inet_dist_listen_min}:${inet_dist_listen_max} -s ${otherNode} -j ACCEPT
epmd is the usual 4369 port, and for the other range use whatever range you want.
${otherNode} is the IP of my other node.
I also needed to configure erlang through rabbitmq to use these ports (see config file at end)
3. HostName - Seeing as I don't have a hostname, I needed to edit the rabbit scripts to use -name and not -sname (the former tells erlang to take the whole name; the latter stands for short name and thus appends an @ symbol and the hostname).
This was accomplished by editing:
/usr/lib/rabbitmq/bin/rabbitmqctl
Added at the beginning the definition of the RABBITMQ_NODE_IP_ADDRESS property
DEFAULT_NODE_IP_ADDRESS=auto
DEFAULT_NODE_PORT=5672
[ "x" = "x$RABBITMQ_NODE_IP_ADDRESS" ] && RABBITMQ_NODE_IP_ADDRESS=${NODE_IP_ADDRESS}
[ "x" = "x$RABBITMQ_NODE_PORT" ] && RABBITMQ_NODE_PORT=${NODE_PORT}
[ "x" = "x$RABBITMQ_NODE_IP_ADDRESS" ] && [ "x" != "x$RABBITMQ_NODE_PORT" ] && RABBITMQ_NODE_IP_ADDRESS=${DEFAULT_NODE_IP_ADDRESS}
[ "x" != "x$RABBITMQ_NODE_IP_ADDRESS" ] && [ "x" = "x$RABBITMQ_NODE_PORT" ] && RABBITMQ_NODE_PORT=${DEFAULT_NODE_PORT}
and in the actual erl command I changed
-sname ${RABBITMQ_NODENAME} \ to
-name ${RABBITMQ_NODENAME}@${RABBITMQ_NODE_IP_ADDRESS}\.
This made rabbitmq listen only on the specified IP address (specified in the config file at the end) and load with that IP instead of the usual hostname.
edited /usr/lib/rabbitmq/bin/rabbitmq-server
Changed the actual erl command from -sname ${RABBITMQ_NODENAME} \ to -name ${RABBITMQ_NODENAME}@${RABBITMQ_NODE_IP_ADDRESS}\
Added a rabbit conf (/etc/rabbitmq/rabbitmq-env.conf) file with-
#the ip address which rabbit should use, this is to limit rabbit to only use internal rackspace communication and not publicly accessible ports
NODE_IP_ADDRESS=myIpAdress
#had to change the nodename because otherwise rabbitmq used rabbit@Hostname and not only rabbit
NODENAME=myCompany
#This instructed rabbit to instruct erlang which ports it should use for its communications with other nodes
export SERVER_ERL_ARGS="$SERVER_ERL_ARGS -kernel inet_dist_listen_min somePort -kernel inet_dist_listen_max someOtherBiggerPort"
Some resources which helped me along the way:
RabbitMQ Clustering Guide
Clustering RabbitMQ servers for High Availability
rabbitmq-env.conf(5) manual page
Node communication by public IP address erlang mailing list (The middle post)
Configuring RabbitMQ Cluster on Cloud
Hope this will help anyone else.
EDIT:
Not sure how I was mistaken, but it seemed my erlang-rabbit port instructions were not taken into consideration or were not enough. I ended up having to allow all communications between the two nodes...
One thing to really watch out for is whitespace of any kind in the erlang cookie file, especially line breaks AFTER the contents of the cookie. As long as both are identical, things are okay, but when one has a line break and the other doesn't, things won't work.
Background: I was facing the same issue while setting up a RabbitMQ cluster. I was using 2 docker containers running on my host machine, which is equivalent to 2 separate nodes, and I could not create a cluster of the two.
Solution:
1. Make sure you have the same erlang cookie on all your cluster nodes; the default location is /var/lib/rabbitmq/.erlang.cookie. This file is used for authentication, so make sure it is the same on all the nodes. After changing the .erlang.cookie, restart your rabbitmq service.
2. Make sure that the nodes are accessible from one another; use ping or telnet to check the connection.
3. Check that /etc/hosts has the correct entries; for example, if rabbit2 wants to join cluster rabbit1, /etc/hosts of rabbit2 should contain:
172.68.1.6 rabbit1
172.68.1.7 rabbit2
Now stop the service using rabbitmqctl stop_app, then run rabbitmqctl join_cluster rabbit@rabbit1, start your service with rabbitmqctl start_app, and check rabbitmqctl cluster_status to see whether you have joined the cluster or not.
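As a command sequence on the joining node (rabbit2 in this example):
rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@rabbit1
rabbitmqctl start_app
rabbitmqctl cluster_status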
I followed the rabbitmq official documentation to setup the cluster.
to change RabbitMQ sname/name behaviour you can edit the scripts:
rabbitmq-multi
rabbitmq-server
rabbitmqctl
Example
In script rabbitmqctl there is the following piece of code:
exec erl \
-pa "${RABBITMQ_HOME}/ebin" \
-noinput \
-hidden \
${RABBITMQ_CTL_ERL_ARGS} \
-sname rabbitmqctl$$ \
-s rabbit_control \
-nodename $RABBITMQ_NODENAME \
-extra "$#"
You have to change it to:
exec erl \
-pa "${RABBITMQ_HOME}/ebin" \
-noinput \
-hidden \
${RABBITMQ_CTL_ERL_ARGS} \
-name rabbitmqctl$$ \
-s rabbit_control \
-nodename $RABBITMQ_NODENAME \
-extra "$#"
http://pearlin.info/?p=1672
so you need to copy the cookie from the node you are trying to connect to.
example: rabbit@node1
rabbit@node2
go to rabbit@node1 and copy the cookie from cat /var/lib/rabbitmq/.erlang.cookie
go to rabbit@node2, remove the current cookie and paste the new one.
on the same node:
/usr/sbin/rabbitmqctl stop_app
/usr/sbin/rabbitmqctl reset
/usr/sbin/rabbitmqctl cluster rabbit@node1
should do it.
The same is documented here:
http://pearlin.info/?p=1672
