I am trying to learn how to cluster RabbitMQ nodes, and I am following this tutorial as well as the official documentation.
I have 2 physical machines with RabbitMQ deployed on them through Docker. machine1 (192.168.1.2) is to seed the cluster, and machine2 (192.168.1.3) is to join it.
When I attempt to run rabbitmqctl join_cluster rabbit@192.168.1.2 from machine2, it fails with the following message:
Clustering node rabbit@node2.rabbit with rabbit@192.168.1.2
Error: unable to perform an operation on node 'rabbit@192.168.1.2'. Please see diagnostics information and suggestions below.
Most common reasons for this are:
* Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)
* CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)
* Target node is not running
In addition to the diagnostics info below:
* See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more
* Consult server logs on node rabbit@192.168.1.2
* If target node is configured to use long node names, don't forget to use --longnames with CLI tools
DIAGNOSTICS
===========
attempted to contact: ['rabbit@192.168.1.2']
rabbit@192.168.1.3:
* connected to epmd (port 4369) on 192.168.1.2
* epmd reports node 'rabbit' uses port 25672 for inter-node and CLI tool traffic
* TCP connection succeeded but Erlang distribution failed
* suggestion: check if the Erlang cookie is identical for all server nodes and CLI tools
* suggestion: check if all server nodes and CLI tools use consistent hostnames when addressing each other
* suggestion: check if inter-node connections may be configured to use TLS. If so, all nodes and CLI tools must do that
* suggestion: see the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more
Current node details:
* node name: 'rabbitmqcli-1352-rabbit@node2.rabbit'
* effective user's home directory: /var/lib/rabbitmq
* Erlang cookie hash: XXXXXXXXXXXXX
The error logs on machine1 show nothing related to such a connection attempt. I have verified the md5sum of the cookies on both docker containers and they are exactly the same. So are the permissions.
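For reference, the comparison was along these lines (container names rabbit1 and rabbit2 are placeholders, and the cookie path assumes the official image's default):

docker exec rabbit1 md5sum /var/lib/rabbitmq/.erlang.cookie
docker exec rabbit2 md5sum /var/lib/rabbitmq/.erlang.cookie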
I assumed that perhaps port 4369 wasn't reachable, but it is.
I am unsure what I am doing wrong. Can someone help here?
Additional information:
I am using the rabbitmq:3.8.5-management image. It uses Erlang/OTP 23 [erts-11.0.3].
I have been checking the troubleshooting guide, but I am unsure what seems wrong here. Please let me know if I can provide more information.
So thanks to @NeoAnderson and @José M, I was able to understand what happened.
The containers running RMQ need to be reachable across the network via the hostnames that Erlang uses internally. Since the containers' hostnames were not resolvable from a container on the other machine, clustering failed.
A simple fix is to edit the /etc/hosts file in the containers so that the "leader" node's hostname resolves to its IP.
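As a sketch of that fix, assuming container hostnames rabbit1 (on machine1) and rabbit2 (on machine2) and a shared cookie value (all placeholders), the joining side on machine2 would look something like:

docker run -d --name rabbit2 --hostname rabbit2 \
  --add-host rabbit1:192.168.1.2 \
  -e RABBITMQ_ERLANG_COOKIE='shared-secret-cookie' \
  -p 4369:4369 -p 5672:5672 -p 25672:25672 \
  rabbitmq:3.8.5-management

docker exec rabbit2 rabbitmqctl stop_app
docker exec rabbit2 rabbitmqctl join_cluster rabbit@rabbit1
docker exec rabbit2 rabbitmqctl start_app

The container on machine1 needs the mirror-image setup (--hostname rabbit1, --add-host rabbit2:192.168.1.3, the same cookie, the same published ports).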
I was only doing this to avoid installing RMQ directly, not because I thought it was the best approach. Alternatively, Docker Swarm or k8s would have provided the right networking for me.
But the root cause was definitely the nodename problem.
I am having difficulties deploying the official Neo4j Docker image (https://hub.docker.com/_/neo4j) to an OpenShift environment and accessing it from outside (from my local machine).
I have performed the following steps:
1. oc new-app neo4j
2. Created a route for port 7474.
3. Set the environment variable NEO4J_dbms_connector_bolt_listen__address to 0.0.0.0:7687, which is the equivalent of setting dbms.connector.bolt.listen_address=0.0.0.0:7687 in the neo4j.conf file.
4. Accessed the route URL from my local machine, which opens the Neo4j browser and asks for authentication. At this point I am blocked, because every combination of URLs I try is unsuccessful.
As a workaround I have managed to forward port 7687 to my local machine, install the Neo4j Desktop, and connect via bolt://localhost:7687, but this is not the ideal solution.
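For reference, the forwarding was along the lines of (the pod name is a placeholder):

oc port-forward neo4j-1-abcde 7687:7687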
Therefore there are two questions:
1. How can I connect from the Neo4j browser to its own database?
2. How can I connect from an external environment (through the OpenShift route) to the Neo4j DB?
I have no experience with OpenShift, but try adding the following config:
dbms.default_listen_address=0.0.0.0
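If you are configuring via environment variables as above, the Docker-image equivalent of that setting should be (following the image's convention of replacing dots with underscores and doubling underscores that are part of the setting name):

NEO4J_dbms_default__listen__address=0.0.0.0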
Is there any other way for you to connect to Neo4j, so that you could further inspect the issue?
Short answer:
Connecting to the DB is most likely a configuration issue; maybe Tomaž Bratanič's answer is the solution. As for accessing the DB from outside, you will most likely need a NodePort.
Long answer:
Note that OpenShift Routes are for HTTP/HTTPS traffic only, not for any other kind of traffic. Typically, the "routers" of an OpenShift cluster listen only on ports 80 and 443, so connecting to your database on any other port will most likely not work (although this heavily depends on your cluster configuration).
The solution for non-HTTP(S) traffic is to use NodePorts as described in the OpenShift documentation: https://docs.openshift.com/container-platform/3.11/dev_guide/expose_service/expose_internal_ip_nodeport.html
Note that even with NodePorts, you might need your cluster administrator to add additional ports to the load balancer, or you might need to connect to the OpenShift nodes directly. Refer to the documentation on how to use NodePorts.
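A minimal NodePort Service for the Bolt port could look like this (a sketch; the names and the nodePort value are assumptions, and the selector must match your Neo4j pod's labels):

apiVersion: v1
kind: Service
metadata:
  name: neo4j-bolt-nodeport
spec:
  type: NodePort
  selector:
    app: neo4j
  ports:
    - name: bolt
      port: 7687
      targetPort: 7687
      nodePort: 30687

Clients outside the cluster would then connect to bolt://<node-ip>:30687.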
After configuring bind-hostname and bind-port in application.conf, as specified by the Akka FAQ, and bringing up the cluster, I'm receiving an error:
[ERROR] [07/09/2015 19:54:24.132] [default-akka.remote.default-remote-dispatcher-20]
[akka.tcp://default@54.175.105.30:2552/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fdefault%4054.175.105.30%3A2552-757/endpointWriter]
dropping message [class akka.actor.ActorSelectionMessage]
for non-local recipient [Actor[akka.tcp://default@54.175.105.30:32810/]]
arriving at [akka.tcp://default@54.175.105.30:32810]
inbound addresses are [akka.tcp://default@54.175.105.30:2552]
What this seems to say is that the actor system has received a message destined for port 32810 (the external port) but it's dropping it because the internal port (2552) doesn't match.
The relevant portions of the file are:
hostname = 54.175.105.30
port = 32810
bind-hostname = 172.17.0.44
bind-port = 2552
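For context, these keys sit under akka.remote.netty.tcp, so the full block in application.conf looks roughly like this:

akka {
  remote {
    netty.tcp {
      hostname = "54.175.105.30"
      port = 32810
      bind-hostname = "172.17.0.44"
      bind-port = 2552
    }
  }
}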
I've tried this on 2.4-M1, 2.4-M2, and 2.4-SNAPSHOT, all with the same effect.
Has anyone else encountered this before? Any suggestions?
edit:
This actor system is running in ECS in Docker containers. The Docker container configuration is set to forward from the ephemeral range to 2552 on the container's private IP. ECS is successfully mapping hostname:port to bind-hostname:bind-port. The actor system is successfully running and binding to the local bind-hostname and bind-port, but it is dropping messages and emitting the error described above.
The bind-* configuration settings are meant to be used in situations where Akka nodes are started behind NAT (or in Docker containers). Have you configured address translation from hostname:port to bind-hostname:bind-port?
In your particular configuration, when you do
ctx.actorSelection("akka.tcp://default@54.175.105.30:32810/user/actor") ! "Hi"
then something at 54.175.105.30 should be listening on TCP port 32810 and forwarding to 172.17.0.44:2552. The actor system should be running with your provided configuration at 172.17.0.44:2552. Is this the case?
Also, you have to configure this for every node that is behind a NAT, because connections between actor systems are peer-to-peer.
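For illustration, the translation that ECS's port mapping performs is equivalent to a DNAT rule like this on the host (a sketch using the addresses from the question):

iptables -t nat -A PREROUTING -p tcp --dport 32810 -j DNAT --to-destination 172.17.0.44:2552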
This was due to a misconfiguration on my end. Some leftover boilerplate code was overriding the bind-port.
I'm trying to set up backups for a Neo4j cluster with 3 instances. Neo4j is embedded.
If I run:
./neo4j-backup -from ha://10.106.4.80:5001,10.106.4.203:5001,10.106.14.164:5001 -to /tmp/neobak2/
from a host outside the 10.106.4.0 network, I get this error:
Could not find backup server in cluster neo4j.ha at 10.106.4.80:5001,10.106.4.203:5001,10.106.14.164:5001, operation timed out.
If I run it from a cluster member it works just fine. Also, running the backup script with single:// instead of ha:// works fine from anywhere.
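For example, this variant works from any host, targeting a single instance directly (assuming the default online-backup port 6362):

./neo4j-backup -from single://10.106.4.80:6362 -to /tmp/neobak2/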
Below is the basic cluster config I'm using:
ha.server_id: 1
ha.initial_hosts: 10.106.4.80:5001,10.106.4.203:5001,10.106.14.164:5001
ha.tx_push_factor: 2
I already checked for firewall issues; there aren't any. The Neo4j version used is 1.9.5.
The webadmin interface shows that the cluster has online backup enabled and listening on the default port.
Any help will be appreciated.
According to RFC 5735, addresses in 10.0.0.0/8 are private, so I assume they're not routable from an external host.
We use Jenkins 1.504 on Windows.
We need to have Master and Slave in different sub-networks with firewall in between.
We can't have ANY to ANY port firewall rules, we must specify exact port numbers.
I know the port Master is listening on.
I also see that the Slave opens a connection to the Master from an arbitrary port dynamically assigned on every run, and the port on the Master side is also arbitrary.
I can fix the Master's port by specifying it in Manage Jenkins > Configure Global Security > TCP port for JNLP slave agents.
How do I fix the Slave's port?
UPDATE: Found Connection Mechanism described here: https://wiki.jenkins-ci.org/display/JENKINS/Jenkins+CLI#JenkinsCLI-Connectionmechanism
I think it might work for us, but it would still be better to have a fixed-to-fixed port connection.
We had a similar situation, but in our case Infosec agreed to allow ANY to one fixed port, so we didn't have to fix the slave port; fixing the master to the high JNLP port 49187 worked ("Configure Global Security" -> "TCP port for JNLP slave agents").
TCP:
49187 - fixed JNLP port
8080 - Jenkins HTTP port
Other ports needed to launch the slave as a Windows service:
TCP: 135, 139, 445
UDP: 137, 138
A slave isn't a server; it's a client-type application. Network clients (almost) never use a specific port. Instead, they ask the OS for a random free port. This works much better, since you usually run clients on many machines where the current configuration isn't known in advance. It prevents thousands of "client wouldn't start because the port is already in use" bug reports every day.
You need to tell the security department that the slave isn't a server but a client which connects to the server and you absolutely need to have a rule which says client:ANY -> server:FIXED. The client port number should be >= 1024 (ports 1 to 1023 need special permissions) but I'm not sure if you actually gain anything by adding a rule for this - if an attacker can open privileged ports, they basically already own the machine.
If they argue, then ask them why they don't require the same rule for all the web browsers which people use in your company.
I have a similar scenario, and had no problem connecting after setting the JNLP port as you describe and adding a single firewall rule allowing connections to the server on that port. Granted, it is a randomly selected client port going to a known server port (a host:ANY -> server:FIXED rule is needed).
From my reading of the source code, I don't see a way to set the local port to use when making the request from the slave. It's unfortunate, it would be a nice feature to have.
Alternatives:
Use a simple proxy on your client that listens on port N and forwards all data to the actual Jenkins server on the remote host, using a constant local source port; connect your slave to this local proxy instead of the real Jenkins server (see the sketch after this list).
Create a custom Jenkins slave build that allows an option to specify the local port to use.
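A rough sketch of the proxy option using socat (the hostname and ports are placeholders; note that pinning the source port limits the proxy to one connection at a time, which is enough for a single slave's JNLP channel):

socat TCP-LISTEN:8081,fork,reuseaddr TCP:jenkins.example.com:49187,bind=:50000

The slave would then be pointed at localhost:8081, and the firewall rule only needs to allow source port 50000 to the Jenkins JNLP port.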
Remember also that if you are using HTTPS via a self-signed certificate, you must alter the jenkins-slave.xml configuration file on the slave to specify the -noCertificateCheck option on the command line.
I'm trying to set up a Cassandra cluster as a test bed but got a JMX remote connection error. I seem to have found the answer to my error on the Cassandra FAQ page:
Nodetool says "Connection refused to host: 127.0.1.1" for any remote host. What gives?
Nodetool relies on JMX, which in turn relies on RMI, which in turn sets up its own listeners and connectors as needed on each end of the exchange. Normally all of this happens behind the scenes transparently, but incorrect name resolution for either the host connecting, or the one being connected to, can result in crossed wires and confusing exceptions.
If you are not using DNS, then make sure that your /etc/hosts files are accurate on both ends. If that fails try passing the -Djava.rmi.server.hostname=$IP option to the JVM at startup (where $IP is the address of the interface you can reach from the remote machine).
But can somebody help me with how to pass -Djava.rmi.server.hostname=$IP?
Or what should I add to the hosts file? I know that in hosts we normally add "IP alias" entries, but whose IP and alias?
I don't know much Java or Linux.
I'm currently working on Ubuntu 10.04 and Cassandra 0.7.4.
For JMX you need to enable JMX-remoting:
java -Dcom.sun.management.jmxremote
Depending on where you want to access the JMX server from, you also need to specify a port:
-Dcom.sun.management.jmxremote.port=12345
and either set up or disable password authentication.
Have a look at http://download.oracle.com/javase/1.5.0/docs/guide/management/agent.html for more details.
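For Cassandra specifically, these flags are typically added via JVM_OPTS in conf/cassandra-env.sh (a sketch; the IP is a placeholder, and on 0.7.x the default JMX port is 8080 rather than the later 7199):

JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=false"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=false"
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=192.168.1.10"

As for the hosts-file route, each node's /etc/hosts should map the node's own hostname to an address reachable from the other machines rather than 127.0.1.1, for example:

192.168.1.10 cassandra-node1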