I have been trying to set up geo-replication with GlusterFS servers. Everything worked as expected in my test environment and on my staging environment, but when I tried production I got stuck.
Let's say I have:
The GlusterFS master server on public IP 1.1.1.1.
The GlusterFS slave on public IP 2.2.2.2, but this IP is on interface eth1.
eth0 on the GlusterFS slave server is 192.168.0.1.
So when I run the command on 1.1.1.1 (firewall and SSH keys are set up properly):
gluster volume geo-replication vol0 2.2.2.2::vol0 create push-pem
I get this error:
Unable to fetch slave volume details. Please check the slave cluster and slave volume.
geo-replication command failed
The error itself is not that important in this case; the problem is the slave IP address:
2015-03-16T11:41:08.101229+00:00 xxx kernel: TCP LOGDROP: IN= OUT=eth0 SRC=1.1.1.1 DST=192.168.0.1 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=24243 DF PROTO=TCP SPT=1015 DPT=24007 WINDOW=14600 RES=0x00 SYN URGP=0
As you can see in the firewall drop log above, port 24007 of the slave gluster daemon is advertised on the private IP of interface eth0 on the slave server, when it should be the IP on eth1. So the master cannot connect and times out.
Is there a way to force the gluster server to advertise interface eth1, or to bind only to it?
I use CFEngine and Ansible to push configuration, so binding to an interface would be a better solution than binding to an IP, but any working solution will do.
Thank you in advance.
I've encountered this issue but in a different context.
I was trying to geo-replicate two nodes which were both behind a NAT (AWS instances in different regions).
When the master connects to the slave via the public IP to check volume compatibility/size and other details, it retrieves the hostname of the slave, which usually resolves to something that only has meaning in that remote region.
Then it uses that hostname to dial back to the slave when later setting up the session, which fails, as that hostname resolves to a private IP in a different region.
My workaround was to use hostnames when creating the volumes, probing for peers, and establishing geo-replication, and then add an /etc/hosts entry mapping the slave's hostname (which normally resolves to its private IP) to its public IP instead.
This gets you to the point where you can establish a session, but I haven't had any luck actually getting it to sync, as it uses the wrong IP somewhere along the way again.
Edit:
I've actually managed to get it running by adding /etc/hosts hacks on both sides.
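For illustration, a minimal sketch of that /etc/hosts hack, using the question's example addresses and hypothetical hostnames gluster-master and gluster-slave:
# /etc/hosts on the master: map the slave's hostname to its public IP
2.2.2.2    gluster-slave
# /etc/hosts on the slave: map the master's hostname to its public IP
1.1.1.1    gluster-master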
GlusterFS has no notion of the network layer. Check your routes. If the next-hop for your geo-replication slave is on eth1, then gluster will open a port on that interface for the slave IP address.
Also make sure your firewall is configured to forward geo-replication traffic on this port.
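A quick way to see which interface and source address the kernel will pick for the slave's IP (a sketch using the example address from the question):
ip route get 2.2.2.2
# the output shows the selected device and source address, e.g. "... dev eth1 src ..."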
Related
I am trying to configure a K8s cluster on-prem and the servers are running Fedora CoreOS using multiple NICs.
I am configuring the cluster to use a non-default NIC: a bond defined over 2 interfaces. All servers can reach each other over that interface and have HTTP + HTTPS connectivity to the internet.
kubeadm join hangs at:
I0513 13:24:55.516837 16428 token.go:215] [discovery] Failed to request cluster-info, will try again: Get https://${BOND_IP}:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)
The relevant kubeadm init config looks like this:
[...]
localAPIEndpoint:
  advertiseAddress: ${BOND_IP}
  bindPort: 6443
nodeRegistration:
  kubeletExtraArgs:
    volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
    runtime-cgroups: "/systemd/system.slice"
    kubelet-cgroups: "/systemd/system.slice"
    node-ip: ${BOND_IP}
  criSocket: /var/run/dockershim.sock
  name: master
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
[...]
The join config that I am using looks like this:
apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
discovery:
  bootstrapToken:
    token: ${TOKEN}
    caCertHashes:
    - "${SHA}"
    apiServerEndpoint: "${BOND_IP}:6443"
nodeRegistration:
  kubeletExtraArgs:
    volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
    runtime-cgroups: "/systemd/system.slice"
    kubelet-cgroups: "/systemd/system.slice"
If I configure it using the default eth0, it works without issues.
This is not a connectivity issue. The port test works fine:
# nc -s ${BOND_IP_OF_NODE} -zv ${BOND_IP_OF_MASTER} 6443
Ncat: Version 7.80 ( https://nmap.org/ncat )
Ncat: Connected to ${BOND_IP_OF_MASTER}:6443.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.
I suspect this happens because kubelet listens on eth0. If so, can I change it to use a different NIC/IP?
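One way to check which address a node actually registered with, once it has joined (a sketch, run from a working control plane):
kubectl get nodes -o wide
# the INTERNAL-IP column should show ${BOND_IP} for nodes registered via the bond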
LE (later edit): The eth0 connection has been cut off completely (cable out, interface down, connection down).
Now, when we init, if we choose address 0.0.0.0 for the kube-api it defaults to the bond, which is what we wanted initially:
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 0.0.0.0
result:
[certs] apiserver serving cert is signed for DNS names [emp-prod-nl-hilv-quortex19 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.0.0.1 ${BOND_IP}]
I have even added an ACCEPT rule for port 6443 in iptables and it still times out. All my Calico pods are up and running (all pods in the kube-system namespace, for that matter).
LLE:
I have tested Calico and Weave Net and both show the same issue. The api-server is up and can be reached from the master using curl, but it times out from the nodes.
LLLE:
On the premise that the kube-api is nothing but an HTTPS server, I have tried two options from the node that cannot reach it when doing the kubeadm join:
Ran a python3 simple HTTP server on port 6443 and WAS ABLE TO CONNECT from the node.
Ran an nginx pod and exposed it over another port as a NodePort and WAS ABLE TO CONNECT from the node.
The node just can't reach the api-server on 6443, or any other port for that matter...
What am I doing wrong?
The cause:
The interface used was in a bond of type active-active. This apparently made kubeadm try the other of the two bonded interfaces, which was not in the same subnet as the IP of the advertised server...
Using active-passive did the trick and I was able to join the nodes.
LE: If anyone knows why kubeadm join does not support LACP with active-active bond setups on Fedora CoreOS, please advise here. Otherwise, if additional configuration is required, I would very much like to know what I have missed.
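For reference, a sketch of switching an existing bond to active-passive with NetworkManager, assuming NetworkManager manages the bond and the connection is named bond0 (a hypothetical name; adjust to your setup):
nmcli connection modify bond0 bond.options "mode=active-backup,miimon=100"
nmcli connection up bond0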
I have 2 VMs.
On the first I run:
docker swarm join-token manager
On the second I run the result from this command.
i.e.
docker swarm join --token SWMTKN-1-0wyjx6pp0go18oz9c62cda7d3v5fvrwwb444o33x56kxhzjda8-9uxcepj9pbhggtecds324a06u 192.168.65.3:2377
However, this outputs:
Error response from daemon: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 192.168.65.3:2377: connect: connection refused"
Any idea what's going wrong?
If it helps I'm spinning up these VMs using Vagrant.
Just add the port to the firewall on the master side:
firewall-cmd --add-port=2377/tcp --permanent
firewall-cmd --reload
Then try docker swarm join again on the second VM/node.
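To verify the rule took effect on the master (a quick sketch):
firewall-cmd --list-ports
# the output should include 2377/tcp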
I was facing a similar issue and spent a couple of hours figuring out the root cause, so I'm sharing it for those who may hit the same thing.
Environment:
Oracle Cloud + AWS EC2 (2 + 2)
OS: Ubuntu 20.04.2
Docker version: 20.10.8
3 dynamic public IPs + 1 elastic IP
Issues
1. Created two instances on Oracle Cloud at the beginning.
2. Instance A (manager): docker swarm init --advertise-addr succeeded.
3. Instance B (worker): docker swarm join as a worker succeeded.
4. When I tried to promote B to manager, I encountered the error: Unable to connect to remote host: No route to host
5. Mesh routing was not working properly.
Investigation
1. Suspected it was related to the network/firewall/security group/security list.
2. SSHed to server B (the worker) and ran telnet against the manager on port 2377, which failed with the same error: Unable to connect to remote host: No route to host
3. Logged into the Oracle console and added ingress rules under the security list for all of the relevant ports:
TCP port 2377 for cluster management communications
TCP and UDP port 7946 for communication among nodes
UDP port 4789 for overlay network traffic
4. Tried again, but telnet still failed with the same error.
5. Checked the OS-level firewall and disabled it: sudo ufw disable
6. Tried again, but got the same result.
7. Suspected something was wrong with Oracle Cloud, so I decided to install the same OS and Docker versions on AWS.
8. Added a security group allowing all of the relevant ports/protocols and disabled ufw.
9. Tested with AWS instances C (leader/manager) + D (worker). It worked, D could be promoted to manager, and mesh routing also worked.
10. Confirmed the issue was with Oracle Cloud.
11. Tried to join the Oracle instance (A) to C as a worker. It worked, but A still could not be promoted to manager.
12. Used journalctl -f to investigate the logs and confirmed socket timeouts from A/B (the Oracle instances) to the AWS instance (C).
13. Re-examined A/B and found iptables rules blocking requests.
14. Removed all of the rules in iptables:
# remove the rules
iptables -P INPUT ACCEPT
iptables -P OUTPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -F
Root Cause
It was caused by a firewall, either at the cloud security/WAF/ACL level or at the OS level (e.g. ufw/iptables).
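Rather than flushing all rules with iptables -F, a more targeted sketch is to allow just the swarm ports (chain names and rule order will vary with your existing setup):
# Allow Docker Swarm traffic
iptables -A INPUT -p tcp --dport 2377 -j ACCEPT   # cluster management
iptables -A INPUT -p tcp --dport 7946 -j ACCEPT   # node communication (TCP)
iptables -A INPUT -p udp --dport 7946 -j ACCEPT   # node communication (UDP)
iptables -A INPUT -p udp --dport 4789 -j ACCEPT   # overlay network (VXLAN)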
I had already run firewall-cmd --add-port=2377/tcp --permanent and firewall-cmd --reload on the master side and was still getting the same error.
I ran telnet <master ip> 2377 on the worker node and then rebooted the master.
After that it worked fine.
It looks like your docker swarm manager leader is not listening on port 2377. You can check by running this command on your swarm manager leader VM. If it is working fine, you will get output similar to this:
[root@host1]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
tilzootjbg7n92n4mnof0orf0 * host1 Ready Active Leader
Furthermore, you can check the listening ports on the leader swarm manager node. It should have TCP port 2377 (cluster management) and TCP/UDP port 7946 (communication among nodes) open.
[root@host1]# netstat -ntulp | grep dockerd
tcp6 0 0 :::2377 :::* LISTEN 2286/dockerd
tcp6 0 0 :::7946 :::* LISTEN 2286/dockerd
udp6 0 0 :::7946 :::* 2286/dockerd
On the second VM, where you are configuring the second swarm manager, you will have to make sure you have connectivity to port 2377 of the leader swarm manager. You can use tools like telnet, wget, or nc to test the connectivity, as shown below:
[root@host2]# telnet <swarm manager leader ip> 2377
Trying 192.168.44.200...
Connected to 192.168.44.200.
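An equivalent check with nc (a sketch; substitute the leader's IP):
nc -zv <swarm manager leader ip> 2377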
In my case I had a Linux node and a Windows node. My Windows Docker private subnet was the same as my local network's subnet, so the Docker daemon was looking for the master inside its own network instead of on the LAN with the address I was giving it.
So I did:
1. Go to the Docker Desktop app.
2. Go to Settings.
3. Go to Resources.
4. Go to the Network section and change the Docker subnet address (it needs to be different from your local subnet address).
5. Apply and restart.
6. Run docker swarm join on the worker again.
Note: all these steps are performed on the node where the error appears. Make sure that ports 2377, 7946, and 4789 are open on the master (you can use iptables or ufw).
Hope it works for you.
I have an openvswitch sw1 with subnet 10.207.39.0/24 that has LXC containers attached, and I have the same on another physical server; I have successfully connected these using a GRE tunnel. However, the LXC containers have additional ports on additional openvswitches, e.g. sw4 with subnet 192.220.39.0/24, and I want to push that traffic over the single GRE tunnel on sw1, because there is only one physical interface and it's not possible to have multiple GRE tunnels on each openvswitch with the same physical-interface IP endpoints. Is it possible to push the traffic on the other openvswitches over the GRE tunnel on sw1? Or is there a better way to connect multiple subnets in LXC containers on two physical hosts? Thanks.
I solved this myself, with help from the two links provided below, after sleeping on it and relentless Google searches over several frustrating days.
I realize the solution is pretty simple and would be clear to a networking professional. I am an Oracle DBA and only know as much networking as I need to work with orabuntu-lxc software, LXC containers, and Oracle software, so please keep that in mind if the below is "obvious" - it wasn't obvious to me in my network ignorance.
I got the clue on how to solve the actual steps from this blog post:
http://www.cnblogs.com/popsuper1982/p/3800548.html
I confirmed that any subnet should be routable over a GRE tunnel from this blog post (which gave me hope to keep working towards a solution):
https://supportforums.adtran.com/thread/1408
In particular the author stated in the adtran comment that "GRE tunnels have no limitation on the types of traffic which can traverse it. It can route multiple subnets without multiple tunnels."
That post told me that the solution was likely a routing solution and that only one GRE tunnel would be needed for this use case.
Note that this feature of "no limitation" on the types of traffic is great for Oracle RAC because we need to be able to send multicast over the GRE tunnel for RAC.
This use case:
I am building an Oracle RAC infrastructure to run in LXC Linux containers. I have a public network 10.207.39.0/24 on openvswitch sw1 and a private RAC interconnect network 192.220.39.0/24 on openvswitch sw4. I want to be able to build the RAC in LXC linux containers that span multiple physical hosts and so I created a GRE tunnel to connect the 10.207.39.1 tunnel endpoint on colossus to 10.207.39.5 tunnel endpoint on guardian.
Here are the setup details:
Host "guardian":
LAN wireless physical network interface: wlp4s0 (IP 192.168.1.11)
sw1 10.207.39.5
sw4 192.220.39.5
Host "colossus":
LAN wireless physical network interface: wlp4s0 (IP 192.168.1.15)
sw1 10.207.39.1
sw4 192.220.39.1
Step 1:
Create GRE tunnel between sw1 openvswitches on both physical hosts with physical wireless LAN network interface end points:
Host "guardian": Create gre tunnel phys hosts (guardian --> colossus).
sudo ovs-vsctl add-port sw1 gre0 -- set interface gre0 type=gre options:remote_ip=192.168.1.15
Host "colossus": Create gre tunnel phys hosts (colossus --> guardian).
sudo ovs-vsctl add-port sw1 gre0 -- set interface gre0 type=gre options:remote_ip=192.168.1.11
Step 2:
Route the 192.220.39.0/24 network over the established GRE tunnel as shown below:
Host "guardian": route 192.220.39.0/24 openvswitch sw4 over GRE tunnel:
sudo route add -net 192.220.39.0/24 gw 10.207.39.5 dev sw1
Host "colossus": route 192.220.39.0/24 openvswitch sw4 over GRE tunnel:
sudo route add -net 192.220.39.0/24 gw 10.207.39.1 dev sw1
Note: to add additional subnets, repeat Step 2 for each subnet.
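For example, if an additional (hypothetical) subnet 192.221.39.0/24 lived on another openvswitch spanning the same hosts, the same pattern would apply:
# Host "guardian":
sudo route add -net 192.221.39.0/24 gw 10.207.39.5 dev sw1
# Host "colossus":
sudo route add -net 192.221.39.0/24 gw 10.207.39.1 dev sw1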
Note on MTU:
Also, you have to allow for GRE encapsulation in the MTU if you want to SSH over these tunnels.
Therefore, in the above example for the main GRE tunnel connecting the hosts, we need the MTU set to 1420, leaving 80 bytes of headroom for the GRE encapsulation.
The MTU on the LXC container virtual interfaces on both the sw1 and sw4 switches needs to be set to MTU=1420 in the LXC container config files.
Note that the MTU on the openvswitches sw1 and sw4 should automatically adjust to the MTU of the LXC interfaces as long as ALL LXC virtual interfaces are set to the new lower MTU value, so explicitly setting the MTU on sw1 and sw4 themselves should not be necessary.
If you still run into issues with SSH over the tunnels while ping works across hosts and containers, re-check all the MTU settings on the virtual interfaces and openvswitches.
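A sketch of the relevant per-container config line (the key name depends on your LXC version):
# LXC 2.x and earlier:
lxc.network.mtu = 1420
# LXC 3.x and later:
# lxc.net.0.mtu = 1420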
I'm trying to use distributed programming in Erlang.
But I have a problem: I can't get two Erlang nodes to communicate.
I tried setting the same atom as the magic cookie, but it didn't work.
I tried the command net:ping(node), but the response was pang (it didn't recognize the other node), and I used nodes() to see if my first node sees the second one, but that didn't work either.
Both nodes are CentOS VMs in VMware, using a bridged connection in the network adapter.
I ran ping between the VMs outside Erlang and they recognize each other.
I start the first node, and the second node opens a process but can't find the node pong.
(pong@localhost)8> tut17:start_pong().
true
(ping@localhost)5> c(tut17).
{ok,tut17}
(ping@localhost)6> tut17:start_ping(pong@localhost).
<0.55.0>
Thank you!
A similar question here.
The distribution is provided by a daemon called the Erlang Port Mapper Daemon (epmd). By default it listens on port 4369, so you need to make sure that port is open between the nodes. Additionally, each started Erlang VM opens an additional port to communicate with other VMs. You can see those ports with epmd -names:
g@someserv1:~ % epmd -names
epmd: up and running on port 4369 with data:
name hbd at port 22200
You can check if the port is opened by doing telnet to it, e.g.:
g@someserv1:~ % telnet 127.0.0.1 22200
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
^]
Connection closed by foreign host.
You can change the port to the one you want to check, e.g. 4369, and also the IP to the desired IP. Doing ping is not enough, because ping uses its own protocol, ICMP, which is different from the TCP used by the Erlang distribution; e.g. ICMP may be allowed while TCP is blocked.
Edit:
Please follow this guide Distributed Erlang to start an Erlang VM in distributed mode. Then you can use net_adm:ping/1 to connect to it from another node, e.g.:
(hbd@someserv1.somehost.com)17> net_adm:ping('hbd@someserv2.somehost.com').
pong
Only then will epmd -names show the started Erlang VM in the list.
Edit2:
Assume that there are two hosts, A and B. Each one runs one Erlang VM. epmd -names run on each host shows, for example:
Host A:
epmd: up and running on port 4369 with data:
name servA at port 22200
Host B:
epmd: up and running on port 4369 with data:
name servB at port 22300
You need to be able to do:
On Host A:
telnet HostB 4369
telnet HostB 22300
On Host B:
telnet HostA 4369
telnet HostA 22200
where HostA and HostB are those hosts' IP addresses (e.g. HostA is the IP of Host A, HostB is the IP of Host B).
If the telnet works correctly then you should be able to do net_adm:ping/1 from one host to the other, e.g. on Host A you would ping the name of Host B. The name is what the command node(). returns.
You need to make sure you have a node name for your nodes, or they won't be available to connect with. E.g.:
erl -sname somenode@node1
If you're using separate hosts, then you need to make sure that the node names are resolvable to IP addresses somehow. An easy way to do this is using /etc/hosts.
# Append a similar line to the /etc/hosts file
10.10.10.10 node1
For more helpful answers, you should post what you see in your terminal when you try this.
EDIT
It looks like your shell is auto-picking "localhost" as the node name. You can't send messages to another host with the address "localhost". When specifying the name on the shell, use the @ syntax to specify the host part of the node name as well:
# On host 1:
erl -sname ping@host1
# On host 2:
erl -sname pong@host2
Then edit the host file so host1 and host2 will resolve to the right IP.
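Putting it together, a minimal sketch (hypothetical host names; both nodes must share the same cookie):
# On host1:
erl -sname ping@host1 -setcookie secret
# On host2:
erl -sname pong@host2 -setcookie secret
# Then, in the shell on host1:
# (ping@host1)1> net_adm:ping('pong@host2').
# pong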
I need to configure Couchbase 2.2 to use short hostname.
Currently I am using Couchbase 2.0.1, and in that case the solution was easy:
1. Set the short hostname in the /opt/couchbase/var/lib/couchbase/ip and /opt/couchbase/var/lib/couchbase/ip_start files.
2. Change extra="-name ns_1@$ip" to extra="-sname ns_1@$ip" in the _start() function of /opt/couchbase/bin/couchbase-server. This parameter is used to run erl (-run ns_bootstrap -- $extra).
These steps let me configure the node with the short hostname and create the cluster based on it.
In Couchbase 2.2 I can't do that, because erl now runs under the babysitter. I tried to configure the babysitter to use the short hostname, but I couldn't make it work...
The servers were deployed in a private virtualization environment that only handles short hostnames.
Each node has 2 IPs, one public and one private. If I run a ping command from the node itself I get its private IP, and if I run a ping command from any other node I get its public IP.
For example, if I have one node:
myhost-00 (private IP: 192.168.8.170 public IP: 10.254.171.29)
from itself:
$ ping myhost-00
PING myhost-00 (192.168.8.170) 56(84) bytes of data.
from other node:
$ ping myhost-00
PING myhost-00 (10.254.171.29) 56(84) bytes of data.
Any ideas?
I figured out a workaround:
Firstly, I don't modify any of the Couchbase files.
Secondly, I add a fake domain to my short hostname in each /etc/hosts file. In each file I put the private IP for the current node and the public IPs for the other nodes, all under the fake domain.
For example, assuming I have 2 hosts:
myhost-00 (private IP: 192.168.8.170 public IP: 10.254.171.29)
myhost-01 (private IP: 192.168.8.168 public IP: 10.254.171.30)
myhost-00 /etc/hosts file:
...
192.168.8.170 myhost-00.mydomain
10.254.171.30 myhost-01.mydomain
...
myhost-01 /etc/hosts file:
...
10.254.171.29 myhost-00.mydomain
192.168.8.168 myhost-01.mydomain
...
Finally, I create the cluster using the hostnames with the fake domain (myhost-00.mydomain and myhost-01.mydomain).
At this time, Couchbase does not allow the use of short names for the node name. There are ticket updates that discuss and confirm this situation.
For long hostnames, you will find steps to use hostname at http://docs.couchbase.com/couchbase-manual-2.2/#couchbase-getting-started-hostnames and http://docs.couchbase.com/couchbase-manual-2.5/cb-install/#using-hostnames, depending on version. You can leverage hostname when a cluster is created, a node is added to a cluster, or you can change from an IP address to a hostname via a REST API command. See the doc for full details.
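For reference, a sketch of the hostname-assignment REST call described in those docs (credentials and hostname here are placeholders):
curl -v -X POST -u Administrator:password \
  http://127.0.0.1:8091/node/controller/rename \
  -d hostname=myhost-00.mydomain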