route all traffic over gre tunnel - lxc

I have an openvswitch sw1 with subnet 10.207.39.0/24 that has lxc containers attached and I have the same on another physical server and I have successfully connected these using a GRE tunnel. However, the lxc containers have additional ports on additional openvswitches, e.g. sw4 with subnet 192.220.39.0/24 and I want to push that traffic over the single gre tunnel on sw1 because there is only one physical interface and it's not possible to have multiple gre tunnels on each openvswitch with the same physical interface IP addr endpoints. Is it possible to push the traffic on the other openvswitches over the gre tunnel on sw1? Or is there a better way to connect multiple subnets in lxc containers on two physical hosts? Thanks.

I solved this "myself" - with help from two links provided below - (after sleeping on it and relentless google searches over several frustrating days).
I realize the solution is pretty simple and would be clear to a networking professional. I am an Oracle DBA and only know as much networking as I need to work with orabuntu-lxc software, LXC containers, and Oracle software, so please keep that in mind if the below is "obvious" - it wasn't obvious to me in my network ignorance.
I got the clue on how to solve the actual steps from this blog post:
http://www.cnblogs.com/popsuper1982/p/3800548.html
I confirmed that any subnet should be routable over a GRE tunnel from this blog post (which gave me hope to keep working towards a solution):
https://supportforums.adtran.com/thread/1408
In particular the author stated in the adtran comment that "GRE tunnels have no limitation on the types of traffic which can traverse it. It can route multiple subnets without multiple tunnels."
That post told me that the solution was likely a routing solution and that only one GRE tunnel would be needed for this use case.
Note that this feature of "no limitation" on the types of traffic is great for Oracle RAC because we need to be able to send multicast over the GRE tunnel for RAC.
This use case:
I am building an Oracle RAC infrastructure to run in LXC Linux containers. I have a public network 10.207.39.0/24 on openvswitch sw1 and a private RAC interconnect network 192.220.39.0/24 on openvswitch sw4. I want to be able to build the RAC in LXC linux containers that span multiple physical hosts and so I created a GRE tunnel to connect the 10.207.39.1 tunnel endpoint on colossus to 10.207.39.5 tunnel endpoint on guardian.
Here is the setup details:
Host "guardian":
LAN wireless physical network interface: wlp4s0 (IP 192.168.1.11)
sw1 10.207.39.5
sw4 192.220.39.5
Host "colossus":
LAN wireless physical network interface: wlp4s0 (IP 192.168.1.15)
sw1 10.207.39.1
sw4 192.220.39.1
Step 1:
Create GRE tunnel between sw1 openvswitches on both physical hosts with physical wireless LAN network interface end points:
Host "guardian": Create gre tunnel phys hosts (guardian --> colossus).
sudo ovs-vsctl add-port sw1 gre0 -- set interface gre0 type=gre options:remote_ip=192.168.1.15
Host "colossus": Create gre tunnel phys hosts (colossus --> guardian).
sudo ovs-vsctl add-port sw1 gre0 -- set interface gre0 type=gre options:remote_ip=192.168.1.11
Step 2:
Route the 192.220.39.0/24 network over the established GRE tunnel as shown below:
Host "guardian": route 192.220.39.0/24 openvswitch sw4 over GRE tunnel:
sudo route add -net 192.220.39.0/24 gw 10.207.39.5 dev sw1
Host "colossus": route 192.220.39.0/24 openvswitch sw4 over GRE tunnel:
sudo route add -net 192.220.39.0/24 gw 10.207.39.1 dev sw1
Note: To add additional subnets repeat step 2 for each subnet.
Note on MTU:
Also, you have to allow for GRE encapsulation in MTU if you want to ssh over these tunnels.
Therefore in the above example for the main GRE tunnel connecting the hosts, we need MTU to be set to 1420 to allow 80 for the GRE header.
MTU on the LXC container virtual interfaces on the sw1 switches need to be set to MTU=1420 in the LXC container config files.
MTU on the LXC container virtual interfaces on the sw4 switches need to be set to MTU=1420 in the LXC container config files.
Note that the MTU on the openvswitches sw1 and sw4 should automatically set to the MTU on the LXC intefaces as long as ALL LXC virtual interfaces are set to the new lower MTU values, so explicitly setting MTU on the openvswitches sw1 and sw4 themselves should not be necessary.
If run into issues still with SSH over the tunnels, but ping works cross-hosts cross-containers, then re-check all MTU settings on the virtual interfaces and openvswitches and recheck.

Related

Docker-compose "ports": listen on multiple IP addresses / IP range

Instead of listening to a single IP address like e.g. localhost:
ports:
- "127.0.0.1:80:80"
I want the container to only listen to a local network, i.e. e.g.:
ports:
- "10.0.0.0/16:80:80"
ERROR: The Compose file './docker-compose.yml' is invalid because:
services.SERVICE.ports contains an invalid type, it should be a number, or an object
Is this possible?
I don't want to use things like swarm mode etc., yet.
If IP range is not supported, maybe at least multiple IP addresses like 10.0.0.2 and 10.0.0.3?
ERROR: for CONTAINER Cannot start service SERVICE: driver failed programming external connectivity on endpoint CONTAINER (...): Error starting userland proxy: listen tcp 10.0.0.3:80: bind: cannot assign requested address
ERROR: for SERVICE Cannot start service SERVICE: driver failed programming external connectivity on endpoint CONTAINER (...): Error starting userland proxy: listen tcp 10.0.0.3:80: bind: cannot assign requested address
Or is it not even supported to listen to 10.0.0.3 ?
The host machine is connected to 10.0.0.0/16:
> ifconfig
ens10: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.0.0.2 netmask 255.255.255.255 broadcast 10.0.0.2
inet6 f**0::8**0:ff:f**9:b**7 prefixlen 64 scopeid 0x20<link>
ether **:00:00:**:**:** txqueuelen 1000 (Ethernet)
Listening to a single IP address seems not correct. The service is listening at an IP address.
Let's say your VM has two network interfaces (ethernet cards):
Network 1 → subnet: 10.0.0.0/24 and IP 10.0.0.100
Network 2 → subnet: 10.0.1.0/24 and IP 10.0.1.200
If you set 127.0.0.1:80:80 that means that your service listening at 127.0.0.1's (localhost) port 80.
If you want to access service from 10.0.0.0/24 subnet you should set 10.0.0.100:80:80 and use http://10.0.0.100:80 address to be able connect your container from external hosts
If you want to access service from multiple networks simultaneously you can bind the container port to multiple ports, where the IP is the connection source IP):
ports:
- 10.0.0.100:80:80
- 10.0.1.200:80:80
- 127.0.0.1:80:80
And don't forget to open 80 port at VM's firewall, if a firewall exists and restricts that network
I think you misunderstood this field.
When you map 127.0.0.1:80:80 you will map interface 127.0.0.1 from your host to your container.
In the case of the 127.0.0.1 you can only access it from inside your host.
When you map 10.0.0.3:80:80 you will map interface 10.0.0.3 from your host to your container. And all ip who can access 10.0.0.3 will have acces to your docker container mapping.
But in anycase this field will not do any filtering about who access this container
EDIT: After your modification i've seen my misunderstood about your question.
You want docker to create "bridge interface" to not share the ip of your host.
I don't think this is possible when using the port mapping
If you give Compose ports: (or docker run -p) an IP address, it must be a specific known IP address of a host interface, or 0.0.0.0 for "all interfaces". The Docker daemon gives this specific IP address to a bind(2) call, which takes an address and not a network, and follows the rules in ip(7) for IPv4.
With the output you show, you can only bind containers to 10.0.0.2. If you want to use other IP addresses on the same network, you also need to assign them to the host; see for example How can I (from CLI) assign multiple IP addresses to one interface? on Ask Ubuntu, and then you can bind a container to the newly-added address.
If your system is on multiple physical networks, you can have any number of ports: so long as the host address and host port are unique. In particular you can have multiple ports: that all forward to the same container port.
ports:
# make this visible to the external load balancer on port 80
- '192.168.17.2:80:3000'
# also make this visible to the internal network also on port 80
- '10.0.0.2:80:3000'
# and the management network but on port 3000
- '10.99.0.36:3000:3000'
Again, the host must already have these IP addresses in the ifconfig output.

Reaching between OVS bridges

want to ping different Open vSwitch bridges via a physical interface. My host has an eth1 and a wlan0 interface. I have created 3 OVS bridges and assigned IP addresses to them. wlan0 added to br0. Two virtual interface wlan0.1 and wlan0.2 is created by wlan0 and added them to br1 and br2. Another bridge breth is connected to eth1 interface and all other bridges are connected to breth by patch port. see the figure bellow
Now, host can ping all 3 brides. There are similar host in the network connected via wlan0 interface. They are under mesh network. Any host can ping any node's any bridge. But a PC is connected to eth1 interface only can ping br0 and other host's br0. That means, bridges assigned virtual interface are unreachable from PC. Is there any way to reach other bridges?

Docker Bridge Conflicts with Host Network

Docker seems to be creating a bridge after a container starts running that then conflicts with my host network. This is not the default bridge docker0, but rather another bridge that is created after a container has started. I am able to configure the default bridge according to the older user guide link https://docs.docker.com/v17.09/engine/userguide/networking/default_network/custom-docker0/, however, I do not know how to configure this other bridge so it does not conflict with 172.17.
This current issue is then that my container cannot access other systems on the host network when this bridge becomes active.
Any ideas?
Version of docker:
Version 18.03.1-ce-mac65 (24312)
This is the bridge that gets created. Sometimes it is not 172.17, but sometimes it is.
br-f7b50f41d024 Link encap:Ethernet HWaddr 02:42:7D:1B:05:A3
inet addr:172.17.0.1 Bcast:172.17.255.255 Mask:255.255.0.0
When docker networks are created (e.g. using docker network create or indirectly through docker-compose) without explicitly specifying a subnet range, dockerd allocates a new /16 network, starting from 172.N.0.0/16, where N is a number that is incremented (e.g. N=17, N=18, N=19, N=20, ...). A given N is skipped if a docker network (a custom one, or the default docker bridge) already exists in the range.
You can specify explicitly a safe IP range when creating a docker bridge (i.e. one that excludes the host ips in your network) on the CLI. But usually bridge networks are created automatically by docker-compose with default blocks. To exclude these IPs reliably would require modifying every docker-compose.yaml file you encounter. It's bad practice to include host-specific things inside a compose file.
Instead, you can play with the networks that docker considers allocated, to force dockerd to "skip" subnets. I'm outlining three methods below:
Method #0 -- configure the pool of ips in the daemon config
If your docker version is recent enough (TODO check minimum version), and you have permissions to configure the docker daemon's command line arguments, you can try passing --default-address-pool ARG options to the dockerd command. Ex:
# allocate /24 subnets with the given CIDR prefix only.
# note that this prefix excludes 172.17.*
--default-address-pool base=172.24.0.0/13,size=24
You can add this setting in one of the etc files: /etc/default/docker, or in /etc/sysconfig/docker, depending on your distribution. There is also a way to set this parameter in daemon.json (see syntax)
Method #1 -- create a dummy placeholder network
You can prevent the entire 172.17.0.0/16 from being used by dockerd (in future bridge networks) by creating a very small docker network anywhere inside 172.17.0.0/16.
Find 4 consecutive IPs in 172.17.* that you know are not in use in your host network, and sacrifice them in a "tombstone" docker bridge. Below, I'm assuming the ips 172.17.253.0, 172.17.253.1, 172.17.253.2, 172.17.253.3 (i.e. 172.17.253.0/30) are unused in your host network.
docker network create --driver=bridge --subnet 172.17.253.0/30 tombstone
# created: c48327b0443dc67d1b727da3385e433fdfd8710ce1cc3afd44ed820d3ae009f5
Note the /30 suffix here, which defines a block of 4 different IPs. In theory, the smallest valid network subnet should be a /31 which consists of a total of 2 IPs (network identifier + broadcast). Docker asks for a /30 minimum, probably to account for a gateway host, and another container. I picked .253.0 arbitrarily, you should pick something that's not in use in your environment. Also note that the identifier tombstone is nothing special, you can rename it to anything that will help you remember why it's there when you find it again several months later.
Docker will modify your routing table to send traffic for these 4 IPs to go through that new bridge instead of the host network:
# output of route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.5.1 0.0.0.0 UG 0 0 0 eth1
172.17.253.0 0.0.0.0 255.255.255.252 U 0 0 0 br-c48327b0443d
172.20.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
192.168.5.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
Note: Traffic for 172.17.253.{0,1,2,3} goes through the tombstone docker bridge just created (br-c4832...). Traffic for any other IP in the 172.17.* would go through the default route (host network). My docker bridge (docker0) is on 172.20.0.1, which may appear unusual -- I've modified bip in /etc/docker/daemon.json to do that. See this page for more details.
The twist: if there exists a bridge occupying even a subportion of a /16, new bridges created will skip that range. If we create new docker networks, we can see that the rest of 172.17.0.0/16 is skipped, because the range is not entirely available.
docker network create foo_test
# c9e1b01f70032b1eff08e48bac1d5e2039fdc009635bfe8ef1fd4ca60a6af143
docker network create bar_test
# 7ad5611bfa07bda462740c1dd00c5007a934b7fc77414b529d0ec2613924cc57
The resulting routing table:
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.5.1 0.0.0.0 UG 0 0 0 eth1
172.17.253.0 0.0.0.0 255.255.255.252 U 0 0 0 br-c48327b0443d
172.18.0.0 0.0.0.0 255.255.0.0 U 0 0 0 br-c9e1b01f7003
172.19.0.0 0.0.0.0 255.255.0.0 U 0 0 0 br-7ad5611bfa07
172.20.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
192.168.5.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
Notice that the rest of the IPs in 172.17.0.0/16 have not been used. The new networks reserved .18. and .19.. Sending traffic to any of your conflicting IPs outside that tombstone network would go via your host network.
You would have to keep that tombstone network around in docker, but not use it in your containers. It's a dummy placeholder network.
Method #2 -- bring down the conflicting bridge network
If you wish to temporarily avoid the IP conflict, you can bring the conflicting docker bridge down using ip: ip link set dev br-xxxxxxx down (where xxxxxx represents the name of the bridge network from route -n or ip link show). This will have the effect of removing the corresponding bridge routing entry in the routing table, without modifying any of the docker metadata.
This is arguably not as good as the method above, because you'd have to bring down the interface possibly every time dockerd starts, and it would interfere with your container networking if there was any container using that bridge.
If method 1 stops working in the future (e.g. because docker tries to be smarter and reuse unused parts of an ip block), you could combine both approaches: e.g. create a large tombstone network with the entire /16, not use it in any container, and then bring its corresponding br-x device down.
Method #3 -- reconfigure your docker bridge to occupy a subportion of the conflicting /16
As a slight variation of the above, you could make the default docker bridge overlap with a region of 172.17.*.* that is not used in your host network. You can change the default docker bridge subnet by changing the bridge ip (i.e. bip key) in /etc/docker/daemon.json (See this page for more details). Just make it a subregion of your /16, e.g. in a /24 or smaller.
I've not tested this, but I presume any new docker network would skip the remainder of 172.17.0.0/16 and allocate an entirely different /16 for each new bridge.
The bridge was created from docker-compose, which can be configured within the compose file.
Answer found here: Docker create two bridges that corrupts my internet access

Centos 7 minimal install can't talk to internet

Newbie trying to install/set up Centos 7. Can ping other machines in the domain, but can't ping gateway, google.com etc. Gets destination host unreachable for gateway and unknown host google.com when pinging google.com
Please advice.
etc/sysconfig/network-scripts:
TYPE=Ethernet
BOOTPROTO=static
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME=enp4s0
iUUID=c39e3407-a566-4586-8fb9-fd4e3bfc4617
DEVICE=enp4s0
ONBOOT=yes
IPADDR="192.168.192.150"
GATEWAY="208.67.254.41"
DNS1="8.8.8.8"
DNS2="8.8.4.4"
etc/resolv.conf
# Generated by NetworkManager
nameserver 8.8.8.8
nameserver 8.8.4.4
etc/sysconfig/network
# Created by anaconda
NETWORKING=yes
HOSTNAME=centos7
GATEWAY=208.67.254.41
Since it says unknown host google.com the machine is not able to route request to internet DNS server(8.8.8.8) to resolve google ip and when you ping the gateway it destination host not reachable
For a machine to connect to other machine their the machine should be within lan if not on lan then there should be a machine which acts a gateway machine within lan in your case you have pointed gateway to 208.67.254.41 obviously it is not on lan so this machine 208.67.254.41 should be accessible from some machine in lan to do so use route command
which add a routing entry in machines routing table
route add -host gw dev
In your case command goes like
route add -host 208.67.254.41 gw dev
eg : route add -host 192.168.12.45 gw 192.168.12.1 dev eth0
Comment entries if ipv6 is not used
Make sure to keep ip forwarding on in the gateway machine in /etc/sysclt.conf on gateway machine
Have you disabled Network Manager?
Command line:
service NetworkManager status

glusterfs geo-replication - server with two interfaces - private IP advertised

I have been trying to setup a geo replication with glusterfs servers. Everything worked as expected in my test environment, on my staging environment, but then i tried the production and got stuck.
Let say I have
gluster fs server is on public ip 1.1.1.1
gluster fs slave is on public 2.2.2.2, but this IP is on interface eth1
The eth0 on gluster fs slave server is 192.168.0.1.
So when i start the command on 1.1.1.1 (firewall and ssh keys are set properly)
gluster volume geo-replication vol0 2.2.2.2::vol0 create push-pem
I get an error.
Unable to fetch slave volume details. Please check the slave cluster and slave volume.
geo-replication command failed
The error is not that important in this case, the problem is the slave IP address
2015-03-16T11:41:08.101229+00:00 xxx kernel: TCP LOGDROP: IN= OUT=eth0 SRC=1.1.1.1 DST=192.168.0.1 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=24243 DF PROTO=TCP SPT=1015 DPT=24007 WINDOW=14600 RES=0x00 SYN URGP=0
As you can see in the firewall drop log above, the port 24007 of the slave gluster daemon is advertised on private IP of the interface eth0 on slave server and should be the IP of the eth1 private IP. So master cannot connect and will time out
Is there a way to force gluster server to advertise interface eth1 or bind to it only?
I use cfengine and ansible to push configuration, so binding to Interface could be a better solution than IP, but whatever solution will do.
Thank you in advance.
I've encountered this issue but in a different context.
I was trying to geo-replicate two nodes which were both behind a NAT (AWS instances in different regions).
When the master connects to the slave via the public IP to check for volume compatability/size and other details, it retrieves the hostname of the slave, which usually resolves to something that only has meaning in that remote region.
Then it uses that hostname to dial back to the slave when later setting up the session, which fails, as that hostname resolves to a private IP in a different region.
My workaround for the issue was to use hostnames when creating the volumes, probing for peers, and establishing geo replication, and then add a /etc/hosts entry mapping slaves hostname which usually resolves to its private IP to its public IP, rather than it's private IP.
This gets you to the point where you establish a session, but I haven't had any luck actually getting it to sync, as it uses the wrong IP somewhere long the way again.
Edit:
I've actually managed to get it running by adding /etc/hosts hacks on both sides.
GlusterFS has no notion of the network layer. Check your routes. If the next-hop for your geo-replication slave is on eth1, then gluster will open a port on that interface for the slave IP address.
Also make sure your firewall is configured to forward geo-replication traffic on this port.

Resources