Dataflow worker unable to connect to Kafka through Cloud VPN - google-cloud-dataflow

I have issues connecting a KafkaIO source to brokers available only through a Cloud VPN tunnel.
The tunnel is set up to allow traffic from a specific subnetwork (secure) and routes are set up and working for compute engines in that subnetwork.
Executing the pipeline with the DirectRunner, KafkaIO is able to connect to the brokers, whether through the VPN from a standard Compute Engine instance in the secure subnetwork or from a local machine with SSH tunnels set up by sshuttle.
Running the pipeline with the DataflowRunner, connections to the brokers fail with:
org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
The pipeline is executed within the secure subnetwork. Connecting to the Compute Engine instance spawned by the job, the following routes are visible:
jgrabber#REDACTED-harness-REDACTED ~ $ ip r
default via 10.74.252.1 dev eth0 proto dhcp src 10.74.252.3 metric 1024
default via 10.74.252.1 dev eth0 proto dhcp metric 1024
10.74.252.1 dev eth0 proto dhcp scope link src 10.74.252.3 metric 1024
10.74.252.1 dev eth0 proto dhcp metric 1024
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
The IPv4 addresses of the brokers are within a 172.17.0.0/16 (remote) network. The VPN is configured with a remote network range of 172.16.0.0/12.
Could the remote 172.17.0.0/16 network be shadowed by the virtual network that Docker sets up and uses?
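The routing table above does suggest exactly that shadowing: the connected route for 172.17.0.0/16 on docker0 matches before traffic ever leaves the VM, so packets for the brokers never reach the VPN. On a host you control, Docker's bridge can be moved off the conflicting range via daemon.json; a minimal sketch with an arbitrary replacement range (Dataflow workers are service-managed, so this illustrates the mechanism rather than a supported Dataflow setting):
# /etc/docker/daemon.json -- move docker0 off 172.17.0.0/16 (replacement range is an example)
{
  "bip": "192.168.99.1/24"
}
# restart the daemon so docker0 is re-created with the new address
sudo systemctl restart docker
ip r    # docker0 should now own 192.168.99.0/24 instead of shadowing the brokers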

Related

Why doesn't cURL work from inside a Docker container?

I have two services in Docker (each service has its own docker-compose.yml; nginx + php-fpm).
Service #1 is on port 48801.
Service #2 is on port 48802.
My server IP is 99.99.99.44 (CentOS 8).
I make a cURL request (via PHP) from inside Service #1 to Service #2 (i.e. to 99.99.99.44:48802), but I get the following error:
Failed to connect to 99.99.99.44 port 48802 after 1017 ms: Host is unreachable
There is a problem with my server. I need help (or a direction to look in).
Some info:
On another server these services work fine.
A request from inside a container to port 80 of this server works fine.
A request from the host (not from inside a container) to custom port 48802 works fine.
All services are reachable from a browser (via the custom ports).
SELinux is disabled.
Firewalld is disabled.
My ip route result:
default via 99.99.99.1 dev eno1 proto static metric 100
99.99.99.1 dev eno1 proto static scope link metric 100
172.18.0.0/16 dev br-2f405adcc89d proto kernel scope link src 172.18.0.1
172.19.0.0/16 dev br-19c596fe7618 proto kernel scope link src 172.19.0.1
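Since the routes above look normal, two things worth checking are whether the kernel forwards packets at all and whether Docker's iptables chains are still in place; on CentOS 8, stopping firewalld after dockerd has started can wipe the NAT/forward rules Docker installed, which produces exactly this "Host is unreachable" from inside a container. A diagnostic sketch (not a guaranteed fix):
sysctl net.ipv4.ip_forward      # should print net.ipv4.ip_forward = 1
sudo iptables -S DOCKER-USER    # errors out if Docker's chains were flushed
sudo iptables -t nat -S POSTROUTING | grep -i masquerade
# if the chains/rules are missing, restarting docker re-creates them:
sudo systemctl restart docker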

How to channel Docker container traffic through a host's network adapter

I have the following network routes on my host PC. I am using SoftEther VPN; it is set up on the adapter vpn_se.
$ ip route
default via 192.168.43.1 dev wlan0 proto dhcp metric 600
vpn.ip.adress via 192.168.43.1 dev wlan0
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.30.0/24 dev vpn_se proto kernel scope link src 192.168.30.27
192.168.43.0/24 dev wlan0 proto kernel scope link src 192.168.43.103 metric 600
Now I want to route all traffic from a Docker container through vpn_se, something like:
172.17.0.6 via 192.168.30.1 dev vpn_se
How can I achieve this?
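One way to do this is policy routing: give traffic sourced from the container its own routing table whose default route points at vpn_se. A sketch, assuming 192.168.30.1 is the gateway on the VPN side (as in the route you proposed) and table 100 is unused:
# look up routes for the container's source address in table 100
sudo ip rule add from 172.17.0.6 lookup 100
sudo ip route add default via 192.168.30.1 dev vpn_se table 100
# rewrite the source address on the way out so replies return via the VPN
sudo iptables -t nat -A POSTROUTING -s 172.17.0.6 -o vpn_se -j MASQUERADE
With strict reverse-path filtering you may also need to relax rp_filter on vpn_se.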

Interpretation of ip route rules

The citation comes from: https://github.com/docker/labs/blob/master/networking/concepts/05-bridge-networks.md
When we peek into the host routing table we can see the IP interfaces
in the global network namespace that now includes docker0. The host
routing table provides connectivity between docker0 and eth0 on the
external network, completing the path from inside the container to the
external network.
host$ ip route
default via 172.31.16.1 dev eth0
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.42.1
172.31.16.0/20 dev eth0 proto kernel scope link src 172.31.16.102
It is written that the host routing table provides connectivity between docker0 and eth0. I cannot see where in those rules the connectivity is introduced. Can you explain?
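The rules themselves only attach prefixes to interfaces; the actual container-to-external path also needs IP forwarding enabled on the host, at which point a packet arriving from docker0 with an external destination is matched against the default route and sent out eth0. A quick way to see the kernel's decision (the container address 172.17.0.5 is a made-up example):
sysctl net.ipv4.ip_forward                        # must be 1 for the host to route
ip route get 8.8.8.8 from 172.17.0.5 iif docker0
# -> "8.8.8.8 from 172.17.0.5 via 172.31.16.1 dev eth0": the default route
#    carries container traffic from docker0 out through eth0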

Docker swarm overlay network with vxlan routing over openvpn

I have setup a docker swarm with 3 nodes (docker 18.03). These nodes use an overlay network to communicate.
node1:
laptop
host tun0 172.16.0.6 --> openvpn -> nat gateway
container n1
ip = 192.169.1.10
node2:
aws ec2
host eth2 10.0.30.62
container n2
ip = 192.169.1.9
node3:
aws ec2
host eth2 10.0.140.122
container n3
ip = 192.169.1.12
nat-gateway:
aws ec2
tun0 172.16.0.1 --> openvpn --> laptop
eth0 10.0.30.198
The scheme is partly working:
1. Containers can ping each other by name (n1, n2, n3).
2. Docker swarm commands are working; services can be deployed.
The overlay is only partly working: some nodes cannot communicate with each other over either TCP or UDP. I tried all combinations of the 3 nodes with UDP and TCP:
I did a tcpdump on the nat gateway to monitor overlay vxlan network activity (port 4789):
tcpdump -l -n -i eth0 "port 4789"
tcpdump -l -n -i tun0 "port 4789"
Then I tried TCP communication from node1 to node3. On node3:
nc -l -s 0.0.0.0 -p 8999
On node1:
telnet 192.169.1.12 8999
Node1 will then try to connect to node3. I see packets coming in on the nat-gateway over the tun0 interface:
and on the nat-gateway eth0 interface:
It seems that the nat-gateway is not sending replies back over the tun0 interface.
The iptables configuration of the nat-gateway:
The routing table of the nat-gateway:
Can you help me solve this issue?
I have been able to fix the issue using the following configuration on the NAT gateway:
and
No masquerading of 172.16.0.0/22 is needed. All the workers and managers will route their traffic for 172.16.0.0/22 via the NAT gateway, and it knows how to send the packets over tun0.
Masquerading of eth0 was just wrong...
All the containers can now ping and establish tcp/ip connections to each other.
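The actual rules are omitted above, but the description boils down to the gateway acting as a plain router: forwarding enabled, a route for the overlay hosts' VPN range over tun0, and no masquerading of 172.16.0.0/22. A sketch of that shape (reconstructed from the description, not the author's literal configuration):
# on the nat-gateway: forward between eth0 and tun0
sysctl -w net.ipv4.ip_forward=1
# traffic for the VPN range goes back over the tunnel
ip route add 172.16.0.0/22 dev tun0
# deliberately no "-t nat -A POSTROUTING -o eth0 -j MASQUERADE" for this range:
# plain routing preserves source addresses, so replies route back via tun0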

glusterfs geo-replication - server with two interfaces - private IP advertised

I have been trying to set up geo-replication with GlusterFS servers. Everything worked as expected in my test and staging environments, but then I tried production and got stuck.
Let's say I have:
the GlusterFS master server on public IP 1.1.1.1;
the GlusterFS slave on public IP 2.2.2.2, but this IP is on interface eth1;
eth0 on the GlusterFS slave server is 192.168.0.1.
So when I run the following command on 1.1.1.1 (firewall and SSH keys are set up properly):
gluster volume geo-replication vol0 2.2.2.2::vol0 create push-pem
I get an error:
Unable to fetch slave volume details. Please check the slave cluster and slave volume.
geo-replication command failed
The error itself is not that important in this case; the problem is the slave IP address:
2015-03-16T11:41:08.101229+00:00 xxx kernel: TCP LOGDROP: IN= OUT=eth0 SRC=1.1.1.1 DST=192.168.0.1 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=24243 DF PROTO=TCP SPT=1015 DPT=24007 WINDOW=14600 RES=0x00 SYN URGP=0
As you can see in the firewall drop log above, port 24007 of the slave gluster daemon is advertised on the private IP of interface eth0 on the slave server, when it should be the IP on eth1. So the master cannot connect and times out.
Is there a way to force the gluster server to advertise interface eth1, or to bind to it only?
I use CFEngine and Ansible to push configuration, so binding to an interface would be a better solution than binding to an IP, but any working solution will do.
Thank you in advance.
I've encountered this issue, but in a different context: I was trying to geo-replicate two nodes which were both behind NAT (AWS instances in different regions).
When the master connects to the slave via the public IP to check for volume compatibility/size and other details, it retrieves the hostname of the slave, which usually resolves to something that only has meaning in that remote region.
It then uses that hostname to dial back to the slave when later setting up the session, which fails, as that hostname resolves to a private IP in a different region.
My workaround was to use hostnames when creating the volumes, probing for peers, and establishing geo-replication, and then to add an /etc/hosts entry mapping the slave's hostname (which usually resolves to its private IP) to its public IP instead.
This gets you to the point where you can establish a session, but I haven't had any luck actually getting it to sync, as it uses the wrong IP somewhere along the way again.
Edit:
I've actually managed to get it running by adding /etc/hosts hacks on both sides.
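For illustration, the overrides look roughly like this (the hostnames are placeholders; the IPs are the example addresses from the question):
# /etc/hosts on the master: pin the slave's hostname to its public IP
2.2.2.2   slave-internal-hostname
# /etc/hosts on the slave: pin the master's hostname to its public IP
1.1.1.1   master-internal-hostname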
GlusterFS has no notion of the network layer. Check your routes. If the next-hop for your geo-replication slave is on eth1, then gluster will open a port on that interface for the slave IP address.
Also make sure your firewall is configured to forward geo-replication traffic on this port.
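A quick way to check which interface and source address the kernel picks for the slave is ip route get (using the example slave address from the question):
ip route get 2.2.2.2
# e.g. "2.2.2.2 via <gw> dev eth1 src 2.2.2.x" -- the dev/src pair shown is the
# interface and address gluster will use when talking to the slave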
