I am trying to set the name of the interface inside the a container via netlink. IE: eth0 I want set to mang0.
Inside the container the root user gets permissions errors when they try to change the interface's properties:
root#d1df4b33fffc:/tmp/contbuild# ip link set eth0 down
RTNETLINK answers: Operation not permitted
root#d1df4b33fffc:/tmp/contbuild# ip link set eth0 name man0
RTNETLINK answers: Operation not permitted
root#d1df4b33fffc:/tmp/contbuild# ip link set eth0 alias man0
RTNETLINK answers: Operation not permitted
Outside the container, I see the set interface name command send in the kernel messages:
[ +11.115152] docker0: port 1(veth3a3f2f4) entered blocking state
[ +0.000007] docker0: port 1(veth3a3f2f4) entered disabled state
[ +0.000171] device veth3a3f2f4 entered promiscuous mode
[ +0.009358] IPv6: ADDRCONF(NETDEV_UP): veth3a3f2f4: link is not ready
[ +0.386448] eth0: renamed from vetheac9d07
[ +0.000259] IPv6: ADDRCONF(NETDEV_CHANGE): veth3a3f2f4: link becomes ready
[ +0.000031] docker0: port 1(veth3a3f2f4) entered blocking state
[ +0.000002] docker0: port 1(veth3a3f2f4) entered forwarding state
I also see the corresponding veth pair on the host veth3a3f2f4#if662, but I cannot see the container's veth in any other netns (ip netns show is blank).
So I would like tp know:
how is docker setting the name to eth0 and is there a way to easily change it
why can I not see the netns for the container and/or the container's interface from the host?
I found a work around by running the container with --cap-add=NET_ADMIN and doing the following internally
ip link set dev eth0 down
ip link set dev eth0 name eth1
ip link set dev eth0 up
The citation comes from: https://github.com/docker/labs/blob/master/networking/concepts/05-bridge-networks.md
When we peek into the host routing table we can see the IP interfaces
in the global network namespace that now includes docker0. The host
routing table provides connectivity between docker0 and eth0 on the
external network, completing the path from inside the container to the
external network.
host$ ip route default via 172.31.16.1 dev eth0
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.42.1
172.31.16.0/20 dev eth0 proto kernel scope link src 172.31.16.102
It is written: The host table provides connectivity between docker0 and eth0. I cannot see where in that rules the connectivity is introduced. Can you explain?
Docker seems to be creating a bridge after a container starts running that then conflicts with my host network. This is not the default bridge docker0, but rather another bridge that is created after a container has started. I am able to configure the default bridge according to the older user guide link https://docs.docker.com/v17.09/engine/userguide/networking/default_network/custom-docker0/, however, I do not know how to configure this other bridge so it does not conflict with 172.17.
This current issue is then that my container cannot access other systems on the host network when this bridge becomes active.
Any ideas?
Version of docker:
Version 18.03.1-ce-mac65 (24312)
This is the bridge that gets created. Sometimes it is not 172.17, but sometimes it is.
br-f7b50f41d024 Link encap:Ethernet HWaddr 02:42:7D:1B:05:A3
inet addr:172.17.0.1 Bcast:172.17.255.255 Mask:255.255.0.0
When docker networks are created (e.g. using docker network create or indirectly through docker-compose) without explicitly specifying a subnet range, dockerd allocates a new /16 network, starting from 172.N.0.0/16, where N is a number that is incremented (e.g. N=17, N=18, N=19, N=20, ...). A given N is skipped if a docker network (a custom one, or the default docker bridge) already exists in the range.
You can specify explicitly a safe IP range when creating a docker bridge (i.e. one that excludes the host ips in your network) on the CLI. But usually bridge networks are created automatically by docker-compose with default blocks. To exclude these IPs reliably would require modifying every docker-compose.yaml file you encounter. It's bad practice to include host-specific things inside a compose file.
Instead, you can play with the networks that docker considers allocated, to force dockerd to "skip" subnets. I'm outlining three methods below:
Method #0 -- configure the pool of ips in the daemon config
If your docker version is recent enough (TODO check minimum version), and you have permissions to configure the docker daemon's command line arguments, you can try passing --default-address-pool ARG options to the dockerd command. Ex:
# allocate /24 subnets with the given CIDR prefix only.
# note that this prefix excludes 172.17.*
--default-address-pool base=172.24.0.0/13,size=24
You can add this setting in one of the etc files: /etc/default/docker, or in /etc/sysconfig/docker, depending on your distribution. There is also a way to set this parameter in daemon.json (see syntax)
Method #1 -- create a dummy placeholder network
You can prevent the entire 172.17.0.0/16 from being used by dockerd (in future bridge networks) by creating a very small docker network anywhere inside 172.17.0.0/16.
Find 4 consecutive IPs in 172.17.* that you know are not in use in your host network, and sacrifice them in a "tombstone" docker bridge. Below, I'm assuming the ips 172.17.253.0, 172.17.253.1, 172.17.253.2, 172.17.253.3 (i.e. 172.17.253.0/30) are unused in your host network.
docker network create --driver=bridge --subnet 172.17.253.0/30 tombstone
# created: c48327b0443dc67d1b727da3385e433fdfd8710ce1cc3afd44ed820d3ae009f5
Note the /30 suffix here, which defines a block of 4 different IPs. In theory, the smallest valid network subnet should be a /31 which consists of a total of 2 IPs (network identifier + broadcast). Docker asks for a /30 minimum, probably to account for a gateway host, and another container. I picked .253.0 arbitrarily, you should pick something that's not in use in your environment. Also note that the identifier tombstone is nothing special, you can rename it to anything that will help you remember why it's there when you find it again several months later.
Docker will modify your routing table to send traffic for these 4 IPs to go through that new bridge instead of the host network:
# output of route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.5.1 0.0.0.0 UG 0 0 0 eth1
172.17.253.0 0.0.0.0 255.255.255.252 U 0 0 0 br-c48327b0443d
172.20.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
192.168.5.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
Note: Traffic for 172.17.253.{0,1,2,3} goes through the tombstone docker bridge just created (br-c4832...). Traffic for any other IP in the 172.17.* would go through the default route (host network). My docker bridge (docker0) is on 172.20.0.1, which may appear unusual -- I've modified bip in /etc/docker/daemon.json to do that. See this page for more details.
The twist: if there exists a bridge occupying even a subportion of a /16, new bridges created will skip that range. If we create new docker networks, we can see that the rest of 172.17.0.0/16 is skipped, because the range is not entirely available.
docker network create foo_test
# c9e1b01f70032b1eff08e48bac1d5e2039fdc009635bfe8ef1fd4ca60a6af143
docker network create bar_test
# 7ad5611bfa07bda462740c1dd00c5007a934b7fc77414b529d0ec2613924cc57
The resulting routing table:
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.5.1 0.0.0.0 UG 0 0 0 eth1
172.17.253.0 0.0.0.0 255.255.255.252 U 0 0 0 br-c48327b0443d
172.18.0.0 0.0.0.0 255.255.0.0 U 0 0 0 br-c9e1b01f7003
172.19.0.0 0.0.0.0 255.255.0.0 U 0 0 0 br-7ad5611bfa07
172.20.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
192.168.5.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
Notice that the rest of the IPs in 172.17.0.0/16 have not been used. The new networks reserved .18. and .19.. Sending traffic to any of your conflicting IPs outside that tombstone network would go via your host network.
You would have to keep that tombstone network around in docker, but not use it in your containers. It's a dummy placeholder network.
Method #2 -- bring down the conflicting bridge network
If you wish to temporarily avoid the IP conflict, you can bring the conflicting docker bridge down using ip: ip link set dev br-xxxxxxx down (where xxxxxx represents the name of the bridge network from route -n or ip link show). This will have the effect of removing the corresponding bridge routing entry in the routing table, without modifying any of the docker metadata.
This is arguably not as good as the method above, because you'd have to bring down the interface possibly every time dockerd starts, and it would interfere with your container networking if there was any container using that bridge.
If method 1 stops working in the future (e.g. because docker tries to be smarter and reuse unused parts of an ip block), you could combine both approaches: e.g. create a large tombstone network with the entire /16, not use it in any container, and then bring its corresponding br-x device down.
Method #3 -- reconfigure your docker bridge to occupy a subportion of the conflicting /16
As a slight variation of the above, you could make the default docker bridge overlap with a region of 172.17.*.* that is not used in your host network. You can change the default docker bridge subnet by changing the bridge ip (i.e. bip key) in /etc/docker/daemon.json (See this page for more details). Just make it a subregion of your /16, e.g. in a /24 or smaller.
I've not tested this, but I presume any new docker network would skip the remainder of 172.17.0.0/16 and allocate an entirely different /16 for each new bridge.
The bridge was created from docker-compose, which can be configured within the compose file.
Answer found here: Docker create two bridges that corrupts my internet access
Hello everyone i'm really new in networking, so i i'm a little bit lost please i hope anyone can help me...
I have two physical nodes with the same configuration in the interface:
# The primary network interface
#auto eth0
#iface eth0 inet dhcp
auto br0
iface br0 inet dhcp
bridge_ports eth0
bridge_fd 9
bridge_hello 2
bridge_maxage 12
bridge_stp off
my nodes have the following public ip:
ubuntu001: 158.42.104.129
ubuntu002: 158.42.104.139
I run one VM in each node using the default configuration of libvirt:
Vm in ubuntu001: 10.1.1.189
Vm in ubuntu002: 10.1.1.59
I want to do ping between the VMs through "gre tunnel using OVS", so i did the next but it didn't work:
First i create an OVS bridge:
# ovs-vsctl add-br ovs-br0
Second i connect my bridge with its uplink which in this case is eth0
# ovs-vsctl add-port ovs-br0 eth0
Third i run a VM in each node (ubuntu001: 10.1.1.189 and ubuntu002: 10.1.1.59 respectively)
Fourth i add a port for the GRE tunnel:
# ovs-vsctl add-port ovs-br0 gre0 -- set interface gre0 type=gre options:remote_ip=158.42.104.139
# ovs-vsctl add-port ovs-br0 gre0 -- set interface gre0 type=gre options:remote_ip=158.42.104.129
i did the same in the other node and this show when i use ovs-vsctl show:
root#ubuntu001:~# ovs-vsctl show
41268e02-3996-4caa-b941-e4fe9c718e35
Bridge "ovs-br0"
Port "ovs-br0"
Interface "ovs-br0"
type: internal
Port "gre0"
Interface "gre0"
type: gre
options: {remote_ip="158.42.104.139"}
Port "eth0"
Interface "eth0"
ovs_version: "2.0.2"
root#ubuntu002:~# ovs-vsctl show
f0128df4-1a89-4999-8add-b5076ff055ee
Bridge "ovs-br0"
Port "ovs-br0"
Interface "ovs-br0"
type: internal
Port "gre0"
Interface "gre0"
type: gre
options: {remote_ip="158.42.104.129"}
Port "eth0"
Interface "eth0"
ovs_version: "2.0.2"
what i am doing wrong or is missing something??
Add this to /etc/network/interfaces:
auto br-ovs=br-ovs
iface br-ovs inet manual
ovs_type OVSBridge
ovs_ports gre1 gre2
ovs_extra set bridge ${IFACE} stp_enable=true
mtu 1462
allow-br-ovs gre1
iface gre1 inet manual
ovs_type OVSPort
ovs_bridge br-ovs
ovs_extra set interface ${IFACE} type=gre options:remote_ip=158.42.104.139 options:key=1
auto br1
iface br1 inet manual# (or static, or DHCP)
mtu 1462
I do not know how to do this with commands.
I think eth0 should not be in the output of ovs-vsctl show.
stp_enable=true is optional, I don't think it is needed in case of 2 nodes.
Set mtu to suit your needs. This example is for when the real NIC's mtu is 1500.
remote_ip=158.42.104.139 should contain the other node's IP. It is different on the 2 nodes.
options:key=1 is also optional, it can be used to label 2 GRE networks (eg. the second mesh would have key=2 etc.).
You can add VMs to br1 and they will be able to ping each other.
Don't forget to set the VMs' mtu to 1462.
This tutorial might be useful: https://wiredcraft.com/blog/multi-host-docker-network/
It would be awesome to make use of existing SR-IOV capable NICs. I would like to understand if a docker containers can be attached to Virtual Functions such that they communicate over the NICs hardware bridge (instead of the virtual docker0 bridge).
To be more specific, consider this scenario:
Container A is attached to VF#1
Container B is attached to VF#2
A and B are linked together and when they exchange data it should happen over the hardware bridge on NIC (instead of docker0).
Is the above supported natively in docker?
If not, can pipework help here? (I have heard pipework can do amazing things)
Examples would be very helpful.
Well, I figured that a little modification in pipework script can let us attach VFs to containers. Containers set up this way were able to ping each other without having to create macvlan subinterface or software bridges. This indicates that the hardware bridge in adapter is doing the L2 switching for them.
The change in pipework is basically something like this:
[ "$IFTYPE" = phys ] && {
[ "$VLAN" ] && {
[ ! -d "/sys/class/net/${IFNAME}.${VLAN}" ] && {
ip link add link "$IFNAME" name "$IFNAME.$VLAN" mtu "$MTU" type vlan id "$VLAN"
}
ip link set "$IFNAME" up
IFNAME=$IFNAME.$VLAN
}
# Let's not create the macvlan subinterface
# GUEST_IFNAME=ph$NSPID$CONTAINER_IFNAME
# ip link add link "$IFNAME" dev "$GUEST_IFNAME" mtu "$MTU" type macvlan mode bridge
GUEST_IFNAME=$IFNAME
ip link set "$IFNAME" up
}
ip link set "$GUEST_IFNAME" netns "$NSPID"
ip netns exec "$NSPID" ip link set "$GUEST_IFNAME" name "$CONTAINER_IFNAME"
---
Off-course a neater way would be to add a new argument ("--direct-attach" or something) to the script to treat specified interface differently