I have been running some experiments on OVS recently. I have two physical machines running OpenStack, with a GRE tunnel configured between them. On the br-int (integration bridge) of each machine I added two internal ports, assigned each to its own namespace (ns1, ns2, ns3, ns4), and gave them IPs from the same subnet (172.16.0.200, 172.16.0.201, 172.16.0.202, 172.16.0.203). After configuration, everything is reachable (tested with ping): VM <-> virtual port within the same subnet, and virtual port <-> virtual port on the same node and across nodes. However, something weird shows up when I test bandwidth with iperf. The results are as follows:
Physical node <-> physical node: 1 Gbit/s
VM <-> VM, same machine: 10 Gbit/s
VM <-> VM, different machines: 1 Gbit/s
VM <-> virtual port, same machine: 10 Gbit/s
VM <-> virtual port, different machines: 1 Gbit/s
Virtual port <-> virtual port, same machine: 16 Gbit/s
Virtual port <-> virtual port, different machines: 100-200 Kbit/s (WEIRD!)
I have tried replacing the internal ports with veth pairs, and the same behavior shows up.
As I would expect, a veth pair should behave much like a VM, since both sit in their own namespace and an OpenStack VM connects to br-int the same way (via a veth pair). But the experiment shows that VM (node1) -> virtual port (node2) gets 1 Gbit/s, while virtual port (node1) -> virtual port (node2) gets only ~100 Kbit/s. Does anybody have any idea?
Thanks for your help.
When using GRE (or VXLAN, or another overlay network), you need to make sure that the MTU inside your virtual machines is smaller than the MTU of your physical interfaces. The GRE/VXLAN/etc. header adds bytes to outgoing packets, which means that an MTU-sized packet coming from a virtual machine ends up larger than the MTU of your host interfaces, causing fragmentation and poor performance.
This is documented, for example, here:
Tunneling protocols such as GRE include additional packet headers that
increase overhead and decrease space available for the payload or user
data. Without knowledge of the virtual network infrastructure,
instances attempt to send packets using the default Ethernet maximum
transmission unit (MTU) of 1500 bytes. Internet protocol (IP) networks
contain the path MTU discovery (PMTUD) mechanism to detect end-to-end
MTU and adjust packet size accordingly. However, some operating
systems and networks block or otherwise lack support for PMTUD causing
performance degradation or connectivity failure.
Ideally, you can prevent these problems by enabling jumbo frames on
the physical network that contains your tenant virtual networks. Jumbo
frames support MTUs up to approximately 9000 bytes which negates the
impact of GRE overhead on virtual networks. However, many network
devices lack support for jumbo frames and OpenStack administrators
often lack control over network infrastructure. Given the latter
complications, you can also prevent MTU problems by reducing the
instance MTU to account for GRE overhead. Determining the proper MTU
value often takes experimentation, but 1454 bytes works in most
environments. You can configure the DHCP server that assigns IP
addresses to your instances to also adjust the MTU.
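In your experiment the same fix applies to the namespaced ports, not just to instances. As a quick check (the port names below are hypothetical; substitute the internal or veth ports you actually created), lower the MTU inside the namespaces and rerun iperf. If the cross-node virtual-port throughput recovers, fragmentation of the GRE-encapsulated packets was the problem:

# Hypothetical port names; use the ports you added to ns1/ns2.
ip netns exec ns1 ip link set dev ns1-port mtu 1454
ip netns exec ns2 ip link set dev ns2-port mtu 1454

For instances, the DHCP-based approach mentioned in the quote usually comes down to pointing the DHCP agent at a dnsmasq configuration file that contains dhcp-option-force=26,1454 (DHCP option 26 is the interface MTU); check the documentation for your OpenStack release for the exact setting.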
How is the bandwidth limit of Docker's network interfaces determined? Is it based on the physical network card's bandwidth, and if not, where is the bandwidth taken from?
I have an application deployment where Docker creates multiple NICs. When we send data to this node, it arrives on the physical NIC, which is 1 Gbps; we can see the incoming data on the physical NIC and, as expected, also on the NICs created by Docker. When I want to determine the per-second bandwidth usage for that node, can I assume that the bandwidth used by all the Docker NICs is taken from the physical bandwidth?
For example, in a test run, if the physical NIC's bandwidth usage was 100 Mbps and the total across the 4 Docker NICs was 200 Mbps, could we then say the physical NIC's total bandwidth usage was 400 Mbps?
Docker doesn't handle such a feature. See: https://github.com/moby/moby/issues/9607
Have a look at the cgroup net_cls controller: https://www.kernel.org/doc/Documentation/cgroup-v1/net_cls.txt
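As a rough illustration of what the linked net_cls document describes (the cgroup name, device, and rate below are placeholders, not anything Docker sets up for you), traffic from a group of processes can be tagged with a class ID and then shaped or accounted for with tc on the physical interface:

# Tag traffic from tasks in this cgroup with class 10:1 (0x00100001 = major 10, minor 1).
mkdir /sys/fs/cgroup/net_cls/limited
echo 0x00100001 > /sys/fs/cgroup/net_cls/limited/net_cls.classid
echo $CONTAINER_PID > /sys/fs/cgroup/net_cls/limited/tasks   # PID of the container's process

# Shape (or simply meter) that class on the physical NIC.
tc qdisc add dev eth0 root handle 10: htb
tc class add dev eth0 parent 10: classid 10:1 htb rate 100mbit
tc filter add dev eth0 parent 10: protocol ip prio 10 handle 1: cgroup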
How do we retrieve the set of local IP addresses of a NIC in NDIS 6? I will be doing some IP header modifications on received Ethernet frames, so I need the local IP of the NIC that my filter driver is attached to.
It's generally a layering violation for an NDIS LWF driver (which operates at layer 2 of the OSI stack) to get involved with IP addresses (which are at layer 3 of the OSI stack).
If you have a very good reason to do this, you can query GetUnicastIpAddressTable. Keep in mind that a NIC may not have any IP address (e.g., it's used for non-IP protocols). Or it may carry IP traffic, but the OS doesn't know about any IP address (e.g., a guest VM is sending IP traffic through the host's NIC, but only the guest really knows the IP address).
In other words, NICs don't really have IP addresses. At best, you can say that the NIC may be associated with an IP interface which has some number of IP addresses.
I had a question about applications running within Docker containers and UUID generation.
Here’s our scenario:
Currently our applications use an event-driven framework. For the events, we generate the UUIDs based on MAC address, PID, timestamp, and a counter.
When running containers on a distributed system like CoreOS, there is no guarantee (even if the chance is very, very low) that all of the parameters used to generate a UUID will be unique to each container: one container on one server in the cluster could generate a UUID using the same MAC, PID, timestamp, and counter as another container elsewhere in the cluster.
In essence, if these two containers were each to generate an event with the same UUID and send it to our messaging bus, there would obviously be a conflict.
In our analysis, this scenario seems to boil down to the uniqueness of the MAC addresses of each Docker container.
So to be frank:
How unique are the MAC addresses within containers?
How are MAC addresses generated if they are not set manually?
From my reading of the generateMacAddr function (edit: this answer originally concerned 1.3.0-dev, but it is still correct for 17.05), MAC addresses generated by Docker are derived directly from the IPv4 address of the container's interface on the docker0 bridge: they are guaranteed to be consistent with the IP address.
The docker0 bridge's subnet, usually a /16 such as the 172.17.42.1/16 in this example (netmask 255.255.0.0), has 65,534 usable addresses. This does reduce entropy for UUID generation, but MAC address collisions aren't possible on a single host because IPs there must be unique, so identical MAC, PID, time, and counter in two containers on the same Docker server/CoreOS host should not be possible.
However, two CoreOS hosts (each running one Docker daemon) could potentially choose the same subnet, which makes duplicate MACs possible for containers on different hosts. You can avoid this by setting a distinct fixed CIDR for the Docker daemon on each host:
--fixed-cidr=CIDR — restrict the IP range from the docker0 subnet, using standard CIDR notation like 172.167.1.0/28. This range must be an IPv4 range for fixed IPs (ex: 10.20.0.0/16) and must be a subset of the bridge IP range (docker0 or set using --bridge). For example, with --fixed-cidr=192.168.1.0/25, IPs for your containers will be chosen from the first half of the 192.168.1.0/24 subnet.
This should ensure unique MAC addresses across the cluster.
The original IEEE 802 MAC address comes from the original Xerox Ethernet addressing scheme. This 48-bit address space contains potentially 2^48, or 281,474,976,710,656, possible MAC addresses.
source
If you are concerned about lack of entropy (the IP to MAC mapping reduces it considerably), a better option may be to use a different mechanism for UUID generation. UUID versions 3, 4 and 5 do not take MAC address into account. Alternatively you could include the host machine's MAC in UUID generation.
Of course, whether this "considerable MAC space reduction" will have any impact on UUID generation should probably be tested before any code is changed.
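If you do decide to switch, a random (version 4) UUID avoids the MAC/PID/timestamp question entirely. Here is a minimal sketch in Go (in practice you would likely use an existing UUID library rather than rolling your own):

package main

import (
	"crypto/rand"
	"fmt"
)

// newUUIDv4 returns a random (version 4) UUID; it does not depend on
// MAC address, PID, timestamps or counters at all.
func newUUIDv4() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	id, err := newUUIDv4()
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}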
Source linked to above:
// Generate a IEEE802 compliant MAC address from the given IP address.
//
// The generator is guaranteed to be consistent: the same IP will always yield the same
// MAC address. This is to avoid ARP cache issues.
func generateMacAddr(ip net.IP) net.HardwareAddr {
	hw := make(net.HardwareAddr, 6)
	// The first byte of the MAC address has to comply with these rules:
	// 1. Unicast: Set the least-significant bit to 0.
	// 2. Address is locally administered: Set the second-least-significant bit (U/L) to 1.
	// 3. As "small" as possible: The veth address has to be "smaller" than the bridge address.
	hw[0] = 0x02
	// The first 24 bits of the MAC represent the Organizationally Unique Identifier (OUI).
	// Since this address is locally administered, we can do whatever we want as long as
	// it doesn't conflict with other addresses.
	hw[1] = 0x42
	// Insert the IP address into the last 32 bits of the MAC address.
	// This is a simple way to guarantee the address will be consistent and unique.
	copy(hw[2:], ip.To4())
	return hw
}
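As a quick sanity check of the mapping described above (assuming the function is pasted into a file together with this snippet), a typical docker0 address produces a MAC that simply embeds the IP:

package main

import (
	"fmt"
	"net"
)

func main() {
	// 172.17.0.2 is 0xac 0x11 0x00 0x02, so the generated MAC is 02:42:ac:11:00:02.
	mac := generateMacAddr(net.ParseIP("172.17.0.2"))
	fmt.Println(mac)
}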
I would like to scan a LAN to find the devices connected to it.
I'm developing an iOS app for iPad.
How do I do this?
Because those are mobile devices, I will assume you want to find devices on a wireless network. In theory, since Wi-Fi uses a shared medium for communication, you could passively listen to the traffic flowing through the network and collect data about clients without sending any packets; this is commonly referred to as promiscuous mode. In practice there is a 99% chance that the network adapter driver will only give you traffic destined for your own MAC address. In that case you will need to resort to actively scanning the network subnet, which is not 100% accurate and, depending on how the network is implemented, can be considered a possible attack.
The simple way of scanning is to send an ICMP echo request (ping) to every IP address in the subnet and collect data from those that send back an echo reply. This is not fully reliable, because some hosts won't respond to ICMP echo requests even when they are active. The first thing you need is to find your own IP address and subnet mask and calculate the range of possible addresses in your subnet: ANDing the binary values of your IP address and the subnet mask gives the network address, and the usable host range lies between that and the broadcast address. Here is example output from a calculator for the typical 192.168.1.1 address with a 255.255.255.0 subnet mask (192.168.1.1/24 in CIDR notation):
Address:   192.168.1.1          11000000.10101000.00000001 .00000001
Netmask:   255.255.255.0 = 24   11111111.11111111.11111111 .00000000
Wildcard:  0.0.0.255            00000000.00000000.00000000 .11111111
Network:   192.168.1.0/24       11000000.10101000.00000001 .00000000
Broadcast: 192.168.1.255        11000000.10101000.00000001 .11111111
HostMin:   192.168.1.1          11000000.10101000.00000001 .00000001
HostMax:   192.168.1.254        11000000.10101000.00000001 .11111110
Then you would iterate through the range (HostMin to HostMax) and ping every address. Another thing you can consider is listening for broadcast traffic such as ARP and collecting some information that way. I don't know what you are trying to build, but you can't get much useful information this way beyond the vendor of a host's network adapter.
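To make the address arithmetic concrete, here is a small sketch (written in Go for brevity; the same bitwise operations translate directly to whatever language you use on iOS) that derives the network, broadcast, and host range from an address and netmask:

package main

import (
	"fmt"
	"net"
)

func main() {
	ip := net.ParseIP("192.168.1.1").To4()
	mask := net.IPv4Mask(255, 255, 255, 0)

	network := ip.Mask(mask) // IP AND mask -> 192.168.1.0
	broadcast := make(net.IP, 4)
	for i := range broadcast {
		broadcast[i] = network[i] | ^mask[i] // network OR NOT mask -> 192.168.1.255
	}

	fmt.Println("Network:  ", network)
	fmt.Println("Broadcast:", broadcast)
	// The usable hosts are everything strictly between the two,
	// i.e. 192.168.1.1 through 192.168.1.254 in this example.
}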
Check my LAN Scan on Github. It does exactly what you want.
I recently used MMLANScan, which was pretty good. It discovers IP, hostname, and MAC address.
Bonjour has been around since 2002; have a look at it!
I mean, just look at their current tagline:
Bonjour, also known as zero-configuration networking, enables automatic discovery of devices and services on a local network using industry standard IP protocols. Bonjour makes it easy to discover, publish, and resolve network services with a sophisticated, yet easy-to-use, programming interface that is accessible from Cocoa, Ruby, Python, and other languages.
I am working on a distributed application in which a set of logical nodes communicate with each other.
In the initial discovery phase, each logical node starts up and sends out a UDP broadcast packet to the network to inform the rest of the nodes of its existence.
With different physical hosts, this can easily be handled by agreeing on a port number and keeping track of UDP broadcasts received from other hosts.
My problem is: I need to be able to handle the case of multiple logical nodes on the same machine as well.
So in this case, it seems I cannot bind to the same port twice. How do I handle node discovery if there are two logical nodes on the same box? Thanks a lot in advance!
Your choices are:
Create a raw socket and listen to all packets on a particular NIC; by looking at the content of each packet, the process can identify whether the packet is destined for itself. The problem with this is the sheer volume of packets you would have to process; this is why operating system kernels bind sockets to processes, so that traffic gets distributed efficiently.
Create a specialized service, i.e. a daemon that handles announcements of new processes available to do the work. When launched, each process announces its port number to the service. This is usually how it is done.
Use a virtual IP address for each process you want to run, with each process binding to a different IP address. If you are running on a local network, this is the simplest approach.
Define a range of ports and scan that range on all the IP addresses you have defined (a sketch of this approach follows below).
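Here is a minimal sketch of that last option (the port range 9000-9009 and the "DISCOVER" payload are made up for illustration): each node binds the first free UDP port in an agreed range, and discovery then probes every port in that range on each candidate address.

package main

import (
	"fmt"
	"net"
)

const basePort, numPorts = 9000, 10 // assumed, agreed-upon range

// bindFirstFreePort lets several nodes coexist on one machine by taking
// the first unused port in the agreed range.
func bindFirstFreePort() (*net.UDPConn, error) {
	for p := basePort; p < basePort+numPorts; p++ {
		conn, err := net.ListenUDP("udp", &net.UDPAddr{Port: p})
		if err == nil {
			return conn, nil
		}
	}
	return nil, fmt.Errorf("no free port in range %d-%d", basePort, basePort+numPorts-1)
}

// probe sends a discovery datagram to every port in the range on one host;
// a real node would answer these with its own identity.
func probe(host string) {
	for p := basePort; p < basePort+numPorts; p++ {
		conn, err := net.Dial("udp", fmt.Sprintf("%s:%d", host, p))
		if err != nil {
			continue
		}
		conn.Write([]byte("DISCOVER")) // payload format is up to your protocol
		conn.Close()
	}
}

func main() {
	conn, err := bindFirstFreePort()
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	fmt.Println("listening on", conn.LocalAddr())
	probe("192.168.1.23") // placeholder peer address; repeat for each host you track
}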