How is the bandwidth of Docker's network interfaces determined? Is it based on the physical network card's bandwidth? If not, where does the bandwidth come from?
I have an application deployment where Docker creates multiple NICs. When we send data to this node, it arrives on the physical NIC, which is 1 Gbps; we can see incoming data on the physical NIC and, as expected, on the NICs created by Docker. When I want to determine the node's bandwidth usage per second, can I assume that the bandwidth used by all the Docker NICs is taken from the physical bandwidth?
For example: in a test run, if the physical NIC's bandwidth usage was 100 Mbps and the total across the 4 Docker NICs was 200 Mbps, could we then say the physical NIC's total bandwidth usage was 400 Mbps?
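If it helps to put numbers on this, here is a minimal sketch (in Python; the interface names eth0, docker0 and the veth name are placeholders, not taken from the original setup) that samples the kernel's per-interface receive counters and computes per-second throughput for the physical NIC and the Docker-created NICs separately:

    # Sketch: sample per-interface RX byte counters to compare physical vs. Docker NICs.
    # Interface names below are placeholders; substitute the ones on your node.
    import time

    def rx_bytes(iface):
        with open(f"/sys/class/net/{iface}/statistics/rx_bytes") as f:
            return int(f.read())

    def throughput_mbps(iface, interval=1.0):
        before = rx_bytes(iface)
        time.sleep(interval)
        after = rx_bytes(iface)
        return (after - before) * 8 / interval / 1e6   # bytes over interval -> Mbit/s

    for iface in ("eth0", "docker0", "veth1234"):       # hypothetical names
        print(iface, round(throughput_mbps(iface), 1), "Mbps")

Note that traffic which reaches a container has usually already been counted on the physical NIC it entered through, so the per-interface figures overlap rather than add up.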
Docker doesn't handle this natively. See: https://github.com/moby/moby/issues/9607
Have a look at the cgroup net_cls controller: https://www.kernel.org/doc/Documentation/cgroup-v1/net_cls.txt
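As a rough illustration of how the net_cls controller is meant to be used (a sketch only, assuming cgroup v1 with net_cls mounted at /sys/fs/cgroup/net_cls and a tc class 10:1 configured separately on the egress interface; the group name and PID are hypothetical):

    # Tag a process group's packets with classid 10:1 so tc/iptables can match them.
    import os

    CGROUP = "/sys/fs/cgroup/net_cls/limited"      # hypothetical cgroup name

    os.makedirs(CGROUP, exist_ok=True)

    # classid format is 0xAAAABBBB (major:minor); 0x00100001 corresponds to tc class 10:1
    with open(os.path.join(CGROUP, "net_cls.classid"), "w") as f:
        f.write("0x00100001")

    # Move the container's main process into the cgroup so its traffic gets tagged.
    container_pid = 12345                          # e.g. taken from `docker inspect`
    with open(os.path.join(CGROUP, "cgroup.procs"), "w") as f:
        f.write(str(container_pid))

The kernel document linked above shows the matching tc side: a cgroup filter on the parent qdisc steers the tagged packets into the rate-limited class.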
For example, I have a VM with 4 vCPUs and 8 GB of memory. At first, I ran a single Nginx container on it, used a stress test tool to continuously send requests to it, and collected metrics such as QPS and average latency. Then I ran three identical Nginx containers on the VM and sent the same requests to these containers in parallel. I found that the QPS of each container decreased and the average latency of each increased.
So what factors can affect different containers running on one machine at the same time? I think the CPU and memory are enough to provide resources to these containers. What factors below Docker can affect this? My first thought is the network, but what else? And specifically, why would the network affect the QPS and average latency metrics?
We are currently operating a backend stack in central Europe, Japan and Taiwan, and are preparing our stack for a transition to Docker Swarm.
We work with real-time data streams from sensor networks to issue fast disaster warnings, which means that latency is critical for some services. Therefore, we currently have brokers (rabbitmq) running on dedicated servers in each region, as well as a backend instance digesting the data that is sent across these brokers.
I'm uncertain how to best achieve a comparable topology using Docker Swarm. Is it possible to group nodes, say by country, and then deploy latency-critical service stacks to each of these groups? Should I create a separate swarm for each region (which feels conceptually contradictory to Docker Swarm)?
The swarm managers should be in a low latency zone. Swarm workers can be anywhere. You can use a node label to indicate the location of the node, and restrict your workloads to a particular label as needed.
Latency on the container-to-container network across large regional boundaries may also be relevant, depending on your required data path. If the only latency-critical data path is to the rabbitmq service that is external to the swarm, then you won't need to worry about container-to-container latency.
It is also a valid pattern to have one swarm per region. If you need to be able to lose any region without impacting services in another region, then you'd want to split it up. If you have multiple low-latency regions, then you can spread the manager nodes across those.
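A minimal sketch of the label-plus-constraint approach, using the Python Docker SDK (node name, label value and image are hypothetical; the CLI equivalents are docker node update --label-add and docker service create --constraint):

    # Sketch using the Python Docker SDK (docker-py). Run against a swarm manager.
    import docker

    client = docker.from_env()

    # Label a worker node with its region.
    node = client.nodes.get("worker-jp-1")                # hypothetical node name
    spec = node.attrs["Spec"]
    spec["Labels"] = {**spec.get("Labels", {}), "region": "jp"}
    node.update(spec)

    # Deploy the latency-critical service only onto nodes carrying that label.
    client.services.create(
        image="myorg/ingest:latest",                      # hypothetical image
        name="ingest-jp",
        constraints=["node.labels.region == jp"],
    )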
I have been working on developing an IoT solution with Docker, and I found that Docker does support pass-through for GPUs, disks, USB and serial ports:
Exposing an Nvidia GPU device for computation
Exposing a particular serial device, etc., using the --device flag (see the sketch after this list)
So before implementing a pass-through mechanism for system memory from scratch, I wanted to know whether there is already an existing way to do the following:
Read/Write physical system memory
Expose all IO devices in one shot!
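For reference, the --device pass-through mentioned above looks roughly like this through the Python Docker SDK (device path and image here are hypothetical):

    # Sketch (Python Docker SDK): host device pass-through, the SDK counterpart of
    # `docker run --device`. Device path and image are placeholders.
    import docker

    client = docker.from_env()
    output = client.containers.run(
        "alpine:3.19",
        "ls -l /dev/ttyUSB0",
        devices=["/dev/ttyUSB0:/dev/ttyUSB0:rwm"],   # host_path:container_path:permissions
        remove=True,
    )
    print(output.decode())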
I'm new to Docker, and I want to build a Docker cluster with Docker Swarm.
Regarding the link https://docs.docker.com/swarm/scheduler/strategy/ I have a question:
Suppose I have 2 nodes with 2 GB of RAM each. What if I run a container that asks for 3 GB of RAM? Will it work?
Or is there another method?
Thanks.
If you do not set any user memory constraints at runtime, the process will be able to use as much memory as it wants, eventually swapping to disk like any other process would when there is no free physical memory on the host.
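If you do want to cap a container's memory, a minimal sketch through the Python Docker SDK (the image and limit values are just examples) is:

    # Sketch (Python Docker SDK): the equivalent of `docker run -m 1g`.
    import docker

    client = docker.from_env()
    client.containers.run(
        "nginx:alpine",        # placeholder image
        detach=True,
        mem_limit="1g",        # hard cap on RAM
        memswap_limit="1g",    # RAM + swap; equal to mem_limit disallows extra swap
    )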
I have been doing some experiments on OVS these days. I have 2 physical machines running OpenStack, with a GRE tunnel configured between them. I added 2 internal ports to br-int (the integration bridge) on each machine and assigned them to different namespaces (ns1, ns2, ns3, ns4) with IPs from the same subnet (172.16.0.200, 172.16.0.201, 172.16.0.202, 172.16.0.203). After configuration, VM <-> virtual port (same subnet) and virtual port <-> virtual port on the same/different nodes are all reachable (tested with ping). However, something weird shows up: I used iperf to test the bandwidth, and the results are as follows:
Physical node <-> physical node: 1 GB/s
VM <-> VM on the same machine: 10 GB/s
VM <-> VM on different machines: 1 GB/s
VM <-> virtual port, same machine: 10 GB/s
VM <-> virtual port, different machines: 1 GB/s
Virtual port <-> virtual port, same machine: 16 GB/s
Virtual port <-> virtual port, different machines: 100~200 kb/s (WEIRD!)
I have tried replacing the internal ports with veth pairs; the same behavior shows up.
As I expected, a veth pair should behave similarly to a VM, because both have a separate namespace, and an OpenStack VM connects to br-int in the same way (via veth pairs). But the experiment shows that VM (node1) -> virtual port (node2) gets 1 GB/s of bandwidth, while virtual port (node1) -> virtual port (node2) only gets 100 kb/s. Does anybody have any idea?
Thanks for your help.
When using GRE (or VXLAN, or other overlay network), you need to make sure that the MTU inside your virtual machines is smaller than the MTU of your physical interfaces. The GRE/VXLAN/etc header adds bytes to outgoing packets, which means that an MTU sized packet coming from a virtual machine will end up larger than the MTU of your host interfaces, causing fragmentation and poor performance.
This is documented, for example, here:
Tunneling protocols such as GRE include additional packet headers that increase overhead and decrease space available for the payload or user data. Without knowledge of the virtual network infrastructure, instances attempt to send packets using the default Ethernet maximum transmission unit (MTU) of 1500 bytes. Internet protocol (IP) networks contain the path MTU discovery (PMTUD) mechanism to detect end-to-end MTU and adjust packet size accordingly. However, some operating systems and networks block or otherwise lack support for PMTUD, causing performance degradation or connectivity failure.

Ideally, you can prevent these problems by enabling jumbo frames on the physical network that contains your tenant virtual networks. Jumbo frames support MTUs up to approximately 9000 bytes, which negates the impact of GRE overhead on virtual networks. However, many network devices lack support for jumbo frames and OpenStack administrators often lack control over network infrastructure. Given the latter complications, you can also prevent MTU problems by reducing the instance MTU to account for GRE overhead. Determining the proper MTU value often takes experimentation, but 1454 bytes works in most environments. You can configure the DHCP server that assigns IP addresses to your instances to also adjust the MTU.
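A quick way to sanity-check this on a node (a sketch; the interface names are hypothetical and the overhead figure is simply derived from the 1500 - 1454 recommendation quoted above):

    # Check whether an instance-facing interface's MTU leaves room for tunnel overhead.
    TUNNEL_OVERHEAD = 1500 - 1454    # headroom implied by the guide quoted above

    def mtu(iface):
        with open(f"/sys/class/net/{iface}/mtu") as f:
            return int(f.read())

    physical = mtu("eth0")           # hypothetical underlay interface
    instance = mtu("tap0")           # hypothetical instance-side interface
    if instance + TUNNEL_OVERHEAD > physical:
        print(f"MTU {instance} + {TUNNEL_OVERHEAD} > {physical}: expect fragmentation;"
              " lower the instance MTU (e.g. to 1454) or enable jumbo frames underneath.")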