Continue despite failed packet when using tcpreplay - tcpreplay

I am trying to use tcpreplay to send the contents of a pcap file. It is refusing to send some packets because they are too long. However, instead of continuing with the next packet, it stops:
$ tcpreplay -i p4p1 multi.pcap
Warning: May need to run as root to get access to all network interfaces.
Warning: Unable to send packet: Error with PF_PACKET send() [444]: Message too long (errno = 90)
Actual: 443 packets (63852 bytes) sent in 0.203486 seconds
Rated: 313790.6 Bps, 2.51 Mbps, 2177.05 pps
Flows: 115 flows, 565.14 fps, 405 flow packets, 39 non-flow
Statistics for network device: p4p1
Successful packets: 443
Failed packets: 1
Truncated packets: 0
Retried packets (ENOBUFS): 0
Retried packets (EAGAIN): 0
I would like to skip failed packets and send the rest.

I was having the same problem with several files, especially with streaming. Example:
~# tcpreplay -i eth1 -t -K facebook_audio2b.pcapng
File Cache is enabled
Warning: Unable to send packet: Error with PF_PACKET send() [1611]: Message
too long (errno = 90)
Actual: 1610 packets (382007 bytes) sent in 0.021233 seconds
Rated: 17991192.9 Bps, 143.92 Mbps, 75825.36 pps
Flows: 71 flows, 3343.85 fps, 94008 flow packets, 84 non-flow
Statistics for network device: eth1
Successful packets: 1610
Failed packets: 1
Truncated packets: 0
Retried packets (ENOBUFS): 0
Retried packets (EAGAIN): 0
So I followed the FAQ on the Tcpreplay website (https://tcpreplay.appneta.com/wiki/faq.html#packet-length-8892-is-greater-then-mtu-skipping-packet), which says:
If the packet is larger than the MTU, you can alternatively specify the tcpreplay-edit --mtu-trunc option: packets will be truncated to the MTU size, the checksums will be fixed and then sent. Note that this may impact performance.
It worked for me on the next run:
~# tcpreplay-edit --mtu-trunc -i eth1 -t -K facebook_audio2b.pcapng
File Cache is enabled
Actual: 94092 packets (14586277 bytes) sent in 0.847842 seconds
Rated: 17204003.8 Bps, 137.63 Mbps, 110978.22 pps
Flows: 71 flows, 83.74 fps, 94008 flow packets, 84 non-flow
Statistics for network device: eth1
Successful packets: 94092
Failed packets: 0
Truncated packets: 0
Retried packets (ENOBUFS): 0
Retried packets (EAGAIN): 0

The reason you receive "Message too long" is that the MTU on your network interface is smaller than your packet size.
You can increase the MTU of your interface, and then all the packets in the pcap will be sent.
On Linux, as the root user, you can use the command: ifconfig {interface} mtu {new size}
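For example, a minimal sketch assuming the interface is eth1 and the largest packet in the capture is under 9000 bytes (the names and values are placeholders; adjust them to your setup, and note that the NIC and driver must support the larger MTU):
ifconfig eth1 mtu 9000          # legacy net-tools syntax
ip link set dev eth1 mtu 9000   # iproute2 equivalent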

Related

Nodes cannot pull image from docker registry

I followed the hobby-kube/guide on how to set up your own cluster, but I got stuck. I created an issue on the repo as well, but maybe here more people can help me with it.
I am trying to set up my cluster on Scaleway. I followed the instructions one by one, and I am at the point where I have installed Weave as the CNI, and I've got:
kube-system weave-net-dtwbj 2/2 Running 1 9d
kube-system weave-net-kmxq7 0/2 Init:ImagePullBackOff 0 9d
kube-system weave-net-pzfcj 0/2 Init:ImagePullBackOff 0 9d
So the issue is on my nodes but not on the master.
I found suggestions in one of the issues and applied them, but the output is the same.
UFW / Firewall
I skipped the firewall part; on every VPS I've got:
> ufw status
Status: inactive
In the Scaleway config, all my VPSes have the same security policy applied. Only outbound traffic on ports [25, 465, 587] is dropped.
Internet connection
On both of my nodes I have issues downloading images from Docker's registry, and I believe this is the real issue here:
> docker pull hello-world
Using default tag: latest
Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
On the master, hello-world was pulled successfully.
The nodes do have an internet connection to the outside:
--- google.com ping statistics ---
9 packets transmitted, 9 received, 0% packet loss, time 8011ms
rtt min/avg/max/mdev = 1.008/1.139/1.258/0.073 ms
WireGuard
From the output of wg show I assume that the VPN between my VPSes is set up correctly:
peer: 3
endpoint: 3-priv IP:51820
allowed ips: 10.0.1.3/32
latest handshake: 1 minute, 17 seconds ago
transfer: 7.50 GiB received, 6.50 GiB sent
peer: 2
endpoint: 2-priv IP:51820
allowed ips: 10.0.1.2/32
latest handshake: 1 minute, 41 seconds ago
transfer: 4.96 GiB received, 6.11 GiB sent
Could anybody help me track the issue down and fix it? I can provide any logs you wish; just tell me how to get them.

Getting a high amount of unreachable hosts and Network errors in local Network

I have a medium-sized Zabbix setup: one central Zabbix server and multiple Zabbix proxies, one at each site I'm monitoring. All of them are set up with the official Docker containers. The main server runs:
postgres:11-alpine
zabbix/zabbix-web-nginx-pgsql:alpine-4.0-latest
zabbix/zabbix-snmptraps:alpine-4.0-latest
zabbix/zabbix-server-pgsql:alpine-4.0-latest
The proxies are each just a single Docker image:
zabbix/zabbix-proxy-sqlite3:ubuntu-4.0-latest
The proxies mostly monitor other VMs in the same VMware vCenter.
The problem is that in the proxies' logs I see a very high number of network errors that all look somewhat like this:
Zabbix agent item "some.item" on host "SOME HOST" failed: first network error, wait for 15 seconds
From that arises a high number of false-positive problems in Zabbix, mostly "Zabbix agent on SOME HOST is unreachable for 5 minutes", but sometimes also other problems triggered by .nodata().
There is also a high amount of missing item data, since hosts with network errors are considered "offline" for a while and none of their items are checked.
I've also tried to investigate it a bit and found the source code that produces this error: https://github.com/zabbix/zabbix/blob/135111a0fd1f16f203226f8632881ac0a8bf541a/src/zabbix_server/poller/poller.c#L302
Unfortunately, the same message seems to be triggered in 3 different failure cases: https://github.com/zabbix/zabbix/blob/135111a0fd1f16f203226f8632881ac0a8bf541a/src/zabbix_server/poller/poller.c#L749
Therefore I couldn't really find out anything that way. I also, of course, looked at CPU, RAM, disk and network usage on the proxies and couldn't find anything that looked out of the norm to me.
How should I proceed to find out the cause of these errors? Has anyone else had this happen to them?
Check the network stats for errors in the RX-ERR and TX-ERR columns on some of your hosts and proxy servers:
$ ifconfig -s
Iface MTU RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
docker0 1500 953410 0 0 0 1691396 0 0 0 BMRU
enp0s31f 1500 300757888 4 0 0 192733308 0 0 0 BMRU
lo 65536 23324198 0 0 0 23324198 0 0 0 LRU
tap0 1500 1317609 0 0 0 4530946 0 9 0 BMPU
tun0 1500 0 0 0 0 589 0 0 0 MOPRU
vboxnet0 1500 0 0 0 0 2324 0 0 0 BMU
veth20f8 1500 11690 0 0 0 538547 0 0 0 BMRU
veth2238 1500 0 0 0 0 76 0 0 0 BMRU
virbr0 1500 1317609 0 0 0 4519309 0 0 0 BMU
wlp2s0 1500 5584389 0 0 0 4387278 0 0 0 BMU
I did a lot more digging. I also posted this question, along with my discoveries, on the Zabbix forum: https://www.zabbix.com/forum/zabbix-troubleshooting-and-problems/393381-getting-a-high-amount-of-unreachable-hosts-and-network-errors-in-local-network
I solved the problem for myself, although one error just vanished after I didn't touch the system for two weeks; I'm not sure what exactly happened.
The other problem I encountered was because I am kinda new to Linux in some ways and didn't quite grasp how systemd works.
Systemd looks for a PID file; in the case of Zabbix it looks for it in /run/zabbix/zabbix_agentd.pid, and I had not told Zabbix where to write that file. In the end the fix was to insert PidFile=/run/zabbix/zabbix_agentd.pid into /etc/zabbix/zabbix_agentd.conf.
Before that, the Zabbix agent started and was happy, but it didn't tell systemd about it, so after the timeout systemd allows for daemon startup it just restarted the Zabbix agent. If the proxy tried to connect to the agent while it was not running... it produced network errors.
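A minimal sketch of that fix, assuming the stock zabbix-agent package and a systemd unit named zabbix-agent.service (adjust the paths and unit name to your install):
# point the agent at the PID file location systemd expects
echo 'PidFile=/run/zabbix/zabbix_agentd.pid' >> /etc/zabbix/zabbix_agentd.conf
systemctl restart zabbix-agent
systemctl status zabbix-agent   # should now stay "active (running)" instead of being restarted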

vdev_probe() fails when starting `pktgen` app in a docker container

I am trying to start a Pktgen app in a docker container using Open vSwitch virtual ports. The structure of my intended system is as follows:
(1) Build dpdk-stable-17.11.4 and ovs-2.9.0, and create 4 virtual ports as follows:
Bridge "ovs-br0"
Port "vhost-user2"
Interface "vhost-user2"
type: dpdkvhostuser
Port "ovs-br0"
Interface "ovs-br0"
type: internal
Port "vhost-user3"
Interface "vhost-user3"
type: dpdkvhostuser
Port "vhost-user0"
Interface "vhost-user0"
type: dpdkvhostuser
Port "vhost-user1"
Interface "vhost-user1"
type: dpdkvhostuser
When I check the log from creating the virtual ports, I find an error:
2019-01-27T15:59:23.346Z|00041|bridge|INFO|bridge ovs-br0: added interface ovs-br0 on port 65534
2019-01-27T15:59:23.375Z|00042|bridge|INFO|bridge ovs-br0: using datapath ID 000066b38a29f447
2019-01-27T15:59:23.375Z|00043|connmgr|INFO|ovs-br0: added service controller "punix:/usr/local/var/run/openvswitch/ovs-br0.mgmt"
2019-01-27T15:59:23.411Z|00044|dpdk|INFO|VHOST_CONFIG: vhost-user server: socket created, fd: 44
2019-01-27T15:59:23.411Z|00045|netdev_dpdk|INFO|Socket /usr/local/var/run/openvswitch/vhost-user0 created for vhost-user port vhost-user0
2019-01-27T15:59:23.411Z|00046|dpdk|INFO|VHOST_CONFIG: bind to /usr/local/var/run/openvswitch/vhost-user0
2019-01-27T15:59:23.411Z|00047|netdev_dpdk|WARN|dpdkvhostuser ports are considered deprecated; please migrate to dpdkvhostuserclient ports.
2019-01-27T15:59:23.411Z|00048|dpif_netdev|INFO|PMD thread on numa_id: 0, core id: 2 created.
2019-01-27T15:59:23.411Z|00049|dpif_netdev|INFO|There are 1 pmd threads on numa node 0
2019-01-27T15:59:23.412Z|00050|dpdk|ERR|RING: Cannot reserve memory for tailq
2019-01-27T15:59:23.424Z|00051|dpdk|ERR|RING: Cannot reserve memory for tailq
2019-01-27T15:59:23.434Z|00052|dpdk|ERR|RING: Cannot reserve memory for tailq
2019-01-27T15:59:23.443Z|00053|dpdk|ERR|RING: Cannot reserve memory for tailq
2019-01-27T15:59:23.458Z|00054|dpif_netdev|INFO|Core 2 on numa node 0 assigned port 'vhost-user0' rx queue 0 (measured processing cycles 0).
2019-01-27T15:59:23.458Z|00055|bridge|INFO|bridge ovs-br0: added interface vhost-user0 on port 1
2019-01-27T15:59:23.478Z|00056|dpdk|INFO|VHOST_CONFIG: vhost-user server: socket created, fd: 49
2019-01-27T15:59:23.478Z|00057|netdev_dpdk|INFO|Socket /usr/local/var/run/openvswitch/vhost-user1 created for vhost-user port vhost-user1
2019-01-27T15:59:23.478Z|00058|dpdk|INFO|VHOST_CONFIG: bind to /usr/local/var/run/openvswitch/vhost-user1
2019-01-27T15:59:23.478Z|00059|dpif_netdev|INFO|Core 2 on numa node 0 assigned port 'vhost-user0' rx queue 0 (measured processing cycles 0).
2019-01-27T15:59:23.479Z|00060|dpif_netdev|INFO|Core 2 on numa node 0 assigned port 'vhost-user1' rx queue 0 (measured processing cycles 0).
It says RING: Cannot reserve memory for tailq. I have no idea what this error means. I have already allocated some hugepages for DPDK.
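For reference, a quick diagnostic sketch to sanity-check the hugepage setup on the host (it only verifies the allocation and mount, it is not a fix for the tailq error):
grep -i huge /proc/meminfo   # HugePages_Total / HugePages_Free
mount | grep -i huge         # confirm the hugetlbfs mount point, e.g. /mnt/huge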
(2) Build pktgen-3.5.0
(3) Write a Dockerfile and start the container. The Dockerfile:
FROM ubuntu:latest
RUN apt update -y
RUN apt-get install -y numactl libpcap-dev
WORKDIR /root/dpdk
COPY dpdk-stable-17.11.4 /root/dpdk/.
COPY pktgen-3.5.0 /root/pktgen/.
RUN ln -s /root/pktgen/app/x86_64-native-linuxapp-gcc/pktgen /usr/bin/pktgen
ENV PATH "$PATH:/root/dpdk/x86_64-native-linuxapp-gcc/app/"
Build the image:
sudo docker build -t pktgen-docker .
Start the container:
sudo docker run -ti --privileged --name=pktgen-docker \
-v /mnt/huge:/mnt/huge -v /usr/local/var/run/openvswitch:/var/run/openswitch \
pktgen-docker:latest
(4) Start the pktgen app.
app/x86_64-native-linuxapp-gcc/pktgen -l 0-1 -n 3 --file-prefix=pktgen-docker --no-pci --log-level=8\
--vdev 'net_virtio_user0,mac=00:00:00:00:00:05,path=/var/run/openvswitch/vhost-user0' \
--vdev 'net_virtio_user1,mac=00:00:00:00:00:01,path=/var/run/openvswitch/vhost-user1' \
-- -T -P -m '1.[0-1]'
I simply use 2 lcores; lcore 1 handles rx/tx for the 2 virtual ports. I set up 2 --vdev devices using the OVS-DPDK ports, i.e., vhost-user0 and vhost-user1.
However, this error comes up:
Copyright (c) <2010-2017>, Intel Corporation. All rights reserved. Powered by DPDK
sh: 1: lspci: not found
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
EAL: Detected lcore 2 as core 2 on socket 0
EAL: Detected lcore 3 as core 3 on socket 0
EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 4 lcore(s)
EAL: Probing VFIO support...
EAL: Module /sys/module/vfio not found! error 2 (No such file or directory)
EAL: VFIO modules not loaded, skipping VFIO support...
EAL: Module /sys/module/vfio not found! error 2 (No such file or directory)
EAL: Setting up physically contiguous memory...
EAL: Trying to obtain current memory policy.
EAL: Hugepage /mnt/huge/pktgen-dockermap_1 is on socket 0
EAL: Hugepage /mnt/huge/pktgen-dockermap_2 is on socket 0
EAL: Hugepage /mnt/huge/pktgen-dockermap_3 is on socket 0
EAL: Hugepage /mnt/huge/pktgen-dockermap_4 is on socket 0
EAL: Hugepage /mnt/huge/pktgen-dockermap_5 is on socket 0
EAL: Hugepage /mnt/huge/pktgen-dockermap_6 is on socket 0
...
EAL: Hugepage /mnt/huge/pktgen-dockermap_167 is on socket 0
EAL: Hugepage /mnt/huge/pktgen-dockermap_0 is on socket 0
EAL: Ask a virtual area of 0x15000000 bytes
EAL: Virtual area found at 0x100000000 (size = 0x15000000)
EAL: Requesting 168 pages of size 2MB from socket 0
EAL: TSC frequency is ~2594018 KHz
EAL: Master lcore 0 is ready (tid=8f222900;cpuset=[0])
EAL: lcore 1 is ready (tid=8d5fd700;cpuset=[1])
vdev_probe(): failed to initialize net_virtio_user0 device
vdev_probe(): failed to initialize net_virtio_user1 device
EAL: Bus (vdev) probe failed.
Lua 5.3.4 Copyright (C) 1994-2017 Lua.org, PUC-Rio
Copyright (c) <2010-2017>, Intel Corporation. All rights reserved.
Pktgen created by: Keith Wiles -- >>> Powered by DPDK <<<
>>> Packet Burst 64, RX Desc 1024, TX Desc 2048, mbufs/port 16384, mbuf cache 2048
!PANIC!: *** Did not find any ports to use ***
PANIC in pktgen_config_ports():
*** Did not find any ports to use ***6: [app/x86_64-native-linuxapp-gcc/pktgen(_start+0x2a) [0x5592bd24d66a]]
5: [/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f5b8de20b97]]
4: [app/x86_64-native-linuxapp-gcc/pktgen(main+0xe67) [0x5592bd24a1e7]]
3: [app/x86_64-native-linuxapp-gcc/pktgen(pktgen_config_ports+0x170b) [0x5592bd277a4b]]
2: [app/x86_64-native-linuxapp-gcc/pktgen(__rte_panic+0xc5) [0x5592bd244bb7]]
1: [app/x86_64-native-linuxapp-gcc/pktgen(rte_dump_stack+0x2e) [0x5592bd2dcb3e]]
Aborted (core dumped)
I use IGB_UIO, not VFIO.
I don't set --socket-mem, and the master lcore 0 and lcore 1 are both ready.
The --vdev devices fail to initialize. I have checked /var/run/openvswitch; vhost-user0 and vhost-user1 exist.
Thank you for sharing your ideas. Best wishes.
I think /var/run/openswitch should be /var/run/openvswitch.
sudo docker run -ti --privileged --name=pktgen-docker -v /mnt/huge:/mnt/huge -v /usr/local/var/run/openvswitch:/var/run/openswitch pktgen-docker:latest
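In other words, the target of the second volume mount needs to match the path pktgen uses, roughly like this:
sudo docker run -ti --privileged --name=pktgen-docker \
    -v /mnt/huge:/mnt/huge \
    -v /usr/local/var/run/openvswitch:/var/run/openvswitch \
    pktgen-docker:latest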
OVS creates the vhost-user socket in that directory because it is running in server mode. Pktgen is also started in virtio-user mode. This conflicts, as both are running in server mode.
The correct way is to start pktgen in client mode using virtio_userx. If the goal is to have packet transfer between Pktgen in Docker-1 and Pktgen in Docker-2 connected by OVS-DPDK, one has to use a net_vhost and virtio_user pair.
Docker-1 Pktgen (net_vhost) <==> OVS-DPDK port-1 (virtio_user) {Rule to forward} OVS-DPDK port-2 (virtio_user) <==> Docker-2 Pktgen (net_vhost)
In the current setup, you will have to make the following changes (see the sketch after the list):
start DPDK pktgen in Docker-1 by changing from --vdev 'net_virtio_user0,mac=00:00:00:00:00:05,path=/var/run/openvswitch/vhost-user0' to --vdev 'net_vhost0,iface=/var/run/openvswitch/vhost-user0'
start DPDK testpmd by changing from --vdev
start DPDK pktgen in Docker-2 by changing from --vdev 'net_virtio_user0,mac=00:00:00:00:00:05,path=/var/run/openvswitch/vhost-user0' to --vdev 'net_vhost1,iface=/var/run/openvswitch/vhost-user1'
then start DPDK-OVS with --vdev=virtio_user0,path=/var/run/openvswitch/vhost-user0 and --vdev=virtio_user1,path=/var/run/openvswitch/vhost-user1
add rules to allow the port to port forwarding between Docker-1 pktgen and Docker-2 pktgen
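A minimal sketch of what the adjusted Docker-1 command line might look like, reusing the socket path from above (the core, channel and --file-prefix values are assumptions carried over from the original command; adjust them to your setup):
pktgen -l 0-1 -n 3 --no-pci --file-prefix=pktgen-docker-1 \
    --vdev 'net_vhost0,iface=/var/run/openvswitch/vhost-user0' \
    -- -T -P -m '1.0'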
Note:
please update the command lines if you use multiple ports.
As explained in the comments, this is more of a usage problem.

Can't parse tcpdump output correctly

I'm trying to run this command: sudo tcpdump "ether proto 0x888e and ether host <BSSID>" -I -U -vvv -i <INTERFACENAME> -w ~/Desktop/handshake.cap, which works perfectly in terms of its function. However, when I run the command, I get the following output:
tcpdump: listening on en0, link-type IEEE802_11_RADIO (802.11 plus radiotap header), capture size 262144 bytes
Got 0
where Got 0 counts the number of packets captured. Furthermore, when stopping the command, I get the following:
tcpdump: listening on en0, link-type IEEE802_11_RADIO (802.11 plus radiotap header), capture size 262144 bytes
^C0 packets captured
3526 packets received by filter
0 packets dropped by kernel
I'm trying to integrate this command into a script and would simply like everything but Got 0 to be omitted from the output.
I have experienced this sort of problem before but have simply used 2> /dev/null to get rid of the output I don't want. However, it seems that Got 0 is included in this blocked output, and as a result I get no output at all. Similarly, &>/dev/null removes all output as well. I have also tried piping it to sed -n '1!p' to ignore the first line, but this has no effect and would not be preferable anyway because in theory it would not remove 0 packets captured
3526 packets received by filter
0 packets dropped by kernel
Is anyone aware of how to resolve this issue?
Thank you in advance for any help,
Kind regards, Rocco
P.S. I am running macOS
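For what it's worth, a rough sketch of one possible approach, assuming the Got N counter and the closing summary are both written to stderr (which the 2>/dev/null behaviour described above suggests): merge stderr into stdout and keep only the counter lines. The counter is refreshed with carriage returns, hence the tr; pipe buffering may delay the output slightly:
sudo tcpdump "ether proto 0x888e and ether host <BSSID>" -I -U -vvv \
    -i <INTERFACENAME> -w ~/Desktop/handshake.cap 2>&1 \
    | tr '\r' '\n' | grep --line-buffered '^Got'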

Kubernetes node ulimit settings

I am running a Kubernetes v1.11.1 cluster, and at some point my kube-apiserver started throwing 'too many open files' messages. I noticed too many open TCP connections to the node kubelet port 10250.
My server is configured with 65536 file descriptors. Do I need to increase the number of open files for the container host? What are the recommended ulimit settings for the container host?
API server log messages:
I1102 13:57:08.135049 1 logs.go:49] http: Accept error: accept tcp [::]:6443: accept4: too many open files; retrying in 1s
I1102 13:57:09.135191 1 logs.go:49] http: Accept error: accept tcp [::]:6443: accept4: too many open files; retrying in 1s
I1102 13:57:10.135437 1 logs.go:49] http: Accept error: accept tcp [::]:6443: accept4: too many open files; retrying in 1s
I1102 13:57:11.135589 1 logs.go:49] http: Accept error: accept tcp [::]:6443: accept4: too many open files; retrying in 1s
I1102 13:57:12.135755 1 logs.go:49] http: Accept error: accept tcp [::]:6443: accept4: too many open files; retrying in 1s
my host ulimit values:
# ulimit -a
-f: file size (blocks) unlimited
-t: cpu time (seconds) unlimited
-d: data seg size (kb) unlimited
-s: stack size (kb) 8192
-c: core file size (blocks) unlimited
-m: resident set size (kb) unlimited
-l: locked memory (kb) 64
-p: processes unlimited
-n: file descriptors 65536
-v: address space (kb) unlimited
-w: locks unlimited
-e: scheduling priority 0
-r: real-time priority 0
Thanks
SR
65536 seems a bit low, although there are many apps that recommend that number. This is what I have on one K8s cluster for the kube-apiserver:
# kubeapi-server-container
# |
# \|/
# ulimit -a
-f: file size (blocks) unlimited
-t: cpu time (seconds) unlimited
-d: data seg size (kb) unlimited
-s: stack size (kb) 8192
-c: core file size (blocks) unlimited
-m: resident set size (kb) unlimited
-l: locked memory (kb) 16384
-p: processes unlimited
-n: file descriptors 1048576 <====
-v: address space (kb) unlimited
-w: locks unlimited
-e: scheduling priority 0
-r: real-time priority 0
This is different from the system limits of a regular bash process:
# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 15447
max locked memory (kbytes, -l) 16384
max memory size (kbytes, -m) unlimited
open files (-n) 1024 <===
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 15447
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Yet the total max for the whole system is:
$ cat /proc/sys/fs/file-max
394306
As you can see, nothing can exceed /proc/sys/fs/file-max on the system, so I would also check that value. I would also check the number of file descriptors being used (the first column of /proc/sys/fs/file-nr); this will give you an idea of how many open files you have:
$ cat /proc/sys/fs/file-nr
2176 0 394306
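If you do decide to raise the limits, a minimal sketch (the 1048576 value, the kubelet.service unit name and the drop-in path are assumptions; adjust them to your distro and to how your components are run):
# raise the system-wide cap (persist it with a file under /etc/sysctl.d/)
sysctl -w fs.file-max=1048576
# raise the per-process limit for the kubelet via a systemd drop-in
mkdir -p /etc/systemd/system/kubelet.service.d
cat <<'EOF' > /etc/systemd/system/kubelet.service.d/99-nofile.conf
[Service]
LimitNOFILE=1048576
EOF
systemctl daemon-reload
systemctl restart kubelet
For containerized control-plane components, the container runtime's ulimit settings (e.g. Docker's --ulimit nofile=... or default-ulimits in daemon.json) determine what the processes inside actually get.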
