I ran tcpdump and captured packets; the configured MTU is 2140. I am analysing the pcap files using Wireshark.
According to the configured MTU, the maximum packet size should be 2154 bytes (2140 bytes + 14 bytes of Ethernet header). But I see packets larger than 2154 bytes (e.g. 9010 bytes). On analysis, I found that these packets are generated on the machine where I ran tcpdump (let's say A) and are destined for another machine (let's say B). I expect a packet to be fragmented before it is sent to another host. I found some explanations online that say tcpdump captures packets before the NIC breaks them down. That seems valid for outgoing traffic, but it does not explain my case, because on machine A I also received packets larger than 2154 bytes from B. Any thoughts on why machine A is sending and receiving packets larger than the configured MTU?
What you are seeing is most likely the result of TCP segmentation and reassembly offloading (commonly called TSO on the transmit side and LRO on the receive side). This is a feature available on some network cards with matching drivers.
The idea is that the reassembly of many TCP segments is handled in the NIC itself. This is effective at reducing overhead on the CPU/OS side, because the network driver handles perhaps 1 "packet" out of 10: it sees just one large packet rather than receiving and reassembling all 10.
You can read more about it here.
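As a rough way to spot such aggregated packets in a capture, the arithmetic from the question (MTU + 14 bytes of Ethernet header) can be sketched as below. The function names and the sample frame lengths are made up for illustration; the lengths would come from your own pcap file.

```python
ETH_HEADER_LEN = 14  # Ethernet II header, no VLAN tag, FCS not captured


def max_wire_frame(mtu: int) -> int:
    """Largest frame a capture should show if no offloading is involved."""
    return mtu + ETH_HEADER_LEN


def oversized(frame_lengths, mtu):
    """Return the captured lengths that exceed MTU + Ethernet header.

    Such frames are candidates for offload aggregation (assumption: they
    were captured on an end host, above the NIC).
    """
    limit = max_wire_frame(mtu)
    return [n for n in frame_lengths if n > limit]


if __name__ == "__main__":
    captured = [66, 2154, 9010, 1514]  # hypothetical frame lengths
    print(max_wire_frame(2140))
    print(oversized(captured, 2140))
```

On Linux, `ethtool -k <interface>` lists the offload features currently enabled, which is one way to check whether this explanation applies to your machine.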
Updated answer
If your packet is UDP
This behaviour is normal, but there is not much you can do to see the individual packets on the end machines. The UDP packet is broken down into MTU-compliant packets and reassembled at the Link layer, usually by specific hardware. This is too low to be captured by Wireshark/pcap.
If you want to capture the individual broken-down packets, you have to do this on an intermediate machine/network card, for example a gateway between the two hosts, because the original UDP packet is not reassembled until it reaches its final destination. Note: this gateway can be virtual.
See notes.shichao.io/tcpv1/ch10
Leaving this here in case someone with the same problem comes...
If your packet is TCP
It sounds like Wireshark is reassembling packets for you. This is often the default for TCP streams. You can change this by right-clicking on a stream -> Protocol Preferences -> Allow subdissectors to reassemble TCP.
Related
I have multiple readers on a single system which bind to a single address (IP:port, e.g. 239.0.0.1:1234). Another computer in the group sends a UDP multicast packet to this group and the readers should receive it. I used the GLib 2.0 networking stack, calling g_socket_bind with allow_reuse set to true.
When there is a single reader (a single socket bound to that address) or up to three readers, everything is OK and the readers receive packets correctly. But when the number of readers increases to four or more, packet loss occurs and increases linearly with the number of readers on the system.
If socket is a UDP socket, then allow_reuse determines whether or not other UDP sockets can be bound to the same address at the same time. In particular, you can have several UDP sockets bound to the same address, and they will all receive all of the multicast and broadcast packets sent to that address.
As stated in the GIO Reference Manual, when allow_reuse is set to true, all readers should read all of the data, but that does not happen, as described above.
Does anybody know what the problem is? Is it a kernel-related problem?
All your sockets need to join the multicast group. If you're just relying on the bind to effect that, you are into undefined behaviour.
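A minimal sketch of what "joining the group" means at the socket level, written in Python rather than GLib for brevity. The helper names `build_mreq` and `open_reader` are hypothetical; the socket options are the standard BSD ones. In GLib the corresponding call should be `g_socket_join_multicast_group`, made on every reader socket after binding.

```python
import socket
import struct


def build_mreq(group: str, iface: str = "0.0.0.0") -> bytes:
    """Pack an ip_mreq structure: group address, then local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))


def open_reader(group: str, port: int) -> socket.socket:
    """Open one reader socket: allow address reuse, bind, then join the group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Let several readers on the same host bind to the same port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Explicit membership: each socket joins, rather than relying on bind().
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, build_mreq(group))
    return sock


if __name__ == "__main__":
    # Address taken from the question; each reader process would do this.
    reader = open_reader("239.0.0.1", 1234)
    data, sender = reader.recvfrom(2048)
    print(len(data), sender)
```

Every reader calls `open_reader` itself, so group membership is held per socket rather than being a side effect of the first bind.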
Background: Coding multiplayer for a simulator (Windows, .net), using peer-to-peer UDP transmission. This Q is not about advantages of UDP vs. TCP nor about packet headers. A related discussion to this Q is here.
Consider: I send a UDP packet with payload size X, where X can be anything between 1 and 500 bytes.
Q: Can slack bytes, i.e. bytes in addition to the needed headers and payload, be temporarily added to the packet at any point during transmission? For example, could any participant in the transmission (Windows OS - NAT - internet - NAT - Windows OS) add bytes to fill a certain block size, so that these added bytes become part of the transmission (even if cut off later on) and are actually transmitted, thus consuming processor (switch, server CPU) cycles?
(The reason for asking is, of course, to decide how much effort to spend on composing/decomposing the packet :-). Squeeze it to the last bit (smaller packet, more local CPU cycles) vs. allow the packet to be partially self-describing (bigger packet, fewer local CPU cycles). Note that the packet size is always less than the nearest MTU that I know of, normally close to 1500 bytes.)
Thx!
The short answer is: Yes.
Take Ethernet as an example. For collision detection purposes, an Ethernet frame has a minimum payload size of 46 bytes (42 bytes when an 802.1Q VLAN tag is present). If the payload (which here includes the application data plus the UDP and IP headers) is smaller than that, padding is added to the Ethernet frame.
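To get a feel for the numbers, here is a small sketch that computes how many padding bytes an untagged Ethernet frame would need for a given UDP payload. It assumes the standard untagged minimum payload of 46 bytes (a 64-byte minimum frame minus the 14-byte header and 4-byte FCS), an IPv4 header without options, and a hypothetical helper name `ethernet_padding`.

```python
IPV4_HEADER = 20       # IPv4 header without options
UDP_HEADER = 8
ETH_MIN_PAYLOAD = 46   # untagged Ethernet: 64-byte frame - 14 header - 4 FCS


def ethernet_padding(udp_payload_len: int) -> int:
    """Padding bytes added so the Ethernet payload reaches the minimum size."""
    ip_packet_len = IPV4_HEADER + UDP_HEADER + udp_payload_len
    return max(0, ETH_MIN_PAYLOAD - ip_packet_len)


if __name__ == "__main__":
    for size in (1, 17, 18, 500):
        print(size, ethernet_padding(size))
```

Under these assumptions only UDP payloads shorter than 18 bytes are padded at all, so for payloads in the 1 to 500 byte range the padding is at most 17 bytes.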
Also, as far as I know, the network card will do this job, not its driver or the OS.
If you want to decide whether it is better to send small packets or wait and send bigger ones, take a look at Nagle's algorithm.
Here you can see the Ethernet padding in practice:
What are the 0 bytes at the end of an Ethernet frame in Wireshark?
I have a small point of confusion.
When a buffer is full, how does the respective OSI layer start removing packets? Does it discriminate between broadcasts and unicasts?
(For specifics if required, 802.11g and 802.15.4)
I remember reading in some paper that it starts by discarding unicast packets first, but I can't find a credible source on the subject.
Thank you for your time.
Best regards,
Rehan
Context
I am trying to highlight the inherent differences between how broadcast packets and unicast packets are handled. Namely:
1) Unlike unicast packets, broadcast packets don't require RTS/CTS
2) Unlike unicast packets, broadcast packets don't require a destination address
3) In the event of a full buffer, ...?
I want to send a broadcast message to all peers in my local network. The message is a 32-bit integer. I can be sure that the message will not be fragmented, right? There will be two options:
- the peer will receive the whole message at once
- the peer will not receive the message at all
Going further, is 4 bytes the maximum size of data that can be sent in one UDP datagram? I use an Ethernet-based network in 99% of cases.
IPv4 requires every host to accept datagrams of at least 576 bytes, including the IP header. Your 4-byte UDP payload results in an IP packet of only 32 bytes (20-byte IP header + 8-byte UDP header + 4 bytes of data), far smaller than this, so you need not fear fragmentation.
Furthermore, your desired outcome - "peer will receive whole message at once or peer will not receive message at all" - is always how UDP works, even in the presence of fragmentation. If a fragment doesn't arrive, your app won't receive the packet at all.
The rules for UDP are: "The packet may arrive out of order, duplicated, or not at all. If the packet does arrive, it will be the whole packet and error-free." ("Error-free" is obviously only true within the modest limits of the UDP checksum.)
Ethernet packets can be up to around 1500 bytes (and that's not counting jumbo frames). If you send broadcast messages with a payload of only 4 bytes, they shouldn't get fragmented at all. Fragmentation should only occur when the packet is larger than the Maximum Transmission Unit (so about 1500 bytes over Ethernet).
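As an illustration of the sizes involved, the sketch below packs a 32-bit integer into a 4-byte payload and checks whether a payload of a given length would exceed the MTU. It assumes a 1500-byte Ethernet MTU and an IPv4 header without options; the function names are hypothetical.

```python
import struct

IPV4_HEADER = 20  # IPv4 header without options
UDP_HEADER = 8


def encode_message(value: int) -> bytes:
    """Pack an unsigned 32-bit integer in network byte order (4 bytes)."""
    return struct.pack("!I", value)


def decode_message(data: bytes) -> int:
    """Unpack the 4-byte payload back into an integer."""
    (value,) = struct.unpack("!I", data)
    return value


def needs_fragmentation(payload_len: int, mtu: int = 1500) -> bool:
    """True if the IP packet carrying this UDP payload would exceed the MTU."""
    return IPV4_HEADER + UDP_HEADER + payload_len > mtu


if __name__ == "__main__":
    payload = encode_message(42)
    print(len(payload), decode_message(payload))
    print(needs_fragmentation(len(payload)))
    print(needs_fragmentation(1472), needs_fragmentation(1473))
```

Under these assumptions the largest UDP payload that avoids fragmentation is 1500 - 20 - 8 = 1472 bytes, so 4 bytes is nowhere near the limit.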
I have created a raw socket which takes all IPv4 packets from data link layer (with data link layer header removed). And for reading the packets I use recvfrom.
My doubt is:
Suppose that, due to OS scheduling, my process was asleep for 1 second. When it woke up, it called recvfrom on this raw socket with a buffer of, say, 1000 bytes, intending to receive only one IPv4 packet of, say, 380 bytes. Suppose many network applications were running during this time, so several IPv4 packets were queued in the socket's receive buffer. Will recvfrom now return all 1000 bytes (with other IPv4 packets from the 381st byte onwards) because it has enough data in its buffer, even though my program is meant to handle only one IPv4 packet at a time?
So how can I prevent this? Should I read and parse byte by byte? That would be very inefficient.
IIRC, recvfrom() will only return one packet at a time, even if there are more in the queue.
Raw sockets operate at the packet layer; there is no concept of data streams.
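The message-boundary behaviour can be observed without root privileges using any datagram socket. The sketch below uses an AF_UNIX datagram socket pair as a stand-in for the raw socket (an assumption made purely for convenience; it needs a Unix-like OS), queues two messages, and then reads with a 1000-byte buffer. The function name is hypothetical.

```python
import socket


def demo_datagram_boundaries():
    """Queue two datagrams, then read twice with a 1000-byte buffer.

    Returns the sizes of the two reads. Each read yields exactly one
    queued datagram, never a mix of two.
    """
    # Stand-in for a raw socket: a connected pair of datagram sockets.
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        sender.send(b"A" * 380)   # first "packet", 380 bytes
        sender.send(b"B" * 620)   # second "packet", queued behind it
        first = receiver.recv(1000)
        second = receiver.recv(1000)
        return len(first), len(second)
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(demo_datagram_boundaries())
```

The first read returns only the 380-byte message even though the buffer has room for more, which is the behaviour the answer describes for recvfrom() on a raw socket.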
You might be interested in recvmmsg() if you want to read multiple packets in one system call. It is Linux-specific (available since kernel 2.6.33); the send-side counterpart, sendmmsg(), was added in Linux 3.0.