Should I split data or allow fragmentation - network-programming

I am preparing to write a program that will be sending/receiving UDP datagrams.
Here is my question:
Should I manage the data so that what I am sending fits in a single datagram, basically splitting the data within the application and then sending multiple datagrams? Or should I allow the network to handle the fragmentation and reassembly?
I am assuming a standard MTU of 1500 bytes. In theory, the maximum UDP payload on top of IPv4 is 65507 bytes.
Performance/overhead info, best practices, and other information are appreciated.

The IP stack layers (and OSI layers) are designed so that the work done in one layer is transparent to the layers above it. So, initially, you don't have to worry about how the data is sent. You only need to know the aspects related to UDP itself: it is connectionless, unreliable, and so on; you shouldn't have to care about the IP layer.
I cannot see any advantage in considering the MTU at a higher layer unless you have your own custom stack.
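That said, if you do decide to cap the datagram size in the application (the first option in the question), the sending side is straightforward. Below is a minimal sketch in C under stated assumptions: the 1472-byte cap is simply 1500 (MTU) minus 20 (IPv4 header) and 8 (UDP header), and a real protocol would also need sequence numbers and reassembly logic on the receiving side, which are omitted here.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define MAX_PAYLOAD 1472   /* assumption: 1500 MTU - 20 (IPv4) - 8 (UDP) */

/* Send `len` bytes from `data` as a series of datagrams of at most
 * MAX_PAYLOAD bytes each. Returns 0 on success, -1 on the first error.
 * No sequencing or reassembly is provided; this only shows the split. */
static int send_chunked(int sock, const struct sockaddr_in *dst,
                        const char *data, size_t len)
{
    size_t off = 0;
    while (off < len) {
        size_t chunk = len - off;
        if (chunk > MAX_PAYLOAD)
            chunk = MAX_PAYLOAD;
        if (sendto(sock, data + off, chunk, 0,
                   (const struct sockaddr *)dst, sizeof(*dst)) < 0)
            return -1;
        off += chunk;
    }
    return 0;
}
```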

Related

When to write a Custom Kernel Module

Problem Statement:
I have a very high bandwidth data link that is UDP based. The source of this data is not configurable and sends a stream of UDP datagrams. We have code that uses the standard methods for receiving data on a UDP socket, and it works adequately. I wanted to know:
Does there exist an interface to extract multiple UDP datagrams at a time, to improve efficiency?
If one doesn't exist, does it make sense to create a kernel module to provide that capability?
I am a novice, and I wanted to understand the thought process behind deciding when writing your own kernel module is appropriate. I know that such a surgical procedure isn't meant to be done lightly, but there must be a set of criteria under which that action is prudent. Maybe not in my case, but in general.
HW / Kernel Module Perspective
A typical network adapter these days is capable of distributing received packets across multiple hardware Rx queues, thus letting the host run multiple software Rx queues bound to different CPU cores, reading out packets in parallel. From a single HW/SW queue perspective, the host may poll it for new packets (see Linux NAPI), with each poll ideally yielding a batch of packets; alternatively, the host may still use an interrupt-driven approach for Rx signalling, with interrupt coalescing turned on for improved efficiency.
Existing NIC drivers in the Linux kernel strive to stick with the most performant techniques, and the kernel itself should be able to leverage all of this properly.
Userland / Application Perspective
There's a PACKET_MMAP interface provided by the Linux kernel for improved Rx/Tx efficiency on the application side. Long story short, an application can set up a memory buffer shared between kernel space and userspace and read incoming packets from it, ideally in batches, or blocks, thus avoiding the costly kernel-to-userspace copies and context switches that are customary with the regular methods.
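For illustration, here is a minimal, hedged sketch of setting up a TPACKET_V2 PACKET_MMAP receive ring. The block/frame sizes and the bounded read loop are arbitrary choices for the example; a production reader would also want interface binding, proper error handling, and a real frame-processing step.

```c
#include <stdio.h>
#include <unistd.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/mman.h>
#include <arpa/inet.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>

int main(void) {
    /* Raw AF_PACKET socket capturing all protocols (needs CAP_NET_RAW) */
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0) { perror("socket"); return 1; }

    /* Request the TPACKET_V2 ring format */
    int version = TPACKET_V2;
    if (setsockopt(fd, SOL_PACKET, PACKET_VERSION, &version, sizeof(version)) < 0) {
        perror("PACKET_VERSION"); return 1;
    }

    /* Describe the Rx ring; sizes here are illustrative */
    struct tpacket_req req = {
        .tp_block_size = 4096,
        .tp_block_nr   = 64,
        .tp_frame_size = 2048,
        .tp_frame_nr   = (4096 / 2048) * 64,
    };
    if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req)) < 0) {
        perror("PACKET_RX_RING"); return 1;
    }

    /* Map the ring; the kernel fills it with received frames */
    size_t ring_len = (size_t)req.tp_block_size * req.tp_block_nr;
    unsigned char *ring = mmap(NULL, ring_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (ring == MAP_FAILED) { perror("mmap"); return 1; }

    /* Walk the frames; TP_STATUS_USER means the kernel has handed a frame to us */
    unsigned int frame = 0;
    int seen = 0;
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    while (seen < 10) {                     /* bounded loop for the demo */
        struct tpacket2_hdr *hdr =
            (struct tpacket2_hdr *)(ring + (size_t)frame * req.tp_frame_size);
        if (!(hdr->tp_status & TP_STATUS_USER)) {
            poll(&pfd, 1, -1);              /* block until more frames arrive */
            continue;
        }
        printf("frame %u: %u bytes captured\n", frame, hdr->tp_snaplen);
        hdr->tp_status = TP_STATUS_KERNEL;  /* hand the slot back to the kernel */
        frame = (frame + 1) % req.tp_frame_nr;
        seen++;
    }
    munmap(ring, ring_len);
    close(fd);
    return 0;
}
```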
For added efficiency, the application may have multiple sockets bound to the NIC in separate threads/processes and demand that packet reception be load balanced across these sockets (see the AF_PACKET fanout mode description).
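A rough sketch of joining such a fanout group is shown below. The group id (1234) and the hash balancing mode are arbitrary picks for illustration; each worker thread or process would call something like this on its own socket.

```c
#include <sys/socket.h>
#include <arpa/inet.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>

static int join_fanout_group(void)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0)
        return -1;

    /* Lower 16 bits: group id shared by all members;
     * upper 16 bits: balancing mode (hash of the flow here). */
    int fanout_arg = 1234 | (PACKET_FANOUT_HASH << 16);
    if (setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &fanout_arg, sizeof(fanout_arg)) < 0)
        return -1;

    return fd;   /* read packets from fd as usual; the kernel spreads the load */
}
```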
DPDK Perspective
DPDK is a kernel-bypass framework that allows an application to seize full control of a network adapter by means of a vendor-specific poll-mode driver (PMD), effectively running in userspace as part of the application and, by its very nature, not needing any kernel-to-userspace copies, context switches and, most likely, locking. Multi-queue receive operation, load balancing (round robin, RSS, you name it) and more cutting-edge offloads are likely to be available too (it's vendor specific).
Summary
The short of it is that, given that multiple network acceleration techniques already exist, one should never need to write their own kernel module to solve the problem in question. By the looks of it, your application, which, as you say, uses standard methods, is not aware of the PACKET_MMAP technique. So I'd be tempted to suggest looking at that one closely. The DPDK approach might require that the application be effectively re-implemented from scratch, so I would first go for PACKET_MMAP as the low-hanging fruit.

Simulating packet encapsulation

I am developing an application which aims to simulate a real network. In order to do this, I need to have detailed information about how a packet is formed in a system.
Imagine you have an application-layer message and you want to encapsulate it in a transport-layer payload, add a specific port number for the desired process in the header, then encapsulate that in a network-layer payload and add the IP addresses.
My questions are:
Where does the encapsulation of upper-layer protocols' packets into lower-layer payloads happen?
Is the network card driver responsible for that, or some other part of the OS? If so, which part?
I just want to note that I've read Computer Networks: A Top-Down Approach and Forouzan's book on the subject, but the information there was rather theoretical.
Thanks in advance.
If you are asking about a real implementation: usually every message of a layer is carried as the whole payload of the lower layer's message. Talking about the TCP/IP stack in an OS like Windows or Linux, without SSL/TLS, this depends on the type of socket you use. Supposing you use TCP (stream sockets), the application-layer message you send with the send or write system calls becomes the payload of the TCP segment. The processing of a TCP segment and an IP datagram happens in the OS kernel. The processing of a layer-2 frame happens partly in the NIC's device driver (in the kernel) and partly in the NIC hardware; how that is split depends on the specific NIC.
Something else to add is that some NICs are able to calculate the checksum of TCP segments and UDP datagrams. In that case the kernel offloads that task, and only the checksum, to the NIC.
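Since the question is about simulating how a packet is formed, the following sketch builds an IPv4/UDP packet around an application payload purely in memory, mirroring the encapsulation the kernel performs. The function name encapsulate is made up for the example, and checksums are left at zero to keep it short; a real simulator would compute at least the IP header checksum.

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include <arpa/inet.h>    /* htons, htonl, inet_addr */
#include <netinet/ip.h>   /* struct iphdr (Linux)    */
#include <netinet/udp.h>  /* struct udphdr           */

/* Wrap an application payload in a UDP header, then in an IPv4 header.
 * Checksums are left at zero for brevity. Returns the total packet size,
 * or 0 if the output buffer is too small. */
static size_t encapsulate(const char *payload, size_t payload_len,
                          uint32_t src_ip, uint32_t dst_ip,
                          uint16_t src_port, uint16_t dst_port,
                          uint8_t *out, size_t out_cap)
{
    size_t total = sizeof(struct iphdr) + sizeof(struct udphdr) + payload_len;
    if (total > out_cap)
        return 0;

    struct iphdr  *ip  = (struct iphdr *)out;
    struct udphdr *udp = (struct udphdr *)(out + sizeof(struct iphdr));
    memset(out, 0, sizeof(struct iphdr) + sizeof(struct udphdr));

    /* Transport layer: the application message becomes the UDP payload */
    udp->source = htons(src_port);
    udp->dest   = htons(dst_port);
    udp->len    = htons(sizeof(struct udphdr) + payload_len);
    memcpy(out + sizeof(struct iphdr) + sizeof(struct udphdr), payload, payload_len);

    /* Network layer: the whole UDP datagram becomes the IP payload */
    ip->version  = 4;
    ip->ihl      = 5;              /* 20-byte header, no options */
    ip->ttl      = 64;
    ip->protocol = IPPROTO_UDP;
    ip->tot_len  = htons(total);
    ip->saddr    = src_ip;
    ip->daddr    = dst_ip;

    return total;
}

int main(void) {
    uint8_t packet[1500];
    const char *msg = "hello";    /* the "application layer message" */
    size_t n = encapsulate(msg, strlen(msg),
                           inet_addr("10.0.0.1"), inet_addr("10.0.0.2"),
                           12345, 53, packet, sizeof(packet));
    printf("built a %zu-byte IPv4/UDP packet\n", n);
    return 0;
}
```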

Converting TCP/IP traffic to half duplex in Linux

I am developing a custom network driver for a PHY media which doesn't support full-duplex mode.
I want to run TCP/IP traffic over this network driver, on top of this half-duplex PHY media.
But TCP/IP traffic can be full duplex. I would like to implement some mechanism/algorithm in this driver so that the custom network driver effectively converts the TCP/IP traffic to half duplex in Linux.
Please let me know if this can be achieved and how to do it.
So you are trying to write a driver which supports full-duplex traffic on a card which actually does NOT support the feature...
Well... you must be aware that the networking subsystem is one of the largest subsystems in the kernel and one of the few which actually uses softirqs (because it is always looking to scale appropriately in this day and age of multiprocessors), and it still had to resort to some trickery (NAPI) in order to manage the deluge of interrupt requests generated by the ever-increasing rates of present-day media. Why I'm saying all this is because I just want to remind you of the real-life complexities involved in writing a 'regular' network driver, let alone a 'pseudo full-duplex' driver.
Now I believe what you pretty much want is to give an illusion of 'full duplexity' to the... TCP/IP stack (is it?), i.e. your driver is just another full-duplex driver, and any of its clients (be it the MAC layer or something like ethtool) can go have a ball with it (in terms of dumping and retrieving packets) in the same manner as it does with, and expects results out of, a 'regular full-duplex' driver...
So if this is really the case, I wonder what good giving such an illusion might be. Perhaps you are just experimenting? In any case, TCP is full duplex by default anyway, and with a half-duplex media the data rates are somewhat lower (although not exactly half) than those obtained with a full-duplex adapter. I don't think it even matters at the higher layers (in terms of functionality) whether the media is full or half duplex (except maybe in the MAC layer?); correct me if I'm wrong.
There were (and still are) quite a few half-duplex media in use, and there are many media which support both full-duplex and half-duplex traffic... I fail to see how it will affect the clients of the driver (besides lowering the overall data rate as the only tangible effect)... which means you can pretty much look at any network driver in the kernel and see that it has ways to configure the adapter to use either full or half duplex (and user space can use, say, ethtool as one way to toggle this...).
Anyway, you may want to have a look at, and perhaps take a few tips from, the MODBus driver (the bus being half duplex by default) here.
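As a concrete illustration of the ethtool interface mentioned above, here is a hedged sketch that reads an adapter's current duplex setting from userspace via the legacy ETHTOOL_GSET ioctl (newer kernels prefer ETHTOOL_GLINKSETTINGS, but the old call still works on most systems). The interface name eth0 is just a default for the example.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(int argc, char **argv) {
    const char *ifname = argc > 1 ? argv[1] : "eth0";

    /* Any socket works as a handle for the SIOCETHTOOL ioctl */
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct ethtool_cmd ecmd = { .cmd = ETHTOOL_GSET };
    struct ifreq ifr;
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    ifr.ifr_data = (char *)&ecmd;

    if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) { perror("SIOCETHTOOL"); close(fd); return 1; }

    printf("%s: %s duplex\n", ifname,
           ecmd.duplex == DUPLEX_FULL ? "full" : "half");
    close(fd);
    return 0;
}
```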
I'm not sure how you're relating the MAC layer with the TCP layer. Duplex mode is an Ethernet-level concern, and it doesn't propagate to IP, nor even to TCP. In Ethernet terms, duplexity means you can either send and receive MAC frames only at different times (half duplex) or at the same time (full duplex).
The upper layers of the network stack are completely unaware of this (or at least they should be). Consider the following example: you're sending a huge file over the network using FTP, and assuming a typical network system the stack would be FTP/TCP/IP/Ethernet. From the FTP perspective you have a virtual session, from TCP you have a virtual pipe, from IP you just know how to reach the end system, and from the Ethernet perspective you just know how to reach the next node in the network.
TCP doesn't care whether your packets are chopped up during transmission, nor whether a packet is delayed within a certain threshold because an incoming packet is arriving. It only cares about receiving a confirmation that the packet made it to its final destination. I hope that shows my point.

Programming practices: how to choose a packet size for UDP datagrams?

Disclaimer: this is not a "how to" question. I would rather like to know, as background information, which practices are actually used in the field.
We know that UDP does not have PMTU discovery the way TCP does. So I see several approaches to avoiding IP fragmentation with UDP:
Sending packets of at most 512 bytes (the classic conservative UDP approach)
Re-implementing some form of PMTU discovery (using the ICMP "fragmentation needed" message)
Relying on the local MTU (but how reliable is that? Since UDP isn't a connected protocol, how can it know which interface its packets are going to go through?)
Others... ?
So what I would like is a "background" idea of which approaches are used by current UDP programs/protocols, especially common streaming/VoIP applications.
Thanks in advance,
Jocelyn
Limiting to 576 bytes is very common. Many of the core Internet protocols, such as DNS, do this. Most real-time streaming protocols also use smaller packets, since that has the added benefit of lower serialization delay and less impact if a single packet is lost.
Some protocols have ways of negotiating a larger packet size, though often not in a way that's as robust as PMTU discovery (DHCP, for example, allows negotiating a maximum message size).
There's also software that defaults to 1500 or so and lets the user lower it if necessary; most implementations of SNMP seem to do something like this.
In any event, the DF bit isn't generally set, so the consequence of being overly optimistic is fragmentation, not breakage.
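As a small illustration of the "rely on the path MTU" option, on Linux a program can connect a UDP socket (which makes the kernel pick a route and an interface) and then read back the kernel's current path-MTU estimate with getsockopt(IP_MTU). This is a hedged sketch; the destination 192.0.2.1 is a documentation placeholder, and the estimate only improves as ICMP "fragmentation needed" feedback arrives.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>   /* IP_MTU, IP_MTU_DISCOVER, IP_PMTUDISC_DO (glibc) */
#include <arpa/inet.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    /* "Connect" the UDP socket so the kernel selects a route to the peer */
    struct sockaddr_in dst = { .sin_family = AF_INET, .sin_port = htons(5000) };
    inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);   /* placeholder address */
    if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
        perror("connect"); return 1;
    }

    /* Ask the kernel to do path-MTU discovery (sets DF on outgoing datagrams) */
    int pmtu_mode = IP_PMTUDISC_DO;
    setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &pmtu_mode, sizeof(pmtu_mode));

    /* Read back the kernel's current path-MTU estimate for this destination */
    int mtu = 0;
    socklen_t len = sizeof(mtu);
    if (getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &len) == 0)
        printf("current path MTU estimate: %d bytes\n", mtu);

    close(fd);
    return 0;
}
```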

Building a Network Appliance Prototype Using a standard PC with Linux and Two NIC's

I want to build a prototype of a network appliance.
This appliance is supposed to transparently manipulate Ethernet packets. It is supposed to have two network interface cards, with one card connected to the outside leg (i.e. eth0) and the other to the inside leg (i.e. eth1).
In a typical network layout as in the attached image, it will be placed between the router and the LAN's switch.
My plan is to write software that hooks in at the kernel driver level and does whatever I need to do to incoming and outgoing packets.
For instance, an "outgoing" packet (at eth1) would be manipulated and passed over to the other NIC (eth0), which would then transport it to the next hop.
My questions are:
Is this doable?
Those NICs will have no IP address; should that be a problem?
Thanks in advance for your answers.
(And no, there is no such device on the market yet, so please, "why reinvent the wheel" style answers are irrelevant.)
typical network diagram http://img163.imageshack.us/img163/1249/stackpost.png
I'd suggest libipq, which seems to do just what you want:
Netfilter provides a mechanism for passing packets out of the stack for queueing to userspace, then receiving these packets back into the kernel with a verdict specifying what to do with the packets (such as ACCEPT or DROP). These packets may also be modified in userspace prior to reinjection back into the kernel.
Apparently, it can be done.
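For a feel of what the libipq loop looks like, here is a sketch following the pattern from the libipq(3) man page: packets matched by an iptables QUEUE rule are read into userspace, can be inspected or modified, and are then given a verdict. Note that libipq/ip_queue is the older interface; newer kernels provide the same model through nfnetlink_queue and libnetfilter_queue, so check your installed headers before relying on these exact calls.

```c
/* Requires an iptables rule such as:  iptables -A FORWARD -j QUEUE  */
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <linux/netfilter.h>   /* NF_ACCEPT */
#include <libipq.h>

#define BUFSIZE 2048

static void die(struct ipq_handle *h)
{
    ipq_perror("appliance");
    if (h)
        ipq_destroy_handle(h);
    exit(1);
}

int main(void)
{
    unsigned char buf[BUFSIZE];

    struct ipq_handle *h = ipq_create_handle(0, PF_INET);
    if (!h)
        die(h);

    /* Ask the kernel to copy whole packets (not just metadata) to userspace */
    if (ipq_set_mode(h, IPQ_COPY_PACKET, BUFSIZE) < 0)
        die(h);

    for (;;) {
        if (ipq_read(h, buf, BUFSIZE, 0) < 0)
            die(h);

        if (ipq_message_type(buf) == IPQM_PACKET) {
            ipq_packet_msg_t *m = ipq_get_packet(buf);

            /* m->payload holds the IP packet: this is where the transparent
             * manipulation would happen before reinjection. */
            printf("queued packet id %lu, %zu bytes\n",
                   (unsigned long)m->packet_id, (size_t)m->data_len);

            /* Hand the (possibly modified) packet back to the kernel */
            ipq_set_verdict(h, m->packet_id, NF_ACCEPT, 0, NULL);
        }
    }

    /* not reached */
    ipq_destroy_handle(h);
    return 0;
}
```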
I am actually trying to build a prototype of this myself using scapy.
As long as the NICs are set to promiscuous mode, they capture packets on the network without needing an IP address set on them. I know it can be done, as there are a lot of companies that produce this same type of equipment (e.g. Juniper Networks, Cisco, F5, Fortinet, etc.).
