libpcap: what is the efficiency of pcap_dispatch or pcap_next - network-programming

I use libpcap to capture a lot of packets, then process/modify these packets and send them to another host.
First, I create a libpcap handle, set it to non-blocking mode, and use pcap_get_selectable_fd(handle) to get the corresponding file descriptor pcap_fd.
Then I add an event for this pcap_fd to a libevent loop (it is like select() or epoll()).
To avoid polling this file descriptor too often, each time a packet-arrival event fires I use pcap_dispatch to collect a bufferful of packets into a queue packet_queue, and then call process_packet to process/modify/send each packet in the queue.
pcap_dispatch(handle, -1, collect_pkt, (u_char *)packet_queue);
process_packet(packet_queue);
I use tcpdump to capture the packets sent by process_packet(packet_queue), and I notice that:
at the very beginning, the interval between sent packets is small
after several packets are sent, the interval becomes around 0.055 seconds
after 20 packets are sent, the interval becomes 0.031 seconds and stays there
I carefully checked my source code and found no suspicious blocking calls or logic that would lead to such large intervals. So I wonder whether the problem lies in pcap_dispatch.
Is there any efficiency problem with pcap_dispatch, pcap_next, or even the libpcap file descriptor?
thanks!

On many platforms libpcap uses platform-specific implementations for faster packet capture, so YMMV. Generally they involve a shared buffer between the kernel and the application.
At the very beginning (small intervals): there is a time window between the moment packets start piling up in the RX buffer and the moment you start processing. The packets that accumulate during that window are then sent back to back, which may explain the shorter intervals here. This part is true regardless of implementation.
After several packets (around 0.055 seconds): I haven't found a satisfying explanation for this. Maybe you fell behind and missed a few packets, so the time between resent packets becomes longer.
After 20 packets (steady 0.031 seconds): this is what you'd expect in normal operation, I think.
pcap_dispatch is pretty much as good as it gets, at least in libpcap. pcap_next, on the other hand, incurs two penalties (at least on Linux, but I think it does on other mainstream platforms too): a syscall per packet (libpcap calls poll for error checking, even in non-blocking mode) and a copy (libpcap releases the "slot" in the shared buffer ASAP, so it can't just return that pointer). An implementation detail is that, on Linux, pcap_next just calls pcap_dispatch for one packet with a copy callback.

Related

Read Timeout TIdTCPClient

Good day. I use the TIdTCPClient component to send requests to the server and read the response. I know the size of the response for certain requests, but not for others.
When I know the size of the response, then my data reading code looks like this:
IdTCPClient1->Socket->Write(requestBuffer);
IdTCPClient1->Socket->ReadBytes(answerBuffer, expectSize);
When the size of the response is not known to me, then I use this code:
IdTCPClient1->Socket->Write(requestBuffer);
IdTCPClient1->Socket->ReadBytes(answerBuffer, -1);
In both cases, I ran into problems.
In the first case, if the server does not return all the data (less than expectSize), then IdTCPClient1 will wait for ReadTimeout to finish, but there will be no data at all in the answerBuffer (even if the server sent something). Is this the logic behind TIdTCPClient? Is that right?
In the second case, ReadTimeout does not work at all. That is, the ReadBytes function ends immediately and either nothing is written to the answerBuffer or only a few bytes from the server are written. However, I expected that since the function does not know the number of bytes to read in this case, it must wait for ReadTimeout and read the bytes that arrived during this time. As an experiment, I inserted Sleep(500) between writing and reading, and then I read all the data that had arrived.
May I ask you to answer why this is happening?
Good day. I use the TIdTCPClient component to send requests to the server and read the response. I know the size of the response for certain requests, but not for others.
Why do you not know the size of all of the responses? What does your protocol actually look like? TCP is a byte stream, each message MUST be framed in such a way that a receiver can know where each message begins and ends in order to read the messages correctly and preserve the integrity of the stream. As such, messages MUST either include their size in their payload, or be uniquely delimited between messages. So, which is the case in your situation? It doesn't sound like you are handling either possibility.
When the size of the response is not known to me, then I use this code:
IdTCPClient1->Socket->Write(requestBuffer);
IdTCPClient1->Socket->ReadBytes(answerBuffer, -1);
When you set AByteCount to -1, that tells ReadBytes() to return whatever bytes are currently available in the IOHandler's InputBuffer. If the InputBuffer is empty, ReadBytes() waits, up to the ReadTimeout interval, for at least 1 byte to arrive, and then it returns whatever bytes were actually received into the InputBuffer, up to the maximum specified by the IOHandler's RecvBufferSize. So it may still take multiple reads to read an entire message in full.
In general, you should NEVER set AByteCount to -1 when dealing with an actual protocol. -1 is good to use only when proxying/streaming arbitrary data, where you don't care what the bytes actually are. Any other use requires knowledge of the protocol's details of how messages are framed.
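The AByteCount = -1 behaviour described above (wait up to a timeout for at least one byte, then return whatever has arrived) can be sketched with plain POSIX calls. This is an analogue to illustrate the semantics, not Indy's implementation; the function name read_available is hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable, then return whatever a
 * single read() delivers (between 1 and cap bytes).
 * Returns 0 on timeout and -1 on error or end of stream. */
ssize_t read_available(int fd, void *buf, size_t cap, int timeout_ms)
{
    assert(buf != NULL && cap > 0);

    struct pollfd p = { .fd = fd, .events = POLLIN };
    int ready = poll(&p, 1, timeout_ms);
    if (ready < 0)
        return -1;              /* poll failed */
    if (ready == 0)
        return 0;               /* timeout: nothing arrived */

    ssize_t n = read(fd, buf, cap);
    return n > 0 ? n : -1;      /* 0 from read() means the peer closed */
}
```

A call like this returns as soon as the first bytes show up, which is why a reply that arrives in pieces needs several calls, or real framing, before it is complete.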
In the first case, if the server does not return all the data (less than expectSize), then IdTCPClient1 will wait for ReadTimeout to finish, but there will be no data at all in the answerBuffer (even if the server sent something). Is this the logic behind TIdTCPClient? Is that right?
Yes. When AByteCount is > 0, ReadBytes() waits for the specified number of bytes to be available in the InputBuffer before then extracting that many bytes into your output TIdBytes. Your answerBuffer will not be modified unless all of the requested bytes are available. If the ReadTimeout elapses, an EIdReadTimeout exception is raised, and your answerBuffer is left untouched.
If that is not the behavior you want, then consider using ReadStream() instead of ReadBytes(), using a TIdMemoryBufferStream or TBytesStream to read into.
In the second case, ReadTimeout does not work at all. That is, the ReadBytes function ends immediately and nothing is written to the answerBuffer.
I have never heard of ReadBytes() not waiting for the ReadTimeout. What you describe should only happen if there are no bytes available in the InputBuffer and the ReadTimeout is set to some very small value, like 0 msecs.
or several bytes from the server are written.
That is a perfectly reasonable outcome given you are asking ReadBytes() to read an arbitrary number of bytes between 1..RecvBufferSize, inclusive, or read no bytes if the timeout elapses.
However, I expected that since the function does not know the number of bytes to read in this case, it must wait for ReadTimeout and read the bytes that arrived during this time.
That is how it should be working, yes. And how it has always worked. So I suggest you debug into ReadBytes() at runtime and find out why it is not working the way you are expecting. Also, make sure you are using an up-to-date version of Indy to begin with (or at least a version from the last few years).
Why do you not know the size of all of the responses?
Because, in fact, I'm polling an electronic device. This device has its own network IP address and port. The device can respond to the same request in different ways, depending on its status. Strictly speaking, some queries have two possible answers of different lengths. In these cases I specify AByteCount = -1 when reading, so that I can read any device response.
I have never heard of ReadBytes() not waiting for the ReadTimeout.
You're right! I was wrong. When specifying AByteCount = -1, I get one byte. As you said, if at least one byte arrives, it returns and ReadBytes() ends.
Also, make sure you are using an up-to-date version of Indy to begin with (or at least a version from the last few years).
I am working with C++ Builder 10.3 Community Edition, Indy version 10.6.2.5366.

how to detect XMIT FIFO is full on a UART 16550 or higher

I have already read a lot of specs and code about UARTs, but I cannot find any indication of how to detect through the software interface that the transmit FIFO is full. There is an interrupt when the FIFO is empty. Then I can write at least N characters, where N is the FIFO size. But by the time I have written these N characters, some of them have already been sent. So I could in fact write more than N characters, but there is no FIFO-full interrupt. The spec says that when the FIFO is full, the TXREADY pin on the chip is inverted. Is there a way to detect this in software? The Line Status Register bit only says that the FIFO is not empty, which does not mean it is full...
Can anyone help? I want to write characters until the FIFO is full...
Looks to me also that they neglected this, but most people get by with the thing as it is. The usual way to use it is to get an interrupt, fill the FIFO (normally very fast compared to serial data rate) and then return.
There is a situation where it seems to me that what you are asking for could be nice: transmitting in polling mode. You want to send 10 bytes and your polling shows the FIFO is not empty, so you have no way to know whether you can send them all or not. Either you wait there until it is empty, which sort of defeats the purpose of the FIFO, or you continue polling other stuff until you get back to checking for FIFO empty, and maybe that slows your overall transmission rate. I guess it is not a very usual way to operate, so nobody worries about it.
The 16550D datasheet says the following:
The transmitter holding register interrupt (02) occurs when the XMIT
FIFO is empty; it is cleared as soon as the transmitter holding
register is written to (1 to 16 characters may be written to the XMIT
FIFO while servicing this interrupt) or the IIR is read.
This means that when the Line Status Register (port base + 5) reports Transmitter Holding Register Empty (THRE, bit 5), the transmit FIFO is completely empty and you may write up to 16 bytes to the transmitter holding register (port base + 0). It is important not to write more than 16 bytes between occurrences of that bit being set.
If you don't need to write 16 bytes at the point when you received the IRQ (or saw the transmitter register empty bit set, if polling), you can either keep track of how many bytes you wrote since the last transmitter empty state, or, just defer writing further bytes until the next transmitter empty state.

Why is it not safe to use Socket.ReceiveLength?

Even Embarcadero states that ReceiveLength is not guaranteed to return an accurate count of the bytes ready to read in the socket buffer. But if you look at the implementation, when you pass -1 to Socket.ReceiveBuf (which is what ReceiveLength wraps), it calls ioctlsocket with FIONREAD to determine the amount of data pending in the network's input buffer that can be read from socket s.
So, how is it unsafe or bad?
e.g.: ioctlsocket(Socket.SocketHandle, FIONREAD, Longint(i));
The documentation you mention specifically says (emphasis mine)
Note: ReceiveLength is not guaranteed to be accurate for streaming socket connections.
This means that the length is not known ahead of time because it's being supplied by a stream of data. Obviously, if you don't know how big the data is that's being sent ahead of time, you can't properly set the length the client should expect.
Consider it like generic code to copy a file. If you don't know ahead of time how big the file is you'll be copying, you can't predict how many bytes you'll be copying. In the case of the socket, the stream size that's supplying the socket isn't known in advance (for instance, for data being generated real-time and sent), so there's no way to inform the client socket how much to expect.
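To make the point concrete, here is a small sketch using the POSIX ioctl equivalent of the Winsock call from the question. The helper names bytes_pending and fionread_demo are hypothetical, and a local socket pair stands in for a real connection. It shows that FIONREAD only reports what has arrived so far, so the count seen partway through a message says nothing about the message's total size.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return the number of bytes currently queued for reading on fd, or -1. */
int bytes_pending(int fd)
{
    assert(fd >= 0);
    int n = 0;
    if (ioctl(fd, FIONREAD, &n) < 0)
        return -1;
    return n;
}

/* Send one logical message in two pieces over a local socket pair and
 * record the pending count after each piece. counts[] is set to -1 on error. */
void fionread_demo(int counts[2])
{
    int sv[2];
    counts[0] = counts[1] = -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return;
    if (write(sv[0], "hello", 5) == 5)
        counts[0] = bytes_pending(sv[1]);   /* only the first piece so far */
    if (write(sv[0], " world", 6) == 6)
        counts[1] = bytes_pending(sv[1]);   /* the rest arrived later */
    close(sv[0]);
    close(sv[1]);
}
```

A receiver that sized its read from the first count would get only part of the message, which is the sense in which the value is a snapshot, not a message length.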

Does a transmitted-bytes event exist in the Linux kernel?

I need to write a rate limiter, that will perform some stuff each time X bytes were transmitted.
The straightforward approach is to check the length of each transmitted packet, but I think it will be too slow for me.
Is there a way to use some kind of network event, that will be triggered by transmitted packets/bytes?
I think you may look at netfilter.
Using its (kernel-level) API, you can have your custom code triggered by network events, modify received messages before passing them to the application, and so on.
http://www.netfilter.org/
It's protocol dependent, actually. But for TCP, you can setsockopt the SO_RCVLOWAT option to define the minimum number of bytes (watermark) to permit the read operation.
If you need to enforce the maximum size too, adjust the receive buffer size using SO_RCVBUF.
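A minimal sketch of setting the receive low-water mark mentioned above, assuming a Linux TCP socket. The helper names set_rcv_lowat and get_rcv_lowat are hypothetical. Note that this option acts on the receiving side of a socket, and the kernel may clamp the requested value.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Ask the kernel not to report the socket as readable until at least
 * min_bytes are queued. Returns 0 on success, -1 on error. */
int set_rcv_lowat(int sock, int min_bytes)
{
    assert(min_bytes > 0);
    return setsockopt(sock, SOL_SOCKET, SO_RCVLOWAT,
                      &min_bytes, sizeof min_bytes);
}

/* Read the current low-water mark back, or return -1 on error. */
int get_rcv_lowat(int sock)
{
    int value = 0;
    socklen_t len = sizeof value;
    if (getsockopt(sock, SOL_SOCKET, SO_RCVLOWAT, &value, &len) < 0)
        return -1;
    return value;
}
```

Reading the value back with get_rcv_lowat is a cheap way to see what the kernel actually accepted.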

Double system call to write() causes massive network slowdown

In a partially distributed network app I'm working on in C++ on Linux, I have a message-passing abstraction which will send a buffer over the network. The buffer is sent in two steps: first a 4-byte integer containing the size is sent, and then the buffer is sent afterwards. The receiving end then receives in 2 steps as well - one call to read() to get the size, and then a second call to read in the payload. So, this involves 2 system calls to read() and 2 system calls to write().
On localhost, I set up two test processes. Both processes send and receive messages to each other continuously in a loop. The size of each message was only about 10 bytes. For some reason, the test performed incredibly slowly: about 10 messages sent/received per second. And this was on localhost, not even over a network.
If I change the code so that there is only 1 system call to write, i.e. the sending process packs the size at the head of the buffer and then only makes 1 call to write, the whole thing speeds up dramatically - about 10000 messages sent/received per second. This is an incredible difference in speed for only one less system call to write.
Is there some explanation for this?
You might be seeing the effects of the Nagle algorithm, though I'm not sure it is turned on for loopback interfaces.
If you can combine your two writes into a single one, you should always do that. No sense taking the overhead of multiple system calls if you can avoid it.
Okay, well I'm using TCP/IP (SOCK_STREAM) sockets. The example code is pretty straightforward. Here is a basic snippet that reproduces the problem. This doesn't include all the boilerplate setup code, error-checking, or ntohs code:
On the sending end:
// Send size
uint32_t size = strlen(buffer);
int res = write(sock, &size, sizeof(size));
// Send payload
res = write(sock, buffer, size);
And on the receiving end:
// Receive size
uint32_t size;
int res = read(sock, &size, sizeof(size));
// Receive payload
char* buffer = (char*) malloc(size);
read(sock, buffer, size);
Essentially, if I change the sending code to pack the size into the send buffer and make only one call to write(), performance improves by almost 1000x.
This is essentially the same question: C# socket abnormal latency .
In short, you'll want to use the TCP_NODELAY socket option. You can set it with setsockopt.
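For reference, a minimal sketch of setting that option on a TCP socket. The helper name disable_nagle is hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle's algorithm so small writes are sent without waiting for
 * outstanding data to be acknowledged. Returns 0 on success, -1 on error. */
int disable_nagle(int sock)
{
    assert(sock >= 0);
    int on = 1;
    return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
}
```

This removes the delay but still sends the size and the payload as two small segments, so combining the two writes remains worthwhile.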
You don't give enough information to say for sure. You don't even say which protocol you're using.
Assuming TCP/IP, the socket could be configured to send a packet on every write, instead of buffering output in the kernel until the buffer is full or the socket is explicitly flushed. This means that TCP sends the two pieces of data in different fragments and has to defragment them at the other end.
You might also be seeing the effect of the TCP slow-start algorithm. The first data sent is transmitted as part of the connection handshake. Then the TCP window size is slowly ramped up as more data is transmitted until it matches the rate at which the receiver can consume data. This is useful in long-lived connections but a big performance hit in short-lived ones. You can turn off slow-start by setting a socket option.
Have a look at the TCP_NODELAY and TCP_NOPUSH socket options.
An optimization you can use to avoid multiple system calls and fragmentation is scatter/gather I/O. Using the writev or sendmsg system call you can send the 4-byte size and the variable-sized buffer in a single syscall, and both pieces of data will be sent in the same fragment by TCP.
The problem is that with the first call to send, the system has no idea the second call is coming, so it sends the data immediately. With the second call to send, the system has no idea a third call isn't coming, so it delays the data in hopes that it can combine the data with a subsequent call.
The correct fix is to use a 'gather' operation such as writev if your operating system supports it. Otherwise, allocate a buffer, copy the two chunks in, and make a single call to write. (Some operating systems have other solutions, for example Linux has a 'TCP cork' operation.)
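A minimal sketch of the gather approach with writev, assuming the 4-byte size is sent in network byte order (the question's snippet omits byte-order conversion). The helper name send_frame is hypothetical.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send a 4-byte big-endian length followed by the payload in one writev()
 * call. Returns the number of bytes written, or -1 on error. A short count
 * is possible on a socket; the caller must then send the remainder. */
ssize_t send_frame(int fd, const void *payload, uint32_t len)
{
    assert(payload != NULL || len == 0);

    uint32_t be_len = htonl(len);
    struct iovec iov[2] = {
        { .iov_base = &be_len,          .iov_len = sizeof be_len },
        { .iov_base = (void *)payload,  .iov_len = len },
    };
    return writev(fd, iov, 2);
}
```

Because the kernel sees the header and the payload in the same call, it can put them in the same segment, and no intermediate buffer or extra copy is needed in user space.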
It's not as important, but you should optimize your receiving code too. Call 'read' asking for as many bytes as possible and then parse them yourself. You're trying to teach the operating system your protocol, and that's not a good idea.
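The receive-side advice can be sketched as a parser that works on whatever bytes a large read() returned. The function name next_frame is hypothetical, and the sketch assumes a 4-byte big-endian length prefix. Real code would also reject lengths above some sane maximum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Try to extract one length-prefixed frame from buf[0..len).
 * On success, store the payload location and size and return the number of
 * bytes consumed (4 + payload size). Return 0 if the frame is incomplete
 * and more data has to be read first. */
size_t next_frame(const uint8_t *buf, size_t len,
                  const uint8_t **payload, uint32_t *payload_len)
{
    assert(buf != NULL && payload != NULL && payload_len != NULL);

    if (len < 4)
        return 0;                         /* length prefix not complete */
    uint32_t n = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                 ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
    if (len - 4 < n)
        return 0;                         /* payload not complete */

    *payload = buf + 4;
    *payload_len = n;
    return 4 + (size_t)n;
}
```

The caller reads into one buffer, calls next_frame repeatedly until it returns 0, and keeps the leftover bytes at the front of the buffer for the next read. Several small messages can then be handled per system call.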
