Does a transmitted-bytes event exist in the Linux kernel? - network-programming

I need to write a rate limiter that performs some work each time X bytes have been transmitted.
The straightforward approach is to check the length of each transmitted packet, but I think that will be too slow for me.
Is there a way to use some kind of network event that is triggered by transmitted packets/bytes?

I think you should look at netfilter.
Using its (kernel-level) API, you can have custom code triggered by network events, modify received messages before they are passed to the application, and so on.
http://www.netfilter.org/
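As a rough illustration of that hook-based approach, a kernel module can register a callback at NF_INET_POST_ROUTING and accumulate skb->len for every outgoing IPv4 packet. This is only a sketch: the counter and whatever you decide to do once a byte budget is reached are up to you, and a real rate limiter would need policy decisions on top of it.

#include <linux/atomic.h>
#include <linux/module.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>
#include <linux/skbuff.h>
#include <net/net_namespace.h>

static atomic64_t tx_bytes = ATOMIC64_INIT(0);

/* Called for every IPv4 packet leaving the host. */
static unsigned int count_tx(void *priv, struct sk_buff *skb,
                             const struct nf_hook_state *state)
{
    atomic64_add(skb->len, &tx_bytes);
    /* A real rate limiter might return NF_DROP once its budget is exhausted. */
    return NF_ACCEPT;
}

static struct nf_hook_ops tx_hook = {
    .hook     = count_tx,
    .pf       = NFPROTO_IPV4,
    .hooknum  = NF_INET_POST_ROUTING,
    .priority = NF_IP_PRI_FIRST,
};

static int __init tx_init(void)
{
    return nf_register_net_hook(&init_net, &tx_hook);
}

static void __exit tx_exit(void)
{
    nf_unregister_net_hook(&init_net, &tx_hook);
}

module_init(tx_init);
module_exit(tx_exit);
MODULE_LICENSE("GPL");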

It's protocol-dependent, actually. But for TCP, you can use setsockopt with the SO_RCVLOWAT option to define the minimum number of bytes (the low-water mark) that must be available before a read operation completes.
If you need to bound the maximum size too, adjust the receive buffer size with SO_RCVBUF.
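A minimal C sketch of those two options, assuming an already-connected TCP socket and an arbitrary 64 KiB threshold:

#include <sys/socket.h>

/* Apply a 64 KiB low-water mark and cap the kernel receive buffer. */
static int tune_receive(int sock)
{
    int lowat = 64 * 1024;   /* recv() blocks until at least this many bytes are queued */
    int rcvbuf = 64 * 1024;  /* the kernel may round or double this value */

    if (setsockopt(sock, SOL_SOCKET, SO_RCVLOWAT, &lowat, sizeof(lowat)) < 0)
        return -1;
    return setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));
}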

Related

What does CBATTError Code insufficientResources really mean?

I'm trying to send data over BLE from my iPhone to an ESP32 board. I'm developing on the Flutter platform and using the flutter_reactive_ble library.
My iPhone can connect to the other device, and it can also send 1 byte using the writeCharacteristicWithResponse function. But when I try to send my real data, which is large (>7000 bytes), it gives me the error:
flutter: Error occured when writing 9f714672-888c-4450-845f-602c1331cdeb :
Exception: GenericFailure<WriteCharacteristicFailure>(
code: WriteCharacteristicFailure.unknown,
message: "Error Domain=CBATTErrorDomain Code=17
"Resources are insufficient."
UserInfo={NSLocalizedDescription=Resources are insufficient.}")
I tried searching for this error but didn't find additional info, even on the Apple Developer website. It just says:
Resources are insufficient to complete the ATT request.
What does this error really mean? Which resources are insufficient, and how can I work around this problem?
This is almost certainly larger than this characteristic's maximum value length (which is probably on the order of tens of bytes, not thousands). Before writing, you need to call maximumWriteValueLength(for:) to see how much data can be written. If you're trying to send serial data over a characteristic (which is common, but not really what they were designed for), you'll need to break your data up into chunks and reassemble them on the peripheral. You will likely either need an "end" indicator of some kind, or you will need to send the length of the payload first so that the receiver knows how much to expect.
First of all, a characteristic value cannot be larger than 512 bytes. This is set by the ATT standard (Bluetooth Core Specification v5.3, Vol 3, Part F (ATT), section 3.2.9). This number has been set arbitrarily by the protocol designers and does not map to any technical limitation of the protocol.
So, don't send 7000 bytes in a single write. You need to keep it at most 512 to be standard compliant.
If you say that it works with another Bluetooth stack running on the GATT server, then I guess CoreBluetooth does not enforce/check the maximum length of 512 bytes on the client side (I haven't tested). Therefore I also guess the error code you see was sent by the remote device rather than by CoreBluetooth locally as a pre-check.
There are three different common ways of writing a characteristic on the protocol level (Bluetooth Core Specification v5.3, Vol 3, Part G (GATT), section 4.9 Characteristic Value Write):
Write Without Response (4.9.1)
Write Characteristic Value (4.9.3)
Write Long Characteristic Values (4.9.4)
Number one is unidirectional and does not result in a response packet. It uses a single ATT_WRITE_CMD packet where the value must be at most ATT_MTU-3 bytes in length. This length can be retrieved using maximumWriteValueLength(for:) with .withoutResponse. The requestMtu method in flutter_reactive_ble uses this method internally. If you execute many writes of this type rapidly, be sure to add flow control to avoid CoreBluetooth dropping outgoing packets before they are sent. This can be done through peripheralIsReadyToSendWriteWithoutResponse by simply always waiting for this callback after each write, before you write the next packet. Unfortunately, it seems flutter_reactive_ble does not implement this flow control mechanism.
Number two uses a single ATT_WRITE_REQ where the value must be at most ATT_MTU-3 bytes in length, just as above. Use the same approach as above to retrieve that maximum length (note that maximumWriteValueLength with .withResponse always returns 512 and is not what you want). Here however, either an ATT_WRITE_RSP will be returned on success or an error packet will be received with an error code. Only one ATT transaction can be outstanding at a time, which significantly lowers throughput compared to Write Without Response.
Number three uses a sequence of multiple ATT_PREPARE_WRITE_REQ packets (containing offset and value) followed by an ATT_EXECUTE_WRITE_REQ. The maximum length of the value in each chunk is ATT_MTU-5. Each _REQ packet also requires a corresponding _RSP packet before the sequence can continue (alternatively, an error code could be sent by the remote device). This approach is used when the characteristic value to be written is too long to be sent in a single ATT_WRITE_REQ.
For any of the above write methods, you are always also limited by the maximum attribute size of 512 bytes as per the specification.
Any Bluetooth stack I know of transparently chooses between "Write Characteristic Value" and "Write Long Characteristic Values" when you tell it to write with response, depending on the value length and MTU. Server side it's a bit different. Some stacks put the burden on the user to combine all packets but it seems nimble handles that on its own. From what I can see in the source code (https://github.com/apache/mynewt-nimble/blob/26ccb8af1f3ea6ad81d5d7cbb762747c6e06a24b/nimble/host/src/ble_att_svr.c#L2099) it can return the "Insufficient Resources" error code when it tries to allocate memory but fails (most likely due to too much buffered data). This is what might happen for you. To answer your first actual question, the standard itself does not say anything else about this error code than simply "Insufficient Resources to complete the request".
The error has nothing to do with LE Data Length extension, which is simply an optimization for a lower layer (BLE Link Layer) that does not affect the functionality of the host stack. The L2CAP layer will take care of the reassembling of smaller link layer packets if necessary, and must always support up to the negotiated MTU without overflowing any buffers.
Now, to answer your second question: if you send very large amounts of data (7000 bytes), you must divide the data into multiple chunks and come up with a way to reassemble them correctly on the receiving side. Each chunk is written as a full characteristic value. When you do this, be sure to send values of at most ATT_MTU-3 bytes (but never larger than 512 bytes), to avoid the inefficient overhead of "Write Long Characteristic Values". It's then up to your application code to make sure you don't run out of memory in case too much data is sent.
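As a language-agnostic illustration of that chunking scheme (written in C here; ble_write_chunk, the 4-byte length prefix, and the chunk size are hypothetical and would map onto whatever write call your BLE library exposes):

#include <stddef.h>
#include <stdint.h>

/* Hypothetical transport: writes one characteristic value of 'len' bytes. */
extern int ble_write_chunk(const uint8_t *chunk, size_t len);

/* Send 'total' bytes as a 4-byte little-endian length prefix followed by
 * chunks of at most chunk_size bytes (e.g. ATT_MTU-3, never above 512). */
static int send_in_chunks(const uint8_t *data, uint32_t total, size_t chunk_size)
{
    uint8_t header[4] = {
        (uint8_t)total, (uint8_t)(total >> 8),
        (uint8_t)(total >> 16), (uint8_t)(total >> 24)
    };
    if (ble_write_chunk(header, sizeof(header)) < 0)
        return -1;

    for (uint32_t off = 0; off < total; ) {
        size_t n = (total - off < chunk_size) ? total - off : chunk_size;
        if (ble_write_chunk(data + off, n) < 0)
            return -1;
        off += n;
    }
    return 0;
}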

How do I receive arbitrary length data using a UdpSocket?

I am writing an application which sends and receives packets using UDP. However, the documentation of recv_from states:
If a message is too long to fit in the supplied buffer, excess bytes may be discarded.
Is there any way to receive all bytes and write them into a vector? Do I really have to allocate an array with the maximum packet length (which, as far as I know, is 65,507 bytes for IPv4) in order to be sure to receive all data? That seems a bit much for me.
Check out the next method in the docs, UdpSocket::peek_from (emphasis mine):
Receives a single datagram message on the socket, without removing it from the queue.
You can use this method to read a known fixed amount of data, such as a header which contains the length of the entire packet. You can use crates like byteorder to decode the appropriate part of the header, use that to allocate exactly the right amount of space, then call recv_from.
This does require that the protocol you are implementing always provides that total size information at a known location.
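The same idea at the C socket level, for comparison (Rust's peek_from corresponds to recvfrom with MSG_PEEK; the 4-byte length header is an assumption about your protocol):

#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Peek at a 4-byte length header, allocate exactly enough, then receive. */
static ssize_t recv_sized(int sock, uint8_t **out)
{
    uint32_t len_be;
    if (recvfrom(sock, &len_be, sizeof(len_be), MSG_PEEK, NULL, NULL) < 0)
        return -1;                          /* datagram stays queued on error too */

    size_t total = sizeof(len_be) + ntohl(len_be);
    uint8_t *buf = malloc(total);
    if (!buf)
        return -1;

    ssize_t n = recvfrom(sock, buf, total, 0, NULL, NULL);  /* consumes the datagram */
    if (n < 0) {
        free(buf);
        return -1;
    }
    *out = buf;
    return n;
}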
Now, is this a good idea?
As ArtemGr states:
Because extra system calls are much more expensive than getting some space from the stack.
And from the linked question:
Obviously at some point you will start wondering if doubling the number of system calls to save memory is worth it. I think it isn't.
With the recent Spectre / Meltdown events, now's a pretty good time to be reminded to avoid extra syscalls.
You could, as suggested, just allocate a "big enough" array ahead of time. You'll need to track how many bytes you've actually read vs allocated though. I recommend something like arrayvec to make it easier.
You could instead implement a pool of pre-allocated buffers on the heap. When you read from the socket, you use a buffer or create a new one. When you are done with the buffer, you put it back in the pool for reuse. That way, you incur the memory allocation once and are only passing around small Vecs on the stack.
See also:
How can I create a stack-allocated vector-like container?
How large should my recv buffer be when calling recv in the socket library
How to read UDP packet with variable length in C

Why is it not safe to use Socket.ReceiveLength?

Well, even Embarcadero states that it is not guaranteed to return an accurate count of the bytes ready to be read from the socket buffer. But if you look at it, when you pass -1 to Socket.ReceiveBuf (which is what ReceiveLength wraps), it calls ioctlsocket with FIONREAD to determine the amount of data pending in the network's input buffer that can be read from socket s.
So, how is it unsafe or bad?
e.g: ioctlsocket(Socket.SocketHandle, FIONREAD, Longint(i));
The documentation you mention specifically says (emphasis mine)
Note: ReceiveLength is not guaranteed to be accurate for streaming socket connections.
This means that the length is not known ahead of time because it's being supplied by a stream of data. Obviously, if you don't know how big the data is that's being sent ahead of time, you can't properly set the length the client should expect.
Consider it like generic code to copy a file. If you don't know ahead of time how big the file is you'll be copying, you can't predict how many bytes you'll be copying. In the case of the socket, the stream size that's supplying the socket isn't known in advance (for instance, for data being generated real-time and sent), so there's no way to inform the client socket how much to expect.
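In other words, on a stream socket you have to impose your own framing. A common pattern (a C sketch, assuming a length-prefixed protocol) is to loop until the expected number of bytes has arrived instead of trusting FIONREAD/ReceiveLength:

#include <sys/socket.h>
#include <sys/types.h>

/* Read exactly 'len' bytes from a stream socket, looping over short reads. */
static ssize_t recv_all(int sock, void *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = recv(sock, (char *)buf + got, len - got, 0);
        if (n <= 0)                 /* error, or peer closed the connection */
            return n;
        got += (size_t)n;
    }
    return (ssize_t)got;
}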

Double system call to write() causes massive network slowdown

In a partially distributed network app I'm working on in C++ on Linux, I have a message-passing abstraction which will send a buffer over the network. The buffer is sent in two steps: first a 4-byte integer containing the size is sent, and then the buffer is sent afterwards. The receiving end then receives in 2 steps as well - one call to read() to get the size, and then a second call to read in the payload. So, this involves 2 system calls to read() and 2 system calls to write().
On the localhost, I setup two test processes. Both processes send and receive messages to each other continuously in a loop. The size of each message was only about 10 bytes. For some reason, the test performed incredibly slow - about 10 messages sent/received per second. And this was on localhost, not even over a network.
If I change the code so that there is only 1 system call to write, i.e. the sending process packs the size at the head of the buffer and then only makes 1 call to write, the whole thing speeds up dramatically - about 10000 messages sent/received per second. This is an incredible difference in speed for only one less system call to write.
Is there some explanation for this?
You might be seeing the effects of the Nagle algorithm, though I'm not sure it is turned on for loopback interfaces.
If you can combine your two writes into a single one, you should always do that. No sense taking the overhead of multiple system calls if you can avoid it.
Okay, well I'm using TCP/IP (SOCK_STREAM) sockets. The example code is pretty straightforward. Here is a basic snippet that reproduces the problem. This doesn't include all the boilerplate setup code, error-checking, or ntohs code:
On the sending end:
// Send size
uint32_t size = strlen(buffer);
int res = write(sock, &size, sizeof(size));
// Send payload
res = write(sock, buffer, size);
And on the receiving end:
// Receive size
uint32_t size;
int res = read(sock, &size, sizeof(size));
// Receive payload
char* buffer = (char*) malloc(size);
read(sock, buffer, size);
Essentially, if I change the sending code by packing the size into the send buffer, and only making one call to write(), the performance increase is almost 1000x faster.
This is essentially the same question: C# socket abnormal latency.
In short, you'll want to use the TCP_NODELAY socket option. You can set it with setsockopt.
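For example (a minimal sketch for an already-connected TCP socket):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle's algorithm so small writes are not held back for coalescing. */
static int disable_nagle(int sock)
{
    int one = 1;
    return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}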
You don't give enough information to say for sure. You don't even say which protocol you're using.
Assuming TCP/IP, the socket could be configured to send a packet on every write, instead of buffering output in the kernel until the buffer is full or the socket is explicitly flushed. This means that TCP sends the two pieces of data in separate segments and has to reassemble them at the other end.
You might also be seeing the effect of the TCP slow-start algorithm. The first data sent is transmitted as part of the connection handshake. Then the TCP window size is slowly ramped up as more data is transmitted until it matches the rate at which the receiver can consume data. This is useful in long-lived connections but a big performance hit in short-lived ones. There is no portable socket option to disable slow start, although some systems let you tune the initial congestion window (on Linux, for example, via the route's initcwnd setting).
Have a look at the TCP_NODELAY and TCP_NOPUSH socket options.
An optimization you can use to avoid multiple system calls and fragmentation is scatter/gather I/O. Using the writev (or sendmsg) system call, you can send the 4-byte size and the variable-sized buffer in a single syscall, and both pieces of data will be sent in the same segment by TCP.
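A sketch of that for the code in the question (byte-order conversion omitted, just as in the original snippet):

#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send the 4-byte size header and the payload with a single system call. */
static ssize_t send_message(int sock, const char *buffer)
{
    uint32_t size = (uint32_t)strlen(buffer);
    struct iovec iov[2] = {
        { .iov_base = &size,          .iov_len = sizeof(size) },
        { .iov_base = (void *)buffer, .iov_len = size         },
    };
    return writev(sock, iov, 2);
}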
The problem is that with the first call to send, the system has no idea the second call is coming, so it sends the data immediately. With the second call to send, the system has no idea a third call isn't coming, so it delays the data in hopes that it can combine the data with a subsequent call.
The correct fix is to use a 'gather' operation such as writev if your operating system supports it. Otherwise, allocate a buffer, copy the two chunks in, and make a single call to write. (Some operating systems have other solutions, for example Linux has a 'TCP cork' operation.)
It's not as important, but you should optimize your receiving code too. Call 'read' asking for as many bytes as possible and then parse them yourself. You're trying to teach the operating system your protocol, and that's not a good idea.
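A sketch of that receive-side approach, using the question's 4-byte size prefix (the 64 KiB buffer is arbitrary, and a real implementation would keep one buffer per connection rather than a single static one):

#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read whatever is available, then extract every complete length-prefixed
 * message from the accumulated buffer; leftover bytes wait for the next call. */
static void drain_socket(int sock, void (*on_message)(const char *msg, uint32_t len))
{
    static char buf[64 * 1024];
    static size_t used = 0;

    ssize_t n = read(sock, buf + used, sizeof(buf) - used);
    if (n <= 0)
        return;
    used += (size_t)n;

    size_t off = 0;
    while (used - off >= sizeof(uint32_t)) {
        uint32_t size;
        memcpy(&size, buf + off, sizeof(size));
        if (used - off < sizeof(size) + size)
            break;                               /* message not complete yet */
        on_message(buf + off + sizeof(size), size);
        off += sizeof(size) + size;
    }
    memmove(buf, buf + off, used - off);         /* keep any partial tail */
    used -= off;
}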

How to determine total data upload+download in TCP/IP

I need to calculate the total data transferred while sending a fixed amount of data from client to server over TCP/IP. This includes connecting to the server, sending the request and headers, receiving the response, receiving the data, etc.
More precisely, how do I get the total data transferred when using the POST and GET methods?
Is there any formula for that? Even a theoretical one will do fine (not considering packet loss, connection retries, etc.).
FYI, I tried RFC 2616 and RFC 1180, but those are going over my head.
Any suggestion?
Thanks in advance.
You can't know the total transfer size in advance, even ignoring retransmits. There are several things that will stop you:
TCP options are negotiated between the hosts when the connection is established. Some options (e.g., timestamp) add additional data to the TCP header
"total data transfer size" is not clear. Ethernet, for example, adds quite a few more bits on top of whatever IP used. 802.11 (wireless) will add even more. So do HDLC or PPP going over a T1. Don't even think about frame relay. Some links may use compression (which will reduce the total size). The total size depends on where you measure it, even for a single packet.
Assuming you're just interested in the total octet size at layer 2, and you know the TCP options that will be negotiated in advance, you still can't know the path MTU, which may change even while the connection is in progress. And if you're not doing path MTU discovery (which would be weird), then a packet may get fragmented somewhere, and the remote end will see a different amount of data transferred than you do.
I'm not sure why you need to know this, but I suggest that:
If you just want an estimate, watch a typical connection in Wireshark. Calculate the percent overhead (vs. the size of data you gave to TCP, and received from TCP). Use that number to estimate: it will be close enough, except in pathological situations.
If you need to know for sure how much data your end saw transmitted and received, use libpcap to capture the packet stream and check.
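A sketch of the libpcap approach (the interface name and capture filter are placeholders; it simply sums the on-the-wire length of every captured frame):

#include <pcap.h>
#include <stdio.h>

static unsigned long total_bytes = 0;

/* Called once per captured frame; h->len is the original on-the-wire length. */
static void count(u_char *user, const struct pcap_pkthdr *h, const u_char *bytes)
{
    (void)user;
    (void)bytes;
    total_bytes += h->len;
}

int main(void)
{
    char errbuf[PCAP_ERRBUF_SIZE];
    pcap_t *p = pcap_open_live("eth0", 65535, 0, 1000, errbuf);
    if (!p) {
        fprintf(stderr, "%s\n", errbuf);
        return 1;
    }

    struct bpf_program prog;
    /* Placeholder filter: only count traffic to/from the server's port 80. */
    if (pcap_compile(p, &prog, "tcp port 80", 1, PCAP_NETMASK_UNKNOWN) == 0)
        pcap_setfilter(p, &prog);

    pcap_loop(p, 1000, count, NULL);   /* capture 1000 frames, then report */
    printf("total: %lu bytes\n", total_bytes);
    pcap_close(p);
    return 0;
}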
I'd say that on average a request and a response each have about 8 lines of headers at about 30 characters per line. Then allow for the size increase of converting any uploaded binary to Base64.
You didn't say whether you also want to count TCP/IP packet headers. If so, you can assume an MTU of about 1500 bytes and add roughly 40 bytes of overhead (20 for the TCP header and 20 for the IPv4 header, more if options are negotiated) per ~1460 bytes of data.
Finally, you could always setup a packet sniffer and count actual bytes for a sample of data.
oh yeah, and you may need to allow for deflate/gzip encoding as well.
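A back-of-the-envelope sketch of such an estimate (every constant here is an assumption: ~40 bytes of TCP/IP headers per full-size segment, ~600 bytes of HTTP headers in total, and it ignores the handshake, ACK-only packets, and link-layer framing):

#include <stdio.h>

/* Rough on-the-wire estimate for transferring 'payload' bytes over HTTP. */
static unsigned long estimate_bytes(unsigned long payload)
{
    const unsigned long mss = 1460;        /* data per segment on a 1500-byte MTU */
    const unsigned long tcpip_hdr = 40;    /* 20-byte TCP + 20-byte IPv4 header */
    const unsigned long http_hdrs = 600;   /* ~300 bytes request + ~300 bytes response */

    unsigned long data = payload + http_hdrs;
    unsigned long segments = (data + mss - 1) / mss;
    return data + segments * tcpip_hdr;
}

int main(void)
{
    printf("~%lu bytes on the wire for a 1 MB upload\n", estimate_bytes(1000000));
    return 0;
}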
