Why must endpoints manage conversions between bytes sent and received TSNs sent and received in SCTP congestion control? - sctp

As stated in RFC 3286:
"...endpoints must manage the conversion between bytes sent and received and TSNs sent and received, since TSN is per chunk rather than per byte".
How does this affect the congestion control algorithm?

There are two reasons:
1. Pragmatically, RFC 3286 refers RFC 2581 for most of the congestion control, and it works in bytes.
2. Practically, and this is a stronger reason, there needs to be a buffer assigned at each end and these would be hard to define in terms of TSNs (chunks) since these are variably size. This would either mean over-allocating space in the buffer e.g. 64K * TSNs, or using a dynamically allocated list. The former is wasteful of space, the latter relatively slow.
Does this answer your question, or was it more related to your last question?

Related

Does SCTP really prevent head-of-line blocking?

I've known about SCTP for a decade or so, and although I never got to use it yet, I've always wanted to, because of some of its promising (purported) features:
multi-homing
multiplexing w/o head-of-line blocking
mixed order/unordered delivery on the same connection (aka association)
no TIME_WAIT
no SYN flooding
A Comparison between QUIC and SCTP however claims
SCTP intended to get rid of HOL-Blocking by substreams, but its
Transmission Sequence Number (TSN) couples together the transmission
of all data chunks. [...] As a result, in SCTP if a packet is lost,
all the packets with TSN after this lost packet cannot be received
until it is retransmitted.
That statement surprised me because:
removing head-of-line blocking is a stated goal of SCTP
SCTP does have a per-stream sequence number, see below quote from RFC 4960, which should allow processing per stream, regardless of the association-global TSN
SCTP has been in use in the telecommunications sector for perhaps close to 2 decades, so how could this have been missed?
Internally, SCTP assigns a Stream Sequence Number to each message
passed to it by the SCTP user. On the receiving side, SCTP ensures
that messages are delivered to the SCTP user in sequence within a
given stream. However, while one stream may be blocked waiting for
the next in-sequence user message, delivery from other streams may
proceed.
Also, there is a paper Head-of-line Blocking in TCP and SCTP: Analysis and Measurements that actually measures round-trip time of a multiplexed echo service in the face of package loss and concludes:
Our results reveal that [..] a small number of SCTP streams or SCTP unordered mode can avoid this head-of-line blocking. The alternative solution of multiple TCP connections performs worse in most cases.
The answer is not very scholarly, but at least according to the specification in RFC 4960, SCTP seems capable of circumventing head-of-line blocking. The relevant claim seems to be in Section 7.1.
Note: TCP guarantees in-sequence delivery of data to its upper-layer protocol within a single TCP session. This means that when TCP notices a gap in the received sequence number, it waits until the gap is filled before delivering the data that was received with sequence numbers higher than that of the missing data. On the other hand, SCTP can deliver data to its upper-layer protocol even if there is a gap in TSN if the Stream Sequence Numbers are in sequence for a particular stream (i.e., the missing DATA chunks are for a different stream) or if unordered delivery is indicated. Although this does not affect cwnd, it might affect rwnd calculation.
A dilemma is what does "are in sequence for a particular stream" entail? There is some stipulation about delaying delivery to the upper layer until packages are reordered (see Section 6.6, below), but reordering doesn't seem to be conditioned by filling the gaps at the level of the association. Also note the mention in Section 6.2 on the complex distinction between ACK and delivery to the ULP (Upper Layer Protocol).
Whether other stipulations of the RFC indirectly result in the occurence of HOL, and whether it is effective in real-life implementations and situations - these questions warrant further investigation.
Below are some of the excerpts which I've come across in the RFC and which may be relevant.
RFC 4960, Section 6.2 Acknowledgement on Reception of DATA Chunks
When the receiver's advertised window is 0, the receiver MUST drop any new incoming DATA chunk with a TSN larger than the largest TSN received so far. If the new incoming DATA chunk holds a TSN value less than the largest TSN received so far, then the receiver SHOULD drop the largest TSN held for reordering and accept the new incoming DATA chunk. In either case, if such a DATA chunk is dropped, the receiver MUST immediately send back a SACK with the current receive window showing only DATA chunks received and accepted so far. The dropped DATA chunk(s) MUST NOT be included in the SACK, as they were not accepted.
Under certain circumstances, the data receiver may need to drop DATA chunks that it has received but hasn't released from its receive buffers (i.e., delivered to the ULP). These DATA chunks may have been acked in Gap Ack Blocks. For example, the data receiver may be holding data in its receive buffers while reassembling a fragmented user message from its peer when it runs out of receive buffer space. It may drop these DATA chunks even though it has acknowledged them in Gap Ack Blocks. If a data receiver drops DATA chunks, it MUST NOT include them in Gap Ack Blocks in subsequent SACKs until they are received again via retransmission. In addition, the endpoint should take into account the dropped data when calculating its a_rwnd.
Circumstances which highlight how senders may receive acknowledgement for chunks which are ultimately not delivered to the ULP (Upper Layer Protocol).Note this applies to chunks with TSN higher than the Cumulative TSN (i.e. from Gap Ack Blocks). This together with unreliability of SACK order represent good reasons for the stipulation in Section 7.1 (see below).
RFC 4960, Section 6.6 Ordered and Unordered Delivery
Within a stream, an endpoint MUST deliver DATA chunks received with the U flag set to 0 to the upper layer according to the order of their Stream Sequence Number. If DATA chunks arrive out of order of their Stream Sequence Number, the endpoint MUST hold the received DATA chunks from delivery to the ULP until they are reordered.
This is the only stipulation on ordered delivery within a stream in this section; seemingly, reordering does not depend on filling the gaps in ACK-ed chunks.
RFC 4960, Section 7.1 SCTP Differences from TCP Congestion Control
Gap Ack Blocks in the SCTP SACK carry the same semantic meaning as the TCP SACK. TCP considers the information carried in the SACK as advisory information only. SCTP considers the information carried in the Gap Ack Blocks in the SACK chunk as advisory. In SCTP, any DATA chunk that has been acknowledged by SACK, including DATA that arrived at the receiving end out of order, is not considered fully delivered until the Cumulative TSN Ack Point passes the TSN of the DATA chunk (i.e., the DATA chunk has been acknowledged by the Cumulative TSN Ack field in the SACK).
This is stated from the perspective of the sending endpoint, and is accurate for the reason emphasized in section 6.6 above.
Note: TCP guarantees in-sequence delivery of data to its upper-layer protocol within a single TCP session. This means that when TCP notices a gap in the received sequence number, it waits until the gap is filled before delivering the data that was received with sequence numbers higher than that of the missing data. On the other hand, SCTP can deliver data to its upper-layer protocol even if there is a gap in TSN if the Stream Sequence Numbers are in sequence for a particular stream (i.e., the missing DATA chunks are for a different stream) or if unordered delivery is indicated. Although this does not affect cwnd, it might affect rwnd calculation.
This seems to be the core answer to what interests you.
In support of this argument, the format of the SCTP SACK chunk as exposed here and here.

How do I receive arbitrary length data using a UdpSocket?

I am writing an application which sends and receives packages using UDP. However, the documentation of recv_from states:
If a message is too long to fit in the supplied buffer, excess bytes may be discarded.
Is there any way to receive all bytes and write them into a vector? Do I really have to allocate an array with the maximum packet length (which, as far as I know, is 65,507 bytes for IPv4) in order to be sure to receive all data? That seems a bit much for me.
Check out the next method in the docs, UdpSocket::peek_from (emphasis mine):
Receives a single datagram message on the socket, without removing it from the queue.
You can use this method to read a known fixed amount of data, such as a header which contains the length of the entire packet. You can use crates like byteorder to decode the appropriate part of the header, use that to allocate exactly the right amount of space, then call recv_from.
This does require that the protocol you are implementing always provides that total size information at a known location.
Now, is this a good idea?
As ArtemGr states:
Because extra system calls are much more expensive than getting some space from the stack.
And from the linked question:
Obviously at some point you will start wondering if doubling the number of system calls to save memory is worth it. I think it isn't.
With the recent Spectre / Meltdown events, now's a pretty good time to be be reminded to avoid extra syscalls.
You could, as suggested, just allocate a "big enough" array ahead of time. You'll need to track how many bytes you've actually read vs allocated though. I recommend something like arrayvec to make it easier.
You could instead implement a pool of pre-allocated buffers on the heap. When you read from the socket, you use a buffer or create a new one. When you are done with the buffer, you put it back in the pool for reuse. That way, you incur the memory allocation once and are only passing around small Vecs on the stack.
See also:
How can I create a stack-allocated vector-like container?
How large should my recv buffer be when calling recv in the socket library
How to read UDP packet with variable length in C

File Transfer using winsock

I want to send files(text or binary) through winsock,I have a buffer with 32768 byte size, In the other side the buffer size is same,But when the packet size <32768 then i don't know how determine the end of packet in buffer,Also with binary file it seems mark the end of packet with a unique character is not possible,Any solution there?
thx
With fixed-size "packets," we would usually that every packet except the last will be completely full of valid data. Only the last one will be "partial," and if the recipient knows how many bytes to expect (because, using Davita's suggestion, the sender told it the file size in advance), then that's no problem. The recipient can simply ignore the remainder of the last packet.
But your further description makes it sound like there may be multiple partially full packets associated with a single file transmission. There is a similarly easy solution to that: Prefix each packet with the number of valid bytes.
You later mention TCustomWinSocket.ReceiveText, and you wonder how it knows how much text to read, and then you quote the answer, which is that it calls ReceiveBuf(Pointer(nul)^, -1)) to set the length of the result buffer before filling it. Perhaps you just didn't understand what that code is doing. It's easier to understand if you look at that same code in another context, the ReceiveLength method. It makes that same call to ReceiveBuf, indicating that when you pass -1 to ReceiveBuf, it returns the number of bytes it received.
In order for that to work for your purposes, you cannot send fixed-size packets. If you always send 32KB packets, and just pad the end with zeroes, then ReceiveLength will always return 32768, and you'll have to combine Davita's and my solutions of sending file and packet lengths along with the payload. But if you ensure that every byte in your packet is always valid, then the recipient can know how much to save based on the size of the packet.
One way or another, you need to make sure the sender provides the recipient with the information it needs to do its job. If the sender sends garbage without giving the recipient a way to distinguish garbage from valid data, then you're stuck.
Well, you can always send file size before you start file transfer, so you'll know when to stop writing to file.

Is transmitted bytes event exist in Linux kernel?

I need to write a rate limiter, that will perform some stuff each time X bytes were transmitted.
The straightforward is to check the length of each transmitted packet, but I think it will be to slow for me.
Is there a way to use some kind of network event, that will be triggered by transmitted packets/bytes?
I think you may look at netfilter.
Using its (kernel level) api, you can have your custom code triggered by network events, modify received messages before passing it to application, and so on.
http://www.netfilter.org/
It's protocol dependent, actually. But for TCP, you can setsockopt the SO_RCVLOWAT option to define the minimum number of bytes (watermark) to permit the read operation.
If you need to enforce the maximum size too, adjust the receive buffer size using SO_RCVBUF.

Writing a stream protocol: Message size field or Message delimiter?

I am about to write a message protocol going over a TCP stream. The receiver needs to know where the message boundaries are.
I can either send 1) fixed length messages, 2) size fields so the receiver knows how big the message is, or 3) a unique message terminator (I guess this can't be used anywhere else in the message).
I won't use #1 for efficiency reasons.
I like #2 but is it possible for the stream to get out of sync?
I don't like idea #3 because it means receiver can't know the size of the message ahead of time and also requires that the terminator doesn't appear elsewhere in the message.
With #2, if it's possible to get out of sync, can I add a terminator or am I guaranteed to never get out of sync as long as the sender program is correct in what it sends? Is it necessary to do #2 AND #3?
Please let me know.
Thanks,
jbu
You are using TCP, the packet delivery is reliable. So the connection either drops, timeouts or you will read the whole message.
So option #2 is ok.
I agree with sigjuice.
If you have a size field, it's not necessary to add and end-of-message delimiter --
however, it's a good idea.
Having both makes things much more robust and easier to debug.
Consider using the standard netstring format, which includes both a size field and also a end-of-string character.
Because it has a size field, it's OK for the end-of-string character to be used inside the message.
If you are developing both the transmit and receive code from scratch, it wouldn't hurt to use both length headers and delimiters. This would provide robustness and error detection. Consider the case where you just use #2. If you write a length field of N to the TCP stream, but end up sending a message which is of a size different from N, the receiving end wouldn't know any better and end up confused.
If you use both #2 and #3, while not foolproof, the receiver can have a greater degree of confidence that it received the message correctly if it encounters the delimiter after consuming N bytes from the TCP stream. You can also safely use the delimiter inside your message.
Take a look at HTTP Chunked Transfer Coding for a real world example of using both #2 and #3.
Depending on the level at which you're working, #2 may actually not have an issues with going out of sync (TCP has sequence numbering in the packets, and does reassemble the stream in correct order for you if it arrives out of order).
Thus, #2 is probably your best bet. In addition, knowing the message size early on in the transmission will make it easier to allocate memory on the receiving end.
Interesting there is no clear answer here. #2 is generally safe over TCP, and is done "in the real world" quite often. This is because TCP guarantees that all data arrives both uncorrupted* and in the order that it was sent.
*Unless corrupted in such a way that the TCP checksum still passes.
Answering to old message since there is stuff to correnct:
Unlike many answers here claim, TCP does not guarantee data to arrive uncorrupted. Not even practically.
TCP protocol has a 2-byte crc-checksum that obviously has a 1:65536 chance of collision if more than one bit flips. This is such a small chance it will never be encountered in tests, but if you are developing something that either transmits large amounts of data and/or is used by very many end users, that dice gets thrown trillions of times (not kidding, youtube throws it about 30 times a second per user.)
Option 2: size field is the only practical option for the reasons you yourself listed. Fixed length messages would be wasteful, and delimiter marks necessitate running the entire payload through some sort of encoding-decoding stage to replace at least three different symbols: start-symbol, end-symbol, and the replacement-symbol that signals replacement has occurred.
In addition to this one will most likely want to use some sort of error checking with a serious checksum. Probably implemented in tandem with the encryption protocol as a message validity check.
As to the possibility of getting out of sync:
This is possible per message, but has a remedy.
A useful scheme is to start each message with a header. This header can be quite short (<30 bytes) and contain the message payload length, eventual correct checksum of the payload, and a checksum for that first portion of the header itself. Messages will also have a maximum length. Such a short header can also be delimited with known symbols.
Now the receiving end will always be in one of two states:
Waiting for new message header to arrive
Receiving more data to an ongoing message, whose length and checksum are known.
This way the receiver will in any situation get out of sync for at most the maximum length of one message. (Assuming there was a corrupted header with corruption in message length field)
With this scheme all messages arrive as discrete payloads, the receiver cannot get stuck forever even with maliciously corrupted data in between, the length of arriving payloads is know in advance, and a successfully transmitted payload has been verified by an additional longer checksum, and that checksum itself has been verified. The overhead for all this can be a mere 26 byte header containing three 64-bit fields, and two delimiting symbols.
(The header does not require replacement-encoding since it is expected only in a state whout ongoing message, and the entire 26 bytes can be processed at once)
There is a fourth alternative: a self-describing protocol such as XML.

Resources