Writing a stream protocol: Message size field or Message delimiter? - stream

I am about to write a message protocol going over a TCP stream. The receiver needs to know where the message boundaries are.
I can either send 1) fixed length messages, 2) size fields so the receiver knows how big the message is, or 3) a unique message terminator (I guess this can't be used anywhere else in the message).
I won't use #1 for efficiency reasons.
I like #2 but is it possible for the stream to get out of sync?
I don't like idea #3 because it means receiver can't know the size of the message ahead of time and also requires that the terminator doesn't appear elsewhere in the message.
With #2, if it's possible to get out of sync, can I add a terminator or am I guaranteed to never get out of sync as long as the sender program is correct in what it sends? Is it necessary to do #2 AND #3?
Please let me know.
Thanks,
jbu

You are using TCP, the packet delivery is reliable. So the connection either drops, timeouts or you will read the whole message.
So option #2 is ok.

I agree with sigjuice.
If you have a size field, it's not necessary to add and end-of-message delimiter --
however, it's a good idea.
Having both makes things much more robust and easier to debug.
Consider using the standard netstring format, which includes both a size field and also a end-of-string character.
Because it has a size field, it's OK for the end-of-string character to be used inside the message.

If you are developing both the transmit and receive code from scratch, it wouldn't hurt to use both length headers and delimiters. This would provide robustness and error detection. Consider the case where you just use #2. If you write a length field of N to the TCP stream, but end up sending a message which is of a size different from N, the receiving end wouldn't know any better and end up confused.
If you use both #2 and #3, while not foolproof, the receiver can have a greater degree of confidence that it received the message correctly if it encounters the delimiter after consuming N bytes from the TCP stream. You can also safely use the delimiter inside your message.
Take a look at HTTP Chunked Transfer Coding for a real world example of using both #2 and #3.

Depending on the level at which you're working, #2 may actually not have an issues with going out of sync (TCP has sequence numbering in the packets, and does reassemble the stream in correct order for you if it arrives out of order).
Thus, #2 is probably your best bet. In addition, knowing the message size early on in the transmission will make it easier to allocate memory on the receiving end.

Interesting there is no clear answer here. #2 is generally safe over TCP, and is done "in the real world" quite often. This is because TCP guarantees that all data arrives both uncorrupted* and in the order that it was sent.
*Unless corrupted in such a way that the TCP checksum still passes.

Answering to old message since there is stuff to correnct:
Unlike many answers here claim, TCP does not guarantee data to arrive uncorrupted. Not even practically.
TCP protocol has a 2-byte crc-checksum that obviously has a 1:65536 chance of collision if more than one bit flips. This is such a small chance it will never be encountered in tests, but if you are developing something that either transmits large amounts of data and/or is used by very many end users, that dice gets thrown trillions of times (not kidding, youtube throws it about 30 times a second per user.)
Option 2: size field is the only practical option for the reasons you yourself listed. Fixed length messages would be wasteful, and delimiter marks necessitate running the entire payload through some sort of encoding-decoding stage to replace at least three different symbols: start-symbol, end-symbol, and the replacement-symbol that signals replacement has occurred.
In addition to this one will most likely want to use some sort of error checking with a serious checksum. Probably implemented in tandem with the encryption protocol as a message validity check.
As to the possibility of getting out of sync:
This is possible per message, but has a remedy.
A useful scheme is to start each message with a header. This header can be quite short (<30 bytes) and contain the message payload length, eventual correct checksum of the payload, and a checksum for that first portion of the header itself. Messages will also have a maximum length. Such a short header can also be delimited with known symbols.
Now the receiving end will always be in one of two states:
Waiting for new message header to arrive
Receiving more data to an ongoing message, whose length and checksum are known.
This way the receiver will in any situation get out of sync for at most the maximum length of one message. (Assuming there was a corrupted header with corruption in message length field)
With this scheme all messages arrive as discrete payloads, the receiver cannot get stuck forever even with maliciously corrupted data in between, the length of arriving payloads is know in advance, and a successfully transmitted payload has been verified by an additional longer checksum, and that checksum itself has been verified. The overhead for all this can be a mere 26 byte header containing three 64-bit fields, and two delimiting symbols.
(The header does not require replacement-encoding since it is expected only in a state whout ongoing message, and the entire 26 bytes can be processed at once)

There is a fourth alternative: a self-describing protocol such as XML.

Related

What does CBATTError Code insufficientResources really mean?

I'm trying to send data over BLE from my iPhone to an ESP32 board. I'm developing in flutter platform and I'm using flutter_reactive_ble library.
My iPhone can connect to the other device and it can also send 1 byte using writeCharacterisiticWithResponse function. But when I try to send my real data which is large (>7000 bytes), it then gives me the error:
flutter: Error occured when writing 9f714672-888c-4450-845f-602c1331cdeb :
Exception: GenericFailure<WriteCharacteristicFailure>(
code: WriteCharacteristicFailure.unknown,
message: "Error Domain=CBATTErrorDomain Code=17
"Resources are insufficient."
UserInfo={NSLocalizedDescription=Resources are insufficient.}")
I tried searching for this error but didn't find additional info, even in Apple Developer website. It just says:
Resources are insufficient to complete the ATT request.
What does this error really means? Which resources are not sufficient and how to work around this problem?
This is almost certainly larger than this characteristic's maximum value length (which is probably on the order of 10s of bytes, not 1000s of bytes). Before writing, you need to call maximumWriteValueLength(for:) to see how much data can be written. If you're trying to send serial data over a characteristic (which is common, but not really what they were designed for), you'll need to break your data up into chunks and reassemble them on the peripheral. You will likely either need an "end" indicator of some kind, or you will need to send the length of the payload first so that the receiver knows how much to excpect.
First of all, a characteristic value cannot be larger than 512 bytes. This is set by the ATT standard (Bluetooth Core Specification v5.3, Vol 3, Part F (ATT), section 3.2.9). This number has been set arbitrarily by the protocol designers and does not map to any technical limitation of the protocol.
So, don't send 7000 bytes in a single write. You need to keep it at most 512 to be standard compliant.
If you say that it works with another Bluetooth stack running on the GATT server, then I guess CoreBluetooth does not enforce/check the maximum length of 512 bytes on the client side (I haven't tested). Therefore I also guess the error code you see was sent by the remote device rather than by CoreBluetooth locally as a pre-check.
There are three different common ways of writing a characteristic on the protocol level (Bluetooth Core Specification v5.3, Vol 3, Part G (GATT), section 4.9 Characteristic Value Write):
Write Without Response (4.9.1)
Write Characteristic Value (4.9.3)
Write Long Characteristic Values (4.9.4)
Number one is unidirectional and does not result in a response packet. It uses a single ATT_WRITE_CMD packet where the value must be at most ATT_MTU-3 bytes in length. This length can be retrieved using maximumWriteValueLength(for:) with .withoutResponse. The requestMtu method in flutter_reactive_ble uses this method internally. If you execute many writes of this type rapidly, be sure to add flow control to avoid CoreBluetooth dropping outgoing packets before they are sent. This can be done through peripheralIsReadyToSendWriteWithoutResponse by simply always waiting for this callback after each write, before you write the next packet. Unfortunately, it seems flutter_reactive_ble does not implement this flow control mechanism.
Number two uses a single ATT_WRITE_REQ where the value must be at most ATT_MTU-3 bytes in length, just as above. Use the same approach as above to retrieve that maximum length (note that maximumWriteValueLength with .withResponse always returns 512 and is not what you want). Here however, either an ATT_WRITE_RSP will be returned on success or an error packet will be received with an error code. Only one ATT transaction can be outstanding at a time, which significantly lowers throughput compared to Write Without Response.
Number three uses a sequence of multiple ATT_PREPARE_WRITE_REQ packets (containing offset and value) followed by an ATT_EXECUTE_WRITE_REQ. The maximum length of the value in each each chunk is ATT_MTU-5. Each _REQ packet also requires a corresponding _RSP packet before it can continue (alternatively, an error code could be sent by the remote device). This approach is used when the characteristic value to be written is too long to be sent using a single ATT_WRITE_REQ.
For any of the above write methods, you are always also limited by the maximum attribute size of 512 bytes as per the specification.
Any Bluetooth stack I know of transparently chooses between "Write Characteristic Value" and "Write Long Characteristic Values" when you tell it to write with response, depending on the value length and MTU. Server side it's a bit different. Some stacks put the burden on the user to combine all packets but it seems nimble handles that on its own. From what I can see in the source code (https://github.com/apache/mynewt-nimble/blob/26ccb8af1f3ea6ad81d5d7cbb762747c6e06a24b/nimble/host/src/ble_att_svr.c#L2099) it can return the "Insufficient Resources" error code when it tries to allocate memory but fails (most likely due to too much buffered data). This is what might happen for you. To answer your first actual question, the standard itself does not say anything else about this error code than simply "Insufficient Resources to complete the request".
The error has nothing to do with LE Data Length extension, which is simply an optimization for a lower layer (BLE Link Layer) that does not affect the functionality of the host stack. The L2CAP layer will take care of the reassembling of smaller link layer packets if necessary, and must always support up to the negotiated MTU without overflowing any buffers.
Now, to answer your second question, if you send very large amounts of data (7000 bytes), you must divide the data in multiple chunks and come up with a way to correctly be able to combine them. Each chunk is written as a full characteristic value. When you do this, be sure to send values at most of size ATT_MTU-3 (but never larger than 512 bytes), to avoid the inefficient overheads of "Write Long Characteristic Values". It's then up to your application code to make sure you don't run out of memory in case too much data is sent.

How to understand which runicast message you have succesfully transmitted in Contiki (Rime)?

After that I send different runicast messages with the function runicast_send, how can I understand which message was acknowledged when the callback sent_runicast is triggered?
The runicast.h file states:
The runicast primitive adds two packet attributes: the single-hop
packet type and the single-hop packet ID. The runicast primitive
uses the packet ID attribute as a sequence number for matching
acknowledgement packets to the corresponding data packets.
but I didn't understand how to do it in practice. Can somebody provide an example?
One way would be to look at the field sndnxt of struct runicast_conn *c before you send the packet, and then compare that value of the packetbuf_attr(PACKETBUF_ATTR_PACKET_ID) in the "sent" callback of your code.
However note that by default the runicast packet ID is just 2 bits long. Enough to demultiplex the ACK in most cases, but may be insufficient for your purposes. (The packet ID size in bits can be changed by redefining RUNICAST_PACKET_ID_BITS.)
Also Rime is obsolete. Don't use it in your code, especially production code unless you know what you're doing. runicast was never one of the highlights of Rime, I doubt there are no better alternatives (e.g. the uIPv6 stack) for what you want to do.

how to detect XMIT FIFO is full on a UART 16550 or higher

I have read already lot of specs and code about UART, but I cannot find any indication on how to find by software interface if the transmit FIFO is full. There is an interrupt when the FIFO is empty. Then I can write at least N characters, where N is the fifo size. But when I have written these N characters, a number of them have already been sent. So I can in fact write more than N characters, but there is no FIFO full interrupt. The specs says that when the fifo is full indeed the TXREADY pin on the chip is inverted. Is there a way to find this by software ? The Line Status Register bit only says that the fifo is not empty, which does not mean it is full...
Anyone can help ? I want to write characters until the fifo is full...
Looks to me also that they neglected this, but most people get by with the thing as it is. The usual way to use it is to get an interrupt, fill the FIFO (normally very fast compared to serial data rate) and then return.
There is a situation where it seems to me that what you are asking for could be nice...if transmitting in a polling mode...you want to send 10 bytes, your polling shows the FIFO is not empty, so you have not way to know if you can send them all or not...either you wait there until it is empty, which sort of defeats the purpose of the FIFO, or you continue polling other stuff until you get back to checking for FIFO empty, and maybe that slows your overall transmission rate. Guess it is not a very usual way to operate, so nobody worries about it.
The 16550D datasheet says the following:
The transmitter holding register interrupt (02) occurs when the XMIT
FIFO is empty; it is cleared as soon as the transmitter holding
register is written to (1 to 16 characters may be written to the XMIT
FIFO while servicing this interrupt) or the IIR is read.
This means that when the Line Status Register register (port base + 5) indicates Transmitter Empty condition (in bit 5), the transmit FIFO is completely empty and you may write up to 16 bytes to the transmitter holding register (port base + 0). It is important not to write more than 16 bytes between occurrences of the transmitter empty bit being set.
If you don't need to write 16 bytes at the point when you received the IRQ (or saw the transmitter register empty bit set, if polling), you can either keep track of how many bytes you wrote since the last transmitter empty state, or, just defer writing further bytes until the next transmitter empty state.

File Transfer using winsock

I want to send files(text or binary) through winsock,I have a buffer with 32768 byte size, In the other side the buffer size is same,But when the packet size <32768 then i don't know how determine the end of packet in buffer,Also with binary file it seems mark the end of packet with a unique character is not possible,Any solution there?
thx
With fixed-size "packets," we would usually that every packet except the last will be completely full of valid data. Only the last one will be "partial," and if the recipient knows how many bytes to expect (because, using Davita's suggestion, the sender told it the file size in advance), then that's no problem. The recipient can simply ignore the remainder of the last packet.
But your further description makes it sound like there may be multiple partially full packets associated with a single file transmission. There is a similarly easy solution to that: Prefix each packet with the number of valid bytes.
You later mention TCustomWinSocket.ReceiveText, and you wonder how it knows how much text to read, and then you quote the answer, which is that it calls ReceiveBuf(Pointer(nul)^, -1)) to set the length of the result buffer before filling it. Perhaps you just didn't understand what that code is doing. It's easier to understand if you look at that same code in another context, the ReceiveLength method. It makes that same call to ReceiveBuf, indicating that when you pass -1 to ReceiveBuf, it returns the number of bytes it received.
In order for that to work for your purposes, you cannot send fixed-size packets. If you always send 32KB packets, and just pad the end with zeroes, then ReceiveLength will always return 32768, and you'll have to combine Davita's and my solutions of sending file and packet lengths along with the payload. But if you ensure that every byte in your packet is always valid, then the recipient can know how much to save based on the size of the packet.
One way or another, you need to make sure the sender provides the recipient with the information it needs to do its job. If the sender sends garbage without giving the recipient a way to distinguish garbage from valid data, then you're stuck.
Well, you can always send file size before you start file transfer, so you'll know when to stop writing to file.

Using PARSE on a PORT! value

I tried using PARSE on a PORT! and it does not work:
>> parse open %test-data.r [to end]
** Script error: parse does not allow port! for its input argument
Of course, it works if you read the data in:
>> parse read open %test-data.r [to end]
== true
...but it seems it would be useful to be able to use PARSE on large files without first loading them into memory.
Is there a reason why PARSE couldn't work on a PORT! ... or is it merely not implemented yet?
the easy answer is no we can't...
The way parse works, it may need to roll-back to a prior part of the input string, which might in fact be the head of the complete input, when it meets the last character of the stream.
ports copy their data to a string buffer as they get their input from a port, so in fact, there is never any "prior" string for parse to roll-back to. its like quantum physics... just looking at it, its not there anymore.
But as you know in rebol... no isn't an answer. ;-)
This being said, there is a way to parse data from a port as its being grabbed, but its a bit more work.
what you do is use a buffer, and
APPEND buffer COPY/part connection amount
Depending on your data, amount could be 1 byte or 1kb, use what makes sense.
Once the new input is added to your buffer, parse it and add logic to know if you matched part of that buffer.
If something positively matched, you remove/part what matched from the buffer, and continue parsing until nothing parses.
you then repeat above until you reach the end of input.
I've used this in a real-time EDI tcp server which has an "always on" tcp port in order to break up a (potentially) continuous stream of input data, which actually piggy-backs messages end to end.
details
The best way to setup this system is to use /no-wait and loop until the port closes (you receive none instead of "").
Also make sure you have a way of checking for data integrity problems (like a skipped byte, or erroneous message) when you are parsing, otherwise, you will never reach the end.
In my system, when the buffer was beyond a specific size, I tried an alternate rule which skipped bytes until a pattern might be found further down the stream. If one was found, an error was logged, the partial message stored and a alert raised for sysadmin to sort out the message.
HTH !
I think that Maxim's answer is good enough. At this moment the parse on port is not implemented. I don't think it's impossible to implement it later, but we must solve other issues first.
Also as Maxim says, you can do it even now, but it very depends what exactly you want to do.
You can parse large files without need to read them completely to the memory, for sure. It's always good to know, what you expect to parse. For example all large files, like files for music and video, are divided into chunks, so you can just use copy|seek to get these chunks and parse them.
Or if you want to get just titles of multiple web pages, you can just read, let's say, first 1024 bytes and look for the title tag here, if it fails, read more bytes and try it again...
That's exactly what must be done to allow parse on port natively anyway.
And feel free to add a WISH in the CureCode database: http://curecode.org/rebol3/

Resources