File Transfer using winsock - delphi

I want to send files (text or binary) through winsock. I have a buffer of 32768 bytes, and on the other side the buffer size is the same. But when a packet is smaller than 32768 bytes, I don't know how to determine the end of the packet in the buffer. And with binary files, it seems that marking the end of a packet with a unique character is not possible. Is there any solution?
thx

With fixed-size "packets," we would usually expect that every packet except the last will be completely full of valid data. Only the last one will be "partial," and if the recipient knows how many bytes to expect (because, using Davita's suggestion, the sender told it the file size in advance), then that's no problem. The recipient can simply ignore the remainder of the last packet.
But your further description makes it sound like there may be multiple partially full packets associated with a single file transmission. There is a similarly easy solution to that: Prefix each packet with the number of valid bytes.
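For instance, here is a minimal sketch of that length-prefix idea, written against the raw Winsock API for brevity (the same calls exist in Delphi's WinSock unit; sendAll and sendPacket are hypothetical helpers, not library calls):

#include <winsock2.h>

// send() may accept fewer bytes than asked for, so loop until done.
static bool sendAll(SOCKET s, const char* data, int len)
{
    while (len > 0) {
        int n = send(s, data, len, 0);
        if (n == SOCKET_ERROR)
            return false;
        data += n;
        len -= n;
    }
    return true;
}

// One "packet": a 4-byte count of valid bytes in network byte order,
// followed by exactly that many payload bytes.
static bool sendPacket(SOCKET s, const char* payload, int len)
{
    u_long prefix = htonl((u_long)len);
    return sendAll(s, (const char*)&prefix, (int)sizeof(prefix))
        && sendAll(s, payload, len);
}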
You later mention TCustomWinSocket.ReceiveText, and you wonder how it knows how much text to read, and then you quote the answer, which is that it calls ReceiveBuf(Pointer(nil)^, -1) to set the length of the result buffer before filling it. Perhaps you just didn't understand what that code is doing. It's easier to understand if you look at the same code in another context, the ReceiveLength method. It makes that same call to ReceiveBuf, indicating that when you pass -1 to ReceiveBuf, it returns the number of bytes available to be received.
In order for that to work for your purposes, you cannot send fixed-size packets. If you always send 32KB packets, and just pad the end with zeroes, then ReceiveLength will always return 32768, and you'll have to combine Davita's and my solutions of sending file and packet lengths along with the payload. But if you ensure that every byte in your packet is always valid, then the recipient can know how much to save based on the size of the packet.
One way or another, you need to make sure the sender provides the recipient with the information it needs to do its job. If the sender sends garbage without giving the recipient a way to distinguish garbage from valid data, then you're stuck.

Well, you can always send the file size before you start the file transfer, so you'll know when to stop writing to the file.
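A matching receive-side sketch of that suggestion, again in raw Winsock style, assuming the sender writes the file size as a 4-byte network-order prefix before the data (recvAll and recvFile are hypothetical helpers):

#include <winsock2.h>
#include <cstdio>

// recv() may deliver fewer bytes than asked for, so loop until done.
static bool recvAll(SOCKET s, char* data, int len)
{
    while (len > 0) {
        int n = recv(s, data, len, 0);
        if (n <= 0)
            return false;   // error or connection closed
        data += n;
        len -= n;
    }
    return true;
}

// Read the announced file size, then consume exactly that many bytes,
// one 32 KB buffer at a time.
static bool recvFile(SOCKET s, FILE* out)
{
    u_long prefix;
    if (!recvAll(s, (char*)&prefix, (int)sizeof(prefix)))
        return false;
    u_long remaining = ntohl(prefix);
    char buf[32768];
    while (remaining > 0) {
        int chunk = (remaining < sizeof(buf)) ? (int)remaining : (int)sizeof(buf);
        if (!recvAll(s, buf, chunk))
            return false;
        fwrite(buf, 1, (size_t)chunk, out);
        remaining -= (u_long)chunk;
    }
    return true;
}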

Related

Read Timeout TIdTCPClient

Good day. I use the TIdTCPClient component to send requests to the server and read the response. I know the size of the response for certain requests, but not for others.
When I know the size of the response, my data reading code looks like this:
IdTCPClient1->Socket->Write(requestBuffer);
IdTCPClient1->Socket->ReadBytes(answerBuffer, expectSize);
When the size of the response is not known to me, I use this code:
IdTCPClient1->Socket->Write(requestBuffer);
IdTCPClient1->Socket->ReadBytes(answerBuffer, -1);
In both cases, I ran into problems.
In the first case, if the server does not return all the data (less than expectSize), then IdTCPClient1 will wait for ReadTimeout to expire, but there will be no data at all in the answerBuffer (even if the server sent something). Is this the intended logic of TIdTCPClient? Is it right?
In the second case, ReadTimeout does not work at all. That is, the ReadBytes function ends immediately and either nothing is written to the answerBuffer, or several bytes from the server are written. However, I expected that since this function in this case does not know the number of bytes to read, it would wait for ReadTimeout and read the bytes that arrived during this time. As an experiment, I inserted Sleep(500) between writing and reading, and then I read all the data that arrived.
May I ask you to answer why this is happening?
Good day. I use the TIdTCPClient component to send requests to the server and read the response. I know the size of the response for certain requests, but not for others.
Why do you not know the size of all of the responses? What does your protocol actually look like? TCP is a byte stream, each message MUST be framed in such a way that a receiver can know where each message begins and ends in order to read the messages correctly and preserve the integrity of the stream. As such, messages MUST either include their size in their payload, or be uniquely delimited between messages. So, which is the case in your situation? It doesn't sound like you are handling either possibility.
When the size of the response is not known to me, I use this code:
IdTCPClient1->Socket->Write(requestBuffer);
IdTCPClient1->Socket->ReadBytes(answerBuffer, -1);
When you set AByteCount to -1, that tells ReadBytes() to return whatever bytes are currently available in the IOHandler's InputBuffer. If the InputBuffer is empty, ReadBytes() waits, up to the ReadTimeout interval, for at least 1 byte to arrive, and then it returns whatever bytes were actually received into the InputBuffer, up to the maximum specified by the IOHandler's RecvBufferSize. So it may still take multiple reads to read an entire message in full.
In general, you should NEVER set AByteCount to -1 when dealing with an actual protocol. -1 is good to use only when proxying/streaming arbitrary data, where you don't care what the bytes actually are. Any other use requires knowledge of the protocol's details of how messages are framed.
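For example, if your protocol put a 4-byte length prefix in front of every response, the client could read deterministically with no -1 involved (a sketch; ReadInt32 is available in up-to-date Indy 10 versions, and the prefix is an assumption about your device's protocol, not something Indy provides):

IdTCPClient1->Socket->Write(requestBuffer);
// Read the 4-byte length prefix first, then exactly that many payload bytes.
int len = IdTCPClient1->Socket->ReadInt32();
IdTCPClient1->Socket->ReadBytes(answerBuffer, len);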
In the first case, if the server does not return all the data (less than expectSize), then IdTCPClient1 will wait for ReadTimeout to expire, but there will be no data at all in the answerBuffer (even if the server sent something). Is this the intended logic of TIdTCPClient? Is it right?
Yes. When AByteCount is > 0, ReadBytes() waits for the specified number of bytes to be available in the InputBuffer before then extracting that many bytes into your output TIdBytes. Your answerBuffer will not be modified unless all of the requested bytes are available. If the ReadTimeout elapses, an EIdReadTimeout exception is raised, and your answerBuffer is left untouched.
If that is not the behavior you want, then consider using ReadStream() instead of ReadBytes(), using a TIdMemoryBufferStream or TBytesStream to read into.
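A rough sketch of that alternative for your expectSize case (TBytesStream is the standard RTL class; error handling and cleanup are trimmed):

// ReadStream() writes into the stream as data arrives, so the idea is that
// even if EIdReadTimeout is raised partway through, the stream keeps the
// bytes that did make it.
TBytesStream* strm = new TBytesStream();
try {
    IdTCPClient1->Socket->ReadStream(strm, expectSize, false);
}
catch (EIdReadTimeout&) {
    // strm->Size reports how many bytes actually arrived before the timeout.
}
// (free strm when done with its contents)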
In the second case, ReadTimeout does not work at all. That is, the ReadBytes function ends immediately and nothing is written to the answerBuffer.
I have never heard of ReadBytes() not waiting for the ReadTimeout. What you describe should only happen if there are no bytes available in the InputBuffer and the ReadTimeout is set to some very small value, like 0 msecs.
or several bytes from the server are written.
That is a perfectly reasonable outcome given you are asking ReadBytes() to read an arbitrary number of bytes between 1..RecvBufferSize, inclusive, or read no bytes if the timeout elapses.
However, I expected that since this function in this case does not know the number of bytes to read, it would wait for ReadTimeout and read the bytes that arrived during this time.
That is how it should be working, yes. And how it has always worked. So I suggest you debug into ReadBytes() at runtime and find out why it is not working the way you are expecting. Also, make sure you are using an up-to-date version of Indy to begin with (or at least a version from the last few years).
Why do you not know the size of all of the responses?
Because, in fact, I'm polling an electronic device. This device has its own network IP address and port. The device can respond to the same request in different ways, depending on its status. Strictly speaking, some queries have two possible answers, and those answers have different lengths. It is in these cases that I specify AByteCount = -1 when reading, to accept whichever response the device sends.
I have never heard of ReadBytes() not waiting for the ReadTimeout.
You're right! I was wrong. When specifying AByteCount = -1, I get one byte. As you said, once at least one byte arrives, ReadBytes() returns it and ends.
Also, make sure you are using an up-to-date version of Indy to begin with (or at least a version from the last few years).
I am working with C++ Builder 10.3 Community Edition, Indy version 10.6.2.5366.

How do I receive arbitrary length data using a UdpSocket?

I am writing an application which sends and receives packets using UDP. However, the documentation of recv_from states:
If a message is too long to fit in the supplied buffer, excess bytes may be discarded.
Is there any way to receive all bytes and write them into a vector? Do I really have to allocate an array with the maximum packet length (which, as far as I know, is 65,507 bytes for IPv4) in order to be sure to receive all data? That seems a bit much to me.
Check out the next method in the docs, UdpSocket::peek_from (emphasis mine):
Receives a single datagram message on the socket, without removing it from the queue.
You can use this method to read a known fixed amount of data, such as a header which contains the length of the entire packet. You can use crates like byteorder to decode the appropriate part of the header, use that to allocate exactly the right amount of space, then call recv_from.
This does require that the protocol you are implementing always provides that total size information at a known location.
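Rust's peek_from wraps the BSD-socket MSG_PEEK flag, so at the C level the technique looks like the sketch below (the 4-byte length header is an assumption about your protocol, and error handling is omitted):

#include <cstdint>
#include <vector>
#include <sys/socket.h>   // POSIX headers; on Windows use <winsock2.h>
#include <arpa/inet.h>

std::vector<char> recvDatagram(int sock)
{
    // Peek at the fixed-size header; the datagram stays queued.
    uint32_t header = 0;
    recv(sock, &header, sizeof(header), MSG_PEEK);
    uint32_t total = ntohl(header);   // payload length announced by the sender

    // Allocate exactly enough, then consume the whole datagram for real.
    std::vector<char> buf(sizeof(header) + total);
    recv(sock, buf.data(), buf.size(), 0);
    return buf;
}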
Now, is this a good idea?
As ArtemGr states:
Because extra system calls are much more expensive than getting some space from the stack.
And from the linked question:
Obviously at some point you will start wondering if doubling the number of system calls to save memory is worth it. I think it isn't.
With the recent Spectre / Meltdown events, now's a pretty good time to be reminded to avoid extra syscalls.
You could, as suggested, just allocate a "big enough" array ahead of time. You'll need to track how many bytes you've actually read vs allocated though. I recommend something like arrayvec to make it easier.
You could instead implement a pool of pre-allocated buffers on the heap. When you read from the socket, you use a buffer or create a new one. When you are done with the buffer, you put it back in the pool for reuse. That way, you incur the memory allocation once and are only passing around small Vecs on the stack.
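A minimal sketch of such a pool (all names are hypothetical): buffers keep their heap capacity across uses instead of being reallocated for every datagram.

#include <cstddef>
#include <vector>

class BufferPool {
    std::vector<std::vector<char>> free_;
public:
    // Hand out a recycled buffer when one is available, else allocate.
    std::vector<char> acquire(std::size_t size) {
        if (free_.empty())
            return std::vector<char>(size);
        std::vector<char> buf = std::move(free_.back());
        free_.pop_back();
        buf.resize(size);   // reuses the existing capacity when possible
        return buf;
    }
    // Return a buffer to the pool so its allocation can be reused.
    void release(std::vector<char> buf) {
        free_.push_back(std::move(buf));
    }
};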
See also:
How can I create a stack-allocated vector-like container?
How large should my recv buffer be when calling recv in the socket library
How to read UDP packet with variable length in C

How to understand which runicast message you have successfully transmitted in Contiki (Rime)?

I send several different runicast messages with the function runicast_send. How can I tell which message was acknowledged when the sent_runicast callback is triggered?
The runicast.h file states:
The runicast primitive adds two packet attributes: the single-hop
packet type and the single-hop packet ID. The runicast primitive
uses the packet ID attribute as a sequence number for matching
acknowledgement packets to the corresponding data packets.
but I didn't understand how to do it in practice. Can somebody provide an example?
One way would be to look at the field sndnxt of struct runicast_conn *c before you send the packet, and then compare that value with packetbuf_attr(PACKETBUF_ATTR_PACKET_ID) in the "sent" callback of your code.
However note that by default the runicast packet ID is just 2 bits long. Enough to demultiplex the ACK in most cases, but may be insufficient for your purposes. (The packet ID size in bits can be changed by redefining RUNICAST_PACKET_ID_BITS.)
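A sketch of that bookkeeping, assuming Contiki 3.x names (on older trees linkaddr_t is rimeaddr_t; last_sent_id is my own variable, not part of the API):

#include "net/rime/runicast.h"
#include "net/packetbuf.h"

static uint8_t last_sent_id;   /* ID recorded at send time */

static void
sent_runicast(struct runicast_conn *c, const linkaddr_t *to, uint8_t retransmissions)
{
  /* The acknowledged packet's ID; match it against the one recorded below. */
  if (packetbuf_attr(PACKETBUF_ATTR_PACKET_ID) == last_sent_id) {
    /* this ACK belongs to the message recorded in send_message() */
  }
}

static void
send_message(struct runicast_conn *conn, const linkaddr_t *dest)
{
  /* sndnxt is the sequence number the outgoing packet will be stamped with. */
  last_sent_id = conn->sndnxt;
  runicast_send(conn, dest, 4 /* max retransmissions */);
}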
Also, Rime is obsolete. Don't use it in your code, especially production code, unless you know what you're doing. runicast was never one of the highlights of Rime, and I have no doubt there are better alternatives (e.g. the uIPv6 stack) for what you want to do.

Why is it not safe to use Socket.ReceiveLength?

Well, even Embarcadero states that it is not guaranteed to return an accurate count of the bytes ready to be read from the socket buffer. But if you look at it, when you pass -1 to Socket.ReceiveBuf (which is what ReceiveLength wraps), it calls ioctlsocket with FIONREAD to determine the amount of data pending in the network's input buffer that can be read from the socket.
So how is that unsafe or bad?
e.g.: ioctlsocket(Socket.SocketHandle, FIONREAD, Longint(i));
The documentation you mention specifically says (emphasis mine)
Note: ReceiveLength is not guaranteed to be accurate for streaming socket connections.
This means that the length is not known ahead of time because it's being supplied by a stream of data. Obviously, if you don't know how big the data is that's being sent ahead of time, you can't properly set the length the client should expect.
Consider it like generic code to copy a file. If you don't know ahead of time how big the file is you'll be copying, you can't predict how many bytes you'll be copying. In the case of the socket, the stream size that's supplying the socket isn't known in advance (for instance, for data being generated real-time and sent), so there's no way to inform the client socket how much to expect.
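In code terms, FIONREAD reports only what happens to have arrived so far, so the reliable pattern is to loop until a byte count you learned from the protocol itself is satisfied. A raw Winsock sketch (recvExpected is a hypothetical helper; the same approach works with ReceiveBuf in Delphi):

#include <winsock2.h>

static bool recvExpected(SOCKET s, char* buf, int expected)
{
    // A single recv() may return any prefix of the message, so keep
    // reading until the protocol-specified byte count has arrived.
    int got = 0;
    while (got < expected) {
        int n = recv(s, buf + got, expected - got, 0);
        if (n <= 0)
            return false;   // error or connection closed
        got += n;
    }
    return true;
}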

Writing a stream protocol: Message size field or Message delimiter?

I am about to write a message protocol going over a TCP stream. The receiver needs to know where the message boundaries are.
I can either send 1) fixed length messages, 2) size fields so the receiver knows how big the message is, or 3) a unique message terminator (I guess this can't be used anywhere else in the message).
I won't use #1 for efficiency reasons.
I like #2 but is it possible for the stream to get out of sync?
I don't like idea #3 because it means the receiver can't know the size of the message ahead of time, and it also requires that the terminator doesn't appear elsewhere in the message.
With #2, if it's possible to get out of sync, can I add a terminator, or am I guaranteed never to get out of sync as long as the sender program is correct in what it sends? Is it necessary to do both #2 AND #3?
Please let me know.
Thanks,
jbu
You are using TCP, so packet delivery is reliable. Either the connection drops or times out, or you will read the whole message.
So option #2 is OK.
I agree with sigjuice. If you have a size field, it's not necessary to add an end-of-message delimiter; however, it's a good idea. Having both makes things much more robust and easier to debug.
Consider using the standard netstring format, which includes both a size field and an end-of-string character. Because it has a size field, it's OK for the end-of-string character to be used inside the message.
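A sketch of netstring framing ("<length>:<payload>," — e.g. "5:hello,"); the size field and the trailing comma give the receiver a cross-check on each other:

#include <string>

std::string encodeNetstring(const std::string& payload)
{
    return std::to_string(payload.size()) + ":" + payload + ",";
}

// Returns true and fills 'out' if 'in' begins with one complete netstring.
bool decodeNetstring(const std::string& in, std::string& out)
{
    std::size_t colon = in.find(':');
    if (colon == std::string::npos)
        return false;                  // no size field yet
    std::size_t len = std::stoul(in.substr(0, colon));   // throws on bad digits
    if (in.size() < colon + 1 + len + 1)
        return false;                  // message not fully received yet
    if (in[colon + 1 + len] != ',')
        return false;                  // framing error: size and ',' disagree
    out = in.substr(colon + 1, len);
    return true;
}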
If you are developing both the transmit and receive code from scratch, it wouldn't hurt to use both length headers and delimiters. This would provide robustness and error detection. Consider the case where you just use #2. If you write a length field of N to the TCP stream but end up sending a message of a size different from N, the receiving end wouldn't know any better and would end up confused.
If you use both #2 and #3, while not foolproof, the receiver can have a greater degree of confidence that it received the message correctly if it encounters the delimiter after consuming N bytes from the TCP stream. You can also safely use the delimiter inside your message.
Take a look at HTTP Chunked Transfer Coding for a real world example of using both #2 and #3.
Depending on the level at which you're working, #2 may actually not have issues with going out of sync (TCP has sequence numbering in its packets, and reassembles the stream in the correct order for you if it arrives out of order).
Thus, #2 is probably your best bet. In addition, knowing the message size early on in the transmission will make it easier to allocate memory on the receiving end.
Interesting that there is no clear answer here. #2 is generally safe over TCP and is done "in the real world" quite often. This is because TCP guarantees that all data arrives both uncorrupted* and in the order it was sent.
*Unless corrupted in such a way that the TCP checksum still passes.
Answering an old message, since there is stuff to correct:
Contrary to what many answers here claim, TCP does not guarantee that data arrives uncorrupted. Not even practically.
The TCP protocol has a 2-byte checksum, which obviously gives a 1-in-65536 chance of a collision if more than one bit flips. This chance is so small it will never be encountered in tests, but if you are developing something that either transmits large amounts of data and/or is used by very many end users, that die gets thrown trillions of times (not kidding; YouTube throws it about 30 times a second per user).
Option #2, a size field, is the only practical option for the reasons you yourself listed. Fixed-length messages would be wasteful, and delimiter marks necessitate running the entire payload through some sort of encode/decode stage to replace at least three different symbols: the start symbol, the end symbol, and the replacement symbol that signals a replacement has occurred, as sketched below.
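To make the three-symbol point concrete, here is a sketch of SLIP-style byte stuffing using an end symbol and a replacement symbol (a start symbol would be handled the same way; the symbol values are arbitrary illustrative choices):

#include <cstdint>
#include <vector>

const uint8_t FRAME_END = 0x7E;   // end symbol: appears only between frames
const uint8_t FRAME_ESC = 0x7D;   // replacement symbol: marks an escape
const uint8_t ESC_XOR   = 0x20;   // transformation applied to escaped bytes

std::vector<uint8_t> stuffFrame(const std::vector<uint8_t>& payload)
{
    std::vector<uint8_t> out;
    for (uint8_t b : payload) {
        if (b == FRAME_END || b == FRAME_ESC) {
            // These values may not appear literally inside the payload.
            out.push_back(FRAME_ESC);
            out.push_back(b ^ ESC_XOR);
        } else {
            out.push_back(b);
        }
    }
    out.push_back(FRAME_END);   // frame terminator
    return out;
}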
In addition to this, one will most likely want to use some sort of error checking with a serious checksum, probably implemented in tandem with the encryption protocol as a message validity check.
As to the possibility of getting out of sync:
This is possible per message, but has a remedy.
A useful scheme is to start each message with a header. This header can be quite short (under 30 bytes) and contain the message payload length, the expected checksum of the payload, and a checksum of that first portion of the header itself. Messages will also have a maximum length. Such a short header can also be delimited with known symbols.
Now the receiving end will always be in one of two states:
Waiting for a new message header to arrive
Receiving more data for an ongoing message, whose length and checksum are known.
This way the receiver will, in any situation, be out of sync for at most the maximum length of one message (assuming a corrupted header whose message-length field was itself corrupted).
With this scheme all messages arrive as discrete payloads, the receiver cannot get stuck forever even with maliciously corrupted data in between, the length of arriving payloads is known in advance, and a successfully transmitted payload has been verified by an additional, longer checksum that has itself been verified. The overhead for all this can be a mere 26-byte header containing three 64-bit fields and two delimiting symbols.
(The header does not require replacement-encoding, since it is expected only in a state without an ongoing message, and the entire 26 bytes can be processed at once.)
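As a sketch, the header described above could be laid out like this (field order and names are illustrative, not prescribed):

#include <cstdint>

#pragma pack(push, 1)
struct MessageHeader {
    uint8_t  start_delim;       // known symbol marking the header start
    uint64_t payload_length;    // bounded by the agreed maximum message size
    uint64_t payload_checksum;  // serious checksum of the payload
    uint64_t header_checksum;   // checksum covering the fields above
    uint8_t  end_delim;         // second delimiting symbol
};
#pragma pack(pop)
// 1 + 8 + 8 + 8 + 1 = 26 bytes, matching the overhead quoted above.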
There is a fourth alternative: a self-describing protocol such as XML.
