empty buffer but IdTCPClient.IOHandler.InputBufferIsEmpty is false - delphi

I have a problem with the code below, which uses TIdTCPClient to read the buffer from a Telnet server:
procedure TForm2.ReadTimerTimer(Sender: TObject);
var
  S: String;
begin
  if IdTCPClient.IOHandler.InputBufferIsEmpty then
  begin
    IdTCPClient.IOHandler.CheckForDataOnSource(10);
    if IdTCPClient.IOHandler.InputBufferIsEmpty then Exit;
  end;
  S := IdTCPClient.IOHandler.InputBufferAsString(TEncoding.UTF8);
  CheckText(S);
end;
This procedure runs every 1000 milliseconds, and when the buffer has a value, CheckText is called.
This code works, but sometimes it passes an empty buffer to CheckText.
What's the problem?
Thanks

Your code is attempting to read arbitrary blocks of data from the InputBuffer and expects them to be complete and valid strings. It is doing this without ANY consideration for what kind of data you are receiving. That is a recipe for disaster on multiple levels.
You are connected to a Telnet server, but you are using TIdTCPClient directly instead of using TIdTelnet, so you MUST manually decode any Telnet sequences that are received BEFORE you can then process any remaining string data. Look at the source code for TIdTelnet. There is a lot of decoding logic that takes place before the OnDataAvailable event is fired. All Telnet sequence data is handled internally, then the OnDataAvailable event provides whatever non-Telnet data is left over after decoding.
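As a rough illustration of the kind of decoding involved, here is a minimal sketch in C++ (chosen only so that it is self-contained; it is not Indy code and not how TIdTelnet is implemented). The hypothetical TelnetFilter class drops IAC command sequences as defined in RFC 854 and keeps the remaining data bytes, carrying its state across reads so that a sequence split between two reads is still handled. It does not answer option negotiation, which a real client has to do.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of Indy): removes Telnet command sequences
// from a byte stream and returns only the application data.
class TelnetFilter {
public:
    // Feed one chunk of raw socket bytes; returns the data bytes it contained.
    std::string feed(const std::vector<std::uint8_t>& chunk) {
        std::string out;
        for (std::uint8_t b : chunk) {
            switch (state_) {
            case State::Data:
                if (b == IAC) state_ = State::Iac;
                else out.push_back(static_cast<char>(b));
                break;
            case State::Iac:
                if (b == IAC) {                       // IAC IAC is a literal 0xFF
                    out.push_back(static_cast<char>(b));
                    state_ = State::Data;
                } else if (b >= WILL && b <= DONT) {  // WILL/WONT/DO/DONT + option
                    state_ = State::Option;
                } else if (b == SB) {                 // subnegotiation begins
                    state_ = State::Sub;
                } else {                              // other two-byte command
                    state_ = State::Data;
                }
                break;
            case State::Option:
                state_ = State::Data;                 // skip the option code
                break;
            case State::Sub:
                if (b == IAC) state_ = State::SubIac;
                break;
            case State::SubIac:
                state_ = (b == SE) ? State::Data : State::Sub;
                break;
            }
        }
        return out;
    }

private:
    enum class State { Data, Iac, Option, Sub, SubIac };
    static constexpr std::uint8_t SE = 240, SB = 250, WILL = 251, DONT = 254, IAC = 255;
    State state_ = State::Data;
};
```

Each block read from the socket would go through feed() before any text decoding is attempted on the result.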
Once you have Telnet decoding taken care of, another problem you have to watch out for is that TEncoding.UTF8 only handles properly encoded COMPLETE UTF-8 sequences. If it encounters a badly encoded sequence, or more importantly encounters an incomplete sequence, THE ENTIRE DECODE FAILS and it returns a blank string. This has already been reported as a bug (see QC #79042).
CheckForDataOnSource() stores whatever raw bytes are in the socket at that moment into the InputBuffer. InputBufferAsString() extracts whatever raw bytes are in the InputBuffer at that moment and attempts to decode them using the specified encoding. It is very possible and likely that the raw bytes that are in the InputBuffer when you call InputBufferAsString() do not always contain COMPLETE UTF-8 sequences. Chances are that sometimes the last sequence in the InputBuffer is still waiting for bytes to arrive in the socket and they will not be read until the next call to CheckForDataOnSource(). That would explain why your CheckText() function is receiving blank strings when using TEncoding.UTF8.
You should use IndyUTF8Encoding() instead (Indy implements its own UTF-8 encoder/decoder to avoid the decoding bug in TEncoding.UTF8). At the very least, you will not get blank strings anymore, however you can still lose data when a UTF-8 sequence spans multiple CheckForDataOnSource() calls (incomplete UTF-8 sequences will be converted to ? characters). For that reason alone, you should not be using InputBufferAsString() in this situation (even if TEncoding.UTF8 did work properly). To handle this properly, you should either:
1) scan through the InputBuffer manually, calculating how many bytes constitute COMPLETE UTF-8 sequences only, and then pass that count to InputBuffer.Extract() or TIdIOHandler.ReadString(). Any left over bytes will remain in the InputBuffer for the next time. For that to work, you will have to get rid of the first InputBufferIsEmpty() call and just call CheckForDataOnSource() unconditionally so that you are always checking for more bytes even if you already have some.
2) use TIdIOHandler.ReadChar() instead and get rid of the calls to InputBufferIsEmpty() and CheckForDataOnSource() altogether. The downside is that you will lose data if a UTF-8 sequence decodes into a UTF-16 surrogate pair. ReadChar() can decode surrogates, but it cannot return the second character in the pair (I have started working on new ReadChar() overloads for a future release of Indy that return String instead of Char so full surrogate pairs can be returned).
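The byte counting in option 1 can be sketched as follows. This is a self-contained C++ illustration, not Indy code: completeUtf8Length is a hypothetical helper, and it assumes the data is otherwise well-formed UTF-8, so it only trims a truncated sequence at the very end of the buffer. The returned count is what would be passed on to the extraction call, leaving the remaining bytes buffered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns how many leading bytes of buf consist of complete UTF-8 sequences.
// Only a truncated sequence at the end of the buffer is excluded.
std::size_t completeUtf8Length(const std::vector<std::uint8_t>& buf) {
    const std::size_t n = buf.size();

    // Walk back over at most 3 trailing continuation bytes (10xxxxxx).
    std::size_t i = n;
    std::size_t back = 0;
    while (i > 0 && back < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        --i;
        ++back;
    }
    if (i == 0) return n;  // no lead byte in sight; leave it to the decoder

    // The lead byte tells how long the final sequence is supposed to be.
    const std::uint8_t lead = buf[i - 1];
    std::size_t need = 1;  // ASCII or an invalid lead: treat as one byte
    if ((lead & 0xE0) == 0xC0) need = 2;
    else if ((lead & 0xF0) == 0xE0) need = 3;
    else if ((lead & 0xF8) == 0xF0) need = 4;

    const std::size_t have = n - (i - 1);  // bytes present from the lead onward
    return (have < need) ? (i - 1) : n;
}
```

For example, for the bytes 'a', 0xE2, 0x82 (the letter a followed by the first two bytes of a three-byte sequence) the function returns 1, so only the 'a' would be extracted and the two trailing bytes wait for the rest of the sequence.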

While your code is correct, the problem is most likely that the inputBuffer contains data that might contain null characters (#0) which would end the string.
Try Remy's solution, and check what you get in the RawByteString.
Edit
I didn't notice that the OP was reading from a Telnet server.
The OP should use TIdTelnet instead of TIdTCPClient.
Edit2
I just read an older post by the OP which explains why he is not using TIdTelnet.
/Daddy

Telnet servers send a null character (#0) after a carriage return that is not followed by a line feed. This is most likely what you are seeing.
A null character encoded to UTF8 is still a single byte with the value of 0. Check to see if that's what you are receiving.

Related

Read Timeout TIdTCPClient

Good day. I use the TIdTCPClient component to send requests to the server and read the response. I know the size of the response for certain requests, but not for others.
When I know the size of the response, then my data reading code looks like this:
IdTCPClient1->Socket->Write(requestBuffer);
IdTCPClient1->Socket->ReadBytes(answerBuffer, expectSize);
When the size of the response is not known to me, then I use this code:
IdTCPClient1->Socket->Write(requestBuffer);
IdTCPClient1->Socket->ReadBytes(answerBuffer, -1);
In both cases, I ran into problems.
In the first case, if the server does not return all the data (less than expectSize), then IdTCPClient1 will wait for ReadTimeout to finish, but there will be no data at all in the answerBuffer (even if the server sent something). Is this the logic behind TIdTCPClient? Is that right?
In the second case, ReadTimeout does not work at all. That is, the ReadBytes function ends immediately and nothing is written to the answerBuffer, or several bytes from the server are written. However, I expected that since this function in this case does not know the number of bytes to read, it would wait for ReadTimeout and read the bytes that arrived during this time. As an experiment, I inserted Sleep(500) between writing and reading, and then I read all the data that arrived.
May I ask you to answer why this is happening?
Good day. I use the TIdTCPClient component to send requests to the server and read the response. I know the size of the response for certain requests, but not for others.
Why do you not know the size of all of the responses? What does your protocol actually look like? TCP is a byte stream, each message MUST be framed in such a way that a receiver can know where each message begins and ends in order to read the messages correctly and preserve the integrity of the stream. As such, messages MUST either include their size in their payload, or be uniquely delimited between messages. So, which is the case in your situation? It doesn't sound like you are handling either possibility.
When the size of the response is not known to me, then I use this code:
IdTCPClient1->Socket->Write(requestBuffer);
IdTCPClient1->Socket->ReadBytes(answerBuffer, -1);
When you set AByteCount to -1, that tells ReadBytes() to return whatever bytes are currently available in the IOHandler's InputBuffer. If the InputBuffer is empty, ReadBytes() waits, up to the ReadTimeout interval, for at least 1 byte to arrive, and then it returns whatever bytes were actually received into the InputBuffer, up to the maximum specified by the IOHandler's RecvBufferSize. So it may still take multiple reads to read an entire message in full.
In general, you should NEVER set AByteCount to -1 when dealing with an actual protocol. -1 is good to use only when proxying/streaming arbitrary data, where you don't care what the bytes actually are. Any other use requires knowledge of the protocol's details of how messages are framed.
In the first case, if the server does not return all the data (less than expectSize), then IdTCPClient1 will wait for ReadTimeout to finish, but there will be no data at all in the answerBuffer (even if the server sent something). Is this the logic behind TIdTCPClient? Is that right?
Yes. When AByteCount is > 0, ReadBytes() waits for the specified number of bytes to be available in the InputBuffer before then extracting that many bytes into your output TIdBytes. Your answerBuffer will not be modified unless all of the requested bytes are available. If the ReadTimeout elapses, an EIdReadTimeout exception is raised, and your answerBuffer is left untouched.
If that is not the behavior you want, then consider using ReadStream() instead of ReadBytes(), using a TIdMemoryBufferStream or TBytesStream to read into.
In the second case, ReadTimeout does not work at all. That is, the ReadBytes function ends immediately and nothing is written to the answerBuffer.
I have never heard of ReadBytes() not waiting for the ReadTimeout. What you describe should only happen if there are no bytes available in the InputBuffer and the ReadTimeout is set to some very small value, like 0 msecs.
or several bytes from the server are written.
That is a perfectly reasonable outcome given you are asking ReadBytes() to read an arbitrary number of bytes between 1..RecvBufferSize, inclusive, or read no bytes if the timeout elapses.
However, I expected that since this function in this case does not know the number of bytes to read, it would wait for ReadTimeout and read the bytes that arrived during this time.
That is how it should be working, yes. And how it has always worked. So I suggest you debug into ReadBytes() at runtime and find out why it is not working the way you are expecting. Also, make sure you are using an up-to-date version of Indy to begin with (or at least a version from the last few years).
Why do you not know the size of all of the responses?
Because, in fact, I'm polling an electronic device. This device has its own network IP address and port. The device can respond to the same request in different ways, depending on its status. Strictly speaking, there can be two answers to some queries, and they have different lengths. It is in these cases that I specify AByteCount = -1 when reading, so that I can read any device response.
I have never heard of ReadBytes() not waiting for the ReadTimeout.
You're right! I was wrong. When specifying AByteCount = -1, I get one byte. As you said, if at least one byte arrives, it returns and ReadBytes() ends.
Also, make sure you are using an up-to-date version of Indy to begin with (or at least a version from the last few years).
I am working with C++ Builder 10.3 Community Edition, Indy version 10.6.2.5366.

How to convert hexadecimal data (stored in a string variable) to an integer value

Edit (abstract)
I tried to interpret Char/String data as Byte, 4 bytes at a time. This was because I could only get TComport/TDatapacket to interpret streamed data as String, not as any other data type. I still don't know how to get the Read method and OnRxBuf event handler to work with TComport.
Problem Summary
I'm trying to get data from a mass spectrometer (MS) using some Delphi code. The instrument is connected with a serial cable and follows the RS232 protocol. I am able to send commands and process the text-based outputs from the MS without problems, but I am having trouble with interpreting the data buffer.
Background
From the user manual of this instrument:
"With the exception of the ion current values, the output of the RGA are ASCII character strings terminated by a linefeed + carriage return terminator. Ion signals are represented as integers in units of 10^-16 Amps, and transmitted directly in hex format (four byte integers, 2's complement format, Least Significant Byte first) for maximum data throughput."
I'm not sure whether (1) hex data can be stored properly in a string variable. I'm also not sure how to (2) implement 2's complement in Delphi and (3) the Least Significant Byte first.
Following @David Heffernan's advice, I went and revised my data types. Attempting to harvest binary data from characters doesn't work, because not all values from 0-255 can be properly represented. You lose data along the way, basically, especially if your data is represented 4 bytes at a time.
The solution for me was to use the Async Professional component instead of Denjan's Comport lib. It handles datastreams better and has a built-in log that I could use to figure out how to interpret streamed responses from the instrument. It's also better documented. So, if you're new to serial communications (like I am), rather give that a go.
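For the format quoted from the manual (four-byte integers, 2's complement, Least Significant Byte first), the decoding step itself is small once the data is handled as raw bytes rather than characters. The following is a minimal C++ sketch, not code from the instrument vendor or from either serial library; decodeInt32LE and toAmps are hypothetical names, and the scale factor is the 10^-16 A unit stated in the manual excerpt.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Combines 4 raw bytes, least significant byte first, into a signed 32-bit
// value. p must point at 4 readable bytes.
std::int32_t decodeInt32LE(const std::uint8_t* p) {
    assert(p != nullptr);
    const std::uint32_t u = static_cast<std::uint32_t>(p[0])
                          | (static_cast<std::uint32_t>(p[1]) << 8)
                          | (static_cast<std::uint32_t>(p[2]) << 16)
                          | (static_cast<std::uint32_t>(p[3]) << 24);
    std::int32_t v;
    std::memcpy(&v, &u, sizeof v);  // same bit pattern, read as 2's complement
    return v;
}

// Converts the raw reading to Amps (the manual gives units of 10^-16 A).
double toAmps(std::int32_t raw) {
    return static_cast<double>(raw) * 1e-16;
}
```

With this, the bytes FE FF FF FF decode to -2, and 00 01 00 00 decode to 256.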

UTF8 Encoding and Network Streams

A client and server communicate with each other via TCP. The server and client send each other UTF-8 encoded messages.
When encoding UTF-8, the amount of bytes per character is variable. It could take one or more bytes to represent a single character.
Let's say that I am reading a UTF-8 encoded message on the network stream and it is a huge message. In my case it was about 145k bytes. Creating a buffer of this size to read from the network stream could lead to an OutOfMemoryException, since the byte array needs that amount of contiguous memory.
It would be best then to read from the network stream in a while loop until the entire message is read, reading the pieces into a smaller buffer (probably 4kb) and then decoding the string and concatenating.
What I am wondering is what happens when the very last byte of the read buffer is actually one of the bytes of a character which is represented by multiple bytes. When I decode the read buffer, that last byte and the beginning bytes of the next read would either be invalid or the wrong character. The quickest way to solve this in my mind would be to encode using a non-variable encoding (like UTF-16), and then make your buffer size a multiple of the number of bytes in each character (a multiple of 2 for UTF-16, a multiple of 4 for UTF-32).
But UTF-8 seems to be a common encoding, which would lead me to believe this is a solved problem. Is there another way to solve my concern other than changing the encoding? Perhaps using a linked-list type object to store the bytes would be the way to handle this, since it would not use contiguous memory.
It is a solved problem. Woot woot!
http://mikehadlow.blogspot.com/2012/07/reading-utf-8-characters-from-infinite.html

File Transfer using winsock

I want to send files (text or binary) through Winsock. I have a buffer of 32768 bytes, and the buffer on the other side is the same size. But when the packet size is < 32768, I don't know how to determine the end of the packet in the buffer. Also, with a binary file it seems that marking the end of the packet with a unique character is not possible. Is there any solution?
thx
With fixed-size "packets," we would usually expect that every packet except the last will be completely full of valid data. Only the last one will be "partial," and if the recipient knows how many bytes to expect (because, using Davita's suggestion, the sender told it the file size in advance), then that's no problem. The recipient can simply ignore the remainder of the last packet.
But your further description makes it sound like there may be multiple partially full packets associated with a single file transmission. There is a similarly easy solution to that: Prefix each packet with the number of valid bytes.
You later mention TCustomWinSocket.ReceiveText, and you wonder how it knows how much text to read, and then you quote the answer, which is that it calls ReceiveBuf(Pointer(nil)^, -1) to set the length of the result buffer before filling it. Perhaps you just didn't understand what that code is doing. It's easier to understand if you look at that same code in another context, the ReceiveLength method. It makes that same call to ReceiveBuf, indicating that when you pass -1 to ReceiveBuf, it returns the number of bytes it received.
In order for that to work for your purposes, you cannot send fixed-size packets. If you always send 32KB packets, and just pad the end with zeroes, then ReceiveLength will always return 32768, and you'll have to combine Davita's and my solutions of sending file and packet lengths along with the payload. But if you ensure that every byte in your packet is always valid, then the recipient can know how much to save based on the size of the packet.
One way or another, you need to make sure the sender provides the recipient with the information it needs to do its job. If the sender sends garbage without giving the recipient a way to distinguish garbage from valid data, then you're stuck.
Well, you can always send file size before you start file transfer, so you'll know when to stop writing to file.
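The idea of prefixing each packet with its number of valid bytes might look like the following. This is a minimal C++ sketch that is independent of Winsock and of the Delphi socket classes; frameChunk and takeChunk are hypothetical names, and the 4-byte big-endian header is just one possible choice. The receiver appends whatever each receive call returned to a pending buffer and pulls out chunks only once they are complete.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Sender side: prepend a 4-byte big-endian count of the valid payload bytes.
Bytes frameChunk(const Bytes& payload) {
    assert(payload.size() <= 0xFFFFFFFFu);
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    Bytes out = { static_cast<std::uint8_t>(n >> 24), static_cast<std::uint8_t>(n >> 16),
                  static_cast<std::uint8_t>(n >> 8),  static_cast<std::uint8_t>(n) };
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

// Receiver side: returns true and fills `payload` once a whole chunk is
// present in `pending`; the consumed bytes are removed from `pending`.
bool takeChunk(Bytes& pending, Bytes& payload) {
    if (pending.size() < 4) return false;        // header not complete yet
    const std::size_t n = (static_cast<std::size_t>(pending[0]) << 24)
                        | (static_cast<std::size_t>(pending[1]) << 16)
                        | (static_cast<std::size_t>(pending[2]) << 8)
                        |  static_cast<std::size_t>(pending[3]);
    if (pending.size() - 4 < n) return false;    // payload not complete yet
    const auto total = static_cast<std::ptrdiff_t>(4 + n);
    payload.assign(pending.begin() + 4, pending.begin() + total);
    pending.erase(pending.begin(), pending.begin() + total);
    return true;
}
```

Because the count travels with the data, the payload can hold any byte values, so binary files need no terminator character, and the sender can announce the end of the file with a zero-length chunk or by sending the total file size first.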

Using PARSE on a PORT! value

I tried using PARSE on a PORT! and it does not work:
>> parse open %test-data.r [to end]
** Script error: parse does not allow port! for its input argument
Of course, it works if you read the data in:
>> parse read open %test-data.r [to end]
== true
...but it seems it would be useful to be able to use PARSE on large files without first loading them into memory.
Is there a reason why PARSE couldn't work on a PORT! ... or is it merely not implemented yet?
The easy answer is: no, we can't...
The way parse works, it may need to roll back to a prior part of the input string, which might in fact be the head of the complete input, when it meets the last character of the stream.
Ports copy their data to a string buffer as they get their input, so in fact there is never any "prior" string for parse to roll back to. It's like quantum physics... just by looking at it, it's not there anymore.
But as you know, in Rebol... no isn't an answer. ;-)
That being said, there is a way to parse data from a port as it's being grabbed, but it's a bit more work.
What you do is use a buffer, and
APPEND buffer COPY/part connection amount
Depending on your data, amount could be 1 byte or 1kb; use what makes sense.
Once the new input is added to your buffer, parse it and add logic to know whether you matched part of that buffer.
If something positively matched, you remove/part what matched from the buffer, and continue parsing until nothing parses.
You then repeat the above until you reach the end of input.
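The append, match, remove loop described above is not specific to Rebol. Here is a minimal sketch of the same pattern in C++ (used only so the example is self-contained and easy to run). The drainMessages name is hypothetical, and a newline stands in for whatever rule actually recognizes a complete message in your data.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Appends a newly read piece to `buffer`, then removes and returns every
// complete message found. A message is assumed to end with '\n'.
std::vector<std::string> drainMessages(std::string& buffer, const std::string& piece) {
    buffer += piece;
    std::vector<std::string> messages;
    std::string::size_type end;
    while ((end = buffer.find('\n')) != std::string::npos) {
        messages.push_back(buffer.substr(0, end));  // what positively matched
        buffer.erase(0, end + 1);                   // remove it from the buffer
    }
    return messages;  // any partial message stays in `buffer` for the next read
}
```

Calling it with "AB" returns nothing and leaves "AB" buffered; a following call with "C\nDE\nF" returns "ABC" and "DE" and leaves "F" waiting for more input.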
I've used this in a real-time EDI tcp server which has an "always on" tcp port in order to break up a (potentially) continuous stream of input data, which actually piggy-backs messages end to end.
details
The best way to set up this system is to use /no-wait and loop until the port closes (you receive none instead of "").
Also make sure you have a way of checking for data integrity problems (like a skipped byte, or erroneous message) when you are parsing, otherwise you will never reach the end.
In my system, when the buffer was beyond a specific size, I tried an alternate rule which skipped bytes until a pattern might be found further down the stream. If one was found, an error was logged, the partial message stored, and an alert raised for the sysadmin to sort out the message.
HTH !
I think that Maxim's answer is good enough. At the moment, parse on a port is not implemented. I don't think it's impossible to implement it later, but we must solve other issues first.
Also, as Maxim says, you can do it even now, but it very much depends on what exactly you want to do.
You can parse large files without needing to read them completely into memory, for sure. It's always good to know what you expect to parse. For example, all large files, like files for music and video, are divided into chunks, so you can just use copy|seek to get these chunks and parse them.
Or if you want to get just the titles of multiple web pages, you can just read, let's say, the first 1024 bytes and look for the title tag there; if that fails, read more bytes and try again...
That's exactly what must be done to allow parse on port natively anyway.
And feel free to add a WISH in the CureCode database: http://curecode.org/rebol3/

Resources