Copying zero-terminated raw buffer of bytes to String

Sorry for all these questions I keep asking. Anyway, my question is:
am I properly converting the value to a string (not a Unicode string)?
const
  address: dword = $0057B568;
var
  a: string;
  len, i: dword;
begin
  len := 0;
  repeat
    inc(len);
  until ((pbyte(address+len)^=0)); // and (pbyte(address+1)^=0)); (for Unicode)
  for I := 0 to len do
    a := a + chr(pbyte(address+I)^);
  // stringreplace(a,#0,'',[rfreplaceall,rfignorecase]);
  MessageBox(0, pchar(a), '', 0);
end.

No, it's not correct. The code is off by one byte. First, it assumes the string is at least one character long by ignoring the first byte. Next, it copies one extra byte. Your code can be greatly simplified:
a := PAnsiChar(address);
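As a minimal sketch of that one-line conversion, assuming Delphi 2009 or later and a buffer that really is zero-terminated: the helper name BufferToAnsiString is hypothetical, and it takes a Pointer rather than the dword constant used in the question.

```delphi
// Hypothetical helper, not part of the RTL.
// Assumption: Address points to readable memory that ends with a #0 byte.
function BufferToAnsiString(Address: Pointer): AnsiString;
begin
  // Assigning a PAnsiChar to an AnsiString copies every byte up to,
  // but not including, the first #0. An empty buffer yields ''.
  Result := PAnsiChar(Address);
end;
```

Assigning the result to a plain string in a Unicode Delphi converts it from the ANSI code page, so the bytes are interpreted as text rather than copied one to one.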

Related

Sending text over a named pipe crashes Delphi application

I have a Delphi application that sends a piece of text to a named pipe via the call
SendMessageToNamedPipe(hPipe, CurMsg);
It works fine for some messages, but sending other texts leads to a crash of the application.
The only difference between normal and "crashing" messages I'm aware of is that the crashing messages contain lots of Cyrillic characters.
How should I encode them in order for the aforementioned call to be executed properly?
Update 1: Here's the implementation of SendMessageToNamedPipe.
procedure SendMessageToNamedPipe(hPipe:THandle; msg:string);
const
  OUT_BUF_SIZE = 100;
var
  dwWrite : DWORD;
  lpNumberOfBytesWritten : LongBool;
  utf8String : RawByteString;
  sendBuf: array[0..OUT_BUF_SIZE] of WideChar;
begin
  utf8String := UTF8Encode(msg);
  sendBuf[0] := #0;
  lstrcatw(sendBuf, PChar(msg));
  lpNumberOfBytesWritten := WriteFile(hPipe, sendBuf, OUT_BUF_SIZE, dwWrite, NIL);
  if not lpNumberOfBytesWritten then
  begin
    OutputDebugString(PChar('Sending error: ' + SysErrorMessage(GetLastError)));
  end
  else
  begin
    OutputDebugString(PChar('Message sent, dwWrite: ' + IntToStr(dwWrite)));
  end;
end;
Update 2: Version of the function, which seems to work.
procedure SendMessageToNamedPipe(hPipe:THandle; msg:string);
const
  OUT_BUF_SIZE = 200;
var
  dwWrite : DWORD;
  Success : LongBool;
  msgToSend : PChar;
  utf8String : RawByteString;
  sendBuf: array[0..OUT_BUF_SIZE-1] of WideChar;
  AnsiCharString : PAnsiChar;
begin
  OutputDebugString(PChar('SendMessageToNamedPipe.Length(msg): ' + IntToStr(Length(msg))));
  OutputDebugString(PChar('Sending message: ' + msg));
  utf8String := UTF8Encode(msg);
  sendBuf[0] := #0;
  lstrcatw(sendBuf, PChar(msg));
  Success := WriteFile(hPipe, sendBuf, Length(sendbuf), dwWrite, NIL);
  if not Success then
  begin
    OutputDebugString(PChar('Sending error: ' + SysErrorMessage(GetLastError)));
  end
  else
  begin
    OutputDebugString(PChar('Message sent, dwWrite: ' + IntToStr(dwWrite)));
  end;
end;

Since the Windows pipe functions see the data written to the pipe as a binary stream it is not plausible that the type of data being written could cause a crash.
But SendMessageToNamedPipe is neither a Delphi library function nor a Windows API call. I think you need to look at what SendMessageToNamedPipe is doing, since that is almost certainly where the bug is. You might like to ask questions such as: what is the data type of CurMsg? How does SendMessageToNamedPipe calculate how many bytes to write to the pipe?
Update:
Reading through the implementation of SendMessageToNamedPipe that you've added to your question:
sendBuf is OUT_BUF_SIZE+1 wide characters. You probably meant to define it as array[0..OUT_BUF_SIZE-1]. I see this mistake all the time. (But it is not the cause of the crash.)
utf8String is assigned but never used.
I think the cause of the crash is lstrcatw(sendBuf, PChar(msg)). It overflows sendBuf, and can crash, whenever msg is longer than OUT_BUF_SIZE characters, because the buffer holds only OUT_BUF_SIZE + 1 WideChars including the terminating #0.
The test if not lpNumberOfBytesWritten is wrong. Or more to the point, the return value from WriteFile is a Boolean saying whether or not the write succeeded, not a count of the number of bytes written. WriteFile modifies the value of dwWrite on exit to give the count of the number of bytes written.
Just spotted another one: WriteFile is sending OUT_BUF_SIZE bytes, but OUT_BUF_SIZE is the count of the number of characters in sendBuf, not the number of bytes. A Char in Delphi 2009 is 2 bytes (utf-16). (However good code would always use SizeOf(Char) rather than 2 because it could change in a future Delphi version, and has already changed once in the past.)
As David wrote, you don't actually need to copy msg to a different buffer before writing it to the pipe. The expression PChar(msg) returns a pointer to the start of a null-terminated array of Chars that comprises the data in msg.
Reflecting on your code, I'd ask whether you are clear in your own mind whether the program at the other end of the pipe expects to receive a utf-16 string or a utf-8 string (or even an ANSI string for that matter). You need to settle this question and then modify SendMessageToNamedPipe accordingly. (Also, if it expects a null-terminated string, rather than a fixed-length buffer, you should send just the intended number of bytes, not OUT_BUF_SIZE bytes.)
Reply to your comment below:
Your code above doesn't write utf-8 to the pipe. It writes utf-16. Although you called UTF8Encode you threw the result away, as I mentioned above.
You can pass utf8String to WriteFile as the buffer to send by casting it like this: PAnsiChar(utf8String)^. The cast returns a pointer to the first byte of utf8String, just as PChar(msg) does for msg, which I explained above. The pointer is dereferenced because WriteFile declares its buffer as an untyped const parameter.
You need to pass the correct number of bytes to write to WriteFile, instead of OUT_BUF_SIZE. Since it is a utf-8 string, a character can take up anything from 1 to 4 bytes. But as it happens, Delphi's Length function returns the number of bytes when applied to a Utf8String or a RawByteString so you can write Length(utf8String)+1. The +1 is to include the terminating #0 character, which is not included in the Length count. (If you were passing a utf-16 string Length would return the number of characters, so would need to be multiplied by SizeOf(Char)).
If you are still unclear, then you would probably benefit greatly from reading Delphi and Unicode.
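The points about the buffer and the byte count can be sketched as follows. This is only an illustration under assumptions: the receiving end is assumed to expect a null-terminated UTF-8 string, Windows and SysUtils are assumed to be in the uses clause, and the names Utf8PayloadSize and SendUtf8ToPipe are hypothetical.

```delphi
// Hypothetical helper: bytes to send for a UTF-8 string, terminating #0 included.
// Length() on a UTF8String counts bytes, not characters.
function Utf8PayloadSize(const S: UTF8String): DWORD;
begin
  Result := Length(S) + 1;
end;

// Hypothetical replacement for SendMessageToNamedPipe.
// Assumption: the reader of the pipe expects null-terminated UTF-8.
procedure SendUtf8ToPipe(hPipe: THandle; const Msg: string);
var
  Utf8: UTF8String;
  Written: DWORD;
begin
  Utf8 := UTF8Encode(Msg);
  // No intermediate fixed-size buffer, so nothing can overflow.
  // WriteFile takes an untyped buffer, hence the dereferenced PAnsiChar.
  if not WriteFile(hPipe, PAnsiChar(Utf8)^, Utf8PayloadSize(Utf8), Written, nil) then
    RaiseLastOSError;
end;
```

If the other end expects UTF-16 instead, the same shape applies with PChar(Msg)^ as the buffer and (Length(Msg) + 1) * SizeOf(Char) as the byte count.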

CharInSet accepting Unicode NULL character

I'm reading some data from memory, and this area of memory is in Unicode. So to make one ANSI string I need something like this:
while CharInSet(Chr(Ord(Buff[aux])), ['0'..'9', #0]) do
begin
  Target := Target + Chr(Ord(Buff[aux]));
  inc(aux);
end;
Here Buff is an array of Bytes and Target is a string. I just want to keep reading from Buff and appending to Target while the value is 0..9, but when it finds a NULL byte (00), it just stops. How can I keep adding data to Target until the first letter or other non-numeric character? The #0 has no effect.

I would not even bother with CharInSet() since you are dealing with bytes and not characters:
var
  b: Byte;
while aux < Length(Buff) do
begin
  b := Buff[aux];
  if ((b >= Ord('0')) and (b <= Ord('9'))) or (b = 0) then
  begin
    Target := Target + Char(Buff[aux]);
    Inc(aux);
  end else
    Break;
end;

If your data is Unicode, then I am assuming that the encoding is UTF-16. In which case you cannot process it byte by byte. A character unit is 2 bytes wide. Put the data into a Delphi string first, and then parse it:
var
str: string;
....
SetString(str, PChar(Buff), Length(Buff) div SizeOf(Char));
Do it this way and your loop can look like this:
for i := 1 to Length(str) do
  if not CharInSet(str[i], ['0'..'9']) then
  begin
    SetLength(str, i-1);
    break;
  end;
I believe that your confusion was caused by processing byte by byte. With UTF-16 encoded text, ASCII characters are encoded as a pair of bytes, the most significant of which is zero. I suspect that explains what you were trying to achieve with your CharInSet call.
If you want to cater for other digit characters then you can use the Character unit and test with TCharacter.IsDigit().
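Put together, the two fragments above could look like the sketch below. It assumes Delphi 2009 or later, SysUtils in the uses clause, and a buffer that holds little-endian UTF-16; the function name LeadingDigits is hypothetical.

```delphi
// Hypothetical helper: returns the run of ASCII digits at the start of a
// UTF-16 encoded byte buffer. Assumption: Buff holds UTF-16LE text.
function LeadingDigits(const Buff: TBytes): string;
var
  i: Integer;
begin
  // Reinterpret the bytes as Chars: two bytes per character unit.
  SetString(Result, PChar(Buff), Length(Buff) div SizeOf(Char));
  // Cut the string at the first character that is not '0'..'9'.
  for i := 1 to Length(Result) do
    if not CharInSet(Result[i], ['0'..'9']) then
    begin
      SetLength(Result, i - 1);
      Break;
    end;
end;
```

Because each digit is read as one 2-byte Char, the zero high bytes never appear as separate characters, so no special case for #0 is needed.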

How can I use a large file in Delphi?

When I load a large file into a TMemoryStream or TFileStream, I get an "out of memory" error.
How can I solve this problem?
Example:
procedure TForm1.Button1Click(Sender: TObject);
var
  mem: TMemoryStream;
  str: string;
begin
  mem := TMemoryStream.Create;
  mem.LoadFromFile('test.txt'); // test.txt is about 1 GB
  CompressStream(mem);
end;

Your implementation is very messy. I don't know exactly what CompressStream does, but if you want to deal with a large file as a stream, you can save memory by simply using a TFileStream instead of trying to read the whole thing into a TMemoryStream all at once.
Also, you're never freeing the TMemoryStream when you're done with it, which means that you're going to leak a whole lot of memory. (Unless CompressStream takes care of that, but that's not clear from the code and it's really not a good idea to write it that way.)

You can't fit the entire file into a single contiguous block of 32 bit address space. Hence the out of memory error.
Read the file in smaller pieces and process it piece by piece.

Answering the question in the title: you need to process the file piece by piece, byte by byte if that's needed. You definitely do not load the file all at once into memory! How you do that obviously depends on what you need to do with the file, but since we know you're trying to implement a Huffman encoder, I'll give you some specific tips.
A Huffman encoder is a stream encoder: bytes go in and bits go out. Each unit of incoming data is replaced with its corresponding bit pattern. The encoder doesn't need to see the whole file at once, because it is in fact only working on one byte at a time.
Here's how you'd Huffman-compress a file without loading it all into memory. Of course, the actual Huffman encoder is not shown, because the question is about working with big files, not about building the actual encoder. This piece of code includes buffered input and output and shows how you'd link an actual encoder procedure to it.
(Beware, code written in browser; if it doesn't compile you're expected to fix it!)
type
  THuffmanBuffer = array[0..1023] of Byte; // Because I need to pass the array as parameter
procedure DoActualHuffmanEncoding(const EncodeByte: Byte; var BitBuffer: THuffmanBuffer; var AtBit: Integer);
begin
  // This is where the actual Huffman encoding would happen. This procedure will
  // copy the correct encoding for EncodeByte in BitBuffer starting at AtBit bit index.
  // The procedure is expected to advance the AtBit counter with the number of bits
  // that were actually written (that's why AtBit is a var parameter).
end;
procedure HuffmanEncoder(const FileNameIn, FileNameOut: string);
var
  InFile, OutFile: TFileStream;
  InBuffer, OutBuffer: THuffmanBuffer;
  InBytesCount: Integer;
  OutBitPos: Integer;
  i: Integer;
begin
  // First open the InFile
  InFile := TFileStream.Create(FileNameIn, fmOpenRead or fmShareDenyWrite);
  try
    // Now prepare the OutFile
    OutFile := TFileStream.Create(FileNameOut, fmCreate);
    try
      // Start the out bit counter
      OutBitPos := 0;
      // Read from the input file, one buffer at a time (for efficiency)
      InBytesCount := InFile.Read(InBuffer, SizeOf(InBuffer));
      while InBytesCount <> 0 do
      begin
        // Process the input buffer byte-by-byte
        for i := 0 to InBytesCount-1 do
        begin
          DoActualHuffmanEncoding(InBuffer[i], OutBuffer, OutBitPos);
          // The function writes bits to the output buffer, not full bytes, and the
          // encoding for a rare byte might be significantly longer than 1 byte.
          // Whenever the output buffer approaches its capacity we'll flush it
          // out to the OutFile
          if OutBitPos > (SizeOf(OutBuffer)-10)*8 then
          begin
            // Ok, we've got less than 10 bytes available in the OutBuffer, time to
            // flush!
            OutFile.Write(OutBuffer, OutBitPos div 8);
            // We're now possibly left with one incomplete byte in the buffer.
            // We'll copy that byte to the start of the buffer and continue.
            OutBuffer[0] := OutBuffer[OutBitPos div 8];
            OutBitPos := OutBitPos mod 8;
          end;
        end;
        // Read next chunk
        InBytesCount := InFile.Read(InBuffer, SizeOf(InBuffer));
      end;
      // Flush the remainder of the output buffer. This time we want to flush
      // the final (potentially incomplete) byte as well, because we've got no
      // more input, there'll be no more output.
      OutFile.Write(OutBuffer, (OutBitPos + 7) div 8);
    finally
      OutFile.Free;
    end;
  finally
    InFile.Free;
  end;
end;
The Huffman encoder is not a difficult encoder to implement, but doing it both correctly and fast might be a challenge. I suggest you start with a correct encoder; once you've got both encoding and decoding working, figure out how to make the encoder fast.

try something like http://www.explainth.at/en/delphi/mapstream.shtml

Is it safe to modify the content of a string variable through a pointer?

Consider I have a procedure with Str parameter passed by reference, and I want to modify content of the given variable through the procedure, e.g.
procedure Replace(var Str: string);
var
  PStr: PChar;
  i: Integer;
begin
  PStr := @Str[1];
  for i := 1 to Length(Str) do begin
    PStr^ := 'x';
    Inc(PStr);
  end;
end;
Is it an acceptable pointer usage? I'm not sure whether it has a memory leak.
What really happens in PStr := @Str[1]? Does the compiler make a copy of Str internally, or what?
Is this kind of code optimization worth it?

Is it an acceptable pointer usage?
You need to make sure that you don't call
PStr := @Str[1];
for an empty string, as that would crash. The easiest way to do that is to replace that line with
PStr := PChar(Str);
so that the compiler will make sure that either a pointer to the first char of the string, or a pointer to #0 is returned. As Ken correctly pointed out in a comment there is no call to UniqueString() in this case, so you would need to do it yourself.
I'm not sure whether it has a memory leak.
No, there is no memory leak. Obtaining a pointer to a string character will call UniqueString() internally, but that will happen for write access to a string character too, so there's nothing special about the character pointer.
What really happens in PStr := @Str[1]? Does the compiler make a copy of Str internally, or what?
No, it just makes sure that the string is unique (so that write access through the pointer does not change the contents of any other string that shares the same data). Afterwards it returns a pointer to that character in the string, which you can then treat as any other PChar variable, pass it to API functions, increment it and so on.
Is this kind of code optimization worth it?
It is not only worth it, it is necessary to really achieve good performance for large strings. The reason for this is that the compiler is not smart enough to only call UniqueString() once, but it will insert calls to it for each write access to a character in the string. So if you process a large string character by character you will have a big overhead from all these calls.
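A minimal sketch of the pattern described in this answer, assuming a Unicode Delphi (2009 or later): take the pointer with PChar(Str), call UniqueString() once by hand, and then write through the pointer. The procedure name ReplaceWithX is hypothetical.

```delphi
// Hypothetical variant of the Replace procedure from the question.
procedure ReplaceWithX(var Str: string);
var
  P: PChar;
  i: Integer;
begin
  if Str = '' then
    Exit; // nothing to overwrite; never write through the pointer to #0
  // One explicit call, so the characters are not shared with another string.
  UniqueString(Str);
  // PChar(Str) is safe for any string and does not call UniqueString itself.
  P := PChar(Str);
  for i := 1 to Length(Str) do
  begin
    P^ := 'x';
    Inc(P);
  end;
end;
```

With S := 'abc' and T := S, calling ReplaceWithX(S) leaves T unchanged, because S gets its own copy of the data before the loop writes to it.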

Yes, it's safe, as long as you don't go beyond the bounds of the string. The string has metadata attached that tells how long it is, and if you write beyond the length of the string, you won't leak memory, but you could corrupt it.

If Str is passed by reference, why would you need another pointer to the string? Apart from that, there should be no memory leak: PStr is initialized with the address of the first element of the string and then incremented, so it will always point to one of the characters in your string.
The compiler does not make a copy of Str internally. One of the uses for pointers is to avoid making copies. When you say
PStr := @Str[1]
it means that PStr will now store the address of Str[1], that is, the address of the first char in the string.

I am sure this will work for AnsiString and PAnsiChar, but will it still work for unicode strings in Delphi 2009 and above? I think it should, because both, a char of a string (str[i]) and the char pointed to by PChar, should be 2 bytes in size.
Could somebody with more experience with unicode strings please confirm this?

In D2010, it looks like the code generator applies copy-on-write to such a construct:
Unit9.pas.34: S := 'abcd';
004B32EF 8D45F4 lea eax,[ebp-$0c]
004B32F2 BA98334B00 mov edx,$004b3398
004B32F7 E89C35F5FF call @UStrLAsg
Unit9.pas.35: P := @S[1];
004B32FC 8D45F4 lea eax,[ebp-$0c]
004B32FF E8343FF5FF call @UniqueStringU ; <== here you are
004B3304 8945F0 mov [ebp-$10],eax
Unit9.pas.36: Exit;
004B3307 EB61 jmp $004b336a
By the way, the generic reference P := @S does not emit UniqueString.
In conclusion, I do not recommend relying on the code generator's internals; use the recommended PChar(S) construct instead (it emits one xStrToPxChar call as overhead).

how to show hex code char?

I have a file that contains numbers like FB8E, FB8F, FB90, one on each line.
I want my program to load this file, take each line, and print the character corresponding to that number.
For example, my first line is FB8E, and I want something that converts it to the equivalent of #$FB8E (Arabic Kaf). How do I do that?

If you are in D2009/2010:
var
F: TextFile;
Line: string;
Code: Integer;
Ch: Char;
...
Readln(F, Line);
Code := StrToInt('$' + Line);
Ch := Char(Code);
...
otherwise replace Char with WideChar.
Of course the code can be compressed a little bit, but I left this out for clarity.
EDIT: For those of you who are not afraid of type casting, there is also the HexToBin function in classes.pas.
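The conversion step can be wrapped in a small function. This is a sketch that assumes Delphi 2009 or later (where Char is a 2-byte WideChar), SysUtils in the uses clause, and lines holding a single hex code in the Basic Multilingual Plane; the name HexLineToChar is hypothetical.

```delphi
// Hypothetical helper: turns a line such as 'FB8E' into the character U+FB8E.
// StrToInt raises EConvertError if the line is not valid hexadecimal.
function HexLineToChar(const Line: string): Char;
begin
  // The '$' prefix tells StrToInt to parse the digits as hexadecimal.
  Result := Char(StrToInt('$' + Trim(Line)));
end;
```

Whether the character then displays correctly depends on the output control and font supporting that code point.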

You won't get far by simply turning the line into the text #$FB8E, because that notation is a literal the compiler resolves at compile time, not something you can build from a string at run time.
So the general approach here would be to read the line, parse the hex value, and create a WideChar from that value. But I haven't done much Delphi in recent years, so I'm afraid I can't tell you exactly how to do this.
