Convert a Pointer to TBytes - delphi

I have this code:
procedure MyFunct(const aBin; aBinSize: Cardinal);
var bytes: TBytes;
begin
  bytes := TBytes(@aBin);
  for var I := 0 to aBinSize - 1 do
    Writeln(bytes[I]);
end;

var Memory: Pointer;
...init the memory...
MyFunct(Memory^, sizeOfMemory);
This worked well for several years with {$R-} (range checking off). However, today I decided to turn range checking on, and now the code above crashes with a range check error. That is to be expected, because Length(bytes) is often 0.
I could switch back to {$R-}, but now I think this is a fundamental mistake. As far as I understand, the length of a TBytes is stored at bytes[-32bit] and, more importantly, its reference count is stored at bytes[-64bit]. So now I'm afraid that the old code was simply writing the reference count into bytes[-64bit] and corrupting my memory (maybe not, I'm not sure).
So is it good practice to do
bytes := TBytes(@aBin);
If not, why does the compiler allow it? And how, without a TBytes, can I walk through each byte of my memory (i.e. how do I access myMemory[x])?

You can't type-cast an arbitrary pointer to a TBytes like you are doing; they are completely different things. The code will fail if the memory being pointed at is not a valid dynamic array to begin with. Your code has been faulty for years, and you were just lucky it did anything at all.
The function needs to look more like this instead when using TBytes as you are:
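procedure MyFunct(const aBin; aBinSize: Cardinal);
var bytes: TBytes;
begin
  if aBinSize = 0 then Exit; // nothing to copy; also keeps bytes[0] and aBinSize - 1 valid
  SetLength(bytes, aBinSize);
  Move(aBin, bytes[0], aBinSize); // copy the raw memory into a real dynamic array
  for var I := 0 to aBinSize - 1 do
    WriteLn(bytes[I]);
end;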
Otherwise, a simpler approach (which is likely what you were attempting to do) would be more like this instead:
procedure MyFunct(const aBin; aBinSize: Cardinal);
var bytes: PByte;
begin
  bytes := PByte(@aBin);
  for var I := 0 to aBinSize - 1 do
    WriteLn(bytes[I]);
end;
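As a minimal, self-contained sketch of the PByte approach (the SumBytes name, the buffer size and the fill pattern are made up for illustration), the following console program allocates raw memory with GetMem, passes it as an untyped const parameter and reads it byte by byte, without ever treating the block as a dynamic array:

```pascal
program PByteWalkDemo;
{$APPTYPE CONSOLE}
{$POINTERMATH ON} // allows indexing typed pointers; PByte already permits it in Delphi 2009+

// Hypothetical helper: adds up aBinSize bytes starting at aBin.
function SumBytes(const aBin; aBinSize: Cardinal): Cardinal;
var
  Bytes: PByte;
  I: Cardinal;
begin
  Result := 0;
  if aBinSize = 0 then
    Exit; // avoid "0 - 1" wrapping around on an unsigned counter
  Bytes := PByte(@aBin);
  for I := 0 to aBinSize - 1 do
    Inc(Result, Bytes[I]);
end;

const
  BufSize = 4;
var
  Memory: Pointer;
  I: Integer;
begin
  GetMem(Memory, BufSize);
  try
    // Fill the raw block with 10, 20, 30, 40.
    for I := 0 to BufSize - 1 do
      PByte(Memory)[I] := (I + 1) * 10;
    Writeln(SumBytes(Memory^, BufSize)); // 100
  finally
    FreeMem(Memory);
  end;
end.
```

Because no TBytes is involved, nothing ever looks for a length or reference count in front of the block, so the code behaves the same with {$R+} and {$R-}.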

Related

Can I convert a buffer + size to TBytes?

Given a buffer and its size in bytes, is there a way to convert this to TBytes without copying it?
Example:
procedure HandleBuffer(_Buffer: PByte; _BufSize: integer);
var
  Arr: TBytes;
  i: Integer;
begin
  // some clever code here to get contents of the buffer into the Array
  for i := 0 to Length(Arr)-1 do begin
    HandleByte(Arr[i]);
  end;
end;
I could of course copy the data:
procedure HandleBuffer(_Buffer: PByte; _BufSize: integer);
var
  Arr: TBytes;
  i: Integer;
begin
  // this works but is very inefficient
  SetLength(Arr, _BufSize);
  Move(PByte(_Buffer)^, Arr[0], _BufSize);
  //
  for i := 0 to Length(Arr)-1 do begin
    HandleByte(Arr[i]);
  end;
end;
But for a large buffer (about a hundred megabytes) this would mean I have double the memory requirement and also spend a lot of time unnecessarily copying data.
I am aware that I could simply use a PByte to process each byte in the buffer, I'm only interested in a solution to use a TBytes instead.
I think it's not possible, but I have been wrong before.
No, this is not possible (without unreasonable hacks).
The problem is that TBytes = TArray<Byte> = array of Byte is a dynamic array and the heap object for a non-empty dynamic array has a header containing the array's reference count and length.
A function that accepts a TBytes parameter, when given a plain pointer to an array of bytes, might (rightfully) attempt to read the (non-existing) header, and then you are in serious trouble.
Also, dynamic arrays are managed types (as indicated by the reference count I mentioned), so you might have problems with that as well.
However, in your particular example code, you don't actually use the dynamic array nature of the data at all, so you can work directly with the buffer:
procedure HandleBuffer(_Buffer: PByte; _BufSize: integer);
var
  i: Integer;
begin
  for i := 0 to _BufSize - 1 do
    HandleByte(_Buffer[i]);
end;

Efficient way to find a string in a stream in Delphi

I have come up with this function to return the number of occurrences of a string in a Delphi Stream. However, I suspect there is a more efficient way to achieve this, since I am using "higher level" constructs (char), and not working at the lower byte/pointer level (which I am not that familiar with)
function ReadStream(const S: AnsiString; Stream: TMemoryStream): Integer;
var
  Arr: Array of AnsiChar;
  Buf: AnsiChar;
  ReadCount: Integer;

  procedure AddChar(const C: AnsiChar);
  var
    I: Integer;
  begin
    for I := 1 to Length(S) - 1 do
      Arr[I] := Arr[I+1];
    Arr[Length(S)] := C;
  end;

  function IsEqual: Boolean;
  var
    I: Integer;
  begin
    Result := True;
    for I := 1 to Length(S) do
      if S[I] <> Arr[I] then
      begin
        Result := False;
        Break;
      end;
  end;

begin
  Stream.Position := 0;
  SetLength(Arr, Length(S));
  Result := 0;
  repeat
    ReadCount := Stream.Read(Buf, 1);
    AddChar(Buf);
    if IsEqual then
      Inc(Result);
  until ReadCount = 0;
end;
Can someone supply a procedure that is more efficient?
TMemoryStream lets you get at its internal buffer.
You can get a pointer to that buffer through the Memory property; the number of valid bytes is given by Size.
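As a rough sketch of working on the Memory pointer directly (this is a plain byte-by-byte scan using CompareMem, not Boyer-Moore, and the CountOccurrences name and sample data are made up for illustration), something along these lines counts non-overlapping matches without copying the stream:

```pascal
program CountInStreamDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils, Classes;

// Hypothetical helper: counts non-overlapping occurrences of Needle
// by scanning the stream's internal buffer in place.
function CountOccurrences(const Needle: AnsiString; Stream: TMemoryStream): Integer;
var
  Data: PByte;
  I, Limit: NativeInt;
  NeedleLen: Integer;
begin
  Result := 0;
  NeedleLen := Length(Needle);
  if NeedleLen = 0 then
    Exit;
  Data := Stream.Memory;                       // no copy: points into the stream
  Limit := NativeInt(Stream.Size) - NeedleLen; // last index where a match can start
  I := 0;
  while I <= Limit do
    if CompareMem(@Data[I], PAnsiChar(Needle), NeedleLen) then
    begin
      Inc(Result);
      Inc(I, NeedleLen);                       // skip past the match
    end
    else
      Inc(I);
end;

var
  Stream: TMemoryStream;
  Sample: AnsiString;
begin
  Stream := TMemoryStream.Create;
  try
    Sample := 'abcabcab';
    Stream.WriteBuffer(Sample[1], Length(Sample));
    Writeln(CountOccurrences('abc', Stream)); // 2
  finally
    Stream.Free;
  end;
end.
```

This removes the per-character Stream.Read calls and the shifting array from the original function; a smarter algorithm can later replace the inner comparison without changing how the buffer is accessed.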
If you are working in 32 bit and you are willing to switch from TMemoryStream to TBytesStream, you can abuse the fact that a dynamic array and an AnsiString share the same structure in 32 bit.
Unfortunately Embarcadero broke that compatibility in x64, which means that, for no good reason whatsoever, you cannot have strings > 2GB in x64.
Note that this trick will break in 64 bit! (See fix below)
You can use Boyer-Moore string searching.
This allows you to write code like this:
function CountOccurrances(const Needle: AnsiString; const Haystack: TBytesStream): integer;
var
  Start: cardinal;
  Count: integer;
begin
  Start:= 1;
  Count:= 0;
  repeat
    {$ifdef CPUx86}
    Start:= _FindStringBoyerAnsiString(string(Haystack.Memory), Needle, false, Start);
    {$else}
    Start:= __FindStringBoyerAnsiStringIn64BitTArrayByte(TArray<Byte>(Haystack.Memory), Needle, false, Start);
    {$endif}
    if Start >= 1 then begin
      Inc(Start, Length(Needle));
      Inc(Count);
    end;
  until Start <= 0;
  Result:= Count;
end;
For 32 bit you'll have to rewrite the Boyer-Moore code to use AnsiString; a trivial rewrite.
For 64 bit you'll have to rewrite the Boyer-Moore code to take a TArray<Byte> as its first parameter; a relatively simple task.
If you are looking for efficiency, please try to avoid WinAPI calls that use PChars. C-style strings are a horrible idea, because they do not have a length prefix.
Johan has given you a good answer about Boyer-Moore searching. BM is fine if you are content to use it as a "black box", but if you want to understand what's going on, there is a bit of a gulf between the complexity of your own code and a BM implementation. You might find it helpful to explore searching that's more efficient than your own code but not so complex as BM. There is one ultra-simple way to do what you want without getting involved with pointers, PChars, etc.
Let's leave aside for a moment the fact that you want to work with a TMemoryStream, and consider finding the number of occurrences of a string SubStr in another string Target. For efficiency, things you want to avoid are a) repeatedly scanning the same characters over and over and b) copying one or both strings.
Since D7, Delphi has included a PosEx function:
function PosEx(const SubStr, S: string; Offset: Cardinal = 1): Integer;
Description
PosEx returns the index of SubStr in S, beginning the search at Offset. If Offset is 1 (default), PosEx is equivalent to Pos.
PosEx returns 0 if SubStr is not found, if Offset is greater than the length of S, or if Offset is less than 1.
So what you can do is repeatedly call PosEx, starting with Offset = 1, and each time it finds SubStr in Target you increment Offset to skip over it, like this (in a console application):
uses
  StrUtils; // PosEx is declared in StrUtils

function ContainsCount(const SubStr, Target : String) : Integer;
var
  i : Integer;
begin
  Result := 0;
  i := 1;
  repeat
    i := PosEx(SubStr, Target, i);
    if i > 0 then begin
      Inc(Result);
      i := i + Length(SubStr);
    end;
  until i <= 0;
end;

var
  Count : Integer;
  Target : String;
begin
  Target := 'aa b ca';
  Count := ContainsCount('a', Target);
  writeln(Count);
  readln;
end.
The fact that PosEx and ContainsCount both pass SubStr and Target as consts means that no string copying is involved, and it should be obvious that ContainsCount never scans the same characters more than once.
Once you've satisfied yourself that this works, you might care to trace into PosEx to see how it does its stuff.
You can do something which works in a similar way on PChars using the RTL functions StrPos/AnsiStrPos.
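A minimal sketch of that PChar variant, assuming the SysUtils StrPos overload for PChar (the ContainsCountP name is made up for illustration), could look like this:

```pascal
program StrPosCountDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils;

// Hypothetical PChar-based counterpart of ContainsCount.
function ContainsCountP(const SubStr, Target: string): Integer;
var
  P: PChar;
begin
  Result := 0;
  if SubStr = '' then
    Exit;
  P := StrPos(PChar(Target), PChar(SubStr)); // nil when there is no match
  while P <> nil do
  begin
    Inc(Result);
    Inc(P, Length(SubStr));                  // continue right after the match
    P := StrPos(P, PChar(SubStr));
  end;
end;

begin
  Writeln(ContainsCountP('a', 'aa b ca')); // 3
end.
```

Note that StrPos stops at the first #0 character, so this form suits text but not binary data that may contain zero bytes.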
To convert your memory stream to a string, you could use this code from Rob Kennedy's answer to the question "Converting TMemoryStream to 'String' in Delphi 2009":
function MemoryStreamToString(M: TMemoryStream): string;
begin
  SetString(Result, PChar(M.Memory), M.Size div SizeOf(Char));
end;
(Note what he says about the alternative version later in his answer)
Btw, if you look through the VCL + RTL code, you'll see that quite a lot of the string-parsing and processing code (e.g. in TParser, TStringList, TExpressionParser) all does its work with PChars. There's a reason for that of course, because it minimizes character copying and means that most scanning operations can be done by changing pointer (PChar) values.

How to find out char code for a character of an Ansistring

In older versions of Delphi, such as D7, you could write Ord(s[i]) where s was a string, but trying this with an AnsiString results in an exception (access violation).
P.S. I was on Delphi 7 for a long time.
Here are the steps to reproduce the error:
Create a new project, drop a memo on the form (call it memo1), then add the following code to the form's OnCreate event handler:
procedure TForm1.FormCreate(Sender: TObject);
var u: ansistring;
begin
  u := 'stringtest';
  memo1.Lines.Add(inttostr(ord(u[2])));
end;
For me this code produces an AV.
It does work with an ansistring, but you cannot read past the end of it and you must make sure the string is initialized.
function CharCode(const S: ansistring; pos: integer): byte;
begin
  if pos <= 0 then result:= 0
  //else if s='' then Result:= 0 //unassigned string;
  else if Length(s) < Pos then Result:= 0 //cannot read past the end.
  else Result:= Ord(s[pos]);
end;
Note that if s='' is the same as asking if pointer(s) = nil. An empty string is really a nil pointer.
This is probably why you were getting an access violation.
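A small console sketch of that point (the variable name and sample text are only for illustration) shows that an empty AnsiString really is a nil pointer, while an assigned one can be indexed safely within 1..Length:

```pascal
program EmptyStringDemo;
{$APPTYPE CONSOLE}
var
  S: AnsiString;
begin
  S := '';
  Writeln(Pointer(S) = nil); // TRUE: an empty long string has no heap block
  S := 'stringtest';
  Writeln(Pointer(S) = nil); // FALSE: now it points at the first character
  Writeln(Ord(S[2]));        // 116, the code of 't'
end.
```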
If you want to force the ansistring to be a certain length you can use SetLength(MyAnsistring, NewLength);
The length of the (ansi)string is variable. That means it grows and shrinks as needed. If you read past the end of the string you may get an access violation.
Note that you don't have to get an AV, the RTL leaves a bit of slack in its allocation; it usually allocates a slightly bigger buffer than requested, this is due to performance and architectural reasons.
The other reason why you may not get an AV if reading past the end of a string is that your program may own both the string buffer and whatever happens to be right next to it.
For this reason it is a good idea to enable range checking ({$R+}) in debug mode; it adds extra checks that protect against reading past the end of structures.
The difference between shortstring and (ansi)string
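A short string has a fixed maximum length (at most 255 characters) and is stored inline wherever it is declared, for example on the stack when it is a local variable.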
A long string (ansi or wide) is a pointer to a record that gets allocated on the heap; it looks like this:
type
  TStringRecord = record
    CodePage: word;
    ElementSize: word; //(1, 2 or 4)
    ReferenceCount: integer;
    Length: Integer;
    StringData: array[1..length(s)] of char; // pseudocode: the payload is as long as the string
    NullChar: char;
  end;
The compiler hides all these details from you.
see: http://docwiki.embarcadero.com/RADStudio/Seattle/en/Internal_Data_Formats
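To peek at some of these fields without touching the header by hand, a sketch like the following can be used. It assumes Delphi 2009 or later, where the System unit exposes StringElementSize and StringCodePage; the variable names and sample text are made up for illustration:

```pascal
program StringHeaderDemo;
{$APPTYPE CONSOLE}
var
  A: AnsiString;
  U: UnicodeString;
begin
  A := 'stringtest';
  U := 'stringtest';
  // Element size is 1 byte for AnsiString and 2 bytes for UnicodeString.
  Writeln('AnsiString    element size: ', StringElementSize(A), ', length: ', Length(A));
  Writeln('UnicodeString element size: ', StringElementSize(U), ', length: ', Length(U));
  // The code page stored in the header; the value depends on the string type and system.
  Writeln('AnsiString    code page: ', StringCodePage(A));
  Writeln('UnicodeString code page: ', StringCodePage(U));
end.
```

Using the RTL accessors keeps the code independent of the exact header layout, which differs between 32-bit and 64-bit targets.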

How do I determine the number of references to a dynamic array?

Following on from this question (Dynamic arrays and memory management in Delphi), if I create a dynamic array in Delphi, how do I access the reference count?
SetLength(a1, 100);
a2 := a1;
// The reference count for the array pointed to by both
// a1 and a2 should be 2. How do I retrieve this?
Additionally, if the reference count can be accessed, can it also be modified manually? This latter question is mainly theoretical rather than for use practically (unlike the first question above).
You can see how the reference count is managed by inspecting the code in the System unit. Here are the pertinent parts from the XE3 source:
type
  PDynArrayRec = ^TDynArrayRec;
  TDynArrayRec = packed record
  {$IFDEF CPUX64}
    _Padding: LongInt; // Make 16 byte align for payload..
  {$ENDIF}
    RefCnt: LongInt;
    Length: NativeInt;
  end;
....
procedure _DynArrayAddRef(P: Pointer);
begin
  if P <> nil then
    AtomicIncrement(PDynArrayRec(PByte(P) - SizeOf(TDynArrayRec))^.RefCnt);
end;

function _DynArrayRelease(P: Pointer): LongInt;
begin
  Result := AtomicDecrement(PDynArrayRec(PByte(P) - SizeOf(TDynArrayRec))^.RefCnt);
end;
A dynamic array variable holds a pointer. If the array is empty, then the pointer is nil. Otherwise the pointer contains the address of the first element of the array. Immediately before the first element of the array is stored the metadata for the array. The TDynArrayRec type describes that metadata.
So, if you wish to read the reference count you can use the exact same technique as does the RTL. For instance:
function DynArrayRefCount(P: Pointer): LongInt;
begin
  if P <> nil then
    Result := PDynArrayRec(PByte(P) - SizeOf(TDynArrayRec))^.RefCnt
  else
    Result := 0;
end;
If you want to modify the reference count then you can do so by exposing the functions in System:
procedure DynArrayAddRef(P: Pointer);
asm
  JMP System.@DynArrayAddRef
end;

function DynArrayRelease(P: Pointer): LongInt;
asm
  JMP System.@DynArrayRelease
end;
Note that the name DynArrayRelease as chosen by the RTL designers is a little misleading because it merely reduces the reference count. It does not release memory when the count reaches zero.
I'm not sure why you would want to do this mind you. Bear in mind that once you start modifying the reference count, you have to take full responsibility for getting it right. For instance, this program leaks:
{$APPTYPE CONSOLE}

var
  a, b: array of Integer;

type
  PDynArrayRec = ^TDynArrayRec;
  TDynArrayRec = packed record
  {$IFDEF CPUX64}
    _Padding: LongInt; // Make 16 byte align for payload..
  {$ENDIF}
    RefCnt: LongInt;
    Length: NativeInt;
  end;

function DynArrayRefCount(P: Pointer): LongInt;
begin
  if P <> nil then
    Result := PDynArrayRec(PByte(P) - SizeOf(TDynArrayRec))^.RefCnt
  else
    Result := 0;
end;

procedure DynArrayAddRef(P: Pointer);
asm
  JMP System.@DynArrayAddRef
end;

function DynArrayRelease(P: Pointer): LongInt;
asm
  JMP System.@DynArrayRelease
end;

begin
  ReportMemoryLeaksOnShutdown := True;
  SetLength(a, 1);
  Writeln(DynArrayRefCount(a));
  b := a;
  Writeln(DynArrayRefCount(a));
  DynArrayAddRef(a);
  Writeln(DynArrayRefCount(a));
  a := nil;
  Writeln(DynArrayRefCount(b));
  b := nil;
  Writeln(DynArrayRefCount(b));
end.
And if you make a call to DynArrayRelease that takes the reference count to zero then you would also need to dispose of the array, for reasons discussed above. I've never encountered a problem that would require manipulation of the reference count, and strongly suggest that you avoid doing so.
One final point. The RTL does not offer this functionality through its public interface. Which means that all of the above is private implementation detail. And so is subject to change in a future release. If you do attempt to read or modify the reference count then you must recognise that doing so relies on such implementation detail.
After some googling, I found an excellent article by Rudy Velthuis, which I highly recommend reading. Quoting the dynamic arrays part from http://rvelthuis.de/articles/articles-pointers.html#dynarrays:
At the memory location below the address to which the pointer points, there are two more fields, the number of elements allocated, and the reference count.
If, as in the diagram above, N is the address in the dynamic array variable, then the reference count is at address N-8, and the number of allocated elements (the length indicator) at N-4. The first element is at address N.
How to access these:
SetLength(a1, 100);
a2 := a1;
// Reference Count = 2
refCount := PInteger(NativeUInt(@a1[0]) - SizeOf(NativeInt) - SizeOf(Integer))^;
// Array Length = 100
arrLength := PNativeInt(NativeUInt(@a1[0]) - SizeOf(NativeInt))^;
The trick in computing the proper offsets is to account for the differences between 32-bit and 64-bit platforms. Field sizes in bytes are as follows:
            32bit   64bit
RefCount    4       4
Length      4       8
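Put together as a complete console program, the offset arithmetic works out as follows. This is a sketch that assumes the Delphi dynamic array header described above (it is an implementation detail, and other compilers such as Free Pascal lay the header out differently); the variable names follow the snippet above:

```pascal
program DynArrayHeaderDemo;
{$APPTYPE CONSOLE}
var
  a1, a2: array of Integer;
  refCount: Integer;
  arrLength: NativeInt;
begin
  SetLength(a1, 100);
  a2 := a1; // second reference to the same heap block

  // Length sits directly before the first element:
  //   32-bit: N - 4, 64-bit: N - 8
  arrLength := PNativeInt(NativeUInt(@a1[0]) - SizeOf(NativeInt))^;

  // RefCount sits directly before Length:
  //   32-bit: N - 4 - 4 = N - 8, 64-bit: N - 8 - 4 = N - 12
  refCount := PInteger(NativeUInt(@a1[0]) - SizeOf(NativeInt) - SizeOf(Integer))^;

  Writeln(refCount);  // 2
  Writeln(arrLength); // 100
end.
```

So the N-8 and N-4 offsets quoted from the article hold for 32-bit builds only; expressing them through SizeOf keeps the same code correct on 64-bit.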

Efficient Pointer Handling in Function Parms

As the topic indicates above, I'm wondering if there's a good example of a clean and efficient way to handle pointers as passed in function parms when processing the data sequentially. What I have is something like:
function myfunc(inptr: pointer; inptrsize: longint): boolean;
var
  inproc: pointer;
  i: integer;
begin
  inproc := inptr;
  for i := 1 to inptrsize do
  begin
    // do stuff against byte data here.
    inc(longint(inproc), 1);
  end;
end;
The idea is that instead of finite pieces of data, I want it to be able to process whatever is pushed its way, no matter the size.
Now when it comes to processing the data, I've figured out a couple of ways to do it successfully.
1. Assign the parm pointers to identical temporary pointers, then use those to access each piece of data, incrementing them to move on. This method is quickest, but not very clean looking with all the pointer increments spread all over the code. (This is what I'm talking about above.)
2. Assign the parm pointers to a pointer representing a big array value and then incrementally process that using standard table logic. Much cleaner, but about 500 ms slower than #1.
Is there another way to efficiently handle processing pointers in this way, or is there some method I'm missing that will both be clean and not time inefficient?
Your code here is basically fine. I would always choose to increment a pointer rather than cast to a fake array.
But you should not cast to an integer. That is semantically wrong and you'll pay the penalty anytime you compile on a platform that has pointer size different from your integer size. Always use a pointer to an element of the right size. In this case a pointer to byte.
function MyFunc(Data: PByte; Length: Integer): Boolean;
var
  i: Integer;
begin
  for i := 1 to Length do
  begin
    // do stuff against byte data here.
    inc(Data);
  end;
end;
Unless the compiler is having a really bad day, you won't find it easy to get better performing code than this. What's more, I think this style is actually rather clear and easy to understand. Most of the clarity gain comes in avoiding the need to cast. Always strive to remove casts from your code.
If you want to allow any pointer type to be passed then you can write it like this:
function MyFunc(P: Pointer; Length: Integer): Boolean;
var
  i: Integer;
  Data: PByte;
begin
  Data := P;
  for i := 1 to Length do
  begin
    // do stuff against byte data here.
    inc(Data);
  end;
end;
Or if you want to avoid pointers in the interface, then use an untyped const parameter.
function MyFunc(const Buffer; Length: Integer): Boolean;
var
  i: Integer;
  Data: PByte;
begin
  Data := PByte(@Buffer);
  for i := 1 to Length do
  begin
    // do stuff against byte data here.
    inc(Data);
  end;
end;
Use a var parameter if you need to modify the buffer.
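As an illustration of the var variant (XorBuffer and the sample values are made up for this sketch), the following program modifies the caller's buffer in place through an untyped var parameter:

```pascal
program VarBufferDemo;
{$APPTYPE CONSOLE}

// Hypothetical example: XOR every byte of a caller-supplied buffer with Key.
procedure XorBuffer(var Buffer; Length: Integer; Key: Byte);
var
  i: Integer;
  Data: PByte;
begin
  Data := PByte(@Buffer);
  for i := 1 to Length do
  begin
    Data^ := Data^ xor Key; // write back through the pointer
    Inc(Data);
  end;
end;

var
  Values: array[0..3] of Byte = (1, 2, 3, 4);
  i: Integer;
begin
  XorBuffer(Values, SizeOf(Values), $FF);
  for i := Low(Values) to High(Values) do
    Writeln(Values[i]); // prints 254, 253, 252, 251
end.
```

The caller passes the variable itself rather than a pointer, so no cast is needed at the call site.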
I have a different opinion: for the sake of readability I would use an array. Pascal was not designed for direct memory access. Original Pascal did not even have pointer arithmetic.
This is how I would use an array:
function MyFunc(P: Pointer; Length: Integer): Boolean;
var
  ArrayPtr : PByteArray Absolute P;
  I : Integer;
begin
  For I := 0 to Length-1 do
    // do stuff against ArrayPtr^[I]
end;
But if performance matters, I would write it like this:
function MyFunc(P: Pointer; Length: Integer): Boolean;
var
  Cur, EndOfMemoryBlock: PByte;
begin
  Cur := P;
  EndOfMemoryBlock := Cur;
  Inc(EndOfMemoryBlock, Length); // one past the last byte
  while NativeUInt(Cur) < NativeUInt(EndOfMemoryBlock) do
  begin
    // do stuff against Cur^ here.
    Inc(Cur);
  end;
end;

Resources