TArray<Byte> VS TBytes VS PByteArray - delphi

Those 3 types are very similar...
TArray is the generic version of TBytes.
Both can be casted to PByteArray and used as buffer for calls to Windows API. (with the same restrictions as string to Pchar).
What I would like to know: Is this behavior "by design" or "By Implementation". Or more specifically, could it break in future release?
//Edit
As stated lower...
What I really want to know is: Is this as safe to typecast TBytes(or TArray) to PByteArray as it is to typecast String to PChar as far as forward compatibility is concerned. (Or maybe AnsiString to PAnsiChar is a better exemple ^_^)

Simply put, an array of bytes is an array of bytes, and as long as the definitions of a byte and an array don't change, this won't change either. You're safe to use it that way, as long as you make sure to respect the array bounds, since casting it out of Delphi's array types nullifies your bounds checking.
EDIT: I think I see what you're asking a bit better now.
No, you shouldn't cast a dynamic array reference to a C-style array pointer. You can get away with it with strings because the compiler helps you out a little.
What you can do, though, is cast a pointer to element 0 of the dynamic array to a C-style array pointer. That will work, and won't change.

Two of those types are similar (identical in fact). The third is not.
TArray is declared as "Array of Byte", as is TBytes. You missed a further very relevant type however, TByteArray (the type referenced by PByteArray).
Being a pointer to TByteArray, PByteArray is strictly speaking a pointer to a static byte array, not a dynamic array (which the other byte array types all are). It is typed in this way in order to allow reference to offsets from that base pointer using an integer index. And note that this indexing is limited to 2^15 elements (0..32767). For arbitrary byte offsets (> 32767) from some base pointer, a PByteArray is no good:
var
b: Byte;
ab: TArray<Byte>;
pba: PByteArray;
begin
SetLength(ab, 100000);
pba := #ab; // << No cast necessary - the compiler knows (magic!)
b := pba[62767]; // << COMPILE ERROR!
end;
i.e. casting an Array of Byte or a TArray to a PByteArray is potentially going to lead to problems where the array has > 32K elements (and the pointer is passed to some code which attempts to access all elements). Casting to an untyped pointer avoids this of course (as long as the "recipient" of the pointer then handles access to the memory reference by the pointer appropriately).
BUT, none of this is likely to change in the future, it is merely a consequence of the implementation details that have long since applied in this area. The introduction of a syntactically sugared generic type declaration is a kipper rouge.

Related

How can pointer to a type be treated as a pointer to an array of that type

So according to this (applies to the same page for XE4 to XE8):
When a pointer holds the address of another variable, we say that it points to the location of that variable in memory or to the data stored there. In the case of an array or other structured type, a pointer holds the address of the first element in the structure.
To me the above sounds exactly like this:
Remark: Free Pascal treats pointers much the same way as C does. This means that a pointer to some type can be treated as being an array of this type.
From this point of view, the pointer then points to the zeroeth element of this array.
FPC example:
program PointerArray;
var i : Longint;
p : ^Longint;
pp : array[0..100] of Longint;
begin
for i := 0 to 100 do pp[i] := i; { Fill array }
p := #pp[0]; { Let p point to pp }
for i := 0 to 100 do
if p[i]<>pp[i] then
WriteLn (’Ohoh, problem !’)
end.
The example obviously doesn't compile and complains about p - Array type required. I have never seen such shananigans in delphi, but I am very confused by the info from the embarcadero wiki. The wiki itself gives no examples of such use.
Could someone explain to me what is exactly meant by the wiki? And if it is in fact similar to the FPC/C in any way, could someone provide a working example?
In Delphi you have to enable a special compiler mode to be able to treat a pointer as if it were a pointer to an array. That mode is enabled with the POINTERMATH directive like this:
{$POINTERMATH ON}
From the documentation:
Pointer math is simply treating any given typed pointer in some narrow
instances as a scaled ordinal where you can perform simple arithmetic
operations directly on the pointer variable. It also allows you to
treat such a pointer variable as an unbounded array using the array []
operator. Notice in the example above that incrementing the index of
an array of a type is equivalent to incrementing a pointer to that
type. An increment of one bumps the pointer by the size an array
element in bytes, not by one byte.
The POINTERMATH directive has a local scope. That is, if you turn this
directive ON and do not turn it off in the module, it will remain on
only until the end of the module. Also, if you declare a typed pointer
with this directive ON, any variable of that type allows scaled
pointer arithmetic and array indexing, even after the directive has
been turned OFF Similarly, any block of code surrounded by this
directive allows arithmetic operations for ANY typed pointers within
the block regardless of whether the typed pointer was originally
declared with POINTERMATH ON.
This directive affects only typed pointers. Variables of type Pointer
do not allow the pointer math features, since type Pointer is
effectively pointing to a void element, which is 0 bytes in size.
Untyped var or const parameters are not affected because they are not
really pointers.

Displaying the result of mmioRead

After locating the data chunk using mmioDescend, then how i suppose to read and display the sample data into for example into a memo in delphi 7?
I have follow the step like open the file, locating the riff, locating the fmt, locating data chunk.
if (mmioDescend(HMMIO, #ckiData, #ckiRIFF, MMIO_FINDCHUNK) = MMSYSERR_NOERROR) then
SetLength(buf, ckiData.cksize);
mmioRead(HMMIO, PAnsiChar(buf), ckiData.cksize);
I use mmioRead too but i don't know how to display the data.Can anyone help give an example how to use the mmioRead and then display the result?
Well, I'd probably read into a buffer that was declared using a more appropriate type.
For example, suppose your data are 16 bit integers, Smallint in Delphi. Then declare a dynamic array of Smallint.
var
buf: array of Smallint;
Then allocate enough space for the data:
Assert(ckiData.cksize mod SizeOf(buf[0])=0);
SetLength(buf, ckiData.cksize div SizeOf(buf[0]));
And then read the buffer:
mmioRead(HMMIO, PAnsiChar(buf), ckiData.cksize);
Now you can access the elements as Smallint values.
If you have different element types, then you can adjust your array declaration. If you don't know until runtime what the element type is you may be better off with array of Byte and then using pointer arithmetic and casting to access the actual content.
I'd say that the design of the interface to mmioRead is a little weak. The buffer isn't really a string. It's probably best considered as a byte array. But perhaps because C does not have separate byte and character types, the function is declared as taking a pointer to char array. Really the Delphi translation would be better exposing a pointer to byte or even better in my view, a plain untyped Pointer type.
I assumed that you were struggling with interpreting the output of mmioRead since that was the code that you included in the question. But, according to now deleted comments, your question is a GUI question.
You want to add content to a memo. Do it like this:
Memo1.Clear;
for i := low(buf) to high(buf) do
Memo1.Items.Add(IntToStr(buf[i]));
If you want to convert to floating point then, still assuming 16 bit signed data, do this:
Memo1.Clear;
for i := low(buf) to high(buf) do
Memo1.Items.Add(FormatFloat('0.00000', buf[i]/32768.0));//show 5dp

Why use string[1] rather than string while using readbuffer

I am having a record like this
TEmf_SrectchDIBits = packed record
rEMF_STRETCHDI_BITS: TEMRStretchDIBits;
rBitmapInfo: TBitmapInfo;
ImageSource: string;
end;
---
---
RecordData: TEmf_SrectchDIBits;
If i am reading data into it by using TStream like this an exception is occuring
SetLength(RecordData.ImageSource, pRecordSize);
EMFStream.ReadBuffer(RecordData.ImageSource,pRecordSize)
But if i use below code, it was working normally
SetLength(RecordData.ImageSource, pRecordSize);
EMFStream.ReadBuffer(RecordData.ImageSource[1], pRecordSize);
So what is the difference between using String and String[1]
The difference is a detail related to the signature of the .ReadBuffer method.
The signature is:
procedure ReadBuffer(var Buffer; Count: Longint);
As you can see, the Buffer parameter does not have a type. In this case, you're saying that you want access to the underlying variable.
However, a string is two parts, a pointer (the content of the variable) and the string (the variable points to this).
So, if ReadBuffer were given just the string variable, it would have 4 bytes to store data into, the string variable, and that would not work out too well since the string variable is supposed to hold a pointer, not just any random binary data. If ReadBuffer wrote more than 4 bytes, it would overwrite something else in memory with new data, a potentially disastrous action to do.
By passing the [1] character to a var parameter, you're giving ReadBuffer access to the data that the string variable points to, which is what you want. You want to change the string content after all.
Also, make sure you've set up the length of the string variable to be big enough to hold whatever you're reading into it.
Also, final note, one that I cannot verify. In older Delphi versions, a string variable contained 1-byte characters. In newer, I think they're two, due to unicode, so that code might not work as expected in newer versions of Delphi. You probably would like to use a byte array or heap memory instead.
String types are implemented actually as pointers to something we could call a "string descriptor block". Basically, you have a level of indirection.
That block contains some string control data (reference count, length, and in later versions character set info as well) at negative offsets, and the string characters at positive ones. A string variable is a pointer to the decription block (and if you print SizeOf(stringvar) you get 4), when you work on strings the compiler knows where to find the string data and handle them. But when using an untyped parameter (var Buffer;), the compiler does not know that, it will simply access the memory at "Buffer", but with a string variable that's the pointer to the string block, not the actual string characters. Using string[1] you pass the location of the first character data.

Delphi Unicode String Type Stored Directly at its Address (or "Unicode ShortString")

I want a string type that is Unicode and that stores the string directly at the adress of the variable, as is the case of the (Ansi-only) ShortString type.
I mean, if I declare a S: ShortString and let S := 'My String', then, at #S, I will find the length of the string (as one byte, so the string cannot contain more than 255 characters) followed by the ANSI-encoded string itself.
What I would like is a Unicode variant of this. That is, I want a string type such that, at #S, I will find a unsigned 32-bit integer (or a single byte would be enough, actually) containing the length of the string in bytes (or in characters, which is half the number of bytes) followed by the Unicode representation of the string. I have tried WideString, UnicodeString, and RawByteString, but they all appear only to store an adress at #S, and the actual string somewhere else (I guess this has do do with reference counting and such). Update: The most important reason for this is probably that it would be very problematic if sizeof(string) were variable.
I suspect that there is no built-in type to use, and that I have to come up with my own way of storing text the way I want (which actually is fun). Am I right?
Update
I will, among other things, need to use these strings in packed records. I also need manually to read/write these strings to files/the heap. I could live with fixed-size strings, such as <= 128 characters, and I could redesign the problem so it will work with null-terminated strings. But PChar will not work, for sizeof(PChar) = 1 - it's merely an address.
The approach I eventually settled for was to use a static array of bytes. I will post my implementation as a solution later today.
You're right. There is no exact analogue to ShortString that holds Unicode characters. There are lots of things that come close, including WideString, UnicodeString, and arrays of WideChar, but if you're not willing to revisit the way you intend to use the data type (make byte-for-byte copies in memory and in files while still being using them in all the contexts a string could be allowed), then none of Delphi's built-in types will work for you.
WideString fails because you insist that the string's length must exist at the address of the string variable, but WideString is a reference type; the only thing at its address is another address. Its length happens to be at the address held by the variable, minus four. That's subject to change, though, because all operations on that type are supposed to go through the API.
UnicodeString fails for that same reason, as well as because it's a reference-counted type; making a byte-for-byte copy of one breaks the reference counting, so you'll get memory leaks, invalid-pointer-operation exceptions, or more subtle heap corruption.
An array of WideChar can be copied without problems, but it doesn't keep track of its effective length, and it also doesn't act like a string very often. You can assign string literals to it and it will act like you called StrLCopy, but you can't assign string variables to it.
You could define a record that has a field for the length and another field for a character array. That would resolve the length issue, but it would still have all the rest of the shortcomings of an undecorated array.
If I were you, I'd simply use a built-in string type. Then I'd write functions to help transfer it between files, blocks of memory, and native variables. It's not that hard; probably much easier than trying to get operator overloading to work just right with a custom record type. Consider how much code you will write to load and store your data versus how much code you're going to write that uses your data structure like an ordinary string. You're going to write the data-persistence code once, but for the rest of the project's lifetime, you're going to be using those strings, and you're going to want them to look and act just like real strings. So use real strings. "Suffer" the inconvenience of manually producing the on-disk format you want, and gain the advantage of being able to use all the existing string library functions.
PChar should work like this, right? AFAIK, it's an array of chars stored right where you put it. Zero terminated, not sure how that works with Unicode Chars.
You actually have this in some way with the new unicode strings.
s as a pointer points to s[1] and the 4 bytes on the left contains the length.
But why not simply use Length(s)?
And for direct reading of the length from memory:
procedure TForm9.Button1Click(Sender: TObject);
var
s: string;
begin
s := 'hlkk ljhk jhto';
{$POINTERMATH ON}
Assert(Length(s) = (PInteger(s)-1)^);
//if you don't want POINTERMATH, replace by PInteger(Cardinal(s)-SizeOf(Integer))^
showmessage(IntToStr(length(s)));
end;
There's no Unicode version of ShortString. If you want to store unicode data inline inside an object instead of as a reference type, you can allocate a buffer:
var
buffer = array[0..255] of WideChar;
This has two disadvantages. 1, the size is fixed, and 2, the compiler doesn't recognize it as a string type.
The main problem here is #1: The fixed size. If you're going to declare an array inside of a larger object or record, the compiler needs to know how large it is in order to calculate the size of the object or record itself. For ShortString this wasn't a big problem, since they could only go up to 256 bytes (1/4 of a K) total, which isn't all that much. But if you want to use long strings that are addressed by a 32-bit integer, that makes the max size 4 GB. You can't put that inside of an object!
This, not the reference counting, is why long strings are implemented as reference types, whose inline size is always a constant sizeof(pointer). Then the compiler can put the string data inside a dynamic array and resize it to fit the current needs.
Why do you need to put something like this into a packed array? If I were to guess, I'd say this probably has something to do with serialization. If so, you're better off using a TStream and a normal Unicode string, and writing an integer (size) to the stream, and then the contents of the string. That turns out to be a lot more flexible than trying to stuff everything into a packed array.
The solution I eventually settled for is this (real-world sample - the string is, of course, the third member called "Ident"):
TASStructMemHeader = packed record
TotalSize: cardinal;
MemType: TASStructMemType;
Ident: packed array[0..63] of WideChar;
DataSize: cardinal;
procedure SetIdent(const AIdent: string);
function ReadIdent: string;
end;
where
function TASStructMemHeader.ReadIdent: string;
begin
result := WideCharLenToString(PWideChar(#(Ident[0])), length(Ident));
end;
procedure TASStructMemHeader.SetIdent(const AIdent: string);
var
i: Integer;
begin
if length(AIdent) > 63 then
raise Exception.Create('Too long structure identifier.');
FillChar(Ident[0], length(Ident) * sizeof(WideChar), 0);
Move(AIdent[1], Ident[0], length(AIdent) * sizeof(WideChar));
end;
But then I realized that the compiler really can interpret array[0..63] of WideChar as a string, so I could simply write
var
MyStr: string;
Ident := 'This is a sample string.';
MyStr := Ident;
Hence, after all, the answer given by Mason Wheeler above is actually the answer.

Are there any optimisations for retrieving of return value in Delphi?

I'm trying to find an elegant way to access the fields of some objects in some other part of my program through the use of a record that stores a byte and accesses fields of another record through the use of functions with the same name as the record's fields.
TAilmentP = Record // actually a number but acts like a pointer
private
Ordinal: Byte;
public
function Name: String; inline;
function Description: String; inline;
class operator Implicit (const Number: Byte): TAilmentP; inline;
End;
TSkill = Class
Name: String;
Power: Word;
Ailment: TAilmentP;
End;
class operator TAilmentP.Implicit (const Number: Byte): TAilmentP;
begin
Result.Ordinal := Number;
ShowMessage (IntToStr (Integer (#Result))); // for release builds
end;
function StrToAilment (const S: String): TAilmentP; // inside same unit
var i: Byte;
begin
for i := 0 to Length (Ailments) - 1 do
if Ailments [i].Name = S then
begin
ShowMessage (IntToStr (Integer (#Result))); // for release builds
Result := i; // uses the Implicit operator
Exit;
end;
raise Exception.Create ('"' + S + '" is not a valid Ailment"');
end;
Now, I was trying to make my life easier by overloading the conversion operator so that when I try to assign a byte to a TAilmentP object, it assigns that to the Ordinal field.
However, as I've checked, it seems that this attempt is actually costly in terms of performance since any call to the implicit "operator" will create a new TAilmentP object for the return value, do its business, and then return the value and make a byte-wise copy back into the object that called it, as the addresses differ.
My code calls this method quite a lot, to be honest, and it seems like this is slower than just assigning my value directly to the Ordinal field of my object.
Is there any way to make my program actually assign the value directly to my field through the use of ANY method/function? Even inlining doesn't seem to work. Is there a way to return a reference to a (record) variable, rather than an object itself?
Finally (and sorry for being off topic a bit), why is operator overloading done through static functions? Wouldn't making them instance methods make it faster since you can access object fields without dereferencing them? This would really come in handy here and other parts of my code.
[EDIT] This is the assembler code for the Implicit operator with all optimizations on and no debugging features (not even "Debug Information" for breakpoints).
add al, [eax] /* function entry */
push ecx
mov [esp], al /* copies Byte parameter to memory */
mov eax, [esp] /* copies stored Byte back to register; function exit */
pop edx
ret
What's even funnier is that the next function has a mov eax, eax instruction at start-up. Now that looks really useful. :P Oh yeah, and my Implicit operator didn't get inlined either.
I'm pretty much convinced [esp] is the Result variable, as it has a different address than what I'm assigning to. With optimizations off, [esp] is replaced with [ebp-$01] (what I assigning to) and [ebp-$02] (the Byte parameter), one more instruction is added to move [ebp-$02] into AL (which then puts it in [ebp-$01]) and the redundant mov instruction is still there with [epb-$02].
Am I doing something wrong, or does Delphi not have return-value optimizations?
Return types — even records — that will fit in a register are returned via a register. It's only larger types that are internally transformed into "out" parameters that get passed to the function by reference.
The size of your record is 1. Making a copy of your record is just as fast as making a copy of an ordinary Byte.
The code you've added for observing the addresses of your Result variables is actually hurting the optimizer. If you don't ask for the address of the variable, then the compiler is not required to allocate any memory for it. The variable could exist only in a register. When you ask for the address, the compiler needs to allocate stack memory so that it has an address to give you.
Get rid of your "release mode" code and instead observe the compiler's work in the CPU window. You should be able to observe how your record exists primarily in registers. The Implicit operator might even compile down to a no-op since the input and output registers should both be EAX.
Whether operators are instance methods or static doesn't make much difference, especially not in terms of performance. Instance methods still receive a reference to the instance they're called on. It's just a matter of whether the reference has a name you choose or whether it's called Self and passed implicitly. Although you wouldn't have to write "Self." in front of your field accesses, the Self variable still needs to get dereferenced just like the parameters of a static operator method.
All I'll say about optimizations in other languages is that you should look up the term named return-value optimization, or its abbreviation NRVO. It's been covered on Stack Overflow before. It has nothing to do with inlining.
Delphi is supposed to optimize return assignment by using pointers. This is also true for C++ and other OOP compiled languages. I stopped writing Pascal before operator overloading was introduced, so my knowledge is a bit dated. What follows is what I would try:
What I'm thinking is this... can you create an object on the heap (use New) and pass a pointer back from your "Implicit" method? This should avoid unnecessary overhead, but will cause you to deal with the return value as a pointer. Overload your methods to deal with pointer types?
I'm not sure if you can do it this with the built-in operator overloading. Like I mentioned, overloading is something I wanted in Pascal for nearly a decade and never got to play with. I think it's worth a shot. You might need to accept that you'll must kill your dreams of elegant type casting.
There are some caveats with inlining. You probably already know that the hint is disabled (by default) for debug builds. You need to be in release mode to profile / benchmark or modify your build settings. If you haven't gone into release mode (or altered build settings) yet, it's likely that your inline hints are being ignored.
Be sure to use const to hint the compiler to optimize further. Even if it doesn't work for your case, it's a great practice to get into. Marking what should not change will prevent all kinds of disasters... and additionally give the compiler the opportunity to aggressively optimize.
Man, I wish I know if Delphi allowed cross-unit inlining by now, but I simply don't. Many C++ compilers only inline within the same source code file, unless you put the code in the header (headers have no correlate in Pascal). It's worth a search or two. Try to make inlined functions / methods local to their callers, if you can. It'll at least help compile time, if not more.
All out of ideas. Hopefully, this meandering helps.
Now that I think about it, maybe it's absolutely necessary to have the return value in a different memory space and copied back into the one being assigned to.
I'm thinking of the cases where the return value may need to be de-allocated, like for example calling a function that accepts a TAilmentP parameter with a Byte value... I don't think you can directly assign to the function's parameters since it hasn't been created yet, and fixing that would break the normal and established way of generating function calls in assembler (i.e.: trying to access a parameter's fields before it's created is a no-no, so you have to create that parameter before that, then assign to it what you have to assign OUTSIDE a constructor and then call the function in assembler).
This is especially true and obvious for the other operators (with which you could evaluate expressions and thus need to create temporary objects), just not so much this one since you'd think it's like the assignment operator in other languages (like in C++, which can be an instance member), but it's actually much more than that - it's a constructor as well.
For example
procedure ShowAilmentName (Ailment: TAilmentP);
begin
ShowMessage (Ailment.Name);
end;
[...]
begin
ShowAilmentName (5);
end.
Yes, the implicit operator can do that too, which is quite cool. :D
In this case, I'm thinking that 5, like any other Byte, would be converted into a TAilmentP (as in creating a new TAilmentP object based on that Byte) given the implicit operator, the object then being copied byte-wise into the Ailment parameter, then the function body is entered, does it's job and on return the temporary TAilmentP object obtained from conversion is destroyed.
This is even more obvious if Ailment would be const, since it would have to be a reference, and constant one too (no modifying after the function was called).
In C++, the assignment operator would have no business with function calls. Instead, one could've used a constructor for TAilmentP which accepts a Byte parameter. The same can be done in Delphi, and I suspect it would take precedence over the implicit operator, however what C++ doesn't support but Delphi does is to have down-conversion to primitive types (Byte, Integer, etc.) since the operators are overloaded using class operators. Thus, a procedure like "procedure ShowAilmentName (Number: Byte);" would never be able to accept a call like "ShowAilmentName (SomeAilment)" in C++, but in Delphi it can.
So, I guess this is a side-effect of the Implicit operator also being like a constructor, and this is necessary since records can not have prototypes (thus you could not convert both one way and the other between two records by just using constructors).
Anyone else think this might be the cause?

Resources