Char and Chr in Delphi

The difference between Chr and Char when used to convert types is that one is a function and the other is a cast.
So: Char(66) = Chr(66)
I don't think there is any performance difference (at least I've never noticed any, one probably calls the other).... I'm fairly sure someone will correct me on this!
EDIT Thanks to Ulrich for the test proving they are in fact identical.
EDIT 2 Can anyone think of a case where they might not be identical, e.g. you are pushed towards using one over the other due to the context?
Which do you use in your code and why?

I did a small test in D2007:
program CharChr;
{$APPTYPE CONSOLE}
uses
Windows;
function GetSomeByte: Byte;
begin
Result := Random(26) + 65;
end;
procedure DoTests;
var
b: Byte;
c: Char;
begin
b := GetSomeByte;
IsCharAlpha(Chr(b));
b := GetSomeByte;
IsCharAlpha(Char(b));
b := GetSomeByte;
c := Chr(b);
b := GetSomeByte;
c := Char(b);
end;
begin
Randomize;
DoTests;
end.
Both calls produce the same assembly code:
CharChr.dpr.19: IsCharAlpha(Chr(b));
00403AE0 8A45FF mov al,[ebp-$01]
00403AE3 50 push eax
00403AE4 E86FFFFFFF call IsCharAlpha
CharChr.dpr.21: IsCharAlpha(Char(b));
00403AF1 8A45FF mov al,[ebp-$01]
00403AF4 50 push eax
00403AF5 E85EFFFFFF call IsCharAlpha
CharChr.dpr.24: c := Chr(b);
00403B02 8A45FF mov al,[ebp-$01]
00403B05 8845FE mov [ebp-$02],al
CharChr.dpr.26: c := Char(b);
00403B10 8A45FF mov al,[ebp-$01]
00403B13 8845FE mov [ebp-$02],al
Edit: Modified sample to mitigate Nick's concerns.
Edit 2: Nick's wish is my command. ;-)

The help says: Chr returns the character with the ordinal value (ASCII value) of the byte-type expression, X.*
So, how is a character represented in a computer's memory? Guess what: as a byte*. Actually, the Chr and Ord functions exist only because Pascal is a strictly typed language that prohibits the use of bytes* where characters are required. For the computer, the resulting char is still represented as a byte*, so what would it be converted to? No code is emitted for this function call, just as no code is emitted for a type cast. Ergo: no difference.
You may prefer Chr just to avoid a type cast.
Note: type casts should not be confused with explicit type conversions! In Delphi 2010, writing something like Char(a) where a is an AnsiChar will actually do something.
* For Unicode, please replace byte with integer.
Edit:
Just an example to make it clear (assuming non-Unicode):
var
a: Byte;
c: char;
b: Byte;
begin
a := 60;
c := Chr(60);
c := Chr(a);
b := a;
end;
produces similar code
ftest.pas.46: a := 60;
0045836D C645FB3C mov byte ptr [ebp-$05],$3c
ftest.pas.47: c := Chr(60);
00458371 C645FA3C mov byte ptr [ebp-$06],$3c
ftest.pas.48: c := Chr(a);
00458375 8A45FB mov al,[ebp-$05]
00458378 8845FA mov [ebp-$06],al
ftest.pas.49: b := a;
0045837B 8A45FB mov al,[ebp-$05]
0045837E 8845F9 mov [ebp-$07],al
Assigning a byte to a byte is actually the same as assigning a byte to a char via Chr().

chr is a function, thus it returns a new value of type char.
char(x) is a cast, that means the actual x object is used but as a different type.
Many system functions, like inc, dec, chr, ord, are inlined.
Both char and chr are fast. Use the one that is most appropriate each time,
and reflects better what you want to do.
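A minimal console sketch of the two spellings side by side. The program and variable names are made up for illustration, and it only exercises one in-range value, so it says nothing about out-of-range arguments:

```pascal
program ChrVsCharSketch;
{$APPTYPE CONSOLE}

var
  B: Byte;
  C1, C2: Char;
begin
  B := 66;
  C1 := Chr(B);   // function-style spelling
  C2 := Char(B);  // typecast spelling

  Writeln(C1, ' ', C2);   // prints: B B
  Writeln(C1 = C2);       // prints: TRUE
  Writeln(Ord(C1) = B);   // prints: TRUE
end.
```

Which spelling to use is then a readability choice: Chr reads as "give me the character for this code", while Char(x) reads as "treat this value as a character".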

Chr is a function call, so it is a (tiny, tiny) bit more expensive than a type cast. But I think Chr is inlined by the compiler.

They are identical, but they don't have to be identical. There's no requirement that the internal representation of characters map 1-to-1 with their ordinal values. Nothing says that a Char variable holding the value 'A' must hold the numeric value 65. The requirement is that when you call Ord on that variable, the result must be 65 because that's the code point designated for the letter A in your program's character encoding.
Of course, the easiest implementation of that requirement is for the variable to hold the numeric value 65 as well. Because of this, the function calls and the type-casts are always identical.
If the implementation were different, then when you called Chr(65), the compiler would go look up what character is at code point 65 and use it as the result. When you write Char(65), the compiler wouldn't worry about what character it really represents, as long as the numeric result stored in memory was 65.
Is this splitting hairs? Yes, absolutely, because in all current implementations, they're identical. I liken this to the issue of whether the null pointer is necessarily zero. It's not, but under all implementations, it ends up that way anyway.

chr is typesafe, char isn't: try to code chr(256) and you'll get a compiler error. Try to code char(256) and you will get the character with the ordinal value 0 or 1, depending on your computer's internal representation of integers.
I'll qualify the above by saying that it applies to pre-Unicode Delphi. I don't know whether chr and char have been updated to take Unicode into account.

Related

I want to create a Fibonacci sequence using a for loop, but the integers are not adding up

procedure TForm1.Button1Click(Sender: TObject);
var
term1: integer;
term2: integer;
term3: integer;
j: integer;
begin
term1 := (0);
term2 := (1);
for j := 1 to 100 do;
begin
term3 :=( term1 + term2);
Memo1.Text:=inttostr(term3);
term1 := term2;
term2 := term3;
end;
end;
end.
This is what I have so far, but term1 and term2 don't want to add up. I have tried some different things, but for some reason the integers never want to add up.

There are several problems with your code
The semicolon after for j := 1 to 100 do prevents the code inside your begin..end block from being run in a loop. Why? The code that is run in each cycle of a for loop is whatever follows the do, up to the first semicolon. Since you put a semicolon directly after the do, an empty statement is what gets run in the loop. Your begin..end block comes after that and runs only once. Removing the semicolon after do will fix that.
You are using Memo1.Text:=inttostr(term3); to write the result into the Memo. The problem is that this rewrites the entire text of the Memo every time, so you end up with only one line showing the last number. You should use Memo1.Lines.Add(inttostr(term3)); instead so that a new line is added each time.
Lastly, you are using the Integer type for your variables. Since the numbers in the Fibonacci sequence grow very fast, you will quickly exceed the maximum value that can be stored in an Integer, which in Delphi is a signed 32-bit integer with a maximum value of 2147483647. You will have to use a bigger integer type such as a 64-bit integer, and since you are only dealing with positive numbers you should use the unsigned 64-bit integer, which is declared in Delphi as the UInt64 type. You can read more about Delphi's default integer types in the documentation. Unfortunately, not even UInt64 is big enough to hold all of the first 100 Fibonacci numbers, so you will have to use one of the BigInteger libraries for Delphi to do this properly. There are several of them available on the internet.
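Putting the three fixes together, here is a hedged sketch as a console program (so it stays self-contained; in the form you would call Memo1.Lines.Add instead of Writeln). The upper bound of 92 is my own choice: with term1 = 0 and term2 = 1, iteration j produces F(j+1), and F(93) = 12200160415121876738 is the last Fibonacci number that fits in a UInt64.

```pascal
program FibSketch;
{$APPTYPE CONSOLE}

var
  term1, term2, term3: UInt64;
  j: Integer;
begin
  term1 := 0;
  term2 := 1;
  // No semicolon after "do", so the begin..end block is the loop body.
  // 92 iterations: going further would overflow UInt64.
  for j := 1 to 92 do
  begin
    term3 := term1 + term2;
    Writeln(term3);        // one line per number instead of overwriting
    term1 := term2;
    term2 := term3;
  end;
end.
```

Going all the way to 100 terms would still need a big-integer library, as noted above.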

You have an erroneous ; on your loop that you need to remove:
for j := 1 to 100 do;
                    ^

Asm equivalent to a delphi procedure

I have a simple delphi function called SetCompare that compares two singles and if they are not equal then one value is set to the other.
procedure SetCompare( A : single; B : single );
begin
if( A <> B ) then
A := B;
end;
I am trying to convert this into asm as such:
procedure SetCompare( A : Single; B : Single ); register;
begin
asm
mov EAX,A
mov ECX,B
cmp EAX,ECX
jne SetValue
@SetValue:
mov EAX,ECX
end;
end;
Will this work?

Will this work?
No this will not work, because floating point comparison is not the same as binary comparison. For instance 0 and -0 have different bit patterns, but compare as equal. Similarly, NaN compares unequal to all values, including a NaN with the same bit pattern.
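A small sketch of the zero case, assuming nothing beyond the standard IEEE single-precision layout. The variable names are made up, and the Cardinal overlays exist only to set and inspect the raw bits:

```pascal
program ZeroCompareSketch;
{$APPTYPE CONSOLE}

var
  PosZero, NegZero: Single;
  PosBits: Cardinal absolute PosZero;  // raw bits of PosZero
  NegBits: Cardinal absolute NegZero;  // raw bits of NegZero
begin
  PosBits := $00000000;  // +0.0
  NegBits := $80000000;  // -0.0 (only the sign bit set)

  Writeln(PosZero = NegZero);  // floating-point comparison, prints: TRUE
  Writeln(PosBits = NegBits);  // integer (bitwise) comparison, prints: FALSE
end.
```

A cmp on the integer registers behaves like the second comparison, not the first, which is why the asm version does not match the Pascal version.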
The simplest way to work out how to write your code is to get the compiler to compile the Pascal code, and inspect the generated assembly code.
Some asides:
Your function is pointless anyway, because it returns no value and has no side effects.
If performance matters enough to write assembler, then you should write pure assembler functions rather than inline asm blocks in a Pascal function, which in any case are not supported by the x64 compiler.
Your arguments are already in registers, so it makes little sense to copy them around to other registers. For x86 code, A arrives in EAX, and B arrives in EDX. Given that EAX already contains A, why would you copy it into EAX? It is already there. And B is already in EDX, so why copy it to ECX? For x64 code, the two arguments are passed in floating point registers, and can be compared there directly. As soon as you start writing assembler you need to understand the register use of the calling convention.
Your jne is pointless. If execution does not take the jump, then it moves to the next line of code. Which is where you jumped to.

Why does dcc64 say this value is never used?

The code below is from DDetours.pas. When it compiles for 32 bit, there are no warnings emitted. When it compiles for 64-bit, it emits this warning: (Delphi Berlin Update 2)
[dcc64 Hint] DDetours.pas(1019): H2077 Value assigned to 'Prf' never used
Here is the function in question
function GetPrefixesCount(Prefixes: WORD): Byte;
var
Prf: WORD;
i: Byte;
begin
{ Get prefixes count used by the instruction. }
Result := 0;
if Prefixes = 0 then
Exit;
Prf := 0;
i := 0;
Prefixes := Prefixes and not Prf_VEX;
while Prf < $8000 do
begin
Prf := (1 shl i);
if (Prf and Prefixes = Prf) then
Inc(Result);
Inc(i);
end;
end;
It sure looks to me like the very first time Prf is compared against $8000 that initial value is used.

It's a compiler bug. There are a few of this nature. Quite frustrating. Sometimes the 32 bit compiler will complain in an irrational manner and then when you workaround that the 64 bit compiler in turn complains in an irrational manner about your workaround.
I don't think that Embarcadero habitually compiles with hints and warnings enabled, because their library code is full of hints and warnings.
Anyway in this case the compiler sees the two writes to the variable but for some reason does not recognise the intervening read of the variable.
There's not a whole lot that you can do. You could submit a bug report. I expect that you don't want to change the code because it is third party code. If you don't change it then you'll have to put up with the bogus hint.
Notifying the author of the library might allow them to work around the issue, perhaps by suppressing hints for that function.

I suspect the 64bit compiler has a small amount of smarts built-in to allow it to recognize that
the variable is initialized and not touched again until the loop is entered.
the loop condition is comparing the variable to a literal.
the initial value of 0 satisfies the loop condition, thus ensuring the loop will always run at least once at runtime, effectively making it act like a repeat..until loop.
Since the compiler still has to generate code for the initialization, but knows the initial value is not needed at runtime to enter the loop, it can issue a warning that the initial value will be unused.
If you don't initialize the variable, the compiler doesn't know at compile-time whether the loop will be entered or not since the behavior is undefined, so the compiler issues a different warning about the variable being uninitialized.
Viewing the disassembly, you can see that the loop is turned into a repeat..until loop and the Prf := 0; assignment is removed by optimization:
Project87.dpr.17: Result := 0;
00000000004261AA 4833D2 xor rdx,rdx
Project87.dpr.18: if Prefixes = 0 then
00000000004261AD 6685C0 test ax,ax
00000000004261B0 7460 jz GetPrefixesCount + $72
Project87.dpr.22: i := 0;
00000000004261B2 4D33C0 xor r8,r8
Project87.dpr.23: Prefixes := Prefixes and not Prf_VEX;
00000000004261B5 0FB7C0 movzx eax,ax
00000000004261B8 81E02DFBFFFF and eax,$fffffb2d
00000000004261BE 81F8FFFF0000 cmp eax,$0000ffff
00000000004261C4 7605 jbe GetPrefixesCount + $2B
00000000004261C6 E8050FFEFF call @BoundErr
Project87.dpr.26: Prf := (1 shl i);
00000000004261CB 41C7C101000000 mov r9d,$00000001
00000000004261D2 418BC8 mov ecx,r8d
00000000004261D5 41D3E1 shl r9d,r9b
00000000004261D8 4489C9 mov ecx,r9d
00000000004261DB 81F9FFFF0000 cmp ecx,$0000ffff
00000000004261E1 7605 jbe GetPrefixesCount + $48
00000000004261E3 E8E80EFEFF call @BoundErr
Project87.dpr.27: if (Prf and Prefixes = Prf) then
00000000004261E8 448BC9 mov r9d,ecx
00000000004261EB 664423C8 and r9w,ax
00000000004261EF 66443BC9 cmp r9w,cx
00000000004261F3 750A jnz GetPrefixesCount + $5F
Project87.dpr.28: Inc(Result);
00000000004261F5 80C201 add dl,$01
00000000004261F8 7305 jnb GetPrefixesCount + $5F
00000000004261FA E8F10EFEFF call @IntOver
Project87.dpr.29: Inc(i);
00000000004261FF 4180C001 add r8b,$01
0000000000426203 7305 jnb GetPrefixesCount + $6A
0000000000426205 E8E60EFEFF call @IntOver
Project87.dpr.24: while Prf < $8000 do
000000000042620A 6681F90080 cmp cx,$8000
000000000042620F 72BA jb GetPrefixesCount + $2B
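To illustrate the repeat..until shape described above, here is a hedged sketch of the same bit-counting idea written that way from the start. CountSetBits16 is a made-up name, the Prf_VEX masking from DDetours is left out, and I am not claiming this is how the library should be patched or that it silences the hint on every compiler version:

```pascal
program RepeatUntilSketch;
{$APPTYPE CONSOLE}

// Hypothetical helper: counts the set bits in a 16-bit value.
function CountSetBits16(Value: Word): Byte;
var
  Bit: Word;
  i: Byte;
begin
  Result := 0;
  i := 0;
  repeat
    Bit := Word(1 shl i);          // first assignment happens inside the loop
    if (Value and Bit) = Bit then
      Inc(Result);
    Inc(i);
  until Bit >= $8000;              // stop after the top bit has been tested
end;

begin
  Writeln(CountSetBits16($00FF));  // prints: 8
  Writeln(CountSetBits16($8001));  // prints: 2
end.
```

Because the loop body always runs at least once, no initial value for Bit is needed before the loop.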

Well, I guess the answer is that dcc64 is a buggy compiler when it comes to messages, because if you comment out the offending line, "value never used" becomes "might not have been initialized." Same compiler.
[dcc64 Warning] DDetours.pas(1022): W1036 Variable 'Prf' might not have been initialized

Undocumented Members of TPropInfo

System.TypInfo.TPropInfo has two function members (at least in D-XE3):
function NameFld: TTypeInfoFieldAccessor; inline;
function Tail: PPropInfo; inline;
I cannot find any documentation for them or any examples of their use. What are they for and how can they be used? (Hope that qualifies as one question.)

The NameFld function returns the name of a property as a TTypeInfoFieldAccessor.
This allows you to do the following:
MyPropertyName:= MyPropInfo.NameFld.ToString;
if (PropInfoA.NameFld = PropInfoB.NameFld) then begin
writeln('property names are the same');
end;
The TTypeInfoFieldAccessor stores the name of a property in a shortstring internally.
Because the NextGen compiler does not support shortstrings, a PByte type is used.
(I guess the author did not want to litter the source with ifdefs and ripped out the PShortstring references)
The input of Tail is a PByte pointing to length field of the internal shortstring.
Here's the source code for tail.
function TTypeInfoFieldAccessor.Tail: PByte;
begin
Result:=
FData //Start of the shortstring
+ FData^ //Length of the string data
+ 1; //Add one for the length byte itself
end;
Because shortstrings are not null terminated, you cannot do a simple "loop until the null char is found" kind of loop.
Therefore a loop from start to tail can be employed to transfer the shortstring into a normal string.
Strangely enough in the actual RTL sourcecode the length byte is used everywhere instead of the tail function; so it looks like a leftover.
It would have made more sense to include a size function and rip out the tail.

Is it safe to modify the content of a string variable through a pointer?

Consider I have a procedure with Str parameter passed by reference, and I want to modify content of the given variable through the procedure, e.g.
procedure Replace(var Str: string);
var
PStr: PChar;
i: Integer;
begin
PStr := @Str[1];
for i := 1 to Length(Str) do begin
PStr^ := 'x';
Inc(PStr);
end;
end;
Is it an acceptable pointer usage? I'm not sure whether it has a memory leak.
What really happens in PStr := @Str[1]? Does the compiler make a copy of Str internally, or what?
Is this kind of code optimization worth?

Is it an acceptable pointer usage?
You need to make sure that you don't call
PStr := @Str[1];
for an empty string, as that would crash. The easiest way to do that is to replace that line with
PStr := PChar(Str);
so that the compiler will make sure that either a pointer to the first char of the string, or a pointer to #0 is returned. As Ken correctly pointed out in a comment there is no call to UniqueString() in this case, so you would need to do it yourself.
I'm not sure whether it has a memory leak.
No, there is no memory leak. Obtaining a pointer to a string character will call UniqueString() internally, but that will happen for write access to a string character too, so there's nothing special about the character pointer.
What really happens in PStr := @Str[1]? Does the compiler make a copy of Str internally, or what?
No, it just makes sure that the string is unique (so that write access through the pointer does not change the contents of any other string that shares the same data). Afterwards it returns a pointer to that character in the string, which you can then treat as any other PChar variable, pass it to API functions, increment it and so on.
Is this kind of code optimization worth?
It is not only worth it, it is necessary to really achieve good performance for large strings. The reason for this is that the compiler is not smart enough to only call UniqueString() once, but it will insert calls to it for each write access to a character in the string. So if you process a large string character by character you will have a big overhead from all these calls.
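A hedged sketch of that pattern: one explicit UniqueString call up front, then PChar(Str) so that empty strings are handled as well. The demo variables A and B are made up to show that a second string sharing the same data is left alone:

```pascal
program ReplaceSketch;
{$APPTYPE CONSOLE}

procedure Replace(var Str: string);
var
  PStr: PChar;
  i: Integer;
begin
  UniqueString(Str);    // one call: make sure no other string shares this data
  PStr := PChar(Str);   // valid even when Str is empty (points to a #0)
  for i := 1 to Length(Str) do
  begin
    PStr^ := 'x';
    Inc(PStr);
  end;
end;

var
  A, B: string;
begin
  A := StringOfChar('a', 5);
  B := A;               // B now shares A's string data
  Replace(A);
  Writeln(A);           // prints: xxxxx
  Writeln(B);           // prints: aaaaa
end.
```

The loop is bounded by Length(Str), so the pointer never writes past the end of the string data.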

Yes, it's safe, as long as you don't go beyond the bounds of the string. The string has metadata attached that tells how long it is, and if you write beyond the length of the string, you won't leak memory, but you could corrupt it.

If Str is passed by reference, why would you need another pointer to the string? Apart from that, there should be no memory leak: PStr is initialized with the address of the first element of the string and then incremented, so it will always point to one of the characters in your string.
The compiler does not make a copy of Str internally. One of the uses for pointers is to avoid making copies. When you say
PStr := @Str[1]
PStr will now store the address of Str[1], that is, the address of the first char in the string.

I am sure this will work for AnsiString and PAnsiChar, but will it still work for unicode strings in Delphi 2009 and above? I think it should, because both, a char of a string (str[i]) and the char pointed to by PChar, should be 2 bytes in size.
Could somebody with more experience with unicode strings please confirm this?

As of D2010, it looks like codegen employs copy-on-write for such a construct:
Unit9.pas.34: S := 'abcd';
004B32EF 8D45F4 lea eax,[ebp-$0c]
004B32F2 BA98334B00 mov edx,$004b3398
004B32F7 E89C35F5FF call @UStrLAsg
Unit9.pas.35: P := @S[1];
004B32FC 8D45F4 lea eax,[ebp-$0c]
004B32FF E8343FF5FF call @UniqueStringU ; <== here you are
004B3304 8945F0 mov [ebp-$10],eax
Unit9.pas.36: Exit;
004B3307 EB61 jmp $004b336a
By the way, generic referencing such as P := @S does not emit UniqueString.
In conclusion, I do not recommend counting on codegen internals; use the recommended PChar(S) construct instead (it emits one xStrToPxChar call as overhead).
