Delphi reserved words and identifiers

Declaring variables in Delphi made me notice something I can't explain.
When you declare a string, string is a reserved word. For other data types, say integers, the type name is not a reserved word but an ordinary identifier (Integer, as the capital I suggests).
In fact, Delphi lets you jump to the definition of Integer, which turns out to be in the System unit. That declaration is only a placeholder, though: a comment there states that some constants (like True), identifiers (like Integer), functions and procedures are built directly into the compiler.
I can't figure out the reasons behind this choice.
Could someone help?
A little explanation of the difference between the string and Integer types. The following code
type
  Integer = Char;
var
  I: Integer;
begin
  I := 'A';
  ShowMessage(I);
end;
is correct and works as expected, while the following declaration
type
  string = Integer;
gives a compile-time error.

As far as I know, string has been a reserved word since the Turbo Pascal days, so the reason to keep it that way must be compatibility.
Pascal -> Turbo Pascal -> Object Pascal -> Delphi.
Check these resources.
The Pascal Programming Language (this shows the original reserved word of Pascal, without string)
Turbo_Pascal Version 6.0 Programmers Guide (this shows how the string is a reserved word)

string must be a reserved word, because it is not exclusively used to refer to the type System.[Ansi|Unicode]String. If string were a simple alias for some internal compiler type, then string[20] would no longer work. This is not a problem for Integer, because Integer always means nothing more than "the type System.Integer".
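To illustrate the point about string[20], here is a minimal sketch (the program name, the type name TName and the variable names are made up for the example). It shows the same reserved word used with a length specifier, which yields a short string, and on its own, which yields the long string type:

```pascal
program StringKeywordDemo;
{$APPTYPE CONSOLE}

type
  // "string" followed by a length specifier declares a short string type.
  // An ordinary type identifier such as Integer has no comparable syntax.
  TName = string[20];

var
  Short: TName;
  Long: string;  // long string (AnsiString or UnicodeString, depending on the compiler version)

begin
  Short := 'Delphi';
  Long := 'Delphi';
  Writeln(SizeOf(Short));  // 21: one length byte plus room for 20 characters
  Writeln(Length(Short));  // 6
  Writeln(Length(Long));   // 6
end.
```

Because the compiler has to parse string[20] as a type constructor and not as an array index applied to an identifier, treating string as a keyword is the simplest way to keep that syntax working.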

Related

Operand mismatch converting from D6 to RS10

I took a break from porting code, and now I'm spending some more time on it again.
The problem is, I guess I'm still stuck in the old ways of thinking (everything works fine on D6 :D).
Can anyone tell me why this simple code is not working?
if NewSig <> NewCompressionSignature then
E2015 Operator not applicable to this operand type
Here are the definitions of the above:
NewCompressionSignature: TCompressionSignature = 'DRM$IG01';
NewSig: array[0..SizeOf(NewCompressionSignature)-1] of Char;
I'm just guessing here because the type of TCompressionSignature is not given, but I can reproduce error E2015 if TCompressionSignature is declared as some kind of ShortString, like
type
  TCompressionSignature = String[8];
As you might know, Delphi now uses Unicode as its standard internal string encoding. For backward compatibility, the type ShortString and other short string types (like String[8]) were left unchanged. These strings have the same encoding as AnsiString and are composed of plain old 1-byte characters (AnsiChar).
NewSig, on the other hand, is composed of two-byte Unicode characters and cannot be compared directly with a ShortString.
One solution to your problem would be to declare:
NewSig: array[0..SizeOf(NewCompressionSignature)-1] of AnsiChar;
Another solution would be a cast to string:
if NewSig <> String(NewCompressionSignature) then ...
But I would prefer to change the array declaration if possible.
Please review the documentation on short strings and on Unicode, especially if you're doing I/O operations, to ensure your input and output are read and written with the correct code page.

Why AnsiSameText is not ANSI?

One would believe, looking at the name, that AnsiSameText defined in SysUtils (Delphi XE) will receive ANSI strings as parameters but the function is defined like this:
function AnsiSameText(const S1, S2: string): Boolean
What am I missing here?
There is an ANSI function in AnsiStrings unit, but still why is this one (in Sysutils) called 'ansi'?
In older versions of Delphi, pre-Unicode, there were two sets of string comparison functions:
SameText, CompareText, etc. These performed comparisons that ignore locale.
AnsiSameText, AnsiCompareText, etc. These performed comparisons that took locale into account.
When Unicode was introduced, these functions, which operate on string, started operating on UTF-16 data. For the sake of backwards compatibility, they retain the same names and behave in the same way. That is, SameText does not account for locale, but AnsiSameText does.
So, whilst the names are misleading, the Ansi prefix simply indicates that the function is locale aware. For what it is worth, in my view the Ansi prefix is poor even in pre-Unicode Delphi.
The reason that locale is important is that different locales have different rules for letter ordering.

The difference between one and many "type" blocks in Delphi

I've been programming Delphi for five or six years now and I consider myself fairly good at it, but I stumbled across a behavior recently which I couldn't really explain. I was writing a simple linked list, let's call it a TIntegerList. The example code below compiles correctly:
type
  PIntegerValue = ^TIntegerValue;
  TIntegerValue = record
    Value: Integer;
    Next: PIntegerValue;
    Prev: PIntegerValue;
  end;
However, the code below does not (saying TIntegerValue is undeclared):
type
  PIntegerValue = ^TIntegerValue;
type
  TIntegerValue = record
    Value: Integer;
    Next: PIntegerValue;
    Prev: PIntegerValue;
  end;
How exactly is the "type" keyword handled in Delphi? What is the syntactical meaning of having several types declared under one "type" keyword, compared to having one "type" per type? Alright, that was confusing, but I hope the code example helps explain what I mean. I am working in Delphi 2007.
Logically there's no need to use the type keyword when the code is already part of an existing type declaration section. So,
type
TRec1 = record
end;
TRec2 = record
end;
produces types that are indistinguishable from
type
TRec1 = record
end;
type
TRec2 = record
end;
However, as you have discovered, the compiler has a limitation that requires all forward declarations to be fully resolved before the end of the section where the forward declaration was introduced.
There's no particular reason that it has to be that way. It would be perfectly possible for the compiler to relax that limitation. One can only assume that a compiler implementation detail, probably originating a very long time ago, has leaked into the language specification.
This is pure standard Pascal. Since Pascal compilers are usually one-pass and there is no forward declaration for types, this feature was defined in the original Pascal by N. Wirth to allow such 'recursive' types, e.g. for linked lists.
In "Algorithms + Data Structures = Programs" Wirth gives an example in "Program 4.1 Straight List Insertion"
type
  ref = ^word;
  word = record
    key: integer;
    count: integer;
    next: ref
  end;

Delphi String / Array of Strings

I have an old program which was written in Delphi 1 (or 2, I'm not sure) and I want to build a 64-bit version of it (I use Delphi XE2). The problem is that the source code contains plain strings on the one hand and arrays of strings on the other (I guess to limit the string length).
Now there are a lot of errors while compiling because of incompatible types.
Above all there are procedures which should handle both types.
Is there an easy way to solve this problem (without changing every variable)?
Short answer
Search and replace ": string" => ": AnsiString".
Make sure you use Length(AString) and SetLength(AString, ...) instead of manipulating AString[0].
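As a rough illustration of that second point, the sketch below contrasts the Delphi 1 idiom of writing the length byte with the SetLength call that works for long strings. The program and variable names are invented for the example, and it assumes a Unicode-era compiler where AnsiChar and AnsiString exist:

```pascal
program LengthDemo;
{$APPTYPE CONSOLE}

var
  Old: ShortString;
  S: AnsiString;

begin
  // Old idiom: a short string keeps its length in the byte at index 0,
  // so writing that byte truncates the string.
  Old := 'abcdef';
  Old[0] := AnsiChar(3);
  Writeln(Old);        // abc

  // Long strings have no element 0; use SetLength and Length instead.
  S := 'abcdef';
  SetLength(S, 3);
  Writeln(S);          // abc
  Writeln(Length(S));  // 3
end.
```

Length and SetLength also work on short strings, so code written this way keeps compiling whichever string type a variable ends up with.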
Long answer
Delphi 1 has only one type of string: the old-school ShortString, which holds at most 255 characters and can be given a smaller declared maximum length.
It looks and feels like an array of char, but it has a leading length byte.
var
ShortString: string[100];
Delphi 2 introduced long strings (aka AnsiString), which replaced the ShortString as the default string type. They do not have a fixed length; they are allocated dynamically and automatically grow and shrink as needed.
They are also created and destroyed automatically.
var
Longstring: string; //AnsiString, can have any length up to 2GB.
In Delphi 2009 Unicode was introduced.
This changed the long string, because each char no longer takes up 1 byte but 2 bytes(*).
Additionally, you can specify a character set for an AnsiString, whereas the new Unicode long string uses UTF-16.
What you need to do depends on your needs:
If you just want the old code to work as before and you don't care about supporting all the multilingual stuff Unicode supports, you will need to replace all your string keywords with AnsiString (for all strings that are longstrings).
If you have Delphi 1 code, you can rename the string to ShortString.
I would recommend that you refactor the code to always use longstrings (read: AnsiString) though.
Delphi will automatically convert the UnicodeStrings returned by library functions into AnsiStrings and vice versa. However, this can lose data if your users enter symbols in an edit box that your AnsiString cannot store.
Also all that translation takes a bit of time (I doubt you will notice this though).
In Delphi 1 up to Delphi 2007 this problem did not exist, because controls did not allow Unicode characters to be entered.
(*) gross oversimplification

Delphi Unicode String Type Stored Directly at its Address (or "Unicode ShortString")

I want a string type that is Unicode and that stores the string directly at the address of the variable, as is the case with the (Ansi-only) ShortString type.
I mean, if I declare S: ShortString and let S := 'My String', then, at @S, I will find the length of the string (as one byte, so the string cannot contain more than 255 characters) followed by the ANSI-encoded string itself.
What I would like is a Unicode variant of this. That is, I want a string type such that, at @S, I will find an unsigned 32-bit integer (or a single byte would be enough, actually) containing the length of the string in bytes (or in characters, which is half the number of bytes) followed by the Unicode representation of the string. I have tried WideString, UnicodeString, and RawByteString, but they all appear to store only an address at @S, and the actual string somewhere else (I guess this has to do with reference counting and such). Update: The most important reason for this is probably that it would be very problematic if sizeof(string) were variable.
I suspect that there is no built-in type to use, and that I have to come up with my own way of storing text the way I want (which actually is fun). Am I right?
Update
I will, among other things, need to use these strings in packed records. I also need to read/write these strings manually to files/the heap. I could live with fixed-size strings, such as <= 128 characters, and I could redesign the problem so it will work with null-terminated strings. But PChar will not work, for sizeof(PChar) is just the size of a pointer (4 bytes on a 32-bit target) - it's merely an address.
The approach I eventually settled for was to use a static array of bytes. I will post my implementation as a solution later today.
You're right. There is no exact analogue to ShortString that holds Unicode characters. There are lots of things that come close, including WideString, UnicodeString, and arrays of WideChar, but if you're not willing to revisit the way you intend to use the data type (make byte-for-byte copies in memory and in files while still being able to use them in all the contexts where a string is allowed), then none of Delphi's built-in types will work for you.
WideString fails because you insist that the string's length must exist at the address of the string variable, but WideString is a reference type; the only thing at its address is another address. Its length happens to be at the address held by the variable, minus four. That's subject to change, though, because all operations on that type are supposed to go through the API.
UnicodeString fails for that same reason, as well as because it's a reference-counted type; making a byte-for-byte copy of one breaks the reference counting, so you'll get memory leaks, invalid-pointer-operation exceptions, or more subtle heap corruption.
An array of WideChar can be copied without problems, but it doesn't keep track of its effective length, and it also doesn't act like a string very often. You can assign string literals to it and it will act like you called StrLCopy, but you can't assign string variables to it.
You could define a record that has a field for the length and another field for a character array. That would resolve the length issue, but it would still have all the rest of the shortcomings of an undecorated array.
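A minimal sketch of such a record might look like this. Everything here is hypothetical: the names TInlineWideString, StoreString and InlineToString, the 128-character capacity and the choice to truncate silently are assumptions made for the example, not something from the question.

```pascal
program InlineWideStringDemo;
{$APPTYPE CONSOLE}

const
  MaxChars = 128;  // hypothetical capacity; must fit in the Byte length field

type
  // Length and characters are stored inline, so SizeOf is fixed and the
  // record can be copied byte for byte without any reference counting.
  TInlineWideString = packed record
    Len: Byte;
    Chars: array[0..MaxChars - 1] of WideChar;
  end;

procedure StoreString(var Dest: TInlineWideString; const S: UnicodeString);
var
  N: Integer;
begin
  N := Length(S);
  if N > MaxChars then
    N := MaxChars;  // truncates silently; real code might raise an exception instead
  Dest.Len := N;
  if N > 0 then
    Move(Pointer(S)^, Dest.Chars[0], N * SizeOf(WideChar));
end;

function InlineToString(const Src: TInlineWideString): UnicodeString;
begin
  // Copy exactly Len characters out of the inline buffer.
  SetString(Result, PWideChar(@Src.Chars[0]), Src.Len);
end;

var
  R, Dup: TInlineWideString;

begin
  StoreString(R, 'My String');
  Dup := R;                      // plain record copy
  Writeln(SizeOf(R));            // 257: 1 length byte + 128 * 2 bytes
  Writeln(InlineToString(Dup));  // My String
end.
```

Note that every use as a string still has to go through the two helper routines, which is exactly the shortcoming described above.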
If I were you, I'd simply use a built-in string type. Then I'd write functions to help transfer it between files, blocks of memory, and native variables. It's not that hard; probably much easier than trying to get operator overloading to work just right with a custom record type. Consider how much code you will write to load and store your data versus how much code you're going to write that uses your data structure like an ordinary string. You're going to write the data-persistence code once, but for the rest of the project's lifetime, you're going to be using those strings, and you're going to want them to look and act just like real strings. So use real strings. "Suffer" the inconvenience of manually producing the on-disk format you want, and gain the advantage of being able to use all the existing string library functions.
PChar should work like this, right? AFAIK, it's an array of chars stored right where you put it. Zero terminated, not sure how that works with Unicode Chars.
You actually have this in some way with the new unicode strings.
s as a pointer points to s[1] and the 4 bytes on the left contains the length.
But why not simply use Length(s)?
And for direct reading of the length from memory:
procedure TForm9.Button1Click(Sender: TObject);
var
  s: string;
begin
  s := 'hlkk ljhk jhto';
  {$POINTERMATH ON}
  Assert(Length(s) = (PInteger(s)-1)^);
  //if you don't want POINTERMATH, replace by PInteger(Cardinal(s)-SizeOf(Integer))^
  showmessage(IntToStr(length(s)));
end;
There's no Unicode version of ShortString. If you want to store unicode data inline inside an object instead of as a reference type, you can allocate a buffer:
var
  buffer: array[0..255] of WideChar;
This has two disadvantages. 1, the size is fixed, and 2, the compiler doesn't recognize it as a string type.
The main problem here is #1: The fixed size. If you're going to declare an array inside of a larger object or record, the compiler needs to know how large it is in order to calculate the size of the object or record itself. For ShortString this wasn't a big problem, since they could only go up to 256 bytes (1/4 of a K) total, which isn't all that much. But if you want to use long strings that are addressed by a 32-bit integer, that makes the max size 4 GB. You can't put that inside of an object!
This, not the reference counting, is why long strings are implemented as reference types, whose inline size is always a constant sizeof(pointer). Then the compiler can put the string data inside a dynamic array and resize it to fit the current needs.
Why do you need to put something like this into a packed array? If I were to guess, I'd say this probably has something to do with serialization. If so, you're better off using a TStream and a normal Unicode string, and writing an integer (size) to the stream, and then the contents of the string. That turns out to be a lot more flexible than trying to stuff everything into a packed array.
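The stream approach could be sketched roughly as follows. The routine names WriteStringToStream and ReadStringFromStream are made up for this example, the length prefix is assumed to be a 32-bit Integer counting characters, and the uses clause assumes the default unit scope names (write System.Classes if your project needs the full name):

```pascal
program StreamStringDemo;
{$APPTYPE CONSOLE}

uses
  Classes;

// Writes the character count first, then the raw character data.
procedure WriteStringToStream(Stream: TStream; const S: string);
var
  Len: Integer;
begin
  Len := Length(S);
  Stream.WriteBuffer(Len, SizeOf(Len));
  if Len > 0 then
    Stream.WriteBuffer(Pointer(S)^, Len * SizeOf(Char));
end;

// Reads the character count, sizes the string, then reads the data into it.
function ReadStringFromStream(Stream: TStream): string;
var
  Len: Integer;
begin
  Stream.ReadBuffer(Len, SizeOf(Len));
  SetLength(Result, Len);
  if Len > 0 then
    Stream.ReadBuffer(Pointer(Result)^, Len * SizeOf(Char));
end;

var
  Stream: TMemoryStream;
  S: string;

begin
  Stream := TMemoryStream.Create;
  try
    WriteStringToStream(Stream, 'This is a sample string.');
    Stream.Position := 0;  // rewind before reading back
    S := ReadStringFromStream(Stream);
    Writeln(S);            // This is a sample string.
  finally
    Stream.Free;
  end;
end.
```

The same two routines work unchanged with a TFileStream, and the string can be any length, which is the flexibility a fixed-size inline buffer cannot offer.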
The solution I eventually settled for is this (real-world sample - the string is, of course, the third member called "Ident"):
TASStructMemHeader = packed record
  TotalSize: cardinal;
  MemType: TASStructMemType;
  Ident: packed array[0..63] of WideChar;
  DataSize: cardinal;
  procedure SetIdent(const AIdent: string);
  function ReadIdent: string;
end;
where
function TASStructMemHeader.ReadIdent: string;
begin
  result := WideCharLenToString(PWideChar(@(Ident[0])), length(Ident));
end;
procedure TASStructMemHeader.SetIdent(const AIdent: string);
var
  i: Integer;
begin
  if length(AIdent) > 63 then
    raise Exception.Create('Too long structure identifier.');
  FillChar(Ident[0], length(Ident) * sizeof(WideChar), 0);
  Move(AIdent[1], Ident[0], length(AIdent) * sizeof(WideChar));
end;
But then I realized that the compiler really can interpret array[0..63] of WideChar as a string, so I could simply write
var
MyStr: string;
Ident := 'This is a sample string.';
MyStr := Ident;
Hence, after all, the answer given by Mason Wheeler above is actually the answer.
