Delphi String / Array of Strings - delphi

I have an old programm which was programmed in Delphi 1 (or 2, I'm not sure) and I want to build a 64-bit version of it (I use the Delphi XE2). Now the problem is that in the source code there are on the one hand strings and on the other arrays of strings (I guess to limit the string length).
Now there are a lot of errors while compiling because of incompatible types.
Above all there are procedures which should handle both types.
Is there an easy way to solve this problem (without changing every variable)?

Short answer
Search and replace : string => : ansistring
make sure you use length(astring) and setLength(astring) instead of manipulating string[0].
Long answer
Delphi 1 has only one type of string.
The old-skool ShortString that has a maximum length of 255 chars and a declared maximum length.
It looks and feels like an array of char, but it has a leading length byte.
var
ShortString: string[100];
In Delphi 2 longstrings (aka AnsiString) were introduced, these replace the shortstring. They do not have a fixed length, but are allocated dynamically instead and automatically grow and shrink as needed.
They are automatically created and destroyed.
var
Longstring: string; //AnsiString, can have any length up to 2GB.
In Delphi 2009 Unicode was introduced.
This changes the longstring because now each char no langer takes up 1 byte, but takes 2 bytes(*).
Additionally you can specify a character set to an AnsiString, whereas the new Unicode longstring uses UTF-16.
What you need to do depends on your needs:
If you just want the old code to work as before and you don't care about supporting all the multilingual stuff Unicode supports, you will need to replace all your string keywords with AnsiString (for all strings that are longstrings).
If you have Delphi 1 code, you can rename the string to ShortString.
I would recommend that you refactor the code to always use longstrings (read: AnsiString) though.
Delphi will automatically translate the UnicodeStrings that all return values of functions (Unicode string) are translated into AnsiStrings and visa versa, however this may include loss of data if your users enter symbols in a editbox that your AnsiString cannot store.
Also all that translation takes a bit of time (I doubt you will notice this though).
In Delphi 1 up to Delphi 2007 this problem did not exist, because controls did not allow Unicode characters to be entered.
(*) gross oversimplification

Related

Delphi XE - should I use String or AnsiString?

I finally upgraded to Delphi XE. I have a library of units where I use strings to store plain ANSI characters (chars between A and U). I am 101% sure that I will never ever use UNICODE characters in those places.
I want to convert all other libraries to Unicode, but for this specific library I think it will be better to stick with ANSI. The advantage is the memory requirement as in some cases I load very large TXT files (containing ONLY Ansi characters). The disadvantage might be that I have to do lots and lots of typecasts when I make those libraries to interact with normal (unicode) libraries.
There are some general guidelines to show when is good to convert to Unicode and when to stick with Ansi?
The problem with general guidelines is that something like this can be very specific to a person's situation. Your example here is one of those.
However, for people Googling and arriving here, some general guidelines are:
Yes, convert to Unicode. Don't try to keep an old app fully using AnsiStrings. The reason is that the whole VCL is Unicode, and you shouldn't try to mix the two, because you will convert every time you assign a Unicode string to an ANSI string, and that is a lossy conversion. Trying to keep the old way because it's less work (or some similar reason) will cause you pain; just embrace the new string type, convert, and go with it.
Instead of randomly mixing the two, explicitly perform any conversions you need to, once - for example, if you're loading data from an old version of your program you know it will be ANSI, so read it into a Unicode string there, and that's it. Ever after, it will be Unicode.
You should not need to change the type of your string variables - string pre-D2009 is ANSI, and in D2009 and alter is Unicode. Instead, follow compiler warnings and watch which string methods you use - some still take an AnsiString parameter and I find it all confusing. The compiler will tell you.
If you use strings to hold bytes (in other words, using them as an array of bytes because a character was a byte) switch to TBytes.
You may encounter specific problems for things like encryption (strings are no longer byte/characters, so 'character' for 'character' you may get different output); reading text files (use the stream classes and TEncoding); and, frankly, miscellaneous stuff. Search here on SO, most things have been asked before.
Commenters, please add more suggestions... I mostly use C++Builder, not Delphi, and there are probably quite a few specific things for Delphi I don't know about.
Now for your specific question: should you convert this library?
If:
The values between A and U are truly only ever in this range, and
These values represent characters (A really is A, not byte value 65 - if so, use TBytes), and
You load large text files and memory is a problem
then not converting to Unicode, and instead switching your strings to AnsiStrings, makes sense.
Be aware that:
There is an overhead every time you convert from ANSI to Unicode
You could use UTF8String, which is a specific type of AnsiString that will not be lossy when converted, and will still store most text (Roman characters) in a single byte
Changing all the instances of string to AnsiString could be a bit of work, and you will need to check all the methods called with them to see if too many implicit conversions are being performed (for performance), etc
You may need to change the outer layer of your library to use Unicode so that conversion code or ANSI/Unicode compiler warnings are not visible to users of your library
If you convert to Unicode, sets of characters (can't remember the syntax, maybe if 'S' in MySet?) won't work. From your description of characters A to U, I could guess you would like to use this syntax.
My recommendation? Personally, the only reason I would do this from the information you've given is the memory use, and possibly performance depending on what you're doing with this huge amount of A..Us. If that truly is significant, it's both the driver and the constraint, and you should convert to ANSI.
You should be able to wrap up the conversion at the interface between this unit and its clients. Use AnsiString internally and string everywhere else and you should be fine.
In general only use AnsiString if it is important that the Chars are single bytes, Otherwise the use of string ensures future compatibility with Unicode.
You need to check all libraries anyway because all Windows API functions in Delhpi XE replaced by their unicode-analogues, etc. If you will never use UNICODE you need to use Delphi 7.
Use AnsiString explicitly everywhere in this unit and then you'll get compiler warning errors (which you should never ignore) for String to AnsiString conversion errors if you happen to access the routines incorrectly.
Alternately, perhaps preferably depending on your situation, simply convert everything to UTF8.
Stick with Ansi strings ONLY if you do not have the time to convert the code properly. The use of Ansi strings is really only for backward compatibility - to my knowledge C# does not have an equiavalent to Ansi strings. Otherwise use the standard Unicode strings. If you have a look on my web-site I have a whole strings routines unit (about 5,000 LOC) that works with both Delphi 2007 (non-Uniocde) and XE (Unicode) with only "string" interfaces and contains almost all of the conversion issues you might face.

What is the difference between WideChar and AnsiChar?

I'm upgrading some ancient (from 2003) Delphi code to Delphi Architect XE and I'm running into a few problems. I am getting a number of errors where there are incompatible types. These errors don't happen in Delphi 6 so I must assume that this is because things have been upgraded.
I honestly don't know what the difference between PAnsiChar and PWideChar is, but Delphi sure knows the difference and won't let me compile. If I knew what the differences were maybe I could figure out which to use or how to fix this.
The short: prior to Delphi 2009 the native string type in Delphi used to be ANSI CHAR: Each char in every string was represented as an 8 bit char. Starting with Delphi 2009 Delphi's strings became UNICODE, using the UTF-16 notation: Now the basic Char uses 16 bits of data (2 bytes), and you probably don't need to know much about the Unicode code points that are represented as two consecutive 16 bits chars.
The 8 bit chars are called "Ansi Chars". An PAnsiChar is a pointer to 8 bit chars.
The 16 bit chars are called "Wide Chars". An PWideChar is a pointer to 16 bit chars.
Delphi knows the difference and does well if it doesn't allow you to mix the two!
More info
Here's a popular link on Unicode: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets
You can find some more information on migrating Delphi to Unicode here: New White Paper: Delphi Unicode Migration for Mere Mortals
You may also search SO for "Delphi Unicode migration".
A couple years ago, the default character type in Delphi was changed from AnsiChar (single-byte variable representing an ANSI character) to WideChar (two-byte variable representing a UTF16 character.) The char type is now an alias to WideChar instead of AnsiChar, the string type is now an alias to UnicodeString (a UTF-16 Unicode version of Delphi's traditional string type) instead of AnsiString, and the PChar type is now an alias to PWideChar instead of PAnsiChar.
The compiler can take care of a lot of the conversions itself, but there are a few issues:
If you're using string-pointer types, such as PChar, you need to make sure your pointer is pointing to the right type of data, and the compiler can't always verify this.
If you're passing strings to var parameters, the variable type needs to be exactly the same. This can be more complicated now that you've got two string types to deal with.
If you're using string as a convenient byte-array buffer for holding arbitrary data instead of a variable that holds text, that won't work as a UnicodeString. Make sure those are declared as RawByteString as a workaround.
Anyplace you're dealing with string byte lengths, for example when reading or writing to/from a TStream, make sure your code isn't assuming that a char is one byte long.
Take a look at Delphi Unicode Migration for Mere Mortals for some more tricks and advice on how to get this to work. It's not as hard as it sounds, but it's not trivial either. Good luck!

FormatDateTime with chinese location - wrong characters... Delphi 2007

Output: Period: from 11-Ê®¶þÔÂ-10 to 13-Ê®¶þÔÂ-10
The above output is from a line like this:
FormatDateTime('dd-mmm-yy', dateValue)
The IDE is Delphi 2007 and we are trying to gear up our app to the Chinese market.
How can I display the correct characters?
With the setting turn to Hindi (India), instead of the funny characters I have the "?".
I'm trying to display the date on a report, using ReportBuilder 11.
Any help will be much appreciated.
The characters seem to be correct, only IMO they have been rendered wrong.
Here's what I've done:
copied the string as presented by the OP ("11-Ê®¶þÔÂ-10 to 13-Ê®¶þÔÂ-10");
pasted it into a blank plain-text editor window with CP 1252 (Windows Latin-1) and saved;
opened the text file in a browser;
the text showed up the same as the browser chose the same codepage, so I turned on the automatic detection of character encoding, hinting it that the contents was Chinese;
the text changed to "11-十二月-10 to 13-十二月-10" (hope your browser displays correct Chinese characters here, my does anyway) and the codepage changed to GB18030 (and I then tried GB2312, but the text wouldn't change);
well, I was curious and searched for "十二月", and it turned out to stand for "December", quite suitable for the context unless the month names had been mixed up.
So, this is why I think it's a text rendering (or whatever you call it, I'm not really sure about the term) problem.
EDIT: Of course, it must have had something to do with the data type chosen for storing the string. If the function result is AnsiString and the variable is WideString, then maybe the characters get converted as WideChars and so they are no longer one-byte compounds of multi-byte characters but are multi-byte characters on their own? At least that's what happened when the OP posted them here.
I don't know actually, but if it is so then I doubt if they can be rendered correctly unless converted back and rendered as part of an AnsiString.
Another solution is to use TntControls. They're a set of standard Delphi controls enhanced to support Unicode. You'll have to go through all your form files and replace
Button1: TButton
Label1: TLabel
with TTntButton, TTntLabel et cetera.
Please note, that as things stand, it's not only Chinese which will not work. Try any language using symbols other than standard European set (latin + stress marks etc), for instance Russian.
But
By replacing the controls, you'll solve one part of the problem. Another part is that everywhere where you use "string" or "AnsiString" and "char/pchar" or "AnsiChar/PAnsiChar", you can store only strings in default system encoding.
For instance, if your system encoding ("Language for non-unicode programs") is EN/US, Russian characters will be replaced with question marks when you assign them to "string" variable:
a: WideString;
b: string;
...
a := 'ЯУЭФЫЦ'; //WideString can store international characters
b := a; //string cannot, so the data is lost - you cannot restore it from just "b"
To store string data which is independent of system encoding, use WideString/WideChar/PWideChar and appropriate functions. If you have
a, b: WideString;
...
a := UpperCase(b);
then unicode information will still be lost because UpperCase() accepts "string":
function UpperCase(const S: string): string;
Your WideString will be converted to "string" (losing all international characters), given to UpperCase, then the result will be converted back to WideString but it's already too late.
Therefore you have to replace all string functions with Wide versions:
a := WideUpperCase(b);
(for some functions, their wide versions are unavailable or called differently, TntControls also contain a bunch of wide function versions)
The Chinese Market requires support for multi-byte character sets (either WideChar or Unicode).
The Delphi 2007 RTL/VCL only supports single-byte character sets (there is very limited support for WideChar in the RTL and VCL).
The easiest for you is to upgrade to a Delphi version that supports Unicode (Delphi 2009 was the first version that supports Unicode, the current Delphi vesion is Delphi XE).
Or you will need to update all your components to support WideChar, and rewrite the portions of RTL/VCL for which you need WideChar support.
--jeroen
Did you install Far East charset support in Windows? In Windows pre 7 (or Vista) those charset are not installed by default in Western versions, you have to add them in Control Panel -> Regional Settins, IIRC
Using a non-Unicode version of Delphi unluckily what character can be displayed depends on the current codepage. If it is not one of the Chinese ones, for example, it could not display the characters you need. What characters are actually displayed depends on how the codes you're using are mapped in the current codepage. You could use a multi-lingual version of Windows to switch fully to the locale you need, or you have to use a Unicode version of Delphi (from 2009 onwards).

Why does Delphi warn when assigning ShortString to string?

I'm converting some legacy code to Delphi 2010.
There are a fair number of old ShortStrings, like string[25]
Why does the assignment below:
type
S: String;
ShortS: String[25];
...
S := ShortS;
cause the compiler to generate this warning:
W1057 Implicit string cast from 'ShortString' to 'string'.
There's no data loss that is occurring here. In what circumstances would this warning be helpful information to me?
Thanks!
Tomw
It's because your code is implicitly converting a single-byte character string to a UnicodeString. It's warning you in case you might have overlooked it, since that can cause problems if you do it by mistake.
To make it go away, use an explicit conversion:
S := string(ShortS);
The ShortString type has not changed. It continues to be, in effect, an array of AnsiChar.
By assigning it to a string type, you are taking what is a group of AnsiChars (one byte) and putting it into a group of WideChars (two bytes). The compiler can do that just fine, and is smart enough not to lose data, but the warning is there to let you know that such a conversion has taken place.
The warning is very important because you may lose data. The conversion is done using the current Windows 8-bit character set, and some character sets do not define all values between 0 and 255, or are multi-byte character sets, and thus cannot convert all byte values.
The data loss can occur on a standard computer in a country with specific standard character sets, or on a computer in USA that has been set up for a different locale, because the user communicates a lot with people in other languages.
For instance, if the local code page is 932, the byte values 129 and 130 will both convert to the same value in the Unicode string.
In addition to this, the conversion involves a Windows API call, which is an expensive operation. If you do a lot of these, it can slow down your application.
It's safe ( as long as you're using the ShortString for its intended purpose: to hold a string of characters and not a collection of bytes, some of which may be 0 ), but may have performance implications if you do it a lot. As far as I know, Delphi has to allocate memory for the new unicode string, extract the characters from the ShortString into a null-terminated string (that's why it's important that it's a properly-formed string) and then call something like the Windows API MultiByteToWideChar() function. Not rocket science, but not a trivial operation either.
ShortStrings don't have a code page associated with them, AnsiStrings do (since D2009).
The conversion from ShortString to UnicodeString can only be done on the assumption that ShortStrings are encoded in the default ANSI encoding which is not a safe assumption.
I don't really know Delphi, but if I remember correctly, the Shortstrings are essentially a sequence of characters on the stack, whereas a regular string (AnsiString) is actually a reference to a location on the heap. This may have different implications.
Here's a good article on the different string types:
http://www.codexterity.com/delphistrings.htm
I think there might also be a difference in terms of encoding but I'm not 100% sure.

Delphi 2009 + Unicode + Char-size

I just got Delphi 2009 and have previously read some articles about modifications that might be necessary because of the switch to Unicode strings.
Mostly, it is mentioned that sizeof(char) is not guaranteed to be 1 anymore.
But why would this be interesting regarding string manipulation?
For example, if I use an AnsiString:='Test' and do the same with a String (which is unicode now), then I get Length() = 4 which is correct for both cases.
Without having tested it, I'm sure all other string manipulation functions behave the same way and decide internally if the argument is a unicode string or anything else.
Why would the actual size of a char be of interest for me if I do string manipulations?
(Of course if I use strings as strings and not to store any other data)
Thanks for any help!
Holger
With Unicode SizeOf(SomeChar) <> Length(SomeChar). Essentially the length of a string is less then the sum of the size of its chars. As long as you don't assume SizeOf(Char) = 1, or SizeOf(SomeString[x]) = 1 (since both are FALSE now) or try to interchange bytes with chars, then you shouldn't have any trouble. Any place you are doing something creative stuffing Bytes into Chars or Strings, then you will need to use AnsiString.
(SizeOf(SomeString) is still 4 no matter the length since it is essentially a pointer with some compiler magic.)
People often implicitly convert from characters to bytes in old Delphi code without really thinking about it. For example, when writing to a stream. When you write a string to a stream, you have to specify the number of bytes you write, but people often pass the character count instead. See this post from Chris Bensen for another example.
Another way people often make this implicit conversion and older code is by using a "string" to store binary data. In this case, they actually want bytes, but the data type expects characters. D2009 has a better type for this.
I didn't try Delphi 2009, but are using fpc which is also switching to unicode slowly. I'm 95% sure that everything below also holds for Delphi 2009
In fpc (when supporting unicode) it will be so that functions like 'length' take the codepage into consideration. Thus it will return the length of the string as a 'human' would see it. If there are - for example - two chinese characters, that both take two bytes of memory in unicode, length will return 2, since there are two characters in the string. But the string will take 4 bytes of memory. (+the memory for the reference count and the leading #0, but that aside)
What you can not do anymore is this:
var p : pchar;
begin
p := s[1];
for i := 0 to length(string)-1 do
begin
write(p);
inc(p);
end;
end;
Because this code will - in the two chinese-character example - write the wrong two characters. Namely the two bytes which are part of the first 'real' character.
In short: Length() doesn't return the amount of bytes allocated for the string anymore, but the amount of characters. (Before the switch to unicode, those two values were equal to eachother)
The actual size of a character shouldn't matter, unless you are doing the manipulation at the byte level.
(Of course if I use strings as strings and not to store any other data)
That's the key point, YOU don't use strings for other purposes, but some people do. They use strings just like arrays, so they (and that's including me) would need to check all such uses to make sure nothing is broken...
Lets not forget that there are times when this conversion is not really desired. Say for storing a GUID in a record for instance. The guid can only contain hexadecimal characters plus the - and brackets...making them take up twice the space can make quite an impact on existing code. Sure the simple solution is to change them to AnsiString, and deal with the compiler warnings if you do any string manipulation on them.
It can be an issue if you make Windows API calls. Or if you have legacy code that does inc or dec of str[0] to change its length.

Resources