Is there some advantage in using resourcestring instead of a const string? - delphi

Would you tell me if there is some advantage (less storage space, increased speed, etc.) in using:
resourcestring
  MsgErrInvalidInputRange = 'Invalid Message Here!';
instead of
const
  MsgErrInvalidInputRange : String = 'Invalid Message Here!';

The const option will be faster than resourcestring, because the latter calls the Windows API to get the resource text.
You can make it faster by using some caching mechanism. This is what we do in our Enhanced Delphi RTL.
If you need to access a resourcestring's content many times, it is also a good idea to load it into a string variable first.
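The caching advice above could look roughly like this. It is only a minimal sketch: `RepeatMessage` is a hypothetical helper invented for the illustration, and the actual cost of a resource lookup depends on the Delphi version and RTL.

```pascal
program CacheResString;
{$APPTYPE CONSOLE}

resourcestring
  MsgErrInvalidInputRange = 'Invalid Message Here!';

// Hypothetical helper: copies the resource text once and reuses the copy.
function RepeatMessage(Count: Integer): string;
var
  Cached: string;
  I: Integer;
begin
  Cached := MsgErrInvalidInputRange; // single resource lookup
  Result := '';
  for I := 1 to Count do
    Result := Result + Cached + sLineBreak; // reuses the local copy
end;

begin
  Write(RepeatMessage(3));
end.
```

Every read of the resourcestring identifier goes through the RTL, so hoisting it out of a hot loop is the cheap way to get const-like speed.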
The main point of resourcestring is to allow i18n (internationalization) of your program.
You've got the Translation Manager with some editions of the Delphi IDE, but it relies on an external DLL.
You can use the gettext system, coming from the Linux world, from http://dxgettext.po.dk which relies on external .po files.
We included our own i18n mechanism in our framework, which translates and caches the resourcestring text, and relies on external .txt files (you can use UTF-8 or Unicode text files, from Delphi 6 up to XE). The caching makes it nearly as fast as using a const. See http://synopse.info/fossil/finfo?name=SQLite3/SQLite3i18n.pas
There are other open source or commercial solutions around.
About storage size: resourcestrings are stored as UCS-2 (UTF-16) buffers, so before Delphi 2009 a resourcestring uses more memory than a string constant. Since Delphi 2009, every string is a UnicodeString, i.e. UCS-2 as well, so the difference in storage size is small. In any case, text is rarely the biggest contributor to the size of an application (bitmaps and code have a much bigger effect on the final exe).

Resource strings are stored as STRINGTABLE entries in your exe resource, consts are stored as part of the fixed data segment. Since they're part of the resource section you can extract them and the DFMs, translate them, and store them in a resource module (data-only DLL). When a Delphi app starts, it looks for that DLL and will use the strings from it instead of the ones included in your EXE to load translations.
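To make the lookup described above concrete, here is a small sketch. It assumes the usual RTL behaviour that the address of a resourcestring identifier is a `PResStringRec` accepted by `System.LoadResString`; whether the text comes from the EXE or from a translated resource DLL is decided by the RTL at run time.

```pascal
program ResStringLookup;
{$APPTYPE CONSOLE}

resourcestring
  MsgErrInvalidInputRange = 'Invalid Message Here!';

begin
  // Reading the identifier fetches the text from the string table.
  Writeln(MsgErrInvalidInputRange);
  // The same lookup spelled out explicitly through the RTL helper.
  Writeln(LoadResString(@MsgErrInvalidInputRange));
end.
```

Both lines print the same text. With a translated resource module in place, both would print the translated text without recompiling the program.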
The Embarcadero docwiki covers using the Translation Manager, but a lot of other Delphi translation tools use resource strings too.

As others have mentioned, resourcestring strings will be included in a separate resource within your exe, and as such have advantages when you need to cater for multiple languages in the UI of your app.
As some have mentioned as well, const strings are included in the data section of your app.
Up to D2007
In Delphi versions up to D2007, const strings were stored as Ansi strings, requiring a single byte per character, whereas resource strings would be stored in UTF-16: the Windows default encoding (though perhaps not for Win9x). IIRC D2007 and prior versions didn't support UTF-8 encoded unit files. So any strings coded in your sources would have to be supported by the ANSI code pages, and as such probably didn't go beyond the Unicode Basic Multilingual Plane. Which means that only the UCS-2 part of UTF-16 would be used and all strings could be stored in two bytes per character.
In short: up to D2007 const strings take a single byte per character, resource strings take two bytes per character.
D2009 and up
Delphi became Unicode-enabled in version D2009. Since then things are a little different. Resourcestring strings are still stored as UTF-16. No other option here as they are "managed" by Windows.
Const strings, however, are a completely different story. Since D2009 Delphi can store multiple versions of each const string in your exe, each version in a different encoding. Consts can be stored as Ansi strings, UTF-8 strings and UTF-16 strings.
Which of these encodings is stored depends on the use of the const. By default UTF-16 will be used, as that is the default internal Delphi encoding. Assign the same const to a "normal" (UTF-16) string, as well as to an AnsiString variable, and the const will be stored in the exe both UTF-16 and Ansi encoded...
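A minimal sketch of the situation described above, assuming D2009 or later: the same untyped const is assigned to a UnicodeString and to an AnsiString, which is the usage that would cause the compiler to keep two encoded copies. The program only prints the in-memory sizes; it does not inspect the exe.

```pascal
program ConstEncodings;
{$APPTYPE CONSOLE}

const
  MsgErrInvalidInputRange = 'Invalid Message Here!';

var
  U: string;      // UnicodeString in D2009 and later
  A: AnsiString;

begin
  U := MsgErrInvalidInputRange; // uses the UTF-16 copy of the literal
  A := MsgErrInvalidInputRange; // uses the Ansi copy of the literal
  Writeln('UTF-16 bytes: ', Length(U) * SizeOf(Char));
  Writeln('Ansi bytes:   ', Length(A) * SizeOf(AnsiChar));
end.
```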
De-duping
By the looks of it (experimenting with D5 and D2009), Delphi "de-dupes" const strings, whereas it doesn't do this for resourcestring strings.

With resourcestring, the compiler places those strings as a stringtable resource in the executable, allowing anyone (say your translation team) to edit them with a resource editor without needing to recompile the application, or have access to the source code.

There's also a third option:
const
  MsgErrInvalidInputRange = 'Invalid Message Here!';
The latter should be the most performant one, because it tells the compiler not to allocate space in the data segment; it could put the string in the code segment instead. Also remember that what can be done with typed constants depends on the $WRITEABLECONST directive, although I do not know exactly what the compiler does when it is on or off.
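As an illustration of the $WRITEABLECONST remark, here is a small sketch. `NextCallNumber` is a made-up routine; with the directive on ({$J+}), a typed constant behaves like a static variable, while with it off the `Inc` below would be rejected at compile time.

```pascal
program TypedConstDemo;
{$APPTYPE CONSOLE}

const
  TrueConst = 'Invalid Message Here!'; // true constant: no type annotation

{$WRITEABLECONST ON} // long form of {$J+}
function NextCallNumber: Integer;
const
  Calls: Integer = 0; // typed constant: keeps its value between calls
begin
  Inc(Calls);         // only compiles while $WRITEABLECONST is ON
  Result := Calls;
end;
{$WRITEABLECONST OFF}

begin
  Writeln(TrueConst);
  Writeln(NextCallNumber);
  Writeln(NextCallNumber);
end.
```

The two calls print 1 and then 2, showing that the typed constant lives in writable data rather than being folded into the code the way a true constant can be.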

Related

Handling of UTF-16 string data type in Delphi XE (best practice)

There is so much information about Unicode, but it is hard for me to reach a conclusion.
I'm working on a multi-language Delphi XE5 application and now I face this problem with Unicode characters. Honestly I don't want to understand the magic behind it, I just want to see them work in my application.
Before, it was simple: in general, use the String data type. Now I've read about WideString, UnicodeString, AnsiString and the fact that String in XE5 is UTF-16 compliant.
I've tested with WideString and Latin characters like (șțăîâ) are working, but it's still not clear whether WideString is the best choice or not. Should I use UnicodeString or something else?
So, if I want to make a multi-language application that supports all languages, what data type should I use in the end? Is there any possibility to keep the String type and get the same results as with WideString?
Remark: I use inside my application FireDac components, but this should not matter.
In modern Delphi, "string" is an alias for the real data type UnicodeString. Use it unless you are forced to use another type.
WideString is just Delphi's name for the Microsoft OLE BSTR type, and it lacks reference counting. That disables copy-on-write optimizations and makes those strings slower than UnicodeString in general (the data buffer itself is copied again and again instead of just passing a new pointer to it). Unless you need exactly those features, use ordinary strings. For i18n, however, both types work well enough.
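A quick way to convince yourself that plain string already handles these characters, sketched for D2009 or later. The characters are written as code points so the example does not depend on how the source file is saved.

```pascal
program UnicodeStringDemo;
{$APPTYPE CONSOLE}

var
  S: string; // alias for UnicodeString in D2009 and later

begin
  // U+0219, U+021B, U+0103, U+00EE, U+00E2 (the characters from the question)
  S := #$0219#$021B#$0103#$00EE#$00E2;
  Writeln('Characters: ', Length(S));
  Writeln('Bytes:      ', Length(S) * SizeOf(Char));
end.
```

The console itself may not render these characters correctly, which is why the sketch prints only lengths. In a VCL or FireMonkey control the string displays as expected.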

What determines if a variable of type UnicodeString represents a Unicode string or an ANSI string?

I'm experienced with Delphi but new to Unicode.
The embedded Delphi XE2 documentation about UnicodeString (System.UnicodeString) says:
"Delphi utilizes several string types. UnicodeString can contain both Unicode and ANSI strings.
Support for this type includes the following features:
Strings as large as available memory.
Efficient use of memory through shared references.
Routines and operators that evaluate strings based on the current locale.
Despite its name, UnicodeString can represent both ANSI character set strings and Unicode strings. "
I don't understand what is meant by the word "can." ("It can contain both Unicode and ANSI." ... "Despite its name, UnicodeString can represent both ANSI character set strings and Unicode strings.")
My question: what determines if a variable of type UnicodeString represents a Unicode string or an ANSI string?
The documentation is outdated. UnicodeString in XE2 can only contain Unicode data.
In CB2009 and D2009, when UnicodeString was first introduced, there were cases, mostly in C++<->Delphi interactions, where the RTL allowed Ansi data to be stored in a UnicodeString and Unicode data to be stored in an AnsiString to help users migrate legacy Ansi code to Unicode. UnicodeString and AnsiString do have a unified internal structure, and the Delphi compiler had a {$STRINGCHECKS} directive that would detect any discrepancies and perform silent data conversions when needed. Although it did work, it also had subtle side effects if you were not careful with it.
By the time XE was released, Embarcadero figured users had had enough time to migrate, so the {$STRINGCHECKS} directive and supporting RTL functionality was removed. UnicodeString and AnsiString still have a unified internal structure, so it is technically possible to store Ansi data in a UnicodeString and Unicode in an AnsiString, but you would have to directly manipulate memory to do it manually, the compiler/RTL will not do it in "normal" code, and will not perform silent conversions anymore when discrepancies exist, so data corruption and/or crashes can occur if you are not careful.
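One way to see what a given string variable really holds is to ask the RTL for its element size and code page. This is a sketch assuming D2009 or later, where `StringElementSize` and `StringCodePage` are part of the System unit; the Ansi code page reported depends on the machine.

```pascal
program StringKinds;
{$APPTYPE CONSOLE}

var
  A: AnsiString;
  U: UnicodeString;

begin
  A := 'ABC';
  U := UnicodeString(A); // explicit conversion from Ansi to UTF-16
  // AnsiString: 1 byte per element, system Ansi code page (machine dependent)
  Writeln(StringElementSize(A), ' ', StringCodePage(A));
  // UnicodeString: 2 bytes per element, code page 1200 (UTF-16LE)
  Writeln(StringElementSize(U), ' ', StringCodePage(U));
end.
```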

Delphi XE - should I use String or AnsiString?

I finally upgraded to Delphi XE. I have a library of units where I use strings to store plain ANSI characters (chars between A and U). I am 101% sure that I will never ever use UNICODE characters in those places.
I want to convert all other libraries to Unicode, but for this specific library I think it will be better to stick with ANSI. The advantage is the memory requirement as in some cases I load very large TXT files (containing ONLY Ansi characters). The disadvantage might be that I have to do lots and lots of typecasts when I make those libraries to interact with normal (unicode) libraries.
Are there some general guidelines that show when it is good to convert to Unicode and when to stick with Ansi?
The problem with general guidelines is that something like this can be very specific to a person's situation. Your example here is one of those.
However, for people Googling and arriving here, some general guidelines are:
Yes, convert to Unicode. Don't try to keep an old app fully using AnsiStrings. The reason is that the whole VCL is Unicode, and you shouldn't try to mix the two, because you will convert every time you assign a Unicode string to an ANSI string, and that is a lossy conversion. Trying to keep the old way because it's less work (or some similar reason) will cause you pain; just embrace the new string type, convert, and go with it.
Instead of randomly mixing the two, explicitly perform any conversions you need to, once - for example, if you're loading data from an old version of your program you know it will be ANSI, so read it into a Unicode string there, and that's it. Ever after, it will be Unicode.
You should not need to change the type of your string variables - string pre-D2009 is ANSI, and in D2009 and later is Unicode. Instead, follow compiler warnings and watch which string methods you use - some still take an AnsiString parameter and I find it all confusing. The compiler will tell you.
If you use strings to hold bytes (in other words, using them as an array of bytes because a character was a byte) switch to TBytes.
You may encounter specific problems for things like encryption (strings are no longer byte/characters, so 'character' for 'character' you may get different output); reading text files (use the stream classes and TEncoding); and, frankly, miscellaneous stuff. Search here on SO, most things have been asked before.
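The "convert once, at the boundary" guideline from the list above might be sketched like this. `DecodeLegacyText` is a hypothetical name; the sketch assumes D2009 or later (TEncoding, TBytes) and that the legacy data is in the system ANSI code page, which is what `TEncoding.Default` means on Windows.

```pascal
program ConvertAtBoundary;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical loader: legacy data arrives as raw ANSI bytes and is
// decoded exactly once; everything downstream works with string.
function DecodeLegacyText(const Raw: TBytes): string;
begin
  Result := TEncoding.Default.GetString(Raw);
end;

var
  Raw: TBytes;

begin
  Raw := TBytes.Create($41, $42, $43); // bytes of 'ABC', held as TBytes, not as a string
  Writeln(DecodeLegacyText(Raw));
end.
```

Keeping binary data in TBytes and text in string means the compiler never has to guess which implicit conversion you meant.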
Commenters, please add more suggestions... I mostly use C++Builder, not Delphi, and there are probably quite a few specific things for Delphi I don't know about.
Now for your specific question: should you convert this library?
If:
The values between A and U are truly only ever in this range, and
These values represent characters (A really is A, not byte value 65 - if so, use TBytes), and
You load large text files and memory is a problem
then not converting to Unicode, and instead switching your strings to AnsiStrings, makes sense.
Be aware that:
There is an overhead every time you convert from ANSI to Unicode
You could use UTF8String, which is a specific type of AnsiString that will not be lossy when converted, and will still store most text (Roman characters) in a single byte
Changing all the instances of string to AnsiString could be a bit of work, and you will need to check all the methods called with them to see if too many implicit conversions are being performed (for performance), etc
You may need to change the outer layer of your library to use Unicode so that conversion code or ANSI/Unicode compiler warnings are not visible to users of your library
If you convert to Unicode, sets of characters (can't remember the syntax, maybe if 'S' in MySet?) won't work. From your description of characters A to U, I could guess you would like to use this syntax.
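On the set-of-characters point: as far as I know, in Unicode Delphi the classic `C in ['A'..'U']` test still compiles but produces a warning because a WideChar is reduced to a byte, and `SysUtils.CharInSet` is the suggested replacement. A small sketch, with `IsAllowed` as a made-up name:

```pascal
program CharSetDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical validator for the A..U range mentioned in the question.
function IsAllowed(C: Char): Boolean;
begin
  Result := CharInSet(C, ['A'..'U']);
end;

begin
  Writeln(IsAllowed('C'));
  Writeln(IsAllowed('Z'));
end.
```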
My recommendation? Personally, the only reason I would do this from the information you've given is the memory use, and possibly performance depending on what you're doing with this huge amount of A..Us. If that truly is significant, it's both the driver and the constraint, and you should convert to ANSI.
You should be able to wrap up the conversion at the interface between this unit and its clients. Use AnsiString internally and string everywhere else and you should be fine.
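A rough sketch of that wrapping idea, with invented routine names: the internal routine works on AnsiString, and a thin public wrapper is the only place where string is converted.

```pascal
program AnsiInsideUnicodeOutside;
{$APPTYPE CONSOLE}

// Internal routine: works on single-byte data only.
function CountInRange(const Data: AnsiString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(Data) do
    if Data[I] in ['A'..'U'] then // AnsiChar, so the set test is exact
      Inc(Result);
end;

// Public wrapper: the single, explicit string -> AnsiString conversion.
function CountInRangeU(const Text: string): Integer;
begin
  Result := CountInRange(AnsiString(Text));
end;

begin
  Writeln(CountInRangeU('ABCXYZ'));
end.
```

Callers only ever see `CountInRangeU`, so conversion warnings stay inside the unit. The explicit cast documents that the narrowing is intentional.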
In general, only use AnsiString if it is important that the Chars are single bytes. Otherwise, using string ensures future compatibility with Unicode.
You need to check all libraries anyway, because all Windows API functions in Delphi XE are replaced by their Unicode analogues, etc. If you will never use Unicode, you need to use Delphi 7.
Use AnsiString explicitly everywhere in this unit and then you'll get compiler warnings (which you should never ignore) for String to AnsiString conversions if you happen to access the routines incorrectly.
Alternately, perhaps preferably depending on your situation, simply convert everything to UTF8.
Stick with Ansi strings ONLY if you do not have the time to convert the code properly. The use of Ansi strings is really only for backward compatibility - to my knowledge C# does not have an equivalent to Ansi strings. Otherwise use the standard Unicode strings. If you have a look on my website I have a whole strings routines unit (about 5,000 LOC) that works with both Delphi 2007 (non-Unicode) and XE (Unicode) with only "string" interfaces and contains almost all of the conversion issues you might face.

ReadLn working with WideString (utf-8 files)

I use Delphi 7.
I need to read a UTF-8 file line by line; each line contains a word and its weight (a number).
So I need to read each line in turn, split it at a separator (tab character) and save the result in memory.
So,
1) is there a library for working with UTF-8 files in Delphi (third-party, maybe)?
2) will the string functions work correctly with WideString? I use PosEx. If they won't, can you also give a link to a third-party library for working with WideStrings?
If it is really UTF-8 that you are dealing with, then you should not need anything special as far as reading and processing them. You should be able to treat them as pchar or even as a normal Delphi 7 string. If you try to show the contents in some kind of message box, then you may need to do some conversions. For example, I don't believe the Delphi 7 message box method would display UTF-8 strings correctly if the string contained any byte values over 127 (0x7f). For something like that, you would need to convert to UTF-16 and call the Windows API MessageBoxW or something similar. Otherwise, though, UTF-8 strings can be treated in many situations the same as single byte ANSI strings.
I don't think UTF-8 is typically referred to as "widestring". I might be wrong, but I think that typically means UTF-16.
If your file is encoded as UTF-8, and the characters you're looking for are ASCII, then there's no need to use WideString at all. ASCII is a subset of UTF-8, and any ASCII character is guaranteed not to interfere with the special encoding used for other characters in UTF-8. The number characters 0 through 9 and the tab character are all ASCII.
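A minimal sketch of that idea: the line stays a raw UTF-8 byte string and only the ASCII tab is searched for. `SplitLine` is a hypothetical helper, and the sample word is built byte by byte so the example does not depend on source file encoding.

```pascal
program SplitUtf8Line;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: splits 'word<TAB>weight' held as raw UTF-8 bytes.
function SplitLine(const Line: AnsiString; out Term: AnsiString;
  out Weight: Integer): Boolean;
var
  I, TabPos: Integer;
begin
  Weight := 0;
  TabPos := 0;
  for I := 1 to Length(Line) do
    if Line[I] = #9 then // a tab byte never occurs inside a multi-byte sequence
    begin
      TabPos := I;
      Break;
    end;
  Result := TabPos > 0;
  if Result then
  begin
    Term := Copy(Line, 1, TabPos - 1); // still UTF-8 bytes
    Weight := StrToIntDef(string(Copy(Line, TabPos + 1, MaxInt)), 0);
  end;
end;

var
  Line, Term: AnsiString;
  Weight: Integer;

begin
  Line := 'xx'#9'42';
  Line[1] := AnsiChar($C8); // with the next byte: UTF-8 encoding of U+0219
  Line[2] := AnsiChar($99);
  if SplitLine(Line, Term, Weight) then
    Writeln(Length(Term), ' bytes, weight ', Weight);
end.
```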
The JCL comes with various functions and classes for dealing with Unicode, if you find you really need to use them.
If most of your input is UTF-8, it might be worthwhile to change your codepage on startup from the "default" to utf8 (codepage 65001). This will make all ansistring->widestring conversions effectively become a lossless utf-8->utf-16.
With D7, you will need a set of so called "unicode" components, components that base themselves on the winapi -W functions. Delphi's own components only do this with the watershed D2009 release that switches the default string type to UTF-16.
If you want to heavily invest in Unicode support, upgrading might be a smart thing to do.
WideString is a UTF-16 implementation (a COM BSTR compatible one); it can't store UTF-8 strings, and if you assign an 8-bit string to it, the string will be converted to UTF-16. But unless you explicitly use the proper conversion function, Delphi will interpret the 8-bit string using the current codepage.
A UTF-8 string can be stored in a Delphi AnsiString (the default string type in Delphi 7), but the string manipulation functions are designed for ANSI codepages, not UTF-8. The difference is that UTF-8 is a multi-byte character set: apart from the first 128 (ASCII) characters, more than one byte is needed to encode a given "character", while many ANSI codepages (especially those for European languages) require only one byte per character and encode at most 256 "characters" (while UTF-8 can encode the whole Unicode set).
If you're just looking for the tab character AFAIK you could use simply an AnsiString, but you have to ensure that any byte above $80 you may need to look for is not part of a multibyte sequence. If you have more complex processing needs, it may be easier to find libraries working on UTF-16 strings than UTF-8. As Rob Kennedy said, JCL is a good starting point as a free library implementing UTF string manipulation.
You could simply read the file as-is into a normal TStringList via its LoadFrom...() methods, then loop through the list as needed. If loading the entire file into memory at one time is not an option, then you can open the file using a TFileStream and then use the TStreamReader.ReadLine() method to read the stream line-by-line.
If you need to decode a given UTF-8 sequence to UTF-16 for processing, then I would suggest using the Win32 API MultiByteToWideChar() function directly, only because the RTL's UTF8Decode() function has a broken UTF-8 implementation in older Delphi versions (not sure about D7, but it definitely does in D6).
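Calling the API directly could be sketched as below. `Utf8ToWide` is a made-up wrapper name. The code targets Windows only, and on XE2 and later the unit is named Winapi.Windows unless unit scope names are configured.

```pascal
program Utf8ToWideDemo;
{$APPTYPE CONSOLE}

uses
  Windows;

// Hypothetical wrapper around the Win32 conversion call.
function Utf8ToWide(const S: AnsiString): WideString;
var
  Len: Integer;
begin
  Result := '';
  if S = '' then
    Exit;
  // First call: ask how many UTF-16 code units are needed.
  Len := MultiByteToWideChar(CP_UTF8, 0, PAnsiChar(S), Length(S), nil, 0);
  if Len <= 0 then
    Exit;
  SetLength(Result, Len);
  // Second call: perform the conversion into the allocated buffer.
  MultiByteToWideChar(CP_UTF8, 0, PAnsiChar(S), Length(S),
    PWideChar(Result), Len);
end;

var
  Utf8: AnsiString;
  W: WideString;

begin
  Utf8 := 'xx';
  Utf8[1] := AnsiChar($C8); // with the next byte: UTF-8 encoding of U+0219
  Utf8[2] := AnsiChar($99);
  W := Utf8ToWide(Utf8);
  Writeln(Length(W), ' code unit(s), first = ', Ord(W[1]));
end.
```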
The nice thing about either loading approach is that they are both encoding-aware in D2009 and later, which means that if you ever upgrade, you can make a couple of very small code changes to tell the RTL that the data is UTF-8, and it will decode it to UTF-16 for you automatically, and then the rest of your processing code can remain the same (assuming you are not doing anything that is Ansi-specific).
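For D2009 and later, the encoding-aware loading mentioned above might look like this sketch. It feeds a TStringList from an in-memory stream so it runs without a file on disk; with a real file you would call LoadFromFile with the same TEncoding.UTF8 argument.

```pascal
program LoadUtf8Lines;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  // UTF-8 bytes for: U+0219, TAB, '5', CR, LF
  Sample: array[0..5] of Byte = ($C8, $99, $09, $35, $0D, $0A);

var
  Stream: TMemoryStream;
  Lines: TStringList;
  TabPos: Integer;

begin
  Stream := TMemoryStream.Create;
  Lines := TStringList.Create;
  try
    Stream.WriteBuffer(Sample, SizeOf(Sample));
    Stream.Position := 0;
    Lines.LoadFromStream(Stream, TEncoding.UTF8); // decoded to UTF-16 here
    TabPos := Pos(#9, Lines[0]);
    Writeln('Word length: ', TabPos - 1);
    Writeln('Weight:      ', StrToInt(Copy(Lines[0], TabPos + 1, MaxInt)));
  finally
    Lines.Free;
    Stream.Free;
  end;
end.
```

After loading, the tab split works on characters rather than bytes, so the processing code no longer needs to know the file was UTF-8.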

How do the new string types work in Delphi 2009/2010?

I have to convert a large legacy application to Delphi 2009 which uses strings, AnsiStrings, WideStrings and UTF8 data all over the place, and I have a hard time understanding how the new string types work and how they should be used.
The application fully supported Unicode using TntUnicodeControls and there are 3rd party DLLs which require strings in specific encodings, mostly UTF8 and UTF16, making the conversion task not as trivial as one would suspect.
I especially have problems with the C DLL calls and choosing the right type.
I also get the impression that there are many implicit string conversions happening, because one of the DLLs seems to always receive UTF-8 encoded strings, no matter how the Delphi string is encoded.
Can someone please provide a short overview about the new Delphi 2009 string types UnicodeString and RawByteString, perhaps some usage hints and possible pitfalls when converting a pre 2009 application?
See Delphi and Unicode, a white paper written by Marco Cantù and I guess
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), written by Joel.
One pitfall is that the default Win32 API calls are now mapped to the W (wide string) versions instead of the A (ANSI) versions; for example, ShellExecute now maps to ShellExecuteW rather than ShellExecuteA. If your code is doing tricky pointer work that assumes the internal layout of AnsiString, it will break. A fallback is to substitute PChar with PAnsiChar, Char with AnsiChar, string with AnsiString, and append A at the end of the Win32 API call for that portion of code. After the code actually compiles and runs normally, you could refactor your code to use string (UnicodeString).
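The fallback can be illustrated with a harmless API pair instead of ShellExecute. In this sketch, which assumes D2009 or later on Windows, the unported part pins everything to Ansi types and the A-suffixed function, while the ported part uses string, PChar and the unsuffixed name.

```pascal
program AnsiVsWideApi;
{$APPTYPE CONSOLE}

uses
  Windows;

var
  LegacyText: AnsiString;
  NewText: string;

begin
  LegacyText := 'legacy';
  NewText := 'unicode';
  // Not yet ported: explicit Ansi types and the A-suffixed API.
  Writeln(lstrlenA(PAnsiChar(LegacyText)));
  // Ported: string/PChar and the unsuffixed name (the W version in D2009+).
  Writeln(lstrlen(PChar(NewText)));
end.
```

Pinning the old section to the A variants keeps its behaviour identical to the pre-2009 build, so it can be refactored to string later, one routine at a time.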
Watch my CodeRage 4 talk on "Using Unicode and Other Encodings in your Programs" this Friday, or wait until the replay of it is available online.
I'm going to cover some encodings and explain about the string format.
The slides will be available shortly (I'll try to get them online today) and contain a lot of references to stuff you should read on the internet (but I must admit I forgot the link to Joel on Unicode that eed3si9n posted).
Will edit this answer today with the uploads and the links.
Edit:
If you have a small sample where you can show that your C/C++ DLL receives the strings UTF8 encoded, but thought they should be encoded otherwise, please post it (mail me; almost anything at the pluimers dot com gets to me, especially if you use my first name before the at sign).
Session materials can be downloaded now, including the "Using Unicode and Other Encodings in your Programs" session.
These are links from that session:
Read these:
Marco Cantu, Whitepaper “Delphi and Unicode”
Marco Cantu, Presentation “Delphi and Unicode”
Nick Hodges, Whitepaper “Delphi in a Unicode World”
Relevant on-line help topics:
What's New in Delphi and C++Builder 2009
String Types: Base: ShortString, AnsiString, WideString, UnicodeString
String Types: Unicode (including internal memory layouts of the string types)
String Types: Enabling for Unicode
String Types: RawByteString (AnsiString with CodePage $ffff)
String Types: UTF8String (AnsiString with CodePage 65001)
String <-> PChar conversions: PChar fundamentals
String <-> PChar conversions: Returning a PChar Local Variable
String <-> PChar conversions: Passing a Local Variable as a PChar
Hope this gets you going. If not, mail me and I'll try to extend the answer here.
Note that it does not only hit real string code. It also hits code where PChar is used to trawl through buffers or to interface with APIs.
E.g. initialization code of headers that load the DLL dynamically (GetProcAddress/LoadLibrary).
It seems almost all my problems come from the automatic conversion on assignments to UTF8String.
I already had old code using UTF8String just to help me think which type of string a variable should contain.
When starting to port my application, I replaced AnsiString with UTF8String for the same reason, but the code depended on UTF8String being just an alias for (classic) AnsiString.
Now with the automatic conversion that assumption is no longer true, which created many problems.
Be careful if you use UTF8String when porting from pre-2009 Delphi code!
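The automatic conversion that bit the author can be seen in a few lines. This sketch assumes D2009 or later, where UTF8String is an AnsiString with code page 65001 and assignments between differently encoded string types are transcoded by the RTL.

```pascal
program Utf8Conversion;
{$APPTYPE CONSOLE}

var
  S: string;
  U8: UTF8String;
  A: AnsiString;

begin
  S := #$0219;         // one UTF-16 code unit
  U8 := UTF8String(S); // transcoded to UTF-8: the byte count changes
  A := U8;             // transcoded again, to the system Ansi code page (may be lossy)
  Writeln('UTF-16 length: ', Length(S));
  Writeln('UTF-8 length:  ', Length(U8));
  Writeln('Ansi length:   ', Length(A));
  Writeln('UTF8String code page: ', StringCodePage(U8));
end.
```

Before 2009 the last assignment was a plain byte copy. Now the bytes change on the way, which is exactly the broken assumption described above.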
Another thing to watch out for when passing string between dlls built with different versions of Delphi or C++ Builder is that, starting with 2009, the StrRec part of AnsiStringBase gained two extra fields; codePage and elemSize. They are 2 bytes each (short ints), so the size of StrRec is now 12 bytes instead of 8. This can cause invalid pointer exception problems with memory allocation and destruction, even when the data part of the string seems to transfer ok.
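The size difference can be checked with two records that mirror the hidden string header on 32-bit targets. The type names are invented for illustration and these are not the RTL's own declarations. The field names and order follow the description above and may differ on other platforms.

```pascal
program StrRecSizes;
{$APPTYPE CONSOLE}

type
  // Illustrative mirror of the pre-2009 header (32-bit).
  TStrRecPre2009 = packed record
    refCnt: Longint;
    length: Longint;
  end;

  // Illustrative mirror of the 2009+ header (32-bit): two extra Word fields.
  TStrRec2009 = packed record
    codePage: Word;
    elemSize: Word;
    refCnt: Longint;
    length: Longint;
  end;

begin
  Writeln('Pre-2009 header: ', SizeOf(TStrRecPre2009), ' bytes');
  Writeln('2009+ header:    ', SizeOf(TStrRec2009), ' bytes');
end.
```

Because the header sits at a negative offset from the string pointer, a module built with the old layout reads the wrong fields when handed a string allocated by a module built with the new one.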

Resources