CharInSet bulk conversions when migrating from Delphi 2007 - delphi

We are migrating a number of projects from Delphi 2007 to XE8 and are getting the following warning (many hundreds of times):
[dcc32 Warning] X.PAS(1568): W1050 WideChar reduced to byte char in set expressions. Consider using 'CharInSet' function in 'SysUtils' unit.
It occurs to me that many of these are of the form
if x in ['1','2','3'] then
which need to be converted to
if CharInSet(x, ['1','2','3']) then
And this looks like there might be some sort of regular expression type search and replace that could be used to do these in bulk.
Can anyone think of a way to convert these in bulk?

This can be done with Search/Replace in the IDE.
The following works for me in XE4.
search for:
if {[a-z]} in \[{{'[0-9]+'\,? ?}+}\] then
If you want to match a variable name more than one character long, consider using a quantifier, as in [a-z]+.
replace with:
if CharInSet\(\0, \[\1\]\) then
Notice that the IDE uses {} for groups and \0, \1 ... as replacement placeholders.
Embarcadero Regular Expressions reference for Delphi XE4
[Screenshots: the IDE regular expressions search dialog, and the resulting unit]
You may also find this question useful for further reference.
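The same search and replace can also be scripted outside the IDE. The following is a minimal sketch, not part of the original answer, using TRegEx from System.RegularExpressions, which takes PCRE syntax rather than the IDE's {} groups. The program name and the pattern are assumptions made for illustration. The pattern only handles sets of single-character literals such as ['1','2','3'], not ranges like 'a'..'z', and the System. unit prefixes assume XE2 or later.

```pascal
program BulkCharInSet;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions;

const
  // Hypothetical pattern: "<identifier> in [<single-char literals>]".
  // Group 1 is the identifier, group 2 is the contents of the set.
  Pattern = '\b([A-Za-z_]\w*)\s+in\s+\[((?:''[^'']'' *,? *)+)\]';
  Replacement = 'CharInSet($1, [$2])';

var
  Source: string;
begin
  // In a real run, Source would be the text of a .pas file.
  Source := 'if x in [''1'',''2'',''3''] then';
  // roIgnoreCase because Pascal keywords are not case sensitive.
  Writeln(TRegEx.Replace(Source, Pattern, Replacement, [roIgnoreCase]));
end.
```

Review the result of any bulk replacement by hand (or with a diff) before committing it, since a pattern this simple can also match inside comments and string literals.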

Related

Using CharInSet in Delphi or Firemonkey

How do I use CharInSet() to get rid of this warning?
lastop:=c;
{OK - if op is * or / and prev op was + or -, put the whole thing in parens}
If (c in ['*','/']) and (NextToLastOp in ['+','-']) then
Result.text:='('+Result.text+')';
Result.text:=Result.text+c;
[dcc32 Warning] U_Calc1.pas(114): W1050 WideChar reduced to byte char in set expressions. Consider using 'CharInSet' function in 'SysUtils' unit.
CharInSet is necessary due to the fact that WideChar values are too large to be used in actual set operations.
To use it in your code you would use it like this:
If CharInSet(c, ['*','/']) and CharInSet(NextToLastOp, ['+','-']) then
However CharInSet doesn't actually do anything special or even useful other than silence the warning.
The code in the function is identical to the code it replaces. The warning is silenced only because the RTL units are pre-compiled: the compiler does not recompile SysUtils even if you do a full "Rebuild all". If it did, it would emit the exact same warning.
So, you could use that function to avoid the warning but if you can be certain that your character data is ASCII (or extended ASCII), that is with ordinal values in the range #0 to #255, then you could also avoid the warning by typecasting your character variable explicitly to an ANSIChar:
If (ANSIChar(c) in ['*','/']) and (ANSIChar(NextToLastOp) in ['+','-']) then
To reiterate: You should only do this if you are absolutely certain that the character data that you are working with consists only of ASCII characters.
But this same caveat applies equally to using CharInSet since this still requires that you test for membership in an ANSIChar set (i.e. single byte characters).
Explicit type-casting has the advantage (imho) of making it explicit and apparent that your code is intended deliberately to expect and work only with ASCII characters, rather than giving a perhaps misleading impression of being more fully equipped to handle Unicode.
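As a minimal sketch of the two options side by side: the helper names NeedsParens and NeedsParensCast are hypothetical and do not come from the original U_Calc1 unit. Both variants compile without W1050, and both assume the operators being tested are plain ASCII characters.

```pascal
program CharInSetDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils; // CharInSet is declared here (XE2+ unit name)

// Variant 1: CharInSet, as suggested by the compiler warning.
function NeedsParens(Op, PrevOp: Char): Boolean;
begin
  Result := CharInSet(Op, ['*', '/']) and CharInSet(PrevOp, ['+', '-']);
end;

// Variant 2: explicit cast. Only use this when the characters are
// known to have ordinal values in the range #0..#255.
function NeedsParensCast(Op, PrevOp: Char): Boolean;
begin
  Result := (AnsiChar(Op) in ['*', '/']) and (AnsiChar(PrevOp) in ['+', '-']);
end;

begin
  // Both variants should agree for ASCII operators.
  Writeln(NeedsParens('*', '+') = NeedsParensCast('*', '+'));
  Writeln(NeedsParens('+', '*') = NeedsParensCast('+', '*'));
end.
```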

Delphi String / Array of Strings

I have an old program which was written in Delphi 1 (or 2, I'm not sure) and I want to build a 64-bit version of it (I use Delphi XE2). The problem is that the source code contains both strings and arrays of strings (I guess to limit the string length).
Now there are a lot of errors while compiling because of incompatible types.
Above all there are procedures which should handle both types.
Is there an easy way to solve this problem (without changing every variable)?
Short answer
Search and replace ": string" with ": AnsiString".
Make sure you use Length(astring) and SetLength(astring, ...) instead of manipulating astring[0].
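A small sketch of that second point, assuming a console project on XE2 or later (the variable names are made up for illustration). Length and SetLength work on both short strings and long strings, whereas writing to index 0 only makes sense for a ShortString.

```pascal
program LengthDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Old: string[100]; // Delphi 1 style short string, length byte at index 0
  S: AnsiString;    // long string, no accessible length byte
begin
  Old := 'hello';
  S := 'hello';

  // Delphi 1 idiom that should be replaced:  Old[0] := #3;
  SetLength(Old, 3);
  SetLength(S, 3);

  // Both strings are now three characters long.
  Writeln(Length(Old), ' ', Length(S));
  Writeln(Old, ' ', S);
end.
```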
Long answer
Delphi 1 has only one type of string: the old-skool ShortString, which holds at most 255 chars and has a declared maximum length.
It looks and feels like an array of char, but it has a leading length byte.
var
ShortString: string[100];
In Delphi 2, longstrings (aka AnsiString) were introduced; these replace the ShortString. They do not have a fixed length, but are allocated dynamically and automatically grow and shrink as needed.
They are automatically created and destroyed.
var
Longstring: string; //AnsiString, can have any length up to 2GB.
In Delphi 2009 Unicode was introduced.
This changes the longstring because each char no longer takes up 1 byte, but 2 bytes(*).
Additionally you can specify a character set to an AnsiString, whereas the new Unicode longstring uses UTF-16.
What you need to do depends on your needs:
If you just want the old code to work as before and you don't care about supporting all the multilingual stuff Unicode supports, you will need to replace all your string keywords with AnsiString (for all strings that are longstrings).
If you have Delphi 1 code, you can rename the string to ShortString.
I would recommend that you refactor the code to always use longstrings (read: AnsiString) though.
Delphi will automatically convert the UnicodeStrings returned by functions into AnsiStrings and vice versa; however, this may lose data if your users enter symbols in an edit box that your AnsiString cannot store.
Also all that translation takes a bit of time (I doubt you will notice this though).
In Delphi 1 up to Delphi 2007 this problem did not exist, because controls did not allow Unicode characters to be entered.
(*) gross oversimplification

Delphi XE - should I use String or AnsiString?

I finally upgraded to Delphi XE. I have a library of units where I use strings to store plain ANSI characters (chars between A and U). I am 101% sure that I will never ever use UNICODE characters in those places.
I want to convert all other libraries to Unicode, but for this specific library I think it will be better to stick with ANSI. The advantage is the memory requirement as in some cases I load very large TXT files (containing ONLY Ansi characters). The disadvantage might be that I have to do lots and lots of typecasts when I make those libraries to interact with normal (unicode) libraries.
Are there any general guidelines for when it is better to convert to Unicode and when to stick with ANSI?
The problem with general guidelines is that something like this can be very specific to a person's situation. Your example here is one of those.
However, for people Googling and arriving here, some general guidelines are:
Yes, convert to Unicode. Don't try to keep an old app fully using AnsiStrings. The reason is that the whole VCL is Unicode, and you shouldn't try to mix the two, because you will convert every time you assign a Unicode string to an ANSI string, and that is a lossy conversion. Trying to keep the old way because it's less work (or some similar reason) will cause you pain; just embrace the new string type, convert, and go with it.
Instead of randomly mixing the two, explicitly perform any conversions you need to, once - for example, if you're loading data from an old version of your program you know it will be ANSI, so read it into a Unicode string there, and that's it. Ever after, it will be Unicode.
You should not need to change the type of your string variables - string pre-D2009 is ANSI, and in D2009 and later is Unicode. Instead, follow compiler warnings and watch which string methods you use - some still take an AnsiString parameter and I find it all confusing. The compiler will tell you.
If you use strings to hold bytes (in other words, using them as an array of bytes because a character was a byte) switch to TBytes.
You may encounter specific problems for things like encryption (strings are no longer byte/characters, so 'character' for 'character' you may get different output); reading text files (use the stream classes and TEncoding); and, frankly, miscellaneous stuff. Search here on SO, most things have been asked before.
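As a sketch of the "convert once, at the boundary" advice for text files: the file name below is hypothetical, and TEncoding.Default is assumed to be the system ANSI code page, which is the case on Windows. The file is written first only so that the example is self-contained.

```pascal
program LoadAnsiText;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes; // drop the System. prefix before XE2

const
  FileName = 'legacy_ansi.txt'; // hypothetical file name

var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    // Stand-in for a file produced by an old ANSI version of the program.
    Lines.Add('ABCDEFGHIJKLMNOPQRSTU');
    Lines.SaveToFile(FileName, TEncoding.Default);
    Lines.Clear;

    // The conversion to Unicode happens here, once, with an explicit
    // encoding. From this point on the data is an ordinary string.
    Lines.LoadFromFile(FileName, TEncoding.Default);
    Writeln(Lines[0]);
  finally
    Lines.Free;
  end;
end.
```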
Commenters, please add more suggestions... I mostly use C++Builder, not Delphi, and there are probably quite a few specific things for Delphi I don't know about.
Now for your specific question: should you convert this library?
If:
The values between A and U are truly only ever in this range, and
These values represent characters (A really is A, not byte value 65 - if so, use TBytes), and
You load large text files and memory is a problem
then not converting to Unicode, and instead switching your strings to AnsiStrings, makes sense.
Be aware that:
There is an overhead every time you convert from ANSI to Unicode
You could use UTF8String, which is a specific type of AnsiString that will not be lossy when converted, and will still store most text (Roman characters) in a single byte
Changing all the instances of string to AnsiString could be a bit of work, and you will need to check all the methods called with them to see if too many implicit conversions are being performed (for performance), etc
You may need to change the outer layer of your library to use Unicode so that conversion code or ANSI/Unicode compiler warnings are not visible to users of your library
If you convert to Unicode, sets of characters (can't remember the syntax, maybe if 'S' in MySet?) won't work. From your description of characters A to U, I could guess you would like to use this syntax.
My recommendation? Personally, the only reason I would do this from the information you've given is the memory use, and possibly performance depending on what you're doing with this huge amount of A..Us. If that truly is significant, it's both the driver and the constraint, and you should convert to ANSI.
You should be able to wrap up the conversion at the interface between this unit and its clients. Use AnsiString internally and string everywhere else and you should be fine.
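A minimal sketch of that boundary, with made-up names (Store, SetData, GetData): the data is held as AnsiString internally, and the only conversions are the two explicit casts in the accessor routines. In a real library these would be the public routines of the unit rather than globals in a program.

```pascal
program AnsiBoundary;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  // Internal storage: one byte per character.
  Store: AnsiString;

// Public interface takes and returns the normal (Unicode) string type.
procedure SetData(const Value: string);
begin
  // Explicit cast: potentially lossy, but visible and in one place.
  Store := AnsiString(Value);
end;

function GetData: string;
begin
  Result := string(Store);
end;

begin
  SetData('ABCDU');
  Writeln(GetData, ' uses ', Length(Store) * SizeOf(AnsiChar), ' bytes internally');
end.
```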
In general, only use AnsiString if it is important that the Chars are single bytes; otherwise the use of string ensures future compatibility with Unicode.
You need to check all libraries anyway, because all Windows API functions in Delphi XE are replaced by their Unicode analogues, etc. If you never want to use Unicode, you need to stay on Delphi 7.
Use AnsiString explicitly everywhere in this unit and then you'll get compiler warning errors (which you should never ignore) for String to AnsiString conversion errors if you happen to access the routines incorrectly.
Alternately, perhaps preferably depending on your situation, simply convert everything to UTF8.
Stick with Ansi strings ONLY if you do not have the time to convert the code properly. The use of Ansi strings is really only for backward compatibility - to my knowledge C# does not have an equivalent to Ansi strings. Otherwise use the standard Unicode strings. If you have a look on my web-site I have a whole strings routines unit (about 5,000 LOC) that works with both Delphi 2007 (non-Unicode) and XE (Unicode) with only "string" interfaces and contains almost all of the conversion issues you might face.

Delphi Ansistrings

I have a case here: I am going to migrate from Delphi 7 to Delphi XE (2011), and to my surprise many components have problems due to AnsiStrings; in Delphi XE the strings look like Japanese/Chinese characters. The unit I use is a PCSC connector that seems to have been discontinued/abandoned by the original developer.
Basically, what I want is an easy way to read the strings again with as little modification to the original code as possible.
Any good tutorials on how to make components AnsiString-ready for Delphi 2009 and newer would also help me.
@Plastkort, Delphi >= 2009 is perfectly capable of reading and handling AnsiString. You only get the meaningless characters if you somehow hard-cast ANSI data to Unicode, possibly by hard-casting a pointer to PChar.
If I had to convert someone else's code to Unicode I'd start by searching for PChar, Char and String, specifically looking at places where other types are hard-cast to those types. That's because those types changed meaning: in non-Unicode Delphi a Char was 1 byte, now it's 2 bytes.
The conversion itself isn't necessarily difficult; you just need to understand the problem you're facing and have a good understanding of the code you're converting. And it's a lot of work, especially when dealing with code that did "smart things with strings".
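To illustrate the hard-cast problem, here is a sketch with a made-up ANSI buffer standing in for data returned by a legacy API such as the PCSC connector. The variable names are hypothetical. The problematic cast is left as a comment because it reinterprets the bytes as UTF-16 and can read past the end of the data.

```pascal
program AnsiBufferDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Raw: AnsiString;   // stand-in for bytes filled in by a legacy ANSI API
  P: PAnsiChar;
  Decoded: string;
begin
  Raw := 'READER 0';
  P := PAnsiChar(Raw);

  // Pre-2009 code often did the equivalent of:
  //   Decoded := PChar(Pointer(P));
  // PChar is now PWideChar, so every pair of bytes would be read as one
  // UTF-16 code unit, which is where the "Chinese" characters come from.

  // Keep the pointer typed as PAnsiChar and convert explicitly instead.
  Decoded := string(AnsiString(P));
  Writeln(Decoded);
end.
```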
The big change in Delphi (as of Delphi 2009, I think) is in the aliased string types: string, Char, PChar, etc., which prior to 2009 were ANSI types, are now all WideChar-based types.
so
Type   | Delphi 6, 7 | Delphi 2009, XE
-------+-------------+----------------
string | AnsiString  | UnicodeString
Char   | AnsiChar    | WideChar
PChar  | PAnsiChar   | PWideChar
The simplest way to migrate from Ansi Delphi to Unicode Delphi is a global search and replace for the aliased string types that replaces them with the explicit 8-bit ANSI equivalents (i.e. replace all string with AnsiString, PChar with PAnsiChar, etc.).
That should get you 90% of the way.
Update: after reading the comments on my answer and the article referenced by @O.D, I think the advice to bite the bullet and go with Unicode is the wiser option.

Delphi 2010 Wide functions vs. String functions

We're currently converting a Delphi 2007 project to Delphi 2010. We were already using Unicode (via WideStrings and TNT Unicode Controls).
I was expecting to replace all Wide functions, e.g. WideUpperCase, with their equivalent, e.g. UpperCase, but they do not work the same way. For example, WideUpperCase works differently from UpperCase. WideUpperCase correctly uppercases Campañas, but UpperCase leaves the ñ in lower case.
Are there any other differences that I should be aware of? e.g. do WideFormat and Format work the same?
Thanks
You should use the ToUpper function from the Character unit to uppercase Unicode strings. Alternatively, you can use AnsiUpperCase if you need to support a common codebase for non-Unicode and Unicode Delphi versions - AnsiUpperCase is an Ansi function in Delphi 2007 and prior, and a Unicode function in Delphi 2009 and above.
The naming is really bad (due to keeping compatibility with older versions). I suggest you read the docs for each string function you might want to use and check whether it works with Unicode or not.
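A quick way to check this yourself is a sketch like the following, which compares UpperCase and AnsiUpperCase on the word from the question. The ñ is written as #$00F1 so that the result does not depend on the source file encoding, and the program tests for the character codes instead of printing them, to stay independent of the console code page. ToUpper from the Character unit is left out because its declaration differs between Delphi versions.

```pascal
program UpperCaseDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils; // plain SysUtils before XE2

const
  Sample = 'Campa'#$00F1'as'; // "Campañas"; #$00F1 is lower-case n with tilde

begin
  // UpperCase is documented to convert only 'a'..'z', so the n-tilde is
  // expected to remain #$00F1 (lower case), as the question describes.
  Writeln('UpperCase leaves it lower case: ',
    Pos(#$00F1, UpperCase(Sample)) > 0);

  // AnsiUpperCase is locale aware and, per the answer above, works on
  // Unicode strings in Delphi 2009+, so #$00D1 (upper case) is expected.
  Writeln('AnsiUpperCase uppercases it:    ',
    Pos(#$00D1, AnsiUpperCase(Sample)) > 0);
end.
```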
