Delphi, how to avoid unicode warning message in D2009, D2010


In a sorting routine in Delphi 2007 I am using code like this:
(txt[n] in ['0'..'9'])
function ExtractNr(n: Integer; var txt: String): Int64;
begin
while (n <= Length(txt)) and (txt[n] in ['0'..'9']) do n:= n + 1;
Result:= StrToInt64Def(Copy(txt, 1, (n - 1)), 0);
Delete(txt, 1, (n - 1));
end;
where txt is a string. This works fine in D2007 but gives warnings in D2009 and D2010. I have no idea why. Is there any way I can make it work without warnings in D2009 and D2010?
Roy M Klever

Are you getting the "WideChar reduced to byte Char in set expressions. Consider using 'CharInSet' function in 'SysUtils' unit" message?
Here's the issue. In D2009, the default string type was changed from AnsiString to UnicodeString. An AnsiString uses a single byte for each character, giving you 256 possible characters. A UnicodeString uses 2 bytes per character, giving up to 64K characters. But a Pascal set can only contain up to 256 elements. So it can't create a "set of WideChar" because there are too many possible elements.
The warning tells you that you're comparing txt[n], which is a WideChar from a Unicode string, against a set of chars. The compiler can't make a set of WideChars, so it has to reduce them to AnsiChars to fit them into a Pascal set, and your txt[n] might be outside the Ansi boundaries entirely.
You can fix it by using CharInSet, or by making txt an AnsiString, if you're certain you won't need any Unicode characters for it. Or if that won't work well, you can disable the warning, but I'd consider that a last resort.
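As a minimal sketch of the CharInSet fix, here is the routine from the question with only the set test changed. It assumes Delphi 2009 or later, where CharInSet is declared in SysUtils; the surrounding console program and the sample value are illustrative only.

```pascal
program ExtractNrDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils; // CharInSet and StrToInt64Def (Delphi 2009 and later)

// Same logic as the routine in the question; the set test goes
// through CharInSet, so the compiler no longer warns.
function ExtractNr(n: Integer; var txt: string): Int64;
begin
  while (n <= Length(txt)) and CharInSet(txt[n], ['0'..'9']) do
    n := n + 1;
  Result := StrToInt64Def(Copy(txt, 1, n - 1), 0);
  Delete(txt, 1, n - 1);
end;

var
  s: string;
  value: Int64;
begin
  s := '123abc';           // illustrative input
  value := ExtractNr(1, s);
  Writeln(value);          // the numeric prefix
  Writeln(s);              // what is left after the digits were removed
end.
```

If you decide to suppress the warning instead, it can be switched off locally with {$WARN WIDECHAR_REDUCED OFF} around the expression, but that hides the issue rather than fixing it.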

Use CharInSet or, better, use Character.IsDigit.
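A rough sketch of the IsDigit route, assuming Delphi 2009/2010 where the Character unit exposes TCharacter.IsDigit. LeadingDigitCount is a hypothetical helper name, not something from the question.

```pascal
program IsDigitDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, Character; // Character provides TCharacter in Delphi 2009/2010

// Hypothetical helper: counts the leading characters that Unicode
// classifies as decimal digits.
function LeadingDigitCount(const txt: string): Integer;
begin
  Result := 0;
  while (Result < Length(txt)) and TCharacter.IsDigit(txt[Result + 1]) do
    Inc(Result);
end;

begin
  Writeln(LeadingDigitCount('2010 release'));
end.
```

Note that IsDigit is true for any Unicode decimal digit, not only '0'..'9', so it is not a drop-in replacement if the digits are passed on to StrToInt64Def afterwards.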

Related

Does Delphi handle automatic type conversion to/from AnsiString and ShortString? [duplicate]

I want to be able to use a string that is quite long (no longer than 100,000 characters).
As far as I know, a typical string variable can contain only up to 256 chars.
Is there a way to store such a long string?

Old-style (Turbo Pascal, or Delphi 1) strings, now known as ShortString, are limited to 255 characters (byte 0 was reserved for the string length). This appears to still be the default in FreePascal (according to @MarcovandeVoort's comment below). Keep reading, though, until you get to the discussion and code sample for AnsiString below. :-)
Currently, most other dialects of Pascal I'm aware of default to either AnsiString (long strings of single-byte characters) or UnicodeString (long strings of multi-byte characters). Neither of those is limited to 255 characters.
Current versions of Delphi use UnicodeString as the default string type, so a variable declared as string is in fact a long UnicodeString. There is no practical upper limit to the string length:
var
Test: string; // Declare a new Unicode string
begin
SetLength(Test, 100000); // Initialize it to hold 100000 characters
Test := StringOfChar('X', 100000); // Fill it with 100000 'X' characters
end;
If you want to force single-byte characters (but not be limited to 255-character strings), use AnsiString (which can be set as the default string type in FreePascal if you use the {$H+} compiler directive - thanks @MarcovandeVoort):
var
Test: AnsiString; // Declare a new Ansistring
begin
SetLength(Test, 100000); // Initialize it to hold 100000 characters
Test := StringOfChar('X', 100000); // Fill it with 100000 'X' characters
end;
Finally, if you do for some unknown reason want to use the old style ShortString that is restricted to 255 characters, declare it as such, either using ShortString or the old style String[Size] declaration:
var
Test: ShortString; // Declare a new short string of 255 characters
ShortTest: String[100]; // Also a ShortString of 100 characters
begin
// This line compiles, but the value is truncated to the 255 characters Test can hold
Test := StringOfChar('X', 100000); // Only the first 255 'X' characters are kept
end;

In Free Pascal, you do not need to worry about this. You only need to insert the directive {$H+} at the beginning of the source code.
{$H+}
var s: String;
begin
s := StringOfChar('X', 1000);
writeln(s);
end.

You can use the AnsiString type.

Delphi 7 printing in Turkish

I have some legacy code (1 million lines) written in Delphi 7 Pascal which for various reasons can't be upgraded to a more recent version of Delphi. The program outputs documents in about 30 languages and does a very good job of producing the various characters in all languages apart from Turkish. The code sets the charset to TURKISH_CHARSET (162). When it tries to print char #351 (ş, hex 15f), char #287 (ğ, hex 11f) or char #305 (ı, hex 131), it prints only "s", "g" or "i". It uses a simple
Printer.Canvas.TextOut(x, y, sText)
to output the text.
I tried compiling the code on different machines and running it on different versions of Windows but always with the same result.

In Delphi 7, string is an alias for AnsiString, which encodes Unicode characters as 8-bit bytes using Windows codepages. In some MBCS codepages, Unicode characters may require multiple bytes (Turkish is not one of them, though).
Microsoft has several codepages for Turkish:
857 (MS-DOS)
1254 (Windows)
10081 (Macintosh)
28599 (ISO-8859-9)
In both codepages 1254 and 28599 (where 1254 is the most likely one you will run into), the Unicode characters in question are encoded in 8-bit as hex $FE (ş), $F0 (ğ), and $FD (ı).
Make sure your sText string variable actually contains those byte values to begin with, and not ASCII bytes $73 (s), $67 (g), and $69 (i) instead. If it contains the latter, you are losing the Turkish data before it even reaches Canvas.TextOut(). That would be an issue earlier in your code.
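One way to check what sText really holds is to dump its bytes as hex before the call to TextOut. This is only a sketch: AnsiBytesToHex is a hypothetical debugging helper, and the sample string is built by hand here to stand in for your real data.

```pascal
program DumpBytesDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical debugging helper: renders each byte of an AnsiString
// as two hex digits, separated by spaces.
function AnsiBytesToHex(const S: AnsiString): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(S) do
  begin
    if i > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(S[i]), 2);
  end;
end;

var
  sText: AnsiString;
begin
  // Stand-in for the real data: the code page 1254 bytes for ş, ğ and ı.
  SetLength(sText, 3);
  sText[1] := AnsiChar($FE);
  sText[2] := AnsiChar($F0);
  sText[3] := AnsiChar($FD);
  Writeln(AnsiBytesToHex(sText)); // FE F0 FD
  Writeln(AnsiBytesToHex('sgi')); // 73 67 69
end.
```

If the dump of your own string shows 73, 67 and 69 where you expected FE, F0 and FD, the Turkish characters were lost before printing.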
However, if sText contains the correct bytes, then the problem has to be on the OS side, as TCanvas.TextOut() is just a thin wrapper for the Win32 API ExtTextOutA() function, where sText gets passed as-is to the API. Maybe the particular font you are using doesn't support Turkish, or at least those particular characters. Or maybe there is a problem with the printer driver. Either way, you might have to resort to converting your sText value to a WideString using MultiByteToWideChar() and then call ExtTextOutW() (not ExtTextOutA()) directly, e.g.:
var
  wText: WideString;
  size: TSize;
begin
  //Printer.Canvas.TextOut(x, y, sText);
  // First call returns the required length in WideChars; second call converts
  SetLength(wText, MultiByteToWideChar(1254{28599}, 0, PAnsiChar(sText), Length(sText), nil, 0));
  MultiByteToWideChar(1254{28599}, 0, PAnsiChar(sText), Length(sText), PWideChar(wText), Length(wText));
  Windows.ExtTextOutW(Printer.Canvas.Handle, x, y, Printer.Canvas.TextFlags, nil, PWideChar(wText), Length(wText), nil);
  // Advance the pen position by the text width, as TextOut() would
  size.cX := 0;
  size.cY := 0;
  Windows.GetTextExtentPoint32W(Printer.Canvas.Handle, PWideChar(wText), Length(wText), size);
  Printer.Canvas.MoveTo(x + size.cX, y);
end;

Delphi XE2 : #nn notation for extended Character

Please help me.
I know this may sound like a very simple question, but I just cannot figure out how to make it work. I have just started learning Unicode, so please give me a hint or some example code.
I am converting my old encoding and decoding code from Delphi 5 to Delphi XE2. When I call the "Char" function, it results in a different character; this seems to happen with the extended characters of any encoding set.
At Delphi 5 :
Char(129) -> will result as empty char
At Delphi XE2 :
Char(129) -> will result #$81
I tried using AnsiChar in Delphi XE2, and the result is:
AnsiChar(129) -> will result as #129
What code should I use in Delphi XE2 so that it returns an empty char too, not the #nn notation?
I need it to return the same result as Delphi 5, for backward compatibility.
Does this have something to do with the HIGHCHARUNICODE directive? I have read about it and tried it too, but still no luck.
Here is the code that I tried in Delphi XE2. It is a simplified version, but it has the same logic as my encode/decode code. The code gets the char and then puts it into an edit box.
procedure TForm1.Button1Click;
var
chars : Array[0..2] of AnsiChar;
ansi_string : AnsiString;
begin
chars[0] := AnsiChar(65);
chars[1] := AnsiChar(129);
chars[2] := AnsiChar(66);
ansi_string := chars;
// Here the ansi_string have a value of 'A'#$81'B'
EditBox1.Text := ansi_string;
// Here when i look the EditBox1.text in Evaluate/modify form,
// it shows 'A'#$0081'B'
// but at the form, it only show AB
end;
How can I make the ansi_string variable have a value of 'AB' instead of 'A'#$81'B'?
Thanks in advance,

In Delphi 5, Char is AnsiChar, but in XE2 Char is WideChar instead. 127 is the highest signed value that an AnsiChar can hold, so a value of 129, which is hex $81, binary 10000001, would simply be interpreted as -127, which is also hex $81, binary 10000001. They are just different interpretations of the same bit value.
Depending on what your encoding/decoding code is actually doing, you will need to either use AnsiChar/AnsiString explicitly instead of Char/String generically, or switch to using Byte values, or else rewrite the code to support Unicode properly and not make assumptions about the size of Char anymore. Hard to say since you did not show your actual code. But you should be OK with just using AnsiChar/AnsiString, since they do operate the same way they always have (the debugger may simply be displaying AnsiChar values differently, that's all).
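To illustrate the "switch to Byte values" option, here is a small sketch that copies the ordinal values of the three AnsiChars from the question into a Byte array. The array name raw is an assumption for illustration; the point is only that the stored byte for AnsiChar(129) is $81 regardless of how the debugger or an edit control renders it.

```pascal
program AnsiCharDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  chars: array[0..2] of AnsiChar;
  raw: array[0..2] of Byte; // illustrative name
  i: Integer;
begin
  chars[0] := AnsiChar(65);
  chars[1] := AnsiChar(129);
  chars[2] := AnsiChar(66);

  // Copy the ordinal values out as plain bytes; no code page is involved.
  for i := 0 to 2 do
    raw[i] := Ord(chars[i]);

  // Prints 41, 81 and 42, one per line.
  for i := 0 to 2 do
    Writeln(IntToHex(raw[i], 2));
end.
```

Encoding and decoding routines that work on such bytes behave the same in Delphi 5 and XE2, because nothing is converted to or from UnicodeString along the way.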

How to get the first element in a string?

I'm trying to figure out a way to check whether a string's first element is a number or not.
if not(myString[0] in [0..9]) then //Do something
The problem is that I get the error "Element 0 inaccessible - use 'Length' or 'SetLength'".
Another way came to mind from my C-like experience: convert the first element of the string to a char and check the char, but I get the same compile error.
if not(char(myString[0]) in [0..9]) then //Do something
How do I accomplish it?

Strings are 1-based:
if not (myString[1] in ['0'..'9']) then // Do something

Pascal and Delphi index strings from 1. This is a legacy from the time when the zero byte contained the length, while the next 255 bytes (index 1 to 255) contained the actual characters.
Joel Spolsky wrote quite a good article on string issues:
http://www.joelonsoftware.com/articles/fog0000000319.html

Delphi strings use a 1-based index, so just rewrite to
if not(myString[1] in ['0'..'9']) then //Do something
Also take note of the quotes around the 0..9, otherwise you would be comparing characters to integers.
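The one-liners above will still fail on an empty string, since there is no myString[1] to read. A minimal sketch of a guarded version follows; StartsWithDigit is a hypothetical helper name, and it assumes 1-based string indexing and the default short-circuit Boolean evaluation.

```pascal
program FirstCharDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper: True when S is non-empty and its first character
// is '0'..'9'. Assumes 1-based string indexing.
function StartsWithDigit(const S: string): Boolean;
begin
  // Length is tested first, so S[1] is never read on an empty string.
  Result := (Length(S) > 0) and (S[1] >= '0') and (S[1] <= '9');
end;

begin
  Writeln(StartsWithDigit('7 days')); // first character is a digit
  Writeln(StartsWithDigit('days'));   // first character is a letter
  Writeln(StartsWithDigit(''));       // empty string, no range error
end.
```

Using plain comparisons instead of a set also avoids the WideChar set warning in Unicode versions of Delphi.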

We should keep in mind some things:
Strings in Delphi are 0-based on mobile platforms and 1-based on Windows.
string in old versions of Delphi is AnsiString (1 byte per char) and UnicodeString in new versions (2 bytes per char).
Delphi supports set of AnsiChar, but doesn't support set of WideChar.
So if we want to write a code compatible with all versions of Delphi, then it should be something like this:
if (myString[Low(myString)]>='0') and (myString[Low(myString)]<='9') then
// Do something

if not(myString[0] in [0..9]) then //Do something
If you're using Delphi 2009, the TCharacter class in Character.pas has functions like IsDigit to help simplify these kinds of operations.
Once you fix the indexing, of course. :)

With later updates to Delphi mobile code, the bottom string index changed from 0 to 1. When you compile older programmes, they compile and run correctly using 0 starting index. Programmes created with the later IDE produce an error. When you have mixtures, life gets complex!
It would be good to be able to take an older programme and tell the IDE that you want it brought up to date (maybe this would fix other things, like fonts getting scrambled when you answer a phone call!) but it would be good to get things consistent!

The simplest way to check to see if the first character of string is an integer, and then dispatch:
var
iResult : integer;
begin
if TryStrToInt( mySTring[1], iResult) then
begin
// handle number logic here iResult = number
end
else
begin
// handle non number logic here
end;
end;

I use a utility function to test the entire string:
function IsNumeric(const Value: string): Boolean;
var
i: Integer;
begin
Result := True;
for i := 1 to Length(Value) do
if not (Value[i] in ['0'..'9','.','+','-']) then
begin
Result := False;
Break;
end;
end;
The above code is for Delphi versions prior to 2007. In 2007 and 2009, you could change the integer variable i to a character c, and use for c in Value instead.
To test for integers only, remove the '.' from the set of characters to test against.
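A character-set test like this accepts anything made of the allowed characters, including strings such as '4-2' or '...'. If the goal is to know whether the text parses as a number, one alternative is to let the RTL decide. A minimal sketch, where IsInteger is a hypothetical name and TryStrToInt64 comes from SysUtils:

```pascal
program NumericCheckDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical alternative to IsNumeric: asks the RTL parser instead of
// checking characters one by one.
function IsInteger(const Value: string): Boolean;
var
  Dummy: Int64; // parsed value, ignored here
begin
  Result := TryStrToInt64(Value, Dummy);
end;

begin
  Writeln(IsInteger('-42')); // well-formed integer
  Writeln(IsInteger('4-2')); // passes a character-set test, fails parsing
end.
```

Be aware that the RTL parser has its own rules, for example it also accepts hexadecimal input with a $ prefix, so pick whichever check matches what you mean by "numeric". TryStrToFloat can be used the same way for decimal values.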

This is incorrect. ISO strings and older Pascals also started at one. It is just a general convention, and afaik the s[0] thing is a result of that position being vacant and cheap to code in the UCSD bytecode interpreter. But that last bit is before my time, so it is only my guess.
It results from Pascal's ability to have arbitrary upper and lower bounds, which provides more type safety when accessing arrays.
Really old Pascal strings (until the early eighties) were even worse than C ones, btw. Multiple conventions were in use, but all were based on static arrays (like early C), and they were typically space padded, so you had to scan back from the end until the spaces ended.
(removed the legacy tag, since being 1-based is not legacy. Accessing s[0] as length IS legacy, but that is not what the question is about)

for i := 1 to Length(strName) do
  if not (strName[i] in ['0'..'9']) then
  begin
    // do something
  end
  else
  begin
    // strName[i] is a digit
  end;
Don't forget the quotes around the digits.

MD5 Hashing in Delphi 2009

In Borland Delphi 7 and even in Delphi 2007 everything worked, but in Delphi 2009 it just returns the wrong hash!
I use the wcrypt2 script (http://pastebin.com/m2f015cfd)
Just have a look:
string : "123456"
hash:
Delphi 7 : "e10adc3949ba59abbe56e057f20f883e" - real hash.
Delphi 2007 : "e10adc3949ba59abbe56e057f20f883e" - real hash too.
And...
Delphi 2009 : "5fa285e1bebe0a6623e33afc04a1fbd5" - WTF??
I've tried a lot of MD5 scripts, but Delphi 2009 does the same with all of them. Any help? Thanks.

Your library is not Unicode aware. Just passing it an AnsiString won't be enough because it probably uses strings internally to store data.
You could try to update that library, wait for the author to update it, or just use the MessageDigest_5.pas that ships with Delphi 2009. It is in the source\Win32\soap\wsdlimporter folder, which you will need to either add to your path, or explicitly include it in your project.
Here is some sample code using it in Delphi 2009:
uses Types, MessageDigest_5;
procedure TForm16.Edit1Change(Sender: TObject);
var
MD5: IMD5;
begin
MD5 := GetMD5;
MD5.Init;
MD5.Update(TByteDynArray(RawByteString(Edit1.Text)), Length(Edit1.Text));
Edit2.Text := LowerCase(MD5.AsString);
end;
And you are in business:
MD5(123456) = e10adc3949ba59abbe56e057f20f883e
You could wrap it in a simple function call if you wanted to. It is important you cast to a RawByteString before casting to a TByteDynArray since the RawByteString cast drops all the extra Unicode characters. Granted if the edit contains Unicode characters then you could end up with bad data.
Keep in mind that GetMD5 is returning an interface, so it is reference counted, etc.
Merry Christmas!

Before someone can comment on hashing algorithms, it helps if they have at least a fundamental understanding of the underlying concepts and principles. All of the responses so far which have focused on endless typecasting are completely overkill, but even worse, will result in unreliable results if a unicode string is being hashed.
The first thing you need to understand is that hashing and encryption algorithms operate at the byte-level. That means they don't care what you're hashing or encrypting. You can hash integers, chars, plain ASCII, full unicode, bytes, longwords, etc etc. The algorithm doesn't care.
When working with strings, the ONLY thing you have to ensure is that the internal function of your hashing library returns an AnsiString in the function which spits out your resulting hash. That's it. That's all that matters.
Your actual code for YOUR project can (and should) be based on normal string input, which maps to unicodestring in Delphi 2009. You shouldn't be typecasting anything to ansistring or rawbytestring. By doing so, you immediately create a broken hash if and when the user tries to hash anything outside the scope of the ANSI character set. And in the world of hashing, a broken hash is both unreliable AND insecure.
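To make the byte-level point concrete, here is a minimal sketch showing that the same six characters become different byte sequences depending on the encoding chosen, which is enough to change any hash. It assumes Delphi 2009 or later, where TEncoding and TBytes are declared in SysUtils; the variable names are illustrative.

```pascal
program HashInputDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils; // TEncoding and TBytes (Delphi 2009 and later)

var
  Source: string;
  Utf8Bytes, Utf16Bytes: TBytes;
begin
  Source := '123456';
  Utf8Bytes := TEncoding.UTF8.GetBytes(Source);     // one byte per ASCII char
  Utf16Bytes := TEncoding.Unicode.GetBytes(Source); // two bytes per char
  Writeln(Length(Utf8Bytes));  // 6
  Writeln(Length(Utf16Bytes)); // 12
end.
```

A library written for Delphi 7 expects the 6-byte form. Whichever encoding you settle on, it has to be chosen explicitly and used consistently, otherwise hashes computed in different places will not match.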

Have you checked that your library has been correctly updated for D2009 and unicodification?
I kinda doubt the same code would work in D7/D2007 and D2009 for this sort of thing.

It is obvious that your lib is not Unicode enabled.
Convert your string to AnsiString, RawByteString or UTF8String by declaring a temporary AnsiString and assigning your Unicode string to it.
Note that if you are using Unicode-specific chars that can't be translated to a single codepage, you should convert your string to UTF-8.
Then call MD5(PAnsiChar(YourTempString)).
Check whether your lib has PWideChar or UNICODE declarations; if it does, you can skip this.

If you have wcrypt2.pas, use this function.
function md5ansi(const Input: AnsiString): AnsiString;
var
  hCryptProvider: HCRYPTPROV;
  hHash: HCRYPTHASH;
  bHash: array[0..$7f] of Byte;
  dwHashBytes: Cardinal;
  pbContent: PByte;
  i: Integer;
begin
  dwHashBytes := 16; // an MD5 digest is 16 bytes
  pbContent := Pointer(PAnsiChar(Input));
  Result := '';
  if CryptAcquireContext(@hCryptProvider, nil, nil, PROV_RSA_FULL, CRYPT_VERIFYCONTEXT or CRYPT_MACHINE_KEYSET) then
  begin
    if CryptCreateHash(hCryptProvider, CALG_MD5, 0, 0, @hHash) then
    begin
      if CryptHashData(hHash, pbContent, Length(Input) * sizeof(AnsiChar), 0) then
      begin
        if CryptGetHashParam(hHash, HP_HASHVAL, @bHash[0], @dwHashBytes, 0) then
        begin
          // Append each digest byte as two hex digits
          for i := 0 to dwHashBytes - 1 do
          begin
            Result := Result + AnsiString(Format('%.2x', [bHash[i]]));
          end;
        end;
      end;
      CryptDestroyHash(hHash);
    end;
    CryptReleaseContext(hCryptProvider, 0);
  end;
  Result := AnsiString(AnsiLowerCase(String(Result)));
end;

Are you perchance casting a generic string (which in Delphi 2009 is a UnicodeString) to a PAnsiChar and passing that into the hash function? That will not work. You first must cast the string into an AnsiString and then cast that one to PAnsiChar, a la:
PAnsiChar(AnsiString('123456'))
Also, try using RawByteString instead of AnsiString like dmajkic suggested. Avoid UTF8String since that's not an AnsiString and any characters outside the ASCII range (0..127) might get reinterpreted into multibyte characters.

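In Jim's answer, if we change
MD5.Update(TByteDynArray(RawByteString(Edit1.Text)), Length(Edit1.Text));
to
MD5.Update(TByteDynArray(RawByteString(Edit1.Text)), Length(RawByteString(Edit1.Text)));
it will work better when Chinese characters are present, because the length passed is then the byte count of the converted string rather than the character count of the original.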
