I am using Delphi 10.3.2
I am trying to use the DCPCrypt units with the firemonkey framework to encrypt a string.
It works 100% on Win32, Win64 and macOS 32 targets, the result is always the same. But when I compile for macOS 64, the result is different.
This is the code used:
function EncodeAES(code:ansistring; key:ansistring):string;
var
s,u:ansistring;
enc: TEncoding;
k,iv, Data, Crypt: TBytes;
Cipher: TDCP_rijndael;
begin
u:='';
enc:=TEncoding.ANSI;
Data := enc.GetBytes(code); // VMpuXJGbUNOv
k:=enc.GetBytes(key); // kj3214ed)k32nre2
iv:=enc.GetBytes(u);
Cipher:=TDCP_rijndael.Create(nil);
Cipher.Init(K[0], 128, @iv[0]);
Crypt:=Copy(Data, 0, Length(Data));
BytePadding(Crypt, Cipher.BlockSize, pmPKCS7);
Cipher.EncryptECB(Crypt[0], Crypt[0]);
Cipher.Free;
s:=tencoding.ANSI.GetString(crypt);
result:=StringToHex(s);
end;
and this is the BytePadding function:
procedure BytePadding(var Data: TBytes; BlockSize: integer; PaddingMode: TPaddingMode);
var
I, DataBlocks, DataLength, PaddingStart, PaddingCount: integer;
begin
BlockSize := BlockSize div 8;
if PaddingMode in [pmZeroPadding, pmRandomPadding] then
if Length(Data) mod BlockSize = 0 then Exit;
DataBlocks := (Length(Data) div BlockSize) + 1;
DataLength := DataBlocks * BlockSize;
PaddingCount := DataLength - Length(Data);
if PaddingMode in [pmANSIX923, pmISO10126, pmPKCS7] then
if PaddingCount > $FF then Exit;
PaddingStart := Length(Data);
SetLength(Data, DataLength);
case PaddingMode of
pmZeroPadding, pmANSIX923, pmISO7816: // fill with $00 bytes
FillChar(Data[PaddingStart], PaddingCount, 0);
pmPKCS7: // fill with PaddingCount bytes
FillChar(Data[PaddingStart], PaddingCount, PaddingCount);
pmRandomPadding, pmISO10126: // fill with random bytes
for I := PaddingStart to DataLength-1 do Data[I] := Random($FF);
end;
case PaddingMode of
pmANSIX923, pmISO10126:
Data[DataLength-1] := PaddingCount; // set end-marker with number of bytes added
pmISO7816:
Data[PaddingStart] := $80; // set fixed end-marker $80
end;
end;
I call the function like:
procedure TForm1.BtnClick(Sender: TObject);
var
s:string;
begin
s:=EncodeAES('VMpuXJGbUNOv','kj3214ed)k32nre2');
end;
Result with Win32/Win64/macOS 32 (correct):
E32A9DE47CC60BDB70CA27885128D17A
Result with macOS 64 (wrong):
CF622155545E485AC3A083E8A0478493
What am I doing wrong?
Encryption operates on binary data, not textual data. When dealing with text, you have to encode characters to bytes before encryption, and then decode bytes to characters after decryption. Which means using the same character encoding before encryption and after decryption. You are attempting to do that encoding, but you are not accounting for the fact that TEncoding.ANSI is NOT portable across OS platforms, or even across different machines using the same OS platform. For better portability, you need to use a consistent encoding, such as TEncoding.UTF8.
Also, TEncoding operates only with UnicodeString, so using TEncoding.GetBytes() and TEncoding.GetString() with AnsiString will perform implicit conversions between ANSI and Unicode using the RTL's definition of ANSI, not yours, producing bytes you are likely not expecting if your strings contain any non-ASCII characters in them.
Your EncodeAES() function is best off using (Unicode)String for all of its string handling and forget that AnsiString even exists. Although platforms like Linux are largely UTF-8 systems, Delphi's default (Unicode)string uses UTF-16 on all platforms. If you want to encode ANSI strings, use RawByteString to avoid any implicit conversions if calling code wants to use UTF8String, AnsiString(N), etc. Encrypt the 8bit characters as-is without using TEncoding at all. Use TEncoding only for UnicodeString data.
Lastly, your EncodeAES() function should not be decoding encrypted bytes to a UnicodeString just to convert that to a hex string. You should be hex-encoding the encrypted bytes as-is.
Try this instead:
function EncodeAES(const Data: TBytes; const Key: string): string; overload;
const
Hex: array[0..15] of Char = ('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F');
var
enc: TEncoding;
iv, k, Crypt: TBytes;
Cipher: TDCP_rijndael;
B: Byte;
I: Integer;
begin
enc := TEncoding.UTF8;
iv := enc.GetBytes('');
k := enc.GetBytes(Key);
Cipher := TDCP_rijndael.Create(nil);
try
Cipher.Init(k[0], 128, @iv[0]);
Crypt := Copy(Data, 0, Length(Data));
BytePadding(Crypt, Cipher.BlockSize, pmPKCS7);
Cipher.EncryptECB(Crypt[0], Crypt[0]);
finally
Cipher.Free;
end;
SetLength(Result, Length(Crypt)*2);
I := Low(Result);
for B in Crypt do
begin
Result[ I ] := Hex[(B shr 4) and $F];
Result[I+1] := Hex[B and $F];
Inc(I, 2);
end;
end;
function EncodeAES(const S, Key: string): string; overload;
begin
Result := EncodeAES(TEncoding.UTF8.GetBytes(S), Key);
end;
function EncodeAES(const S: RawByteString; const Key: string): string; overload;
begin
Result := EncodeAES(BytesOf(S), Key);
end;
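The point about not decoding encrypted bytes into a string can be checked with a small console sketch. It is not part of the original answer: it pushes every byte value through TEncoding.ANSI and back, as a stand-in for cipher output, and reports whether the data survived. The result depends on the platform's ANSI code page, so no particular output is assumed here.

```pascal
program AnsiRoundTrip;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Src, Back: TBytes;
  I: Integer;
begin
  // Stand-in for cipher output: every possible byte value once.
  SetLength(Src, 256);
  for I := 0 to 255 do
    Src[I] := Byte(I);
  try
    // The risky step from the question: bytes -> UnicodeString -> bytes.
    Back := TEncoding.ANSI.GetBytes(TEncoding.ANSI.GetString(Src));
    if (Length(Back) = Length(Src)) and
       CompareMem(@Src[0], @Back[0], Length(Src)) then
      Writeln('round trip preserved all bytes on this platform')
    else
      Writeln('round trip changed the data: ', Length(Src), ' bytes in, ',
        Length(Back), ' bytes out');
  except
    on E: Exception do
      Writeln('decoding failed: ', E.Message);
  end;
end.
```

Running the same program on each target is a quick way to see whether a string round trip is safe there. Hex-encoding the bytes directly avoids the question entirely.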
I have a tagged Arabic string and I want to encode this string by using Base64 encoding. Everything runs perfect when using English letters for this string, but when using the Arabic letters, the QR reader doesn't display the correct letters.
Here is my code :
function TForm1.GetMyString(TagNo: Integer; TagValue: string): string;
var
Bytes, StrByte: TBytes;
i: Integer;
begin
SetLength(StrByte, Length(TagValue)+2);
StrByte[0] := Byte(TagNo);
StrByte[1] := Byte(Length(TagValue));
for i := 2 to Length(StrByte)-1 do
StrByte[i] := Byte(TagValue[i-1]);
Result := TEncoding.UTF8.GetString(StrByte);
end;
procedure TForm1.Button1Click(Sender: TObject);
var
s: String;
Bytes: TBytes;
begin
s := GetMyString(1, Edit1.Text) + GetMyString(2, Edit2.Text) +
GetMyString(3, Edit3.Text) + GetMyString(4, Edit4.Text) +
GetMyString(5, Edit5.Text);
bytes := TEncoding.UTF8.GetBytes(s);
QREdit.Text := TNetEncoding.Base64.EncodeBytesToString(Bytes);
end;
After decoding the Base64 string, it also shows the same QR reading result
eg. (E$33) 'D9E1'F) instead of (مؤسسة العمران)
I am using ZXingQR to read the generated string.
GetMyString() is truncating a series of UTF-16 characters into an array of 8bit bytes, as well as putting other non-textual bytes into the array, and then treating the whole array as if it were UTF-8 (which it is not) to produce a new UTF-16 string.
And then Button1Click() is taking those jacked-up UTF-16 strings, concatenating them together, and converting the result to UTF-8 for encoding to base64.
This approach will only work with ASCII strings whose lengths are less than 128 characters, and tags that are below 128 in value, since ASCII bytes in the range 0..127 are a subset of UTF-8. This will NOT work with non-ASCII characters/bytes outside of this range.
It seems that you want to base64 encode a series of tagged UTF-8 strings. If so, then try something more like this instead:
procedure TForm1.GetMyString(TagNo: UInt8; const TagValue: string; Output: TStream);
var
Bytes: TBytes;
begin
Bytes := TEncoding.UTF8.GetBytes(TagValue);
Assert(Length(Bytes) < 256);
Output.WriteData(TagNo);
Output.WriteData(UInt8(Length(Bytes)));
Output.WriteData(Bytes, Length(Bytes));
end;
procedure TForm1.Button1Click(Sender: TObject);
var
Stream: TMemoryStream;
begin
Stream := TMemoryStream.Create;
try
GetMyString(1, Edit1.Text, Stream);
GetMyString(2, Edit2.Text, Stream);
GetMyString(3, Edit3.Text, Stream);
GetMyString(4, Edit4.Text, Stream);
GetMyString(5, Edit5.Text, Stream);
QREdit.Text := TNetEncoding.Base64.EncodeBytesToString(Stream.Memory, Stream.Size);
finally
Stream.Free;
end;
end;
Alternatively:
function TForm1.GetMyString(TagNo: UInt8; const TagValue: string): TBytes;
var
Len: Integer;
begin
Len := TEncoding.UTF8.GetByteCount(TagValue);
Assert(Len < 256);
SetLength(Result, 2+Len);
Result[0] := Byte(TagNo);
Result[1] := Byte(Len);
TEncoding.UTF8.GetBytes(TagValue, 1, Length(TagValue), Result, 2);
end;
procedure TForm1.Button1Click(Sender: TObject);
var
Bytes: TBytes;
begin
Bytes := Concat(
GetMyString(1, Edit1.Text),
GetMyString(2, Edit2.Text),
GetMyString(3, Edit3.Text),
GetMyString(4, Edit4.Text),
GetMyString(5, Edit5.Text)
);
QREdit.Text := TNetEncoding.Base64.EncodeBytesToString(Bytes);
end;
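To check the output independently of the QR reader, the base64 text can be decoded and walked tag by tag. The following is a minimal sketch, assuming the tag/length/value layout produced above; DumpTaggedBase64 is a hypothetical helper name, not something from the original code.

```pascal
uses
  System.SysUtils, System.Classes, System.NetEncoding;

// Hypothetical helper: decodes the base64 text and adds one 'tag=value'
// line per entry, decoding each value as UTF-8.
procedure DumpTaggedBase64(const Base64Text: string; Output: TStrings);
var
  Bytes: TBytes;
  Idx, Tag, Len: Integer;
  Value: string;
begin
  Bytes := TNetEncoding.Base64.DecodeStringToBytes(Base64Text);
  Idx := 0;
  while Idx + 2 <= Length(Bytes) do
  begin
    Tag := Bytes[Idx];      // first byte: tag number
    Len := Bytes[Idx + 1];  // second byte: value length in bytes
    Inc(Idx, 2);
    if Idx + Len > Length(Bytes) then
      raise Exception.Create('Truncated value for tag ' + IntToStr(Tag));
    if Len > 0 then
      Value := TEncoding.UTF8.GetString(Bytes, Idx, Len)
    else
      Value := '';
    Output.Add(IntToStr(Tag) + '=' + Value);
    Inc(Idx, Len);
  end;
end;
```

For example, `DumpTaggedBase64(QREdit.Text, Memo1.Lines)` (assuming a TMemo named Memo1 on the form) should list the Arabic text intact if the encoding side is correct.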
First I get a TMemoryStream from an HTTP request, which contains the body of the response.
Then I load it in a TStringList and save the text in a widestring (also tried with ansistring).
The problem is that I need to convert the string because the users language is spanish, so vowels with accent marks are very common and I need to store the info.
lServerResponse := TStringList.Create;
lServerResponse.LoadFromStream(lResponseMemoryStream);
lStringResponse := lServerResponse.Text;
lDecodedResponse := Utf8Decode(lStringResponse );
If the response (a part of it) is "Hólá Múndó", lStringResponse value will be "Hólá Múndó", and lDecodedResponse will be "Hólá Múndó".
But if the user adds any emoji (lStringResponse value will be "Hólá Múndó 😀" if the emoji is 😀) Utf8Decode fails and returns an empty string.
Is there a way to get just the ANSI characters from a string (or MemoryStream)?, or removing whatever Utf8Decode can't convert?
Thanks for your time.
TMemoryStream is just raw bytes. There is no reason to load that stream into a TStringList just to extract a (Wide|Ansi)String from it. You can assign the bytes directly to an AnsiString/UTF8String using SetString() instead, eg:
var
lStringResponse: UTF8String;
lDecodedResponse: WideString;
begin
SetString(lStringResponse, PAnsiChar(lResponseMemoryStream.Memory), lResponseMemoryStream.Size);
lDecodedResponse := UTF8Decode(lStringResponse);
end;
Just make sure the HTTP content really is encoded as UTF-8, or else this approach will not work.
That being said - UTF8Decode() (and UTF8Encode()) in Delphi 7 DO NOT support Unicode codepoints above U+FFFF, which means they DO NOT support Emojis at all. That was fixed in Delphi 2009.
To work around that issue in earlier versions, you can use the Win32 API MultiByteToWideChar() function instead, eg:
uses
..., Windows;
function My_UTF8Decode(const S: UTF8String): WideString;
var
WLen: Integer;
begin
WLen := MultiByteToWideChar(CP_UTF8, 0, PAnsiChar(S), Length(S), nil, 0);
if WLen > 0 then
begin
SetLength(Result, WLen);
MultiByteToWideChar(CP_UTF8, 0, PAnsiChar(S), Length(S), PWideChar(Result), WLen);
end else
Result := '';
end;
var
lStringResponse: UTF8String;
lDecodedResponse: WideString;
begin
SetString(lStringResponse, PAnsiChar(lResponseMemoryStream.Memory), lResponseMemoryStream.Size);
lDecodedResponse := My_UTF8Decode(lStringResponse);
end;
Alternatively:
uses
..., Windows;
function My_UTF8Decode(const S: PAnsiChar; const SLen: Integer): WideString;
var
WLen: Integer;
begin
WLen := MultiByteToWideChar(CP_UTF8, 0, S, SLen, nil, 0);
if WLen > 0 then
begin
SetLength(Result, WLen);
MultiByteToWideChar(CP_UTF8, 0, S, SLen, PWideChar(Result), WLen);
end else
Result := '';
end;
var
lDecodedResponse: WideString;
begin
lDecodedResponse := My_UTF8Decode(PAnsiChar(lResponseMemoryStream.Memory), lResponseMemoryStream.Size);
end;
Or, use a 3rd party Unicode conversion library, like ICU or libiconv, which handle this for you.
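If simply dropping the unsupported characters is acceptable, as the question asks, the 4-byte UTF-8 sequences can be removed before calling UTF8Decode(). This is a minimal sketch, assuming the input is well-formed UTF-8; StripFourByteUTF8 is a hypothetical helper name.

```pascal
// Hypothetical helper: removes every 4-byte UTF-8 sequence (code points
// above U+FFFF, which includes most emoji) and copies all other bytes
// unchanged. Assumes well-formed UTF-8 input.
function StripFourByteUTF8(const S: UTF8String): UTF8String;
var
  I, J, N: Integer;
begin
  N := Length(S);
  SetLength(Result, N);
  I := 1;
  J := 0;
  while I <= N do
  begin
    if (Ord(S[I]) and $F8) = $F0 then
      Inc(I, 4) // lead byte 11110xxx: skip it and its three continuation bytes
    else
    begin
      Inc(J);
      Result[J] := S[I];
      Inc(I);
    end;
  end;
  SetLength(Result, J);
end;
```

It would be used as `lDecodedResponse := UTF8Decode(StripFourByteUTF8(lStringResponse));`. Accented Spanish vowels are 2-byte sequences, so they pass through untouched.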
When I create a file in Notepad containing (for example) the string 1d and save it as a Unicode file, I get a 6-byte file containing the bytes #255#254#49#0#100#0.
OK. Now I need a Delphi 6 function which takes (for example) the widestring 1d as input and returns the string containing #255#254#49#0#100#0 (and vice versa).
How?
Thanks.
D
It is easier to read bytes if you use hex. #255#254#49#0#100#0 is represented in hex as
FF FE 31 00 64 00
Where
FF FE is the UTF-16LE BOM, which identifies the following bytes as being encoded as UTF-16 using values in Little Endian.
31 00 is the ASCII character '1'
64 00 is the ASCII character 'd'.
To create a WideString containing these bytes is very easy:
var
W: WideString;
S: String;
begin
S := '1d';
W := WideChar($FEFF) + S;
end;
When an AnsiString (which is Delphi 6's default string type) is assigned to a WideString, the RTL automatically converts the AnsiString data from 8-bit to UTF-16LE using the local machine's default Ansi charset for the conversion.
Going the other way is just as easy:
var
W: WideString;
S: String;
begin
W := WideChar($FEFF) + '1d';
S := Copy(W, 2, MaxInt);
end;
When you assign a WideString to an AnsiString, the RTL automatically converts the WideString data from UTF-16LE to 8-bit using the default Ansi charset.
If the default Ansi charset is not suitable for your needs (say the 8-bit data needs to be encoded in a different charset), you will have to use the Win32 API MultiByteToWideChar() and WideCharToMultiByte() functions directly (or 3rd party library with equivalent functionality) so you can specify the desired charset/codepage as needed.
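A minimal sketch of such explicit-codepage conversions is shown here. CodePageToWide and WideToCodePage are hypothetical helper names, not RTL functions; each one calls the API twice, first to get the required length and then to convert.

```pascal
uses
  Windows;

// Hypothetical helper: 8-bit data in the given code page -> UTF-16.
function CodePageToWide(const S: AnsiString; CodePage: UINT): WideString;
var
  Len: Integer;
begin
  Result := '';
  if S = '' then Exit;
  // First call: ask how many WideChars the result needs.
  Len := MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S), nil, 0);
  if Len <= 0 then Exit;
  SetLength(Result, Len);
  MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S), PWideChar(Result), Len);
end;

// Hypothetical helper: UTF-16 -> 8-bit data in the given code page.
function WideToCodePage(const W: WideString; CodePage: UINT): AnsiString;
var
  Len: Integer;
begin
  Result := '';
  if W = '' then Exit;
  // First call: ask how many bytes the result needs.
  Len := WideCharToMultiByte(CodePage, 0, PWideChar(W), Length(W), nil, 0, nil, nil);
  if Len <= 0 then Exit;
  SetLength(Result, Len);
  WideCharToMultiByte(CodePage, 0, PWideChar(W), Length(W), PAnsiChar(Result), Len, nil, nil);
end;
```

For example, `W := CodePageToWide(S, 1252);` treats S as Windows-1252 regardless of the machine's default charset.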
Now then, Delphi 6 does not offer any useful helpers to read Unicode files (Delphi 2009 and later do), so you will have to do it yourself manually, for example:
function ReadUnicodeFile(const FileName: string): WideString;
const
cBOM_UTF8: array[0..2] of Byte = ($EF, $BB, $BF);
cBOM_UTF16BE: array[0..1] of Byte = ($FE, $FF);
cBOM_UTF16LE: array[0..1] of Byte = ($FF, $FE);
cBOM_UTF32BE: array[0..3] of Byte = ($00, $00, $FE, $FF);
cBOM_UTF32LE: array[0..3] of Byte = ($FF, $FE, $00, $00);
var
FS: TFileStream;
BOM: array[0..3] of Byte;
NumRead: Integer;
U8: UTF8String;
U32: UCS4String;
I: Integer;
begin
Result := '';
FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
try
NumRead := FS.Read(BOM, 4);
// UTF-8
if (NumRead >= 3) and CompareMem(@BOM, @cBOM_UTF8, 3) then
begin
if NumRead > 3 then
FS.Seek(-(NumRead-3), soCurrent);
SetLength(U8, FS.Size - FS.Position);
if Length(U8) > 0 then
begin
FS.ReadBuffer(PAnsiChar(U8)^, Length(U8));
Result := UTF8Decode(U8);
end;
end
// the UTF-16LE and UTF-32LE BOMs are ambiguous! Check for UTF-32 first...
// UTF-32
else if (NumRead = 4) and (CompareMem(@BOM, @cBOM_UTF32LE, 4) or CompareMem(@BOM, @cBOM_UTF32BE, 4)) then
begin
// UCS4String is not a true string type, it is a dynamic array, so
// it must include room for a null terminator...
SetLength(U32, ((FS.Size - FS.Position) div SizeOf(UCS4Char)) + 1);
if Length(U32) > 1 then
begin
FS.ReadBuffer(PUCS4Chars(U32)^, (Length(U32) - 1) * SizeOf(UCS4Char));
if CompareMem(@BOM, @cBOM_UTF32BE, 4) then
begin
for I := Low(U32) to High(U32) do
begin
U32[I] := ((U32[I] and $000000FF) shl 24) or
((U32[I] and $0000FF00) shl 8) or
((U32[I] and $00FF0000) shr 8) or
((U32[I] and $FF000000) shr 24);
end;
end;
U32[High(U32)] := 0;
// Note: UCS4StringToWidestring() does not actually support UTF-16,
// only UCS-2! If you need to handle UTF-16 surrogates, you will
// have to convert from UTF-32 to UTF-16 manually, there is no RTL
// or Win32 function that will do it for you...
Result := UCS4StringToWidestring(U32);
end;
end
// UTF-16
else if (NumRead >= 2) and (CompareMem(@BOM, @cBOM_UTF16LE, 2) or CompareMem(@BOM, @cBOM_UTF16BE, 2)) then
begin
if NumRead > 2 then
FS.Seek(-(NumRead-2), soCurrent);
SetLength(Result, (FS.Size - FS.Position) div SizeOf(WideChar));
if Length(Result) > 0 then
begin
FS.ReadBuffer(PWideChar(Result)^, Length(Result) * SizeOf(WideChar));
if CompareMem(@BOM, @cBOM_UTF16BE, 2) then
begin
for I := 1 to Length(Result) do
begin
Result[I] := WideChar(
((Word(Result[I]) and $00FF) shl 8) or
((Word(Result[I]) and $FF00) shr 8)
);
end;
end;
end;
end
// something else, assuming UTF-8
else
begin
if NumRead > 0 then
FS.Seek(-NumRead, soCurrent);
SetLength(U8, FS.Size - FS.Position);
if Length(U8) > 0 then
begin
FS.ReadBuffer(PAnsiChar(U8)^, Length(U8));
Result := UTF8Decode(U8);
end;
end;
finally
FS.Free;
end;
end;
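The question also asks for the opposite direction. A minimal sketch of a writer that produces the same layout as Notepad's "Unicode" format (UTF-16LE BOM followed by the raw WideString data) could look like this; WriteUnicodeFile is a hypothetical name chosen to mirror ReadUnicodeFile.

```pascal
uses
  SysUtils, Classes;

// Hypothetical counterpart to ReadUnicodeFile: writes the UTF-16LE BOM
// followed by the WideString's characters, two bytes each.
procedure WriteUnicodeFile(const FileName: string; const Text: WideString);
const
  cBOM_UTF16LE: array[0..1] of Byte = ($FF, $FE);
var
  FS: TFileStream;
begin
  FS := TFileStream.Create(FileName, fmCreate);
  try
    FS.WriteBuffer(cBOM_UTF16LE, SizeOf(cBOM_UTF16LE));
    if Length(Text) > 0 then
      FS.WriteBuffer(PWideChar(Text)^, Length(Text) * SizeOf(WideChar));
  finally
    FS.Free;
  end;
end;
```

Calling `WriteUnicodeFile('test.txt', '1d')` should produce the 6 bytes FF FE 31 00 64 00 described above.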
Update: if you want to store UTF-16LE encoded bytes inside of an AnsiString variable (why?), then you can Move() the raw bytes of a WideString's character data into the memory block of an AnsiString, eg:
function WideStringAsAnsi(const AValue: WideString): AnsiString;
begin
SetLength(Result, Length(AValue) * SizeOf(WideChar));
Move(PWideChar(AValue)^, PAnsiChar(Result)^, Length(Result));
end;
var
W: WideString;
S: AnsiString;
begin
W := WideChar($FEFF) + '1d';
S := WideStringAsAnsi(W);
end;
I would not suggest misusing AnsiString like this, though. If you need bytes, operate on bytes, eg:
type
TBytes = array of Byte;
function WideStringAsBytes(const AValue: WideString): TBytes;
begin
SetLength(Result, Length(AValue) * SizeOf(WideChar));
Move(PWideChar(AValue)^, PByte(Result)^, Length(Result));
end;
var
W: WideString;
B: TBytes;
begin
W := WideChar($FEFF) + '1d';
B := WideStringAsBytes(W);
end;
A WideString is already a string of Unicode bytes. Specifically, in UTF16-LE encoding.
The two extra bytes you see in the Unicode file saved by Notepad are called a BOM - Byte Order Mark. This is a special character in Unicode that is used to indicate the order of bytes in the data that follows, to ensure that the string is decoded correctly.
Adding a BOM to a string (which is what you are asking for) is simply a matter of pre-fixing the string with that special BOM character. The BOM character is U+FEFF (that is the Unicode notation for the hex representation of a 'character').
So, the function you need is very simple:
function WideStringWithBOM(aString: WideString): WideString;
const
BOM = WideChar($FEFF);
begin
result := BOM + aString;
end;
However, although the function is very simple, this possibly isn't the end of the matter.
The string that is returned from this function will include the BOM and as far as any Delphi code is concerned that BOM will be treated as part of the string.
Typically you would only add a BOM to a string when passing that string to some external recipient (via a file or web service response, for example) if there is no other mechanism for indicating the encoding you have used.
Likewise, when reading strings from some received data which may be Unicode you should check the first two bytes:
If you find #255#254 ($FFFE) then you know that the bytes in the U+FEFF BOM have been switched (U+FFFE is not a valid Unicode character). i.e. the string that follows is UTF16-LE. Therefore, for a Delphi WideString you can discard those first two bytes and load the remaining bytes directly in to a suitable WideString variable.
If you find #254#255 then the bytes in the U+FEFF BOM have not been switched around. i.e. you know that the string that follows is UTF16-BE. In that case you again need to discard the first two bytes but when loading the remaining bytes into the WideString you must switch each pair of bytes around to convert from the UTF16-BE bytes to the UTF16-LE encoding of a WideString.
If the first 2 bytes are not #255#254 (or vice versa) then you are either dealing with UTF16-LE without a BOM or possibly some other encoding entirely.
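The three cases above can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: UTF16BytesToWide is a hypothetical name, and the received data is held in an AnsiString used purely as a byte buffer.

```pascal
// Hypothetical helper: Data holds the raw received bytes. Returns False
// when no UTF-16 BOM is found, so the caller can try another encoding.
function UTF16BytesToWide(const Data: AnsiString; out W: WideString): Boolean;
var
  I: Integer;
  BigEndian: Boolean;
begin
  W := '';
  Result := False;
  if Length(Data) < 2 then Exit;
  if (Data[1] = #255) and (Data[2] = #254) then
    BigEndian := False        // FF FE: UTF16-LE follows
  else if (Data[1] = #254) and (Data[2] = #255) then
    BigEndian := True         // FE FF: UTF16-BE follows
  else
    Exit;                     // no BOM: unknown encoding
  Result := True;
  // Discard the two BOM bytes and copy the rest as-is.
  SetLength(W, (Length(Data) - 2) div SizeOf(WideChar));
  if Length(W) = 0 then Exit;
  Move(Data[3], PWideChar(W)^, Length(W) * SizeOf(WideChar));
  // For big-endian input, switch each pair of bytes around.
  if BigEndian then
    for I := 1 to Length(W) do
      W[I] := WideChar(((Word(W[I]) and $00FF) shl 8) or
                       ((Word(W[I]) and $FF00) shr 8));
end;
```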
Good luck. :)
I am using Delphi 2009.
I want to view the contents of a file (in hexadecimal) inside a memo.
I'm using this code :
var
Buffer:String;
begin
Buffer := '';
AssignFile(sF,Source); //Assign file
Reset(sF);
repeat
Readln(sF,Buffer); //Load every line to a string.
TempChar:=StrToHex(Buffer); //Convert to Hex using the function
...
until EOF(sF);
end;
function StrToHex(AStr: string): string;
var
I ,Len: Integer;
s: chr (0)..255;
//s:byte;
//s: char;
begin
len:=length(AStr);
Result:='';
for i:=1 to len do
begin
s:=AStr[i];
//The problem is here. Ord(s) is giving false values (251 instead of 255)
//And in general the output differs from a professional hex editor.
Result:=Result +' '+IntToHex(Ord(s),2)+'('+IntToStr(Ord(s))+')';
end;
Delete(Result,1,1);
end;
When I declare variable "s" as char (I know that char goes up to 255) I get hex values up to 65535!
When I declare variable "s" as byte or chr (0)..255, it outputs different hex values compared to any hexadecimal editor!
Why is that? How can I see the correct values?
Check images for the differences.
1st image: Professional Hex Editor.
2nd image: Function output to Memo.
Thank you.
Your Delphi 2009 is unicode-enabled, so Char is actually WideChar and that's a 2 byte, 16 bit unsigned value, that can have values from 0 to 65535.
You could change all your Char declarations to AnsiChar and all your String declarations to AnsiString, but that's not the way to do it. You should drop Pascal I/O in favor of modern stream-based I/O, use a TFileStream, and don't treat binary data as Char.
Console demo:
program Project26;
{$APPTYPE CONSOLE}
uses SysUtils, Classes;
var F: TFileStream;
Buff: array[0..15] of Byte;
CountRead: Integer;
HexText: array[0..31] of Char;
begin
F := TFileStream.Create('C:\Temp\test', fmOpenRead or fmShareDenyWrite);
try
CountRead := F.Read(Buff, SizeOf(Buff));
while CountRead <> 0 do
begin
BinToHex(Buff, HexText, CountRead);
WriteLn(HexText); // You could add this to the Memo
CountRead := F.Read(Buff, SizeOf(Buff));
end;
finally F.Free;
end;
end.
In Delphi 2009, a Char is the same thing as a WideChar, that is, a Unicode character. A wide character occupies two bytes. You want to use AnsiChar. Prior to Delphi 2009 (that is, prior to Unicode Delphi), Char was the same thing as AnsiChar.
Also, you shouldn't use ReadLn. You are treating the file as a text file with text-file line endings! This is a general file! It might not have any text-file line endings at all!
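As a minimal sketch of that advice, the question's StrToHex can be reworked to read raw bytes from the file and keep the same "FF(255)" output format. FileToHex is a hypothetical name, and the whole file is loaded at once, so this is only meant for small files.

```pascal
uses
  SysUtils, Classes;

// Hypothetical rewrite of the question's StrToHex: takes a file name and
// works on raw bytes, so no text-mode or Unicode conversion is involved.
function FileToHex(const FileName: string): string;
var
  F: TFileStream;
  Buf: array of Byte;
  I: Integer;
begin
  Result := '';
  F := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Buf, Integer(F.Size));
    if Length(Buf) > 0 then
      F.ReadBuffer(Buf[0], Length(Buf));
  finally
    F.Free;
  end;
  // Same formatting as in the question: hex value, then decimal in brackets.
  for I := 0 to High(Buf) do
    Result := Result + ' ' + IntToHex(Buf[I], 2) + '(' + IntToStr(Buf[I]) + ')';
  Delete(Result, 1, 1); // drop the leading space
end;
```

Because each element is a Byte, the values can never exceed 255, and line endings in the file are shown like any other byte.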
For an easier to read output, and looking better too, you might want to use this simple hex dump formatter.
The HexDump procedure dumps an area of memory into a TStrings in lines of two chunks of 8 bytes in hex, and 16 ascii chars
example
406563686F206F66 660D0A6966206578 @echo off..if ex
69737420257E7331 5C6E756C20280D0A ist %~s1\nul (..
0D0A290D0A ..)..
Here is the code for the dump format function
function HexB (b: Byte): String;
const HexChar: Array[0..15] of Char = '0123456789ABCDEF';
begin
result:= HexChar[b shr 4]+HexChar[b and $0f];
end;
procedure HexDump(var data; size: Integer; s: TStrings);
const
sepHex=' ';
sepAsc=' ';
nonAsc='.';
var
i : Integer;
hexDat, ascDat : String;
buff : Array[0..1] of Byte Absolute data;
begin
hexDat:='';
ascDat:='';
for i:=0 to size-1 do
begin
hexDat:=hexDat+HexB(buff[i]);
if ((buff[i]>31) and (buff[i]<>255)) then
ascDat:=ascDat+Char(buff[i])
else
ascDat:=ascDat+nonAsc;
if (((i+1) mod 16)<>0) and (((i+1) mod 8)=0) then
hexDat:=hexDat+sepHex;
if ((i+1) mod 16)=0 then
begin
s.Add(hexdat+sepAsc+ascdat);
hexdat:='';
ascdat:='';
end;
end;
if (size mod 16)<>0 then
begin
if (size mod 16)<8 then
hexDat:=hexDat+StringOfChar(' ',(8-(size mod 8))*2)
+sepHex+StringOfChar(' ',16)
else
hexDat:=hexDat+StringOfChar(' ',(16-(size mod 16))*2);
s.Add(hexDat + sepAsc + ascDat);
end;
end;
And here is a complete code example for dumping the contents of a file into a Memo field.
procedure TForm1.Button1Click(Sender: TObject);
var
FStream: TFileStream;
buff: array[0..$fff] of Byte;
nRead: Integer;
begin
FStream := TFileStream.Create(edit1.text, fmOpenRead or fmShareDenyWrite);
try
repeat
nRead := FStream.Read(Buff, SizeOf(Buff));
if nRead<>0 then
hexdump(buff,nRead,memo1.lines);
until nRead=0;
finally
FStream.Free;
end;
end;
string is UnicodeString in Delphi 2009. If you want to use single-byte strings use AnsiString or RawByteString.
See String types.
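The size difference is easy to see in a small console sketch (not from the original answer). On Delphi 2009 and later it should report 2 and 1 for the character sizes, and 6 and 3 bytes for the two strings.

```pascal
program CharSizes;
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  U: string;      // UnicodeString in Delphi 2009 and later
  A: AnsiString;  // one byte per character
begin
  U := 'abc';
  A := 'abc';
  Writeln('SizeOf(Char)        = ', SizeOf(Char));
  Writeln('SizeOf(AnsiChar)    = ', SizeOf(AnsiChar));
  Writeln('bytes in string     = ', Length(U) * SizeOf(Char));
  Writeln('bytes in AnsiString = ', Length(A) * SizeOf(AnsiChar));
end.
```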
I want to encode strings as Python do.
Python code is this:
def EncodeToUTF(inputstr):
uns = inputstr.decode('iso-8859-2')
utfs = uns.encode('utf-8')
return utfs
This is very simple.
But in Delphi I don't understand, how to encode, to force first the good character set (no matter, which computer we have).
I tried this test code to see the conversion:
procedure TForm1.Button1Click(Sender: TObject);
var
w : WideString;
buf : array[0..2048] of WideChar;
i : integer;
lc : Cardinal;
begin
lc := GetThreadLocale;
Caption := IntToStr(lc);
StringToWideChar(Edit1.Text, buf, SizeOF(buf));
w := buf;
lc := MakeLCID(
MakeLangID( LANG_ENGLISH, SUBLANG_ENGLISH_US),
0);
Win32Check(SetThreadLocale(lc));
Edit2.Text := WideCharToString(PWideChar(w));
Caption := IntToStr(AnsiCompareText(Edit1.Text, Edit2.Text));
end;
The input is: "árvíztűrő tükörfúrógép", the hungarian accent tester phrase.
The local lc is 1038 (hun), the new lc is 1033.
But this always gives a result of 0 (same strings), and the accents are the same; I don't lose ŐŰ, which are not in the English language.
What am I doing wrong? How do I do the same thing as Python does?
Thanks for every help, link, etc:
dd
Windows uses codepage 28592 for ISO-8859-2. If you have a buffer containing ISO-8859-2 encoded bytes, then you have to decode the bytes to UTF-16 first, and then encode the result to UTF-8. Depending on which version of Delphi you are using, you can either:
1) on pre-D2009, use MultiByteToWideChar() and WideCharToMultiByte():
function EncodeToUTF(const inputstr: AnsiString): UTF8String;
var
ret: Integer;
uns: WideString;
begin
Result := '';
if inputstr = '' then Exit;
ret := MultiByteToWideChar(28592, 0, PAnsiChar(inputstr), Length(inputstr), nil, 0);
if ret < 1 then Exit;
SetLength(uns, ret);
MultiByteToWideChar(28592, 0, PAnsiChar(inputstr), Length(inputstr), PWideChar(uns), Length(uns));
ret := WideCharToMultiByte(65001, 0, PWideChar(uns), Length(uns), nil, 0, nil, nil);
if ret < 1 then Exit;
SetLength(Result, ret);
WideCharToMultiByte(65001, 0, PWideChar(uns), Length(uns), PAnsiChar(Result), Length(Result), nil, nil);
end;
2a) on D2009+, use SysUtils.TEncoding.Convert():
function EncodeToUTF(const inputstr: RawByteString): UTF8String;
var
enc: TEncoding;
buf: TBytes;
begin
Result := '';
if inputstr = '' then Exit;
enc := TEncoding.GetEncoding(28592);
try
buf := TEncoding.Convert(enc, TEncoding.UTF8, BytesOf(inputstr));
if Length(buf) > 0 then
SetString(Result, PAnsiChar(@buf[0]), Length(buf));
finally
enc.Free;
end;
end;
2b) on D2009+, alternatively define a new string typedef, put your data into it, and assign it to a UTF8String variable. No manual encoding/decoding needed, the RTL will handle everything for you:
type
Latin2String = type AnsiString(28592);
var
inputstr: Latin2String;
outputstr: UTF8String;
begin
// put the ISO-8859-2 encoded bytes into inputstr, then...
outputstr := inputstr;
end;
If you're using Delphi 2009 or newer every input from the default VCL controls will be UTF-16, so no need to do any conversions on your input.
If you're using Delphi 2007 or older (as it seems) you are at the mercy of Windows, because the VCL is ANSI and Windows has a fixed codepage that determines which characters can be used in, e.g., a TEdit.
You can change the system-wide default ANSI CP in the control panel though, but that requires a reboot each time you do.
In Delphi 2007 you have some chance to use TNTUnicode controls or some similar solution to get the Text from the UI to your code.
In Delphi 2009 and newer there are also plenty of Unicode and character set handling routines in the RTL.
The conversion between character sets can be done with SysUtils.TEncoding:
http://docs.embarcadero.com/products/rad_studio/delphiAndcpp2009/HelpUpdate2/EN/html/delphivclwin32/SysUtils_TEncoding.html
The Python code in your question returns a string in UTF-8 encoding. To do this with pre-2009 Delphi versions you can use code similar to:
procedure TForm1.Button1Click(Sender: TObject);
var
Src, Dest: string;
Len: integer;
buf : array[0..2048] of WideChar;
begin
Src := Edit1.Text;
Len := MultiByteToWideChar(CP_ACP, 0, PChar(Src), Length(Src), @buf[0], 2048);
buf[Len] := #0;
SetLength(Dest, 2048);
SetLength(Dest, WideCharToMultiByte(CP_UTF8, 0, @buf[0], Len, PChar(Dest),
2048, nil, nil));
Edit2.Text := Dest;
end;
Note that this doesn't change the current thread locale, it simply passes the correct code page parameters to the API.
There are encoding tools in Open XML library. There is cUnicodeCodecsWin32 unit with functions like: EncodingToUTF16().
My code that converts between ISO Latin2 and UTF-8 looks like:
s2 := EncodingToUTF16('ISO-8859-2', s);
s2utf8 := UTF16ToEncoding('UTF-8', s2);