I am using Delphi 2009.
I want to view the contents of a file (in hexadecimal) inside a memo.
I'm using this code :
var
Buffer:String;
begin
Buffer := '';
AssignFile(sF,Source); //Assign file
Reset(sF);
repeat
Readln(sF,Buffer); //Load every line to a string.
TempChar:=StrToHex(Buffer); //Convert to Hex using the function
...
until EOF(sF);
end;
function StrToHex(AStr: string): string;
var
I ,Len: Integer;
s: chr (0)..255;
//s:byte;
//s: char;
begin
len:=length(AStr);
Result:='';
for i:=1 to len do
begin
s:=AStr[i];
//The problem is here. Ord(s) is giving false values (251 instead of 255)
//And in general the output differs from a professional hex editor.
Result:=Result +' '+IntToHex(Ord(s),2)+'('+IntToStr(Ord(s))+')';
end;
Delete(Result,1,1);
end;
When I declare variable "s" as char (i know that char goes up to 255) I get results hex values up to 65535!
When i declare variable "s" as byte or chr (0)..255, it outputs different hex values, comparing to any Hexadecimal Editor!
Why is that? How can I see the correct values?
Check images for the differences.
1st image: Professional Hex Editor.
2nd image: Function output to Memo.
Thank you.
Your Delphi 2009 is unicode-enabled, so Char is actually WideChar and that's a 2 byte, 16 bit unsigned value, that can have values from 0 to 65535.
You could change all your Char declarations to AnsiChar and all your String declarations to AnsiString, but that's not the way to do it. You should drop Pascal I/O in favor of modern stream-based I/O, use a TFileStream, and don't treat binary data as Char.
Console demo:
program Project26;
{$APPTYPE CONSOLE}
uses SysUtils, Classes;
var F: TFileStream;
Buff: array[0..15] of Byte;
CountRead: Integer;
HexText: array[0..31] of Char;
begin
F := TFileStream.Create('C:\Temp\test', fmOpenRead or fmShareDenyWrite);
try
CountRead := F.Read(Buff, SizeOf(Buff));
while CountRead <> 0 do
begin
BinToHex(Buff, HexText, CountRead);
WriteLn(HexText); // You could add this to the Memo
CountRead := F.Read(Buff, SizeOf(Buff));
end;
finally F.Free;
end;
end.
In Delphi 2009, a Char is the same thing as a WideChar, that is, a Unicode character. A wide character occupies two bytes. You want to use AnsiChar. Prior to Delphi 2009 (that is, prior to Unicode Delphi), Char was the same thing as AnsiChar.
Also, you shouldn't use ReadLn. You are treating the file as a text file with text-file line endings! This is a general file! It might not have any text-file line endings at all!
For an easier to read output, and looking better too, you might want to use this simple hex dump formatter.
The HexDump procedure dumps an area of memory into a TStrings in lines of two chunks of 8 bytes in hex, and 16 ascii chars
example
406563686F206F66 660D0A6966206578 #echo off..if ex
69737420257E7331 5C6E756C20280D0A ist %~s1\nul (..
0D0A290D0A ..)..
Here is the code for the dump format function
function HexB (b: Byte): String;
const HexChar: Array[0..15] of Char = '0123456789ABCDEF';
begin
result:= HexChar[b shr 4]+HexChar[b and $0f];
end;
procedure HexDump(var data; size: Integer; s: TStrings);
const
sepHex=' ';
sepAsc=' ';
nonAsc='.';
var
i : Integer;
hexDat, ascDat : String;
buff : Array[0..1] of Byte Absolute data;
begin
hexDat:='';
ascDat:='';
for i:=0 to size-1 do
begin
hexDat:=hexDat+HexB(buff[i]);
if ((buff[i]>31) and (buff[i]<>255)) then
ascDat:=ascDat+Char(buff[i])
else
ascDat:=ascDat+nonAsc;
if (((i+1) mod 16)<>0) and (((i+1) mod 8)=0) then
hexDat:=hexDat+sepHex;
if ((i+1) mod 16)=0 then
begin
s.Add(hexdat+sepAsc+ascdat);
hexdat:='';
ascdat:='';
end;
end;
if (size mod 16)<>0 then
begin
if (size mod 16)<8 then
hexDat:=hexDat+StringOfChar(' ',(8-(size mod 8))*2)
+sepHex+StringOfChar(' ',16)
else
hexDat:=hexDat+StringOfChar(' ',(16-(size mod 16))*2);
s.Add(hexDat + sepAsc + ascDat);
end;
end;
And here is a complete code example for dumping the contents of a file into a Memo field.
procedure TForm1.Button1Click(Sender: TObject);
var
FStream: TFileStream;
buff: array[0..$fff] of Byte;
nRead: Integer;
begin
FStream := TFileStream.Create(edit1.text, fmOpenRead or fmShareDenyWrite);
try
repeat
nRead := FStream.Read(Buff, SizeOf(Buff));
if nRead<>0 then
hexdump(buff,nRead,memo1.lines);
until nRead=0;
finally
F.Free;
end;
end;
string is UnicodeString in Delphi 2009. If you want to use single-byte strings use AnsiString or RawByteString.
See String types.
Related
When i create a file in Notepad, containing (example) the string 1d and save as unicode file, i get a 6 bytes size file containing the bytes #255#254#49#0#100#0.
OK. Now I need a Delphi 6 function which takes (example) input the widestring 1d and returns the string containing #255#254#49#0#100#0 (and viceversa).
How?
Thanks.
D
It is easier to read bytes if you use hex. #255#254#49#0#100#0 is represented in hex as
FF FE 31 00 64 00
Where
FF FE is the UTF-16LE BOM, which identifies the following bytes as being encoded as UTF-16 using values in Little Endian.
31 00 is the ASCII character '1'
64 00 is the ASCII character 'd'.
To create a WideString containing these bytes is very easy:
var
W: WideString;
S: String;
begin
S := '1d';
W := WideChar($FEFF) + S;
end;
When an AnsiString (which is Delphi 6's default string type) is assigned to a WideString, the RTL automatically converts the AnsiString data from 8-bit to UTF-16LE using the local machine's default Ansi charset for the conversion.
Going the other way is just as easy:
var
W: WideString;
S: String;
begin
W := WideChar($FEFF) + '1d';
S := Copy(W, 2, MaxInt);
end;
When you assign a WideString to an AnsiString, the RTL automatically converts the WideString data from UTF-16LE to 8-bit using the default Ansi charset.
If the default Ansi charset is not suitable for your needs (say the 8-bit data needs to be encoded in a different charset), you will have to use the Win32 API MultiByteToWideChar() and WideCharToMultiByte() functions directly (or 3rd party library with equivalent functionality) so you can specify the desired charset/codepage as needed.
Now then, Delphi 6 does not offer any useful helpers to read Unicode files (Delphi 2009 and later do), so you will have to do it yourself manually, for example:
function ReadUnicodeFile(const FileName: string): WideString;
const
cBOM_UTF8: array[0..2] of Byte = ($EF, $BB, $BF);
cBOM_UTF16BE: array[0..1] of Byte = ($FE, $FF);
cBOM_UTF16LE: array[0..1] of Byte = ($FF, $FE);
cBOM_UTF32BE: array[0..3] of Byte = ($00, $00, $FE, $FF);
cBOM_UTF32LE: array[0..3] of Byte = ($FF, $FE, $00, $00);
var
FS: TFileStream;
BOM: array[0..3] of Byte;
NumRead: Integer;
U8: UTF8String;
U32: UCS4String;
I: Integer;
begin
Result := '';
FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
try
NumRead := FS.Read(BOM, 4);
// UTF-8
if (NumRead >= 3) and CompareMem(#BOM, #cBOM_UTF8, 3) then
begin
if NumRead > 3 then
FS.Seek(-(NumRead-3), soCurrent);
SetLength(U8, FS.Size - FS.Position);
if Length(U8) > 0 then
begin
FS.ReadBuffer(PAnsiChar(U8)^, Length(U8));
Result := UTF8Decode(U8);
end;
end
// the UTF-16LE and UTF-32LE BOMs are ambiguous! Check for UTF-32 first...
// UTF-32
else if (NumRead = 4) and (CompareMem(#BOM, cBOM_UTF32LE, 4) or CompareMem(#BOM, cBOM_UTF32BE, 4)) then
begin
// UCS4String is not a true string type, it is a dynamic array, so
// it must include room for a null terminator...
SetLength(U32, ((FS.Size - FS.Position) div SizeOf(UCS4Char)) + 1);
if Length(U32) > 1 then
begin
FS.ReadBuffer(PUCS4Chars(U32)^, (Length(U32) - 1) * SizeOf(UCS4Char));
if CompareMem(#BOM, cBOM_UTF32BE, 4) then
begin
for I := Low(U32) to High(U32) do
begin
U32[I] := ((U32[I] and $000000FF) shl 24) or
((U32[I] and $0000FF00) shl 8) or
((U32[I] and $00FF0000) shr 8) or
((U32[I] and $FF000000) shr 24);
end;
end;
U32[High(U32)] := 0;
// Note: UCS4StringToWidestring() does not actually support UTF-16,
// only UCS-2! If you need to handle UTF-16 surrogates, you will
// have to convert from UTF-32 to UTF-16 manually, there is no RTL
// or Win32 function that will do it for you...
Result := UCS4StringToWidestring(U32);
end;
end
// UTF-16
else if (NumRead >= 2) and (CompareMem(#BOM, cBOM_UTF16LE, 2) or CompareMem(#BOM, cBOM_UTF16BE, 2)) then
begin
if NumRead > 2 then
FS.Seek(-(NumRead-2), soCurrent);
SetLength(Result, (FS.Size - FS.Position) div SizeOf(WideChar));
if Length(Result) > 0 then
begin
FS.ReadBuffer(PWideChar(Result)^, Length(Result) * SizeOf(WideChar));
if CompareMem(#BOM, cBOM_UTF16BE, 2) then
begin
for I := 1 to Length(Result) then
begin
Result[I] := WideChar(
((Word(Result[I]) and $00FF) shl 8) or
((Word(Result[I]) and $FF00) shr 8)
);
end;
end;
end;
end
// something else, assuming UTF-8
else
begin
if NumRead > 0 then
FS.Seek(-NumRead, soCurrent);
SetLength(U8, FS.Size - FS.Position);
if Length(U8) > 0 then
begin
FS.ReadBuffer(PAnsiChar(U8)^, Length(U8));
Result := UTF8Decode(U8);
end;
end;
finally
FS.Free;
end;
end;
Update: if you want to store UTF-16LE encoded bytes inside of an AnsiString variable (why?), then you can Move() the raw bytes of a WideString's character data into the memory block of an AnsiString: eg:
function WideStringAsAnsi(const AValue: WideString): AnsiString;
begin
SetLength(Result, Length(AValue) * SizeOf(WideChar));
Move(PWideChar(AValue)^, PAnsiChar(Result)^, Length(Result));
end;
var
W: WideString;
S: AnsiString;
begin
W := WideChar($FEFF) + '1d';
S := WideStringAsAnsi(W);
end;
I would not suggest misusing AnsiString like this, though. If you need bytes, operate on bytes, eg:
type
TBytes = array of Byte;
function WideStringAsBytes(const AValue: WideString): TBytes;
begin
SetLength(Result, Length(AValue) * SizeOf(WideChar));
Move(PWideChar(AValue)^, PByte(Result)^, Length(Result));
end;
var
W: WideString;
B: TBytes;
begin
W := WideChar($FEFF) + '1d';
B := WideStringAsBytes(W);
end;
A WideString is already a string of Unicode bytes. Specifically, in UTF16-LE encoding.
The two extra bytes you see in the Unicode file saved by Notepad are called a BOM - Byte Order Mark. This is a special character in Unicode that is used to indicate the order of bytes in the data that follows, to ensure that the string is decoded correctly.
Adding a BOM to a string (which is what you are asking for) is simply a matter of pre-fixing the string with that special BOM character. The BOM character is U+FEFF (that is the Unicode notation for the hex representation of a 'character').
So, the function you need is very simple:
function WideStringWithBOM(aString: WideString): WideString;
const
BOM = WideChar($FEFF);
begin
result := BOM + aString;
end;
However, although the function is very simple, this possibly isn't the end of the matter.
The string that is returned from this function will include the BOM and as far as any Delphi code is concerned that BOM will be treated as part of the string.
Typically you would only add a BOM to string when passing that string to some external recipient (via a file or web service response for example) if there is no other mechanism for indicating the encoding you have used.
Likewise, when reading strings from some received data which may be Unicode you should check the first two bytes:
If you find #255#254 ($FFFE) then you know that the bytes in the U+FEFF BOM have been switched (U+FFFE is not a valid Unicode character). i.e. the string that follows is UTF16-LE. Therefore, for a Delphi WideString you can discard those first two bytes and load the remaining bytes directly in to a suitable WideString variable.
If you find #254#255 then the bytes in the U+FEFF BOM have not been switched around. i.e. you know that the string that follows is UTF16-BE. In that case you again need to discard the first two bytes but when loading the remaining bytes into the WideString you must switch each pair of bytes around to convert from the UTF16-BE bytes to the UTF16-LE encoding of a WideString.
If the first 2 bytes are #255#254 (or vice versa) then you are either dealing with UTF16-LE without a BOM or possibly some other encoding entirely.
Good luck. :)
I use Berlin in Windows 10. I try to save tList<string> to a file.
I know how to handle tStringlist, tStreamWriter and tStreamReader but I need to use tFileStream because the other type of data should be added.
In the following code the loop of Button2Click which reads the data raises an eOutOfMemory exception. When I allocate simple string value to _String it works well but if I put tList value to the same _String it seems that wrong data were written on the file. I can't understand the difference between _String := _List.List[i] and _String := 'qwert'.
How can I write tList<string> to tFileSteam?
procedure TForm1.Button1Click(Sender: TObject);
var
_List: TList<string>;
_FileStream: TFileStream;
_String: string;
i: Integer;
begin
_List := TList<string>.Create;
_List.Add('abcde');
_List.Add('abcde12345');
_FileStream := TFileStream.Create('test', fmCreate);
for i := 0 to 1 do
begin
_String := _List.List[i]; // _String := 'qwert' works well
_FileStream.Write(_string, 4);
end;
_FileStream.Free;
_List.Free;
end;
procedure TForm1.Button2Click(Sender: TObject);
var
_FileStream: TFileStream;
_String: string;
i: Integer;
begin
_FileStream := TFileStream.Create('test', fmOpenRead);
for i := 0 to 1 do
begin
_FileStream.Read(_String, 4);
Memo1.Lines.Add(_String);
end;
_FileStream.Free;
end;
If you lookup in the docs what TFileStream.Write does, it tells you (inherited from THandleStream.Write):
function Write(const Buffer; Count: Longint): Longint; override;
function Write(const Buffer: TBytes; Offset, Count: Longint): Longint; override;
Writes Count bytes from the Buffer to the current position in the
resource.
Now, Buffer is untyped and as such is expected to be the memory address of the data to be written. You are passing a string variable which is a reference to the actual string data, the address of the variable holds a pointer to string data. You are therefore writing a pointer to the file.
To correct it pass the strings first character for the buffer, ....write(_string[1], ...
If you have compiler directive {$ZEROBASEDSTRINGS ON} you would use index 0.
Alternatively, typecast the string to PChar and dereference it: ....write(PChar(_String)^, ...
Then look at the second parameter, Count. As the docs say, it indicates the number of bytes to be written, specifically not characters. In Delphi 2009 and later strings are UnicodeString, so each character is 2 bytes. You need to pass the strings size in bytes.
This will write 4 characters (8 bytes) to the file stream:
_FileStream.Write(_String[1], 4 * SizeOf(Char));
or better
_FileStream.Write(PChar(_String)^, 4 * SizeOf(Char));
For reading you need to make corresponding changes, but most notable, you need to set the strings length before reading (length is counted in characters).
SetLength(_String, 4);
for i := 0 to 1 do
begin
_FileStream.Read(_String[1], 4 * SizeOf(Char));
Memo1.Lines.Add(_String);
end;
To continue with this low-level approach you could generalize string writing and reading as follows:
Add a variable to hold the length of a string
var
_String: string;
_Length: integer;
then writing
begin
...
for ....
begin
_String := _List.List[i];
_Length := Length(_String);
_FileStream.Write(_Length, SizeOf(Integer));
_FileStream.Write(PChar(_List.List[i])^, _Length * SizeOf(Char));
end;
and reading
begin
...
for ....
begin
_FileStream.Read(_Length, SizeOf(Integer));
SetLength(_String, _Length);
_FileStream.Read(_String[1], _Length * SizeOf(Char));
Memo1.Lines.Add(_String);
end;
IOW, you write the length first and then the string. On reading you read the length and then the string.
When I use this code and function in Delphi 7 an error message will be displayed :
This code convert MemoryStream content to WideString
function ReadWideString(stream: TStream): WideString;
var
nChars: LongInt;
begin
stream.Position := 0;
stream.ReadBuffer(nChars, SizeOf(nChars));
SetLength(Result, nChars);
if nChars > 0 then
stream.ReadBuffer(Result[1], nChars * SizeOf(Result[1]));
end;
procedure TForm1.Button2Click(Sender: TObject);
var
mem: TMemoryStream;
begin
mem := TMemoryStream.Create;
mem.LoadFromFile('C:\Users\User1\Desktop\wide.txt');
Memo1.Lines.Add(ReadWideString(mem));
end;
Any help would be greatly appreciated.
Your code works fine as is. The problem is that the input that you pass to your function is not in the expected format.
The function that you are using expects a 4 byte integer containing the length, followed by the UTF-16 payload.
It looks like you actually have straight UTF-16 text, without the length prepended. Read that like this:
stream.Position := 0;
nChars := stream.Size div SizeOf(Result[1]);
SetLength(Result, nChars);
if nChars > 0 then
stream.ReadBuffer(Result[1], nChars * SizeOf(Result[1]));
Now, your input may contain a UTF-16 BOM. If so you'll need to decide how to handle that.
The bottom line here is that you need your code to match the input you provide.
I have got a DLL function that returns a pointer to ANSI text (PAnsiChar). I want to assign this to a (unicode-) string (This is Delphi XE2.). The following compiles but I get a warning
"W1057 Implicit String cast from 'AnsiChar' to 'string'":
function TProj4.pj_strerrno(_ErrorCode: Integer): string;
var
Err: PAnsiChar;
begin
Err := Fpj_strerrno(_ErrorCode);
Result := Err;
end;
EDIT: The text in question is an error message in English, so there are unlikely to be any conversion problems here.
I am now tempted to just explicitly typecast Err to string like this ...
Result := String(Err);
.. to get rid of the warning. Could this go wrong? Should I rather use a temporary AnsiString variable instead?
var
s: AnsiString;
[...]
s := Err;
Result := String(s);
If yes, why?
Or should I make it explicit, that the code first converts a PAnsiChar to AnsiString and then the AnsiString to a String?
Result := String(AnsiString(Err));
And of course I could make it a function:
function PAnsicharToString(_a: PAnsiChar): string;
begin
// one of the above conversion codes goes here
end;
All these options compile, but will they work? And what's the best practice here?
Bonus points: The code should ideally compile and work with Delphi 2007 and newer versions as well.
If the text is encoded in the users current locale then I'd say it is simplest to write:
var
p: PAnsiChar;
str: string;
....
str := string(p);
Otherwise if you wish to convert from a specific code page to a Unicode string then you would use UnicodeFromLocaleChars.
I think the general solution is assigning c char pointer to RawByteString, then set its codepage corresponding to c null-terminated string encoding.
var
bys :TBytes;
rbstr :RawByteString;
ustr :string;
pastr :PAnsiChar;
begin
SetLength(bys,5);
bys[0] := $ca;
bys[1] := $e9;
bys[2] := $d2;
bys[3] := $b5;
bys[4] := 0;
pastr := #bys[0]; // just simulate char* returned by c api
rbstr := pastr; // assign PAnsiChar to RawByteString
// assume text encoded as codepage 936
// Note here: set 3rd param to false!
SetCodePage(rbstr,936,false);
ustr := string(rbstr);
ShowMessage(ustr);
end;
And the other cross-platform solution is (vcl,fmx,fmx with mobile platform)
function CString2TBytes(ptr :{$IFDEF NEXTGEN} MarshaledAString {$ELSE} PAnsiChar {$ENDIF}) :TBytes;
var
pby :PByte;
len :Integer;
begin
pby := PByte(ptr);
while pby^<>0 do Inc(pby);
len := pby - ptr;
SetLength(Result,len);
if len>0 then Move(ptr^,Result[0],len);
end;
procedure TForm5.Button1Click(Sender: TObject);
var
bys, cbys: TBytes;
ustr: string;
// PAnsiChar is undefined in mobile platform
// remap param foo(outSting:PAnsiString) => foo(outString:MarshaledAString)
ptr: {$IFDEF NEXTGEN} MarshaledAString {$ELSE} PAnsiChar {$ENDIF}; //
encoding : TEncoding;
begin
SetLength(bys, 5);
bys[0] := $CA;
bys[1] := $E9;
bys[2] := $D2;
bys[3] := $B5;
bys[4] := 0;
ptr := #bys[0]; // just simulate char* returned by c api
cbys := CString2TBytes(ptr);
// assume text encoded as codepage 936
encoding := TEncoding.GetEncoding(936);
try
ustr := encoding.GetString(cbys);
ShowMessage(ustr);
finally
encoding.Free;
end;
end;
I want to encode strings as Python do.
Python code is this:
def EncodeToUTF(inputstr):
uns = inputstr.decode('iso-8859-2')
utfs = uns.encode('utf-8')
return utfs
This is very simple.
But in Delphi I don't understand, how to encode, to force first the good character set (no matter, which computer we have).
I tried this test code to see the convertion:
procedure TForm1.Button1Click(Sender: TObject);
var
w : WideString;
buf : array[0..2048] of WideChar;
i : integer;
lc : Cardinal;
begin
lc := GetThreadLocale;
Caption := IntToStr(lc);
StringToWideChar(Edit1.Text, buf, SizeOF(buf));
w := buf;
lc := MakeLCID(
MakeLangID( LANG_ENGLISH, SUBLANG_ENGLISH_US),
0);
Win32Check(SetThreadLocale(lc));
Edit2.Text := WideCharToString(PWideChar(w));
Caption := IntToStr(AnsiCompareText(Edit1.Text, Edit2.Text));
end;
The input is: "árvíztűrő tükörfúrógép", the hungarian accent tester phrase.
The local lc is 1038 (hun), the new lc is 1033.
But this everytime makes 0 result (same strings), and the accents are same, I don't lost ŐŰ which is not in english lang.
What I do wrong? How to I do same thing as Python do?
Thanks for every help, link, etc:
dd
Windows uses codepage 28592 for ISO-8859-2. If you have a buffer containing ISO-8859-2 encoded bytes, then you have to decode the bytes to UTF-16 first, and then encode the result to UTF-8. Depending on which version of Delphi you are using, you can either:
1) on pre-D2009, use MultiByteToWideChar() and WideCharToMultiByte():
function EncodeToUTF(const inputstr: AnsiString): UTF8String;
var
ret: Integer;
uns: WideString;
begin
Result := '';
if inputstr = '' then Exit;
ret := MultiByteToWideChar(28592, 0, PAnsiChar(inputstr), Length(inputstr), nil, 0);
if ret < 1 then Exit;
SetLength(uns, ret);
MultiByteToWideChar(28592, 0, PAnsiChar(inputstr), Length(inputstr), PWideChar(uns), Length(uns));
ret := WideCharToMultiByte(65001, 0, PWideChar(uns), Length(uns), nil, 0, nil, nil);
if ret < 1 then Exit;
SetLength(Result, ret);
WideCharToMultiByte(65001, 0, PWideChar(uns), Length(uns), PAnsiChar(Result), Length(Result), nil, nil);
end;
2a) on D2009+, use SysUtils.TEncoding.Convert():
function EncodeToUTF(const inputstr: RawByteString): UTF8String;
var
enc: TEncoding;
buf: TBytes;
begin
Result := '';
if inputstr = '' then Exit;
enc := TEncoding.GetEncoding(28592);
try
buf := TEncoding.Convert(enc, TEncoding.UTF8, BytesOf(inputstr));
if Length(buf) > 0 then
SetString(Result, PAnsiChar(#buf[0]), Length(buf));
finally
enc.Free;
end;
end;
2b) on D2009+, alternatively define a new string typedef, put your data into it, and assign it to a UTF8String variable. No manual encoding/decoding needed, the RTL will handle everything for you:
type
Latin2String = type AnsiString(28592);
var
inputstr: Latin2String;
outputstr: UTF8String;
begin
// put the ISO-8859-2 encoded bytes into inputstr, then...
outputstr := inputstr;
end;
If you're using Delphi 2009 or newer every input from the default VCL controls will be UTF-16, so no need to do any conversions on your input.
If you're using Delphi 2007 or older (as it seems) you are at mercy of Windows, because the VCL is ANSI and Windows has a fixed Codepage that determines which characters can be used in i.e. a TEdit.
You can change the system-wide default ANSI CP in the control panel though, but that requires a reboot each time you do.
In Delphi 2007 you have some chance to use TNTUnicode controls or some similar solution to get the Text from the UI to your code.
In Delphi 2009 and newer there are also plenty of Unicode and character set handling routines in the RTL.
The conversion between character sets can be done with SysUtils.TEncoding:
http://docs.embarcadero.com/products/rad_studio/delphiAndcpp2009/HelpUpdate2/EN/html/delphivclwin32/SysUtils_TEncoding.html
The Python code in your question returns a string in UTF-8 encoding. To do this with pre-2009 Delphi versions you can use code similar to:
procedure TForm1.Button1Click(Sender: TObject);
var
Src, Dest: string;
Len: integer;
buf : array[0..2048] of WideChar;
begin
Src := Edit1.Text;
Len := MultiByteToWideChar(CP_ACP, 0, PChar(Src), Length(Src), #buf[0], 2048);
buf[Len] := #0;
SetLength(Dest, 2048);
SetLength(Dest, WideCharToMultiByte(CP_UTF8, 0, #buf[0], Len, PChar(Dest),
2048, nil, nil));
Edit2.Text := Dest;
end;
Note that this doesn't change the current thread locale, it simply passes the correct code page parameters to the API.
There are encoding tools in Open XML library. There is cUnicodeCodecsWin32 unit with functions like: EncodingToUTF16().
My code that converts between ISO Latin2 and UTF-8 looks like:
s2 := EncodingToUTF16('ISO-8859-2', s);
s2utf8 := UTF16ToEncoding('UTF-8', s2);