Encoding in Indy 10 and Delphi - delphi

I am using Indy 10 with Delphi. Following is my code which uses EncodeString method of Indy to encode a string.
var
EncodedString : String;
StringToBeEncoded : String;
EncoderMIME: TIdEncoderMIME;
....
....
EncodedString := EncoderMIME.EncodeString(StringToBeEncoded);
I am not getting the correct value in encoded sting.

What is the purpose of IndyTextEncoding_OSDefault?
Here's the source code for IndyTextEncoding_OSDefault.
function IndyTextEncoding_OSDefault: IIdTextEncoding;
begin
if GIdOSDefaultEncoding = nil then begin
LEncoding := TIdMBCSEncoding.Create;
if InterlockedCompareExchangeIntf(IInterface(GIdOSDefaultEncoding), LEncoding, nil) <> nil then begin
LEncoding := nil;
end;
end;
Result := GIdOSDefaultEncoding;
end;
Note that I stripped out the .net conditional code for simplicity. Most of this code is to arrange singleton thread-safety. The actual value returned is synthesised by a call to TIdMBCSEncoding.Create. Let's look at that.
constructor TIdMBCSEncoding.Create;
begin
Create(CP_ACP, 0, 0);
end;
Again I've remove conditional code that does not apply to your Windows setting. Now, CP_ACP is the Active Code Page, the current system Windows ANSI code page. So, on Windows at least, IndyTextEncoding_OSDefault is an encoding for the current system Windows ANSI code page.
Why did using IndyTextEncoding_OSDefault give the same behaviour as my Delphi 7 code?
That's because the Delphi 7 / Indy 9 code for TEncoderMIME.EncodeString does not perform any code page transformation and MIME encodes the input string as though it were a byte array. Since the Delphi 7 string is encoded in the active ANSI code page, this has the same effect as passing IndyTextEncoding_OSDefault to TEncoderMIME.EncodeString in your Unicode version of the code.
What is the difference between IndyTextEncoding_Default and IndyTextEncoding_OSDefault?
Here is the source code for IndyTextEncoding_OSDefault:
function IndyTextEncoding_Default: IIdTextEncoding;
var
LType: IdTextEncodingType;
begin
LType := GIdDefaultTextEncoding;
if LType = encIndyDefault then begin
LType := encASCII;
end;
Result := IndyTextEncoding(LType);
end;
This returns an encoding that is determined by the value of GIdDefaultTextEncoding. By default, GIdDefaultTextEncoding is encASCII. And so, by default, IndyTextEncoding_Default yields an ASCII encoding.
Beyond all this you should be asking yourself which encoding you want to be using. Relying on default values leaves you at the mercy of those defaults. What if those defaults don't do what you want to do. Especially as the defaults are not Unicode encodings and so support only a limited range of characters. And what's more are dependent on system settings.
If you wish to encode international text, you would normally choose to use the UTF-8 encoding.
One other point to make is that you are calling EncodeString as though it were an instance method, but it is actually a class method. You can remove EncoderMIME and call TEncoderMIME.EncodeString. Or keep EncoderMIME and call EncoderMIME.Encode.

Related

Send bytes using Indy 10

I'm trying to send bytes between two delphi 2010 applications using Indy 10, but without success. Received bytes are different from sent bytes. This is my example code:
Application 1, send button click:
var s:TIdBytes;
begin
setlength(s,3);
s[0]:=48;
s[1]:=227;
s[2]:=0;
IdTCPClient.IOHandler.WriteDirect(s); // or .Write(s);
end;
Application 2, idtcpserver execute event (first attempt):
procedure TForm1.IdTCPServerExecute(AContext: TIdContext);
var rec:TIdBytes;
b:byte;
begin
SetLength(rec,0);
b:=AContext.Connection.IOHandler.ReadByte;
while b<>0 do
begin
setlength(rec, length(rec)+1);
rec[length(rec)-1]:=b;
b:=AContext.Connection.IOHandler.ReadByte;
end;
// Here I expect rec[0] = 48, rec[1]=227, rec[2]=0.
// But I get: rec[0] = 48, rec[1]=63, rec[2]=0. Rec[1] is incorrect
// ... rest of code
end;
Application 2, idtcpserver execute event (second attempt):
procedure TForm1.IdTCPServerExecute(AContext: TIdContext);
var c:ansistring;
begin
c:= AContext.Connection.IOHandler.ReadLn(Char(0));
// Here I expect c[0] = 48, c[1]=227, c[2]=0.
// But I get: c[0] = 48, c[1]=63, c[2]=0. c[1] is incorrect
// ... rest of code
end;
The most strange thing is those applications were developed with Delphi 5 some years ago and they worked good (with readln(char(0)). When I translate both applications to Delphi 2010 they stop working. I supoused It was due unicode string, but I haven't found a solution.
First off, don't use TIdIOHandler.WriteDirect() directly, use only the TIdIOHandler.Write() overloads instead.
Second, there is no possible way that your 1st attempt code can produce the result you are claiming. TIdIOHandler.Write[Direct](TIdBytes) and TIdIOHandler.ReadByte() operate on raw bytes only, and bytes are transmitted as-is. So you are guaranteed to get exactly what you send, there is simply no possible way the byte values could change as you have suggested.
However, your 2nd attempt code certainly can produce the result you are claiming, because TIdIOHandler.ReadLn() will read in the bytes and convert them to a UnicodeString, which you are then assigning to an AnsiString, which means the received data is going through 2 lossy charset conversions:
first, the received bytes are decoded as-is into a UTF-16 UnicodeString using the TIdIOHandler.DefStringEncoding property as the decoding charset, which is set to IndyTextEncoding_ASCII by default (you can change that, for example to IndyTextEncoding_8bit). Byte 227 is outside of the ASCII range, so that byte gets decoded to Unicode character '?' (#63), which is an indication that data loss has occurred.
then, the UnicodeString is assigned to AnsiString, which converts the UTF-16 data to ANSI using the RTL's default charset (which is set to the user's OS locale by default, but can be changed using the System.SetMultiByteConversionCodePage() function). In this case, no more loss occurs, since the UnicodeString is holding characters in the US-ASCII range, which convert to ANSI as-is.
That being said, I would not suggest using TIdIOHandler.ReadByte() in a manual loop like you are doing. Although TIdIOHandler does not have a WaitFor() method for bytes, there is nonetheless a more efficient way to approach this, eg:
procedure TForm1.IdTCPServerExecute(AContext: TIdContext);
var
rec: TIdBytes;
LPos: Integer;
begin
SetLength(rec, 0);
// adapted from code in TIdIOHandler.WaitFor()...
with AContext.Connection.IOHandler do
begin
LPos := 0;
repeat
LPos := InputBuffer.IndexOf(Byte(0), LPos);
if LPos <> -1 then begin
InputBuffer.ExtractToBytes(rec, LPos+1);
{ or, if you don't want the terminating $00 byte put in the TIdBytes:
InputBuffer.ExtractToBytes(rec, LPos);
InputBuffer.Remove(1);
}
Break;
end;
LPos := InputBuffer.Size;
ReadFromSource(True, IdTimeoutDefault, True);
until False;
end;
// ... rest of code
end;
Or:
procedure TForm1.IdTCPServerExecute(AContext: TIdContext);
var
rec: string
begin
rec := AContext.Connection.IOHandler.ReadLn(#0, IndyTextEncoding_8Bit);
// ... rest of code
end;
The most strange thing is those applications were developed with Delphi 5 some years ago and they worked good (with readln(char(0)). When I translate both applications to Delphi 2010 they stop working. I supoused It was due unicode string, but I haven't found a solution.
Yes, the string type in Delphi 5 was AnsiString, but it was changed to UnicodeString in Delphi 2009.
But even so, in pre-Unicode versions of Delphi, TIdIOHandler.ReadLn() in Indy 10 will still read in raw bytes and convert them to Unicode using the TIdIOHanlder.DefStringEncoding. It will then convert that Unicode data to AnsiString before exiting, using the TIdIOHandler.DefAnsiEncoding property as the converting charset, which is set to IndyTextEncoding_OSDefault by default.
The morale of the story is - whenever you are reading in bytes and converting them to string characters, the bytes must be decoded using the correct charset, or else you will get data loss. This was true in Delphi 5 (but not as strictly enforced), it is true more so in Delphi 2009+.

Insert an emoji inside a string in Delphi 2007 [duplicate]

This question already has answers here:
Handling a Unicode String in Delphi Versions <= 2007
(5 answers)
Closed 5 years ago.
I'm trying to do exactly what the title say, insert an emoji into a string in Delphi 2007, just like the example below :
procedure TForm1.Button1Click(Sender: TObject);
var s : string;
begin
s := 'This is my original string (y)';
s := ansireplacestr(s,'(y)','👍');
showmessage(s);
end;
I can even paste the emoji into IDE's code, but in runtime showmessage results in this :
This is my original string ????
Is there a way to achieve this task in Delphi 2007 ? Due to several reasons i can't upgrade Delphi right now.
Someone said my question is solved on this topic :
Handling a Unicode String in Delphi Versions <= 2007
But this topic just says to use third-party components, without telling exactly how to do it.
EDIT : After suggested, i tried to use the functions pos, delete and insert and a widestring var :
function addEmoji(mystring : widestring) : widestring;
var r, aux : widestring;
p : integer;
begin
r := mystring;
while pos('(y)',r) > 0 do
begin
aux := r;
p := pos('(y)',aux);
Insert('👍',aux,p);
delete(aux,pos('(y)',aux),3);
r := aux;
end;
result := r;
end;
But the result is the '(y)' replaced by '????'.
In Delphi 2007, the default string type is AnsiString. Emojis require Unicode handling, as they use high Unicode codepoints that simply do not fit/exist in most commonly used Ansi encodings. So you need to use a Unicode UTF encoding instead (UTF-7, -8, -16, or -32).
You can use AnsiString for UTF-71, or UTF8String2 for UTF-8, or WideString for UTF-16, or UCS4String3 for UTF-32.
1: UTF-7 is a 7-bit ASCII compatible encoding.
2: UTF8String does exist in Delphi 2007 (it was introduced in Delphi 6), but it is not a true UTF-8 string type, it is just an alias for AnsiString with the expectation that it always holds UTF-8 encoded data. You have to use UTF8Encode() and UTF8Decode() to ensure proper conversions to other encodings via UTF-16. UTF8String did not become a true UTF-8 string type until Delphi 2009 (UTF8Encode() and UTF8Decode() were also deprecated).
3: UCS4String also exists since Delphi 6, but it is not a true string type at all (even in modern Delphi versions). It is just an alias for array of UCS4Char.
The RTL doesn't have any native support for UTF-7 (but it is not hard to implement manually), and very little support for UTF-32 (only to facilitate conversions between UTF-16 <-> UTF-32), so you should stick with UTF-8 or UTF-16 in your code.
You are going to lose Emoji data if you convert UTF data to Ansi, such as if you pass a WideString to ShowMessage(). You can pass a WideString to the Win32 API MessageBoxW() function instead, and you won't have any data loss, however the Emoji may or may not appear correctly depending on the font used by the dialog (but it won't appear as ??, at least).
However, the native RTL in Delphi 2007 simply does not support what you are attempting, at least not for UTF-16. You would have to find a 3rd party WideString-based function, or just write your own using the RTL's Pos(), Delete(), and Insert() intrinsic functions, which are overloaded for WideString data, eg:
function WideReplaceStr(const S, FromText, ToText: WideString): WideString;
var
I: Integer;
begin
Result := S;
repeat
I := Pos(FromText, Result);
if I = 0 then Break;
Delete(Result, I, Length(FromText));
Insert(ToText, Result, I);
until False;
end;
var
s : WideString;
begin
s := 'This is my original string (y)';
s := WideReplaceStr(s, '(y)', '👍');
MessageBoxW(0, PWideChar(s), '', MB_OK);
end;
However, using UTF-8, you can accomplish the same thing using the native RTL, but you still can't use ShowMessage() (well, you could, but it won't show non-ASCII characters correctly):
var
s : UTF8String;
begin
s := UTF8Encode('This is my original string (y)');
s := AnsiReplaceStr(s, '(y)', UTF8Encode('👍'));
MessageBoxW(0, PWideChar(UTF8Decode(s)), '', MB_OK);
end;
Either way, make sure your code editor is set to save the .pas file in UTF-8, otherwise you can't use the literal '👍', you would have to use something more like this instead:
var
Emoji: WideString;
SetLength(Emoji, 2);
Emoji[1] := WideChar($D83D);
Emoji[2] := WideChar($DC4D);
Then you can do this:
var s: WideString;
...
s := WideReplaceStr(s, '(y)', Emoji);
Or:
var s: UTF8String;
...
s := AnsiReplaceStr(s, '(y)', UTF8Encode(Emoji));

Delphi - Store WideStrings inside a program

In the past I used INI-Files to store unicode text, but now I need to store unicode text in the executable. How can I achieve this?
I want to store these letters:
āčēūīšķļņž
If you want to save the Unicode INI files then you might try the following code. The files are saved in UTF8 encoding.
Also you might take a look at this Unicode library where you can find a lot of helper functions.
uses IniFiles;
function WideStringToUTF8(const Value: WideString): AnsiString;
var
BufferLen: Integer;
begin
Result := '';
if Value <> '' then
begin
BufferLen := WideCharToMultiByte(CP_UTF8, 0, PWideChar(Value), -1, nil, 0, nil, nil);
SetLength(Result, BufferLen - 1);
if BufferLen > 1 then
WideCharToMultiByte(CP_UTF8, 0, PWideChar(Value), -1, PAnsiChar(Result), BufferLen - 1, nil, nil);
end;
end;
function UTF8ToWideString(const Value: AnsiString): WideString;
var
BufferLen: integer;
begin
Result := '';
if Value <> '' then
begin
BufferLen := MultiByteToWideChar(CP_UTF8, 0, PAnsiChar(Value), -1, nil, 0);
SetLength(Result, BufferLen - 1);
if BufferLen > 1 then
MultiByteToWideChar(CP_UTF8, 0, PAnsiChar(Value), -1, PWideChar(Result), BufferLen - 1);
end;
end;
procedure TForm1.Button1Click(Sender: TObject);
var
IniFile: TIniFile;
const
UnicodeValue = WideString(#$0101#$010D#$0113#$016B#$012B#$0161);
begin
IniFile := TIniFile.Create('C:\test.ini');
try
IniFile.WriteString('Section', 'Key', WideStringToUTF8(UnicodeValue));
IniFile.UpdateFile;
finally
IniFile.Free;
end;
end;
procedure TForm1.Button2Click(Sender: TObject);
var
IniFile: TIniFile;
UnicodeValue: WideString;
begin
IniFile := TIniFile.Create('C:\test.ini');
try
UnicodeValue := UTF8ToWideString(IniFile.ReadString('Section', 'Key', 'Default'));
MessageBoxW(Handle, PWideChar(UnicodeValue), 'Caption', 0);
finally
IniFile.Free;
end;
end;
with Delphi 2007 on 64-bit Windows 7 Enterprise SP 1
If you definitely need to use Delphi 7 there are some variants:
Store strings in resources linked to executable file.
Store strings in big memo or same thing, located on global data module or any other visual or non-visual component and access it by index. It's possible because strings in Delphi resources stored in XML-encoded form. E.g. your symbols example āčēūīšķļņž will be stored as āčēūīšķļņž
Store XML-encoded or Base64-encoded strings in string constants inside your code.
For string conversion you can use EncdDecd.pas , xdom.pas or some functions of System.pas like UTF8Encode/UTF8Decode.
To display and edit Unicode strings in Delphi forms you can use special set of Unicode controls like TNT Unicode Controls or subclass original Delphi controls and do some other workarounds by yourself, like described in this excerpt from comments in TntControls.pas (part of TNT Unicode Controls):
Windows NT provides support for native Unicode windows. To add
Unicode support to a
TWinControl descendant, override CreateWindowHandle() and call
CreateUnicodeHandle().
One major reason this works is because the VCL only uses the ANSI
version of
SendMessage() -- SendMessageA(). If you call SendMessageA() on a
UNICODE
window, Windows deals with the ANSI/UNICODE conversion
automatically. So
for example, if the VCL sends WM_SETTEXT to a window using
SendMessageA,
Windows actually expects a PAnsiChar even if the target window
is a UNICODE
window. So caling SendMessageA with PChars causes no problems.
A problem in the VCL has to do with the TControl.Perform() method.
Perform()
calls the window procedure directly and assumes an ANSI window.
This is a
problem if, for example, the VCL calls Perform(WM_SETTEXT, ...)
passing in a
PAnsiChar which eventually gets passed downto DefWindowProcW()
which expects a PWideChar.
This is the reason for SubClassUnicodeControl(). This procedure
will subclass the
Windows WndProc, and the TWinControl.WindowProc pointer. It will
determine if the
message came from Windows or if the WindowProc was called
directly. It will then
call SendMessageA() for Windows to perform proper conversion on
certain text messages.
Another problem has to do with TWinControl.DoKeyPress(). It is
called from the WM_CHAR
message. It casts the WideChar to an AnsiChar, and sends the
resulting character to
DefWindowProc. In order to avoid this, the DefWindowProc is
subclassed as well. WindowProc
will make a WM_CHAR message safe for ANSI handling code by
converting the char code to
#FF before passing it on. It stores the original WideChar in the
.Unused field of TWMChar.
The code #FF is converted back to the WideChar before passing onto
DefWindowProc.
Do
const MyString = WideString('Teksts latvie'#$0161'u valod'#$0101);
Simple, the idea is to find a non-visual component, which can store text and store your text there. Prefer that such component can also provide you an editor to edit the text in design time.
There is a component call FormResource which can do this. I use TUniScript. I believe there are other similar components. However, I did not find a usable component from the standard library.
The approach Widestring(#$65E5#$672C) does not work, because Delphi 7 just doesn't expect more than one byte for the #, so the outcome is by far not what you expect when going above 255 or $FF.
Another approach WideChar($65E5)+ WideChar($672C) can be used to store single Unicode codepoints in your source code when knowing that you need to have a Widestring at the start of the assignment (which can also be an empty literal) so the compiler understands which datatype you want:
const
// Compiler error "Imcompatible types"
WONT_COMPILE: WideChar($65E5)+ WideChar($672C);
// 日本
NIPPON: Widestring('')+ WideChar($65E5)+ WideChar($672C);
Looks cumbersome, but surely has your UTF-16 texts in Delphi 7.
Alternatively, store your constants in UTF-8, which is ASCII safe - that way you can use # easily. One advantage is, that it's a lot less cumbersome to write in your source code. One disadvantage is, that you can never use the constant directly, but have to convert it to UTF-16 first:
const
// UTF-8 of the two graphemes 日 and 本, needing 3 bytes each
NIPPON: #$E6#$97#$A5#$E6#$9C#$AC;
var
sUtf16: Widestring;
begin
// Internally these are 2 WORDs: $65E5 and $672C
sUtf16:= UTF8ToWideString( NIPPON );

How to save string into text files in Delphi?

What is the easiest way to create and save string into .txt files?
Use TStringList.
uses
Classes, Dialogs; // Classes for TStrings, Dialogs for ShowMessage
var
Lines: TStrings;
Line: string;
FileName: string;
begin
FileName := 'test.txt';
Lines := TStringList.Create;
try
Lines.Add('First line');
Lines.Add('Second line');
Lines.SaveToFile(FileName);
Lines.LoadFromFile(FileName);
for Line in Lines do
ShowMessage(Line);
finally
Lines.Free;
end;
end;
Also SaveToFile and LoadFromFile can take an additional Encoding in Delphi 2009 and newer to set the text encoding (Ansi, UTF-8, UTF-16, UTF-16 big endian).
Actually, I prefer this:
var
Txt: TextFile;
SomeFloatNumber: Double;
SomeStringVariable: string;
Buffer: Array[1..4096] of byte;
begin
SomeStringVariable := 'Text';
AssignFile(Txt, 'Some.txt');
Rewrite(Txt);
SetTextBuf(Txt, Buffer, SizeOf(Buffer));
try
WriteLn(Txt, 'Hello, World.');
WriteLn(Txt, SomeStringVariable);
SomeFloatNumber := 3.1415;
WriteLn(Txt, SomeFloatNumber:0:2); // Will save 3.14
finally CloseFile(Txt);
end;
end;
I consider this the easiest way, since you don't need the classes or any other unit for this code. And it works for all Delphi versions including -if I'm not mistaken- all .NET versions of Delphi...
I've added a call to SetTextBuf() to this example, which is a good trick to speed up textfiles in Delphi considerably. Normally, textfiles have a buffer of only 128 bytes. I tend to increase this buffer to a multiple of 4096 bytes. In several cases, I'va also implemented my own TextFile types, allowing me to use these "console" functions to write text to memo fields or even to another, external application! At this location is some example code (ZIP) I wrote in 2000 and just modified to make sure it compiles with Delphi 2007. Not sure about newer Delphi versions, though. Then again, this code is 10 years old already.These console functions have been a standard of the Pascal language since it's beginning so I don't expect them to disappear anytime soon. The TtextRec type might be modified in the future, though, so I can't predict if this code will work in the future... Some explanations:
WA_TextCustomEdit.AssignCustomEdit allows text to be written to CustomEdit-based objects like TMemo.
WA_TextDevice.TWATextDevice is a class that can be dropped on a form, which contains events where you can do something with the data written.
WA_TextLog.AssignLog is used by me to add timestamps to every line of text.
WA_TextNull.AssignNull is basically a dummy text device. It just discards anything you write to it.
WA_TextStream.AssignStream writes text to any TStream object, including memory streams, file streams, TCP/IP streams and whatever else you have.
Code in link is hereby licensed as CC-BY
Oh, the server with the ZIP file isn't very powerful, so it tends to be down a few times every day. Sorry about that.
The IOUtils unit which was introduced in Delphi 2010 provides some very convenient functions for writing/reading text files:
//add the text 'Some text' to the file 'C:\users\documents\test.txt':
TFile.AppendAllText('C:\users\documents\text.txt', 'Some text', TEncoding.ASCII);
Or if you are using an older version of Delphi (which does not have the for line in lines method of iterating a string list):
var i : integer;
begin
...
try
Lines.Add('First line');
Lines.Add('Second line');
Lines.SaveToFile(FileName);
Lines.LoadFromFile(FileName);
for i := 0 to Lines.Count -1 do
ShowMessage(Lines[i]);
finally
Lines.Free;
end;
If you're using a Delphi version >= 2009, give a look to the TStreamWriter class.
It will also take care of text file encodings and newline characters.
procedure String2File;
var s:ansiString;
begin
s:='My String';
with TFileStream.create('c:\myfile.txt',fmCreate) do
try
writeBuffer(s[1],length(s));
finally
free;
end;
end;
Care needed when using unicode strings....

Delphi: problem with httpcli (ICS) post method

I am using HttpCli component form ICS to POST a request. I use an example that comes with the component. It says:
procedure TForm4.Button2Click(Sender: TObject);
var
Data : String;
begin
Data:='status=no';
HttpCli1.SendStream := TMemoryStream.Create;
HttpCli1.SendStream.Write(Data[1], Length(Data));
HttpCli1.SendStream.Seek(0, 0);
HttpCli1.RcvdStream := TMemoryStream.Create;
HttpCli1.URL := Trim('http://server/something');
HttpCli1.PostAsync;
end;
But it fact, it sends not
status=no
but
s.t.a.t.u
I can't understand, where is the problem. Maybe someone can show an example, how to send POST request with the help of HttpCli component?
PS I can't use Indy =)
I suppose you're using Delphi 2009 or later, where the string type holds two-byte-per-character Unicode data. The Length function gives the number of characters, not the number of bytes, so when you put your string into the memory stream, you only copy half the bytes from the string. Even if you'd copied all of them, though, you'd still have a bunch of extra data in the stream since each character has two bytes and the server probably only expects to get one.
Use a different string type, such as AnsiString or UTF8String.

Resources