How to encode binary data with Indy MIME? (Delphi)

I have an AnsiString that I use to store binary data, i.e. bytes in the 0-255 range (I know it should be a byte array or similar, but there is not much difference between them).
I want to pass this "binary string" through Indy MIME (TIdEncoderMIME.EncodeString / TIdDecoderMIME.DecodeString) and obtain a human-readable ANSI string.
I thought that the output of EncodeString/DecodeString would be a string containing only ANSI characters if I used the IndyTextEncoding_8Bit encoding. But I was wrong!
So, how do I encode binary data with Indy MIME (something similar to application/octet-stream)?

DON'T use AnsiString for binary data!
AnsiString is not an appropriate container for binary data, especially in a Unicode environment like XE7. Use a proper byte container, like T(Id)Bytes or TMemoryStream instead.
You can't pass an AnsiString as-is through the TId(Encoder|Decoder)MIME string methods (they operate on UnicodeString only), so implicit RTL Ansi<->Unicode conversions are likely to corrupt your binary data. Use the binary-oriented methods instead: (Encode|Decode)Bytes() and (Encode|Decode)Stream(). They exist for a reason.
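For example, a round trip through the byte-based class methods might look like this (a minimal sketch, assuming the payload already lives in a TIdBytes):
var
  Data, Decoded: TIdBytes;
  Base64: String;
begin
  Data := ...; // binary data
  Base64 := TIdEncoderMIME.EncodeBytes(Data);    // human-readable Base64 text
  Decoded := TIdDecoderMIME.DecodeBytes(Base64); // the original bytes, intact
end;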
That being said, Indy 10 does have a TIdMemoryBufferStream class (desktop platforms only), so if you MUST use AnsiString (and you really shouldn't), you can wrap it in a TStream interface without having to make additional copies of data in memory. For example:
var
  Binary: AnsiString;
  Strm: TIdMemoryBufferStream;
  Base64: String;
begin
  Binary := ...; // binary data
  Strm := TIdMemoryBufferStream.Create(PAnsiChar(Binary), Length(Binary));
  try
    Base64 := TIdEncoderMIME.EncodeStream(Strm);
  finally
    Strm.Free;
  end;
  // use Base64 as needed...
end;
And to decode it back:
var
  Base64: String;
  Strm: TIdMemoryBufferStream;
  Binary: AnsiString;
begin
  Base64 := ...; // encoded data
  SetLength(Binary, (Length(Base64) div 4) * 3);
  Strm := TIdMemoryBufferStream.Create(PAnsiChar(Binary), Length(Binary));
  try
    TIdDecoderMIME.DecodeStream(Base64, Strm);
    SetLength(Binary, Strm.Size);
    SetCodePage(PRawByteString(@Binary)^, 28591, False);
  finally
    Strm.Free;
  end;
  // use Binary as needed...
end;


Send bytes using Indy 10

I'm trying to send bytes between two Delphi 2010 applications using Indy 10, but without success. The received bytes are different from the sent bytes. This is my example code:
Application 1, send button click:
var
  s: TIdBytes;
begin
  SetLength(s, 3);
  s[0] := 48;
  s[1] := 227;
  s[2] := 0;
  IdTCPClient.IOHandler.WriteDirect(s); // or .Write(s);
end;
Application 2, IdTCPServer Execute event (first attempt):
procedure TForm1.IdTCPServerExecute(AContext: TIdContext);
var
  rec: TIdBytes;
  b: Byte;
begin
  SetLength(rec, 0);
  b := AContext.Connection.IOHandler.ReadByte;
  while b <> 0 do
  begin
    SetLength(rec, Length(rec) + 1);
    rec[Length(rec) - 1] := b;
    b := AContext.Connection.IOHandler.ReadByte;
  end;
  // Here I expect rec[0] = 48, rec[1] = 227, rec[2] = 0.
  // But I get: rec[0] = 48, rec[1] = 63, rec[2] = 0. rec[1] is incorrect
  // ... rest of code
end;
Application 2, IdTCPServer Execute event (second attempt):
procedure TForm1.IdTCPServerExecute(AContext: TIdContext);
var
  c: AnsiString;
begin
  c := AContext.Connection.IOHandler.ReadLn(Char(0));
  // Here I expect c[0] = 48, c[1] = 227, c[2] = 0.
  // But I get: c[0] = 48, c[1] = 63, c[2] = 0. c[1] is incorrect
  // ... rest of code
end;
The strangest thing is that these applications were developed with Delphi 5 some years ago and they worked well (with ReadLn(Char(0))). When I ported both applications to Delphi 2010 they stopped working. I supposed it was due to Unicode strings, but I haven't found a solution.
First off, don't use TIdIOHandler.WriteDirect() directly; use only the TIdIOHandler.Write() overloads instead.
Second, there is no possible way that your first attempt can produce the result you are claiming. TIdIOHandler.Write[Direct](TIdBytes) and TIdIOHandler.ReadByte() operate on raw bytes only, and bytes are transmitted as-is. You are guaranteed to get exactly what you send; there is simply no way the byte values could change as you have suggested.
However, your second attempt certainly can produce the result you are claiming, because TIdIOHandler.ReadLn() reads the bytes and converts them to a UnicodeString, which you are then assigning to an AnsiString. That means the received data goes through two lossy charset conversions:
First, the received bytes are decoded into a UTF-16 UnicodeString using the TIdIOHandler.DefStringEncoding property as the decoding charset, which is set to IndyTextEncoding_ASCII by default (you can change that, for example to IndyTextEncoding_8Bit; see the sketch below). Byte 227 is outside of the ASCII range, so that byte gets decoded to the Unicode character '?' (#63), which is an indication that data loss has occurred.
Then, the UnicodeString is assigned to the AnsiString, which converts the UTF-16 data to ANSI using the RTL's default charset (the user's OS locale by default, changeable with the System.SetMultiByteConversionCodePage() function). In this case no further loss occurs, since the UnicodeString now holds only characters in the US-ASCII range, which convert to ANSI as-is.
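For instance, a minimal sketch of changing the decoding charset (the handler name is illustrative; the key point is to set the property before any reads happen on the connection):
procedure TForm1.IdTCPServerConnect(AContext: TIdContext);
begin
  // treat each received byte as a single character 0..255 instead of filtering to ASCII
  AContext.Connection.IOHandler.DefStringEncoding := IndyTextEncoding_8Bit;
end;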
That being said, I would not suggest calling TIdIOHandler.ReadByte() in a manual loop like you are doing. Although TIdIOHandler does not have a WaitFor() overload for bytes, there is nonetheless a more efficient way to approach this, e.g.:
procedure TForm1.IdTCPServerExecute(AContext: TIdContext);
var
  rec: TIdBytes;
  LPos: Integer;
begin
  SetLength(rec, 0);
  // adapted from code in TIdIOHandler.WaitFor()...
  with AContext.Connection.IOHandler do
  begin
    LPos := 0;
    repeat
      LPos := InputBuffer.IndexOf(Byte(0), LPos);
      if LPos <> -1 then begin
        InputBuffer.ExtractToBytes(rec, LPos + 1);
        { or, if you don't want the terminating $00 byte put in the TIdBytes:
        InputBuffer.ExtractToBytes(rec, LPos);
        InputBuffer.Remove(1);
        }
        Break;
      end;
      LPos := InputBuffer.Size;
      ReadFromSource(True, IdTimeoutDefault, True);
    until False;
  end;
  // ... rest of code
end;
Or:
procedure TForm1.IdTCPServerExecute(AContext: TIdContext);
var
  rec: string;
begin
  rec := AContext.Connection.IOHandler.ReadLn(#0, IndyTextEncoding_8Bit);
  // ... rest of code
end;
The strangest thing is that these applications were developed with Delphi 5 some years ago and they worked well (with ReadLn(Char(0))). When I ported both applications to Delphi 2010 they stopped working. I supposed it was due to Unicode strings, but I haven't found a solution.
Yes, the string type in Delphi 5 was AnsiString, but it was changed to UnicodeString in Delphi 2009.
But even so, in pre-Unicode versions of Delphi, TIdIOHandler.ReadLn() in Indy 10 still reads raw bytes and converts them to Unicode using the TIdIOHandler.DefStringEncoding. It then converts that Unicode data to AnsiString before exiting, using the TIdIOHandler.DefAnsiEncoding property as the converting charset, which is set to IndyTextEncoding_OSDefault by default.
The moral of the story is: whenever you are reading in bytes and converting them to string characters, the bytes must be decoded using the correct charset, or else you will get data loss. This was true in Delphi 5 (though not as strictly enforced), and it is even more true in Delphi 2009+.

Transform Western codepage to Windows 1251

I am trying to load a Cyrillic web page (default codepage Western) and put it into a TMemo component.
But I see "Âûñòàâêè" instead of "Выставки" in the Memo.
How do I transform the string from the Western codepage to Windows-1251?
Delphi XE8 SP1
TMemo (and most of the RTL/VCL/FMX in general) in XE8 expects UnicodeString data in UTF-16 format. You would have to decode the webpage data from its actual charset (which is presumably already Windows-1251, as it does not make sense for Russian text to be encoded in Windows-1252) to UTF-16 before then loading it into the TMemo. The actual charset used for the raw data needs to be reported in the HTTP Content-Type header, or in the HTML itself.
You would not decode the raw data to Windows 1251. That would have been necessary only if you were using a pre-Unicode version of Delphi (2007 and earlier) running your app on a Windows Russian machine that uses Windows-1251 as its default codepage. Those days are gone in a Unicode environment like XE8.
Delphi ships with Indy pre-installed. Indy's TIdHTTP component handles the charset-to-UTF-16 decoding for you, e.g.:
Memo1.Text := IdHTTP1.Get(URL);
If you download the webpage data any other way, you would have to download it as raw bytes and decode them yourself, such as by using TEncoding.GetEncoding(1251) followed by TEncoding.GetString(). Or, if the bytes are in a TStream, you can use Memo1.Lines.LoadFromStream() specifying TEncoding.GetEncoding(1251) as the encoding.
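For example, a minimal sketch of the manual route, assuming the raw webpage bytes are already in a TBytes variable:
var
  Raw: TBytes;
  Enc: TEncoding;
begin
  Raw := ...; // raw webpage bytes
  Enc := TEncoding.GetEncoding(1251); // Windows-1251
  try
    Memo1.Text := Enc.GetString(Raw);
  finally
    Enc.Free; // GetEncoding() returns a new instance that the caller must free
  end;
end;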
Another answer suggests repairing the mojibake after the fact, by type-punning between codepage-tagged AnsiString types:
type
  TSrcStr = type AnsiString(1251);
  TDstStr = type AnsiString(1252);

function Decode(const s: string): string;
var
  a: TSrcStr;
  b: TDstStr;
begin
  SetLength(a, Length(s));
  b := s;
  Move(b[Low(b)], a[Low(a)], Length(b) * SizeOf(b[Low(b)]));
  Result := a;
end;

procedure Test;
var
  s: string;
begin
  s := 'Âûñòàâêè';
  s := Decode(s);
  Assert(s = 'Выставки');
end;

Cast from RawByteString to string does automatically invoke UTF8Decode?

I want to store arbitrary binary data as a BLOB in an SQLite database.
The data will be added as a value with this function:
procedure TSQLiteDatabase.AddParamText(name: string; value: string);
Now I want to convert a WideString into its UTF-8 representation so it can be stored in the database. After calling UTF8Encode and storing the result in the database, I noticed that the data inside the database is not UTF-8 encoded. Instead, it is encoded as an AnsiString in my computer's locale.
I ran the following test to check what happened:
type
{$IFDEF Unicode}
  TBinary = RawByteString;
{$ELSE}
  TBinary = AnsiString;
{$ENDIF}

procedure TForm1.Button1Click(Sender: TObject);
var
  original: WideString;
  blob: TBinary;
begin
  original := 'ä';
  blob := UTF8Encode(original);
  // Delphi 6: Ã¤ (as expected)
  // Delphi XE4: ä (unexpected! How did it do an automatic UTF8Decode???)
  ShowMessage(blob);
end;
After the character "ä" has been converted to UTF-8, the data is correct in memory ("Ã¤"); however, as soon as I pass the TBinary value to a function (as string or AnsiString), Delphi XE4 does a "magic typecast", invoking UTF8Decode for a reason I don't understand.
I have already found a workaround to avoid this:
function RealUTF8Encode(AInput: WideString): TBinary;
var
  tmp: TBinary;
begin
  tmp := UTF8Encode(AInput);
  SetLength(Result, Length(tmp));
  CopyMemory(@Result[1], @tmp[1], Length(tmp));
end;
procedure TForm1.Button2Click(Sender: TObject);
var
  original: WideString;
  blob: TBinary;
begin
  original := 'ä';
  blob := RealUTF8Encode(original);
  // Delphi 6: Ã¤ (as expected)
  // Delphi XE4: Ã¤ (as expected)
  ShowMessage(blob);
end;
However, this workaround with RealUTF8Encode looks dirty to me, and I would like to understand why a simple call to UTF8Encode did not work and whether there is a better solution.
In Ansi versions of Delphi (prior to D2009), UTF8Encode() returns a UTF-8 encoded AnsiString. In Unicode versions (D2009 and later), it returns a UTF-8 encoded RawByteString with a code page of CP_UTF8 (65001) assigned to it.
In Ansi versions, ShowMessage() takes an AnsiString as input, and the UTF-8 string is an AnsiString, so it gets displayed as-is. In Unicode versions, ShowMessage() takes a UTF-16 encoded UnicodeString as input, so the UTF-8 encoded RawByteString gets converted to UTF-16 using its assigned CP_UTF8 code page.
If you actually wrote the blob data directly to the database you would find that it may or may not be UTF-8 encoded, depending on how you are writing it. But your approach is wrong; the use of RawByteString is incorrect in this situation. RawByteString is meant to be used as a procedure parameter only. Do not use it as a local variable. That is the source of your problem. From the documentation:
The purpose of RawByteString is to reduce the need for multiple overloads of procedures that read string data. This means that parameters of routines that process strings without regard for the string's code page should typically be of type RawByteString.
RawByteString should only be used as a parameter type, and only in routines which otherwise would need multiple overloads for AnsiStrings with different codepages. Such routines need to be written with care for the actual codepage of the string at run time.
For Unicode versions of Delphi, instead of RawByteString, I would suggest that you use TBytes to hold your UTF-8 data, and encode it with TEncoding:
var
  utf8: TBytes;
  str: string;
...
  str := ...;
  utf8 := TEncoding.UTF8.GetBytes(str);
You are looking for a data type that does not perform implicit text encodings when passed around, and TBytes is that type.
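When you need the text back later, the same type converts in the other direction; a one-line sketch:
str := TEncoding.UTF8.GetString(utf8);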
For Ansi versions of Delphi, you can use AnsiString, WideString and UTF8Encode exactly as you do.
Personally however, I would recommend using TBytes consistently for your UTF-8 data. So if you need a single code base that supports Ansi and Unicode compilers (ugh!) then you should create some helpers:
{$IFDEF Unicode}
function GetUTF8Bytes(const Value: string): TBytes;
begin
  Result := TEncoding.UTF8.GetBytes(Value);
end;
{$ELSE}
function GetUTF8Bytes(const Value: WideString): TBytes;
var
  utf8str: UTF8String;
begin
  utf8str := UTF8Encode(Value);
  SetLength(Result, Length(utf8str));
  Move(Pointer(utf8str)^, Pointer(Result)^, Length(utf8str));
end;
{$ENDIF}
The Ansi version incurs more heap allocations than are necessary. You might well choose to write a more efficient helper that calls WideCharToMultiByte() directly.
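Such a helper might look like the following sketch (the name GetUTF8BytesDirect is hypothetical; it requires the Windows unit, and for CP_UTF8 the last two WideCharToMultiByte() parameters must be nil):
function GetUTF8BytesDirect(const Value: WideString): TBytes;
var
  Len: Integer;
begin
  // first call computes the required buffer size, second call converts
  Len := WideCharToMultiByte(CP_UTF8, 0, PWideChar(Value), Length(Value), nil, 0, nil, nil);
  SetLength(Result, Len);
  if Len > 0 then
    WideCharToMultiByte(CP_UTF8, 0, PWideChar(Value), Length(Value),
      PAnsiChar(@Result[0]), Len, nil, nil);
end;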
In Unicode versions of Delphi, if for some reason you don't want to use TBytes for UTF-8 data, you can use UTF8String instead. This is a special AnsiString that always uses the CP_UTF8 code page. You can then write:
var
  utf8: UTF8String;
  str: string;
...
  utf8 := str;
and the compiler will convert from UTF-16 to UTF-8 behind the scenes for you. I would not recommend this though, because it is not supported on mobile platforms, or in Ansi versions of Delphi (UTF8String has existed since Delphi 6, but it was not a true UTF-8 string until Delphi 2009). That is, amongst other reasons, why I suggest that you use TBytes. My philosophy is, at least in the Unicode age, that there is the native string type, and any other encoding should be held in TBytes.

Delphi XE and ZLib Problems

I'm in Delphi XE and I'm having some problems with the ZLib routines...
I'm trying to compress some strings (and encode them to send via a SOAP webservice; not really important).
The string returned by ZDecompressString differs from the one passed to ZCompressString.
Example 1:
uses ZLib;

// compressing the string:
//   ZCompressString('1234567890', zcMax)
//   compressed string = 'xÚ3426153·°4'
// uncompressing the result does not return the same:
//   ZDecompressString('xÚ3426153·°4')
//   uncompressed string = '123456789'
if '1234567890' <> ZDecompressString(ZCompressString('1234567890', zcMax)) then
  ShowMessage('Compression/Decompression fails');
example2:
Uses ZLib;
// compressing string
// ZCompressString('12345678901234567890', zcMax)
// compressed string ='xÚ3426153·°40„³'
// Uncompressing the result of ZCompressString, don't return the same:
// ZDecompressString('xÚ3426153·°40„³')
// uncompressed string = '12345678901'
if '12345678901234567890' <> ZDecompressString(ZCompressString('12345678901234567890', zcMax)) then
ShowMessage('Compression/Decompression fails');
The functions used are from some other posts about compressing and decompressing:
function TForm1.ZCompressString(aText: string; aCompressionLevel: TZCompressionLevel): string;
var
  strInput,
  strOutput: TStringStream;
  Zipper: TZCompressionStream;
begin
  Result := '';
  strInput := TStringStream.Create(aText);
  strOutput := TStringStream.Create;
  try
    Zipper := TZCompressionStream.Create(strOutput, aCompressionLevel);
    try
      Zipper.CopyFrom(strInput, strInput.Size);
    finally
      Zipper.Free;
    end;
    Result := strOutput.DataString;
  finally
    strInput.Free;
    strOutput.Free;
  end;
end;
function TForm1.ZDecompressString(aText: string): string;
var
  strInput,
  strOutput: TStringStream;
  Unzipper: TZDecompressionStream;
begin
  Result := '';
  strInput := TStringStream.Create(aText);
  strOutput := TStringStream.Create;
  try
    Unzipper := TZDecompressionStream.Create(strInput);
    try
      strOutput.CopyFrom(Unzipper, Unzipper.Size);
    finally
      Unzipper.Free;
    end;
    Result := strOutput.DataString;
  finally
    strInput.Free;
    strOutput.Free;
  end;
end;
Where am I wrong?
Has anyone else had the same problem?
ZLib, like all compression codes I know, is a binary compression algorithm. It knows nothing of string encodings. You need to supply it with byte streams to compress. And when you decompress, you are given back byte streams.
But you are working with strings, and so need to convert between encoded text and byte streams. The TStringStream class is doing that work in your code. You supply the string stream instance a text encoding when you create it.
Only your code does not supply an encoding. And so the default local ANSI encoding is used. And here's the first problem. That is not a full Unicode encoding. As soon as you use characters outside your local ANSI codepage the chain breaks down.
Solve that problem by supplying an encoding when you create string stream instances. Pass the encoding to the TStringStream constructor. A sound choice is TEncoding.UTF8. Pass this when creating strInput in the compressor, and strOutput in the decompressor.
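For example, a sketch of just the changed constructor calls:
// in ZCompressString:
strInput := TStringStream.Create(aText, TEncoding.UTF8);
// in ZDecompressString:
strOutput := TStringStream.Create('', TEncoding.UTF8);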
Now the next and bigger problem that you face is that your compressed data may not be a meaningful string in any encoding. You might make your existing code sort of work if you switch to using AnsiString instead of string. But it's a rather brittle solution.
Fundamentally you are making the mistake of treating binary data as text. Once you compress you have binary data. My recommendation is that you don't attempt to interpret the compressed binary as text. Leave it as binary. Compress to a TBytesStream. And decompress from a TBytesStream. So the compressor function returns TBytes and the decompressor receives that same TBytes.
If, for some reason, you must compress to a string, then you must encode the compressed binary. Do that using base64. The EncdDecd unit can do that for you.
This flow for the compressor looks like this: string -> UTF-8 bytes -> compressed bytes -> base64 string. Obviously you reverse the arrows to decompress.
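A sketch of that pipeline, assuming the stock ZLib and EncdDecd units that ship with Delphi XE (the names CompressToBase64/DecompressFromBase64 are illustrative):
uses SysUtils, Classes, ZLib, EncdDecd;

function CompressToBase64(const aText: string): string;
var
  Src, Dst: TBytesStream;
  Zipper: TZCompressionStream;
begin
  // string -> UTF-8 bytes
  Src := TBytesStream.Create(TEncoding.UTF8.GetBytes(aText));
  Dst := TBytesStream.Create;
  try
    // UTF-8 bytes -> compressed bytes
    Zipper := TZCompressionStream.Create(Dst, zcMax);
    try
      Zipper.CopyFrom(Src, 0); // Count = 0 copies the whole source stream
    finally
      Zipper.Free;
    end;
    // compressed bytes -> Base64 string
    Result := string(EncodeBase64(Dst.Memory, Dst.Size));
  finally
    Src.Free;
    Dst.Free;
  end;
end;

function DecompressFromBase64(const aBase64: string): string;
var
  Src, Dst: TBytesStream;
  Unzipper: TZDecompressionStream;
  Buffer: TBytes;
  Count: Integer;
begin
  // Base64 string -> compressed bytes
  Src := TBytesStream.Create(DecodeBase64(AnsiString(aBase64)));
  Dst := TBytesStream.Create;
  try
    // compressed bytes -> UTF-8 bytes; read in chunks, since the
    // decompressed size is not known up front
    Unzipper := TZDecompressionStream.Create(Src);
    try
      SetLength(Buffer, 4096);
      repeat
        Count := Unzipper.Read(Buffer[0], Length(Buffer));
        if Count > 0 then
          Dst.WriteBuffer(Buffer[0], Count);
      until Count = 0;
    finally
      Unzipper.Free;
    end;
    // UTF-8 bytes -> string
    Result := TEncoding.UTF8.GetString(Dst.Bytes, 0, Dst.Size);
  finally
    Src.Free;
    Dst.Free;
  end;
end;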

TSQLQuery.FieldByName().AsString -> TStringStream Corrupts Data

I'm using Delphi XE2. My code pulls data from a SQL Server 2008 R2 database. The data returned is an nvarchar(max) field with 1,055,227 bytes of data. I use the following code to save the field data to a file:
procedure WriteFieldToFile(FieldName: string; Query: TSQLQuery);
var
  ss: TStringStream;
begin
  ss := TStringStream.Create;
  try
    ss.WriteString(Query.FieldByName(FieldName).AsString);
    ss.Position := 0;
    ss.SaveToFile('C:\Test.txt');
  finally
    FreeAndNil(ss);
  end;
end;
When I inspect the file in a hex viewer, the first 524,287 bytes (exactly 1/2 meg) look correct. The remaining bytes (524,288 to 1,055,227) are all nulls (#0), instead of the original data.
Is this the right way to save a string field from a TSQLQuery to a file? I chose to use TStringStream because I will eventually add code to do other things to the data on the stream, which I can't do with a TFileStream.
TStringStream is TEncoding-aware in XE2, but you are not specifying any encoding in the constructor, so TEncoding.Default is used, meaning any string you provide is internally converted to the OS default Ansi encoding. Make sure that encoding supports the Unicode characters you are trying to work with, or else specify a more suitable encoding, such as TEncoding.UTF8.
Also make sure that AsString is returning a valid and correct UnicodeString value to begin with. TStringStream will not save the data correctly if it is given garbage as input. Make sure that FieldByName() is returning a pointer to a TWideStringField object and not a TStringField object in order to handle the database's Unicode data correctly.
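Putting both points together, a sketch of the routine with an explicit encoding (assuming UTF-8 is acceptable for the output file):
procedure WriteFieldToFile(const FieldName: string; Query: TSQLQuery);
var
  ss: TStringStream;
begin
  // TEncoding.UTF8 is a shared RTL instance; TStringStream will not free it
  ss := TStringStream.Create(Query.FieldByName(FieldName).AsString, TEncoding.UTF8);
  try
    ss.SaveToFile('C:\Test.txt');
  finally
    FreeAndNil(ss);
  end;
end;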
