Delphi XE and ZLib Problems

I'm in Delphi XE and I'm having some problems with the ZLib routines.
I'm trying to compress some strings (and encode them so they can be sent via a SOAP web service, but that part isn't really important).
The string returned by ZDecompressString differs from the one originally passed to ZCompressString.
Example 1:
uses ZLib;
// Compressing the string:
//   ZCompressString('1234567890', zcMax)
//   compressed string = 'xÚ3426153·°4'
// Decompressing the result of ZCompressString does not give back the original:
//   ZDecompressString('xÚ3426153·°4')
//   decompressed string = '123456789'
if '1234567890' <> ZDecompressString(ZCompressString('1234567890', zcMax)) then
  ShowMessage('Compression/Decompression fails');
Example 2:
uses ZLib;
// Compressing the string:
//   ZCompressString('12345678901234567890', zcMax)
//   compressed string = 'xÚ3426153·°40„³'
// Decompressing the result of ZCompressString does not give back the original:
//   ZDecompressString('xÚ3426153·°40„³')
//   decompressed string = '12345678901'
if '12345678901234567890' <> ZDecompressString(ZCompressString('12345678901234567890', zcMax)) then
  ShowMessage('Compression/Decompression fails');
The functions I'm using come from other posts about compressing and decompressing strings:
function TForm1.ZCompressString(aText: string; aCompressionLevel: TZCompressionLevel): string;
var
  strInput,
  strOutput: TStringStream;
  Zipper: TZCompressionStream;
begin
  Result := '';
  strInput := TStringStream.Create(aText);
  strOutput := TStringStream.Create;
  try
    Zipper := TZCompressionStream.Create(strOutput, aCompressionLevel);
    try
      Zipper.CopyFrom(strInput, strInput.Size);
    finally
      Zipper.Free;
    end;
    Result := strOutput.DataString;
  finally
    strInput.Free;
    strOutput.Free;
  end;
end;
function TForm1.ZDecompressString(aText: string): string;
var
  strInput,
  strOutput: TStringStream;
  Unzipper: TZDecompressionStream;
begin
  Result := '';
  strInput := TStringStream.Create(aText);
  strOutput := TStringStream.Create;
  try
    Unzipper := TZDecompressionStream.Create(strInput);
    try
      strOutput.CopyFrom(Unzipper, Unzipper.Size);
    finally
      Unzipper.Free;
    end;
    Result := strOutput.DataString;
  finally
    strInput.Free;
    strOutput.Free;
  end;
end;
Where did I go wrong?
Has anyone else run into the same problem?

ZLib, like every compression library I know of, is a binary compression algorithm. It knows nothing of string encodings. You need to supply it with byte streams to compress, and when you decompress, you are given back byte streams.
But you are working with strings, so you need to convert between encoded text and byte streams. The TStringStream class does that work in your code; you supply the string stream instance with a text encoding when you create it.
But your code does not supply an encoding, so the default local ANSI encoding is used. And here is the first problem: that is not a full Unicode encoding. As soon as you use characters outside your local ANSI codepage, the chain breaks down.
Solve that problem by supplying an encoding when you create string stream instances. Pass the encoding to the TStringStream constructor. A sound choice is TEncoding.UTF8. Pass this when creating strInput in the compressor, and strOutput in the decompressor.
Now the next and bigger problem that you face is that your compressed data may not be a meaningful string in any encoding. You might make your existing code sort of work if you switch to using AnsiString instead of string. But it's a rather brittle solution.
Fundamentally you are making the mistake of treating binary data as text. Once you compress you have binary data. My recommendation is that you don't attempt to interpret the compressed binary as text. Leave it as binary. Compress to a TBytesStream. And decompress from a TBytesStream. So the compressor function returns TBytes and the decompressor receives that same TBytes.
If, for some reason, you must compress to a string, then you must encode the compressed binary. Do that using base64. The EncdDecd unit can do that for you.
This flow for the compressor looks like this: string -> UTF-8 bytes -> compressed bytes -> base64 string. Obviously you reverse the arrows to decompress.
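To make that flow concrete, here is a minimal, untested sketch using the same stream classes as the question plus EncodeBase64/DecodeBase64 from the EncdDecd unit mentioned above. The function names ZCompressStringToBase64 and ZDecompressStringFromBase64 are mine, chosen purely for illustration; treat this as a sketch of the flow, not a drop-in replacement.

uses Classes, SysUtils, ZLib, EncdDecd;

// string -> UTF-8 bytes -> compressed bytes -> base64 string
function ZCompressStringToBase64(const aText: string;
  aCompressionLevel: TZCompressionLevel): string;
var
  strInput: TStringStream;      // UTF-8 encoded source text
  binOutput: TBytesStream;      // compressed binary
  Zipper: TZCompressionStream;
begin
  strInput := TStringStream.Create(aText, TEncoding.UTF8);
  binOutput := TBytesStream.Create;
  try
    Zipper := TZCompressionStream.Create(binOutput, aCompressionLevel);
    try
      Zipper.CopyFrom(strInput, strInput.Size);
    finally
      Zipper.Free; // flushes any remaining compressed data to binOutput
    end;
    // base64 makes the compressed binary safe to carry in a string / SOAP message
    Result := string(EncodeBase64(binOutput.Memory, binOutput.Size));
  finally
    strInput.Free;
    binOutput.Free;
  end;
end;

// base64 string -> compressed bytes -> UTF-8 bytes -> string
function ZDecompressStringFromBase64(const aBase64: string): string;
var
  binInput: TBytesStream;
  strOutput: TStringStream;
  Unzipper: TZDecompressionStream;
  buffer: TBytes;
  count: Integer;
begin
  binInput := TBytesStream.Create(DecodeBase64(AnsiString(aBase64)));
  strOutput := TStringStream.Create('', TEncoding.UTF8);
  try
    Unzipper := TZDecompressionStream.Create(binInput);
    try
      // read until the decompressor reports end of data
      SetLength(buffer, 4096);
      repeat
        count := Unzipper.Read(buffer[0], Length(buffer));
        if count > 0 then
          strOutput.WriteBuffer(buffer[0], count);
      until count = 0;
    finally
      Unzipper.Free;
    end;
    Result := strOutput.DataString;
  finally
    binInput.Free;
    strOutput.Free;
  end;
end;

With something along these lines, the round-trip check from the question would become: if '1234567890' <> ZDecompressStringFromBase64(ZCompressStringToBase64('1234567890', zcMax)) then ShowMessage('Compression/Decompression fails');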

Related

Base64 decode fails with "No mapping for the Unicode character exists in the target multi-byte code page."

I am trying to create the signature string needed for Azure Storage Tables as described here:
https://learn.microsoft.com/en-us/rest/api/storageservices/authorize-with-shared-key#encoding-the-signature
The Access key is "p1qPD2lBiMlDtl/Vb/T/Xt5ysd/ZXNg5AGrwJvtrUbyqiAQxGabWkR9NmKWb8Jhd6ZIyBd9HA3kCRHr6eveKaA=="
The failing code is:
property AccessKey: string read FAccessKey write SetAccessKey;

function TAzureStorageAPI.GetAuthHeader(StringToSign: UTF8String): UTF8String;
var
  AccessKey64: string;
begin
  AccessKey64 := URLEncoding.Base64.Decode(AccessKey);
  Result := URLEncoding.Base64.Encode(CalculateHMACSHA256(StringToSign, AccessKey64));
end;
What can I do to get this working?
I have checked that the AccessKey decodes on https://www.base64decode.org/ and it converts to strange-looking data, but that is what I am supposed to do... :-/
It looks like CalculateHMACSHA256() was blindly copied from somewhere and then fiddled with until the compiler was happy, but there are still logical flaws all over. Without testing it, I'd write it this way:
// TIdHMACSHA256 comes from Indy's IdHMACSHA1 unit; IndyTextEncoding_UTF8 from IdGlobal.
function GetAuthHeader
  ( StringToSign: UTF8String  // Expecting UTF-8 encoded text
  ; AzureAccountKey: String   // Base64 is ASCII, so the text encoding is almost irrelevant
  ): String;                  // Result is ASCII, too, not UTF-8
var
  HMAC: TIdHMACSHA256;
begin
  HMAC := TIdHMACSHA256.Create;
  try
    // Your key must be treated as pure binary - don't confuse it with
    // a human-readable text-like password. This corresponds to the part
    // "Base64.decode(<your_azure_storage_account_shared_key>)"
    HMAC.Key := URLEncoding.Base64.DecodeStringToBytes( AzureAccountKey );
    Result := URLEncoding.Base64.EncodeBytesToString       // The part "Signature=Base64(...)"
      ( HMAC.HashValue                                     // The part "HMAC-SHA256(...,...)"
        ( IndyTextEncoding_UTF8.GetBytes( StringToSign )   // The part "UTF8(StringToSign)"
        )
      );
  finally
    HMAC.Free;
  end;
end;
No guarantee as I have no chance to compile it myself.
Using ToHex() anywhere makes no sense if you encode it all in Base64 anyway. And stuffing everything into UTF8String makes no sense either then. Hashing always implies working with bytes, not text - never mix up binary with String.
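For context, a hypothetical call site might look like the following sketch. The "SharedKey account:signature" header format comes from the Microsoft page linked in the question; the account name is a placeholder, and StringToSign is assumed to already be built exactly as that page describes.

var
  StringToSign: UTF8String;
  AuthHeader: string;
begin
  StringToSign := ...; // build exactly as the Microsoft documentation describes
  // 'myaccount' stands in for the storage account name
  AuthHeader := 'SharedKey myaccount:' + GetAuthHeader(StringToSign, AccessKey);
  // send AuthHeader as the HTTP request's Authorization header
end;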

How to encode binary data with Indy Mime?

I have an AnsiString that I use to store binary data - bytes in the 0-255 range (I know it should be a byte array or similar, but there isn't much difference between them).
I want to pass this "binary string" through Indy MIME (TIdEncoderMIME.EncodeString / TIdDecoderMIME.DecodeString) and obtain a human-readable ANSI string.
I thought the output of EncodeString/DecodeString would be a string containing only ANSI characters if I used the IndyTextEncoding_8Bit encoding. But I was wrong!
So, how do I encode binary data with Indy MIME (something similar to application/octet-stream)?
DON'T use AnsiString for binary data!
AnsiString is not an appropriate container for binary data, especially in a Unicode environment like XE7. Use a proper byte container, like T(Id)Bytes or TMemoryStream instead.
You can't pass AnsiString as-is through the TId(Encoder|Decoder)MIME string methods, only UnicodeString, so implicit RTL Ansi<->Unicode conversions are likely to corrupt your binary data. Use the binary-oriented methods instead ((Encode|Decode)Bytes(), (Encode|Decode)Stream()). They exist for a reason.
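As a rough, untested sketch of that recommendation (using TIdBytes and the Bytes-based class methods named above; the variable names are my own), the binary round trip would look something like this:

// uses IdGlobal (TIdBytes), IdCoderMIME (TIdEncoderMIME, TIdDecoderMIME)
var
  Binary: TIdBytes; // raw binary data, not text
  Base64: String;
begin
  Binary := ...; // binary data
  // encode raw bytes straight to a base64 string - no text encoding involved
  Base64 := TIdEncoderMIME.EncodeBytes(Binary);
  // ...and decode the base64 string back to raw bytes
  Binary := TIdDecoderMIME.DecodeBytes(Base64);
end;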
That being said, Indy 10 does have a TIdMemoryBufferStream class (desktop platforms only), so if you MUST use AnsiString (and you really shouldn't), you can wrap it in a TStream interface without having to make additional copies of data in memory. For example:
var
  Binary: AnsiString;
  Strm: TIdMemoryBufferStream;
  Base64: String;
begin
  Binary := ...; // binary data
  Strm := TIdMemoryBufferStream.Create(PAnsiChar(Binary), Length(Binary));
  try
    Base64 := TIdEncoderMIME.EncodeStream(Strm);
  finally
    Strm.Free;
  end;
  // use Base64 as needed...
end;
var
  Base64: String;
  Strm: TIdMemoryBufferStream;
  Binary: AnsiString;
begin
  Base64 := ...; // encoded data
  SetLength(Binary, (Length(Base64) div 4) * 3);
  Strm := TIdMemoryBufferStream.Create(PAnsiChar(Binary), Length(Binary));
  try
    TIdDecoderMIME.DecodeStream(Base64, Strm);
    SetLength(Binary, Strm.Size);
    SetCodePage(PRawByteString(@Binary)^, 28591, False);
  finally
    Strm.Free;
  end;
  // use Binary as needed...
end;

Problems with unicode text

I use Delphi XE3 and I have a small problem, but I don't know how to fix it.
The problem is with the letter "è". This letter is part of a file path, "C:\lène.mp4".
I save this path into a TStringList. When I save that TStringList to a file, the path is shown fine inside the txt file.
But when I load it back with TStringList, it shows up as "è" (whether displayed in a memo or stored in a variable), which makes it an invalid path.
If I add the path (string) directly to the TStringList and then pass it to the path variable, it works fine.
But loading it from the file and passing it to the path variable does not work (I get "è" instead of "è").
Normally I will be working with a lot of Unicode strings, but for now I'm struggling with that one letter.
This does not work:
var
  resp: WideString;
  xfiles: TStringList;
begin
  xfiles := TStringList.Create;
  try
    xfiles.LoadFromFile('C:\Demo6-out.txt'); // this file contains only "C:\lène.mp4"
    resp := xfiles.Strings[0];
    // if I save xfiles to a file, the path string is saved fine...!
  finally
    xfiles.Free;
  end;
end;
But like this it works:
var
  resp: WideString;
  xfiles: TStringList;
begin
  xfiles := TStringList.Create;
  try
    xfiles.Add('C:\lène.mp4');
    resp := xfiles.Strings[0];
  finally
    xfiles.Free;
  end;
end;
I'm really confused.
First, you should be using UnicodeString instead of WideString. UnicodeString was introduced in Delphi 2009, and is much more efficient than WideString. The RTL uses UnicodeString (almost) everywhere it previously used AnsiString prior to 2009.
Second, something else introduced in Delphi 2009 is SysUtils.TEncoding, which is used for Byte<->Character conversions. Several existing RTL classes, including TStrings/TStringList, were updated to support TEncoding when converting bytes to/from strings.
What happens when you load a file into TStringList is that an internal TEncoding object is assigned to help convert the file's raw bytes to UnicodeString values. Which implementation of TEncoding it uses depends on the character encoding that LoadFromFile() thinks the file is using, if not explicitly stated (LoadFromFile() has an optional AEncoding parameter). If the file has a UTF BOM, a matching TEncoding is used, whether that be TEncoding.UTF8 or TEncoding.(BigEndian)Unicode. If no BOM is present, and the AEncoding parameter is not used, then TEncoding.Default is used, which represents the OS's default charset locale (and thus provides backwards compatibility with existing pre-2009 code).
When saving a TStringList to file, if the list was previously loaded from a file then the same TEncoding used for loading is used for saving, otherwise TEncoding.Default is used (again, for backwards compatibility), unless overridden by the optional AEncoding parameter of SaveToFile().
In your first example, the input file is most likely encoded in UTF-8 without a BOM. So LoadFromFile() would use TEncoding.Default to interpret the file's bytes. è is the result of the UTF-8 encoded form of è (byte octets 0xC3 0xA8) being misinterpreted as Windows-1252 instead of UTF-8. So, you would have to load the file like this instead:
xfiles.LoadFromFile('C:\Demo6-out.txt', TEncoding.UTF8);
In your second example, you are not loading or saving a file. You are simply assigning a string literal (which is Unicode-aware in D2009+) to a UnicodeString variable (inside the TStringList) and then assigning that to a WideString variable (WideString and UnicodeString use the same UTF-16 character encoding; they just have different memory management). So no data conversions are being performed.
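A small illustrative sketch (untested, and assuming the file really is UTF-8 as described above): load with an explicit encoding, and save with one too, so the file gets a UTF-8 BOM and later loads correctly even without the AEncoding parameter.

var
  xfiles: TStringList;
  resp: string;
begin
  xfiles := TStringList.Create;
  try
    // explicitly tell LoadFromFile how the file's bytes are encoded
    xfiles.LoadFromFile('C:\Demo6-out.txt', TEncoding.UTF8);
    resp := xfiles[0]; // 'C:\lène.mp4'
    // saving with an explicit encoding writes a UTF-8 BOM, so a plain
    // LoadFromFile() will auto-detect the encoding next time
    xfiles.SaveToFile('C:\Demo6-out.txt', TEncoding.UTF8);
  finally
    xfiles.Free;
  end;
end;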
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

How to use an arbitrary string encoding?

I'm trying to get some code working against an API published by a Chinese company. I have a spec and some sample code (in Java), enough to understand most of what's going on, but I ran across one thing I don't know how to do.
String ecodeform = "GBK";
String sm = new String(Hex.encodeHex("Insert message here".getBytes(ecodeform))); //test message
It's creating a string from the char array result of the hex representation of the original string, encoded in GBK format (the standard Chinese character encoding, playing roughly the role that ASCII plays for English text). I can work out how to do most of that in Delphi, but I don't know how to encode a string to GBK, which is specifically required by this API.
In SysUtils, there's a TEncoding class that comes with a few built-in encodings, such as UTF8, UTF16, and "Default" (the system's default code page), but I don't know how to set up a TEncoding for an arbitrary encoding such as GBK.
Does anyone know how to set this up?
You can use the TEncoding.GetEncoding() method to get a TEncoding object for a specific codepage/charset, eg:
var
  Enc: TEncoding;
  Bytes: TBytes;
begin
  Enc := TEncoding.GetEncoding(936); // or TEncoding.GetEncoding('gb2312')
  try
    Bytes := Enc.GetBytes('Insert message here');
  finally
    Enc.Free;
  end;
  // encode Bytes to hex string as needed...
end;
TEncoding has a GetEncoding method for that. Give it the encoding name or number, and it will return a TEncoding instance.
For GBK, the number I think you want is 936. See Microsoft's list of code pages for more.
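To finish the Java snippet's Hex.encodeHex step, the GBK bytes can then be converted to a hex string. A trivial helper of my own (using lowercase digits, which is what Apache Commons Codec produces by default, if memory serves) might look like this:

function BytesToHex(const Bytes: TBytes): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to High(Bytes) do
    Result := Result + LowerCase(IntToHex(Bytes[I], 2)); // two hex digits per byte
end;

The Java sample's new String(Hex.encodeHex(...)) would then correspond to BytesToHex(Bytes) here.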

TSQLQuery.FieldByName().AsString -> TStringStream Corrupts Data

I'm using Delphi XE2. My code pulls data from a SQL-Server 2008 R2 database. The data returned is a nvarchar(max) field with 1,055,227 bytes of data. I use the following code to save the field data to a file:
procedure WriteFieldToFile(FieldName: string; Query: TSQLQuery);
var
  ss: TStringStream;
begin
  ss := TStringStream.Create;
  try
    ss.WriteString(Query.FieldByName(FieldName).AsString);
    ss.Position := 0;
    ss.SaveToFile('C:\Test.txt');
  finally
    FreeAndNil(ss);
  end;
end;
When I inspect the file in a hex viewer, the first 524,287 bytes (exactly 1/2 meg) look correct. The remaining bytes (524,288 to 1,055,227) are all nulls (#0), instead of the original data.
Is this the right way to save a string field from a TSQLQuery to a file? I chose to use TStringStream because I will eventually add code to do other things to the data on the stream, which I can't do with a TFileStream.
TStringStream is TEncoding-aware in XE2, but you are not specifying any encoding in the constructor so TEncoding.Default will be used, meaning that any string you provide to it will internally be converted to the OS default Ansi encoding. Make sure that encoding supports the Unicode characters you are trying to work with, or else specify a more suitable encoding, such as TEncoding.UTF8.
Also make sure that AsString is returning a valid and correct UnicodeString value to begin with. TStringStream will not save the data correctly if it is given garbage as input. Make sure that FieldByName() is returning a pointer to a TWideStringField object and not a TStringField object in order to handle the database's Unicode data correctly.
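Putting those suggestions together, a minimal, untested sketch of the save routine with an explicit encoding (UTF-8 is one suitable choice; the file name is the one from the question) would be:

procedure WriteFieldToFile(const FieldName: string; Query: TSQLQuery);
var
  ss: TStringStream;
begin
  // UTF-8 can represent any Unicode text coming out of the nvarchar(max) field
  ss := TStringStream.Create('', TEncoding.UTF8);
  try
    ss.WriteString(Query.FieldByName(FieldName).AsString);
    ss.SaveToFile('C:\Test.txt'); // SaveToFile writes the whole buffer regardless of Position
  finally
    FreeAndNil(ss);
  end;
end;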
