Delphi7 Base64 encode UTF8 XML

I'm still using Delphi 7 (I know), and I need to Base64-encode a UTF-8 XML document.
I create the XML using IXMLDocument, which supports UTF-8 (at least when I save to a file).
Since I'm using Indy 10 to HTTP POST the XML request, I tried using TIdEncoderMIME to Base64-encode the XML, but some UTF-8 characters are not encoded correctly.
Try1:
XMLText := XML.XML.Text;
EncodedXML := TIdEncoderMIME.EncodeBytes(ToBytes(XMLText));
In the above case, some UTF-8 characters are most probably already lost when the XML is saved to a string.
Try2:
XMLStream := TMemoryStream.Create;
XML.SaveToStream(XMLStream);
EncodedXML := TIdEncoderMIME.EncodeStream(XMLStream);
//or
EncodedXML := TIdEncoderMIME.EncodeStream(XMLStream, XMLStream.Size);
Both of the above give back EncodedXML = '' (an empty string).
What am I doing wrong?

Try using the TIdEncoderMIME.EncodeString() method instead. It has an AByteEncoding parameter that you can use to specify the desired byte encoding that Indy should encode the string characters as, such as UTF-8, before it then base64 encodes the resulting bytes:
XMLText := XML.XML.Text;
EncodedXML := TIdEncoderMIME.EncodeString(XMLText, IndyTextEncoding_UTF8);
Also note that in Delphi 2007 and earlier, where string is AnsiString, there is also an optional ASrcEncoding parameter that you can use to specify the encoding of the AnsiString (for instance, if it is already UTF-8), so that it can be decoded to Unicode properly before being encoded to the specified byte encoding (or, when the two encodings are the same, the AnsiString can be base64-encoded as-is):
XMLText := XML.XML.Text;
EncodedXML := TIdEncoderMIME.EncodeString(XMLText, IndyTextEncoding_UTF8, IndyTextEncoding_UTF8);
You are getting data loss when using EncodeBytes() because you are using ToBytes() without specifying any encoding parameters for it. ToBytes() has similar AByteEncoding and ASrcEncoding parameters.
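A minimal sketch of the corrected Try1 approach, assuming an Indy 10 build that provides IndyTextEncoding_UTF8 (the same one the answer uses). The helper names Utf8Base64 and Utf8FromBase64 are hypothetical. The decode direction is only there to check the round trip.

```pascal
program Utf8Base64Sketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, IdGlobal, IdCoderMIME;

// Converts the characters to UTF-8 bytes first, then base64-encodes the bytes.
// In Delphi 7 (string = AnsiString) ToBytes() also accepts a source encoding;
// leaving it out treats AText as being in the OS default Ansi codepage.
function Utf8Base64(const AText: string): string;
begin
  Result := TIdEncoderMIME.EncodeBytes(ToBytes(AText, IndyTextEncoding_UTF8));
end;

// Reverse direction: base64 -> bytes -> characters decoded as UTF-8.
function Utf8FromBase64(const AEncoded: string): string;
begin
  Result := BytesToString(TIdDecoderMIME.DecodeBytes(AEncoded), IndyTextEncoding_UTF8);
end;

begin
  // #$E9 is 'e acute' when the Ansi codepage is Windows-1252; its UTF-8
  // bytes C3 A9 base64-encode to 'w6k=' in that case
  Writeln(Utf8Base64(#$E9));
  if Utf8FromBase64(Utf8Base64(#$E9)) = #$E9 then
    Writeln('round trip OK');
end.
```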
In the case where you tried to encode a TMemoryStream, you simply forgot to reset the stream's Position back to 0 after calling SaveToStream(), so there was nothing for EncodeStream() to encode. That is why it returned a blank base64 string:
XMLStream := TMemoryStream.Create;
try
XML.SaveToStream(XMLStream);
XMLStream.Position := 0; // <-- add this
EncodedXML := TIdEncoderMIME.EncodeStream(XMLStream);
finally
XMLStream.Free;
end;

Related

Receiving Unicode strings with Indy 10

I am using the latest Delphi 10.4.2 with Indy 10.
In a REST server, JSON commands are received and handled. It works fine except for Unicode.
A simple JSON like this:
{"driverNote": "Test"}
is shown correctly.
If I now change to Unicode Russian characters:
{"driverNote": "Статья"}
I'm not sure where to begin tracking this down. I expect ARequestInfo.FormParams to show the same value in the debugger as the s variable.
If I debug Indy itself, FormParams are set in this code:
if LRequestInfo.PostStream <> nil then
begin
// decoding percent-encoded octets and applying the CharSet is handled by
// DecodeAndSetParams() further below...
EnsureEncoding(LEncoding, enc8Bit);
LRequestInfo.FormParams :=
ReadStringFromStream( LRequestInfo.PostStream,
-1,
LEncoding
{$IFDEF STRING_IS_ANSI}, LEncoding{$ENDIF});
DoneWithPostStream(AContext, LRequestInfo); // don't need the PostStream anymore
end;
It uses enc8Bit, but my string has 16-bit characters.
Is Indy handling this incorrectly?
The code snippet you quoted from IdCustomHTTPServer.pas is not what is in Indy's GitHub repo.
In the official code, TIdHTTPServer does not decode the PostStream to FormParams unless the ContentType is 'application/x-www-form-urlencoded':
if LRequestInfo.PostStream <> nil then begin
if TextIsSame(LContentType, ContentTypeFormUrlencoded) then
begin
// decoding percent-encoded octets and applying the CharSet is handled by DecodeAndSetParams() further below...
EnsureEncoding(LEncoding, enc8Bit);
LRequestInfo.FormParams := ReadStringFromStream(LRequestInfo.PostStream, -1, LEncoding{$IFDEF STRING_IS_ANSI}, LEncoding{$ENDIF});
DoneWithPostStream(AContext, LRequestInfo); // don't need the PostStream anymore
end;
end;
That ContentType check was added way back in 2010, so I don't know why it is not present in your version.
In your example, the ContentType is 'application/json', so the raw JSON should be in the PostStream and the FormParams should be blank.
That being said, in your version of Indy, TIdHTTPServer is simply reading the raw bytes from the PostStream and zero-extending each byte to a 16-bit character in the FormParams. To recover the original bytes, simply truncate each Char to an 8-bit Byte. For instance, you can use Indy's ToBytes() function in the IdGlobal unit, specifying enc8Bit/IndyTextEncoding_8Bit as the byte encoding.
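Here is a small sketch of that recovery. It assumes IdGlobal's ToBytes(), BytesToString(), IndyTextEncoding_8Bit and IndyTextEncoding_UTF8, and uses a hypothetical helper named RecoverUtf8. The helper truncates each Char back to a byte and then decodes those bytes as UTF-8.

```pascal
program RecoverZeroExtended;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, IdGlobal;

// AZeroExtended holds one raw byte per Char ($0000..$00FF).
// The 8-bit encoding truncates each Char back to its byte, and the
// resulting bytes are then decoded as UTF-8.
function RecoverUtf8(const AZeroExtended: string): string;
var
  LBytes: TIdBytes;
begin
  LBytes := ToBytes(AZeroExtended, IndyTextEncoding_8Bit);
  Result := BytesToString(LBytes, IndyTextEncoding_UTF8);
end;

begin
  // D0 A1 are the UTF-8 bytes of U+0421 (Cyrillic capital Es), zero-extended to two Chars
  if RecoverUtf8(#$00D0#$00A1) = #$0421 then
    Writeln('recovered U+0421');
end.
```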
JSON is most commonly transmitted as UTF-8 (and that is the case in your example), so when you have access to the raw bytes, in any version, make sure you parse the JSON bytes as UTF-8.
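For the raw-bytes case, here is a minimal sketch of reading a body stream as UTF-8 with Indy's ReadStringFromStream(). ReadBodyAsUtf8 is a hypothetical helper. In an OnCommandGet handler you would pass it ARequestInfo.PostStream; here, a TBytesStream holding the question's JSON stands in for that stream.

```pascal
program ReadJsonBodyAsUtf8;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, IdGlobal;

// Reads an entire body stream as UTF-8 text (hypothetical helper).
function ReadBodyAsUtf8(ABody: TStream): string;
begin
  Result := '';
  if ABody = nil then
    Exit;
  ABody.Position := 0; // read from the start, wherever the stream was left
  Result := ReadStringFromStream(ABody, -1, IndyTextEncoding_UTF8);
end;

const
  // {"driverNote": "<Russian word>"} with the Cyrillic letters written as code points
  Json = '{"driverNote": "'#$0421#$0442#$0430#$0442#$044C#$044F'"}';
var
  Body: TBytesStream;
begin
  // simulate a PostStream holding the raw UTF-8 bytes a client would send
  Body := TBytesStream.Create(TEncoding.UTF8.GetBytes(Json));
  try
    if ReadBodyAsUtf8(Body) = Json then
      Writeln('body decoded as UTF-8 correctly');
  finally
    Body.Free;
  end;
end.
```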

Delphi tidhttp encoding special characters

I have upgraded an app from D2007 to XE6. It posts data to a webserver.
I cannot work out what encoding will send the left and right quote characters correctly (code snippet below). I have tried every option I can find, but they get encoded as ? when sent (as far as I can see in WireShark).
D2007 had no problem, but XE6 is all about Unicode, and I am not sure if the problem is encoding or codepages or what.
Params := TIdMultipartFormDataStream.Create;
params.AddFormField('TEST', 'Test ‘n’ Try', 'utf8').ContentTransfer := '8bit';
IdHTTP1.Request.ContentType := 'text/plain';
IdHTTP1.Request.Charset := 'utf-8';
IdHTTP1.Post('http://test.com.au/TestEncoding.php', Params, Stream);
When calling params.AddFormField(), you are setting the charset to 'utf8', which is not a valid charset name. The official charset name is 'utf-8' instead:
params.AddFormField('TEST', 'Test ‘n’ Try', 'utf-8').ContentTransfer := '8bit';
When compiling for Unicode, an invalid charset ends up using Indy's built-in 8bit encoder, which encodes Unicode codepoints > U+00FF as byte 0x3F ('?'). The quote characters you are using, ‘ and ’, are codepoints U+2018 and U+2019, respectively.
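To see the difference at the byte level, the following sketch prints what each encoding produces for U+2018. It assumes IdGlobal's ToBytes() with IndyTextEncoding_UTF8 and IndyTextEncoding_8Bit. The HexBytes helper is hypothetical.

```pascal
program QuoteBytes;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, IdGlobal;

// Formats bytes as space-separated hex, e.g. 'E2 80 98' (hypothetical helper).
function HexBytes(const ABytes: TIdBytes): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to Length(ABytes) - 1 do
  begin
    if I > 0 then
      Result := Result + ' ';
    Result := Result + IntToHex(ABytes[I], 2);
  end;
end;

const
  LeftQuote: string = #$2018; // left single quotation mark, as in the question
begin
  // UTF-8 represents U+2018 with three bytes: E2 80 98
  Writeln('utf-8: ', HexBytes(ToBytes(LeftQuote, IndyTextEncoding_UTF8)));
  // Indy's 8-bit encoder cannot represent code points above U+00FF;
  // the answer describes this case as producing byte 3F ('?')
  Writeln('8bit:  ', HexBytes(ToBytes(LeftQuote, IndyTextEncoding_8Bit)));
end.
```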
The reason you do not encounter this issue in D2007 is that the TIdFormDataField.Charset property is ignored for encoding purposes when compiling for Ansi. The TIdFormDataField.FieldValue property is an AnsiString, and its raw bytes get transmitted as-is, so you are required to ensure it is encoded properly before adding it to TIdMultipartFormDataStream, e.g.:
params.AddFormField('TEST', UTF8Encode('Test ‘n’ Try'), 'utf-8').ContentTransfer := '8bit';
On a side note, you do not need to set the Request.ContentType or Request.Charset properties when posting a TIdMultipartFormDataStream (and especially since 'text/plain' is an invalid content type for a MIME post anyway). This version of Post() will set those properties for you:
Params := TIdMultipartFormDataStream.Create;
params.AddFormField(...);
IdHTTP1.Post('http://test.com.au/TestEncoding.php', Params, Stream);

Delphi XE and ZLib Problems

I'm in Delphi XE and I'm having some problems with ZLib routines...
I'm trying to compress some strings (and encode them to send via a SOAP web service, which is not really important here).
The string returned by ZDecompressString differs from the one passed to ZCompressString.
example1:
uses ZLib;
// compressing string
// ZCompressString('1234567890', zcMax);
// compressed string ='xÚ3426153·°4'
// Uncompressing the result of ZCompressString, don't return the same:
// ZDecompressString('xÚ3426153·°4');
// uncompressed string = '123456789'
if '1234567890' <> ZDecompressString(ZCompressString('1234567890', zcMax)) then
ShowMessage('Compression/Decompression fails');
example2:
Uses ZLib;
// compressing string
// ZCompressString('12345678901234567890', zcMax)
// compressed string ='xÚ3426153·°40„³'
// Uncompressing the result of ZCompressString, don't return the same:
// ZDecompressString('xÚ3426153·°40„³')
// uncompressed string = '12345678901'
if '12345678901234567890' <> ZDecompressString(ZCompressString('12345678901234567890', zcMax)) then
ShowMessage('Compression/Decompression fails');
The functions used come from other posts about compressing and decompressing:
function TForm1.ZCompressString(aText: string; aCompressionLevel: TZCompressionLevel): string;
var
strInput,
strOutput: TStringStream;
Zipper: TZCompressionStream;
begin
Result:= '';
strInput:= TStringStream.Create(aText);
strOutput:= TStringStream.Create;
try
Zipper:= TZCompressionStream.Create(strOutput, aCompressionLevel);
try
Zipper.CopyFrom(strInput, strInput.Size);
finally
Zipper.Free;
end;
Result:= strOutput.DataString;
finally
strInput.Free;
strOutput.Free;
end;
end;
function TForm1.ZDecompressString(aText: string): string;
var
strInput,
strOutput: TStringStream;
Unzipper: TZDecompressionStream;
begin
Result:= '';
strInput:= TStringStream.Create(aText);
strOutput:= TStringStream.Create;
try
Unzipper:= TZDecompressionStream.Create(strInput);
try
strOutput.CopyFrom(Unzipper, Unzipper.Size);
finally
Unzipper.Free;
end;
Result:= strOutput.DataString;
finally
strInput.Free;
strOutput.Free;
end;
end;
Where did I go wrong?
Has anyone else had the same problem?
ZLib, like all compression codes I know, is a binary compression algorithm. It knows nothing of string encodings. You need to supply it with byte streams to compress. And when you decompress, you are given back byte streams.
But you are working with strings, and so need to convert between encoded text and byte streams. The TStringStream class is doing that work in your code. You supply the string stream instance a text encoding when you create it.
Only your code does not supply an encoding. And so the default local ANSI encoding is used. And here's the first problem. That is not a full Unicode encoding. As soon as you use characters outside your local ANSI codepage the chain breaks down.
Solve that problem by supplying an encoding when you create string stream instances. Pass the encoding to the TStringStream constructor. A sound choice is TEncoding.UTF8. Pass this when creating strInput in the compressor, and strOutput in the decompressor.
Now the next and bigger problem that you face is that your compressed data may not be a meaningful string in any encoding. You might make your existing code sort of work if you switch to using AnsiString instead of string. But it's a rather brittle solution.
Fundamentally you are making the mistake of treating binary data as text. Once you compress you have binary data. My recommendation is that you don't attempt to interpret the compressed binary as text. Leave it as binary. Compress to a TBytesStream. And decompress from a TBytesStream. So the compressor function returns TBytes and the decompressor receives that same TBytes.
If, for some reason, you must compress to a string, then you must encode the compressed binary. Do that using base64. The EncdDecd unit can do that for you.
This flow for the compressor looks like this: string -> UTF-8 bytes -> compressed bytes -> base64 string. Obviously you reverse the arrows to decompress.
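Here is a sketch of that flow for Delphi XE. It assumes the EncodeBase64(Pointer, Size) and DecodeBase64(AnsiString) functions from the EncdDecd unit. The names ZCompressToBase64 and ZDecompressFromBase64 are hypothetical. The compressed data stays as bytes throughout and only becomes text through base64.

```pascal
program ZLibBase64RoundTrip;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, ZLib, EncdDecd;

// string -> UTF-8 bytes -> compressed bytes -> base64 string
function ZCompressToBase64(const AText: string;
  ALevel: TZCompressionLevel = zcMax): string;
var
  Plain: TBytes;
  Dest: TBytesStream;
  Zipper: TZCompressionStream;
begin
  Plain := TEncoding.UTF8.GetBytes(AText);
  Dest := TBytesStream.Create;
  try
    // same constructor form as in the question
    Zipper := TZCompressionStream.Create(Dest, ALevel);
    try
      if Length(Plain) > 0 then
        Zipper.WriteBuffer(Plain[0], Length(Plain));
    finally
      Zipper.Free; // freeing the zipper flushes the last compressed bytes
    end;
    // only the first Dest.Size bytes of Dest.Memory are valid data
    Result := string(EncodeBase64(Dest.Memory, Dest.Size));
  finally
    Dest.Free;
  end;
end;

// base64 string -> compressed bytes -> UTF-8 bytes -> string
function ZDecompressFromBase64(const AEncoded: string): string;
var
  Source, Dest: TBytesStream;
  Unzipper: TZDecompressionStream;
  Buffer: array[0..4095] of Byte;
  Count: Integer;
begin
  // base64 text is pure ASCII, so the AnsiString conversion loses nothing
  Source := TBytesStream.Create(DecodeBase64(AnsiString(AEncoded)));
  try
    Dest := TBytesStream.Create;
    try
      Unzipper := TZDecompressionStream.Create(Source);
      try
        // read until the data runs out instead of relying on Unzipper.Size
        repeat
          Count := Unzipper.Read(Buffer, SizeOf(Buffer));
          if Count > 0 then
            Dest.WriteBuffer(Buffer, Count);
        until Count = 0;
      finally
        Unzipper.Free;
      end;
      if Dest.Size > 0 then
        Result := TEncoding.UTF8.GetString(Dest.Bytes, 0, Dest.Size)
      else
        Result := '';
    finally
      Dest.Free;
    end;
  finally
    Source.Free;
  end;
end;

var
  Encoded: string;
begin
  Encoded := ZCompressToBase64('12345678901234567890');
  Writeln(Encoded);
  if ZDecompressFromBase64(Encoded) = '12345678901234567890' then
    Writeln('round trip OK');
end.
```

Because the base64 text is plain ASCII, it can be placed in a SOAP message without further escaping concerns.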

TSQLQuery.FieldByName().AsString -> TStringStream Corrupts Data

I'm using Delphi XE2. My code pulls data from a SQL Server 2008 R2 database. The data returned is an nvarchar(max) field with 1,055,227 bytes of data. I use the following code to save the field data to a file:
procedure WriteFieldToFile(FieldName: string; Query: TSQLQuery);
var
ss: TStringStream;
begin
ss := TStringStream.Create;
try
ss.WriteString(Query.FieldByName(FieldName).AsString);
ss.Position := 0;
ss.SaveToFile('C:\Test.txt');
finally
FreeAndNil(ss);
end;
end;
When I inspect the file in a hex viewer, the first 524,287 bytes (exactly 1/2 meg) look correct. The remaining bytes (524,288 to 1,055,227) are all nulls (#0), instead of the original data.
Is this the right way to save a string field from a TSQLQuery to a file? I chose to use TStringStream because I will eventually add code to do other things to the data on the stream, which I can't do with a TFileStream.
TStringStream is TEncoding-aware in XE2, but you are not specifying any encoding in the constructor so TEncoding.Default will be used, meaning that any string you provide to it will internally be converted to the OS default Ansi encoding. Make sure that encoding supports the Unicode characters you are trying to work with, or else specify a more suitable encoding, such as TEncoding.UTF8.
Also make sure that AsString is returning a valid and correct UnicodeString value to begin with. TStringStream will not save the data correctly if it is given garbage as input. Make sure that FieldByName() is returning a pointer to a TWideStringField object and not a TStringField object in order to handle the database's Unicode data correctly.
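Here is a minimal sketch of the suggested fix, which passes TEncoding.UTF8 to the TStringStream constructor. SaveTextAsUtf8, LoadUtf8Text and the file name Test.txt are hypothetical. A generated string stands in for Query.FieldByName(FieldName).AsString so that the example runs without a database.

```pascal
program SaveUtf8Text;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Writes AText to AFileName as UTF-8 (no BOM). In the question, AText would be
// Query.FieldByName(FieldName).AsString.
procedure SaveTextAsUtf8(const AText, AFileName: string);
var
  ss: TStringStream;
begin
  ss := TStringStream.Create('', TEncoding.UTF8);
  try
    ss.WriteString(AText);    // characters -> UTF-8 bytes, no Ansi conversion
    ss.SaveToFile(AFileName); // saves the whole buffer regardless of Position
  finally
    ss.Free;
  end;
end;

// Reads the file back as UTF-8 for comparison.
function LoadUtf8Text(const AFileName: string): string;
var
  ss: TStringStream;
begin
  ss := TStringStream.Create('', TEncoding.UTF8);
  try
    ss.LoadFromFile(AFileName);
    Result := ss.DataString;
  finally
    ss.Free;
  end;
end;

var
  Content: string;
begin
  // 600,000 copies of U+0416, i.e. 1,200,000 UTF-8 bytes (well past half a meg)
  Content := StringOfChar(#$0416, 600000);
  SaveTextAsUtf8(Content, 'Test.txt');
  if LoadUtf8Text('Test.txt') = Content then
    Writeln('file round trip OK');
end.
```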

HTTP Post text to SMS service getting %20 in text Delphi 2007

I'm using Indy to POST to an SMS service that sends the SMS, but the text arrives on my phone with %20 instead of spaces. Here is the code:
var
url, text: string;
param: TStringList;
IdHTTP1: TIdHTTP;
IdSSLIOHandlerSocketOpenSSL2: TIdSSLIOHandlerSocketOpenSSL;
begin
IdSSLIOHandlerSocketOpenSSL2 := TIdSSLIOHandlerSocketOpenSSL.Create;
IdHTTP1 := TIdHTTP.Create;
IdSSLIOHandlerSocketOpenSSL2.SSLOptions.Method := sslvSSLv23;
IdHTTP1.IOHandler := IdSSLIOHandlerSocketOpenSSL2;
IdHTTP1.HandleRedirects := true;
IdHTTP1.ReadTimeout := 5000;
param:=TStringList.create;
param.Clear;
param.Add('action=create');
param.Add('token=' + SMSToken);
param.Add('to=' + Phone);
param.Add('msg=' + MessageText);
url:='https://api.tropo.com/1.0/sessions';
try
text:=IdHTTP1.Post(url, param);
The TStrings version of TIdHTTP.Post() sends an application/x-www-form-urlencoded request to the server. The posted data is url-encoded by default. The server needs to decode the posted data before processing it. It sounds like the server-side code is not doing that correctly. You can remove the hoForceEncodeParams flag from the TIdHTTP.HTTPOptions property to disable the url-encoding of the posted data, but I would advise you to report the bug to Tropo instead so they can fix their server-side code.
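If you do decide to turn off the url-encoding, removing the flag is a set operation on HTTPOptions. Here is a minimal sketch, assuming the TIdHTTP component and the hoForceEncodeParams option from the IdHTTP unit.

```pascal
program NoForceEncodeParams;

{$APPTYPE CONSOLE}

uses
  SysUtils, IdHTTP;

var
  IdHTTP1: TIdHTTP;
begin
  IdHTTP1 := TIdHTTP.Create(nil);
  try
    // stop TIdHTTP from url-encoding the TStrings values before posting them
    IdHTTP1.HTTPOptions := IdHTTP1.HTTPOptions - [hoForceEncodeParams];
    if not (hoForceEncodeParams in IdHTTP1.HTTPOptions) then
      Writeln('hoForceEncodeParams removed');
  finally
    IdHTTP1.Free;
  end;
end.
```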
TIdHTTP itself does not apply quoted-printable encoding to posted data, so the data being posted has to be quoted-printable encoded beforehand.
In Indy 10, you can use the TIdFormDataField.Charset property to specify how strings are converted to bytes, and then use the TIdFormDataField.ContentTransfer property to specify how the bytes are encoded. For the ContentTransfer, you can specify '7bit', '8bit', 'binary', 'quoted-printable', 'base64', or a blank string (which is equivalent to '7bit', but without stating as much in the MIME header).
Set the TIdFormDataField.CharSet property to a charset that matches what your OS is using, and then set the TIdFormDataField.ContentTransfer property to '8bit'.
Alternatively, use the TStream overloaded version of TIdMultipartFormDataStream.AddFormField() instead of the String overloaded version, then you can store data in your input TStream any way you wish and it will be encoded as-is based on the value of the TIdFormDataField.ContentTransfer property. This should remove the %20 you are getting.
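Here is a minimal sketch of the Charset/ContentTransfer approach. It assumes the String overload of AddFormField() that takes a charset as its third parameter, as used elsewhere on this page. The field name 'msg' and the 'windows-1252' charset are illustrative assumptions, so pick the charset your OS actually uses.

```pascal
program MultipartField8Bit;

{$APPTYPE CONSOLE}

uses
  SysUtils, IdMultipartFormData;

var
  Params: TIdMultipartFormDataStream;
  Field: TIdFormDataField;
begin
  Params := TIdMultipartFormDataStream.Create;
  try
    // 'windows-1252' is an assumed OS codepage; substitute the one you use
    Field := Params.AddFormField('msg', 'Hello from Delphi', 'windows-1252');
    // '8bit' sends the converted bytes as-is instead of escaping them
    Field.ContentTransfer := '8bit';
    Writeln(Field.Charset, ' / ', Field.ContentTransfer);
    // the stream would then be sent with IdHTTP1.Post(url, Params, ResponseStream)
  finally
    Params.Free;
  end;
end.
```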
