Delphi TIdHTTP encoding special characters

I have upgraded an app from D2007 to XE6. It posts data to a webserver.
I cannot work out what encoding will send the left and right quote characters correctly (code snippet below). I have tried every option I can find, but they get encoded as ? when sent (as far as I can see in Wireshark).
D2007 had no problem, but XE6 is all about Unicode, and I am not sure if the problem is encoding or codepages or what.
Params := TIdMultipartFormDataStream.Create;
params.AddFormField('TEST', 'Test ‘n’ Try', 'utf8').ContentTransfer := '8bit';
IdHTTP1.Request.ContentType := 'text/plain';
IdHTTP1.Request.Charset := 'utf-8';
IdHTTP1.Post('http://test.com.au/TestEncoding.php', Params, Stream);

When calling params.AddFormField(), you are setting the charset to 'utf8', which is not a valid charset name. The official charset name is 'utf-8' instead:
params.AddFormField('TEST', 'Test ‘n’ Try', 'utf-8').ContentTransfer := '8bit';
When compiling for Unicode, an invalid charset ends up using Indy's built-in 8bit encoder, which encodes Unicode codepoints > U+00FF as byte 0x3F ('?'). The quote characters you are using, ‘ and ’, are codepoints U+2018 and U+2019, respectively.
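To make the byte-level difference concrete, here is a minimal console sketch that uses only the RTL and assumes a Unicode desktop compiler (D2009 or later). Naive8BitEncode is a hypothetical stand-in written for this illustration, not Indy's actual encoder; it only applies the rule described above (anything above U+00FF becomes '?').

```delphi
program QuoteBytes;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical stand-in for an 8-bit encoder (NOT Indy's real code):
// characters up to U+00FF become one byte, anything else becomes '?' ($3F).
function Naive8BitEncode(const S: string): TBytes;
var
  I: Integer;
begin
  SetLength(Result, Length(S));
  for I := 1 to Length(S) do
    if Ord(S[I]) > $FF then
      Result[I - 1] := $3F
    else
      Result[I - 1] := Byte(Ord(S[I]));
end;

// Render bytes as space-separated hex pairs.
function BytesToHex(const ABytes: TBytes): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to High(ABytes) do
  begin
    if I > 0 then
      Result := Result + ' ';
    Result := Result + IntToHex(ABytes[I], 2);
  end;
end;

var
  S: string;
begin
  // #$2018 and #$2019 are the curly quotes. Four-digit hex literals are
  // always WideChar, so this does not depend on the source file's encoding.
  S := 'Test '#$2018'n'#$2019' Try';

  // UTF-8 keeps both quotes, three bytes each:
  // 54 65 73 74 20 E2 80 98 6E E2 80 99 20 54 72 79
  Writeln(BytesToHex(TEncoding.UTF8.GetBytes(S)));

  // The 8-bit rule replaces both quotes with 3F ('?'):
  // 54 65 73 74 20 3F 6E 3F 20 54 72 79
  Writeln(BytesToHex(Naive8BitEncode(S)));
end.
```

The second line of output matches the '?' bytes seen on the wire in the question.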
The reason you do not encounter this issue in D2007 is because the TIdFormDataField.Charset property is ignored for encoding purposes when compiling for Ansi. The TIdFormDataField.FieldValue property is an AnsiString, and its raw bytes get transmitted as-is, so you are required to ensure it is encoded properly before adding it to TIdMultipartFormDataStream, eg:
params.AddFormField('TEST', UTF8Encode('Test ‘n’ Try'), 'utf-8').ContentTransfer := '8bit';
On a side note, you do not need to set the Request.ContentType or Request.Charset properties when posting a TIdMultipartFormDataStream (and especially since 'text/plain' is an invalid content type for a MIME post anyway). This version of Post() will set those properties for you:
Params := TIdMultipartFormDataStream.Create;
params.AddFormField(...);
IdHTTP1.Post('http://test.com.au/TestEncoding.php', Params, Stream);

Related

Receiving Unicode strings with Indy 10

I am using the latest Delphi 10.4.2 with Indy 10.
In a REST server, JSON commands are received and handled. It works fine except for Unicode.
A simple JSON like this:
{"driverNote": "Test"}
is shown correctly
If I now change to Unicode Russian characters:
{"driverNote": "Статья"}
Not sure where I should begin to track this. I expect ARequestInfo.FormParams to have the same value in debugger as s variable.
If I debug Indy itself, FormParams are set in this code:
if LRequestInfo.PostStream <> nil then
begin
  // decoding percent-encoded octets and applying the CharSet is handled by
  // DecodeAndSetParams() further below...
  EnsureEncoding(LEncoding, enc8Bit);
  LRequestInfo.FormParams :=
    ReadStringFromStream(LRequestInfo.PostStream,
      -1,
      LEncoding
      {$IFDEF STRING_IS_ANSI}, LEncoding{$ENDIF});
  DoneWithPostStream(AContext, LRequestInfo); // don't need the PostStream anymore
end;
It uses enc8Bit, but my string has 16-bit characters.
Is this handled incorrectly in Indy?
The code snippet you quoted from IdCustomHTTPServer.pas is not what is in Indy's GitHub repo.
In the official code, TIdHTTPServer does not decode the PostStream to FormParams unless the ContentType is 'application/x-www-form-urlencoded':
if LRequestInfo.PostStream <> nil then begin
  if TextIsSame(LContentType, ContentTypeFormUrlencoded) then
  begin
    // decoding percent-encoded octets and applying the CharSet is handled by DecodeAndSetParams() further below...
    EnsureEncoding(LEncoding, enc8Bit);
    LRequestInfo.FormParams := ReadStringFromStream(LRequestInfo.PostStream, -1, LEncoding{$IFDEF STRING_IS_ANSI}, LEncoding{$ENDIF});
    DoneWithPostStream(AContext, LRequestInfo); // don't need the PostStream anymore
  end;
end;
That ContentType check was added way back in 2010, so I don't know why it is not present in your version.
In your example, the ContentType is 'application/json', so the raw JSON should be in the PostStream and the FormParams should be blank.
That being said, in your version of Indy, TIdHTTPServer is simply reading the raw bytes from the PostStream and zero-extending each byte to a 16-bit character in the FormParams. To recover the original bytes, simply truncate each Char to an 8-bit Byte. For instance, you can use Indy's ToBytes() function in the IdGlobal unit, specifying enc8Bit/IndyTextEncoding_8Bit as the byte encoding.
JSON is most commonly transmitted as UTF-8 (and that is the case in your example), so when you have access to the raw bytes, in any version, make sure you parse the JSON bytes as UTF-8.
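As a rough sketch of that recovery step, assuming a Unicode compiler and Indy 10.6 or later (where IndyTextEncoding_8Bit and IndyTextEncoding_UTF8 are available in IdGlobal): RecoverUtf8Text is a hypothetical helper name chosen for this example, and the two-byte input simulates what a zero-extended FormParams value would contain.

```delphi
program RecoverUtf8Demo;

{$APPTYPE CONSOLE}

uses
  SysUtils, IdGlobal;

// Hypothetical helper: takes a string in which every Char holds one raw
// byte (zero-extended to 16 bits) and re-reads those bytes as UTF-8.
function RecoverUtf8Text(const AZeroExtended: string): string;
var
  LRaw: TIdBytes;
begin
  // The 8-bit encoding truncates each Char back to a single byte...
  LRaw := ToBytes(AZeroExtended, IndyTextEncoding_8Bit);
  // ...and the recovered bytes are then decoded as UTF-8.
  Result := BytesToString(LRaw, IndyTextEncoding_UTF8);
end;

var
  LMangled: string;
begin
  // U+0421 (the first letter of the Russian example) is $D0 $A1 in UTF-8.
  // Char(...) casts make each byte value exactly one Char, which mimics
  // the zero-extension described above.
  LMangled := Char($D0) + Char($A1);

  if RecoverUtf8Text(LMangled) = #$0421 then
    Writeln('recovered U+0421')
  else
    Writeln('unexpected result');
end.
```

In the affected Indy version, the same helper could be applied to ARequestInfo.FormParams before handing the text to a JSON parser. With current Indy code the raw bytes are in PostStream instead, so this workaround would not be needed.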

TIdTCPClient.IOHandler.Write(TStream) cannot send Big5?

Through TCPClient.IOHandler.Write(StmMsg);, the message is delivered to the frontend. English is ok, but Big5 cannot be delivered, why!!??
(StmMsg: TStringStream, the program has added... TCPClient.IOHandler.DefStringEncoding:= IndyTextEncoding_UTF8;)
The following is the code:
if not TCPClient.Connected then
  TCPClient.Connect;
deviceToken :=
  '6aa5bfcfe731ab29b260fab38a43f1e1abac0de3d6e8e0bc5f4b89c422938e8f';
MensajeEnviar := edtMensaje.Text;
strMessage := Get_Msg(deviceToken, Get_PayLoad(MensajeEnviar, 1,
  'default'));
StmMsg := TStringStream.Create(strMessage);
StmMsg.Seek(0, soBeginning);
TCPClient.IOHandler.Write(StmMsg);
Big5 is not a language. It is a byte encoding used for Chinese.
The TIdIOHandler.DefStringEncoding property applies only to string operations, not to stream operations. The TIdIOHandler.Write(TStream) method writes the content of a stream as-is. So, it is your responsibility to make sure the contents of the stream are encoded properly beforehand.
However, the TStringStream constructor you are calling uses TEncoding.Default for the stream's byte encoding. On Windows [1], TEncoding.Default represents the default ANSI charset of the user that is running your program. Unless that charset happens to cover Chinese, it cannot represent Chinese text, and data will be lost.
[1]: On non-Windows platforms, TEncoding.Default uses UTF-8 instead.
You need to use TEncoding.UTF8 instead for the stream's byte encoding, eg:
StmMsg := TStringStream.Create(strMessage, TEncoding.UTF8);
Alternatively, you can remove the stream altogether and just use the TIdIOHandler.Write(String) method instead, which will then use the TIdIOHandler.DefStringEncoding property, eg:
TCPClient.IOHandler.Write(strMessage);
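A small RTL-only sketch (assuming a Unicode compiler) shows what ends up in the stream when TEncoding.UTF8 is passed to the constructor. The two sample characters are arbitrary and written as character codes so the example does not depend on the source file's encoding.

```delphi
program StreamBytesDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  LText: string;
  LStream: TStringStream;
begin
  // Two Chinese characters, U+4E2D and U+6587, given as character codes.
  LText := #$4E2D#$6587;

  // The encoding passed here decides which bytes the stream will hold.
  LStream := TStringStream.Create(LText, TEncoding.UTF8);
  try
    // Each of these characters takes three bytes in UTF-8 (E4 B8 AD and
    // E6 96 87), so the stream holds six bytes for two characters.
    Writeln(LStream.Size); // 6
  finally
    LStream.Free;
  end;
end.
```

Those six bytes are what Write(TStream) would send as-is.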

Delphi7 Base64 encode UTF8 XML

I'm still using Delphi7 (I know) and I need to encode a UTF8 XML document in Base64 format.
I create the XML using IXMLDocument, which supports UTF8 (that is, if I save to a file).
Since I'm using Indy10 to HTTP Post the XML request, I tried using TIdEncoderMIME to Base64 encode the XML. But some UTF8 chars are not encoded well.
Try1:
XMLText := XML.XML.Text;
EncodedXML := TIdEncoderMIME.EncodeBytes(ToBytes(XMLText));
In the above case most probably some UTF8 information/characters are already lost when the XML is saved to a string.
Try2:
XMLStream := TMemoryStream.Create;
XML.SaveToStream(XMLStream);
EncodedXML := TIdEncoderMIME.EncodeStream(XMLStream);
//or
EncodedXML := TIdEncoderMIME.EncodeStream(XMLStream, XMLStream.Size);
Both of the above give back EncodedXML = '' (empty string).
What am I doing wrong?
Try using the TIdEncoderMIME.EncodeString() method instead. It has an AByteEncoding parameter that you can use to specify the desired byte encoding that Indy should encode the string characters as, such as UTF-8, before it then base64 encodes the resulting bytes:
XMLText := XML.XML.Text;
EncodedXML := TIdEncoderMIME.EncodeString(XMLText, IndyTextEncoding_UTF8);
Also note that in Delphi 2007 and earlier, where string is AnsiString, there is also an optional ASrcEncoding parameter that you can use to specify the encoding of the AnsiString (for instance, if it is already UTF-8), so that it can be decoded to Unicode properly before then being encoded to the specified byte encoding (or, in the case where the two encodings are the same, the AnsiString can be base64 encoded as-is):
XMLText := XML.XML.Text;
EncodedXML := TIdEncoderMIME.EncodeString(XMLText, IndyTextEncoding_UTF8, IndyTextEncoding_UTF8);
You are getting data loss when using EncodeBytes() because you are using ToBytes() without specifying any encoding parameters for it. ToBytes() has similar AByteEncoding and ASrcEncoding parameters.
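For illustration, the EncodeBytes() route with an explicit byte encoding might look like the sketch below. It assumes a Unicode compiler and Indy 10.6 or later; on an Ansi compiler such as Delphi 7, the additional source-encoding argument described above would also have to be passed to ToBytes(). The single-character input is an arbitrary sample.

```delphi
program Base64Utf8Demo;

{$APPTYPE CONSOLE}

uses
  SysUtils, IdGlobal, IdCoderMIME;

var
  LText: string;
begin
  // U+00F6 ('ö') as a character code; it is the two bytes C3 B6 in UTF-8.
  LText := #$00F6;

  // Convert the string to UTF-8 bytes explicitly, then base64 encode
  // those bytes. C3 B6 encodes to 'w7Y='.
  Writeln(TIdEncoderMIME.EncodeBytes(ToBytes(LText, IndyTextEncoding_UTF8)));
end.
```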
In the case where you tried to encode a TMemoryStream, you simply forgot to reset the stream's Position back to 0 after calling SaveToStream(), so there was nothing for EncodeStream() to encode. That is why it returned a blank base64 string:
XMLStream := TMemoryStream.Create;
try
  XML.SaveToStream(XMLStream);
  XMLStream.Position := 0; // <-- add this
  EncodedXML := TIdEncoderMIME.EncodeStream(XMLStream);
finally
  XMLStream.Free;
end;

Can TidHttpServer (Delphi XE2) handle urlencoded characters?

I have a TidHttpServer listening to port 8844 with the following code:
procedure TForm1.IdHTTPServer1CommandGet(AContext: TIdContext;
  ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
begin
  if ARequestInfo.Document <> '/favicon.ico' then
  begin
    Memo1.Text := ARequestInfo.Params.Text;
  end;
end;
This is compiled with Delphi XE2. When I browse to
http://localhost:8844/document?Value=%F6 <-- %F6 is the encoded value for ö
...I get the result:
value=?
If I compile the application using Delphi 2007 I get the following result
value=ö
Is this a bug in Indy or something that I have missed?
In XE2, strings are Unicode. When TIdHTTPServer decodes the ARequestInfo.Document in D2009 and later, it requires percent-encoded data to decode into UTF-8 encoded data, which is then decoded into the final Unicode string. There is currently no option to change that (I have submitted a feature request to our issue trackers for it). %F6 does not represent a valid UTF-8 octet, which is why you end up with '?'. In UTF-8, the 'ö' character would be UTF-8 encoded as $C3 $B6 and thus percent-encoded as %C3%B6, not %F6.
In D2007, strings are Ansi. When TIdHTTPServer decodes the ARequestInfo.Document in D2007 and earlier, it provides the decoded data as-is, thus %F6 would decode into $F6 and be stored as #246. That value is then interpreted by the RTL using whatever the local machine's default Ansi codepage is, so it would represent the 'ö' character only for Ansi codepages that define it that way (Windows-1252 and ISO-8859-1 do, but ISO-8859-5 does not, for example).
I would suggest you change your server logic to use UTF-8 encoded URLs in both Delphi versions. In D2007, you can use the RTL's UTF8Decode() function to decode a UTF-8 encoded AnsiString into a WideString, which you can then assign to another AnsiString to convert the data into the Ansi value you were originally expecting. In D2009+, that is handled automatically for you.
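The difference between %F6 and %C3%B6 can be reproduced with a short RTL-only sketch, assuming a Unicode desktop compiler. PercentEncodeBytes is a hypothetical helper for this illustration; unlike a real URL encoder, it escapes every byte.

```delphi
program PercentEncodeDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: escapes every byte as %XX. A real URL encoder
// would leave unreserved characters such as letters and digits as-is.
function PercentEncodeBytes(const ABytes: TBytes): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to High(ABytes) do
    Result := Result + '%' + IntToHex(ABytes[I], 2);
end;

var
  LText: string;
  LSingleByte: TBytes;
begin
  // U+00F6 is 'ö', written as a character code.
  LText := #$00F6;

  // UTF-8 form, which is what the Unicode build expects in the URL:
  Writeln(PercentEncodeBytes(TEncoding.UTF8.GetBytes(LText))); // %C3%B6

  // Single-byte form (the codepoint value used directly as one byte),
  // which is what the browser request in the question contained:
  SetLength(LSingleByte, 1);
  LSingleByte[0] := Byte(Ord(LText[1]));
  Writeln(PercentEncodeBytes(LSingleByte)); // %F6
end.
```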
On a side note, accessing a UI component directly in the OnCommandGet event is not thread-safe. You have to synchronize with the main thread in order to access the UI safely.
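One possible way to do that, sketched as a replacement for the handler from the question rather than a standalone program: it reuses the question's TForm1, Memo1 and event signature, and assumes a compiler with anonymous method support (D2009 or later) with Classes in the unit's uses clause. Indy also has its own helper classes for this purpose, which are not shown here.

```delphi
procedure TForm1.IdHTTPServer1CommandGet(AContext: TIdContext;
  ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
var
  LText: string;
begin
  if ARequestInfo.Document <> '/favicon.ico' then
  begin
    // Copy the data while still in the server's worker thread...
    LText := ARequestInfo.Params.Text;

    // ...then run the UI update in the main thread. Synchronize blocks
    // this worker thread until the anonymous method has finished.
    TThread.Synchronize(nil,
      procedure
      begin
        Memo1.Text := LText;
      end);
  end;
end;
```

Copying the text into a local variable first keeps the anonymous method from touching ARequestInfo from the main thread.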

HTTP Post text to SMS service getting %20 in text Delphi 2007

I'm using Indy to do a Post to an SMS service that will send the SMS, but the SMS text ends up on my phone with %20 instead of spaces, here is the code:
url,text:string;
IdHTTP1: TIdHTTP;
IdSSLIOHandlerSocketOpenSSL2: TIdSSLIOHandlerSocketOpenSSL;
begin
IdSSLIOHandlerSocketOpenSSL2 := TIdSSLIOHandlerSocketOpenSSL.Create;
IdHTTP1 := TIdHTTP.Create;
IdSSLIOHandlerSocketOpenSSL2.SSLOptions.Method := sslvSSLv23;
IdHTTP1.IOHandler := IdSSLIOHandlerSocketOpenSSL2;
IdHTTP1.HandleRedirects := true;
IdHTTP1.ReadTimeout := 5000;
param:=TStringList.create;
param.Clear;
param.Add('action=create');
param.Add('token=' + SMSToken);
param.Add('to=' + Phone);
param.Add('msg=' + MessageText);
url:='https://api.tropo.com/1.0/sessions';
try
text:=IdHTTP1.Post(url, param);
thanks
The TStrings version of TIdHTTP.Post() sends an application/x-www-form-urlencoded request to the server. The posted data is url-encoded by default. The server needs to decode the posted data before processing it. It sounds like the server-side code is not doing that correctly. You can remove the hoForceEncodeParams flag from the TIdHTTP.HTTPOptions property to disable the url-encoding of the posted data, but I would advise you to report the bug to Tropo instead so they can fix their server-side code.
TIdHTTP itself does not apply quoted-printable encoding to posted data, so the data being posted has to be quoted-printable encoded beforehand.
In Indy 10, you can use the TIdFormDataField.Charset property to specify how strings are converted to bytes, and then use the TIdFormDataField.ContentTransfer property to specify how the bytes are encoded. For the ContentTransfer, you can specify '7bit', '8bit', 'binary', 'quoted-printable', 'base64', or a blank string (which is equivalent to '7bit', but without stating as much in the MIME header).
Set the TIdFormDataField.CharSet property to a charset that matches what your OS is using, and then set the TIdFormDataField.ContentTransfer property to '8bit'.
Alternatively, use the TStream overloaded version of TIdMultipartFormDataStream.AddFormField() instead of the String overloaded version, then you can store data in your input TStream any way you wish and it will be encoded as-is based on the value of the TIdFormDataField.ContentTransfer property. This should remove the %20 you are getting.
