Can TidHttpServer (Delphi XE2) handle urlencoded characters?

I have a TidHttpServer listening on port 8844 with the following code:
procedure TForm1.IdHTTPServer1CommandGet(AContext: TIdContext;
  ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
begin
  // ignore the browser's automatic favicon request
  if ARequestInfo.Document <> '/favicon.ico' then
    Memo1.Text := ARequestInfo.Params.Text;
end;
This is compiled with Delphi XE2. When I browse to
http://localhost:8844/document?Value=%F6 <-- %F6 is the encoded value for ö
...I get the result:
If I compile the application using Delphi 2007 I get the following result
Is this a bug in Indy or something that I have missed?

In XE2, strings are Unicode. When TIdHTTPServer decodes the ARequestInfo.Document in D2009 and later, it requires percent-encoded data to decode into UTF-8 encoded data, which is then decoded into the final Unicode string. There is currently no option to change that (I have submitted a feature request to our issue trackers for it). %F6 on its own does not decode to a valid UTF-8 byte sequence, which is why you end up with '?'. In UTF-8, the 'ö' character is encoded as the two bytes $C3 $B6 and thus percent-encoded as %C3%B6, not %F6.
In D2007, strings are Ansi. When TIdHTTPServer decodes the ARequestInfo.Document in D2007 and earlier, it provides the decoded data as-is, thus %F6 would decode into $F6 and be stored as #246. That value is then interpreted by the RTL using whatever the local machine's default Ansi codepage is, so it would represent the 'ö' character only for Ansi codepages that define it that way (Windows-1252 and ISO-8859-1 do, but ISO-8859-5 does not, for example).
I would suggest you change your server logic to use UTF-8 encoded URLs in both Delphi versions. In D2007, you can use the RTL's UTF8Decode() function to decode a UTF-8 encoded AnsiString into a WideString, which you can then assign to another AnsiString to convert the data into the Ansi value you were originally expecting. In D2009+, that is handled automatically for you.
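For the D2007 side, the UTF8Decode() route could look roughly like the sketch below. Utf8ParamToAnsi is a hypothetical helper name (it is not part of Indy or the RTL), and the sketch assumes the client now sends %C3%B6, so that the parameter arrives as the raw UTF-8 bytes $C3 $B6.

```pascal
// Sketch for Delphi 2007 and earlier, where string = AnsiString.
// Utf8ParamToAnsi is a hypothetical helper, not an Indy or RTL routine.
function Utf8ParamToAnsi(const AUtf8: AnsiString): AnsiString;
var
  W: WideString;
begin
  // UTF8Decode() converts the UTF-8 bytes into UTF-16 text
  W := UTF8Decode(AUtf8);
  // Assigning a WideString to an AnsiString converts it to the machine's
  // default Ansi codepage (characters that codepage lacks are lost)
  Result := W;
end;
```

In the event handler this would be used along the lines of Memo1.Text := Utf8ParamToAnsi(ARequestInfo.Params.Values['Value']); whether the result shows 'ö' still depends on the machine's Ansi codepage being able to represent it.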
On a side note, accessing a UI component directly in the OnCommandGet event is not thread-safe. You have to synchronize with the main thread in order to access the UI safely.
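One possible way to do that synchronization in XE2 is TThread.Synchronize() with an anonymous method, as in this sketch. It assumes the same form and handler as in the question and that System.Classes is in the uses clause; the local variable ParamsText is introduced only for illustration.

```pascal
// Sketch: copy the data in the server thread, touch the UI in the main thread.
// Assumes System.Classes is in the uses clause (for TThread).
procedure TForm1.IdHTTPServer1CommandGet(AContext: TIdContext;
  ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
var
  ParamsText: string;
begin
  if ARequestInfo.Document <> '/favicon.ico' then
  begin
    // Read the request data while still in the server's worker thread
    ParamsText := ARequestInfo.Params.Text;
    // Run the UI update in the main thread and wait until it has finished
    TThread.Synchronize(nil,
      procedure
      begin
        Memo1.Text := ParamsText;
      end);
  end;
end;
```

Synchronize() blocks the worker thread until the main thread has run the update. TThread.Queue() takes the same arguments and returns immediately, which may be preferable when the handler does not need to wait for the UI.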


Receiving Unicode strings with Indy 10

I am using the latest Delphi 10.4.2 with Indy 10.
In a REST server, JSON commands are received and handled. It works fine except for Unicode.
A simple JSON like this:
{"driverNote": "Test"}
is shown correctly
If I now change to Unicode Russian characters:
{"driverNote": "Статья"}
Not sure where I should begin to track this. I expect ARequestInfo.FormParams to have the same value in debugger as s variable.
If I debug Indy itself, FormParams are set in this code:
if LRequestInfo.PostStream <> nil then
// decoding percent-encoded octets and applying the CharSet is handled by
// DecodeAndSetParams() further below...
EnsureEncoding(LEncoding, enc8Bit);
LRequestInfo.FormParams :=
ReadStringFromStream( LRequestInfo.PostStream,
DoneWithPostStream(AContext, LRequestInfo); // don't need the PostStream anymore
It uses enc8Bit. But my string has 16-bit characters.
Is this handled incorrectly in Indy?
The code snippet you quoted from IdCustomHTTPServer.pas is not what is in Indy's GitHub repo.
In the official code, TIdHTTPServer does not decode the PostStream to FormParams unless the ContentType is 'application/x-www-form-urlencoded':
if LRequestInfo.PostStream <> nil then begin
  if TextIsSame(LContentType, ContentTypeFormUrlencoded) then
  begin
    // decoding percent-encoded octets and applying the CharSet is handled by DecodeAndSetParams() further below...
    EnsureEncoding(LEncoding, enc8Bit);
    LRequestInfo.FormParams := ReadStringFromStream(LRequestInfo.PostStream, -1, LEncoding{$IFDEF STRING_IS_ANSI}, LEncoding{$ENDIF});
    DoneWithPostStream(AContext, LRequestInfo); // don't need the PostStream anymore
  end;
end;
That ContentType check was added way back in 2010, so I don't know why it is not present in your version.
In your example, the ContentType is 'application/json', so the raw JSON should be in the PostStream and the FormParams should be blank.
That being said, in your version of Indy, TIdHTTPServer is simply reading the raw bytes from the PostStream and zero-extending each byte to a 16-bit character in the FormParams. To recover the original bytes, simply truncate each Char to an 8-bit Byte. For instance, you can use Indy's ToBytes() function in the IdGlobal unit, specifying enc8Bit/IndyTextEncoding_8Bit as the byte encoding.
JSON is most commonly transmitted as UTF-8 (and that is the case in your example), so when you have access to the raw bytes, in any version, make sure you parse the JSON bytes as UTF-8.
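The byte-recovery step described above can be sketched as a small console program. RecoverUtf8Text is a hypothetical helper name; ToBytes(), IndyTextEncoding_8Bit and IndyTextEncoding_UTF8 come from Indy's IdGlobal unit, and the sketch assumes a Unicode Delphi (D2009+) with a reasonably recent Indy 10.

```pascal
program RecoverUtf8Demo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, IdGlobal;

// Hypothetical helper: undo the byte -> Char zero-extension, then decode as UTF-8.
function RecoverUtf8Text(const AZeroExtended: string): string;
var
  Raw: TIdBytes;
begin
  // Truncate each 16-bit Char back to the original 8-bit byte
  Raw := ToBytes(AZeroExtended, IndyTextEncoding_8Bit);
  // Interpret the recovered bytes as UTF-8
  Result := IndyTextEncoding_UTF8.GetString(Raw);
end;

begin
  // $D0 $A1 is the UTF-8 encoding of U+0421 (Cyrillic capital Es).
  // Four-digit literals are used so the compiler treats them as WideChars
  // rather than converting them through the local Ansi codepage.
  if RecoverUtf8Text(#$00D0#$00A1) = #$0421 then
    Writeln('recovered')
  else
    Writeln('mismatch');
end.
```

In the server this would be applied as something like RecoverUtf8Text(ARequestInfo.FormParams) before handing the text to a JSON parser. With the official Indy code, where the JSON stays in the PostStream, a call such as ReadStringFromStream(ARequestInfo.PostStream, -1, IndyTextEncoding_UTF8) should decode the bytes as UTF-8 directly.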

TIdTCPClient.IOHandler.Write(TStream) cannot send Big5?

Through TCPClient.IOHandler.Write(StmMsg);, the message is delivered to the frontend. English is ok, but Big5 cannot be delivered, why!!??
(StmMsg: TStringStream, the program has added... TCPClient.IOHandler.DefStringEncoding:= IndyTextEncoding_UTF8;)
The following is the code:
if not TCPClient.Connected then
deviceToken :=
MensajeEnviar := edtMensaje.Text;
strMessage := Get_Msg(deviceToken, Get_PayLoad(MensajeEnviar, 1,
StmMsg := TStringStream.Create(strMessage);
StmMsg.Seek(0, soBeginning);
Big5 is not a language. It is a byte encoding used for Chinese.
The TIdIOHandler.DefStringEncoding property applies only to string operations, not to stream operations. The TIdIOHandler.Write(TStream) method writes the content of a stream as-is. So, it is your responsibility to make sure the contents of the stream are encoded properly beforehand.
However, the TStringStream constructor you are calling uses TEncoding.Default for the stream's byte encoding. On Windows [1], TEncoding.Default represents the default ANSI charset of the user that is running your program. An ANSI charset will not work for Chinese text, and will lose data.
[1] On non-Windows platforms, TEncoding.Default uses UTF-8 instead.
You need to use TEncoding.UTF8 instead for the stream's byte encoding, eg:
StmMsg := TStringStream.Create(strMessage, TEncoding.UTF8);
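To see what the encoding choice does to the bytes that end up in the stream, a small console sketch can help. EncodedHex is a hypothetical helper; it calls TEncoding.GetBytes() directly, which is the same conversion a TStringStream constructor applies to its initial string.

```pascal
program EncodingBytesDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: encode S with AEncoding and return the bytes as hex.
function EncodedHex(const S: string; AEncoding: TEncoding): string;
var
  Bytes: TBytes;
  I: Integer;
begin
  Bytes := AEncoding.GetBytes(S);
  Result := '';
  for I := 0 to High(Bytes) do
    Result := Result + IntToHex(Bytes[I], 2);
end;

begin
  // U+4E2D encoded as UTF-8 is the three bytes E4 B8 AD
  Writeln(EncodedHex(#$4E2D, TEncoding.UTF8));
  // The result here depends on the ANSI charset of the user running the program
  Writeln(EncodedHex(#$4E2D, TEncoding.Default));
end.
```

If the second line does not match the first, a stream created with the default encoding is not carrying UTF-8, whatever DefStringEncoding is set to.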
Alternatively, you can remove the stream altogether and just use the TIdIOHandler.Write(String) method instead, which will then use the TIdIOHandler.DefStringEncoding property, eg:
TCPClient.IOHandler.Write(strMessage);

HttpGetText(), autodetect charset, and convert source to UTF8

I'm using HttpGetText with Synapse for Delphi 7 Professional to get the source of a web page - but feel free to recommend any component and code.
The goal is to save some time by 'unifying' non-ASCII characters to a single charset, so I can process it with the same Delphi code.
So I'm looking for something similar to "Select All and Convert To UTF without BOM in Notepad++", if you know what I mean. ANSI instead of UTF8 would also be okay.
Webpages are encoded in 3 charsets: UTF8, "ISO-8859-1=Win 1252=ANSI" and straight up the alley HTML4 without charset spec, ie. htmlencoded &Aring; type characters in the content.
If I need to code a PHP page that does the conversion, that's fine too. Whatever is the least code / time.
When you retrieve a webpage, its Content-Type header (or sometimes a <meta> tag inside the HTML itself) tells you which charset is being used for the data. You would decode the data to Unicode using that charset, then you can encode the Unicode to whatever you need for your processing.
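The first step, finding the charset name in a Content-Type value, can be sketched with plain RTL string routines that also exist in Delphi 7. ExtractCharset is a hypothetical helper, not a Synapse or Indy function, and it only handles the header form; a <meta> tag would need separate parsing.

```pascal
program CharsetDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: return the charset parameter of a Content-Type value,
// or an empty string when none is present.
function ExtractCharset(const AContentType: string): string;
var
  P: Integer;
begin
  Result := '';
  // Case-insensitive search; LowerCase() keeps character positions unchanged
  P := Pos('charset=', LowerCase(AContentType));
  if P = 0 then
    Exit;
  Result := Copy(AContentType, P + Length('charset='), MaxInt);
  // Cut off any parameters that follow
  P := Pos(';', Result);
  if P > 0 then
    Result := Copy(Result, 1, P - 1);
  Result := Trim(Result);
  // Strip optional surrounding double quotes
  if (Length(Result) >= 2) and (Result[1] = '"') and
     (Result[Length(Result)] = '"') then
    Result := Copy(Result, 2, Length(Result) - 2);
end;

begin
  Writeln(ExtractCharset('text/html; charset=UTF-8')); // UTF-8
end.
```

Once the charset is known, the bytes can be decoded accordingly, for example with UTF8Decode() when it is UTF-8, and then re-encoded to the single charset used for processing.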
I instead did the reverse conversion directly after retrieving the HTML using GpTextStream. Making the documents conform to ISO-8859-1 made them processable using straight up Delphi, which saved quite a bit of code changes. On output all the data was converted to UTF-8 :)
Here's some code. Perhaps not the prettiest solution but it certainly got the job done in less time. Note that this is for the reverse conversion.
procedure UTF8FileTo88591(fileName: string);
const bufsize=1024*1024;
var
  fs1,fs2: TFileStream;
  ts1,ts2: TGpTextStream;
  buf: pointer;  // assumed declaration, implied by FreeMem(buf) below
  siz: integer;  // assumed declaration, implied by inttostr(siz) below

  procedure LG2(ss:string);
  begin
    //dont log for now.
  end;

begin
  fs1 := TFileStream.Create(fileName,fmOpenRead);
  fs2 := TFileStream.Create(fileName+'_ISO88591.txt',fmCreate);
  //compatible enough for my purposes with default 'Windows/Notepad' CP 1252 ANSI and Swe ANSI codepage, Latin1 etc.
  //also works for ASCII sources with htmlencoded accent chars, naturally
  LG2('Files opened OK.');
  GetMem(buf,bufsize);  // assumed: the allocation is missing from the posted code
  try
    ts1 := TGpTextStream.Create(fs1,tsaccRead,[],CP_UTF8);
    ts2 := TGpTextStream.Create(fs2,tsaccWrite,[],ISO_8859_1);
    siz := ts1.Read(buf^,bufsize);  // assumed: the read call is missing from the posted code
    LG2(inttostr(siz)+' bytes read.');
    if siz>0 then ts2.Write(buf^,siz);
    LG2('Bytes read and written OK.');
  finally FreeAndNil(fs1);FreeAndNil(fs2);FreeMem(buf);
  end;
  LG2('Everything freed OK.');
end; // UTF8FileTo88591
