I have a TidHttpServer listening to port 8844 with the following code:
procedure TForm1.IdHTTPServer1CommandGet(AContext: TIdContext;
ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
begin
if ARequestInfo.Document <> '/favicon.ico' then
begin
Memo1.Text := ARequestInfo.Params.Text;
end;
end;
This is compiled with Delphi XE2. When I browse to
http://localhost:8844/document?Value=%F6 <-- %F6 is the encoded value for ö
...I get the result:
value=?
If i compile the application using Delphi 2007 I get the following result
value=ö
Is this a bug in Indy of something that I have missed?
In XE2, strings are Unicode. When TIdHTTPServer decodes the ARequestInfo.Document in D2009 and later, it requires percent-encoded data to decode into UTF-8 encoded data, which is then decoded into the final Unicode string. There is currently no option to change that (I have submitted a feature request to our issue trackers for it). %F6 does not represent a valid UTF-8 octet, which is why you end up with '?'. In UTF-8, the 'ö' character would be UTF-8 encoded as $C3 $B6 and thus percent-encoded as %C3%B6, not %F6.
In D2007, strings are Ansi. When TIdHTTPServer decodes the ARequestInfo.Document in D2007 and earlier, it provides the decoded data as-is, thus %F6 would decode into $F6 and be stored as #246. That value is then interpretted by the RTL using whatever the local machine's default Ansi codepage is, so it would represent the 'ö' character only for Ansi codepages that define it that way (Windows-1252 and ISO-8859-1 do, but ISO-8859-5 does not, for example).
I would suggest you change your server logic to use UTF-8 encoded URLs in both Delphi versions. In D2007, you can use the RTL's UTF8Decode() function to decode a UTF-8 encoded AnsiString into a WideString, which you can then assign to another AnsiString to convert the data into the Ansi value you were originally expecting. In D009+, that is handled automatically for you.
On a side note, accessing a UI component directly in the OnCommandGet event is not thread-safe. ou have to synchronize with the main thread in order to access the UI safely.
Related
I am using the latest Delphi 10.4.2 with Indy 10.
In a REST server, JSON commands are received and handled. It works fine except for Unicode.
A simple JSON like this:
{"driverNote": "Test"}
is shown correctly
If I now change to Unicode Russian characters:
{"driverNote": "Статья"}
Not sure where I should begin to track this. I expect ARequestInfo.FormParams to have the same value in debugger as s variable.
If I debug Indy itself, FormParams are set in this code:
if LRequestInfo.PostStream <> nil then
begin
// decoding percent-encoded octets and applying the CharSet is handled by
// DecodeAndSetParams() further below...
EnsureEncoding(LEncoding, enc8Bit);
LRequestInfo.FormParams :=
ReadStringFromStream( LRequestInfo.PostStream,
-1,
LEncoding
{$IFDEF STRING_IS_ANSI}, LEncoding{$ENDIF});
DoneWithPostStream(AContext, LRequestInfo); // don't need the PostStream anymore
end;
It use enc8Bit. But my string has 16-bits characters.
Is this handled incorrect in Indy?
The code snippet you quoted from IdCustomHTTPServer.pas is not what is in Indy's GitHub repo.
In the official code, TIdHTTPServer does not decode the PostStream to FormParams unless the ContentType is 'application/x-www-form-urlencoded':
if LRequestInfo.PostStream <> nil then begin
if TextIsSame(LContentType, ContentTypeFormUrlencoded) then
begin
// decoding percent-encoded octets and applying the CharSet is handled by DecodeAndSetParams() further below...
EnsureEncoding(LEncoding, enc8Bit);
LRequestInfo.FormParams := ReadStringFromStream(LRequestInfo.PostStream, -1, LEncoding{$IFDEF STRING_IS_ANSI}, LEncoding{$ENDIF});
DoneWithPostStream(AContext, LRequestInfo); // don't need the PostStream anymore
end;
end;
That ContentType check was added way back in 2010, so I don't know why it is not present in your version.
In your example, the ContentType is 'application/json', so the raw JSON should be in the PostStream and the FormParams should be blank.
That being said, in your version of Indy, TIdHTTPServer is simply reading the raw bytes from the PostStream and zero-extending each byte to a 16-bit character in the FormParams. To recover the original bytes, simply truncate each Char to an 8-bit Byte. For instance, you can use Indy's ToBytes() function in the IdGlobal unit, specifying enc8Bit/IndyTextEncoding_8Bit as the byte encoding.
JSON is most commonly transmitted as UTF-8 (and that is the case in your example), so when you have access to the raw bytes, in any version, make sure you parse the JSON bytes as UTF-8.
Through TCPClient.IOHandler.Write(StmMsg);, the message is delivered to the frontend. English is ok, but Big5 cannot be delivered, why!!??
(StmMsg: TStringStream, the program has added... TCPClient.IOHandler.DefStringEncoding:= IndyTextEncoding_UTF8;)
The following is the code:
if not TCPClient.Connected then
TCPClient.Connect;
deviceToken :=
'6aa5bfcfe731ab29b260fab38a43f1e1abac0de3d6e8e0bc5f4b89c422938e8f';
MensajeEnviar := edtMensaje.Text;
strMessage := Get_Msg(deviceToken, Get_PayLoad(MensajeEnviar, 1,
'default'));
StmMsg := TStringStream.Create(strMessage);
StmMsg.Seek(0, soBeginning);
TCPClient.IOHandler.Write(StmMsg);
Big5 is not a language. It is a byte encoding used for Chinese.
The TIdIOHandler.DefStringEncoding property applies only to string operations, not to stream operations. The TIdIOHandler.Write(TStream) method writes the content of a stream as-is. So, it is your responsibility to make sure the contents of the stream are encoded properly beforehand.
However, the TStringStream constructor you are calling uses TEncoding.Default for the stream's byte encoding. On Windows1, TEncoding.Default represents the default ANSI charset of the user that is running your program. An ANSI charset will not work for Chinese text, and will lose data.
1: on non-Windows platforms, TEncoding.Default uses UTF-8 instead.
You need to use TEncoding.UTF8 instead for the stream's byte encoding, eg:
StmMsg := TStringStream.Create(strMessage, TEncoding.UTF8);
Alternatively, you can remove the stream altogether and just use the TIdIOHandler.Write(String) method instead, which will then use the TIdIOHandler.DefStringEncoding property, eg:
TCPClient.IOHandler.Write(strMessage);
This line:
TFileStream.Create(fileName, fmOpenRead or fmShareDenyNone);
drops an exception if the filename contain something like ñ
You are, ultimately calling CreateFileA, the ANSI API, and the characters you use have no ANSI encoding. The only way to get beyond this is to open the file with CreateFileW, the Unicode API.
You might not realise that you call CreateFileA, but that's how the Delphi 7 file stream is implemented.
One easy way to solve your problems is to upgrade to the latest Delphi which has good support for the native Windows Unicode API.
If you are stuck with ANSI Delphi then you still need to call CreateFileW. You can do this to create a file handle. You'll need to pass a UTF-16 string to that API. Use WideString to store it. You'll also need to get the filename from the user in UTF-16 form. Which means a call to GetOpenFileNameW or IFileDialog. Create a stream by passing the file handle to THandleStream.
To make all this possible you would use the TNT Unicode libraries. They work well but will impose a big port on you.
Frankly, the right way forward is to use modern tools that support Unicode.
You can use the TntUnicode units to have UTF8 support under Delphi 7.
Add TntClasses to your Uses and make the call like this:
TTntFileStream.Create(fileName, fmOpenRead or fmShareDenyNone);
Make sure that fileName is widestring.
Here you can get a copy of TntUnicode:
https://github.com/rofl0r/TntUnicode
UTF16 can be thought of as a codepage, just like all of the possible ANSI codepages.
As Remy mentions in his comment, assuming your ANSI codepage supports the required characters in your Unicode string you simply have to convert that Unicode version of that string to the equivalent ANSI codepage version.
The Delphi compiler can take care of a simple conversion for you automatically, which you use simply by casting a WIDEString (UTF16) to an (ANSI)String:
const
WIDE_FILENAME : WIDEString = 'fuññy.txt';
var
sFilename: String;
strm: TFileStream;
begin
sFilename := String(WIDE_FILENAME);
strm := TFileStream.Create(sFilename, fmOpenRead);
// etc
end;
This works perfectly well even on (e.g.) Delphi 7. The only caveat is that the codepage involved (the system default) must support the extended characters in the Unicode string.
NOTE: The above code uses the String type rather than ANSIString explicitly. On Delphi versions where String is ANSIString, this has the required effect but also is portable to versions where String is UnicodeString (should you upgrade your version later).
If you use ANSIString explicitly in this case, the result will be a double conversion if/when you upgrade:
// Unicode compiler using ANSIString type....
var
sFilename: ANSIString;
begin
sFilename := ANSIString(WIDE_FILENAME); // Codepage conversion from UTF16 to ANSI
strm := TFileStream.Create(sFilename, fmOpenRead); // Will implicitly convert *back* from ANSI to WIDE
versus
// Unicode compiler using String type....
var
sFilename: String;
begin
sFilename := String(WIDE_FILENAME); // String type conversion from WideString to UnicodeString
strm := TFileStream.Create(sFilename, fmOpenRead); // No further conversion necessary
Best solution is to go Unicode, but if that is not an option, you can still solve the problem.
In Windows you can set what codepage to use for non-Unicode programs. Just change it to support the correct language (Spanish?). Then the code should work.
Windows 7: Control Panel > Region and Language > Administrative > Language for non-Unicode programs
Windows XP: Control Panel > Regional and Language > Advanced > Language for non-Unicode programs
This line:
TFileStream.Create(fileName, fmOpenRead or fmShareDenyNone);
drops an exception if the filename contain something like ñ
You are, ultimately calling CreateFileA, the ANSI API, and the characters you use have no ANSI encoding. The only way to get beyond this is to open the file with CreateFileW, the Unicode API.
You might not realise that you call CreateFileA, but that's how the Delphi 7 file stream is implemented.
One easy way to solve your problems is to upgrade to the latest Delphi which has good support for the native Windows Unicode API.
If you are stuck with ANSI Delphi then you still need to call CreateFileW. You can do this to create a file handle. You'll need to pass a UTF-16 string to that API. Use WideString to store it. You'll also need to get the filename from the user in UTF-16 form. Which means a call to GetOpenFileNameW or IFileDialog. Create a stream by passing the file handle to THandleStream.
To make all this possible you would use the TNT Unicode libraries. They work well but will impose a big port on you.
Frankly, the right way forward is to use modern tools that support Unicode.
You can use the TntUnicode units to have UTF8 support under Delphi 7.
Add TntClasses to your Uses and make the call like this:
TTntFileStream.Create(fileName, fmOpenRead or fmShareDenyNone);
Make sure that fileName is widestring.
Here you can get a copy of TntUnicode:
https://github.com/rofl0r/TntUnicode
UTF16 can be thought of as a codepage, just like all of the possible ANSI codepages.
As Remy mentions in his comment, assuming your ANSI codepage supports the required characters in your Unicode string you simply have to convert that Unicode version of that string to the equivalent ANSI codepage version.
The Delphi compiler can take care of a simple conversion for you automatically, which you use simply by casting a WIDEString (UTF16) to an (ANSI)String:
const
WIDE_FILENAME : WIDEString = 'fuññy.txt';
var
sFilename: String;
strm: TFileStream;
begin
sFilename := String(WIDE_FILENAME);
strm := TFileStream.Create(sFilename, fmOpenRead);
// etc
end;
This works perfectly well even on (e.g.) Delphi 7. The only caveat is that the codepage involved (the system default) must support the extended characters in the Unicode string.
NOTE: The above code uses the String type rather than ANSIString explicitly. On Delphi versions where String is ANSIString, this has the required effect but also is portable to versions where String is UnicodeString (should you upgrade your version later).
If you use ANSIString explicitly in this case, the result will be a double conversion if/when you upgrade:
// Unicode compiler using ANSIString type....
var
sFilename: ANSIString;
begin
sFilename := ANSIString(WIDE_FILENAME); // Codepage conversion from UTF16 to ANSI
strm := TFileStream.Create(sFilename, fmOpenRead); // Will implicitly convert *back* from ANSI to WIDE
versus
// Unicode compiler using String type....
var
sFilename: String;
begin
sFilename := String(WIDE_FILENAME); // String type conversion from WideString to UnicodeString
strm := TFileStream.Create(sFilename, fmOpenRead); // No further conversion necessary
Best solution is to go Unicode, but if that is not an option, you can still solve the problem.
In Windows you can set what codepage to use for non-Unicode programs. Just change it to support the correct language (Spanish?). Then the code should work.
Windows 7: Control Panel > Region and Language > Administrative > Language for non-Unicode programs
Windows XP: Control Panel > Regional and Language > Advanced > Language for non-Unicode programs
I'm using HttpGetText with Synapse for Delphi 7 Professional to get the source of a web page - but feel free to recommend any component and code.
The goal is to save some time by 'unifying' non-ASCII characters to a single charset, so I can process it with the same Delphi code.
So I'm looking for something similar to "Select All and Convert To UTF without BOM in Notepad++", if you know what I mean. ANSI instead of UTF8 would also be okay.
Webpages are encoded in 3 charsets: UTF8, "ISO-8859-1=Win 1252=ANSI" and straight up the alley HTML4 without charset spec, ie. htmlencoded Å type characters in the content.
If I need to code a PHP page that does the conversion, that's fine too. Whatever is the least code / time.
When you retreive a webpage, its Content-Type header (or sometimes a <meta> tag inside the HTML itself) tells you which charset is being used for the data. You would decode the data to Unicode using that charset, then you can encode the Unicode to whatever you need for your processing.
I instead did the reverse conversion directly after retrieving the HTML using GpTextStream. Making the documents conform to ISO-8859-1 made them processable using straight up Delphi, which saved quite a bit of code changes. On output all the data was converted to UTF-8 :)
Here's some code. Perhaps not the prettiest solution but it certainly got the job done in less time. Note that this is for the reverse conversion.
procedure UTF8FileTo88591(fileName: string);
const bufsize=1024*1024;
var
fs1,fs2: TFileStream;
ts1,ts2: TGpTextStream;
buf:PChar;
siz:integer;
procedure LG2(ss:string);
begin
//dont log for now.
end;
begin
fs1 := TFileStream.Create(fileName,fmOpenRead);
fs2 := TFileStream.Create(fileName+'_ISO88591.txt',fmCreate);
//compatible enough for my purposes with default 'Windows/Notepad' CP 1252 ANSI and Swe ANSI codepage, Latin1 etc.
//also works for ASCII sources with htmlencoded accent chars, naturally
try
LG2('Files opened OK.');
GetMem(buf,bufsize);
ts1 := TGpTextStream.Create(fs1,tsaccRead,[],CP_UTF8);
ts2 := TGpTextStream.Create(fs2,tsaccWrite,[],ISO_8859_1);
try
siz:=ts1.Read(buf^,bufsize);
LG2(inttostr(siz)+' bytes read.');
if siz>0 then ts2.Write(buf^,siz);
finally
LG2('Bytes read and written OK.');
FreeAndNil(ts1);FreeAndNil(ts2);end;
finally FreeAndNil(fs1);FreeAndNil(fs2);FreeMem(buf);
LG2('Everything freed OK.');
end;
end; // UTF8FileTo88591