I am trying to implement a POST to a web service. I need to send a file whose type is variable (.docx, .pdf, .txt) along with a JSON formatted string.
I have managed to post files successfully with code similar to the following:
procedure DoRequest;
var
Http: TIdHTTP;
Params: TIdMultipartFormDataStream;
RequestStream, ResponseStream: TStringStream;
JRequest, JResponse: TJSONObject;
url: string;
begin
url := 'some_custom_service';
JRequest := TJSONObject.Create;
JResponse := nil; // assigned from the parsed response below; creating an object here would leak it
try
JRequest.AddPair('Pair1', 'Value1');
JRequest.AddPair('Pair2', 'Value2');
JRequest.AddPair('Pair3', 'Value3');
Http := TIdHTTP.Create(nil);
ResponseStream := TStringStream.Create;
RequestStream := TStringStream.Create(UTF8Encode(JRequest.ToString));
try
Params := TIdMultipartFormDataStream.Create;
Params.AddFile('File', ceFileName.Text, '').ContentTransfer := '';
Params.AddFormField('Json', 'application/json', '', RequestStream);
Http.Post(url, Params, ResponseStream);
JResponse := TJSONObject.ParseJSONValue(ResponseStream.DataString) as TJSONObject;
finally
RequestStream.Free;
ResponseStream.Free;
Params.Free;
Http.Free;
end;
finally
JRequest.Free;
JResponse.Free;
end;
end;
The problem appears when I try to send a file that contains Greek characters and spaces in the filename. Sometimes it fails and sometimes it succeeds.
After a lot of research, I noticed that the POST header is encoded by Indy's TIdFormDataField class using the EncodeHeader() function. When the post fails, the encoded filename in the header is split, whereas in the successful post it is not split.
For example:
Επιστολή εκπαιδευτικο.docx is encoded as =?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66zr8uZG9j?='#$D#$A' =?UTF-8?B?eA==?=, which fails.
Επιστολή εκπαιδευτικ.docx is encoded as
=?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66LmRvY3g=?=, which succeeds.
Επιστολή εκπαιδευτικ .docx is encoded as
=?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66?= .docx, which fails.
I have tried to change the encoding of the filename, the AContentType of the AddFile() procedure, and the ContentTransfer, but none of those change the behavior, and I still get errors when the encoded filename is split.
Is this some kind of bug, or am I missing something?
My code works for every case except those I described above.
I am using Delphi XE3 with Indy10.
EncodeHeader() does have some known issues with Unicode strings:
EncodeHeader() needs to take codeunits into account when splitting data between adjacent encoded-words
Basically, a MIME encoded-word cannot be more than 75 characters in length, so long text gets split up. But when encoding a long Unicode string, any given Unicode character may be charset-encoded using 1 or more bytes, and EncodeHeader() does not yet avoid splitting the bytes of a single multi-byte character across two separate encoded-words (which is explicitly forbidden by RFC 2047 of the MIME spec).
However, that is not what is happening in your examples.
In your first example, 'Επιστολή εκπαιδευτικο.docx' is too long to be encoded as a single MIME word, so it gets split into the substrings 'Επιστολή εκπαιδευτικο.doc' and 'x', which are then encoded separately. This is legal in MIME for long text (though you might have expected Indy to split the text into 'Επιστολή' ' εκπαιδευτικο.docx' instead, or even 'Επιστολή' ' εκπαιδευτικο' '.docx'; that might be a possibility in a future release). Adjacent MIME words that are separated only by whitespace are meant to be concatenated without the separating whitespace when decoded, thus producing 'Επιστολή εκπαιδευτικο.docx' again. If the server is not doing that, it has a flaw in its decoder (maybe it is decoding as 'Επιστολή εκπαιδευτικο.doc x' instead?).
In your second example, 'Επιστολή εκπαιδευτικ.docx' is short enough to be encoded as a single MIME word.
In your third example, 'Επιστολή εκπαιδευτικ .docx' gets split on the second whitespace (not the first) into 'Επιστολή εκπαιδευτικ' ' .docx' substrings, and only the first substring needs to be encoded. This is legal in MIME. When decoded, the decoded text is meant to be concatenated with the following unencoded text, preserving whitespace between them, thus producing 'Επιστολή εκπαιδευτικ .docx' again. If the server is not doing that, it has a flaw in its decoder (maybe it is decoding as 'Επιστολή εκπαιδευτικ.docx' instead?).
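The length arithmetic behind the first two cases can be checked outside Indy. The following is a minimal sketch in Python (used here only as a calculator, not as a model of Indy's actual splitting code); the helper name encoded_word_length is made up for this illustration. It assumes a base64 ('B') encoded-word with the UTF-8 charset, as in the examples above.

```python
import math

MAX_ENCODED_WORD = 75  # limit for one encoded-word, as described above


def encoded_word_length(text, charset="UTF-8"):
    """Length of text if it were emitted as ONE base64 encoded-word.

    Hypothetical helper for illustration; it only does the arithmetic.
    """
    payload_bytes = len(text.encode(charset))
    base64_chars = 4 * math.ceil(payload_bytes / 3)  # base64: 3 bytes -> 4 chars
    overhead = len("=?%s?B??=" % charset)            # "=?UTF-8?B?" plus "?="
    return overhead + base64_chars


first = encoded_word_length("Επιστολή εκπαιδευτικο.docx")
second = encoded_word_length("Επιστολή εκπαιδευτικ.docx")
print(first, first <= MAX_ENCODED_WORD)    # 76 False
print(second, second <= MAX_ENCODED_WORD)  # 72 True
```

With 12 characters of overhead, at most 60 base64 characters (45 bytes) fit into one word. The first filename is 46 bytes in UTF-8, so exactly one byte, the trailing 'x', spills into a second encoded-word.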
If you run these example filenames through Indy's MIME header encoder/decoder, they do decode properly:
var
s: String;
begin
s := EncodeHeader('Επιστολή εκπαιδευτικο.docx', '', 'B', 'UTF-8');
ShowMessage(s); // '=?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66zr8uZG9j?='#13#10' =?UTF-8?B?eA==?='
s := DecodeHeader(s);
ShowMessage(s); // 'Επιστολή εκπαιδευτικο.docx'
s := EncodeHeader('Επιστολή εκπαιδευτικ.docx', '', 'B', 'UTF-8');
ShowMessage(s); // '=?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66LmRvY3g=?='
s := DecodeHeader(s);
ShowMessage(s); // 'Επιστολή εκπαιδευτικ.docx'
s := EncodeHeader('Επιστολή εκπαιδευτικ .docx', '', 'B', 'UTF-8');
ShowMessage(s); // '=?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66?= .docx'
s := DecodeHeader(s);
ShowMessage(s); // 'Επιστολή εκπαιδευτικ .docx'
end;
So the problem seems to be on the server side decoding, not on Indy's client side encoding.
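As a cross-check with a decoder that is not Indy, here is a small sketch using Python's standard email.header module. The helper names encoded_word and decode are made up for this illustration, and the headers are rebuilt from the substrings described above rather than captured from a real request, so treat it as a demonstration of the RFC 2047 concatenation rules, not as a statement about what your server does.

```python
import base64
from email.header import decode_header, make_header


def encoded_word(text):
    """Build one base64 encoded-word (hypothetical helper for this sketch)."""
    b64 = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return "=?UTF-8?B?" + b64 + "?="


def decode(value):
    """Decode a header value that may mix encoded-words and plain text."""
    return str(make_header(decode_header(value)))


# Case 1: two adjacent encoded-words separated only by folding whitespace.
split_header = encoded_word("Επιστολή εκπαιδευτικο.doc") + "\r\n " + encoded_word("x")

# Case 3: one encoded-word followed by unencoded text.
mixed_header = encoded_word("Επιστολή εκπαιδευτικ") + " .docx"

print(decode(split_header))  # whitespace between the two encoded-words is dropped
print(decode(mixed_header))  # whitespace before the plain text is kept
```

A decoder that follows these two rules reproduces both original filenames. A server that only handles a single encoded-word would get both cases wrong.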
That being said, if you are using a fairly recent version of Indy 10 (Nov 2011 or later), TIdFormDataField has a HeaderEncoding property, which defaults to 'B' (base64) in Unicode environments. However, the splitting logic affects 'Q' (quoted-printable) as well, so switching to it may or may not work for you (but you can try it):
with Params.AddFile('File', ceFileName.Text, '') do
begin
ContentTransfer := '';
HeaderEncoding := 'Q'; // <--- here
HeaderCharSet := 'utf-8';
end;
Otherwise, a workaround might be to change the value to '8' (8-bit) instead, which effectively disables MIME encoding (but not charset encoding):
with Params.AddFile('File', ceFileName.Text, '') do
begin
ContentTransfer := '';
HeaderEncoding := '8'; // <--- here
HeaderCharSet := 'utf-8';
end;
Just note that if the server is not expecting raw UTF-8 bytes for the filename, you might still run into problems (for example, the UTF-8 bytes of 'Επιστολή εκπαιδευτικο.docx' being interpreted in a single-byte charset and showing up as garbled characters).
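To see what that kind of misinterpretation looks like, here is a minimal Python sketch. It assumes, purely for illustration, that the receiving side decodes the bytes as Latin-1; which charset a given server actually falls back to is not known from the question.

```python
name = "Επιστολή εκπαιδευτικο.docx"

raw = name.encode("utf-8")       # the bytes sent with HeaderEncoding = '8'
misread = raw.decode("latin-1")  # assumed wrong charset on the receiving side

print(len(name), len(raw), len(misread))
print(misread)  # every Greek letter shows up as two unrelated characters

# No bytes are lost, so the damage is reversible once the right charset is known.
repaired = misread.encode("latin-1").decode("utf-8")
print(repaired == name)
```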
I need to send a big Base64 string from Windows to mobile devices (iOS and Android) over TCP.
I have no problem sending and receiving, but the strings are too big, about 24000 characters, so I am looking for a method to compress and decompress them.
From what I have read, the best way is to use ZLib, and I found the question "Delphi XE and ZLib Problems (II)", which explains how to do it.
The functions work with normal text strings, but compressing Base64 strings makes them bigger.
An example of a very small string that I would send is this:
cEJNYkpCSThLVEh6QjNFWC9wSGhXQ3lHWUlBcGNURS83TFdDNVUwUURxRnJvZlRVUWd4WEFWcFJBNUZSSE9JRXlsaWgzcEJvTGo5anQwTlEyd1pBTEtVQVlPbXdkKzJ6N3J5ZUd4SmU2bDNBWjFEd3lVZmZTR1FwNXRqWTVFOFd2SHRwakhDOU9JUEZRM00wMWhnU0p3MWxxNFRVdmdEU2pwekhwV2thS0JFNG9WYXRDUHhTdnp4blU5Vis2ZzJQYnRIdllubzhKSFhZeUlpckNtTGtUZHVHOTFncHVUWC9FSTdOK3JEUDBOVzlaTngrcEdxcXhpRWJ1ZXNUMmdxOXpJa0ZEak1ORHBFenFVSTlCdytHTy==
I don't know if it is possible to compress this type of string. I need help.
The functions that I use are these:
uses
SysUtils, Classes, ZLib, EncdDecd;
function CompressAndEncodeString(const Str: string): string;
var
Utf8Stream: TStringStream;
Compressed: TMemoryStream;
Base64Stream: TStringStream;
begin
Utf8Stream := TStringStream.Create(Str, TEncoding.UTF8);
try
Compressed := TMemoryStream.Create;
try
ZCompressStream(Utf8Stream, Compressed);
Compressed.Position := 0;
Base64Stream := TStringStream.Create('', TEncoding.ASCII);
try
EncodeStream(Compressed, Base64Stream);
Result := Base64Stream.DataString;
finally
Base64Stream.Free;
end;
finally
Compressed.Free;
end;
finally
Utf8Stream.Free;
end;
end;
function DecodeAndDecompressString(const Str: string): string;
var
Utf8Stream: TStringStream;
Compressed: TMemoryStream;
Base64Stream: TStringStream;
begin
Base64Stream := TStringStream.Create(Str, TEncoding.ASCII);
try
Compressed := TMemoryStream.Create;
try
DecodeStream(Base64Stream, Compressed);
Compressed.Position := 0;
Utf8Stream := TStringStream.Create('', TEncoding.UTF8);
try
ZDecompressStream(Compressed, Utf8Stream);
Result := Utf8Stream.DataString;
finally
Utf8Stream.Free;
end;
finally
Compressed.Free;
end;
finally
Base64Stream.Free;
end;
end;
As I understand the question, you have done the following:
1. Encoded a string as UTF-8 bytes.
2. Compressed those bytes using zlib.
3. Base64 encoded the compressed bytes.
You then attempt to compress the output of step 3 and find that the result is no smaller. That is to be expected. You have already compressed the data, and further attempts to compress it cannot be expected to reduce the size significantly, especially not if you have base64 encoded in the meantime. If you could repeatedly compress data and have it get smaller each time, then eventually there would be nothing left. That is obviously not possible.
I think you are already doing a good job. You convert to UTF-8, which for most text is the most space-efficient of the Unicode encodings. If you worked with Chinese text then you'd be better off with UTF-16. You then compress the UTF-8, which is also reasonable. And finally, for transmission, you encode with base64, also reasonable.
The most obvious way for you to reduce the size of data to be transmitted is for you to omit the base64 step. If you can transmit the compressed bytes that are produced in step 2 then you will be transmitting less. Base64 uses 4 bytes to encode 3 bytes so the size of base64 encoded data is a third larger than the input data.
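The size effect can be reproduced with a short experiment. This is a sketch in Python rather than Delphi, and the 18000 pseudo-random bytes are an assumption standing in for data that is already compressed (or otherwise has no redundancy left), chosen so that the base64 text comes out at the 24000 characters mentioned in the question.

```python
import base64
import random
import zlib

random.seed(0)  # fixed seed so the run is repeatable
payload = bytes(random.getrandbits(8) for _ in range(18000))  # stand-in data

b64 = base64.b64encode(payload)             # what is sent today
recompressed = zlib.compress(b64, 9)        # zlib applied to the base64 text
re_encoded = base64.b64encode(recompressed) # base64 applied once more

print("raw bytes:           ", len(payload))
print("base64 text:         ", len(b64))  # 24000
print("zlib(base64 text):   ", len(recompressed))
print("base64(zlib(base64)):", len(re_encoded))
```

zlib can only win back the base64 overhead, never more than that, so the recompressed data is no smaller than the raw bytes. Encoding that result as base64 again adds the one-third overhead back, which is why the round trip ends up bigger than the string you started with.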
Another way could be to use a better compression algorithm than zlib, but again there are limits to what can be achieved. And usually better compression is achieved at the cost of increased computational time.
Delphi XE3, Indy 10.5.9.0
I am creating an interface between a computer and an instrument. The instrument uses ASTM protocol.
I have successfully sent text based messages back and forth between the server and client. I have been able to send control characters to the server and read those. What I have not figured out after 3 days of searching is how to write and read messages that have a mixture of control characters and text.
I am sending ASTM protocol messages which require control characters and text, like the following line. Everything in angle brackets is a control character. Writing the message is not where I run into problems; reading is, since I will receive both text and control characters. My code below is how I read the control characters. When I receive a character, how can I tell whether it is a control character or text, given that both arrive in the same string? Thanks to Remy Lebeau and his posts on this site for getting me where I am. He talked about how to use buffers, but I couldn't tell how to read a buffer that contained both control characters and text characters.
<STX>3O|1|G-13-00017||^^^HPV|R||||||N||||||||||||||O<CR><ETX>D3<CR><LF>
I have added the following code to my server component's OnConnect event, which is supposed to allow me to send control characters...
...
AContext.Connection.IOHandler.DefStringEncoding := TIdTextEncoding.UTF8;
...
My server OnExecute event...
procedure TTasksForm.IdTCPServer1Execute(AContext: TIdContext);
var
lastline : WideString;
lastcmd : WideString ;
lastbyte : Byte ;
begin
ServerTrafficMemo.Lines.Add('OnExecute') ;
lastline := '' ;
lastcmd := '' ;
lastbyte := (AContext.Connection.IOHandler.ReadByte) ;
if lastbyte = Byte(5) then
begin
lastcmd := '<ENQ>' ;
ServerTrafficMemo.Lines.Add(lastcmd) ;
AContext.Connection.IOHandler.WriteLn(lastcmd + ' received') ;
end;
end;
The only control characters present are STX, ETX, CR, and LF, and they are all < 32, so ASCII and UTF-8 will both handle them just fine. Or, you can use Indy's own built-in 8bit encoding instead.
For this type of data, there are several different ways to read it with Indy. Since the bulk of the data is textual, and the control characters are just used as frame delimiters, the easiest way would be to use IOHandler.ReadLn() or IOHandler.WaitFor() with explicit terminators.
Of course, there are other options as well, such as reading bytes from the IOHandler.InputBuffer directly (which I think is overkill in this situation), using the InputBuffer.IndexOf() method to know how many bytes to read.
Also, TIdTCPServer is a multithreaded component, where its events are fired in worker threads, but your code is directly accessing the UI, which is not thread-safe. You MUST synchronize with the UI thread.
And you shouldn't be using WideString, either. Use (Unicode)String instead.
Try something like this:
procedure TTasksForm.IdTCPServer1Connect(AContext: TIdContext);
begin
  AContext.Connection.IOHandler.DefStringEncoding := Indy8BitEncoding;
end;
procedure TTasksForm.IdTCPServer1Execute(AContext: TIdContext);
var
lastline : string;
lastcmd : string ;
lastbyte : Byte ;
begin
TThread.Synchronize(nil,
procedure
begin
ServerTrafficMemo.Lines.Add('OnExecute') ;
end
);
lastbyte := (AContext.Connection.IOHandler.ReadByte);
if lastbyte = $5 then
begin
lastcmd := '<ENQ>' ;
TThread.Synchronize(nil,
procedure
begin
ServerTrafficMemo.Lines.Add(lastcmd) ;
end
);
end
else if lastbyte = $2 then
begin
lastline := #2 + AContext.Connection.IOHandler.ReadLn(#3) + #3;
lastline := lastline + AContext.Connection.IOHandler.ReadLn(#13#10) + #13#10;
{ or:
lastline := #2 + AContext.Connection.IOHandler.WaitFor(#3, true, true);
lastline := lastline + AContext.Connection.IOHandler.WaitFor(#13#10, true, true);
}
lastcmd := '<STX>' ;
TThread.Synchronize(nil,
procedure
begin
ServerTrafficMemo.Lines.Add(lastcmd) ;
end
);
end;
AContext.Connection.IOHandler.WriteLn(lastcmd + ' received') ;
end;
"I couldn't tell how to read a buffer that contained control characters and text characters"
This protocol is no doubt using ASCII strings. Any characters below decimal 32 will be control characters. Those 32 and above will be data characters. See
http://ascii-table.com/ascii.php
Dealing with that as bytes works fine. You can also use AnsiString, which is ASCII plus the upper 128 characters. In this situation I would avoid UTF (any variant) and stick with either Byte or AnsiString. You need to control the message at the character level, and these characters are 8 bits per character with no escapes.
Also see the first example in the first answer here:
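The "below 32 is control, 32 and above is data" rule is easy to try out on the sample frame from the question. This is a minimal byte-level sketch in Python, not Indy code; the function names render and split_frame are made up for the illustration, and the frame layout <STX>text<ETX>checksum<CR><LF> is taken from the sample line only.

```python
# Names for the ASCII control characters that occur in the question.
CONTROL_NAMES = {0x02: "STX", 0x03: "ETX", 0x05: "ENQ", 0x0A: "LF", 0x0D: "CR"}


def render(data: bytes) -> str:
    """Show bytes below 32 as <NAME> tokens and all other bytes as text."""
    parts = []
    for b in data:
        if b < 32:
            parts.append("<%s>" % CONTROL_NAMES.get(b, "0x%02X" % b))
        else:
            parts.append(chr(b))
    return "".join(parts)


def split_frame(frame: bytes):
    """Split <STX>text<ETX>checksum<CR><LF> into (text, checksum)."""
    if not frame.startswith(b"\x02") or not frame.endswith(b"\r\n"):
        raise ValueError("not a complete frame")
    text, etx, checksum = frame[1:-2].partition(b"\x03")
    if not etx:
        raise ValueError("missing ETX")
    return text.decode("ascii"), checksum.decode("ascii")


frame = b"\x023O|1|G-13-00017||^^^HPV|R||||||N||||||||||||||O\r\x03D3\r\n"
print(render(frame))
text, checksum = split_frame(frame)
print(repr(text), checksum)
```

The control bytes act purely as delimiters here, so no escaping or special string encoding is needed to tell them apart from the text.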
Consider the following code snippet (in Delphi XE2):
function PrepData(StrVal: string; Base64Val: AnsiString): OleVariant;
begin
Result := VarArrayCreate([0, 1], varVariant);
Result[0] := StrVal;
Result[1] := Base64Val;
end;
Base64Val is a binary value encoded as Base64 (so no null bytes). The (OleVariant) Result is automatically marshalled and sent between a client app and a DataSnap server.
When I capture the traffic with Wireshark, I see that both StrVal and Base64Val are transferred as Unicode strings. If I can, I would like to avoid the Unicode conversion for Base64Val. I've looked at all the Variant types and don't see anything other than varString that can transfer an array of characters.
I found this question that shows how to create a variant array of bytes. I'm thinking that I could use this technique instead of using an AnsiString. I'm curious though, is there another way to assign an array of non-Unicode character data to a Variant without a conversion to a Unicode string?
Delphi's implementation supports storing AnsiString and UnicodeString in a Variant, using custom variant type codes. These codes are varString and varUString.
But interop will typically use standard OLE variants and the OLE string, varOleStr, is 16 bit encoded. That would seem to be the reason for your observation.
You'll need to put the data in as an array of bytes if you do wish to avoid a conversion to 16 bit text. Doing so renders base64 encoding pointless. Stop base64 encoding the payload and send the binary in a byte array.
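The size difference is simple arithmetic and can be sketched as follows. This is Python, used only to count bytes; it says nothing about any framing overhead that DataSnap itself adds, and wire_sizes is a made-up helper name.

```python
import base64


def wire_sizes(n_raw: int) -> dict:
    """Compare payload sizes for n_raw bytes of binary data (illustration only)."""
    raw = bytes(n_raw)                                  # placeholder binary data
    b64 = base64.b64encode(raw)                         # 4 chars per 3 bytes
    as_utf16 = b64.decode("ascii").encode("utf-16-le")  # 2 bytes per char
    return {"raw": len(raw), "base64": len(b64), "base64_utf16": len(as_utf16)}


sizes = wire_sizes(3000)
print(sizes)  # {'raw': 3000, 'base64': 4000, 'base64_utf16': 8000}
```

So base64 text carried as a 16-bit string takes roughly 8/3 of the space of the same data in a byte array.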
Keeping with the example in the question, this is how I made it work (using code and comments from David's answer to another question as referenced in my question):
function PrepData(StrVal: string; Data: TBytes): OleVariant;
var
SafeArray: PVarArray;
begin
Result := VarArrayCreate([0, 1], varVariant);
Result[0] := StrVal;
Result[1] := VarArrayCreate([1, Length(Data)], varByte);
SafeArray := VarArrayAsPSafeArray(Result[1]);
Move(Pointer(Data)^, SafeArray.Data^, Length(Data));
end;
Then on the DataSnap server, I can extract the binary data from the OleVariant like this, assuming Value is Result[1] from the Variant Array in the OleVariant:
procedure GetBinaryData(Value: Variant; Result: TMemoryStream);
var
SafeArray: PVarArray;
begin
SafeArray := VarArrayAsPSafeArray(Value);
Assert(SafeArray.ElementSize=1);
Result.Clear;
Result.WriteBuffer(SafeArray.Data^, SafeArray.Bounds[0].ElementCount);
end;
We wrote a Delphi program that sends some information with CDO.
On my Windows 7 machine (Hungarian), the accents work fine.
So if I send a mail with "ÁÉÍÓÖŐÚÜŰ", I receive it in this form.
I used iso-8859-2 encoding for the body, and this also encodes the subject and the email addresses (the sender address contains a name).
I thought I was finished with this.
But when I try to send a mail from an English Win2k3 machine (the mail server is the same!), some accents are stripped in the result:
Ű = U
Ő = O
Next I tried UTF-8 encoding here.
This produced accents, but the wrong ones.
The mail contains letters with ^ signs (circumflexes).
ê <> é
This is not a valid Hungarian letter... :-(
So I want to know how to convert or set up the input so that I get a good result.
I tried to log the body to see whether it changes...
Log(SBody);
Msg.Body := SBody;
Log(Msg.Body);
... or not.
But these logs show the correct result, so the input is good.
So the characters are possibly lost or misconverted when CDO generates the message.
Maybe I can help CDO if I encode the ANSI text into real UTF.
But the converter functions in Delphi don't have a "CodePage" parameter.
In Python I can say:
s.encode('iso-8859-2')
or
s.decode('iso-8859-2')
But in Delphi I don't see this parameter.
Does anybody know how to preserve the accents, that is, how to convert the accented Hungarian strings so that they keep their accented form?
I would also like to know whether I can check the result without sending the mail.
Thanks for your help:
dd
A quick Google search for "delphi string codepage" got me to Torry's Delphi Pages,
and maybe the following code snippets (found here) can shed some light on your problem:
{ Both functions need the Windows unit (WideCharToMultiByte, MultiByteToWideChar).
  Note: for CP_UTF8 the Win32 API requires the flags argument to be 0. }

{:Converts Unicode string to Ansi string using specified code page.
  @param   ws       Unicode string.
  @param   codePage Code page to be used in conversion.
  @returns Converted ansi string.
}
function WideStringToString(const ws: WideString; codePage: Word): AnsiString;
var
  l: integer;
begin
  if ws = '' then
    Result := ''
  else
  begin
    // First call returns the required buffer size, including the terminating #0.
    l := WideCharToMultiByte(codePage,
      WC_COMPOSITECHECK or WC_DISCARDNS or WC_SEPCHARS or WC_DEFAULTCHAR,
      @ws[1], - 1, nil, 0, nil, nil);
    SetLength(Result, l - 1);
    if l > 1 then
      WideCharToMultiByte(codePage,
        WC_COMPOSITECHECK or WC_DISCARDNS or WC_SEPCHARS or WC_DEFAULTCHAR,
        @ws[1], - 1, @Result[1], l - 1, nil, nil);
  end;
end; { WideStringToString }

{:Converts Ansi string to Unicode string using specified code page.
  @param   s        Ansi string.
  @param   codePage Code page to be used in conversion.
  @returns Converted wide string.
}
function StringToWideString(const s: AnsiString; codePage: Word): WideString;
var
  l: integer;
begin
  if s = '' then
    Result := ''
  else
  begin
    // PAnsiChar (not PChar): in Unicode Delphi versions PChar is a PWideChar.
    l := MultiByteToWideChar(codePage, MB_PRECOMPOSED, PAnsiChar(@s[1]), - 1, nil, 0);
    SetLength(Result, l - 1);
    if l > 1 then
      MultiByteToWideChar(codePage, MB_PRECOMPOSED, PAnsiChar(@s[1]),
        - 1, PWideChar(@Result[1]), l - 1);
  end;
end; { StringToWideString }
--reinhard
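Since the question already mentions Python's encode/decode, here is a small sketch of one way such substitutions can arise, and of how a conversion can be checked without sending any mail. It is only an assumption that the English machine pushes the text through a Western code page (Windows-1252 / ISO-8859-1) somewhere along the way; the sketch shows what happens to the Hungarian double-acute letters if it does.

```python
text = "ŐŰőű"

# ISO-8859-2 (Central European) has these letters.
latin2 = text.encode("iso-8859-2")
print(latin2)  # b'\xd5\xdb\xf5\xfb'

# The same bytes read as ISO-8859-1 (Western) become tilde/circumflex letters.
print(latin2.decode("iso-8859-1"))  # ÕÛõû

# A Western code page cannot represent them at all.
try:
    text.encode("cp1252")
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)  # False

# UTF-8 round-trips every character.
print(text.encode("utf-8").decode("utf-8") == text)  # True
```

When a converter cannot represent a character, it typically falls back to a best-fit replacement such as a plain O or U, which would match the stripped accents seen on the English machine.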