We wrote a Delphi program that send some informations with CDO.
In my Win7 machine (hungarian) the accents are working fine.
So if I sent a mail with "ÁÉÍÓÖŐÚÜŰ", I got it in this format.
I used iso-8859-2 encoding in the body, and this encode the subject, and the email addresses to (the sender address is contains name).
I thought that I finished with this.
But when I try to send a mail from a Win2k3 english machine (the mailing server is same!), the result is truncate some accents:
Ű = U
Ő = O
Next I tried to use UTF-8 encoding here.
This provided accents - but wrong accents.
The mail contains accents with ^ signs.
ê <> é
This is not valid hungarian letter... :-(
So I want to know, how to I convert or setup the input to I got good result.
I tried to log the body to see is changes...
Log(SBody);
Msg.Body := SBody;
Log(Msg.Body);
... or not.
But these logs are providing good result, the input is good.
So it is possible lost and misconverted on CDO generate the message.
May I can help the CDO if I can encode the ANSI text into real UTF.
But in Delphi converter functions don't have "CodePage" parameters.
In Python I can said:
s.encode('iso-8859-2')
or
s.decode('iso-8859-2')
But in Delphi I don't see this parameter.
Is anybody knows, how to preserve the accents, how to convert the accented hungarian strings to preserve them accented format?
And I want to know, can I check the result without sending the mail?
Thanks for your help:
dd
a quick google search with "delphi string codepage" got me to torry's delphi pages
and maybe the following codesnippets (found here) can shed some light on your problem:
{:Converts Unicode string to Ansi string using specified code page.
#param ws Unicode string.
#param codePage Code page to be used in conversion.
#returns Converted ansi string.
}
function WideStringToString(const ws: WideString; codePage: Word): AnsiString;
var
l: integer;
begin
if ws = ' then
Result := '
else
begin
l := WideCharToMultiByte(codePage,
WC_COMPOSITECHECK or WC_DISCARDNS or WC_SEPCHARS or WC_DEFAULTCHAR,
#ws[1], - 1, nil, 0, nil, nil);
SetLength(Result, l - 1);
if l > 1 then
WideCharToMultiByte(codePage,
WC_COMPOSITECHECK or WC_DISCARDNS or WC_SEPCHARS or WC_DEFAULTCHAR,
#ws[1], - 1, #Result[1], l - 1, nil, nil);
end;
end; { WideStringToString }
{:Converts Ansi string to Unicode string using specified code page.
#param s Ansi string.
#param codePage Code page to be used in conversion.
#returns Converted wide string.
}
function StringToWideString(const s: AnsiString; codePage: Word): WideString;
var
l: integer;
begin
if s = ' then
Result := '
else
begin
l := MultiByteToWideChar(codePage, MB_PRECOMPOSED, PChar(#s[1]), - 1, nil, 0);
SetLength(Result, l - 1);
if l > 1 then
MultiByteToWideChar(CodePage, MB_PRECOMPOSED, PChar(#s[1]),
- 1, PWideChar(#Result[1]), l - 1);
end;
end; { StringToWideString }
--reinhard
Related
I need to draw a barcode (QR) with Delphi 6/7. The program can run in various windows locales, and the data is from an input box.
On this input box, the user can choose a charset, and input his own language. This works fine. The input data is only ever from the same codepage. Example configurations could be:
Windows is on Western Europe, Codepage 1252 for ANSI text
Input is done in Shift-JIS ANSI charset
I need to get the Shift-JIS across to the barcode. The most robust way is to use hex encoding.
So my question is: how do I go from Shift-JIS to a hex String in UTF-8 encoding, if the codepage is not the same as the Windows locale?
As example: I have the string 能ラ. This needs to be converted to E883BDE383A9 as per UTF-8. I have tried this but the result is different and meaningless:
String2Hex(UTF8Encode(ftext))
Unfortunately I can't just have an inputbox for WideStrings. But if I can find a way to convert the ANSI text to a WideString, the barcode module can work with Unicode Strings as well.
If it's relevant: I am using the TEC-IT TBarcode DLL.
Creating and accessing a Unicode text control
This is easier than you may think and I did so in the past with the brand new Windows 2000 when convenient components like Tnt Delphi Unicode Controls were not available. Having background knowledge on how to create a Windows GUI program without using Delphi's VCL and manually creating everything helps - otherwise this is also an introduction of it.
First add a property to your form, so we can later access the new control easily:
type
TForm1= class(TForm)
...
private
hEdit: THandle; // Our new Unicode control
end;
Now just create it at your favorite event - I chose FormCreate:
// Creating a child control, type "edit"
self.hEdit:= CreateWindowW( PWideChar(WideString('edit')), PWideChar(WideString('myinput')), WS_CHILD or WS_VISIBLE, 10, 10, 200, 25, Handle, 0, HINSTANCE, nil );
if self.hEdit= 0 then begin // Failed. Get error code so we know why it failed.
//GetLastError();
exit;
end;
// Add a sunken 3D edge (well, historically speaking)
if SetWindowLong( self.hEdit, GWL_EXSTYLE, WS_EX_CLIENTEDGE )= 0 then begin
//GetLastError();
exit;
end;
// Applying new extended style: the control's frame has changed
if not SetWindowPos( self.hEdit, 0, 0, 0, 0, 0, SWP_FRAMECHANGED or SWP_NOMOVE or SWP_NOZORDER or SWP_NOSIZE ) then begin
//GetLastError();
exit;
end;
// The system's default font is no help, let's use this form's font (hopefully Tahoma)
SendMessage( self.hEdit, WM_SETFONT, self.Font.Handle, 1 );
At some point you want to get the edit's content. Again: how is this done without Delphi's VCL but instead directly with the WinAPI? This time I used a button's Click event:
var
sText: WideString;
iLen, iError: Integer;
begin
// How many CHARACTERS to copy?
iLen:= GetWindowTextLengthW( self.hEdit );
if iLen= 0 then iError:= GetLastError() else iError:= 0; // Could be empty, could be an error
if iError<> 0 then begin
exit;
end;
Inc( iLen ); // For a potential trailing #0
SetLength( sText, iLen ); // Reserve space
if GetWindowTextW( self.hEdit, #sText[1], iLen )= 0 then begin // Copy text
//GetLastError();
exit;
end;
// Demonstrate that non-ANSI text was copied out of a non-ANSI control
MessageBoxW( Handle, PWideChar(sText), nil, 0 );
end;
There are detail issues, like not being able to reach this new control via Tab, but we're already basically re-inventing Delphi's VCL, so those are details to take care about at other times.
Converting codepages
The WinAPI deals either in codepages (Strings) or in UTF-16 LE (WideStrings). For historical reasons (UCS-2 and later) UTF-16 LE fits everything, so this is always the implied target to achieve when coming from codepages:
// Converting an ANSI charset (String) to UTF-16 LE (Widestring)
function StringToWideString( s: AnsiString; iSrcCodePage: DWord ): WideString;
var
iLenDest, iLenSrc: Integer;
begin
iLenSrc:= Length( s );
iLenDest:= MultiByteToWideChar( iSrcCodePage, 0, PChar(s), iLenSrc, nil, 0 ); // How much CHARACTERS are needed?
SetLength( result, iLenDest );
if iLenDest> 0 then begin // Otherwise we get the error ERROR_INVALID_PARAMETER
if MultiByteToWideChar( iSrcCodePage, 0, PChar(s), iLenSrc, PWideChar(result), iLenDest )= 0 then begin
//GetLastError();
result:= '';
end;
end;
end;
The source codepage is up to you: maybe
1252 for "Windows-1252" = ANSI Latin 1 Multilingual (Western Europe)
932 for "Shift-JIS X-0208" = IBM-PC Japan MIX (DOS/V) (DBCS) (897 + 301)
28595 for "ISO 8859-5" = Cyrillic
65001 for "UTF-8"
However, if you want to convert from one codepage to another, and both source and target shall not be UTF-16 LE, then you must go forth and back:
Convert from ANSI to WIDE
Convert from WIDE to a different ANSI
// Converting UTF-16 LE (Widestring) to an ANSI charset (String, hopefully you want 65001=UTF-8)
function WideStringToString( s: WideString; iDestCodePage: DWord= CP_UTF8 ): AnsiString;
var
iLenDest, iLenSrc: Integer;
begin
iLenSrc:= Length( s );
iLenDest:= WideCharToMultiByte( iDestCodePage, 0, PWideChar(s), iLenSrc, nil, 0, nil, nil );
SetLength( result, iLenDest );
if iLenDest> 0 then begin // Otherwise we get the error ERROR_INVALID_PARAMETER
if WideCharToMultiByte( iDestCodePage, 0, PWideChar(s), iLenSrc, PChar(result), iLenDest, nil, nil )= 0 then begin
//GetLastError();
result:= '';
end;
end;
end;
As per every Windows installation not every codepage is supported, or different codepages are supported, so conversion attempts may fail. It would be more robust to aim for a Unicode program right away, as that is what every Windows installation definitly supports (unless you still deal with Windows 95, Windows 98 or Windows ME).
Combining everything
Now you got everything you need to put it together:
you can have a Unicode text control to directly get it in UTF-16 LE
you can use an ANSI text control to then convert the input to UTF-16 LE
you can convert from UTF-16 LE (WIDE) to UTF-8 (ANSI)
Size
UTF-8 is mostly the best choice, but size wise UTF-16 may need fewer bytes in total when your target audience is Asian: in UTF-8 both 能 and ラ need 3 bytes each, but in UTF-16 both only need 2 bytes each. As per your QR barcode size is an important factor, I guess.
Likewise don't waste by turning binary data (8 bits per byte) into ASCII text (displaying 4 bits per character, but itself needing 1 byte = 8 bits again). Have a look at Base64 which encodes 6 bits into every byte. A concept that you encountered countless times in your life already, because it's used for email attachments.
In Delphi 7 (for a lot of reasons I can't convert the application in Xe) I must generate the RTF string from a Chinese string that is stored into a UTF8 field (in Firebird 2.5 table).
For example I read the field value that contain the UTF8 value of the string "史蒂芬·克拉申" string into a wiredstring and then I should convert to a string like this
'ca\'b7\'b5\'d9\'b7\'d2\f1\'b7\f0\'bf\'cb\'c0\'ad\'c9\'ea\
The value of UTF8 field for the previous Chinese string is 'å²è’‚芬·克拉申'
How can I do that ?
I have done a lot of search but I haven't find solutions.
Please give me some advice to solve this problem.
Thanks Massimo
After several days I finally found a solution inspired by this post
Getting char value in Delphi 7 and the conversion to decimal available on this site https://www.branah.com/unicode-converter
After loading the UTF8 value from the field I use UTF8Decode and then I convert to Rtf ..
var
fUnicode : widestring;
function UnicodeToRtf(const S: WideString): string;
var
I: Integer;
begin
Result := '';
for I := 1 to Length(S) do
Result := Result + '\u' + IntToStr(Word(S[I]))+'?';
end;
begin
fUnicode := Utf8Decode(DbChineseField.AsString);
fRtfString := UnicodetoRtf(fUnicode);
.....
end;
I am trying to implement a POST to a web service. I need to send a file whose type is variable (.docx, .pdf, .txt) along with a JSON formatted string.
I have manage to post files successfully with code similar to the following:
procedure DoRequest;
var
Http: TIdHTTP;
Params: TIdMultipartFormDataStream;
RequestStream, ResponseStream: TStringStream;
JRequest, JResponse: TJSONObject;
url: string;
begin
url := 'some_custom_service'
JRequest := TJSONObject.Create;
JResponse := TJSONObject.Create;
try
JRequest.AddPair('Pair1', 'Value1');
JRequest.AddPair('Pair2', 'Value2');
JRequest.AddPair('Pair3', 'Value3');
Http := TIdHTTP.Create(nil);
ResponseStream := TStringStream.Create;
RequestStream := TStringStream.Create(UTF8Encode(JRequest.ToString));
try
Params := TIdMultipartFormDataStream.Create;
Params.AddFile('File', ceFileName.Text, '').ContentTransfer := '';
Params.AddFormField('Json', 'application/json', '', RequestStream);
Http.Post(url, Params, ResponseStream);
JResponse := TJSONObject.ParseJSONValue(ResponseStream.DataString) as TJSONObject;
finally
RequestStream.Free;
ResponseStream.Free;
Params.Free;
Http.Free;
end;
finally
JRequest.Free;
JResponse.Free;
end;
end;
The problem appears when I try to send a file that contains Greek characters and spaces in the filename. Sometimes it fails and sometimes it succeeds.
After a lot of research, I notice that the POST header is encoded by Indy's TIdFormDataField class using the EncodeHeader() function. When the post fails, the encoded filename in the header is split, compared to the successful post where is not split.
For example :
Επιστολή εκπαιδευτικο.docx is encoded as =?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66zr8uZG9j?='#$D#$A' =?UTF-8?B?eA==?=, which fails.
Επιστολή εκπαιδευτικ.docx is encoded as
=?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66LmRvY3g=?=, which succeeds.
Επιστολή εκπαιδευτικ .docx is encoded as
=?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66?= .docx, which fails.
I have tried to change the encoding of the filename, the AContentType of the AddFile() procedure, and the ContentTransfer, but none of those change the behavior, and I still get errors when the encoded filename is split.
Is this some kind of bug, or am I missing something?
My code works for every case except those I described above.
I am using Delphi XE3 with Indy10.
EncodeHeader() does have some known issues with Unicode strings:
EncodeHeader() needs to take codeunits into account when splitting data between adjacent encoded-words
Basically, an MIME-encoded word cannot be more than 75 characters in length, so long text gets split up. But when encoding a long Unicode string, any given Unicode character may be charset-encoded using 1 or more bytes, and EncodeHeader() does not yet avoid erroneously splitting a multi-byte character between two individual bytes into separate encoded words (which is illegal and explicitly forbidden by RFC 2047 of the MIME spec).
However, that is not what is happening in your examples.
In your first example, 'Επιστολή εκπαιδευτικο.docx' is too long to be encoded as a single MIME word, so it gets split into 'Επιστολή εκπαιδευτικο.doc' 'x' substrings, which are then encoded separately. This is legal in MIME for long text (though you might have expected Indy to split the text into 'Επιστολή' ' εκπαιδευτικο.doc' instead, or even 'Επιστολή' ' εκπαιδευτικο' '.doc'. That might be a possibility in a future release). Adjacent MIME words that are separated by only whitespace are meant to be concatenated together without separating whitespace when decoded, thus producing 'Επιστολή εκπαιδευτικο.docx' again. If the server is not doing that, it has a flaw in its decoder (maybe it is decoding as 'Επιστολή εκπαιδευτικο.doc x' instead?).
In your second example, 'Επιστολή εκπαιδευτικ.docx' is short enough to be encoded as a single MIME word.
In your third example, 'Επιστολή εκπαιδευτικ .docx' gets split on the second whitespace (not the first) into 'Επιστολή εκπαιδευτικ' ' .docx' substrings, and only the first substring needs to be encoded. This is legal in MIME. When decoded, the decoded text is meant to be concatenated with the following unencoded text, preserving whitespace between them, thus producing 'Επιστολή εκπαιδευτικ .docx' again. If the server is not doing that, it has a flaw in its decoder (maybe it is decoding as 'Επιστολή εκπαιδευτικ.docx' instead?).
If you run these example filenames through Indy's MIME header encoder/decoder, they do decode properly:
var
s: String;
begin
s := EncodeHeader('Επιστολή εκπαιδευτικο.docx', '', 'B', 'UTF-8');
ShowMessage(s); // '=?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66zr8uZG9j?='#13#10' =?UTF-8?B?eA==?='
s := DecodeHeader(s);
ShowMessage(s); // 'Επιστολή εκπαιδευτικο.docx'
s := EncodeHeader('Επιστολή εκπαιδευτικ.docx', '', 'B', 'UTF-8');
ShowMessage(s); // '=?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66LmRvY3g=?='
s := DecodeHeader(s);
ShowMessage(s); // 'Επιστολή εκπαιδευτικ.docx'
s := EncodeHeader('Επιστολή εκπαιδευτικ .docx', '', 'B', 'UTF-8');
ShowMessage(s); // '=?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66?= .docx'
s := DecodeHeader(s);
ShowMessage(s); // 'Επιστολή εκπαιδευτικ .docx'
end;
So the problem seems to be on the server side decoding, not on Indy's client side encoding.
That being said, if you are using a fairly recent version of Indy 10 (Nov 2011 or later), TIdFormDataField has a HeaderEncoding property, which defaults to 'B' (base64) in Unicode environments. However, the splitting logic also affects 'Q' (quoted-printable) as well, so that may or may not work for you, either (but you can try it):
with Params.AddFile('File', ceFileName.Text, '') do
begin
ContentTransfer := '';
HeaderEncoding := 'Q'; // <--- here
HeaderCharSet := 'utf-8';
end;
Otherwise, a workaround might be to change the value to '8' (8-bit) instead, which effectively disables MIME encoding (but not charset encoding):
with Params.AddFile('File', ceFileName.Text, '') do
begin
ContentTransfer := '';
HeaderEncoding := '8'; // <--- here
HeaderCharSet := 'utf-8';
end;
Just note that if the server is not expecting raw UTF-8 bytes for the filename, you might still run into problems (ie, 'Επιστολή εκπαιδευτικο.docx' being interpreted as 'Επιστολή εκπαιδευτικο.docx', for instance).
I need to make some simple program ,but I don't know to start with.
For example ,I got symbol row - 1m213p03a - and this row need to convert to ANSI code ,but only these letter "m", "p" ,"a". In result need to got this - 1109213112397
I need to make this with forms ,and this symbol row need to write user ,who use this program.
Can anyone help me?
I can give you head start with conversion algorithm. It should work in all Delphi versions. Algorithm is searching through input string characters, if character is number then it is written in result string as-is, otherwise it is converted to decimal ANSI representation of underlying character.
function Convert(const input: string): string;
var
i: integer;
begin
result := '';
for i := 1 to Length(input) do
if input[i] in ['0' .. '9'] then result := result + input[i]
else result := result + IntToStr(Ord(input[i]));
end;
var
s: string;
s := Convert('1m213p03a');
I'm generating texture atlases for rendering Unicode texts in my app. Source texts are stored in ANSI codepages (1250, 1251, 1254, 1257, etc). I want to be able to generate all the symbols from each ANSI codepage.
Here is the outline of the code I would expect to have:
for I := 0 to 255 do
begin
anChar := AnsiChar(I); //obtain AnsiChar
//Apply codepage without converting the chars
//<<--- this part does not work, showing:
//"E2033 Types of actual and formal var parameters must be identical"
SetCodePage(anChar, aCodepages[K], False);
//Assign AnsiChar to UnicodeChar (automatic conversion)
uniChar := anChar;
//Here we get Unicode character index
uniCode := Ord(uniChar);
end;
The code above does not works (E2033) and I'm not sure it is a proper solution at all. Perhaps there's much shorter version.
What is the proper way of converting AnsiChar into Unicode with specific codepage in mind?
I would do it like this:
function AnsiCharToWideChar(ac: AnsiChar; CodePage: UINT): WideChar;
begin
if MultiByteToWideChar(CodePage, 0, #ac, 1, #Result, 1) <> 1 then
RaiseLastOSError;
end;
I think you should avoid using strings for what is in essence a character operation. If you know up front which code pages you need to support then you can hard code the conversions into a lookup table expressed as an array constant.
Note that all the characters that are defined in the ANSI code pages map to Unicode characters from the Basic Multilingual Plane and so are represented by a single UTF-16 character. Hence the size assumptions of the code above.
However, the assumption that you are making, and that this answer persists, is that a single byte represents a character in an ANSI character set. That's a valid assumption for many character sets, for example the single byte western character sets like 1252. But there are character sets like 932 (Japanese), 949 (Koren) etc. that are double byte character sets. Your entire approach breaks down for those code pages. My guess is that only wish to support single byte character sets.
If you are writing cross-platform code then you can replace MultiByteToWideChar with UnicodeFromLocaleChars.
You can also do it in one step for all characters. Here is an example for codepage 1250:
var
encoding: TEncoding;
bytes: TBytes;
unicode: TArray<Word>;
I: Integer;
S: string;
begin
SetLength(bytes, 256);
for I := 0 to 255 do
bytes[I] := I;
SetLength(unicode, 256);
encoding := TEncoding.GetEncoding(1250); // change codepage as needed
try
S := encoding.GetString(bytes);
for I := 0 to 255 do
unicode[I] := Word(S[I+1]); // as long as strings are 1-based
finally
encoding.Free;
end;
end;
Here is the code I have found to be working well:
var
I: Byte;
anChar: AnsiString;
Tmp: RawByteString;
uniChar: Char;
uniCode: Word;
begin
for I := 0 to 255 do
begin
anChar := AnsiChar(I);
Tmp := anChar;
SetCodePage(Tmp, aCodepages[K], False);
uniChar := UnicodeString(Tmp)[1];
uniCode := Word(uniChar);
<...snip...>
end;