Generate RTF Text from Chinese string stored in UTF8 database field - delphi

In Delphi 7 (for many reasons I can't migrate the application to XE) I must generate an RTF string from a Chinese string stored in a UTF-8 field (in a Firebird 2.5 table).
For example, I read the field value containing the UTF-8 encoding of the string "史蒂芬·克拉申" into a WideString, and then I need to convert it to a string like this:
'ca\'b7\'b5\'d9\'b7\'d2\f1\'b7\f0\'bf\'cb\'c0\'ad\'c9\'ea\
The value of UTF8 field for the previous Chinese string is 'å²è’‚芬·克拉申'
How can I do that?
I have searched a lot but haven't found a solution.
Please give me some advice on how to solve this problem.
Thanks Massimo

After several days I finally found a solution, inspired by the post "Getting char value in Delphi 7" and by the conversion to decimal available on this site: https://www.branah.com/unicode-converter
After loading the UTF-8 value from the field I call UTF8Decode and then convert the result to RTF:
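var
  fUnicode: WideString;

function UnicodeToRtf(const S: WideString): string;
var
  I: Integer;
begin
  Result := '';
  // Emit every UTF-16 code unit as \uN, followed by '?' as the ANSI fallback.
  // RTF takes N as a signed 16-bit decimal, so values above 32767 go negative.
  for I := 1 to Length(S) do
    Result := Result + '\u' + IntToStr(SmallInt(Word(S[I]))) + '?';
end;

begin
  fUnicode := Utf8Decode(DbChineseField.AsString);
  fRtfString := UnicodeToRtf(fUnicode);
  .....
end;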
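For mixed text, emitting every character as \uN makes the RTF hard to read, and the RTF control characters \, { and } need escaping. A minimal sketch of a more defensive variant, assuming printable ASCII may pass through unchanged; the name WideToRtf is hypothetical and not from the original post:

```pascal
program RtfEscapeDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper (not from the original post): escapes a WideString
// for use inside the body of an RTF document.
function WideToRtf(const S: WideString): string;
var
  I: Integer;
  Code: Word;
begin
  Result := '';
  for I := 1 to Length(S) do
  begin
    Code := Ord(S[I]);
    if (Code = Ord('\')) or (Code = Ord('{')) or (Code = Ord('}')) then
      Result := Result + '\' + Chr(Code)      // escape RTF control characters
    else if (Code >= 32) and (Code < 127) then
      Result := Result + Chr(Code)            // printable ASCII passes through
    else
      // \uN takes a signed 16-bit decimal; '?' is the fallback character
      Result := Result + '\u' + IntToStr(SmallInt(Code)) + '?';
  end;
end;

var
  W: WideString;
begin
  W := 'A';
  W := W + WideChar($53F2);  // U+53F2, the first character of the question's example
  Writeln(WideToRtf(W));
end.
```

The result still has to be placed inside a complete RTF document (header and closing brace) before a viewer will accept it.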

Related

delphi symbol row to ansi code

I need to write a simple program, but I don't know where to start.
For example, I have the character string 1m213p03a, and it needs to be converted to ANSI codes, but only the letters "m", "p" and "a". The result should be 11092131120397.
I need to do this with a form, and the string has to be entered by the user of the program.
Can anyone help me?
I can give you a head start with the conversion algorithm. It should work in all Delphi versions. The algorithm walks through the characters of the input string: if a character is a digit it is written to the result as-is; otherwise it is replaced by the decimal ANSI code of that character.
function Convert(const input: string): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(input) do
    if input[i] in ['0'..'9'] then
      Result := Result + input[i]                  // digits are copied unchanged
    else
      Result := Result + IntToStr(Ord(input[i]));  // anything else becomes its character code
end;

var
  s: string;
begin
  s := Convert('1m213p03a');  // s = '11092131120397'
end;

Machine dependent results for OLE check of MSWord version

With this code to retrieve the version of the installed MS Word:
uses OleAuto;
[...]
function TForm2.GetWordVersion: string;
const
  wdDoNotSaveChanges = 0;
var
  WordApp: OLEVariant;
  WordVersion: Variant;
begin
  try
    WordApp := CreateOLEObject('Word.Application');
    WordVersion := WordApp.Version;
    WordApp.Quit(wdDoNotSaveChanges);
  except
    on E: Exception do
    begin
      WordVersion := -1;
    end;
  end;
  Result := WordVersion;
end;
I get 140 on my machine, while my colleague gets 14. Both machines run Win7/Word 2010, but I am in Italy and he is in India.
Does anyone know why the values differ?
Thanks
I'm guessing this is a decimal separator issue. Word returns the string '14.0', and when you convert it to an integer the period is treated as a digit grouping (thousands) separator on one machine and as a decimal separator on the other.
The solution is to stop converting to integer, which I infer you are doing in code that you have not shown.
I am inferring that from this comment:
I can convert it to string and use the first 2 chars.
Since the code in the question operates on strings, I conclude that other code, not shown in the question, is converting to integer.
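If only the major version is needed, it can be taken from the text before the first period, which avoids any locale-dependent number parsing. A minimal sketch, assuming the version arrives as a string such as '14.0'; MajorVersion is a hypothetical helper name:

```pascal
program WordMajorVersionDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: returns the part before the first '.' as an integer,
// or -1 if that part is not a valid number.
function MajorVersion(const Version: string): Integer;
var
  DotPos: Integer;
begin
  DotPos := Pos('.', Version);
  if DotPos > 0 then
    Result := StrToIntDef(Copy(Version, 1, DotPos - 1), -1)
  else
    Result := StrToIntDef(Version, -1);
end;

begin
  // The period is handled as plain text here, so regional settings play no role.
  Writeln(MajorVersion('14.0'));
end.
```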

unicode text file output differs between XE2 and Delphi 2009?

When I try the code below, the output in XE2 seems to differ from the output in D2009.
procedure TForm1.Button1Click(Sender: TObject);
var
  Outfile: TextFile;
  myByte: Byte;
begin
  AssignFile(Outfile, 'test_chinese.txt');
  Rewrite(Outfile);
  // This is the UTF-8 BOM
  for myByte in TEncoding.UTF8.GetPreamble do
    Write(Outfile, AnsiChar(myByte));
  Writeln(Outfile, UTF8String('总结'));
  Writeln(Outfile, '°C');
  CloseFile(Outfile);
end;
Compiling with XE2 on a Windows 8 PC gives in WordPad
??
C
txt hex code: EF BB BF 3F 3F 0D 0A B0 43 0D 0A
Compiling with D2009 on a Windows XP PC gives in Wordpad
总结
°C
txt hex code: EF BB BF E6 80 BB E7 BB 93 0D 0A B0 43 0D 0A
My question is why the output differs, and how I can save Chinese characters to a text file using the old text file I/O.
Thanks!
In XE2 onwards, AssignFile() has an optional CodePage parameter that sets the codepage of the output file:
function AssignFile(var F: File; FileName: String; [CodePage: Word]): Integer; overload;
Write() and Writeln() both have overloads that support UnicodeString and WideChar inputs.
So, you can create a file that has its codepage set to CP_UTF8, and then Write/ln() will automatically convert Unicode strings to UTF-8 when writing them to the file.
The downside is that you will not be able to write the UTF-8 BOM using AnsiChar values anymore, because the individual bytes will get converted to UTF-8 and thus not be written correctly. You can get around that by writing the BOM as a single Unicode character (which is what it really is: U+FEFF) instead of as individual bytes.
This works in XE2:
procedure TForm1.Button1Click(Sender: TObject);
var
  Outfile: TextFile;
begin
  AssignFile(Outfile, 'test_chinese.txt', CP_UTF8);
  Rewrite(Outfile);
  // This is the UTF-8 BOM
  Write(Outfile, #$FEFF);
  Writeln(Outfile, '总结');
  Writeln(Outfile, '°C');
  CloseFile(Outfile);
end;
With that said, if you want something that is more compatible and reliable between D2009 and XE2, use TStreamWriter instead:
procedure TForm1.Button1Click(Sender: TObject);
var
  Outfile: TStreamWriter;
begin
  Outfile := TStreamWriter.Create('test_chinese.txt', False, TEncoding.UTF8);
  try
    Outfile.WriteLine('总结');
    Outfile.WriteLine('°C');
  finally
    Outfile.Free;
  end;
end;
Or do the file I/O manually:
procedure TForm1.Button1Click(Sender: TObject);
var
  Outfile: TFileStream;
  BOM: TBytes;

  procedure WriteBytes(const B: TBytes);
  begin
    // TBytes is a dynamic array, so test its length rather than comparing with ''
    if Length(B) > 0 then Outfile.WriteBuffer(B[0], Length(B));
  end;

  procedure WriteStr(const S: UTF8String);
  begin
    if S <> '' then Outfile.WriteBuffer(S[1], Length(S));
  end;

  procedure WriteLine(const S: UTF8String);
  begin
    WriteStr(S);
    WriteStr(sLineBreak);
  end;

begin
  Outfile := TFileStream.Create('test_chinese.txt', fmCreate);
  try
    WriteBytes(TEncoding.UTF8.GetPreamble);
    WriteLine('总结');
    WriteLine('°C');
  finally
    Outfile.Free;
  end;
end;
You really shouldn't use the old text I/O anymore.
Anyway, you can use TEncoding to get the UTF-8 TBytes like this:
procedure TForm1.Button1Click(Sender: TObject);
var
  Outfile: TextFile;
  Bytes: TBytes;
  myByte: Byte;
begin
  AssignFile(Outfile, 'test_chinese.txt');
  Rewrite(Outfile);
  // This is the UTF-8 BOM
  for myByte in TEncoding.UTF8.GetPreamble do
    Write(Outfile, AnsiChar(myByte));
  Bytes := TEncoding.UTF8.GetBytes('总结');
  for myByte in Bytes do
  begin
    Write(Outfile, AnsiChar(myByte));
  end;
  Writeln(Outfile, '°C');
  CloseFile(Outfile);
end;
I'm not sure if there is an easier way to write TBytes to a Textfile, maybe somebody else has a better idea.
Edit:
For a pure binary file (File instead of TextFile type) you can use BlockWrite.
There are a couple of tell-tale signs that may tell you what went wrong when dealing with Unicode. In your case you're seeing "?" in the resulting output file: you get question marks when you try to convert something from Unicode to a code page and the target code page can't represent the requested characters.
Looking at the hex dump it's obvious (counting line terminators) that the question marks are the result of saving the two Chinese characters to the file. The two characters got converted to exactly two question marks. This tells you that Writeln() decided to give you a helping hand and converted the text from UTF-8 (a Unicode representation) to your local code page. The Delphi team probably decided to do this since the old I/O routines are not supposed to be Unicode compatible; since you're writing a UTF-8 string using the old I/O routines, they're helping you by converting it to your code page. You might not welcome that helping hand, but it doesn't mean it was wrong to do so: it's undocumented territory.
Since you now know why that's happening, you know what to do to stop it: let Writeln() know you're sending something that doesn't need converting. You'll discover that's not particularly easy, since Delphi XE2 apparently "helps you out" whatever you do. For example, code like this doesn't just change the string type; it converts to AnsiString, going through the code-page conversion routine that gets you question marks:
AnsiString(UTF8String('Whatever Unicode'));
Because of this, and if you need one-liner solutions, you could try a conversion routine, something like this:
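function FakeConvert(const InStr: UTF8String): AnsiString;
var
  N: Integer;
begin
  N := Length(InStr);
  SetLength(Result, N);
  // Copy the raw bytes without any code-page conversion; skip empty input,
  // where InStr[1] would be out of range.
  if N > 0 then
    Move(InStr[1], Result[1], N);
end;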
You'll then be able to do:
Writeln(Outfile,FakeConvert('总结'));
And it'll do what you expect (I did actually try it before posting!)
Of course the only TRUE answer to this question is, since you upgraded all the way to Delphi XE2:
Stop using the deprecated I/O routines and move to TStream-based I/O.

Delphi: CDO.Message encoding problems

We wrote a Delphi program that sends some information with CDO.
On my Win7 machine (Hungarian) the accents work fine.
So if I send a mail with "ÁÉÍÓÖŐÚÜŰ", I receive it in exactly this form.
I used iso-8859-2 encoding for the body, and this also encodes the subject and the email addresses (the sender address contains a name).
I thought I was finished with this.
But when I try to send a mail from an English Win2k3 machine (the mail server is the same!), some accents are lost:
Ű = U
Ő = O
Next I tried UTF-8 encoding here.
This gave accents, but the wrong ones.
The mail contains letters with ^ (circumflex) signs:
ê <> é
This is not a valid Hungarian letter... :-(
So I want to know how to convert or set up the input to get a good result.
I tried logging the body to see whether it changes...
Log(SBody);
Msg.Body := SBody;
Log(Msg.Body);
... or not.
But these logs show the correct result, so the input is good.
So the characters are possibly lost or misconverted when CDO generates the message.
Maybe I can help CDO by encoding the ANSI text into real UTF-8 myself.
But the Delphi conversion functions don't have a "CodePage" parameter.
In Python I can say:
s.encode('iso-8859-2')
or
s.decode('iso-8859-2')
But in Delphi I don't see such a parameter.
Does anybody know how to preserve the accents, i.e. how to convert the accented Hungarian strings so that they stay accented?
And I want to know: can I check the result without sending the mail?
Thanks for your help,
dd
A quick Google search for "delphi string codepage" led me to Torry's Delphi Pages, and the following code snippets (found there) may shed some light on your problem:
{:Converts Unicode string to Ansi string using specified code page.
  @param   ws       Unicode string.
  @param   codePage Code page to be used in conversion.
  @returns Converted ansi string.
}
function WideStringToString(const ws: WideString; codePage: Word): AnsiString;
var
  l: integer;
begin
  if ws = '' then
    Result := ''
  else
  begin
    l := WideCharToMultiByte(codePage,
      WC_COMPOSITECHECK or WC_DISCARDNS or WC_SEPCHARS or WC_DEFAULTCHAR,
      @ws[1], -1, nil, 0, nil, nil);
    SetLength(Result, l - 1);
    if l > 1 then
      WideCharToMultiByte(codePage,
        WC_COMPOSITECHECK or WC_DISCARDNS or WC_SEPCHARS or WC_DEFAULTCHAR,
        @ws[1], -1, @Result[1], l - 1, nil, nil);
  end;
end; { WideStringToString }

{:Converts Ansi string to Unicode string using specified code page.
  @param   s        Ansi string.
  @param   codePage Code page to be used in conversion.
  @returns Converted wide string.
}
function StringToWideString(const s: AnsiString; codePage: Word): WideString;
var
  l: integer;
begin
  if s = '' then
    Result := ''
  else
  begin
    // PAnsiChar (rather than PChar) keeps this correct in Unicode Delphi versions too
    l := MultiByteToWideChar(codePage, MB_PRECOMPOSED, PAnsiChar(@s[1]), -1, nil, 0);
    SetLength(Result, l - 1);
    if l > 1 then
      MultiByteToWideChar(codePage, MB_PRECOMPOSED, PAnsiChar(@s[1]),
        -1, PWideChar(@Result[1]), l - 1);
  end;
end; { StringToWideString }
--reinhard
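As a usage sketch for the question's iso-8859-2 case: Windows identifies ISO 8859-2 as code page 28592. The helper below, EncodeToCodePage, is a hypothetical, trimmed-down variant of the first function above; it passes 0 for the flags, which the Windows API requires when the target is CP_UTF8 and which also works for single-byte code pages.

```pascal
program CodePageDemo;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

const
  CP_ISO_8859_2 = 28592;  // Windows code page identifier for ISO 8859-2 (Latin 2)

// Hypothetical helper for illustration: encodes a WideString in the given code page.
function EncodeToCodePage(const ws: WideString; codePage: Word): AnsiString;
var
  len: Integer;
begin
  Result := '';
  if ws = '' then
    Exit;
  // A first call with a nil buffer returns the required size in bytes.
  len := WideCharToMultiByte(codePage, 0, PWideChar(ws), Length(ws),
    nil, 0, nil, nil);
  if len <= 0 then
    Exit;
  SetLength(Result, len);
  WideCharToMultiByte(codePage, 0, PWideChar(ws), Length(ws),
    PAnsiChar(Result), len, nil, nil);
end;

var
  W: WideString;
  Encoded: AnsiString;
begin
  W := WideChar($0150);  // U+0150, LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
  Encoded := EncodeToCodePage(W, CP_ISO_8859_2);
  // Print the encoded bytes in hex to check the result without sending a mail.
  if Encoded <> '' then
    Writeln(IntToHex(Ord(Encoded[1]), 2));
end.
```

Dumping the encoded bytes like this is one way to inspect what would be handed to CDO before any message is sent.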

Help with sending number to Excel 2007 from Delphi 2010 as a string

I'm sending a number to Excel 2007 as a string (Cell.Value := '2,5') using late binding. The actual code is more like:
var CellVal: OLEVariant;
...
CellVal := FloatToStr(2.5); // Regionally formatted.
Cell.Value := CellVal;
On my Excel 97 version, this value will be formatted as "General" by default and will be seen as a number. A customer with Excel 2007 ends up with the cell formatted as "Standard" and Excel appears to see it as a string (it's not right aligned.) Note that I am using the regional settings to format the number and that Excel appears to be using the default regional settings as well.
If the customer just types 2,5 into a cell it is accepted as a number, and if he pastes the string '2,5' from the clipboard into a cell, it is also accepted as a number. Does anyone know why the string value sent through the automation interface to Excel ends up as a non-number?
Thanks for any suggestions! Edited to specify the regional decimal separator for the customer is ','.
Since you cannot format comments:
I just did a little test: Excel doesn't want a regionally formatted float value as a string, it just wants a dot as the decimal separator.
procedure TForm1.Button1Click(Sender: TObject);
var
  App: Variant;
  Workbook: Variant;
  Worksheet: Variant;
  DoubleValue: Double;
begin
  App := CreateOleObject('Excel.Application');
  Workbook := App.Workbooks.Add;
  Worksheet := Workbook.ActiveSheet;
  DoubleValue := 1.2;
  Worksheet.Range['A1'].Value := DoubleValue;      // DoubleValue is a Double; Excel recognizes a double
  Worksheet.Range['A2'].Value := '1.2';            // Excel recognizes a double
  Worksheet.Range['A3'].Value := '1,2';            // Excel recognizes a string
  Worksheet.Range['A4'].Value := FloatToStr(1.2);  // Excel recognizes a string
  App.Visible := True;
end;
Keep in mind that I have a comma as the decimal separator.
Probably because you give it a string. Have you tried passing it the float value directly?
Can't explain why the behaviour is different but it would appear to be down to how Excel 2007 interprets the incoming value.
How about setting the format of the cell in code?
Worksheets("Sheet1").Range("A17").NumberFormat = "General"
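If the value has to be sent as a string rather than as a Double, one option suggested by the test above is to format it with a fixed '.' separator instead of the regional one. A minimal sketch, assuming Delphi 2010 on Windows; FloatToInvariantStr is a hypothetical helper name, and whether Excel then treats the string as a number may still depend on the Excel version and its settings:

```pascal
program InvariantFloatDemo;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// Hypothetical helper: formats a float with '.' as the decimal separator,
// regardless of the user's regional settings.
function FloatToInvariantStr(Value: Double): string;
var
  FS: TFormatSettings;
begin
  // Start from the user's settings, then override only the decimal separator.
  GetLocaleFormatSettings(LOCALE_USER_DEFAULT, FS);
  FS.DecimalSeparator := '.';
  Result := FloatToStr(Value, FS);
end;

begin
  Writeln(FloatToInvariantStr(2.5));
end.
```

Assigning the Double itself to Cell.Value, as in the A1 line of the test above, avoids the string round trip altogether.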
