Transform Western codepage to Windows 1251 - delphi

I am trying to load a Cyrillic web page (default codepage Western) and put it into a TMemo component.
But I see "Âûñòàâêè" instead of "Выставки" in the Memo.
How do I transform a string from the Western codepage to Windows 1251?
Delphi XE8 SP1

TMemo (and most of the RTL/VCL/FMX in general) in XE8 expects UnicodeString data in UTF-16 format. You would have to decode the webpage data from its actual charset (which is presumably already Windows-1251, as it does not make sense for Russian text to be encoded in Windows-1252) to UTF-16 before then loading it into the TMemo. The actual charset used for the raw data needs to be reported in the HTTP Content-Type header, or in the HTML itself.
You would not decode the raw data to Windows 1251. That would have been necessary only if you were using a pre-Unicode version of Delphi (2007 and earlier) running your app on a Windows Russian machine that uses Windows-1251 as its default codepage. Those days are gone in a Unicode environment like XE8.
Delphi ships with Indy pre-installed. Indy's TIdHTTP component handles the charset-to-UTF16 decoding for you, eg:
Memo1.Text := IdHTTP1.Get(URL);
If you download the webpage data any other way, you would have to download it as raw bytes and decode them yourself, such as by using TEncoding.GetEncoding(1251) followed by TEncoding.GetString(). Or, if the bytes are in a TStream, you can use Memo1.Lines.LoadFromStream() specifying TEncoding.GetEncoding(1251) as the encoding.
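For the manual route, a minimal sketch might look like the following. It assumes the page really is Windows-1251 and that the raw bytes are already in a TBytes; the DecodeWin1251 name is hypothetical. Note that TEncoding.GetEncoding() returns a new object that the caller has to free.

```pascal
program DecodeDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

// Hypothetical helper: decodes raw Windows-1251 bytes into a UTF-16 string.
function DecodeWin1251(const Raw: TBytes): string;
var
  Enc: TEncoding;
begin
  Enc := TEncoding.GetEncoding(1251); // caller owns this instance
  try
    Result := Enc.GetString(Raw);
  finally
    Enc.Free;
  end;
end;

var
  Raw: TBytes;
  Text: string;
begin
  // The Windows-1251 bytes of "Выставки" (the same bytes that show up as
  // "Âûñòàâêè" when they are wrongly decoded as Windows-1252).
  Raw := TBytes.Create($C2, $FB, $F1, $F2, $E0, $E2, $EA, $E8);
  Text := DecodeWin1251(Raw);
  // In a VCL app you would now assign it: Memo1.Text := Text;
  Writeln(Length(Text));
end.
```

For a TStream the equivalent is Memo1.Lines.LoadFromStream(Stream, Enc), with the same rule about freeing Enc afterwards.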

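type
  TSrcStr = type AnsiString(1251); // the charset the bytes were really in
  TDstStr = type AnsiString(1252); // the charset they were wrongly decoded with

// Converts the garbled text back to its Windows-1252 bytes,
// then reinterprets those same bytes as Windows-1251.
function Decode(const s: string): string;
var
  a: TSrcStr;
  b: TDstStr;
begin
  b := TDstStr(s); // UTF-16 -> Windows-1252 bytes
  SetLength(a, Length(b));
  if Length(b) > 0 then // nothing to copy for an empty string
    Move(b[Low(b)], a[Low(a)], Length(b) * SizeOf(b[Low(b)])); // raw copy, no conversion
  Result := string(a); // Windows-1251 bytes -> UTF-16
end;

procedure Test;
var
  s: string;
begin
  s := 'Âûñòàâêè';
  s := Decode(s);
  Assert(s = 'Выставки');
end;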


Send bytes using Indy 10

I'm trying to send bytes between two Delphi 2010 applications using Indy 10, but without success. The received bytes are different from the sent bytes. This is my example code:
Application 1, send button click:
var s:TIdBytes;
begin
setlength(s,3);
s[0]:=48;
s[1]:=227;
s[2]:=0;
IdTCPClient.IOHandler.WriteDirect(s); // or .Write(s);
end;
Application 2, idtcpserver execute event (first attempt):
procedure TForm1.IdTCPServerExecute(AContext: TIdContext);
var rec:TIdBytes;
b:byte;
begin
SetLength(rec,0);
b:=AContext.Connection.IOHandler.ReadByte;
while b<>0 do
begin
setlength(rec, length(rec)+1);
rec[length(rec)-1]:=b;
b:=AContext.Connection.IOHandler.ReadByte;
end;
// Here I expect rec[0] = 48, rec[1]=227, rec[2]=0.
// But I get: rec[0] = 48, rec[1]=63, rec[2]=0. Rec[1] is incorrect
// ... rest of code
end;
Application 2, idtcpserver execute event (second attempt):
procedure TForm1.IdTCPServerExecute(AContext: TIdContext);
var c:ansistring;
begin
c:= AContext.Connection.IOHandler.ReadLn(Char(0));
// Here I expect c[0] = 48, c[1]=227, c[2]=0.
// But I get: c[0] = 48, c[1]=63, c[2]=0. c[1] is incorrect
// ... rest of code
end;
The strangest thing is that those applications were developed with Delphi 5 some years ago and they worked well (with readln(char(0))). When I ported both applications to Delphi 2010 they stopped working. I supposed it was due to Unicode strings, but I haven't found a solution.
First off, don't use TIdIOHandler.WriteDirect() directly, use only the TIdIOHandler.Write() overloads instead.
Second, there is no possible way that your 1st attempt code can produce the result you are claiming. TIdIOHandler.Write[Direct](TIdBytes) and TIdIOHandler.ReadByte() operate on raw bytes only, and bytes are transmitted as-is. So you are guaranteed to get exactly what you send, there is simply no possible way the byte values could change as you have suggested.
However, your 2nd attempt code certainly can produce the result you are claiming, because TIdIOHandler.ReadLn() will read in the bytes and convert them to a UnicodeString, which you are then assigning to an AnsiString, which means the received data is going through 2 lossy charset conversions:
first, the received bytes are decoded as-is into a UTF-16 UnicodeString using the TIdIOHandler.DefStringEncoding property as the decoding charset, which is set to IndyTextEncoding_ASCII by default (you can change that, for example to IndyTextEncoding_8Bit). Byte 227 is outside of the ASCII range, so that byte gets decoded to Unicode character '?' (#63), which is an indication that data loss has occurred.
then, the UnicodeString is assigned to AnsiString, which converts the UTF-16 data to ANSI using the RTL's default charset (which is set to the user's OS locale by default, but can be changed using the System.SetMultiByteConversionCodePage() function). In this case, no more loss occurs, since the UnicodeString is holding characters in the US-ASCII range, which convert to ANSI as-is.
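The first of those two steps can be illustrated without Indy. The sketch below is not Indy's implementation; DecodeAsAscii and DecodeAs8Bit are hypothetical stand-ins that only mimic the behaviour described above, assuming a Unicode version of Delphi (2009 or later) with 1-based strings.

```pascal
program AsciiVs8Bit;
{$APPTYPE CONSOLE}
uses
  SysUtils;

// Illustration only: a strict ASCII decoder has no character for
// bytes above 127, so it substitutes '?' and the value is lost.
function DecodeAsAscii(const B: TBytes): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to High(B) do
    if B[I] <= 127 then
      Result := Result + Char(B[I])
    else
      Result := Result + '?';
end;

// Illustration only: an 8-bit decoder maps byte N to character #N,
// so every byte value survives the trip through a string.
function DecodeAs8Bit(const B: TBytes): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to High(B) do
    Result := Result + Char(B[I]);
end;

var
  Sent: TBytes;
  S: string;
begin
  Sent := TBytes.Create(48, 227);
  S := DecodeAsAscii(Sent);
  Writeln(Ord(S[2])); // prints 63
  S := DecodeAs8Bit(Sent);
  Writeln(Ord(S[2])); // prints 227
end.
```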
That being said, I would not suggest using TIdIOHandler.ReadByte() in a manual loop like you are doing. Although TIdIOHandler does not have a WaitFor() method for bytes, there is nonetheless a more efficient way to approach this, eg:
procedure TForm1.IdTCPServerExecute(AContext: TIdContext);
var
rec: TIdBytes;
LPos: Integer;
begin
SetLength(rec, 0);
// adapted from code in TIdIOHandler.WaitFor()...
with AContext.Connection.IOHandler do
begin
LPos := 0;
repeat
LPos := InputBuffer.IndexOf(Byte(0), LPos);
if LPos <> -1 then begin
InputBuffer.ExtractToBytes(rec, LPos+1);
{ or, if you don't want the terminating $00 byte put in the TIdBytes:
InputBuffer.ExtractToBytes(rec, LPos);
InputBuffer.Remove(1);
}
Break;
end;
LPos := InputBuffer.Size;
ReadFromSource(True, IdTimeoutDefault, True);
until False;
end;
// ... rest of code
end;
Or:
procedure TForm1.IdTCPServerExecute(AContext: TIdContext);
var
rec: string;
begin
rec := AContext.Connection.IOHandler.ReadLn(#0, IndyTextEncoding_8Bit);
// ... rest of code
end;
You wrote: "The strangest thing is that those applications were developed with Delphi 5 some years ago and they worked well (with readln(char(0))). When I ported both applications to Delphi 2010 they stopped working. I supposed it was due to Unicode strings, but I haven't found a solution."
Yes, the string type in Delphi 5 was AnsiString, but it was changed to UnicodeString in Delphi 2009.
But even so, in pre-Unicode versions of Delphi, TIdIOHandler.ReadLn() in Indy 10 will still read in raw bytes and convert them to Unicode using the TIdIOHandler.DefStringEncoding. It will then convert that Unicode data to AnsiString before exiting, using the TIdIOHandler.DefAnsiEncoding property as the converting charset, which is set to IndyTextEncoding_OSDefault by default.
The moral of the story is: whenever you are reading in bytes and converting them to string characters, the bytes must be decoded using the correct charset, or else you will get data loss. This was true in Delphi 5 (but not as strictly enforced), and it is even more true in Delphi 2009+.

How to convert ANSI string filename to windows filename [duplicate]

This line:
TFileStream.Create(fileName, fmOpenRead or fmShareDenyNone);
raises an exception if the filename contains something like ñ
You are ultimately calling CreateFileA, the ANSI API, and the characters you use have no ANSI encoding. The only way to get beyond this is to open the file with CreateFileW, the Unicode API.
You might not realise that you call CreateFileA, but that's how the Delphi 7 file stream is implemented.
One easy way to solve your problems is to upgrade to the latest Delphi which has good support for the native Windows Unicode API.
If you are stuck with ANSI Delphi then you still need to call CreateFileW. You can do this to create a file handle. You'll need to pass a UTF-16 string to that API. Use WideString to store it. You'll also need to get the filename from the user in UTF-16 form. Which means a call to GetOpenFileNameW or IFileDialog. Create a stream by passing the file handle to THandleStream.
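A minimal sketch of that approach, assuming the filename has already been obtained as a WideString. OpenFileWide and FileSizeWide are hypothetical names, and the share flags are chosen to match fmShareDenyNone. THandleStream does not close the handle it is given, so the caller has to.

```pascal
program WideOpenDemo;
{$APPTYPE CONSOLE}
uses
  Windows, SysUtils, Classes;

// Hypothetical helper: opens an existing file for reading through the
// Unicode API and wraps the handle in a stream.
function OpenFileWide(const FileName: WideString): THandleStream;
var
  H: THandle;
begin
  H := CreateFileW(PWideChar(FileName), GENERIC_READ,
    FILE_SHARE_READ or FILE_SHARE_WRITE, nil, OPEN_EXISTING,
    FILE_ATTRIBUTE_NORMAL, 0);
  if H = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  Result := THandleStream.Create(H);
end;

// Hypothetical usage: returns the size of the file, closing the handle
// and freeing the stream afterwards.
function FileSizeWide(const FileName: WideString): Int64;
var
  Stream: THandleStream;
begin
  Stream := OpenFileWide(FileName);
  try
    Result := Stream.Size;
  finally
    FileClose(Stream.Handle); // THandleStream leaves the handle open
    Stream.Free;
  end;
end;

begin
  // ParamStr() is ANSI in Delphi 7, so a real application would take the
  // name from GetOpenFileNameW or IFileDialog instead.
  Writeln(FileSizeWide(ParamStr(0)));
end.
```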
To make all this possible you would use the TNT Unicode libraries. They work well but will impose a big port on you.
Frankly, the right way forward is to use modern tools that support Unicode.
You can use the TntUnicode units to get Unicode support under Delphi 7.
Add TntClasses to your Uses and make the call like this:
TTntFileStream.Create(fileName, fmOpenRead or fmShareDenyNone);
Make sure that fileName is widestring.
Here you can get a copy of TntUnicode:
https://github.com/rofl0r/TntUnicode
UTF16 can be thought of as a codepage, just like all of the possible ANSI codepages.
As Remy mentions in his comment, assuming your ANSI codepage supports the required characters in your Unicode string you simply have to convert that Unicode version of that string to the equivalent ANSI codepage version.
The Delphi compiler can take care of a simple conversion for you automatically, which you use simply by casting a WIDEString (UTF16) to an (ANSI)String:
const
WIDE_FILENAME : WIDEString = 'fuññy.txt';
var
sFilename: String;
strm: TFileStream;
begin
sFilename := String(WIDE_FILENAME);
strm := TFileStream.Create(sFilename, fmOpenRead);
// etc
end;
This works perfectly well even on (e.g.) Delphi 7. The only caveat is that the codepage involved (the system default) must support the extended characters in the Unicode string.
NOTE: The above code uses the String type rather than ANSIString explicitly. On Delphi versions where String is ANSIString, this has the required effect but also is portable to versions where String is UnicodeString (should you upgrade your version later).
If you use ANSIString explicitly in this case, the result will be a double conversion if/when you upgrade:
// Unicode compiler using ANSIString type....
var
sFilename: ANSIString;
begin
sFilename := ANSIString(WIDE_FILENAME); // Codepage conversion from UTF16 to ANSI
strm := TFileStream.Create(sFilename, fmOpenRead); // Will implicitly convert *back* from ANSI to WIDE
versus
// Unicode compiler using String type....
var
sFilename: String;
begin
sFilename := String(WIDE_FILENAME); // String type conversion from WideString to UnicodeString
strm := TFileStream.Create(sFilename, fmOpenRead); // No further conversion necessary
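Because the cast only works when the system default codepage can represent every character, it may be worth checking that before relying on it. The following is only a sketch; FitsInAnsi is a hypothetical helper that converts the name to ANSI and back and compares the result with the original.

```pascal
program AnsiRoundTrip;
{$APPTYPE CONSOLE}

// Hypothetical helper: True if W survives a conversion to the system
// default ANSI codepage and back without any character changing.
function FitsInAnsi(const W: WideString): Boolean;
var
  A: AnsiString;
begin
  A := AnsiString(W);        // UTF-16 -> ANSI (may substitute characters)
  Result := WideString(A) = W; // ANSI -> UTF-16, then compare
end;

begin
  // The result for a name such as 'fuññy.txt' depends on the machine's
  // codepage; a pure ASCII name fits in every ANSI codepage.
  Writeln(FitsInAnsi('plain.txt'));
end.
```

If the check fails, the ANSI route cannot open the file and one of the Unicode approaches is needed.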
Best solution is to go Unicode, but if that is not an option, you can still solve the problem.
In Windows you can set what codepage to use for non-Unicode programs. Just change it to support the correct language (Spanish?). Then the code should work.
Windows 7: Control Panel > Region and Language > Administrative > Language for non-Unicode programs
Windows XP: Control Panel > Regional and Language > Advanced > Language for non-Unicode programs

Insert an emoji inside a string in Delphi 2007 [duplicate]

I'm trying to do exactly what the title says: insert an emoji into a string in Delphi 2007, just like the example below:
procedure TForm1.Button1Click(Sender: TObject);
var s : string;
begin
s := 'This is my original string (y)';
s := ansireplacestr(s,'(y)','👍');
showmessage(s);
end;
I can even paste the emoji into IDE's code, but in runtime showmessage results in this :
This is my original string ????
Is there a way to achieve this task in Delphi 2007 ? Due to several reasons i can't upgrade Delphi right now.
Someone said my question is solved on this topic :
Handling a Unicode String in Delphi Versions <= 2007
But this topic just says to use third-party components, without telling exactly how to do it.
EDIT: As suggested, I tried using the functions Pos, Delete and Insert with a WideString variable:
function addEmoji(mystring : widestring) : widestring;
var r, aux : widestring;
p : integer;
begin
r := mystring;
while pos('(y)',r) > 0 do
begin
aux := r;
p := pos('(y)',aux);
Insert('👍',aux,p);
delete(aux,pos('(y)',aux),3);
r := aux;
end;
result := r;
end;
But the result is the '(y)' replaced by '????'.
In Delphi 2007, the default string type is AnsiString. Emojis require Unicode handling, as they use high Unicode codepoints that simply do not fit/exist in most commonly used Ansi encodings. So you need to use a Unicode UTF encoding instead (UTF-7, -8, -16, or -32).
You can use AnsiString for UTF-7 [1], or UTF8String [2] for UTF-8, or WideString for UTF-16, or UCS4String [3] for UTF-32.
[1] UTF-7 is a 7-bit ASCII compatible encoding.
[2] UTF8String does exist in Delphi 2007 (it was introduced in Delphi 6), but it is not a true UTF-8 string type, it is just an alias for AnsiString with the expectation that it always holds UTF-8 encoded data. You have to use UTF8Encode() and UTF8Decode() to ensure proper conversions to other encodings via UTF-16. UTF8String did not become a true UTF-8 string type until Delphi 2009 (UTF8Encode() and UTF8Decode() were also deprecated).
[3] UCS4String also exists since Delphi 6, but it is not a true string type at all (even in modern Delphi versions). It is just an alias for array of UCS4Char.
The RTL doesn't have any native support for UTF-7 (but it is not hard to implement manually), and very little support for UTF-32 (only to facilitate conversions between UTF-16 <-> UTF-32), so you should stick with UTF-8 or UTF-16 in your code.
You are going to lose Emoji data if you convert UTF data to Ansi, such as if you pass a WideString to ShowMessage(). You can pass a WideString to the Win32 API MessageBoxW() function instead, and you won't have any data loss, however the Emoji may or may not appear correctly depending on the font used by the dialog (but it won't appear as ??, at least).
However, the native RTL in Delphi 2007 simply does not support what you are attempting, at least not for UTF-16. You would have to find a 3rd party WideString-based function, or just write your own using the RTL's Pos(), Delete(), and Insert() intrinsic functions, which are overloaded for WideString data, eg:
function WideReplaceStr(const S, FromText, ToText: WideString): WideString;
var
I: Integer;
begin
Result := S;
repeat
I := Pos(FromText, Result);
if I = 0 then Break;
Delete(Result, I, Length(FromText));
Insert(ToText, Result, I);
until False;
end;
var
s : WideString;
begin
s := 'This is my original string (y)';
s := WideReplaceStr(s, '(y)', '👍');
MessageBoxW(0, PWideChar(s), '', MB_OK);
end;
However, using UTF-8, you can accomplish the same thing using the native RTL, but you still can't use ShowMessage() (well, you could, but it won't show non-ASCII characters correctly):
var
s : UTF8String;
begin
s := UTF8Encode('This is my original string (y)');
s := AnsiReplaceStr(s, '(y)', UTF8Encode('👍'));
MessageBoxW(0, PWideChar(UTF8Decode(s)), '', MB_OK);
end;
Either way, make sure your code editor is set to save the .pas file in UTF-8, otherwise you can't use the literal '👍', you would have to use something more like this instead:
var
Emoji: WideString;
...
SetLength(Emoji, 2);
Emoji[1] := WideChar($D83D);
Emoji[2] := WideChar($DC4D);
Then you can do this:
var s: WideString;
...
s := WideReplaceStr(s, '(y)', Emoji);
Or:
var s: UTF8String;
...
s := AnsiReplaceStr(s, '(y)', UTF8Encode(Emoji));
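The values $D83D and $DC4D used above are the UTF-16 surrogate pair for U+1F44D. If you need other emoji, the pair can be computed from the code point. This is only a sketch: CodePointToWide is a hypothetical helper, and it does no validation of its input.

```pascal
program SurrogateDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils;

// Hypothetical helper: encodes one Unicode code point as UTF-16.
// CodePoint is assumed to be a valid scalar value (no checks are made).
function CodePointToWide(CodePoint: Cardinal): WideString;
begin
  if CodePoint < $10000 then
    Result := WideChar(CodePoint) // fits in a single UTF-16 unit
  else
  begin
    Dec(CodePoint, $10000);       // leaves a 20-bit value
    SetLength(Result, 2);
    Result[1] := WideChar($D800 + (CodePoint shr 10));   // high surrogate
    Result[2] := WideChar($DC00 + (CodePoint and $3FF)); // low surrogate
  end;
end;

var
  Emoji: WideString;
begin
  Emoji := CodePointToWide($1F44D);
  // prints "D83D DC4D"
  Writeln(IntToHex(Ord(Emoji[1]), 4), ' ', IntToHex(Ord(Emoji[2]), 4));
end.
```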

TIdUDPClient broadcasts wrong data

I am sending and receiving messages from an electronic board through UDP using Delphi 6 and Indy 8. But since updating to Delphi XE4, the TIdUDPClient component sends wrong data packets. I think the problem is that the Send() function only sends Unicode. Is it possible to send an AnsiString through TIdUDPClient.Send()?
Here is the code I am using:
idudpclient1.Send(#$7e#$b8#$c7#$81#$10#$8d#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$9d#$02#$0d);
You are sending binary data as a String. In XE4, Strings are Unicode, and Indy's default encoding is ASCII. Your String data contains characters that are outside of the ASCII range.
Don't use String for binary data. That is not what it is meant for. You can get away with that in Delphi 2007 and earlier, but not in Delphi 2009 and later.
You can either:
continue using Send(), but tell it to use Indy's 8bit encoding instead of Indy's default encoding:
IdUDPClient1.Send(#$7e#$b8#$c7#$81#$10#$8d#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$00#$9d#$02#$0d, Indy8BitEncoding);
switch to SendBuffer() instead (which you should do, even in your Indy 8 code):
var
Buf: TIdBytes;
begin
SetLength(Buf, 34);
FillBytes(Buf, 34, $00);
Buf[0] := $7e;
Buf[1] := $b8;
Buf[2] := $c7;
Buf[3] := $81;
Buf[4] := $10;
Buf[5] := $8d;
Buf[31] := $9d;
Buf[32] := $02;
Buf[33] := $0d;
IdUDPClient1.SendBuffer(Buf);
end;

How to read/write local characters from/to MSWord 2003 using Delphi 7?

I have a ListView on my form containing names and numbers, and I have to print an MSWord document with that data filled into the document's tables. Everything works fine with English characters, but when I try to send some Eastern European or Russian characters they show up in the document as "?" or some "trash". I also can't read those characters from the document back into the application.
My questions are:
How to send characters like "ЉЊĐŠŽČ" to Word document?
How to read these characters from MSWord back to application?
In short, code looks like this:
word := CreateOleObject('Word.Application');
word.Visible := true;
doc := word.documents.Open(ExtractFilePath(Application.ExeName) + '\tpl.doc');
table := word.ActiveDocument.Tables.Item(1);
table.Cell(1,2).Range.Text := 'MY TEXT';
word.ActiveDocument.Close;
word.Quit;
word := UnAssigned;
doc := UnAssigned;
table := UnAssigned;
I can change font's name, size and color properties but can't do that with charset property.
Anybody?
Software installed:
Windows XP Professional
Microsoft Word 2003
Delphi 7 Enterprise Edition
The issue comes from the fact that you're calling Word via OLE Automation using late binding.
So under Delphi 7, Range.Text is not known to be a property expecting WideString (Unicode) content, and the value is passed as plain ASCII text.
First solution could be to use Delphi 2009 and later. The new string type made such Unicode assignment transparent.
Under Delphi 7, what about forcing the type cast to WideString:
table.Cell(1,2).Range.Text := WideString('MY TEXT');
or using a temporary variable, like this:
var tmp: WideString;
tmp := 'ЉЊĐŠŽČ';
table.Cell(1,2).Range.Text := tmp;
Another possibility could be to use not late-binding, but direct declaration of the OLE interface of Office, importing the "Microsoft Word ??? Object library" from the "Project" menu of the IDE.
You'll have widestring types in the imported interfaces, e.g:
Range = interface(IDispatch)
['{0002095E-0000-0000-C000-000000000046}']
function Get_Text: WideString; safecall;
procedure Set_Text(const prop: WideString); safecall;
(...)
property Text: WideString read Get_Text write Set_Text;
So you won't have any issue with Ansi charset any more.
