I have some legacy code (1 million lines) written in Delphi 7 Pascal which for various reasons can't be upgraded to a more recent version of Delphi. The program outputs documents in about 30 languages and makes a very good job of producing the various characters in all languages apart from Turkish. The coding sets the charset to TURKISH_CHARSET (162). When it tries to print char #351 (ş, hex 15f), char #285 (ğ, hex 11f) or char #305 (ı, hex 131), it prints only "s", "g" or "i". It uses a simple
Printer.Canvas.TextOut(x, y, sText)
to output the text.
I tried compiling the code on different machines and running it on different versions of Windows but always with the same result.
In Delphi 7, string is an alias for AnsiString, which encodes Unicode characters as 8-bit bytes using Windows codepages. In some MBCS codepages, Unicode characters may require multiple bytes (Turkish is not one of them, though).
Microsoft has several codepages for Turkish:
857 (MS-DOS)
1254 (Windows)
10081 (Macintosh)
28599 (ISO-8859-9)
In both codepages 1254 and 28599 (where 1254 is the most likely one you will run into), the Unicode characters in question are encoded in 8-bit as hex $FE (ş), $F0 (ğ), and $FD (ı).
Make sure your sText string variable actually contains those byte values to begin with, and not ASCII bytes $73 (s), $67 (g), and $69 (i) instead. If it contains the latter, you are losing the Turkish data before it even reaches Canvas.TextOut(). That would be an issue earlier in your code.
However, If sText contains the correct bytes, then the problem has to be on the OS side, as TCanvas.TextOut() is just a thin wrapper for the Win32 API ExtTextOutA() function, where sText gets passed as-is to the API. Maybe the particular font you are using doesn't support Turkish, or at least those particular characters. Or maybe there is a problem with the printer driver. Either way, you might have to resort to converting your sText value to a WideString using MultiByteToWideChar() and then call ExtTextOutW() (not ExtTextOutA()) directly, eg:
var
wText: WideString;
size: TSize;
begin
//Printer.Canvas.TextOut(x, y, sText);
SetLength(wText, MultiByteToWideChar(1254{28599}, 0, PAnsiChar(sText), Length(sText), nil, 0));
MultiByteToWideChar(1254{28599}, 0, PAnsiChar(sText), Length(sText), PWideChar(wText), Length(wText)));
Windows.ExtTextOutW(Printer.Canvas.Handle, x, y, Printer.Canvas.TextFlags, nil, PWideChar(wText), Length(wText), nil);
size.cX := 0;
size.cY := 0;
Windows.GetTextExtentPoint32W(Printer.Canvas.Handle, PWideChar(wText), Length(wText), size);
Printer.Canvas.MoveTo(x + size.cX, Y);
end;
Related
I need to draw a barcode (QR) with Delphi 6/7. The program can run in various windows locales, and the data is from an input box.
On this input box, the user can choose a charset, and input his own language. This works fine. The input data is only ever from the same codepage. Example configurations could be:
Windows is on Western Europe, Codepage 1252 for ANSI text
Input is done in Shift-JIS ANSI charset
I need to get the Shift-JIS across to the barcode. The most robust way is to use hex encoding.
So my question is: how do I go from Shift-JIS to a hex String in UTF-8 encoding, if the codepage is not the same as the Windows locale?
As example: I have the string 能ラ. This needs to be converted to E883BDE383A9 as per UTF-8. I have tried this but the result is different and meaningless:
String2Hex(UTF8Encode(ftext))
Unfortunately I can't just have an inputbox for WideStrings. But if I can find a way to convert the ANSI text to a WideString, the barcode module can work with Unicode Strings as well.
If it's relevant: I am using the TEC-IT TBarcode DLL.
Creating and accessing a Unicode text control
This is easier than you may think and I did so in the past with the brand new Windows 2000 when convenient components like Tnt Delphi Unicode Controls were not available. Having background knowledge on how to create a Windows GUI program without using Delphi's VCL and manually creating everything helps - otherwise this is also an introduction of it.
First add a property to your form, so we can later access the new control easily:
type
TForm1= class(TForm)
...
private
hEdit: THandle; // Our new Unicode control
end;
Now just create it at your favorite event - I chose FormCreate:
// Creating a child control, type "edit"
self.hEdit:= CreateWindowW( PWideChar(WideString('edit')), PWideChar(WideString('myinput')), WS_CHILD or WS_VISIBLE, 10, 10, 200, 25, Handle, 0, HINSTANCE, nil );
if self.hEdit= 0 then begin // Failed. Get error code so we know why it failed.
//GetLastError();
exit;
end;
// Add a sunken 3D edge (well, historically speaking)
if SetWindowLong( self.hEdit, GWL_EXSTYLE, WS_EX_CLIENTEDGE )= 0 then begin
//GetLastError();
exit;
end;
// Applying new extended style: the control's frame has changed
if not SetWindowPos( self.hEdit, 0, 0, 0, 0, 0, SWP_FRAMECHANGED or SWP_NOMOVE or SWP_NOZORDER or SWP_NOSIZE ) then begin
//GetLastError();
exit;
end;
// The system's default font is no help, let's use this form's font (hopefully Tahoma)
SendMessage( self.hEdit, WM_SETFONT, self.Font.Handle, 1 );
At some point you want to get the edit's content. Again: how is this done without Delphi's VCL but instead directly with the WinAPI? This time I used a button's Click event:
var
sText: WideString;
iLen, iError: Integer;
begin
// How many CHARACTERS to copy?
iLen:= GetWindowTextLengthW( self.hEdit );
if iLen= 0 then iError:= GetLastError() else iError:= 0; // Could be empty, could be an error
if iError<> 0 then begin
exit;
end;
Inc( iLen ); // For a potential trailing #0
SetLength( sText, iLen ); // Reserve space
if GetWindowTextW( self.hEdit, #sText[1], iLen )= 0 then begin // Copy text
//GetLastError();
exit;
end;
// Demonstrate that non-ANSI text was copied out of a non-ANSI control
MessageBoxW( Handle, PWideChar(sText), nil, 0 );
end;
There are detail issues, like not being able to reach this new control via Tab, but we're already basically re-inventing Delphi's VCL, so those are details to take care about at other times.
Converting codepages
The WinAPI deals either in codepages (Strings) or in UTF-16 LE (WideStrings). For historical reasons (UCS-2 and later) UTF-16 LE fits everything, so this is always the implied target to achieve when coming from codepages:
// Converting an ANSI charset (String) to UTF-16 LE (Widestring)
function StringToWideString( s: AnsiString; iSrcCodePage: DWord ): WideString;
var
iLenDest, iLenSrc: Integer;
begin
iLenSrc:= Length( s );
iLenDest:= MultiByteToWideChar( iSrcCodePage, 0, PChar(s), iLenSrc, nil, 0 ); // How much CHARACTERS are needed?
SetLength( result, iLenDest );
if iLenDest> 0 then begin // Otherwise we get the error ERROR_INVALID_PARAMETER
if MultiByteToWideChar( iSrcCodePage, 0, PChar(s), iLenSrc, PWideChar(result), iLenDest )= 0 then begin
//GetLastError();
result:= '';
end;
end;
end;
The source codepage is up to you: maybe
1252 for "Windows-1252" = ANSI Latin 1 Multilingual (Western Europe)
932 for "Shift-JIS X-0208" = IBM-PC Japan MIX (DOS/V) (DBCS) (897 + 301)
28595 for "ISO 8859-5" = Cyrillic
65001 for "UTF-8"
However, if you want to convert from one codepage to another, and both source and target shall not be UTF-16 LE, then you must go forth and back:
Convert from ANSI to WIDE
Convert from WIDE to a different ANSI
// Converting UTF-16 LE (Widestring) to an ANSI charset (String, hopefully you want 65001=UTF-8)
function WideStringToString( s: WideString; iDestCodePage: DWord= CP_UTF8 ): AnsiString;
var
iLenDest, iLenSrc: Integer;
begin
iLenSrc:= Length( s );
iLenDest:= WideCharToMultiByte( iDestCodePage, 0, PWideChar(s), iLenSrc, nil, 0, nil, nil );
SetLength( result, iLenDest );
if iLenDest> 0 then begin // Otherwise we get the error ERROR_INVALID_PARAMETER
if WideCharToMultiByte( iDestCodePage, 0, PWideChar(s), iLenSrc, PChar(result), iLenDest, nil, nil )= 0 then begin
//GetLastError();
result:= '';
end;
end;
end;
As per every Windows installation not every codepage is supported, or different codepages are supported, so conversion attempts may fail. It would be more robust to aim for a Unicode program right away, as that is what every Windows installation definitly supports (unless you still deal with Windows 95, Windows 98 or Windows ME).
Combining everything
Now you got everything you need to put it together:
you can have a Unicode text control to directly get it in UTF-16 LE
you can use an ANSI text control to then convert the input to UTF-16 LE
you can convert from UTF-16 LE (WIDE) to UTF-8 (ANSI)
Size
UTF-8 is mostly the best choice, but size wise UTF-16 may need fewer bytes in total when your target audience is Asian: in UTF-8 both 能 and ラ need 3 bytes each, but in UTF-16 both only need 2 bytes each. As per your QR barcode size is an important factor, I guess.
Likewise don't waste by turning binary data (8 bits per byte) into ASCII text (displaying 4 bits per character, but itself needing 1 byte = 8 bits again). Have a look at Base64 which encodes 6 bits into every byte. A concept that you encountered countless times in your life already, because it's used for email attachments.
Delphi RIO. I have built an Excel PlugIn with Delphi (also using AddIn Express). I iterate through a column to read cell values. After I read the cell value, I do a TRIM function. The TRIM is not deleting the last space. Code Snippet...
acctName := Trim(UpperCase(Acctname));
Before the code, AcctName is 'ABC Holdings '. It is the same AFTER the TRIM function. It appears that Excel has added some type of other char there. (new line?? Carriage return??) What is the best way to get rid of this? Is there a way I can ask the debugger to show me the HEX value for this variable. I have tried the INSPECT and EVALUATE windows. They both just show text. Note that I have to be careful of just deleting NonText characters, and some companies names have dashes, commas, apostrophes, etc.
**Additional Info - Based on Andreas suggestion, I added the following...
ShowMessage(IntToHex(Ord(Acctname[Acctname.Length])));
This comes back with '00A0'. So I am thinking I can just do a simple StringReplace... so I add this BEFORE Andreas code...
acctName := StringReplace(acctName, #13, '', [rfReplaceAll]);
acctName := StringReplace(acctName, #10, '', [rfReplaceAll]);
Yet, it appears that nothing has changed. The ShowMessage still shows '00A0' as the last character. Why isn't the StringReplace removing this?
If you want to know the true identity of the last character of your string, you can display its Unicode codepoint:
ShowMessage(IntToHex(Ord(Acctname[Acctname.Length]))).
Or, you can use a utility to investigate the Unicode character on the clipboard, like my own.
Yes, the character in question is U+00A0: NO-BREAK SPACE.
This is like a usual space, but it tells the rendering application not to put a line break at this space. For instance, in Swedish, at least, you want non-breaking spaces in 5 000 kWh.
By default, Trim and TStringHelper.Trim do not remove this kind of whitespace. (They also leave U+2007: FIGURE SPACE and a few other kinds of whitespace.)
The string helper method has an overload which lets you specify the characters to trim. You can use this to include U+00A0:
S.Trim([#$20, #$A0, #$9, #$D, #$A]) // space, nbsp, tab, CR, LF
// (many whitespace characters missing!)
But perhaps an even better solution is to rely on the Unicode characterisation and do
function RealTrimRight(const S: string): string;
var
i: Integer;
begin
i := S.Length;
while (i > 0) and S[i].IsWhiteSpace do
Dec(i);
Result := Copy(S, 1, i);
end;
Of course, you can implement similar RealTrimLeft and RealTrim functions.
And of course there are many ways to see the actual string bytes in the debugger. In addition to writing things like Ord(S[S.Length]) in the Evaluate/Modify window (Ctrl+F7), my personal favourite method is to use the Memory window (Ctrl+Alt+E). When this has focus, you can press Ctrl+G and type S[1] to see the actual bytes:
Here you see the string test string. Since strings are Unicode (UTF-16) since Delphi 2009, each character occupies two bytes. For simple ASCII characters, this means that every second byte is null. The ASCII values for our string are 74 65 73 74 20 73 74 72 69 6E 67. You can also see, on the line above (02A0855C) that our string object has reference count 1 and length B (=11).
As a demo, to show the unicode string:
program q63847533;
{$APPTYPE CONSOLE}
{$R *.res}
uses
System.SysUtils;
type
array100 = array[0..99] of Byte;
parray100 = ^array100;
var
searchResult : TSearchRec;
Name : string;
display : parray100 absolute Name;
dummy : string;
begin
if findfirst('z*.mp3', faAnyFile, searchResult) = 0 then
begin
repeat
writeln('File name = '+searchResult.Name);
name := searchResult.Name;
writeln('File size = '+IntToStr(searchResult.Size));
until FindNext(searchResult) <> 0;
// Must free up resources used by these successful finds
FindClose(searchResult);
end;
readln(dummy);
end.
My directory contains two z*.mp3 files, one with an ANSI name and the other with a Unicode name.
WATCHing display^ as Hex or Memorydump will display what you seem to require (the Is there a way I can ask the debugger to show me the HEX value for this variable. of your question)
Please help me,
I know this may sound like very simple question, but i just can not figured it out how to make it work. I just started learning Unicode, so please give me some hint or example code.
I was converting my old encoding and decoding code from Delphi 5 to Delphi XE2. And when i call "Char" function it result in a different character, seem like it happen at the extended character of any encoding set.
At Delphi 5 :
Char(129) -> will result as empty char
At Delphi XE2 :
Char(129) -> will result #$81
I tried to used AnsiChar at delphi XE2, and the result is :
AnsiChar(129) -> will result as #129
What code should i used at delphi XE2, so it will return an empty char too. Not the #nn notation?
I need it to return the same result of Delphi 5, for the backward compatibility reason.
Is this have something to do with HIGHCHARUNICODE directive? I have read and tried it too, but still not luck.
Here the code that i tried at Delphi XE2, i make a simple one, but it did have a same logic with my encode / decode code. The code will get the char then put it into edit box.
procedure TForm1.Button1Click;
var
chars : Array[0..2] of AnsiChar;
ansi_string : AnsiString;
begin
chars[0] := AnsiChar(65);
chars[1] := AnsiChar(129);
chars[2] := AnsiChar(66);
ansi_string := chars;
// Here the ansi_string have a value of 'A'#$81'B'
EditBox1.Text := ansi_string;
// Here when i look the EditBox1.text in Evaluate/modify form,
// it shows 'A'#$0081'B'
// but at the form, it only show AB
end;
How can i make the ansi_string variable having a value of 'AB' instead of 'A'#$81'B'?
Thanks in Advance,
In Delphi 5, Char is AnsiChar, but in XE2 Char is WideChar instead. 127 is the highest signed value that an AnsiChar can hold, so a value of 129, which is hex $81, binary 10000001, would simply be interpreted as -127, which is also hex $81, binary 10000001. They are just different interpretations of the same bit value.
Depending on what your encoding/decoding code I actually doing, you will need to either use AnsiChar/AnsiString explicitly instead of Char/String generically, or switch to using Byte values, or else re-write the code to support Unicode properly and not make assumptions about the size of Char anymore. Hard to say since you did not show your actual code. But you should be OK with just using AnsiChar/AnsiString, since they do operate the same way they always have (the debugger may simply be displaying AnsiChar values differently, that's all).
I have a problem in Delphi 2010. I would like to send from my PC some Unicode (16 bits) characters to the printer with serial port (COM port).
I use the TCiaComPort component in D2010.
For example:
CiaComPort1.Open := True; \\I open the port
Data := #$0002 + UnicodeString(Ж) + #$0003;
CiaComPort1.SendStr(Parancs); //I send the data to the device
If the printer characterset is ASCII then the characters arrive, but the ciril character is '?' on the Printer screen. But if the printer characterset is Unicode then the characters do not arrive to the printer.
An Unicode character represented in 2 bytes. How can I decompose an Unicode character to byte for byte? For example #$0002?
And how can I send this strings byte for byte with the comport? Which function?
Under Windows (check your OS how to open and write to comm ports), I use the following function to write a UnicodeString to a COMM Port:
Bear in mind that the port have to be setup correctly, baud rate, number of bits, etc. See Device Manager => Comm Ports
function WriteToCommPort(const sPort:String; const sOutput:UnicodeString):Boolean;
var
RetW:DWORD;
buff: PByte;
lenBuff:Integer;
FH:THandle;
begin
Result:=False;
lenBuff:=Length(sOutput)*2;
if lenBuff = 0 then Exit; // Nothing to write, empty string
FH:= Windows.CreateFile(PChar(sPort), GENERIC_READ or GENERIC_WRITE, 0, Nil, OPEN_EXISTING, 0, 0);
if (FH <> Windows.INVALID_HANDLE_VALUE) then
try
Buff:=PByte(#sOutput[1]);
Windows.WriteFile(FH, buff^, lenBuff, RetW, Nil);
Result:= Integer(RetW) = lenBuff;
finally
Windows.CloseHandle(FH);
end;
end;
Does CiaComPort1.SendStr() accept an AnsiString or UnicodeString as input? Did you try using a COM port sniffer to make sure that CiaComPort is transmitting the actual Unicode bytes as you are expecting?
The fact that you are using #$0002 and #$0003 makes me think it is actually not, because those characters are usually transmitted on COM ports as 8-bit values, not as 16-bit values. If that is the case, then that would explain why the Ж character is getting converted to ?, if CiaComPort is performing a Unicode->Ansi data conversion before transmitting. In which case, you may have to do something like this instead:
var
B: TBytes;
I: Integer;
B := WideBytesOf('Ж');
SetLength(Data, Length(B)+2);
Data[1] := #$0002;
for I := Low(B) to High(B) do
Data[2+I] := WideChar(B[I]);
Data[2+Length(B)] #$0003;
CiaComPort1.SendStr(Data);
However, if CiaComPort is actually performing a data conversion internally, then you will still run into conversion issues for any encoded bytes that are above $7F.
In which case, look to see if CiaComPort has any other sending methods available that allow you to send raw bytes instead of strings. If it does not, then you are pretty much SOL and will need to switch to a better COM component, or just use OS APIs to access the COM port directly instead.
In a sorting routine in Delphi 2007 I am using code like this:
(txt[n] in ['0'..'9'])
function ExtractNr(n: Integer; var txt: String): Int64;
begin
while (n <= Length(txt)) and (txt[n] in ['0'..'9']) do n:= n + 1;
Result:= StrToInt64Def(Copy(txt, 1, (n - 1)), 0);
Delete(txt, 1, (n - 1));
end;
where txt is a string. This works fine in D2007 but will give warnings in D2009 and D2010 I have no idea why but is there any way I can make it work without warnings in D2009 and D2010?
Roy M Klever
Are you getting the "WideChar reduced to byte Char in set expressions. Consider using 'CharInSet' function in 'SysUtils' unit" message?
Here's the issue. In D2009, the default string type was changed from AnsiString to UnicodeString. An AnsiString uses a single byte for each character, giving you 256 possible characters. A UnicodeString uses 2 bytes per character, giving up to 64K characters. But a Pascal set can only contain up to 256 elements. So it can't create a "set of WideChar" because there are too many possible elements.
The warning is a warning that you're attempting to compare txt[n], which is a WideChar from a Unicode string, against a set of chars. It can't make a set of WideChars, so it had to reduce them to AnsiChars to fit them into a Pascal set, and your txt[n] might be outside the Ansi boundaries entirely.
You can fix it by using CharInSet, or by making txt an AnsiString, if you're certain you won't need any Unicode characters for it. Or if that won't work well, you can disable the warning, but I'd consider that a last resort.
use CharInSet or better use Character.IsDigit