SendInput for Non Unicode, Non Ascii characters - delphi

I am writing a global hook procedure and using SendInput to send ASCII (< 255) and Unicode characters. So far it is working well.
But in my language, Tamil, there are some combination characters that do not have Unicode code points of their own, so fonts define glyphs for them above 255.
How do I send virtual key codes for them?
Although those glyphs do not have code points, they have glyph names like u0BAF_u0BBF, where the first and second parts are each defined in Unicode. These names are referred to as named sequences and are approved by Unicode.
Code:
procedure GenUKey(const vk: integer; const bUnicode: bool);
var
  kb: TKEYBDINPUT;
  Input: TINPUT;
begin
  { Zero both records so that time and dwExtraInfo are not left uninitialised }
  ZeroMemory(@kb, sizeof(kb));
  ZeroMemory(@Input, sizeof(Input));
  {keydown}
  if bUnicode then
  begin
    kb.wVk := 0;
    kb.wScan := vk;
    kb.dwFlags := $4; { KEYEVENTF_UNICODE=4 }
  end
  else
  begin
    kb.wVk := vk;
    kb.wScan := 0;
    kb.dwFlags := 0;
  end;
  Input.itype := INPUT_KEYBOARD;
  Input.ki := kb;
  SendInput(1, Input, sizeof(Input));
  {keyup}
  if bUnicode then
  begin
    kb.wVk := 0;
    kb.wScan := vk;
    kb.dwFlags := $4 or KEYEVENTF_KEYUP; { KEYEVENTF_UNICODE=4 }
  end
  else
  begin
    kb.wVk := vk;
    kb.wScan := 0;
    kb.dwFlags := KEYEVENTF_KEYUP;
  end;
  Input.itype := INPUT_KEYBOARD;
  Input.ki := kb;
  SendInput(1, Input, sizeof(Input));
end;
Edit:
Yes, I do send each code point individually. There are glyphs attached to each code point. Also there is another one combining the design of both. It looks better visually and behaves as one character. As it is defined separately I thought there should be a way to send by using sendinput.
Edit 2:
Also, there are some characters that cannot be combined mechanically.
Edit 3:
Tests prove that Rob Kennedy is correct. The second code point is not displayed mechanically. It is intelligently combined with the first one to get a third new combined glyph. The glyph (name) encoding is used. My code to do the same programmatically has interfered in this process and hence the problem. Many Thanks
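Following that conclusion, a minimal sketch of sending the two code points of the sequence in a single SendInput call and leaving the glyph combination to the receiving application's text shaping. The names BuildUnicodeInputs, SendUnicodeString, TInputArray and KEYEVENTF_UNICODE_FLAG are hypothetical helpers, not part of the original code, and the Winapi.Windows unit name assumes Delphi XE2 or later:

```pascal
program SendSequenceSketch;
{$APPTYPE CONSOLE}
uses
  Winapi.Windows;

const
  // Same value as the $4 literal in the question; declared locally
  // (hypothetical name) in case the Windows unit in use lacks the constant.
  KEYEVENTF_UNICODE_FLAG = $0004;

type
  TInputArray = array of TInput;

// Builds one key-down and one key-up event per UTF-16 code unit in S.
function BuildUnicodeInputs(const S: string): TInputArray;
var
  i, k: Integer;
begin
  SetLength(Result, Length(S) * 2);
  for i := 1 to Length(S) do
  begin
    k := (i - 1) * 2;
    // key down
    ZeroMemory(@Result[k], SizeOf(TInput));
    Result[k].Itype := INPUT_KEYBOARD;
    Result[k].ki.wVk := 0;
    Result[k].ki.wScan := Ord(S[i]);
    Result[k].ki.dwFlags := KEYEVENTF_UNICODE_FLAG;
    // key up: same record with the KEYUP flag added
    Result[k + 1] := Result[k];
    Result[k + 1].ki.dwFlags := KEYEVENTF_UNICODE_FLAG or KEYEVENTF_KEYUP;
  end;
end;

// Sends the whole sequence in one call so nothing is interleaved with it.
procedure SendUnicodeString(const S: string);
var
  Inputs: TInputArray;
begin
  Inputs := BuildUnicodeInputs(S);
  if Length(Inputs) > 0 then
    SendInput(Length(Inputs), Inputs[0], SizeOf(TInput));
end;

begin
  // TAMIL LETTER YA (U+0BAF) followed by TAMIL VOWEL SIGN I (U+0BBF)
  SendUnicodeString(#$0BAF#$0BBF);
end.
```

Whether the combined glyph appears depends on the font and on the target application's shaping engine; the sketch only delivers the code points in order.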

Related

FMX: Why does TTextLayout.RegionForRange fail in this particular case?

I have created a TTextLayout object with text containing consecutive 't' characters and with the 'Calibri' font. I then have the following code to return the rectangular region of each character using the RegionForRange function. The result is that the width of the 1st 't' is 0 and the position of the 2nd 't' is the same as the first. Any other characters in the text are correct - even ones after the error, although the letter 'f' also has the same problem and any consecutive combination of 't' and 'f'. Most other fonts don't seem to cause the problem, although 'Gabriola' does.
procedure TForm1.FormCreate(Sender: TObject);
var
  Layout : TTextLayout;
  LRange : TTextRange;
  LRegion : TRegion;
  LRects : array of TRectF;
  i : Integer;
begin
  Layout := TTextLayoutManager.DefaultTextLayout.Create;
  Layout.Font.Size := 20;
  // Calibri and Gabriola fail but Arial and most other fonts don't
  Layout.Font.Family := 'Calibri';// 'Gabriola';
  Layout.Text := 'tt'; // ff, ft, tf also fail
  LRange.Length := 1;
  SetLength(LRects, Length(Layout.Text));
  for i := 0 to Length(Layout.Text) - 1 do begin
    LRange.Pos := i;
    LRegion := Layout.RegionForRange(LRange);
    LRects[i] := LRegion[0]; // Bounding rect of this character
  end;
end;
Put a break point at the end of the function to see the values of left and right stored in LRects.
Stepping into the RegionForRange function leads to TTextLayoutD2D.DoRegionForRange but from there I can't go any further to see what could be going wrong. Why could this be happening for these particular characters and only for these fonts? Why should the character following the one in the range affect the result? Is it a bug? I could perhaps write some code to detect these sequences and correct the position, but I don't feel that I should need to do that.
Note that I'm using Delphi 10.4. I have not tried more recent updates, so I would appreciate if someone could confirm that this issue occurs and in which version.
The cause is ligatures.
When a ligature is applied, a sequence of two or more glyphs is replaced by another, single glyph, so the individual characters no longer have separate glyphs to measure.
Ligatures are defined by OpenType fonts, and each font defines its own set. Calibri defines many (you have found only a few of them).

Delphi - SysUtils.Trim not deleting last space(?) char

Delphi RIO. I have built an Excel PlugIn with Delphi (also using AddIn Express). I iterate through a column to read cell values. After I read the cell value, I do a TRIM function. The TRIM is not deleting the last space. Code Snippet...
acctName := Trim(UpperCase(Acctname));
Before the code, AcctName is 'ABC Holdings '. It is the same AFTER the TRIM function. It appears that Excel has added some type of other char there. (new line?? Carriage return??) What is the best way to get rid of this? Is there a way I can ask the debugger to show me the HEX value for this variable. I have tried the INSPECT and EVALUATE windows. They both just show text. Note that I have to be careful of just deleting NonText characters, and some companies names have dashes, commas, apostrophes, etc.
Additional info: based on Andreas's suggestion, I added the following...
ShowMessage(IntToHex(Ord(Acctname[Acctname.Length])));
This comes back with '00A0'. So I am thinking I can just do a simple StringReplace... so I add this BEFORE Andreas code...
acctName := StringReplace(acctName, #13, '', [rfReplaceAll]);
acctName := StringReplace(acctName, #10, '', [rfReplaceAll]);
Yet, it appears that nothing has changed. The ShowMessage still shows '00A0' as the last character. Why isn't the StringReplace removing this?
If you want to know the true identity of the last character of your string, you can display its Unicode codepoint:
ShowMessage(IntToHex(Ord(Acctname[Acctname.Length]))).
Or, you can use a utility to investigate the Unicode character on the clipboard, like my own.
Yes, the character in question is U+00A0: NO-BREAK SPACE.
This is like a usual space, but it tells the rendering application not to put a line break at this space. For instance, in Swedish, at least, you want non-breaking spaces in 5 000 kWh.
By default, Trim and TStringHelper.Trim do not remove this kind of whitespace. (They also leave U+2007: FIGURE SPACE and a few other kinds of whitespace.)
The string helper method has an overload which lets you specify the characters to trim. You can use this to include U+00A0:
S.Trim([#$20, #$A0, #$9, #$D, #$A]) // space, nbsp, tab, CR, LF
// (many whitespace characters missing!)
But perhaps an even better solution is to rely on the Unicode characterisation and do
function RealTrimRight(const S: string): string;
var
  i: Integer;
begin
  i := S.Length;
  while (i > 0) and S[i].IsWhiteSpace do
    Dec(i);
  Result := Copy(S, 1, i);
end;
Of course, you can implement similar RealTrimLeft and RealTrim functions.
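As a sketch of what such a two-sided version might look like (RealTrim is a hypothetical name, and 1-based string indexing is assumed, as in the function above). Note that the StringReplace calls in the question target #13 and #10, which are CR and LF, so they cannot affect a trailing #$00A0:

```pascal
program RealTrimDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Character;

// Trims Unicode whitespace (including U+00A0) from both ends of S.
function RealTrim(const S: string): string;
var
  First, Last: Integer;
begin
  First := 1;
  Last := Length(S);
  // Skip whitespace at the start
  while (First <= Last) and S[First].IsWhiteSpace do
    Inc(First);
  // Skip whitespace at the end
  while (Last >= First) and S[Last].IsWhiteSpace do
    Dec(Last);
  Result := Copy(S, First, Last - First + 1);
end;

begin
  // A no-break space before and after the name, plus an ordinary space
  Writeln('[' + RealTrim(#$00A0' ABC Holdings'#$00A0) + ']');
end.
```

Interior characters such as dashes, commas and apostrophes are left alone, since only the two ends of the string are examined.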
And of course there are many ways to see the actual string bytes in the debugger. In addition to writing things like Ord(S[S.Length]) in the Evaluate/Modify window (Ctrl+F7), my personal favourite method is to use the Memory window (Ctrl+Alt+E). When this has focus, you can press Ctrl+G and type S[1] to see the actual bytes:
Here you see the string test string. Since strings are Unicode (UTF-16) since Delphi 2009, each character occupies two bytes. For simple ASCII characters, this means that every second byte is null. The ASCII values for our string are 74 65 73 74 20 73 74 72 69 6E 67. You can also see, on the line above (02A0855C) that our string object has reference count 1 and length B (=11).
As a demo, to show the unicode string:
program q63847533;
{$APPTYPE CONSOLE}
{$R *.res}
uses
  System.SysUtils;
type
  array100 = array[0..99] of Byte;
  parray100 = ^array100;
var
  searchResult : TSearchRec;
  Name : string;
  display : parray100 absolute Name;
  dummy : string;
begin
  if findfirst('z*.mp3', faAnyFile, searchResult) = 0 then
  begin
    repeat
      writeln('File name = '+searchResult.Name);
      name := searchResult.Name;
      writeln('File size = '+IntToStr(searchResult.Size));
    until FindNext(searchResult) <> 0;
    // Must free up resources used by these successful finds
    FindClose(searchResult);
  end;
  readln(dummy);
end.
My directory contains two z*.mp3 files, one with an ANSI name and the other with a Unicode name.
Watching display^ as Hex or as a memory dump will display what you seem to require (the "Is there a way I can ask the debugger to show me the HEX value for this variable" part of your question).

How to specify multiple ranges for an if statement in Delphi? [duplicate]

for counter := 1 to lengthofpassword do
begin
  currentletter:=password[counter];
  currentascii:=Ord(currentletter);
  if (96<currentascii<123) OR (64<currentascii<91) OR (47<currentascii<58) then
    Writeln('valid')
  else
    asciicheck:=false;
end;
I know this code is wrong but I did it to explain what I want to ask. How can you specify ranges for an if statement? Before, I messed around with lots of if statements and my code wasn't working the way I wanted it to. Basically, I am making a procedure which checks the user input for anything other than uppercase and lowercase alphabet and numbers. This question is different because I was looking for how this problem could be solved using a Case Of statement.
for counter := 1 to lengthofpassword do
begin
  currentletter:=password[counter];
  currentascii:=Ord(currentletter);
  if (currentascii<48) AND (currentascii>57) then
    asciipoints:=asciipoints+1;
  if (currentascii<65) AND (currentascii>90) then
    asciipoints:=asciipoints+1;
  if (currentascii<97) AND (currentascii>122) then
    asciipoints:=asciipoints+1;
  Writeln(asciipoints);
end;
I also tried to do it like this but then realised this wouldn't work because if one statement was satisfied, the others wouldn't be and the points based system wouldn't work either.
Glad you found the answer yourself.
Another way to make sure the password only contains upper and lower case characters and numbers is what I tried to point to: define a set of characters that are valid and check if each character in your password is in these valid characters.
So with a set defined like this:
const
ValidChars = ['A'..'Z', 'a'..'z', '0'..'9'];
you can use statements like
if password[I] in ValidChars then
This statement will, however, generate a compiler warning in Unicode Delphi, because the base type of a set is limited to 256 possible values, whose ordinal values must fall between 0 and 255. That is not the case for WideChar, which has 65,536 values, so the set of char defined here is in fact a set of AnsiChar. For this task that is acceptable, as every character that needs to be checked is ASCII. Using the function CharInSet avoids the compiler warning and has defined behavior (it returns False) if the password contains non-ASCII Unicode characters.
This is the resulting code:
const
  ValidChars = ['A'..'Z', 'a'..'z', '0'..'9'];
var
  I: Integer;
begin
  for I := 1 to passwordlength do
  begin
    if CharInSet(password[I], ValidChars) then
      Writeln('valid') // more likely to do nothing and invert the if statement
    else
    begin
      asciicheck := False;
      Break; // No need to look further, the check failed
    end;
  end;
end;
Multiple ranges are best expressed in a case statement:
begin
  for counter := 1 to lengthofpassword do
  begin
    case Ord(password[counter]) of
      48..57,
      65..90,
      97..122 :
        Writeln('valid')
    else
      asciicheck:=false;
    end;
  end;
end;
Now, this works for characters < #128. If you are working in a Unicode application and don't want to restrict the characters to the English alphabet, it is possible to use TCharHelper.IsLetterOrDigit.
if password[counter].IsLetterOrDigit then ...
Thanks to a comment up above, I have found a solution. I ended up using a Case Of statement like this:
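asciicheck:=true; // assume valid until an invalid character is found
for counter := 1 to lengthofpassword do
begin
  currentletter:=password[counter];
  currentascii:=Ord(currentletter);
  case currentascii of
    97..122 : ; // lowercase letter, still valid
    65..90 : ; // uppercase letter, still valid
    48..57 : ; // digit, still valid
  else
    begin
      asciicheck:=false;
      Break; // a later valid character must not reset the flag to true
    end;
  end;
end;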
Thanks once again.

How to convert AnsiChar to UnicodeChar with specific CodePage?

I'm generating texture atlases for rendering Unicode texts in my app. Source texts are stored in ANSI codepages (1250, 1251, 1254, 1257, etc). I want to be able to generate all the symbols from each ANSI codepage.
Here is the outline of the code I would expect to have:
for I := 0 to 255 do
begin
  anChar := AnsiChar(I); //obtain AnsiChar
  //Apply codepage without converting the chars
  //<<--- this part does not work, showing:
  //"E2033 Types of actual and formal var parameters must be identical"
  SetCodePage(anChar, aCodepages[K], False);
  //Assign AnsiChar to UnicodeChar (automatic conversion)
  uniChar := anChar;
  //Here we get Unicode character index
  uniCode := Ord(uniChar);
end;
The code above does not work (E2033) and I'm not sure it is a proper solution at all. Perhaps there is a much shorter version.
What is the proper way of converting AnsiChar into Unicode with specific codepage in mind?
I would do it like this:
function AnsiCharToWideChar(ac: AnsiChar; CodePage: UINT): WideChar;
begin
  if MultiByteToWideChar(CodePage, 0, @ac, 1, @Result, 1) <> 1 then
    RaiseLastOSError;
end;
I think you should avoid using strings for what is in essence a character operation. If you know up front which code pages you need to support then you can hard code the conversions into a lookup table expressed as an array constant.
Note that all the characters that are defined in the ANSI code pages map to Unicode characters from the Basic Multilingual Plane and so are represented by a single UTF-16 character. Hence the size assumptions of the code above.
However, the assumption that you are making, and that this answer carries over, is that a single byte represents a character in an ANSI character set. That's a valid assumption for many character sets, for example the single byte western character sets like 1252. But there are character sets like 932 (Japanese), 949 (Korean) etc. that are double byte character sets. Your entire approach breaks down for those code pages. My guess is that you only wish to support single byte character sets.
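One way to guard against that case, sketched here under the assumption of a Windows target, is to ask the system for the code page's maximum character size before building a 256-entry table. IsSingleByteCodePage is a hypothetical helper name; GetCPInfo and TCPInfo come from the Windows unit:

```pascal
program CodePageCheck;
{$APPTYPE CONSOLE}
uses
  Winapi.Windows, System.SysUtils;

// Returns True when every character in the code page fits in one byte.
function IsSingleByteCodePage(CodePage: UINT): Boolean;
var
  Info: TCPInfo;
begin
  // GetCPInfo fails for code pages that are invalid or not installed
  if not GetCPInfo(CodePage, Info) then
    RaiseLastOSError;
  Result := Info.MaxCharSize = 1;
end;

begin
  Writeln('1250 single-byte: ', IsSingleByteCodePage(1250));
  Writeln('932 single-byte: ', IsSingleByteCodePage(932));
end.
```

A caller could skip, or handle separately, any code page for which this returns False instead of converting byte by byte.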
If you are writing cross-platform code then you can replace MultiByteToWideChar with UnicodeFromLocaleChars.
You can also do it in one step for all characters. Here is an example for codepage 1250:
var
  encoding: TEncoding;
  bytes: TBytes;
  unicode: TArray<Word>;
  I: Integer;
  S: string;
begin
  SetLength(bytes, 256);
  for I := 0 to 255 do
    bytes[I] := I;
  SetLength(unicode, 256);
  encoding := TEncoding.GetEncoding(1250); // change codepage as needed
  try
    S := encoding.GetString(bytes);
    for I := 0 to 255 do
      unicode[I] := Word(S[I+1]); // as long as strings are 1-based
  finally
    encoding.Free;
  end;
end;
Here is the code I have found to be working well:
var
  I: Byte;
  anChar: AnsiString;
  Tmp: RawByteString;
  uniChar: Char;
  uniCode: Word;
begin
  for I := 0 to 255 do
  begin
    anChar := AnsiChar(I);
    Tmp := anChar;
    SetCodePage(Tmp, aCodepages[K], False);
    uniChar := UnicodeString(Tmp)[1];
    uniCode := Word(uniChar);
    <...snip...>
  end;

RichEdit 2.0's usage of single CR character as linebreak throws off SelStart calculations (Delphi XE2)

When transitioning from Delphi 2006 to Delphi XE2, one of the things that we learned is that RichEdit 2.0 internally replaces CRLF pairs with a single CR character. This has the unfortunate effect of throwing off all character index calculations based on the actual text string on the VCL's side.
The behavior I can see by tracing through the VCL code is as follows:
Sending a WM_GETTEXT message (done in TControl.GetTextBuf) will return a text buffer that contains CRLF pairs.
Sending a WM_GETTEXTLENGTH message (done in TControl.GetTextLen) will return a value as if the text still contains CRLF characters.
In contrast, sending an EM_EXSETSEL message (i.e. setting SelStart) will treat the input value as if the text contains only CR characters.
This causes all sorts of things to fail (such as syntax highlighting) in our application. As you can tell, everything is off by exactly one character for every new line up to that point.
Obviously, since this is inconsistent behavior, we must be missing something or doing something very wrong.
Does anybody else have any experience with the transition from a RichEdit 1.0 to a RichEdit 2.0 control, and how did you solve this issue? Finally, is there any way to force RichEdit 2.0 to use CRLF pairs just like RichEdit 1.0?
We also ran into this very issue.
We do a "mail merge" type of thing where we have templates with merge codes that are parsed and replaced by data from outside sources.
This index mismatch between pos(mystring, RichEdit.Text) and the positioning index into the RichEdit text using RichEdit.SelStart broke our merge.
I don't have a good answer but I came up with a workaround. It's a bit cumbersome (understatement!) but until a better solution comes along...
The workaround is to use a hidden TMemo, copy the RichEdit text to it and change the CR/LF pairs to CR only. Then use the TMemo to find the proper position using pos(string, TMemo.Text) and use that as the SelStart position in the TRichEdit.
This really sucks, but hopefully this workaround will help others in our situation or maybe spark somebody smarter than me into coming up with a better solution.
I'll show a little sample code...
Since we are replacing text using seltext we need to replace text in BOTH the RichEdit control and the TMemo control to keep the two synchronized.
StartToken and EndToken are the merge code delimiters and are a constant.
function TEditForm.ParseTest: boolean;
var TagLength: integer;
var ValueLength: integer;
var ParseStart: integer;
var ParseEnd: integer;
var ParseValue: string;
var TempText: string;
var Memo: TMemo;
begin
  Result := True;//Default
  Memo := TMemo.Create(nil);
  try
    Memo.Parent := self;
    Memo.Visible := False;
    try
      Memo.Lines.Clear;
      Memo.Lines.AddStrings(RichEditor.Lines);
      Memo.Text := stringreplace(Memo.Text,#13#10,#13,[rfReplaceAll]);//strip CR/LF pairs and replace with CR
      while (Pos(StartToken, Memo.Text) > 0) and (Pos(EndToken, Memo.Text) > 0) do begin
        //search the whole text; SelText is empty until something is selected
        ParseStart := Pos(StartToken, Memo.Text);
        ParseEnd := Pos(EndToken, Memo.Text) + Length(EndToken);
        if ParseStart >= ParseEnd then begin//oops, something's wrong - bail out
          Result := true;
          RichEditor.SelStart := 0;
          exit;
        end;
        TagLength := ParseEnd - ParseStart;
        ValueLength := (TagLength - Length(StartToken)) - Length(EndToken);
        ParseValue := Copy(Memo.Text, (ParseStart + Length(StartToken)), ValueLength);
        Memo.selstart := ParseStart - 1; //since the .text is zero based, but pos is 1 based we subtract 1
        Memo.sellength := TagLength;
        RichEditor.selstart := ParseStart - 1; //since the .text is zero based, but pos is 1 based we subtract 1
        RichEditor.sellength := TagLength;
        TempText := GetValue(ParseValue);
        Memo.SelText := TempText;
        RichEditor.SelText := TempText;
      end;
    except
      on e: exception do
      begin
        MessageDlg(e.message,mtInformation,[mbOK],0);
        result := false;
      end;
    end;//try..except
  finally
    FreeAndNil(Memo);
  end;
end;
How about subtracting the result of EM_LINEFROMCHAR from the caret position (or from the position returned by EM_GETSEL), whichever you need?
You could even take two EM_LINEFROMCHAR values, one for the selection start and the other for the desired caret/selection position, if you only want to know how many CR/LF pairs are in the selection.
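The same adjustment can be sketched on the string side without a hidden control: count the CR/LF pairs that lie before an offset in the VCL text and subtract that count to get the offset RichEdit 2.0 expects. CrLfIndexToCrIndex is a hypothetical helper name, and the sketch assumes 0-based offsets into a 1-based Delphi string:

```pascal
program RichEditIndexSketch;
{$APPTYPE CONSOLE}

// Converts a 0-based offset into text with CR/LF line breaks into the
// matching offset in the same text with CR-only line breaks.
function CrLfIndexToCrIndex(const Text: string; Index: Integer): Integer;
var
  i, Pairs: Integer;
begin
  Pairs := 0;
  // Characters 1..Index are the ones before the 0-based offset Index;
  // a pair counts only when both its CR and its LF lie in that range.
  for i := 1 to Index - 1 do
    if (i < Length(Text)) and (Text[i] = #13) and (Text[i + 1] = #10) then
      Inc(Pairs);
  Result := Index - Pairs;
end;

begin
  // 'c' sits at offset 4 in the CR/LF text and at offset 3 once each
  // CR/LF pair is stored as a single CR.
  Writeln(CrLfIndexToCrIndex('ab'#13#10'cd', 4));
end.
```

With such a helper, a position found with Pos in the control's Text (minus 1) could be converted before it is assigned to SelStart; this walks the text from the start on every call, so caching line offsets may be preferable for large documents.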

Resources