Using TStringList in Windows with Chinese display language? - delphi

How do I make my program compatible with a non-English Windows locale (e.g. Chinese)?
My program is compiled with Delphi XE2 and loads a signature file that contains some hashes.
DB.txt contains Cardinal type data:
5654564534
8674534664
I use TStringList with TEncoding.Default to load the file and store the hashes in an array of Cardinal.
SL.LoadFromFile(Path, TEncoding.Default);
SetLength(myCardinalArray, SL.Count - 1);
for i := 1 to SL.Count - 1 do
begin
  myCardinalArray[i - 1] := StrToInt64(SL[i]);
end;
Up to this point the program works properly, but when it is run on Windows with a Chinese display language, the array contains invalid hashes.
I've tried TEncoding.Unicode and other encodings, but then my program does not work on Windows with an English display language! Should I check for a BOM before using TEncoding.Unicode?
Which encoding should I use so that my program runs on Windows with both English and Chinese display languages?
Thanks

Related

Delphi - check if a Unicode character occurs in a set of characters?

This code works well in Delphi 7 (that is, before Delphi had Unicode support):
Value := edit1.Text[1];
if Value in ['м', 'ж'] then ...
('м' and 'ж' are Cyrillic characters.)
But this construct doesn't work with Unicode characters.
I tried a lot of things, but none of them work.
I also tried changing the value type to "Char" and "AnsiChar".
Doesn't work:
const
MySet : set of WideChar = [WideChar('м'), WideChar('ж')];
begin
Value := edit1.Text[1];
if Value in MySet then ...
Doesn't work:
if AnsiChar(Value) in ['м', 'ж'] then ...
Doesn't work:
if CharInSet(Value, ['м', 'ж']) then ...
But this works:
if (Value = 'м') or (Value = 'ж') then ...
Is there a way to test a Unicode character against a set in modern versions of Delphi?
Or do we have to compare each character individually?
My Delphi version is 10.4 Update 2 Community Edition.
A Delphi set type can only handle a maximum of 256 values, so it cannot be used for handling Unicode characters. For handling Unicode, the System.Character unit provides various methods and helpers.
For this particular case, there is an IsInArray() character helper you can use. Instead of declaring a set of characters, you will need to declare an array of characters:
var
  ch: Char;
  a: array of Char;
  s: string;
begin
  a := ['м', 'ж'];
  s := 'abcж';
  for ch in s do
    if ch.IsInArray(a) then ...
end;
Note: Delphi XE7 introduced additional language support for initializing and working with dynamic arrays, and square brackets can also be used for simpler array initialization. In the context of the example above, ['м', 'ж'] is not a set, but an array of wide characters.
check if a Unicode character occurs in a set of characters?
Do you mean a Delphi set?
In general, it is impossible to have a set of X where the base type X has more than 256 possible distinct values. So set of Byte is fine, but set of Word isn't possible. Since there are 256 * 256 distinct wide character values, it is therefore impossible to have a set of wide characters. (If this were indeed possible, a variable of such a set type would be 8 kB in size. That would be an unusually large variable.)
Since there is no such thing as "Delphi set of Unicode characters", the question "How to see if a character belongs to a Delphi set of Unicode characters" doesn't make sense.
Or do you simply mean a mathematical set?
If so, of course this is possible, but you cannot use a Delphi set to represent the mathematical set of characters. Instead, you need to use some other data type. One possibility is a simple array, if you don't mind its O(n) characteristics.
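For a small, fixed group of characters, the array approach can be wrapped in a helper. The following is only a sketch: CharInList is a hypothetical name (not an RTL routine), the scan is the O(n) lookup mentioned above, and the source file is assumed to be saved in a Unicode encoding so that the Cyrillic literals survive compilation.

```delphi
program CharListDemo;

{$APPTYPE CONSOLE}

// CharInList is a hypothetical helper, not part of the RTL.
// It does a linear O(n) scan of an open array of Char.
function CharInList(C: Char; const List: array of Char): Boolean;
var
  I: Integer;
begin
  Result := False;
  for I := Low(List) to High(List) do
    if List[I] = C then
    begin
      Result := True;
      Exit;
    end;
end;

var
  Value: Char;
begin
  Value := 'ж';
  // The bracketed list here is an open array constructor, not a set.
  if CharInList(Value, ['м', 'ж']) then
    Writeln('match');
end.
```

For larger character groups, a sorted array with a binary search or a dictionary would avoid the linear scan.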

Delphi - Removing specific hex value from string

Delphi Tokyo: I have a text file (specifically, a CSV file) that I am reading line by line using TextFile operations. The first three bytes of the file contain some type of header data that I am not interested in. While I think this will be the case in all files, I want to verify that before I delete it. In short, I want to read the line, compare the first three bytes to three hex values, and if they match, delete the 3 bytes.
When I look at the file in a hex editor, I see
EF BB BF ...
For whatever reason, my comparison is NOT working.
Here is a code fragment.
var
  LeadingBadBytes: String;
begin
  // Open file, and read first line into variable TriggerHeader
  ...
  LeadingBadBytes := '$EFBBBF';
  if AnsiPos(LeadingBadBytes, TriggerHeader) = 1 then
    Delete(TriggerHeader, 1, 3);
The Delete call by itself works fine, but I cannot get the AnsiPos comparison to work. What should I be doing differently?
The bytes EF BB BF are a UTF-8 BOM, which identifies the file as Unicode text encoded in UTF-8. They only appear at the beginning of the file, not on every line.
Your comparison does not work because you are comparing the read string to the literal string '$EFBBBF', not to the byte sequence EF BB BF.
Change this:
LeadingBadBytes := '$EFBBBF';
...
Delete(TriggerHeader, 1, 3);
To this:
LeadingBadBytes := #$FEFF; // EF BB BF is the UTF-8 encoded form of Unicode codepoint U+FEFF...
...
Delete(TriggerHeader, 1, 1); // or Delete(..., Length(LeadingBadBytes))
Also, consider using StrUtils.StartsText(...) instead of AnsiPos(...) = 1.
That being said, modern versions of Delphi should handle the BOM for you, so you shouldn't receive it in the read data at all. However, you said you are using a TextFile, which is not BOM-aware, AFAIK. You should not be using outdated Pascal-style file I/O to begin with. Try the more modern Delphi RTL I/O classes instead, like TStringList or TStreamReader, which are BOM-aware.
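As a rough sketch of the TStreamReader route, assuming the CSV is UTF-8 (as the EF BB BF header suggests): ReadCsvLines is a hypothetical helper name, and the caller supplies the file name and the target list. With DetectBOM set to True the reader consumes the BOM itself, so the first line should arrive without the leading bytes.

```delphi
uses
  System.SysUtils, System.Classes;

// Hypothetical helper: reads every line of a text file into Lines.
// TEncoding.UTF8 is the fallback encoding; DetectBOM = True lets the
// reader recognise and skip a BOM at the start of the file.
procedure ReadCsvLines(const FileName: string; Lines: TStrings);
var
  Reader: TStreamReader;
begin
  Reader := TStreamReader.Create(FileName, TEncoding.UTF8, True);
  try
    while not Reader.EndOfStream do
      Lines.Add(Reader.ReadLine);
  finally
    Reader.Free;
  end;
end;
```

If the whole file fits comfortably in memory, TStringList.LoadFromFile(FileName, TEncoding.UTF8) is a shorter alternative.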

Searching for words in MS Word Document from Delphi with RegEx and import to Delphi app

I am working with our lab report system and want to automate some of the tasks. The system we use is not intuitive and uses word documents to enter data. There are several paragraphs with headings (protected headings).
I want to copy a phrase in one of the paragraphs and paste it into another paragraph using a Delphi app
GetActiveOleObject('Word.Application');
How can I use a regex for that? The good thing is that the phrases I want to copy are in uppercase, while everything else is in sentence case. Example:
3rd paragraph heading:---> Receiver Notes <---- this is not editable in the document (protected)
the specimen is received in CONTAINER OF FORMALIN at this workstation
the specimen is received FRESH WITH NO FIXATIVE at this workstation
my result has to be something like:
4th paragraph heading --->Methods of Receiving <------ protected again
CONTAINER OF FORMALIN <----- here is where I want to paste from the first match
FRESH WITH NO FIXATIVE <----- and here the second match … etc
So my idea is to have Delphi code that searches between the paragraph headings "Receiver Notes" and "Methods of Receiving" for the uppercase phrases and lists them in the next paragraph.
I use Delphi XE3 and I know how to use regex with other files, but not in Word from Delphi. Any input, code snippets, examples, etc. would be much appreciated!
OK, I finally got this to work, and I am posting the code in case someone needs it. I had to copy the document into a Delphi memo, work on it there with a regex, and then paste the result back where I want it. Although the process may seem cumbersome, it executes very fast. The Word documents I work with are usually only one or two pages anyway.
procedure TForm1.Button1Click(Sender: TObject);
// Needs ComObj (GetActiveOleObject) and RegularExpressions (TRegEx) in the
// uses clause; wdPasteRTF must be declared or imported from the Word constants.
var
  DXRANGE, DXWORD: OleVariant;
  n: Integer;
  regexpr: TRegEx;
  Match: TMatch;
begin
  try
    // Attach to the running Word instance and copy the whole first document.
    DXWORD := GetActiveOleObject('Word.Application');
    DXRANGE := DXWORD.Documents.Item(1)
      .Range(DXWORD.Documents.Item(1).Range.Start, DXWORD.Documents.Item(1)
      .Range.End);
    DXRANGE.Copy;
    Memo1.Clear;
    Memo1.PasteFromClipBoard;
    // Match runs of uppercase words (the first word has at least three letters).
    regexpr := TRegEx.Create('\b[A-Z][A-Z][A-Z]+(?:\s+[A-Z]+)*\b');
    Match := regexpr.Match(Memo1.Text);
    n := 1;
    Memo2.Clear;
    while Match.Success do
    begin
      Memo2.Lines.Add(IntToStr(n) + Match.Value);
      Memo2.Lines.Add('');
      Match := Match.NextMatch;
      n := n + 1;
    end;
    Memo2.SelectAll;
    Memo2.CopyToClipboard;
    // Paste the numbered matches at the current selection in Word.
    DXWORD.Selection.PasteSpecial(wdPasteRTF);
  except
    on E: Exception do
    begin
      ShowMessage(E.Message);
    end;
  end;
end;
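To see what the pattern picks out without involving Word or the clipboard, it can be tried on the two sample sentences from the question. This is only a console sketch; the program name and the SampleText constant are made up for the illustration.

```delphi
program UpperPhraseDemo;

{$APPTYPE CONSOLE}

uses
  System.RegularExpressions;

const
  // Sample sentences from the question; the pattern is the one used above.
  SampleText =
    'the specimen is received in CONTAINER OF FORMALIN at this workstation'#13#10 +
    'the specimen is received FRESH WITH NO FIXATIVE at this workstation';
var
  Match: TMatch;
begin
  Match := TRegEx.Match(SampleText, '\b[A-Z][A-Z][A-Z]+(?:\s+[A-Z]+)*\b');
  while Match.Success do
  begin
    Writeln(Match.Value); // CONTAINER OF FORMALIN, then FRESH WITH NO FIXATIVE
    Match := Match.NextMatch;
  end;
end.
```

Because the first word must have at least three uppercase letters, a sentence-initial capital such as "The" or a lone "A" is not reported as a match.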
As a general rule, when working with Word (or any Office app) through ActiveX from Delphi, use the macro recorder to see how the application itself would do it.
For example:
Open your word document
Select [Record Macro] from the tools menu
Do your search
Copy it to the clipboard
Replace your code
Do Whatever else you need to do.
Stop Macro
Now open the VBA macro organiser and look at the code VBA has generated for what you did. This will give you a very good idea of the functions your Delphi code needs to call.

Error because of quote char after converting file to string with Delphi XE?

I get an incorrect result when converting a file to a string in Delphi XE. There are several ' characters that make the result incorrect. I've used UnicodeFileToWideString and FileToString from http://www.delphidabbler.com/codesnip as well as my own code:
function LoadFile(const FileName: TFileName): ansistring;
begin
  with TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite) do
  begin
    try
      SetLength(Result, Size);
      Read(Pointer(Result)^, Size);
      // ReadBuffer(Result[1], Size);
    except
      Result := '';
      Free;
    end;
    Free;
  end;
end;
The results from Delphi XE and Delphi 6 differ. The result from D6 is correct; I've compared it with the output of a hex editor.
Your output is being produced in the style of the Delphi debugger, which displays string variables using Delphi's own string-literal format. Whatever function you're using to produce that output from your own program has actually been fixed for Delphi XE. It's really your Delphi 6 output that's incorrect.
Delphi string literals consist of a series of printable characters between apostrophes and a series of non-printable characters designated by number signs and the numeric values of each character. To represent an apostrophe, write two of them next to each other. The printable and non-printable series of characters can be written right next to each other; there's no need to concatenate them with the + operator.
Here's an excerpt from the output you say is correct:
#$12'O)=ù'dlû'#6't
There are four lone apostrophes in that string, so each one either opens or closes a series of printable characters. We don't necessarily know which is which when we start reading the string at the left because the #, $, 1, and 2 characters are all printable on their own. But if they represent printable characters, then the O, ), =, and ù characters are in the non-printable region, and that can't be. Therefore, the first apostrophe above opens a printable series, and the #$12 part represents the character at code 18 (12 in hexadecimal). After the ù is another apostrophe. Since the previous one opened a printable string, this one must close it. But the next character after that is d, which is not #, and therefore cannot be the start of a non-printable character code. Therefore, this string from your Delphi 6 code is malformed.
The correct version of that excerpt is this:
#$12'O)=ù''dlû'#6't
Now there are three lone apostrophes and one set of doubled apostrophes. The problematic apostrophe from the previous string has been doubled, indicating that it is a literal apostrophe instead of a printable-string-closing one. The printable series continues with dlû. Then it's closed to insert character No. 6, and then opened again for t. The apostrophe that opens the entire string, at the beginning of the file, is implicit.
You haven't indicated what code you're using to produce the output you've shown, but that's where the problem was. It's not there anymore, and the code that loads the file is correct, so the only place that needs your debugging attention is any code that depended on the old, incorrect format. You'd still do well to replace your code with that of Robmil since it does better at handling (or not handling) exceptions and empty files.
Actually, looking at the real data, your problem is that the file stores binary data, not string data, so interpreting this as a string is not valid at all. The only reason it works at all in Delphi 6 is that non-Unicode Delphi allows you to treat binary data and strings the same way. You cannot do this in Unicode Delphi, nor should you.
The solution to get the actual text from within the file is to read the file as binary data, and then copy any values from this binary data, one byte at a time, to a string if it is a "valid" Ansi character (printable).
I would suggest this code:
function LoadFile(const FileName: TFileName): AnsiString;
begin
  with TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite) do
  try
    SetLength(Result, Size);
    // Result[1] is only valid when the string is non-empty.
    if Size > 0 then
      Read(Result[1], Size);
  finally
    Free;
  end;
end;
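The byte-by-byte copy described above (read the file as binary, keep only the printable characters) might look roughly like the following. ExtractPrintableText is a hypothetical name, and treating the range 32..126 as "printable" is an assumption made for the sketch; the real criterion depends on the file format.

```delphi
uses
  SysUtils, IOUtils;

// Hypothetical helper: loads the file as raw bytes and keeps only the
// bytes in the printable ASCII range 32..126 (an assumed definition of
// "printable"; adjust the range to suit the data).
function ExtractPrintableText(const FileName: string): AnsiString;
var
  Data: TBytes;
  B: Byte;
  Count: Integer;
begin
  Data := TFile.ReadAllBytes(FileName);
  SetLength(Result, Length(Data)); // upper bound, trimmed below
  Count := 0;
  for B in Data do
    if B in [32..126] then
    begin
      Inc(Count);
      Result[Count] := AnsiChar(B);
    end;
  SetLength(Result, Count);
end;
```

Working on TBytes keeps the binary data out of any string conversion until the filtered characters are copied.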

A better way of converting Codepage-1251 in RTF to Unicode

I am trying to parse RTF (via MSEDIT) in various languages, all in Delphi 2010, in order to produce HTML in Unicode.
Taking Russian/Cyrillic as my starting point I find that the overall document codepage is 1252 (Western) but the Russian parts of the text are identified by the charset of the font (RUSSIAN_CHARSET 204).
So far, my approach is to:
1) Use AnsiString (or RawByteString) when parsing the RTF
2) Determine the codepage by a lookup from the font charset (see http://msdn.microsoft.com/en-us/library/cc194829.aspx)
3) Translate using a lookup table in my code (this table was generated from http://msdn.microsoft.com/en-gb/goglobal/cc305144.aspx). I'm going to need one table per supported codepage!
There MUST be a better way than this? Preferably something supplied by the OS and so less brittle than tables of constants.
The Charset to codepage table is small enough, and static enough, that I doubt the system provides a function to do it.
To do the actual character translations you can use the SysUtils.TEncoding class or the System.SetCodePage function. Both internally use MultiByteToWideChar, which uses OS-provided lookup tables, so you don't need to maintain them.
Using SetCodePage would look something like this:
var
  iStart, iStop: Integer;
  RTF, RawText: AnsiString;
  Text: string;
  CodePage: Word;
begin
  ...
  CodePage := CharSetToCodePage(CharSet);
  RawText := Copy(RTF, iStart, iStop - iStart);
  SetCodePage(RawText, CodePage, False); // Set string codepage to Russian without converting it
  Text := string(RawText); // Automatic conversion from string codepage to Unicode
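CharSetToCodePage is not an RTL function, so it has to be written by hand. One possible shape is sketched below, together with the TEncoding alternative mentioned above. The table covers only some common font charsets, DecodeRun is a hypothetical name, and the fallback to code page 0 (the system ANSI code page) is an assumption.

```delphi
uses
  SysUtils;

// Hypothetical helper: maps a Windows font charset to an ANSI code page.
// Only a subset of charsets is listed; extend as needed.
function CharSetToCodePage(CharSet: Byte): Word;
begin
  case CharSet of
    0:   Result := 1252; // ANSI_CHARSET
    128: Result := 932;  // SHIFTJIS_CHARSET
    129: Result := 949;  // HANGUL_CHARSET
    134: Result := 936;  // GB2312_CHARSET
    136: Result := 950;  // CHINESEBIG5_CHARSET
    161: Result := 1253; // GREEK_CHARSET
    162: Result := 1254; // TURKISH_CHARSET
    177: Result := 1255; // HEBREW_CHARSET
    178: Result := 1256; // ARABIC_CHARSET
    186: Result := 1257; // BALTIC_CHARSET
    204: Result := 1251; // RUSSIAN_CHARSET
    238: Result := 1250; // EASTEUROPE_CHARSET
  else
    Result := 0;         // assumed fallback: CP_ACP, the system ANSI code page
  end;
end;

// Hypothetical helper: decodes one run of raw RTF bytes with TEncoding
// instead of SetCodePage.
function DecodeRun(const RawText: RawByteString; CodePage: Word): string;
var
  Enc: TEncoding;
begin
  Enc := TEncoding.GetEncoding(CodePage);
  try
    Result := Enc.GetString(BytesOf(RawText));
  finally
    Enc.Free; // encodings returned by GetEncoding are owned by the caller
  end;
end;
```

For example, DecodeRun(RawText, CharSetToCodePage(204)) would decode a run marked with RUSSIAN_CHARSET as Windows-1251.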
