This statement (in Delphi 7)
writeln(logfile,format('%16.16d ',[FileInfo.size])+full_name);
results in this output
0000000021239384 C:\DATA\DELPHI\sxf_archive10-13.zip
This statement
writeln(logfile,format('%17.17d ',[FileInfo.size])+full_name);
results in this output
21239384 C:\DATA\DELPHI\sxf_archive10-13.zip
The padding with leading zeros changes to leading spaces when the precision specifier is larger than 16. The help says "If the format string contains a precision specifier, it indicates that the resulting string must contain at least the specified number of digits; if the value has less digits, the resulting string is left-padded with zeros."
Is there another way to format a 20 character integer with leading zeros?
Precision of an Integer value is limited to 16 digits max. If the specified Precision is larger than 16, 0 is used instead. This is not a bug, it is hard-coded logic, and is not documented.
There are two ways you can handle this:
Use an Int64 instead of an Integer. Precision for an Int64 is 32 digits max:
WriteLn(logfile, Format('%20.20d %s', [Int64(FileInfo.Size), full_name]));
Note: In Delphi 2006 and later, TSearchRec.Size is an Int64. In earlier versions, it is an Integer instead, and thus limited to 2GB. If you need to handle file sizes > 2GB in earlier versions, you can get the full 64-bit size from the TSearchRec.FindData field:
var
FileSize: ULARGE_INTEGER;
begin
FileSize.LowPart := FileInfo.FindData.nFileSizeLow;
FileSize.HighPart := FileInfo.FindData.nFileSizeHigh;
WriteLn(logfile, Format('%20.20d %s', [FileSize.QuadPart, full_name]));
end;
Convert the Integer to a string without any leading zeros, and then use StringOfChar() to prepend any required zeros:
s := IntToStr(FileInfo.Size);
if Length(s) < 20 then
s := StringOfChar('0', 20-Length(s)) + s;
WriteLn(logfile, s + ' ' + full_name);
I tested some code:
var
B: Byte;
I: Integer;
begin
I := -10;
B := I;
end;
I expected the variable B to hold the value 10 (since that is the low byte of the Integer). But the result was B = 246.
Logically, I understand that 246 = 256 - 10, but I can't understand why this happened?
Your expectation of the value 10 for the least significant (ls) byte of the integer with value -10 is not correct.
Negative numbers are encoded according to a system called 2's complement, where a value of -1 as a four byte integer is coded as $FFFFFFFF, -2 as $FFFFFFFE, -3 as $FFFFFFFD ... -10 as $FFFFFFF6 ... and so on. This is the system used by most computers and software today.
There are other systems too. The one you possibly thought of is a system called Signed magnitude (see: https://en.wikipedia.org/wiki/Signed_number_representations) where the most significant bit, when set, indicates negative values, and where the actual numbers are encoded the same for both negative and positive numbers (with the inconvenience of defining two zero values, +0 and -0).
The value -10 is $FFFFFFF6 in an Integer. Assigning that to a lower-width type will simply truncate the value - that means the value in B will be $F6 and that is 246.
If you compile with range checking you will get an exception because -10 does not fit into a Byte.
If you want to turn a negative number into a positive you need to use the Abs function.
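To make the truncation concrete, here is a minimal sketch. LowByteOf is a made-up helper name, not an RTL function; it masks off the low 8 bits explicitly, which gives the same value the assignment B := I leaves in B when range checking is off.

```pascal
// Hypothetical helper: returns the low byte of a 32-bit Integer.
// Masking with $FF keeps only the lowest 8 bits of the 2's complement
// bit pattern, so no range check error can occur here.
function LowByteOf(Value: Integer): Byte;
begin
  Result := Byte(Value and $FF);
end;
```

LowByteOf(-10) gives 246 ($F6), while LowByteOf(Abs(-10)) gives 10.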
I am trying to better understand surrogate pairs and Unicode implementation in Delphi.
If I call length() on the Unicode string S := 'Ĥà̲V̂e' in Delphi, I will get back, 8.
This is because the lengths of the individual characters [Ĥ],[à̲],[V̂], and [e] are 2, 3, 2, and 1 respectively. This is because Ĥ has a surrogate, à̲ has two additional surrogates, V̂ has a surrogate and e has no surrogates.
If I wanted to return the second element in the string including all surrogates, [à̲], how would I do that? I know I would need to do some sort of testing of the individual bytes. I ran some tests using the routine
function GetFirstCodepointSize(const S: UTF8String): Integer;
referenced in this SO Question.
but got some unusual results, eg, here are some length and sizes of some different codepoints. Below is a snippet of how I generated these tables.
...
UTFCRUDResultStrings.add('INPUT: '+#9#9+ DATA +#9#9+ 'GetFirstCodePointSize = ' +intToStr(GetFirstCodepointSize(DATA))
+#9#9+ 'Length =' + intToStr(length(DATA)));
...
First Set: This makes sense to me, each code point size is doubled, but these are one character each and Delphi gives me the length as just 1, perfect.
INPUT: ď GetFirstCodePointSize = 2 Length =1
INPUT: ơ GetFirstCodePointSize = 2 Length =1
INPUT: ǥ GetFirstCodePointSize = 2 Length =1
Second set: It initially looks to me like the lengths and code points are reversed? I am guessing the reason for this is that the characters + surrogates are being treated individually, hence the first codepoint size is for the 'H', which is 1, but the length is returning the lengths of 'H' plus '^'.
INPUT: Ĥ GetFirstCodePointSize = 1 Length =2
INPUT: à̲ GetFirstCodePointSize = 1 Length =3
INPUT: V̂ GetFirstCodePointSize = 1 Length =2
INPUT: e GetFirstCodePointSize = 1 Length =1
Some additional tests...
INPUT: ¼ GetFirstCodePointSize = 2 Length =1
INPUT: ₧ GetFirstCodePointSize = 3 Length =1
INPUT: 𤭢 GetFirstCodePointSize = 4 Length =2
INPUT: ß GetFirstCodePointSize = 2 Length =1
INPUT: 𨳒 GetFirstCodePointSize = 4 Length =2
Is there a reliable way in Delphi to determine where an element in a Unicode String starts and ends?
I know my terminology using the word element may be off, but I don't think codepoint and character are right either, particularly given that one element may have a codepoint size of 3, but have a length of only one.
I am trying to better understand surrogate pairs and Unicode implementation in Delphi.
Let's get some terminology out of the way.
Each "character" (known as a grapheme) that is defined by Unicode is assigned a unique codepoint.
In a Unicode Transformation Format (UTF) encoding - UTF-7, UTF-8, UTF-16, and UTF-32 - each codepoint is encoded as a sequence of codeunits. The size of each codeunit is determined by the encoding - 7 bits for UTF-7, 8 bits for UTF-8, 16 bits for UTF-16, and 32 bits for UTF-32 (hence their names).
In Delphi 2009 and later, String is an alias for UnicodeString, and Char is an alias for WideChar. WideChar is 16 bits. A UnicodeString holds a UTF-16 encoded string (in earlier versions of Delphi, the equivalent string type was WideString), and each WideChar is a UTF-16 codeunit.
In UTF-16, a codepoint can be encoded using either 1 or 2 codeunits. 1 codeunit can encode codepoint values in the Basic Multilingual Plane (BMP) range - $0000 to $FFFF, inclusive. Higher codepoints require 2 codeunits, which is also known as a surrogate pair.
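As a rough illustration of how a surrogate pair is formed, the sketch below splits a codepoint above the BMP into its two codeunits. HighSurrogateOf and LowSurrogateOf are hypothetical names, not RTL functions, and the code assumes the input is in the range $10000 to $10FFFF.

```pascal
// Hypothetical helpers: split a codepoint in the range $10000..$10FFFF
// into the two UTF-16 codeunits of its surrogate pair.
function HighSurrogateOf(Codepoint: Cardinal): Word;
begin
  // Top 10 bits of the 20-bit offset, placed in $D800..$DBFF.
  Result := Word($D800 + ((Codepoint - $10000) shr 10));
end;

function LowSurrogateOf(Codepoint: Cardinal): Word;
begin
  // Bottom 10 bits of the offset, placed in $DC00..$DFFF.
  Result := Word($DC00 + ((Codepoint - $10000) and $3FF));
end;
```

For U+24B62 (the 𤭢 from the question) this yields $D852 and $DF62, which is why Length() reports 2 for that single codepoint.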
If I call length() on the Unicode string S := 'Ĥà̲V̂e' in Delphi, I will get back, 8.
This is because the lengths of the individual characters [Ĥ],[à̲],[V̂], and [e] are 2, 3, 2, and 1 respectively.
This is because Ĥ has a surrogate, à̲ has two additional surrogates, V̂ has a surrogate and e has no surrogates.
Yes, there are 8 WideChar elements (codeunits) in your UTF-16 UnicodeString. What you are calling "surrogates" are actually known as "combining marks". Each combining mark is its own unique codepoint, and thus its own codeunit sequence.
If I wanted to return the second element in the string including all surrogates, [à̲], how would I do that?
You have to start at the beginning of the UnicodeString and analyze each WideChar until you find one that is not a combining mark attached to a previous WideChar. On Windows, the easiest way to do that is to use the CharNextW() function, eg:
var
S: String;
P: PChar;
begin
S := 'Ĥà̲V̂e';
P := CharNext(PChar(S)); // returns a pointer to à̲
end;
The Delphi RTL does not have an equivalent function. You would have to write one manually, or use a third-party library. The RTL does have a StrNextChar() function, but it only handles UTF-16 surrogates, not combining marks (CharNext() handles both). So, you could use StrNextChar() to scan through each codepoint in the UnicodeString, but you have to look at each codepoint to know whether it is a combining mark or not, eg:
uses
Character;
function MyCharNext(P: PChar): PChar;
begin
if (P <> nil) and (P^ <> #0) then
begin
Result := StrNextChar(P);
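// Mn, Mc and Me are all mark categories; U+0302 (combining circumflex) is Mn
while GetUnicodeCategory(Result^) in [ucCombiningMark, ucEnclosingMark, ucNonSpacingMark] do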
Result := StrNextChar(Result);
end else begin
Result := nil;
end;
end;
var
S: String;
P: PChar;
begin
S := 'Ĥà̲V̂e';
P := MyCharNext(PChar(S)); // should return a pointer to à̲
end;
I know I would need to do some sort of testing of the individual bytes.
Not the bytes, but the codepoints that they represent when decoded.
I ran some tests using the routine
function GetFirstCodepointSize(const S: UTF8String): Integer
Look closely at that function signature. See the parameter type? It is a UTF-8 string, not a UTF-16 string. This was even stated in the answer you got that function from:
Here is an example how to parse UTF8 string
UTF-8 and UTF-16 are very different encodings, and thus have different semantics. You cannot use UTF-8 semantics to process a UTF-16 string, and vice versa.
Is there a reliable way in Delphi to determine where an element in a Unicode String starts and ends?
Not directly. You have to parse the string from the beginning, skipping elements as needed until you reach the desired element. Remember that each codepoint may be encoded as either 1 or 2 codeunit elements, and each logical glyph may be encoded using multiple codepoints (and thus multiple codeunit sequences).
I know my terminology using the word element may be off, but I don't think codepoint and character are right either, particularly given that one element may have a codepoint size of 3, but have a length of only one.
1 glyph is comprised of 1+ codepoints, and each codepoint is encoded as 1+ codeunits.
Could someone implement the following function?
function GetElementAtIndex(S: String; StrIdx : Integer): String;
Try something like this:
uses
SysUtils, Character;
function MyCharNext(P: PChar): PChar;
begin
Result := P;
if Result <> nil then
begin
Result := StrNextChar(Result);
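// Mn, Mc and Me are all mark categories; U+0302 (combining circumflex) is Mn
while GetUnicodeCategory(Result^) in [ucCombiningMark, ucEnclosingMark, ucNonSpacingMark] do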
Result := StrNextChar(Result);
end;
end;
function GetElementAtIndex(S: String; StrIdx : Integer): String;
var
pStart, pEnd: PChar;
begin
Result := '';
if (S = '') or (StrIdx < 1) then Exit; // StrIdx is 1-based
pStart := PChar(S);
while StrIdx > 1 do
begin
pStart := MyCharNext(pStart);
if pStart^ = #0 then Exit;
Dec(StrIdx);
end;
pEnd := MyCharNext(pStart);
{$POINTERMATH ON}
SetString(Result, pStart, pEnd-pStart);
end;
Looping through the graphemes of a string can be more complicated than you might think. In Unicode 13, some graphemes are up to 14 bytes long. I advise using a third-party library for this. One of the best for this is Skia4Delphi: https://github.com/skia4delphi/skia4delphi
The code is very simple:
var LUnicode: ISkUnicode := TSkUnicode.Create;
for var LGrapheme: string in LUnicode.GetBreaks('Text', TSkBreakType.Graphemes) do
Showmessage(LGrapheme);
The library's demo also includes an example of a grapheme iterator.
Write a program to convert an integer number to its hexadecimal representation without using inbuilt functions.
Here is my code, but it is not working. Can anyone tell where is the mistake?
It is giving an error:
"Project raised exception class EAccessViolation with message 'Access violation at address 00453B7B in module 'Project.exe'.Write of address FFFFFFFF'.Process stopped.Use Step or Run to continue."
unit Unit1;
interface
uses
Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls,Forms,
Dialogs;
type
TForm1 = class(TForm)
end;
function hexvalue(num:Integer):Char;
var
Form1: TForm1;
implementation
{$R *.dfm}
function hexvalue(num:Integer):Char;
begin
case num of
10: Result:='A';
11: Result:='B';
12: Result:='C';
13: Result:='D';
14: Result:='E';
15: Result:='F';
else Result:=Chr(num);
end;
end;
var
intnumber,hexnumber,actualhex:String;
integernum:Integer;
i,j,k:Byte;
begin
InputQuery ('Integer Number','Enter the integer number', intnumber);
integernum:=StrToInt(intnumber);
i:=0;
while integernum >= 16 do
begin
hexnumber[i]:=hexvalue(integernum mod 16);
integernum:= integernum div 16;
Inc(i);
end;
hexnumber[i]:= hexvalue(integernum);
k:=i;
for j:=0 to k do
begin
actualhex[j]:= hexnumber[i];
Dec(i);
end;
ShowMessage(actualhex);
end.
Since this obviously is a homework assignment, I don't want to spoil it for you and write the solution, but rather attempt to guide you to the solution.
User input
In real code you would need to be prepared for any mistake from the user: check that the input really is an integer, and politely ask the user to correct it if it is not.
Conversion loop
You have got that OK, using mod 16 for each nibble of integernum and div 16 to move to the next nibble, going from units towards higher order values.
Conversion of nibble to hex character
Here you go wrong. If you had also written out the cases for 0..9, the case statement would have worked. As others have commented, Chr() takes an ASCII code, so Chr(5) returns a control character, not the digit '5'. However, using a case statement for such a simple conversion is tedious to write and not very efficient.
What if you had a lookup table (array) where the index (0..15) directly gave you the corresponding hex character? That would be much simpler. Something like
const
HexChars: array[_.._] of Char = ('0',_____'F')
I leave it to you to fill in the missing parts.
Forming the result (hex string)
Your second major mistake, and the reason for the AV, is that you did not set the length of the string hexnumber before attempting to access the character positions. Another design flaw is that you fill in hexnumber backwards. As a result you then need an extra loop where you reverse the order to the correct one.
There are at least two solutions to solve both problems:
Since you take 32 bit integer type input, the hex representation is not more than 8 characters. Thus you can preset the length of the string to 8 and fill it in from the lower order position using 8 - i as index. As a final step you can trim the string if you like.
Don't preset the length and just concatenate as you go in the loop hexnumber := HexChars[integernum mod 16] + hexnumber;.
Negative values
You did not in any way consider the possibility of negative values in your code, so I assume it wasn't part of the task.
First mistake: Strings are 1-indexed, meaning that the index of their first character is 1 and not 0. You initialize "i" to 0 and then try to set hexnumber[i].
Second mistake: Strings might be dynamic, but they don't grow automatically. If you try to access the first character of an empty string, it won't work. You need to call SetLength(hexnumber, NumberOfDigits). You can calculate the number of digits this way:
NumberOfDigits := Trunc(Log16(integernum)) + 1;
Since Log16 isn't really something that exists, you can either use LogN(16, integernum) or (Ln(integernum) / Ln(16)) depending on what is available in your version of Delphi.
Note that this might return an invalid value for very large values (high Int64 range) due to rounding errors.
If you don't want to go that road, you could replace the instruction by
hexnumber := hexvalue(integernum mod 16) + hexnumber;
which would also remove the need to invert the string at the end.
Third Mistake : Using unsigned integer for loop variable. While this is debatable, the instruction
for I := 0 to Count - 1 do
is common practice in Delphi without checking Count > 0. When count = 0 and using an unsigned loop counter, you'll either get an integer overflow (if you have them activated in your project options) or you'll loop High(I) times, which isn't what you want to be doing.
Fourth mistake (already mentioned): Result:=Chr(num) should be replaced by something like Result := IntToStr(Num)[1].
Personally, I'd implement the function using an array.
const
HexArr : Array[0..15] of char = ('0', '1',...,'D','E','F');
begin
if InRange(Num, 0, 15) then
Result := HexArr[Num]
else
//whatever you want
end;
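Putting those pieces together, a minimal sketch could look like the following. ToHex and HexDigits are names chosen for illustration, and the function assumes a non-negative input (hence the Cardinal parameter). It uses the lookup table and the prepend-as-you-go approach, so no SetLength call and no reversal loop are needed.

```pascal
// Hypothetical helper: converts a non-negative number to a hex string
// without calling IntToHex or Format.
function ToHex(Num: Cardinal): string;
const
  HexDigits: array[0..15] of Char =
    ('0', '1', '2', '3', '4', '5', '6', '7',
     '8', '9', 'A', 'B', 'C', 'D', 'E', 'F');
begin
  Result := '';
  repeat
    // Prepending each digit builds the string in the correct order,
    // and repeat..until makes sure 0 still produces '0'.
    Result := HexDigits[Num mod 16] + Result;
    Num := Num div 16;
  until Num = 0;
end;
```

With this, ToHex(255) returns 'FF' and ToHex(0) returns '0'.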
Consider:
{$R+}
i:= 1;
While i > 0 do
i:= i + 1;
ShowMessage(IntToStr(i));
If I declare i as Byte, Word, Shortint or TinyInt I get a range-check error, as expected.
If I declare i as LongWord, Cardinal, Integer, LongInt or Int64 it just goes through the while loop and gets to show the negative or 0 value, which i gets when you pass the upper bound.
Does Delphi 7 not support range checking for 32-bit and 64-bit numbers?
The operation i + 1 doesn't actually produce a range check error. The assignment operation does.
Delphi evaluates the constant '1' as an Integer and so the right hand side will produce a result that is either an Int64 or an Integer (The larger of i's type and Integer).
If we expand the line out we get the following
temp := i + 1
i := temp
temp will either be 32 or 64 bits, and will overflow if it hits the upper bound. By the time we do the assignment, we have a perfectly valid 32 or 64bit value so there's no possibility of a range check failure if i is 32bits or more.
If i is less than 32 bits, it will raise a range check if temp is too large to fit.
For i >= 32bits, you can catch the overflow error like so:
{$R+,Q+}
...
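A small sketch of what that can look like in practice, assuming the Delphi compiler and a 32-bit Integer. DescribeIncrement is a made-up name for illustration. With overflow checking ({$Q+}) active, the addition itself raises EIntOverflow instead of silently wrapping around.

```pascal
uses
  SysUtils;

{$Q+} // overflow checking; range checking ($R+) alone does not catch this

// Hypothetical helper: returns Value + 1 as text, or 'overflow' if the
// addition exceeds the range of Integer.
function DescribeIncrement(Value: Integer): string;
begin
  try
    Result := IntToStr(Value + 1);
  except
    on EIntOverflow do
      Result := 'overflow';
  end;
end;
```

DescribeIncrement(41) returns '42', while DescribeIncrement(MaxInt) returns 'overflow'.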
How can I multiply a string by a number to get n repetitions of that string in a label's caption? For example,
if n = 5 then 's'*n = 'sssss', which would then become the label's caption.
Anything along those lines returns the error that the operator is not applicable to the operand type.
thanks
There's no built in operator that does what you want. Your code would work in Python, but not in Delphi.
If your string is a single character then you can use StringOfChar:
Caption := StringOfChar('s', n);
For a longer input string use DupeString from the StrUtils unit:
Caption := DupeString('blah', n);
Delphi does not allow that syntax. However, there is a function called DupeString, in StrUtils.pas that amounts to the same thing:
Label1.Caption := DupeString('test', 4);