With this program, I am trying to read a file and randomly print it to console. I am wondering If I have to use arrays for that. For example, I could assign my strings into an array, and randomly print from my array. But, I'm not sure how to approach to that. Also another problem is that, my current program does not read the first line from my file. I have a text file text.txt that contains
1. ABC
2. ABC
...
6. ABC
And below is my code.
type
arr = record
end;
var
x: text;
s: string;
SpacePos: word;
myArray: array of arr;
i: byte;
begin
Assign(x, 'text.txt');
reset(x);
readln(x, s);
SetLength(myArray, 0);
while not eof(x) do
begin
SetLength(myArray, Length(myArray) + 1);
readln(x, s);
WriteLn(s);
end;
end.
Please let me know how I could approach this problem!
There are a few issues with your program.
Your first Readln reads the first line of the file into s, but you don't use this value at all. It is lost. The first time you do a Readln in the loop, you get the second line of the file (which you do print to the console using Writeln).
Your arr record type is completely meaningless in this case (and in most cases), since it is a record without any members. It cannot store any data, because it has no members.
In your loop, you expand the length of the array, one item at a time. But you don't set the new item's value to anything, so you do this in vain. (And, because of the previous point, there isn't any value to set in any case: the elements of the array are empty records that cannot contain any data.)
Increasing the length of a dynamic array one item at a time is very bad practice, because it might cause a new heap allocation each time. The entire existing array might need to be copied to a new location in your computer's memory, every time.
The contents of the loop seem to be trying to do two things: saving the current line in the array, and printing it to the console. I assume the latter is only for debugging?
Old-style Pascal I/O (text, Assign, Reset) is obsolete. It is not thread-safe, possibly slow, handles Unicode badly, etc. It was used in the 90s, but shouldn't be used today. Instead, use the facilities provided by your RTL. (In Delphi, for instance, you can use TStringList, IOUtils.TFile.ReadAllLines, streams, etc.)
A partly fixed version of the code might look like this (still using old-school Pascal I/O and the inefficient array handling):
program Project1;
{$APPTYPE CONSOLE}
{$R *.res}
uses
System.SysUtils;
var
x: text;
arr: array of string;
begin
// Load file to string array (old and inefficient way)
AssignFile(x, 'D:\test.txt');
Reset(x);
try
while not Eof(x) do
begin
SetLength(arr, Length(arr) + 1);
Readln(x, arr[High(Arr)]);
end;
finally
CloseFile(x);
end;
Randomize;
// Print strings randomly
while True do
begin
Writeln(Arr[Random(Length(Arr))]);
Readln;
end;
end.
If you want to fix the inefficient array issue, but still not use modern classes, allocate in chunks:
program Project1;
{$APPTYPE CONSOLE}
{$R *.res}
uses
System.SysUtils;
var
x: text;
s: string;
arr: array of string;
ActualLength: Integer;
procedure AddLineToArr(const ALine: string);
begin
if Length(arr) = ActualLength then
SetLength(arr, Round(1.5 * Length(arr)) + 1);
arr[ActualLength] := ALine;
Inc(ActualLength);
end;
begin
SetLength(arr, 1024);
ActualLength := 0; // not necessary, since a global variable is always initialized
// Load file to string array (old and inefficient way)
AssignFile(x, 'D:\test.txt');
Reset(x);
try
while not Eof(x) do
begin
Readln(x, s);
AddLineToArr(s);
end;
finally
CloseFile(x);
end;
SetLength(arr, ActualLength);
Randomize;
// Print strings randomly
while True do
begin
Writeln(Arr[Random(Length(Arr))]);
Readln;
end;
end.
But if you have access to modern classes, things get much easier. The following examples use the modern Delphi RTL:
The generic TList<T> handles efficient expansion automatically:
program Project1;
{$APPTYPE CONSOLE}
{$R *.res}
uses
System.SysUtils, Generics.Defaults, Generics.Collections;
var
x: text;
s: string;
list: TList<string>;
begin
list := TList<string>.Create;
try
// Load file to string array (old and inefficient way)
AssignFile(x, 'D:\test.txt');
Reset(x);
try
while not Eof(x) do
begin
Readln(x, s);
list.Add(s);
end;
finally
CloseFile(x);
end;
Randomize;
// Print strings randomly
while True do
begin
Writeln(list[Random(list.Count)]);
Readln;
end;
finally
list.Free;
end;
end.
But you could simply use a TStringList:
program Project1;
{$APPTYPE CONSOLE}
{$R *.res}
uses
System.SysUtils, Classes;
var
list: TStringList;
begin
list := TStringList.Create;
try
list.LoadFromFile('D:\test.txt');
Randomize;
// Print strings randomly
while True do
begin
Writeln(list[Random(list.Count)]);
Readln;
end;
finally
list.Free;
end;
end.
Or you could keep the array approach and use IOUtils.TFile.ReadAllLines:
program Project1;
{$APPTYPE CONSOLE}
{$R *.res}
uses
System.SysUtils, IOUtils;
var
arr: TArray<string>;
begin
arr := TFile.ReadAllLines('D:\test.txt');
Randomize;
// Print strings randomly
while True do
begin
Writeln(arr[Random(Length(arr))]);
Readln;
end;
end.
As you can see, the modern approaches are much more convenient (less code). They are also faster and give you Unicode support.
Note: All snippets above assume that the file contains at least a single line. They will fail if this is not the case, and in real/production code, you must verify this, e.g. like
if Length(arr) = 0 then
raise Exception.Create('Array is empty.');
or
if List.Count = 0 then
raise Exception.Create('List is empty.');
before the // Print strings randomly part, which assumes that the array/list isn't empty.
Also another problem is that, my current program does not read the first line from my file.
Yes it does. But you don't write it to the console. See the third line, readln(x, s);
I am trying to read a file and randomly print it to console. I am wondering If I have to use arrays for that.
Yes that is a sound approach.
Instead of using an array of a record, just declare:
myArray : array of string;
To get a random value from the array, use Randomize to initialize the random generator, and Random() to get a random index.
var
x: text;
myArray: array of String;
ix: Integer;
begin
Randomize; // Initiate the random generator
Assign(x, 'text.txt');
reset(x);
ix := 0;
SetLength(myArray, 0);
while not eof(x) do
begin
SetLength(myArray, Length(myArray) + 1);
readln(x, myArray[ix]);
WriteLn(myArray[ix]);
ix := ix + 1;
end;
WriteLn('Random line:');
WriteLn(myArray[Random(ix)]); // Random(ix) returns a random number 0..ix-1
end.
I edited the answer as I read your question clearly.
Nonetheless, for your reading an extra line issue, you happened to read a line before going into your read loop. So this is your program from the begin to the end statement without that extra readln().
For this code the routine is simple and there is actually more than one method that I can think of. For the first method, you can read each line into an array. Then traverse through the array and create a random number of 0 or 1. If it is 1 print that line.
For the second method, each time read a line from the file, generate a random number of 0 or 1. If that random number is a 1, print that line.
Take note to use randomize before you run random() to not get the same random number of the last program execution. Another thing to take into consideration, if you going on a large text file, each set length can cost a lot. It is better to keep track of what going in there and set 20 - 30 length as one or even 100. That is if you going with the array route.
The code below is for the array method, for the method of not using an array, it is very simple once you see the routines below.
var
x: text;
SpacePos: word;
myArray: array of string;
i: integer;
begin
Randomize;
Assign(x, 'text.txt');
reset(x);
SetLength(myArray, 0);
i := 0;
while not eof(x) do
begin
SetLength(myArray, Length(myArray) + 1);
readln(x, myArray[i]);
i := i + 1;
end;
for i:= 0 to Length(myArray) - 1 do
begin
if random(2) = 1 then
WriteLn(myArray[i]);
end;
end.
Related
Delphi's generic TQueue class has a property called Capacity. If the number of items in the TQueue exceeds its capacity, additional items are still added to the queue. The documentation says the property "gets or sets the queue capacity, that is, the maximum size of the queue without resizing." It sounds like a queue is kind of like a fixed length array (memory-wise)--until it's full, at which point it becomes more like a dynamic array? Is that accurate?
When would a programmer want or need to get or set a TQueue's capacity?
Theory
Consider the following example, which generates a dynamic array of random integers:
program DynArrAlloc;
{$APPTYPE CONSOLE}
{$R *.res}
uses
Windows, System.SysUtils;
const
N = 100000000;
var
a: TArray<Integer>;
i: Integer;
tc1, tc2: Cardinal;
begin
tc1 := GetTickCount;
SetLength(a, 0);
for i := 1 to N do
begin
SetLength(a, Succ(Length(a)));
a[High(a)] := Random(1000);
end;
tc2 := GetTickCount;
Writeln(tc2 - tc1);
Readln;
end.
On my system, it takes 4.5 seconds to run it.
Notice that I -- in each iteration -- reallocate the array so it can hold one more item.
It would be better if I allocated a large enough array from the beginning:
program DynArrAlloc;
{$APPTYPE CONSOLE}
{$R *.res}
uses
Windows, System.SysUtils;
const
N = 100000000;
var
a: TArray<Integer>;
i: Integer;
tc1, tc2: Cardinal;
begin
tc1 := GetTickCount;
SetLength(a, N);
for i := 1 to N do
a[N - 1] := Random(1000);
tc2 := GetTickCount;
Writeln(tc2 - tc1);
Readln;
end.
This time, the program only takes 0.6 seconds.
Hence, one should always try not to reallocate unnecessarily. Each time I reallocate in the first example, I need to ask for more memory; then I need to copy the array to the new location, and finally free the old memory. Clearly, this is very inefficient.
Unfortunately, it isn't always possible to allocate a large enough array at the start. You simply might not know the final element count.
A common strategy then is to allocate in steps: when the array is full and you need one more slot, allocate several more slots but keep track of the actual number of used slots:
program DynArrAlloc;
{$APPTYPE CONSOLE}
{$R *.res}
uses
Windows, System.SysUtils;
const
N = 100000000;
var
a: TArray<Integer>;
i: Integer;
tc1, tc2: Cardinal;
ActualLength: Integer;
const
AllocStep = 1024;
begin
tc1 := GetTickCount;
SetLength(a, AllocStep);
ActualLength := 0;
for i := 1 to N do
begin
if ActualLength = Length(a) then
SetLength(a, Length(a) + AllocStep);
a[ActualLength] := Random(1000);
Inc(ActualLength);
end;
// Trim the excess:
SetLength(a, ActualLength);
tc2 := GetTickCount;
Writeln(tc2 - tc1);
Readln;
end.
Now we need 1.3 seconds.
In this example, I allocate in fixed-sized blocks. A more common strategy is probably to double the array at each reallocation (or multiply by 1.5 or something) or combine these options in a smart way.
Applying the theory
Under the hood, TList<T>, TQueue<T>, TStack<T>, TStringList etc. need to dynamically allocate space for an unlimited number of items. To make this performant, these classes do allocate more than necessary. The Capacity is the number of elements you can fit in the currently allocated memory while the Count <= Capacity is the actual number of elements in the container.
You can set the Capacity property to reduce the need for intermediate allocation when you fill a container and you do know the final number of elements from the beginning:
var
L: TList<Integer>;
begin
L := TList<Integer>.Create;
try
while not Something.EOF do
L.Add(Something.GetNextValue);
finally
L.Free;
end;
is OK and requires probably only a few reallocations, but
L := TList<Integer>.Create;
try
L.Capacity := Something.Count;
while not Something.EOF do
L.Add(Something.GetNextValue);
finally
L.Free;
end;
will be faster since there will be no intermediate reallocations.
Internally TQueue contains dynamic array that stores elements.
When item count reaches current capacity, array is reallocated (for example, doubles it's size) and you can add more and more elements.
If you know reliable limit for maximum item count, it is worth to set Capacity, so you will avoid memory reallocations, saving some time.
I'm work on a program for school (Cinema app) but I have a problem with my array. My app closed but nothing is showed.
program TFE;
{$APPTYPE CONSOLE}
uses
SysUtils,
StrUtils,
Crt;
var
MovieList, MovieInfo: Text;
Choice: Byte;
i: Integer;
L: String;
S: array of String[14];
begin
i := 0
Assign(MovieInfo, 'MovieInfo.txt');
Reset(MovieInfo);
Readln(Choice);
i := 0;
ClrScr;
While not eof (MovieInfo) do
begin
Readln(MovieInfo, L);
S[i] := L;
i := i + 1;
end;
Writeln(S[Choice]);
Readln;
end.
It's all my code for the moment.
Somebody can help me ?
In the title you speak about a variable MyVar, but the code doesn't show any such variable. For future reference, please carefully proof read your question before posting.
You have declared a dynamic array:
S: array of String[14];
that is, an array of 14 character strings (short strings). But you have never set the length of this array, and so it can not hold any strings at all.
Use procedure SetLength(var S: <string or dynamic array>; NewLength: Integer); to allocate space for items in the array.
As you dont know (I presume) how many movies there might be in the file, you must first allocate some amount, and then be prepared to expand the array (with a new call to SetLength()) if the array becomes filled up before all movies are read from the file. For example, initialize (before the while loop) with space for 10 movies:
SetLength(S, 10);
and then in the while loop, e.g. just before ReadLn(),
if i > (Length(S)-1) then
SetLength(S, Length(S)+10);
Another comment is that the user is not presented any prompt when requested for a choice, but maybe this is still under development ;-)
The message you got is correct, because you only work with the array inside the while not eof loop. At that moment, the compiler cannot know the content of the file. It may be blank, as far as he's concerned. If the file is, indeed, blank, he'll skip entirely the while not eof part and go straight to the array writing part. Because the array was never used, it doesn't have a defined value, hence this message.
The solution is simple: initialize the array's values with 0:
program TFE;
{$APPTYPE CONSOLE}
uses
SysUtils,
StrUtils,
Crt;
var
MovieList, MovieInfo: Text;
Choice: Byte;
i: Integer;
L: String;
S: array of String[14];
begin
SetLength(s,10); //10 is an example
for i:=0 to Length(s) do
s[i]:='';
Assign(MovieInfo, 'MovieInfo.txt');
Reset(MovieInfo);
Readln(Choice);
i := 0;
ClrScr;
While not eof (MovieInfo) do
begin
Readln(MovieInfo, L);
S[i] := L;
i := i + 1;
end;
Writeln(S[Choice]);
Readln;
end.
Digging deep into your code, you define MovieList and MovieInfo, but you only use MovieInfo. Why?
I have come up with this function to return the number of occurrences of a string in a Delphi Stream. However, I suspect there is a more efficient way to achieve this, since I am using "higher level" constructs (char), and not working at the lower byte/pointer level (which I am not that familiar with)
function ReadStream(const S: AnsiString; Stream: TMemoryStream): Integer;
var
Arr: Array of AnsiChar;
Buf: AnsiChar;
ReadCount: Integer;
procedure AddChar(const C: AnsiChar);
var
I: Integer;
begin
for I := 1 to Length(S) - 1 do
Arr[I] := Arr[I+1];
Arr[Length(S)] := C;
end;
function IsEqual: Boolean;
var
I: Integer;
begin
Result := True;
for I := 1 to Length(S) do
if S[I] <> Arr[I] then
begin
Result := False;
Break;;
end;
end;
begin
Stream.Position := 0;
SetLength(Arr, Length(S));
Result := 0;
repeat
ReadCount := Stream.Read(Buf, 1);
AddChar(Buf);
if IsEqual then
Inc(Result);
until ReadCount = 0;
end;
Can someone supply a procedure that is more efficient?
Stream has a method that will let you get into the internal buffer.
You can get a pointer to the internal buffer using the Memory property.
If you are working in 32 bit and you are willing to let go of the deprecated TMemoryStream and use TBytesStream instead you can use abuse the fact that a dynamic array and an AnsiString share the same structure in 32 bit.
Unfortunately Emba broke that compatibility in X64, Which means that for no good reason whatsoever you cannot have strings > 2GB in X64.
Note that this trick will break in 64 bit! (See fix below)
You can use Boyer-Moore string searching.
This allows you to write code like this:
function CountOccurrances(const Needle: AnsiString; const Haystack: TBytesStream): integer;
var
Start: cardinal;
Count: integer;
begin
Start:= 1;
Count:= 0;
repeat
{$ifdef CPUx86}
Start:= _FindStringBoyerAnsiString(string(HayStack.Memory), Needle, false, Start);
{$else}
Start:= __FindStringBoyerAnsiStringIn64BitTArrayByte(TArray<Byte>(HaySAtack.Memory), Needle, false, Start);
{$endif}
if Start >= 1 then begin
Inc(Start, Length(Needle));
Inc(Count);
end;
until Start <= 0;
Result:= Count;
end;
For 32 bit you'll have to rewrite the BoyerMoore code to use AnsiString; a trivial rewrite.
For 64 bit you'll have to rewrite the BoyerMoore code to use a TArray<byte> as a first parameter; a relatively simple task.
If you are looking for efficiency, please try and avoid WinAPI calls that use pchars. c-style strings are a horrible idea, because they do not have a length prefix.
Johan has given you a good answer about Boyer-Moore searching. BM is fine if
your are content to use it as a "black box", but if you want to understand what's going on,
there is a bit of a gulf between the complexity of your own code and a BM implementation.
You might find it helpful to explore searching that's more efficient than your own code
but not so complex as BM. There is one ultra-simple way to do what you want without
getting invoved with pointers, PChars, etc.
Let's leave aside for a moment the fact that you want to work with a TMemoryStream, and
consider finding the number of occurrences of a string SubStr in another string Target.
For efficiency, things you want to avoid are a) repeatedly scanning the same characters
over and over and b) copying one or both strings.
Since D7, Delphi has included a PosEx function:
function PosEx(const SubStr, S: string; Offset: Cardinal = 1): Integer;
Description
PosEx returns the index of SubStr in S, beginning the search at Offset. If Offset is 1 (default), PosEx is equivalent to Pos.
PosEx returns 0 if SubStr is not found, if Offset is greater than the length of S, or if Offset is less than 1.
So what you can do is repeatedly call PosEx, starting with Offset = 1, and each time it
finds SubStr in Target you increment Offset to skip over it, like this (in a console application):
function ContainsCount(const SubStr, Target : String) : Integer;
var
i : Integer;
begin
Result := 0;
i := 1;
repeat
i := PosEx(SubStr, Target, i);
if i > 0 then begin
Inc(Result);
i := i + Length(SubStr);
end;
until i <= 0;
end;
var
Count : Integer;
Target : String;
begin
Target := 'aa b ca';
Count := ContainsCount('a', Target);
writeln(Count);
readln;
end.
The fact that PosEx and ContainsCount both pass SubStr and Target as
consts meants that no string copying is involved, and it should be obvious
that ContainsCount never scans the same characters more that once.
Once you've satisfied yourself that this works, you might care to trace
into PosEx to see how it does its stuff.
You can do something which works in a similar way on PChars using the RTL functions StrPos/AnsiStrPos
To convert your memory stream to a string, you could use this code from
Rob Kennedy's answer to this q Converting TMemoryStream to 'String' in Delphi 2009
function MemoryStreamToString(M: TMemoryStream): string;
begin
SetString(Result, PChar(M.Memory), M.Size div SizeOf(Char));
end;
(Note what he says about the alternative version later in his answer)
Btw, if you look through the VCL + RTL code, you'll see that quite a lot of the string-parsing and processing code (e.g. in TParser, TStringList, TExpressionParser) all does its work with PChars. There's a reason for that of course, because it minimizes character copying and means that most scanning operations can be done by changing pointer (PChar) values.
I use Delphi 10.1 Berlin in Windows 10.
I have two records of different sizes. I wrote code to loop through two TList<T> of these records to test elapsed times. Looping through the list of the larger record runs much slower.
Can anyone explain the reason, and provide a solution to make the loop run faster?
type
tTestRecord1 = record
Field1: array[0..4] of Integer;
Field2: array[0..4] of Extended;
Field3: string;
end;
tTestRecord2 = record
Field1: array[0..4999] of Integer;
Field2: array[0..4999] of Extended;
Field3: string;
end;
procedure TForm1.Button1Click(Sender: TObject);
var
_List: TList<tTestRecord1>;
_Record: tTestRecord1;
_Time: TTime;
i: Integer;
begin
_List := TList<tTestRecord1>.Create;
for i := 0 to 4999 do
begin
_List.Add(_Record);
end;
_Time := Time;
for i := 0 to 4999 do
begin
if _List[i].Field3 = 'abcde' then
begin
Break;
end;
end;
Button1.Caption := FormatDateTime('s.zzz', Time - _Time); // 0.000
_List.Free;
end;
procedure TForm1.Button2Click(Sender: TObject);
var
_List: TList<tTestRecord2>;
_Record: tTestRecord2;
_Time: TTime;
i: Integer;
begin
_List := TList<tTestRecord2>.Create;
for i := 0 to 4999 do
begin
_List.Add(_Record);
end;
_Time := Time;
for i := 0 to 4999 do
begin
if _List[i].Field3 = 'abcde' then
begin
Break;
end;
end;
Button2.Caption := FormatDateTime('s.zzz', Time - _Time); // 0.045
_List.Free;
end;
First of all, I want to consider the entire code, even the code that populates the list which I do realise you have not timed. Because the second record is larger in size more memory needs to be copied when you make an assignment of that record type. Further when you read from the list the larger record is less cache friendly than the smaller record which impacts performance. This latter effect is likely less significant than the former.
Related to this is that as you add items the list's internal array of records has to be resized. Sometimes the resizing leads to a reallocation that cannot be performed in-place. When that happens a new block of memory is allocated and the previous content is copied to this new block. That copy is clearly ore expensive for the larger record. You can mitigate this by allocating the array once up front if you know it's length. The list Capacity is the mechanism to use. Of course, not always will you know the length ahead of time.
Your program does very little beyond memory allocation and memory access. Hence the performance of these memory operations dominates.
Now, your timing is only of the code that reads from the lists. So the memory copy performance difference on population is not part of the benchmarking that you performed. Your timing differences are mainly down to excessive memory copy when reading, as I will explain below.
Consider this code:
if _List[i].Field3 = 'abcde' then
Because _List[i] is a record, a value type, the entire record is copied to an implicit hidden local variable. The code is actually equivalent to:
var
tmp: tTestRecord2;
...
tmp := _List[i]; // copy of entire record
if tmp.Field3 = 'abcde' then
There are a few ways to avoid this copy:
Change the underlying type to be a reference type. This changes the memory management requirements. And you may have good reason to want to use a value type.
Use a container class that can return the address of an item rather than a copy of an item.
Switch from TList<T> to dynamic array TArray<T>. That simple change will allow the compiler to access individual fields directly without copying entire records.
Use the TList<T>.List to obtain access to the list object's underlying array holding the data. That would have the same effect as the previous item.
Item 4 above is the simplest change you could make to see a large difference. You would replace
if _List[i].Field3 = 'abcde' then
with
if _List.List[i].Field3 = 'abcde' then
and that should yield a very significant change in performance.
Consider this program:
{$APPTYPE CONSOLE}
uses
System.Diagnostics,
System.Generics.Collections;
type
tTestRecord2 = record
Field1: array[0..4999] of Integer;
Field2: array[0..4999] of Extended;
Field3: string;
end;
procedure Main;
const
N = 100000;
var
i: Integer;
Stopwatch: TStopwatch;
List: TList<tTestRecord2>;
Rec: tTestRecord2;
begin
List := TList<tTestRecord2>.Create;
List.Capacity := N;
for i := 0 to N-1 do
begin
List.Add(Rec);
end;
Stopwatch := TStopwatch.StartNew;
for i := 0 to N-1 do
begin
if List[i].Field3 = 'abcde' then
begin
Break;
end;
end;
Writeln(Stopwatch.ElapsedMilliseconds);
end;
begin
Main;
Readln;
end.
I had to compile it for 64 bit to avoid an out of memory condition. The output on my machine is around 700. Change List[i].Field3 to List.List[i].Field3 and the output is in single figures. The timing is rather crude, but I think this demonstrates the point.
The issue of the large record not being cache friendly remains. That is more complicated to deal with and would require a detailed analysis of how the real world code operated on its data.
As an aside, if you care about performance then you won't use Extended. Because it has size 10, not a power of two, memory access is frequently mis-aligned. Use Double or Real which is an alias to Double.
I have a dynamic array. But initially I am not knowing the length of the array. Can I do like first I set the length of it as 1 and then increase length as I needed without lost of previously stored data?
I know I can do such task using TList. But I want to know whether I can do it with array or not?
Dynamic Arrays can be resized to a larger size without losing the contained data.
The following program demonstrates this in action.
program Project7;
{$APPTYPE CONSOLE}
uses
SysUtils;
var
A : Array of Integer;
I : Integer;
begin
for I := 0 to 19 do
begin
SetLength(A,I+1);
A[I] := I;
end;
for I := Low(A) to High(A) do
begin
writeln(A[I]);
end;
readln;
end.