I need to find a certain string in a text file, searching from the bottom (the end of the file).
Once the string has been found, the function exits.
Here is my code, which works fine but is rather slow.
I mean, I run this code every 5 seconds, and it consumes about 0.5% to 1% CPU time.
The text file is about 10 MB.
How can I speed this up, so that it is really fast and doesn't consume much CPU time?
function TMainForm.GetVMem: string;
var
  TS: TStrings;
  sm: string;
  i: integer;
begin
  TS := TStringList.Create;
  TS.LoadFromFile(LogFileName);
  for i := TS.Count-1 downto 0 do
  begin
    Application.ProcessMessages;
    sm := Trim(TS[i]);
    if Pos('Virtual Memory Total =', sm) > 0 then
    begin
      Result := sm;
      TS.Free;
      exit;
    end;
  end;
  Result := '';
  TS.Free;
end;
You can use a TMemoryStream and use LoadFromFile to load the complete file content.
Then you can cast the Memory property to PChar, PAnsiChar or another character type, depending on the file's encoding.
When you have the pointer, you can use it to check the content. I would avoid string handling here because it is much slower than pointer operations.
You can move the pointer to the end of the memory block (use Stream.Size) and search backward for the CR/LF pair (or whatever is used as the line delimiter). Then, from that point, check for the searched string. If it is found, you are done; if not, search for the previous CR/LF and repeat.
That is more complex than the method you used but - if done correctly - will be faster.
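A minimal sketch of the pointer-based backward scan described above. It assumes a single-byte encoding (ANSI, or UTF-8 with an ASCII search string) and LF or CR/LF line endings; FindLastLineContaining and sample.log are hypothetical names. For brevity it copies each candidate line into an AnsiString and uses Pos instead of comparing bytes through the pointer, so treat it as a starting point rather than the fastest possible version.

```pascal
program BackwardScanSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: returns the last line of FileName that contains Marker
// (trimmed), or '' if no line does. Assumes single-byte text with LF or CR/LF
// line endings.
function FindLastLineContaining(const FileName: string;
  const Marker: AnsiString): string;
var
  MS: TMemoryStream;
  Base, LineStart, LineEnd: PAnsiChar;
  Line: AnsiString;
begin
  Result := '';
  MS := TMemoryStream.Create;
  try
    MS.LoadFromFile(FileName);
    Base := PAnsiChar(MS.Memory);
    LineEnd := Base + NativeInt(MS.Size);  // one past the last byte
    while LineEnd > Base do
    begin
      // Walk back from LineEnd to just after the previous LF (or to Base).
      LineStart := LineEnd;
      while (LineStart > Base) and ((LineStart - 1)^ <> #10) do
        Dec(LineStart);
      SetString(Line, LineStart, LineEnd - LineStart);
      if Pos(Marker, Line) > 0 then
      begin
        Result := Trim(string(Line));  // Trim also drops a trailing CR
        Exit;
      end;
      LineEnd := LineStart - 1;  // skip the LF in front of this line
    end;
  finally
    MS.Free;
  end;
end;

var
  Lines: TStringList;
begin
  // Build a small hypothetical log file for the demo.
  Lines := TStringList.Create;
  try
    Lines.Add('Virtual Memory Total = 100');
    Lines.Add('some other line');
    Lines.Add('  Virtual Memory Total = 200  ');
    Lines.Add('last line');
    Lines.SaveToFile('sample.log');
  finally
    Lines.Free;
  end;
  Writeln(FindLastLineContaining('sample.log', 'Virtual Memory Total ='));
  // prints: Virtual Memory Total = 200
end.
```

Loading the whole file still costs one read of 10 MB per call; the saving comes from not splitting the entire file into a TStringList.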
If the file is too big to fit in memory, especially in a 32-bit application, you'll have to resort to reading the file line by line, keeping only one line in memory at a time, until you reach the end of the file. Each time you find the searched string, save its position. At the end of the file, this saved position (if any) is the position of the last occurrence of the searched string.
If the file is really very large and the searched string is likely to be near the end, you may read the file backward, block by block (setting TStream.Position for direct access). Then, in each block, use the previous algorithm. Pay attention that the searched string may be split across two blocks, depending on the block size.
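A sketch of the backward, block-by-block read, including the overlap that keeps a string split across two blocks from being missed. Assumptions: the marker is compared as raw bytes (single-byte encoding), and the helper returns a byte offset rather than the whole line; FindLastOffset, WriteSample and sample.bin are hypothetical names.

```pascal
program BackwardBlockSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: returns the 0-based byte offset of the last occurrence
// of Marker in FileName, or -1. The file is read backwards in blocks of
// BlockSize bytes; consecutive blocks overlap by Length(Marker) - 1 bytes so
// that a marker split across a block boundary is still found.
function FindLastOffset(const FileName: string; const Marker: AnsiString;
  BlockSize: Integer = 64 * 1024): Int64;
var
  FS: TFileStream;
  Buf: AnsiString;
  BlockStart, BlockEnd: Int64;
  MarkerLen, Len, I: Integer;
begin
  Result := -1;
  MarkerLen := Length(Marker);
  if MarkerLen = 0 then
    Exit;
  if BlockSize < MarkerLen then
    BlockSize := MarkerLen;  // a block must be able to hold the whole marker
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyNone);
  try
    BlockEnd := FS.Size;
    while BlockEnd > 0 do
    begin
      BlockStart := BlockEnd - BlockSize;
      if BlockStart < 0 then
        BlockStart := 0;
      Len := Integer(BlockEnd - BlockStart);
      SetLength(Buf, Len);
      FS.Position := BlockStart;
      FS.ReadBuffer(Buf[1], Len);
      // Scan this block from its end towards its start.
      for I := Len - MarkerLen + 1 downto 1 do
        if CompareMem(@Buf[I], Pointer(Marker), MarkerLen) then
        begin
          Result := BlockStart + I - 1;
          Exit;
        end;
      if BlockStart = 0 then
        Break;
      // The next block ends MarkerLen - 1 bytes past this block's start.
      BlockEnd := BlockStart + MarkerLen - 1;
    end;
  finally
    FS.Free;
  end;
end;

procedure WriteSample(const FileName: string; const Content: AnsiString);
var
  FS: TFileStream;
begin
  FS := TFileStream.Create(FileName, fmCreate);
  try
    FS.WriteBuffer(Pointer(Content)^, Length(Content));
  finally
    FS.Free;
  end;
end;

begin
  // 'XYZ' starts at offset 2 and straddles the 4-byte block boundaries.
  WriteSample('sample.bin', 'abXYZcdefgh');
  Writeln(FindLastOffset('sample.bin', 'XYZ', 4));  // prints 2
end.
```

In practice the block size would be much larger than 4; the tiny value in the demo only forces the marker across a boundary.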
Again, depending on the file size, you may split the search into several parallel searches using multithreading. Do not create too many threads, though. Pay attention that the searched string may be split across two blocks assigned to different threads, depending on the block size.
Related
Take for example the following code:
for i := (myStringList.Count - 1) DownTo 0 do begin
  dataList := SplitString(myStringList[i], #9);
  x := StrToFloat(dataList[0]);
  y := StrToFloat(dataList[1]);
  z := StrToFloat(dataList[2]);
  //Do something with these variables
  myOutputRecordArray[i] := {SomeFunctionOf}(x,y,z);
  //Free Used List Item
  myStringList.Delete(i);
end;
//Free Memory
myStringList.Free;
How would you parallelise this using, for example, the OmniThreadLibrary? Is it possible? Or does it need to be restructured?
I'm calling myStringList.Delete(i); at each iteration as the StringList is large and freeing items after use at each iteration is important to minimise memory usage.
Simple answer: You wouldn't.
More involved answer: The last thing you want to do in a parallelized operation is modify shared state, such as this Delete call. It's not guaranteed that the individual tasks will finish "in order"; in fact it's highly likely that at least one won't, and that probability approaches 100% very quickly as you add more tasks to the workload. Trying to do something like that is playing with fire.
You can either destroy the items as you go and do it serialized, or do it in parallel, finish faster, and destroy the whole list. But I don't think there's any way to have it both ways.
You can cheat. Setting the string value to an empty string will free most of the memory and will be thread safe. At the end of the processing you can then clear the list.
Parallel.ForEach(0, myStringList.Count - 1).Execute(
  procedure (const index: integer)
  var
    dataList: TStringDynArray;
    x, y, z: Single;
  begin
    dataList := SplitString(myStringList[index], #9);
    x := StrToFloat(dataList[0]);
    y := StrToFloat(dataList[1]);
    z := StrToFloat(dataList[2]);
    //Do something with these variables
    myOutputRecordArray[index] := {SomeFunctionOf}(x,y,z);
    //Free Used List Item
    myStringList[index] := '';
  end);
myStringList.Clear;
This code is safe because we are never writing to a shared object from multiple threads. You need to make sure that all of the variables you use that would normally be local are declared in the threaded block.
I'm not going to attempt to show how to do what you originally asked, because it is a bad idea that will not lead to improved performance, not even if you deal with the many data races in your proposed parallel implementation.
The bottleneck here is the disk I/O. Reading the entire file into memory, and then processing the contents is the design choice that is leading to your memory problems. The correct way to solve this problem is to use a pipeline.
Step 1 of the pipeline takes as input the file on disk. The code here reads chunks of the file and then breaks those chunks into lines. These lines are the output of this step. The entire file is never in memory at one time. You'll have to tune the size of the chunks that you read.
Step 2 takes as input the strings that step 1 produced. Step 2 consumes those strings and produces vectors. Those vectors are added to your vector list.
Step 2 will be faster than step 1 because I/O is so expensive. Therefore there's nothing to be gained by trying to optimise either of the steps with parallel algorithms. Even on a uniprocessor machine, this pipelined implementation could be faster than a non-pipelined one.
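The core of this design is that the whole file is never in memory at once. The sketch below shows only that part and runs both steps in one loop on one thread, so it is not a real two-stage pipeline: TStreamReader (Delphi 2009 and later) reads the file through an internal buffer (the chunking of step 1), and each line is converted to a vector immediately (step 2). TVec, LoadVectors and points.txt are hypothetical; the tab-separated x/y/z format is taken from the question.

```pascal
program StreamingLoadSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, Types, StrUtils, Generics.Collections;

type
  TVec = record  // hypothetical output record
    X, Y, Z: Double;
  end;

// Reads FileName line by line (tab-separated x, y, z) and appends one TVec per
// line to Vectors. Only the reader's buffer and the current line are in memory.
procedure LoadVectors(const FileName: string; Vectors: TList<TVec>);
var
  Reader: TStreamReader;
  Parts: TStringDynArray;
  V: TVec;
begin
  Reader := TStreamReader.Create(FileName);
  try
    while not Reader.EndOfStream do
    begin
      Parts := SplitString(Reader.ReadLine, #9);
      if Length(Parts) < 3 then
        Continue;  // skip blank or malformed lines
      // StrToFloat uses the current locale's decimal separator.
      V.X := StrToFloat(Parts[0]);
      V.Y := StrToFloat(Parts[1]);
      V.Z := StrToFloat(Parts[2]);
      Vectors.Add(V);
    end;
  finally
    Reader.Free;
  end;
end;

var
  Lines: TStringList;
  Vectors: TList<TVec>;
begin
  Lines := TStringList.Create;
  try
    Lines.Add('1'#9'2'#9'3');
    Lines.Add('4'#9'5'#9'6');
    Lines.SaveToFile('points.txt');
  finally
    Lines.Free;
  end;
  Vectors := TList<TVec>.Create;
  try
    LoadVectors('points.txt', Vectors);
    Writeln(Vectors.Count);  // prints 2
  finally
    Vectors.Free;
  end;
end.
```

Turning this into a true pipeline would mean moving the reading loop to a producer thread and passing lines through a thread-safe queue; that part is left out here.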
Hi, I was reading the following tutorial: http://www.forumkorner.com/thread-26894.html. The problem is that I can't find a way to translate the following code (VB.NET):
Dim file1 = File.OpenWrite("c:/test.exe")
Dim siza = file1.Seek(0, SeekOrigin.[End])
Dim size = Convert.ToInt32("20")
Dim bite As Decimal = size * 1048576
While siza < bite
    siza += 1
    file1.WriteByte(0)
End While
file1.Close()
The question is: where can I find information on how to write bytes to a file?
The tutorial the OP refers to was clearly written by a beginner, so I wouldn't follow it.
If you want to write a program that simply increases the size of a certain file, there are easier ways to do it. You don't necessarily need to write into the file to increase its size.
This code does it without writing a single byte into it:
procedure TForm2.Button1Click(Sender: TObject);
var
  FS: TFileStream;
begin
  if OpenDialog1.Execute then
  begin
    FS := TFileStream.Create(OpenDialog1.FileName, fmOpenReadWrite);
    try
      FS.Size := FS.Size * 2;  // only ever grows the file
    finally
      FS.Free;
    end;
  end;
end;
How? It simply tells the operating system that the size of that specific file needs to be set to a specific value (usually done to reserve disk space before writing large amounts of data into the file), and the operating system then only updates the file system's allocation information.
The biggest advantage of this is that on a hard drive it is done almost instantly, regardless of the desired file size, while the approach shown in the tutorial the OP referred to takes more time for larger files. If the file is on a flash drive it could take some time, because most flash drives use FAT32 partitions, which work a bit differently than NTFS and therefore require more data to be written into the allocation tables.
EDIT: WARNING! Never, and I mean NEVER, set the file size smaller than it currently is! If you do, you will lose data, which probably won't be possible to recover.
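For the VB snippet in the question, a Delphi sketch built on the same idea as this answer: the file is grown by assigning TFileStream.Size, and the helper refuses to shrink it, following the warning above. GrowFile, FileSizeOf and demo.bin are hypothetical names; 20 * 1048576 mirrors the 20 MB target in the VB code.

```pascal
program GrowFileSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: extends FileName to NewSize bytes by setting the stream
// size, without writing the padding bytes ourselves. It refuses to shrink the
// file, since that would discard data.
procedure GrowFile(const FileName: string; NewSize: Int64);
var
  FS: TFileStream;
begin
  FS := TFileStream.Create(FileName, fmOpenReadWrite or fmShareExclusive);
  try
    if NewSize < FS.Size then
      raise Exception.CreateFmt('Refusing to shrink %s from %d to %d bytes',
        [FileName, FS.Size, NewSize]);
    FS.Size := NewSize;
  finally
    FS.Free;
  end;
end;

function FileSizeOf(const FileName: string): Int64;
var
  FS: TFileStream;
begin
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyNone);
  try
    Result := FS.Size;
  finally
    FS.Free;
  end;
end;

begin
  // Create an empty demo file, then grow it to 20 MB as in the VB code.
  TFileStream.Create('demo.bin', fmCreate).Free;
  GrowFile('demo.bin', 20 * 1048576);
  Writeln(FileSizeOf('demo.bin'));  // prints 20971520
end.
```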
This is Delphi 2009, so Unicode applies.
I had some code that was loading strings from a buffer into a StringList as follows:
var
  Buffer: TBytes;
  RecStart, RecEnd: PChar;
  S: string;
FileStream.Read(Buffer[0], Size);
repeat
  ... find next record RecStart and RecEnd that point into the buffer;
  SetString(S, RecStart, RecEnd - RecStart);
  MyStringList.Add(S);
until end of buffer
But during some modifications, I changed my logic so that I ended up adding identical records, but as strings built up separately rather than through SetString, i.e.
var SRecord: string;
repeat
  SRecord := '';
  repeat
    SRecord := SRecord + ... processed line from the buffer;
  until end of record in the buffer
  MyStringList.Add(SRecord);
until end of buffer
What I noticed was the memory use of the StringList went up from 52 MB to about 70 MB. That was an increase of over 30%.
To get back to my lower memory usage, I found I had to use SetString to create the string variable to add to my StringList as follows:
repeat
  SRecord := '';
  repeat
    SRecord := SRecord + ... processed line from the buffer;
  until end of record in the buffer
  SetString(S, PChar(SRecord), length(SRecord));
  MyStringList.Add(S);
until end of buffer
Inspecting and comparing S and SRecord, they are in all cases exactly the same. But adding SRecord to MyStringList uses much more memory than adding S.
Does anyone know what's going on and why the SetString saves memory?
Followup. I didn't think it would, but I checked just to make sure.
Neither:
SetLength(SRecord, length(SRecord));
nor
Trim(SRecord);
releases the excess space. The SetString seems to be required to do so.
If you concatenate strings, the memory manager allocates more memory than needed, because it assumes that you will keep adding text and reserves additional space for future concatenations. This way the allocation size of the string can be much larger than the used size (depending on the memory manager in use). If you use SetString, the allocation size of the new string is almost the same as the used size. And when the SRecord string goes out of scope and its reference count becomes zero, the memory occupied by SRecord is released. So you end up with the smallest allocation size needed for your string.
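The remedy the question arrives at can be wrapped in a small helper. This is a sketch: CompactCopy is a hypothetical name, and how much spare capacity a concatenated string carries depends on the memory manager, so the program only shows that the result is a separate, equal string, not how much memory it saves.

```pascal
program CompactCopySketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Returns a fresh copy of S allocated for exactly Length(S) characters,
// leaving behind any spare capacity S may have accumulated while being
// built up by concatenation.
function CompactCopy(const S: string): string;
begin
  SetString(Result, PChar(S), Length(S));
end;

var
  SRecord: string;
  List: TStringList;
  I: Integer;
begin
  List := TStringList.Create;
  try
    SRecord := '';
    for I := 1 to 100 do
      SRecord := SRecord + 'line ' + IntToStr(I) + ';';  // grows by concatenation
    List.Add(CompactCopy(SRecord));  // store the right-sized copy
    Writeln(List[0] = SRecord);                    // prints TRUE
    Writeln(Pointer(List[0]) = Pointer(SRecord));  // prints FALSE
  finally
    List.Free;
  end;
end.
```

Once SRecord is reassigned or goes out of scope, its oversized buffer is released and only the compact copy remains in the list.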
Try installing a memory manager filter (GetMemoryManager/SetMemoryManager) that passes all GetMem/FreeMem calls to the default memory manager but also gathers statistics. You'll probably see that both variants are equal in memory consumption.
It's just memory fragmentation.
My application builds many objects in memory based on filenames (among other string-based information). I was hoping to optimise memory usage by storing the path and filename separately, and then sharing the path between objects in the same path. I wasn't trying to use a string pool or anything; basically, my objects are sorted, so if I have 10 objects with the same path, I want objects 2-10 to have their path "pointed" at object 1's path (e.g. object[2].Path=object[1].Path).
I have a problem though: I don't believe that my objects actually share a reference to the same string after I think I am telling them to (by the object[2].Path=object[1].Path assignment).
When I do an experiment with a string list and set all the values to point to the first value in the list, I can see the "memory conservation" in action, but when I use objects I see absolutely no change at all. Admittedly, I am only using Task Manager (private working set) to watch for memory use changes.
Here's a contrived example, I hope this makes sense.
I have an object:
TfileObject = class(TObject)
  FpathPart: string;
  FfilePart: string;
end;
Now I create 1,000,000 instances of the object, using a new string for each one:
var
  x: integer;
  MyFilePath: string;
  fo: TfileObject;
begin
  for x := 1 to 1000000 do
  begin
    // create a new string for every iteration of the loop
    MyFilePath := ExtractFilePath(Application.ExeName);
    fo := TfileObject.Create;
    fo.FpathPart := MyFilePath;
    FobjectList.Add(fo);
  end;
end;
Run this and Task Manager says I am using 68MB of memory or something. (Note that if I allocated MyFilePath outside of the loop, I would save memory because there would be only one instance of the string, but this is a contrived example and not how it would actually happen in the app.)
Now I want to "optimise" my memory usage by making all objects share the same instance of the path string, since it's the same value:
var
  x: integer;
begin
  for x := 1 to FobjectList.Count - 1 do
  begin
    TfileObject(FobjectList[x]).FpathPart := TfileObject(FobjectList[0]).FpathPart;
  end;
end;
Task Manager shows absolutely no change.
However if I do something similar with a TstringList:
var
  x: integer;
begin
  for x := 1 to 1000000 do
  begin
    FstringList.Add(ExtractFilePath(Application.ExeName));
  end;
end;
Task Manager says 60MB memory use.
Now optimise with:
var
  x: integer;
begin
  for x := 1 to FstringList.Count - 1 do
    FstringList[x] := FstringList[0];
end;
Task Manager shows the drop in memory usage that I would expect, now 10MB.
So I seem to be able to share strings in a string list, but not in objects. I am obviously missing something conceptually, in code or both!
I hope this makes sense. I can really see the ability to conserve memory using this technique, as I have a lot of objects, all with lots of string information. That data is sorted in many different ways, and I would like to be able to iterate over this data once it is loaded into memory and free some of that memory again by sharing strings in this way.
Thanks in advance for any assistance you can offer.
PS: I am using Delphi 2007 but I have just tested on Delphi 2010 and the results are the same, except that Delphi 2010 uses twice as much memory due to unicode strings...
When your Delphi program allocates and deallocates memory it does this not by using Windows API functions directly, but it goes through the memory manager. What you are observing here is the fact that the memory manager does not release all allocated memory back to the OS when it's no longer needed in your program. It will keep some or all of it allocated for later, to speed up later memory requests in the application. So if you use the system tools the memory will be listed as allocated by the program, but it is not in active use, it is marked as available internally and is stored in lists of usable memory blocks which the MM will use for any further memory allocations in your program, before it goes to the OS and requests more memory.
If you want to really check how any changes to your programs affect the memory consumption you should not rely on external tools, but should use the diagnostics the memory manager provides. Download the full FastMM4 version and use it in your program by putting it as the first unit in the DPR file. You can get detailed information by using the GetMemoryManagerState() function, which will tell you how much small, medium and large memory blocks are used and how much memory is allocated for each block size. For a quick check however (which will be completely sufficient here) you can simply call the GetMemoryManagerUsageSummary() function. It will tell you the total allocated memory, and if you call it you will see that your reassignment of FPathPart does indeed free several MB of memory.
You will observe different behaviour when a TStringList is used and all strings are added sequentially. Memory for these strings will be allocated from larger blocks, and those blocks will contain nothing else, so they can be released again when the string list elements are freed. If, on the other hand, you create your objects, the strings will be allocated alternating with other data elements, so freeing them will create empty memory regions in the larger blocks, but the blocks won't be released because they still contain memory in use for other things. You have basically increased memory fragmentation, which could be a problem in itself.
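As a sketch of measuring from inside the program rather than with Task Manager: the standard System.GetMemoryManagerState procedure (available since Delphi 2006, when FastMM became the default memory manager) fills a TMemoryManagerState record from which the bytes currently allocated can be summed. AllocatedBytes is a hypothetical helper; its result reflects memory in use by the program, not memory the memory manager keeps reserved from the OS.

```pascal
program MemoryStateSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Sums the bytes currently handed out by the memory manager: small blocks
// (per size class), plus medium and large blocks.
function AllocatedBytes: Int64;
var
  State: TMemoryManagerState;
  I: Integer;
begin
  GetMemoryManagerState(State);
  Result := Int64(State.TotalAllocatedMediumBlockSize) +
    Int64(State.TotalAllocatedLargeBlockSize);
  for I := Low(State.SmallBlockTypeStates) to High(State.SmallBlockTypeStates) do
    Inc(Result, Int64(State.SmallBlockTypeStates[I].UseableBlockSize) *
      Int64(State.SmallBlockTypeStates[I].AllocatedBlockCount));
end;

var
  Before: Int64;
  List: TStringList;
  I: Integer;
begin
  Before := AllocatedBytes;
  List := TStringList.Create;
  try
    for I := 1 to 100000 do
      List.Add(IntToStr(I));
    Writeln('Bytes in use for the list: ', AllocatedBytes - Before);
  finally
    List.Free;
  end;
end.
```

Calling AllocatedBytes before and after the FpathPart reassignment loop from the question should show the freed memory even when Task Manager does not.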
As noted by another answer, memory that is not being used is not always immediately released to the system by the Delphi Memory Manager.
Your code guarantees a large quantity of such memory by dynamically growing the object list.
A TObjectList (in common with a TList and a TStringList) uses an incremental memory allocator. A new instance of one of these containers starts with memory allocated for 4 items (the Capacity). When the number of items added exceeds the Capacity additional memory is allocated, initially by doubling the capacity and then once a certain number of items has been reached, by increasing the capacity by 25%.
Each time the Count exceeds the Capacity, additional memory is allocated, the current memory copied to the new memory and the previously used memory released (it is this memory which is not immediately returned to the system).
When you know how many items are to be loaded into one of these types of list you can avoid this memory re-allocation behaviour (and achieve a significant performance improvement) by pre-allocating the Capacity of the list accordingly.
You do not necessarily have to set the precise capacity needed: a best guess (one that is more likely to be near or above the actual figure required) is still going to be better than the initial default capacity of 4 if the number of items is significantly greater than 64.
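A minimal sketch of pre-allocating Capacity, using TObjectList from Contnrs as in the question; ItemCount is an assumed figure standing in for the number of objects you expect to load.

```pascal
program CapacitySketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, Contnrs;

const
  ItemCount = 1000000;  // assumed number of objects to load

var
  List: TObjectList;
  I: Integer;
begin
  List := TObjectList.Create(True);  // owns its items and frees them
  try
    List.Capacity := ItemCount;      // one allocation instead of repeated grow-and-copy
    for I := 1 to ItemCount do
      List.Add(TObject.Create);
    Writeln(List.Count, ' items, capacity ', List.Capacity);
    // prints: 1000000 items, capacity 1000000
  finally
    List.Free;
  end;
end.
```

Because Count never exceeds the preset Capacity, the list's internal array is never reallocated during the loop.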
Because Task Manager does not tell you the whole truth. Compare with this code:
var
  x: integer;
  MyFilePath: string;
  fo: TfileObject;
begin
  MyFilePath := ExtractFilePath(Application.ExeName);
  for x := 1 to 1000000 do
  begin
    fo := TfileObject.Create;
    fo.FpathPart := MyFilePath;
    FobjectList.Add(fo);
  end;
end;
To share a reference, strings need to be assigned directly and be of the same type (Obviously, you can't share a reference between UnicodeString and AnsiString).
The best way I can think of to achieve what you want is as follows:
var
  StrReference: TStringList; // must be Sorted, otherwise Find does not work
function GetStrReference(const S: string): string;
var
  idx: Integer;
begin
  if not StrReference.Find(S, idx) then
    idx := StrReference.Add(S);
  Result := StrReference[idx];
end;
procedure YourProc;
var
  x: integer;
  MyFilePath: string;
  fo: TfileObject;
begin
  for x := 1 to 1000000 do
  begin
    // create a new string for every iteration of the loop
    MyFilePath := GetStrReference(ExtractFilePath(Application.ExeName));
    fo := TfileObject.Create;
    fo.FpathPart := MyFilePath;
    FobjectList.Add(fo);
  end;
end;
To make sure it has worked correctly, you can call the StringRefCount function (unit System). I don't know in which version of Delphi it was introduced, so here's the current implementation:
function StringRefCount(const S: UnicodeString): Longint;
begin
  Result := Longint(S);
  if Result <> 0 then
    Result := PLongint(Result - 8)^;  // 32-bit layout: the count sits 8 bytes before the first character
end;
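A small sketch (Delphi 2009 or later, where StringRefCount is in System) of the distinction this answer relies on: identical contents do not imply a shared buffer; only direct assignment shares one. Demo is a hypothetical procedure; local variables are used so that no compiler temporaries hold extra references.

```pascal
program SharedStringSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils;

procedure Demo;
var
  A, B, C: string;
begin
  A := ExtractFilePath(ParamStr(0));  // built at run time, so heap-allocated
  B := A;                             // direct assignment: B shares A's buffer
  C := ExtractFilePath(ParamStr(0));  // equal text, but a separate allocation
  Writeln(A = C);                     // prints TRUE (same contents)
  Writeln(Pointer(A) = Pointer(B));   // prints TRUE (same buffer)
  Writeln(Pointer(A) = Pointer(C));   // prints FALSE (different buffers)
  Writeln('Ref counts: ', StringRefCount(A), ' / ', StringRefCount(C));
end;

begin
  Demo;
end.
```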
Let me know if it worked as you wanted.
EDIT: If you are afraid of the stringlist growing too big, you can safely scan it periodically and delete from the list any string with a StringRefCount of 1.
The list could also be cleared entirely, but then the function will store a new copy of any string passed to it afterwards.
I have been developing for some time now, and I have not used pointers in my development so far.
So what are the benefits of pointers? Does an application run faster or use fewer resources?
Because I am sure that pointers are important, can you “point” me to some articles, basic but good to start using pointers in Delphi? Google gives me too many, too special results.
A pointer is a variable that points to a piece of memory. The advantages are:
you can give that piece of memory the size you want.
you only have to change a pointer to point to a different piece of memory which saves a lot of time copying.
Delphi uses a lot of hidden pointers. For example, if you are using:
var
  myClass : TMyClass;
begin
  myClass := TMyClass.Create;
myClass is a pointer to the object.
Another example is the dynamic array. This is also a pointer.
To understand more about pointers, you need to understand more about memory. Each piece of data can live in a different area of memory.
For example, global variables:
unit X;
interface
var
  MyVar: Integer;
A global variable is defined in the data segment. The data segment is fixed, and these variables are available during the whole lifetime of the program, which means their memory cannot be used for anything else.
Local variables:
procedure Test;
var
  MyVar: Integer;
A local variable exists on the stack. This is a piece of memory that is used for housekeeping. It contains the parameters for the function (OK, some are passed in registers, but that is not important now). It contains the return address, so the CPU knows where to return when the function has ended. And it contains the local variables used in the function.
Local variables only exist during the lifetime of a function. Once the function has ended, you can't access its local variables in a reliable way.
Heap variables:
procedure Test2;
var
  MyClass: TMyClass;
begin
  MyClass := TMyClass.Create;
The variable MyClass is a pointer (which is a local variable that is defined on the stack). By constructing an object you allocate a piece of memory on the heap (the large piece of 'other' memory that is not used for programs and stacks). The variable MyClass contains the address of this piece of memory.
Heap variables exist until you release them. That means that if you exit the function Test2 without freeing the object, the object still exists on the heap. But you won't be able to access it, because the address (variable MyClass) is gone.
Best practices
It is almost always preferable to allocate and deallocate a pointer variable at the same level.
For example:
var
  myClass: TMyClass;
begin
  myClass := TMyClass.Create;
  try
    DoSomething(myClass);
    DoSomeOtherthing(myClass);
  finally
    myClass.Free;
  end;
end;
If you can, try to avoid functions that return an instance of an object. It is never certain if the caller needs to dispose of the object. And this creates memory leaks or crashes.
You have been given a lot of good answers so far, but starting with the answer that you are already dealing with pointers when you use long strings, dynamic arrays and object references you should start to wonder why you would use pointers, instead of long strings, dynamic arrays and object references. Is there any reason to still use pointers, given that Delphi does a good job hiding them from you, in many cases?
Let me give you two examples of pointer use in Delphi. You will see that this is probably not at all relevant for you if you mostly write business apps. It can however become important if you ever need to use Windows or third party API functions that are not imported by any of the standard Delphi units, and for which no import units in (for example) the JEDI libraries can be found. And it may be the key to achieve that necessary last bit of speed in string-processing code.
Pointers can be used to deal with data types of varying sizes (unknown at compile time)
Consider the Windows bitmap data type. Each image can have different width and height, and there are different formats ranging from black and white (1 bit per pixel) over 2^4, 2^8, 2^16, 2^24 or even 2^32 gray values or colours. That means that it is unknown at compile time how much memory a bitmap will occupy.
In windows.pas there is the TBitmapInfo type:
type
  PBitmapInfo = ^TBitmapInfo;
  tagBITMAPINFO = packed record
    bmiHeader: TBitmapInfoHeader;
    bmiColors: array[0..0] of TRGBQuad;
  end;
  TBitmapInfo = tagBITMAPINFO;
The TRGBQuad element describes a single pixel, but the bitmap does of course contain more than one pixel. Therefore one would never use a local variable of type TBitmapInfo, but always a pointer to it:
var
  BmpInfo: PBitmapInfo;
begin
  // some other code determines width and height...
  ...
  BmpInfo := AllocMem(SizeOf(TBitmapInfoHeader)
    + BmpWidth * BmpHeight * SizeOf(TRGBQuad));
  ...
end;
Now using the pointer you can access all pixels, even though TBitmapInfo does only have a single one. Note that for such code you have to disable range checking.
Stuff like that can of course also be handled with the TMemoryStream class, which is basically a friendly wrapper around a pointer to a block of memory.
And of course it is much easier to simply create a TBitmap and assign its width, height and pixel format. To state it again, the Delphi VCL does eliminate most cases where pointers would otherwise be necessary.
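The same trick works for any record that ends in a variable-length array. In this sketch TIntBlock, NewIntBlock and SumBlock are hypothetical names, not VCL types: the record is declared with a one-element array like TBitmapInfo, allocated with AllocMem for the real number of elements, and indexed with range checking turned off.

```pascal
program VarSizeRecordSketch;

{$APPTYPE CONSOLE}
{$R-}  // Items[I] with I > 0 would otherwise raise ERangeError

uses
  SysUtils;

type
  // Hypothetical header-plus-payload record declared like TBitmapInfo: Items
  // is declared with one element but allocated for Count elements.
  PIntBlock = ^TIntBlock;
  TIntBlock = packed record
    Count: Integer;
    Items: array[0..0] of Integer;
  end;

// Allocates a zero-filled block with room for Count (>= 1) items.
function NewIntBlock(Count: Integer): PIntBlock;
begin
  // SizeOf(TIntBlock) already includes one item.
  Result := AllocMem(SizeOf(TIntBlock) + (Count - 1) * SizeOf(Integer));
  Result^.Count := Count;
end;

function SumBlock(Block: PIntBlock): Int64;
var
  I: Integer;
begin
  Result := 0;
  for I := 0 to Block^.Count - 1 do
    Inc(Result, Block^.Items[I]);
end;

var
  Block: PIntBlock;
  I: Integer;
begin
  Block := NewIntBlock(5);
  try
    for I := 0 to Block^.Count - 1 do
      Block^.Items[I] := I + 1;
    Writeln(SumBlock(Block));  // prints 15
  finally
    FreeMem(Block);
  end;
end.
```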
Pointers to characters can be used to speed up string operations
This is, like most micro optimizations, something to be used only in extreme cases, after you have profiled and found the code using strings to consume much time.
A nice property of strings is that they are reference-counted. Copying them does not copy the memory they occupy, it only increases the reference count instead. Only when the code tries to modify a string which has a reference count greater than 1 will the memory be copied, to create a string with a reference count of 1, which can then safely be modified.
A not-so-nice property of strings is that they are reference-counted. Every operation that could possibly modify the string has to make sure that the reference count is 1, because otherwise modifications to the string would be dangerous. Replacing a character in a string is such a modification. To make sure that the reference count is 1, the compiler adds a call to UniqueString() whenever a character in a string is written to. Now writing n characters of a string in a loop will cause UniqueString() to be called n times, even though after the first call it is assured that the reference count is 1. This means that basically n - 1 calls of UniqueString() are performed unnecessarily.
Using a pointer to the characters is a common way to speed up string operations that involve loops. Imagine you want (for display purposes) to replace all spaces in a string with a small dot. Use the CPU view of the debugger and compare the code executed for this code
function MakeSpacesVisible(const AValue: AnsiString): AnsiString;
var
  i: integer;
begin
  Result := AValue;
  for i := 1 to Length(Result) do begin
    if Result[i] = ' ' then
      Result[i] := #$B7;  // each write triggers a hidden UniqueString() call
  end;
end;
with this code
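function MakeSpacesVisible(const AValue: AnsiString): AnsiString;
var
  P: PAnsiChar;
begin
  Result := AValue;
  UniqueString(Result);  // Result shares AValue's buffer until made unique
  P := PAnsiChar(Result);
  while P[0] <> #0 do begin
    if P[0] = ' ' then
      P[0] := #$B7;
    Inc(P);
  end;
end;
In the second function, UniqueString() is called only once, explicitly, before the address of the first string character is assigned to the char pointer. The explicit call is needed because the PAnsiChar cast by itself does not make the string unique; without it, writing through P would also modify the caller's string.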
You probably have used pointers, but you just don't know it. A class variable is a pointer, a string is a pointer, a dynamic array is a pointer, Delphi just hides it for you. You will see them when you are performing API calls (casting strings to PChar), but even then Delphi can hide a lot.
See Gamecats answer for advantages of pointers.
In this About.com article you can find a basic explanation of pointers in Delphi.
Pointers are necessary for some data structures. The simplest example is a linked list. The advantage of such structures is that you can rearrange elements without moving them in memory. For example, you can have a linked list of large, complex objects and swap any two of them very quickly, because you only have to adjust a few pointers instead of moving the objects themselves.
This applies to many languages including Object Pascal (Delphi).
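A minimal Delphi sketch of that point: each node carries a large payload, but moving a node to the front of a singly linked list only rewrites three pointers. TNode, PNode and MoveToFront are hypothetical names, and the 4 KB byte array merely stands in for a large object.

```pascal
program LinkedListSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  PNode = ^TNode;
  TNode = record
    Payload: array[0..4095] of Byte;  // stands in for a large object
    Value: Integer;
    Next: PNode;
  end;

// Moves the node that follows Prev to the front of the list. Only three
// pointers change; the 4 KB payload is never copied.
procedure MoveToFront(var Head: PNode; Prev: PNode);
var
  Node: PNode;
begin
  Node := Prev^.Next;
  if Node = nil then
    Exit;
  Prev^.Next := Node^.Next;
  Node^.Next := Head;
  Head := Node;
end;

var
  Head, Tail, P: PNode;
  I: Integer;
begin
  // Build the list 1 -> 2 -> 3.
  Head := nil;
  Tail := nil;
  for I := 1 to 3 do
  begin
    New(P);
    P^.Value := I;
    P^.Next := nil;
    if Tail = nil then
      Head := P
    else
      Tail^.Next := P;
    Tail := P;
  end;
  MoveToFront(Head, Head);  // relink node 2 in front of node 1
  P := Head;
  while P <> nil do
  begin
    Write(P^.Value, ' ');  // prints 2 1 3
    P := P^.Next;
  end;
  Writeln;
  // Free the nodes.
  while Head <> nil do
  begin
    P := Head^.Next;
    Dispose(Head);
    Head := P;
  end;
end.
```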