I need to do some searching in the filesystem and would like to present a progress indication.
A rough approximation of that is the number of directories traversed.
function GetSubDirCount(Path : String): integer;
var
  Index : Integer;
  Temp : String;
  SearchRec : TSearchRec;
begin
  Result:= 0;
  Temp:= Path;
  if Path[Length(Path)] <> SysUtils.PathDelim then begin
    Path:= Path + SysUtils.PathDelim;
  end;
  Path:= Path + '*.';
  Index:= FindFirst(Path, faDirectory, SearchRec);
  while Index = 0 do begin
    if (SearchRec.Name = '.') or (SearchRec.Name = '..') then begin
      Index:= FindNext(SearchRec);
      Continue;
    end;
    Inc(Result);
    Result:= Result + GetSubDirCount(Temp + SysUtils.PathDelim + SearchRec.Name);
    Index:= FindNext(SearchRec);
  end;
  FindClose(SearchRec);
end;
I currently use the above code. Is there a faster way?
I'm only interested in the count.
If there's a really fast way to get the number of files as well, that would be a bonus.
You don't say which Delphi version you are using, but recent versions provide the corresponding methods in IOUtils, namely TDirectory.GetDirectories and TDirectory.GetFiles, and I suggest using those.
Update: This is probably not the fastest way to count directories and files, but if the files will be iterated later anyway, you could just as well use the results of these functions for that iteration.
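A minimal sketch of that suggestion, assuming a Delphi version that ships System.IOUtils. The procedure name CountDirsAndFiles is hypothetical. Both calls build complete arrays of path names in memory, so this buys convenience and reusable lists rather than raw speed:

```pascal
program CountWithIOUtils;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Types, System.IOUtils;

// Counts every subdirectory and every file below ARoot, recursively.
// The arrays hold all full path names, so they could be kept and reused
// for a later iteration instead of being thrown away.
procedure CountDirsAndFiles(const ARoot: string; out ADirCount, AFileCount: Integer);
var
  Dirs, Files: TStringDynArray;
begin
  Dirs := TDirectory.GetDirectories(ARoot, '*', TSearchOption.soAllDirectories);
  Files := TDirectory.GetFiles(ARoot, '*', TSearchOption.soAllDirectories);
  ADirCount := Length(Dirs);
  AFileCount := Length(Files);
end;

var
  DirCount, FileCount: Integer;
begin
  CountDirsAndFiles(GetCurrentDir, DirCount, FileCount);
  Writeln(Format('%d directories, %d files', [DirCount, FileCount]));
end.
```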
Minor improvement: use const in the parameter declaration.
For example:
function GetSubDirCount(const Path : String): integer;
As Rob points out, this will not work as Path is modified in the body. I would still use this approach however, and NOT modify path in the body. I'd have a local string var "Suffix", modify that (add optional pathdelim, and '*.'), and pass both to FindFirst:
FindFirst(Path+Suffix, faDirectory, SearchRec);
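A sketch of that idea, with Path kept const and the mask built in a separate local called Suffix, as described above. Two changes from the question's code are my own assumptions about the intent: the mask is '*' instead of '*.', because '*.' skips directories whose names contain a dot, and an entry is only counted when SearchRec.Attr really has faDirectory set, because FindFirst also returns plain files when faDirectory is requested. Like the original, it does not guard against directory junctions that point back up the tree:

```pascal
program SubDirCount;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Recursively counts the subdirectories below Path.
// Path stays const; the search mask lives in the local Suffix.
function GetSubDirCount(const Path: string): Integer;
var
  Suffix: string;
  SearchRec: TSearchRec;
begin
  Result := 0;
  // Add the delimiter only if Path does not already end with one
  if (Path <> '') and (Path[Length(Path)] = PathDelim) then
    Suffix := '*'
  else
    Suffix := PathDelim + '*';
  if FindFirst(Path + Suffix, faDirectory, SearchRec) = 0 then
  try
    repeat
      // FindFirst also returns plain files, so test the attribute explicitly
      if ((SearchRec.Attr and faDirectory) <> 0) and
         (SearchRec.Name <> '.') and (SearchRec.Name <> '..') then
        Inc(Result, 1 + GetSubDirCount(IncludeTrailingPathDelimiter(Path) + SearchRec.Name));
    until FindNext(SearchRec) <> 0;
  finally
    FindClose(SearchRec);
  end;
end;

begin
  Writeln(GetSubDirCount(GetCurrentDir));
end.
```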
Since the Windows code takes up most of the time, I suggest you apply the fixes suggested by other respondents, and update your code to use threads if you feel comfortable with that:
As soon as you retrieve a subdirectory, add it to a (thread-safe) list
Have a thread look at that list and spawn worker threads to do the actual file processing per directory
Update your progress all the time: number of dirs found/handled. This will be a bit wobbly in the beginning, but at least you can start working while Windows is still 'finding'
The 'usual' warnings apply:
don't make your number of threads too large
if your file processing creates new files make sure your find routines don't choke on the new output files
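A compressed sketch of that pipeline under several assumptions: TThreadedQueue<string> from System.Generics.Collections serves as the thread-safe list, one background thread does the 'finding', and the main thread is the only consumer (the answer suggests a pool of worker threads, which would pop from the same queue). The empty string is a hypothetical end-of-scan marker, and the actual file processing is left as a comment:

```pascal
program ThreadedDirScan;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs, System.Generics.Collections;

var
  DirQueue: TThreadedQueue<string>;
  DirsFound: Integer;

// Producer: walks the tree and pushes every directory it finds.
procedure FindDirs(const APath: string);
var
  SR: TSearchRec;
  Base: string;
begin
  Base := IncludeTrailingPathDelimiter(APath);
  if FindFirst(Base + '*', faDirectory, SR) = 0 then
  try
    repeat
      if ((SR.Attr and faDirectory) <> 0) and (SR.Name <> '.') and (SR.Name <> '..') then
      begin
        TInterlocked.Increment(DirsFound);
        DirQueue.PushItem(Base + SR.Name);  // blocks while the queue is full
        FindDirs(Base + SR.Name);
      end;
    until FindNext(SR) <> 0;
  finally
    FindClose(SR);
  end;
end;

var
  Finder: TThread;
  Root, Dir: string;
  DirsHandled: Integer;
begin
  Root := GetCurrentDir;
  DirQueue := TThreadedQueue<string>.Create(1000);  // queue depth
  try
    Finder := TThread.CreateAnonymousThread(
      procedure
      begin
        try
          FindDirs(Root);
        finally
          DirQueue.PushItem('');  // end-of-scan marker, pushed even on error
        end;
      end);
    Finder.FreeOnTerminate := False;  // so we can WaitFor it below
    Finder.Start;

    DirsHandled := 0;
    repeat
      Dir := DirQueue.PopItem;  // waits until an item is available
      if Dir = '' then
        Break;
      // ... process the files in Dir here ...
      Inc(DirsHandled);
      // "found" keeps growing while the finder runs, hence the wobble
      Writeln(Format('%d handled / %d found',
        [DirsHandled, TInterlocked.CompareExchange(DirsFound, 0, 0)]));
    until False;

    Finder.WaitFor;
    Finder.Free;
  finally
    DirQueue.Free;
  end;
end.
```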
I need to wait until a mapped network folder (\\HostName\NetworkPath) becomes empty. In other words, program flow cannot continue until that network folder is empty.
So far I have the following logic in place, but I noticed that it takes a while before FindFirst notices that the network folder has become empty.
If I keep an Explorer window open on that network folder, I can see it become empty well before FindFirst notices it.
I used Sleep(5000) to introduce a delay before calling CheckNetworkFolderIsEmpty again in my while loop; otherwise it is called too often. But the folder may become empty well before 5 seconds have passed, so 5 seconds is an arbitrary delay that can hold up program execution unnecessarily.
What could be the culprit, and what would be a better alternative?
Also, I do not know what to use instead of a simple Sleep.
while not CheckNetworkFolderIsEmpty do begin
  Sleep(5000);
end;
function TForm1.CheckNetworkFolderIsEmpty: Boolean;
begin
  Result := (CountFilesInFolder('\\HostName\NetworkPath', '*.txt') = 0);
end;
function CountFilesInFolder(const aPath, aFileMask: string): Integer;
var
  Path: string;
  SearchRec: TSearchRec;
begin
  Path := IncludeTrailingPathDelimiter(aPath);
  Result := 0;
  if FindFirst(Path + aFileMask, faAnyFile and not faDirectory, SearchRec) = 0 then begin
    repeat
      Inc(Result);
    until FindNext(SearchRec) <> 0;
    FindClose(SearchRec);
  end;
end;
Observing file system changes the way you do (FindFirst, FindNext) is inefficient and, as you've learned, inaccurate. Windows provides the API function FindFirstChangeNotification for that purpose, as J... has pointed out in the comment under your question.
The good news is that you don't need to study the API from scratch, because other people have done the hard work for you. Check out some freeware Delphi wrappers around the API:
https://torry.net/pages.php?id=252
http://www.angusj.com/delphi/dirwatch.html
...
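For reference, a bare-bones sketch of using the API directly instead of Sleep, reusing the question's CountFilesInFolder. WaitUntilFolderEmpty is a hypothetical name. Whether notifications for a network share arrive promptly depends on the server, so the sketch keeps a timeout as a fallback that forces a recount. In a VCL application it should run in a background thread, since it blocks:

```pascal
program WaitForEmptyFolder;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils;

// Same counting routine as in the question.
function CountFilesInFolder(const aPath, aFileMask: string): Integer;
var
  Path: string;
  SearchRec: TSearchRec;
begin
  Path := IncludeTrailingPathDelimiter(aPath);
  Result := 0;
  if FindFirst(Path + aFileMask, faAnyFile and not faDirectory, SearchRec) = 0 then
  begin
    repeat
      Inc(Result);
    until FindNext(SearchRec) <> 0;
    FindClose(SearchRec);
  end;
end;

// Blocks until no file matching AFileMask is left in AFolder. It recounts
// when Windows signals a file-name change, or after ASafetyTimeoutMs.
procedure WaitUntilFolderEmpty(const AFolder, AFileMask: string; ASafetyTimeoutMs: DWORD);
var
  Notification: THandle;
begin
  // Register BEFORE the first count, so a deletion in between is not lost
  Notification := FindFirstChangeNotification(PChar(AFolder), False,
    FILE_NOTIFY_CHANGE_FILE_NAME);
  if Notification = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  try
    while CountFilesInFolder(AFolder, AFileMask) > 0 do
      case WaitForSingleObject(Notification, ASafetyTimeoutMs) of
        WAIT_OBJECT_0:
          // Re-arm the notification before the next recount
          if not FindNextChangeNotification(Notification) then
            RaiseLastOSError;
        WAIT_TIMEOUT:
          ; // nothing signalled, just recount
      else
        RaiseLastOSError;
      end;
  finally
    FindCloseChangeNotification(Notification);
  end;
end;

begin
  WaitUntilFolderEmpty('\\HostName\NetworkPath', '*.txt', 5000);
  Writeln('Folder is empty');
end.
```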
Earlier today I asked a question here about whether my method of scanning files on the computer was correct. I received a few tips in response, and the one I thought needed to be solved urgently was about running out of memory, since I was reading the files entirely into memory. So I started trying to find a way to read the files piece by piece, and I came up with something (wrong/bogus) that I need some help to get right.
For now, the method is as simple as this:
procedure ScanFile(FileName: string);
const
  MAX_SIZE = 100*1024*1024;
var
  i, aux, ReadLimit: integer;
  MyFile: TFileStream;
  Target: AnsiString;
  PlainText: String;
  Buff: array of byte;
  TotalSize: Int64;
begin
  if (POS('.exe', FileName) = 0) and (POS('.dll', FileName) = 0) and
    (POS('.sys', FileName) = 0) then //yeah I know it's not the best way...
  begin
    try
      MyFile:= TFileStream.Create(FileName, fmOpenRead);
    except on E: EFOpenError do
      MyFile:= NIL;
    end;
    if MyFile <> NIL then
    try
      TotalSize:= MyFile.Size;
      while TotalSize > 0 do begin
        ReadLimit:= Min(TotalSize, MAX_SIZE);
        SetLength(Buff, ReadLimit);
        MyFile.ReadBuffer(Buff[0], ReadLimit);
        PlainText:= RemoveNulls(Buff); //this transforms the array of bytes into a string, I posted the code below too...
        for i:= 1 to Length(PlainText) do
        begin //Begin the search..
        end;
        dec(TotalSize, ReadLimit);
      end;
    finally
      MyFile.Free;
    end;
  end;
end;
Code for RemoveNulls is:
function RemoveNulls(const Buff: array of byte): String;
var
  i: integer;
begin
  for i:= 0 to Length(Buff) do
  begin
    if Buff[i] <> 0 then
      Result:= Result + Chr(Ord(Buff[i]));
  end;
end;
OK, the problems I have run into with this code so far are:
1- Each time the while loop repeats, more memory is consumed, when I was expecting a maximum of 100MB as set by the MAX_SIZE constant, right?
2- I created a file with 2 occurrences of what should be filtered, and for some unknown reason I got about 10 repeated occurrences; it looks like I'm scanning the file repeatedly.
I appreciate your help, and if someone already has this kind of code written, please post it here; I don't intend to reinvent the wheel...
I'd say that RemoveNulls is your problem. Suppose that you just read 100MB into the buffer that you passed to RemoveNulls. You would then allocate a string of length 1, then reallocate it to length 2, then to length 3, then to length 4, and so on, all the way to length 100*1024*1024.
That process will fragment your memory, as well as being appallingly slow. Heap allocation is to be avoided when performance matters. You've no need for it at all. Read a chunk of the file, and search directly in the buffer that you read.
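If a null-stripped copy is still wanted for some reason, a sketch that allocates the result once (using the buffer length as an upper bound), fills it, and shrinks it once at the end avoids the repeated reallocation described above. It also loops from Low to High, so it does not read past the buffer. It assumes 1-based strings, as on the desktop compilers:

```pascal
program RemoveNullsDemo;

{$APPTYPE CONSOLE}

// Copies every non-zero byte of Buff into a string.
// One allocation up front, one shrink at the end, no per-character growth.
function RemoveNulls(const Buff: array of Byte): string;
var
  I, N: Integer;
begin
  SetLength(Result, Length(Buff));  // upper bound: no byte removed
  N := 0;
  for I := Low(Buff) to High(Buff) do  // High, not Length: no overrun
    if Buff[I] <> 0 then
    begin
      Inc(N);
      Result[N] := Char(Buff[I]);  // same byte-to-char mapping as Chr(Ord(...))
    end;
  SetLength(Result, N);  // shrink once to the real length
end;

begin
  Writeln(RemoveNulls([72, 0, 105, 0, 0]));  // prints "Hi"
end.
```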
There are various problems with your code that I can see:
Your file extension check is broken, as I described in your previous question.
You are not handling exceptions correctly, as I described in your previous question.
Your for loop in RemoveNulls has a buffer overrun: it runs up to Length(Buff), one past the last element. Loop from Low() to High() instead.
It's not possible to comment on the search code since that's not present in the question.
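A sketch of searching directly in the buffer, assuming the goal is to count occurrences of a byte pattern. CountOccurrencesInFile and the naive inner comparison are my own choices. The buffer is allocated once and reused for every chunk, so memory use stays flat at roughly the chunk size, and the last Length(APattern) - 1 bytes of each chunk are carried over so that a match straddling two chunks is found exactly once:

```pascal
program ChunkedSearch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Math;

function CountOccurrencesInFile(const AFileName: string; const APattern: RawByteString;
  AChunkSize: Integer = 1024 * 1024): Int64;
var
  Stream: TFileStream;
  Buf: TBytes;
  PatLen, Overlap, BytesRead, BytesInBuf, I, J: Integer;
  Match: Boolean;
begin
  Result := 0;
  PatLen := Length(APattern);
  if PatLen = 0 then
    Exit;
  AChunkSize := Max(AChunkSize, PatLen);
  // Allocated once: room for one chunk plus the carried-over tail
  SetLength(Buf, AChunkSize + PatLen - 1);
  Overlap := 0;
  Stream := TFileStream.Create(AFileName, fmOpenRead or fmShareDenyWrite);
  try
    repeat
      BytesRead := Stream.Read(Buf[Overlap], AChunkSize);
      BytesInBuf := Overlap + BytesRead;
      // Compare at every position where a full match still fits
      for I := 0 to BytesInBuf - PatLen do
      begin
        Match := True;
        for J := 0 to PatLen - 1 do
          if Buf[I + J] <> Byte(APattern[J + 1]) then
          begin
            Match := False;
            Break;
          end;
        if Match then
          Inc(Result);
      end;
      // Keep the tail: a match may start there and end in the next chunk
      Overlap := Min(PatLen - 1, BytesInBuf);
      if Overlap > 0 then
        Move(Buf[BytesInBuf - Overlap], Buf[0], Overlap);
    until BytesRead = 0;
  finally
    Stream.Free;
  end;
end;

begin
  if ParamCount >= 2 then
    Writeln(CountOccurrencesInFile(ParamStr(1), RawByteString(ParamStr(2))));
end.
```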
I wrote a Delphi debug visualizer for TDataSet that displays the values of the current row (source and screenshot: http://delphi.netcode.cz/text/tdataset-debug-visualizer.aspx). It works, but it is very slow. I did some optimization (in how the field names are retrieved), but it still takes 10 seconds to show only 20 fields, which is very bad.
The main problem seems to be the slow IOTAThread90.Evaluate used by the code shown below. This procedure takes most of the time, and the line marked with ** accounts for about 80% of it. FExpression is the name of the TDataSet in the debugged code.
procedure TDataSetViewerFrame.mFillData;
var
  iCount: Integer;
  I: Integer;
  // sw: TStopwatch;
  s: string;
begin
  // sw := TStopwatch.StartNew;
  iCount := StrToIntDef(Evaluate(FExpression+'.Fields.Count'), 0);
  for I := 0 to iCount - 1 do
  begin
    s:= s + Format('%s.Fields[%d].FieldName+'',''+', [FExpression, I]);
    // FFields.Add(Evaluate(Format('%s.Fields[%d].FieldName', [FExpression, I])));
    FValues.Add(Evaluate(Format('%s.Fields[%d].Value', [FExpression, I]))); //**
  end;
  if s<> '' then
    Delete(s, length(s)-4, 5);
  s := Evaluate(s);
  s:= Copy(s, 2, Length(s) -2);
  FFields.CommaText := s;
  { sw.Stop;
  s := sw.Elapsed;
  Application.MessageBox(Pchar(s), '');}
end;
Now I have no idea how to improve performance.
That Evaluate needs to do a surprising amount of work. The compiler needs to compile the expression, resolving symbols to memory addresses, while evaluating properties may cause functions to be called, which requires the debugger to copy the arguments across into the debuggee, set up a stack frame, invoke the function, and collect the results, and all of this involves pausing and resuming the debuggee.
I can only suggest trying to pack more work into the Evaluate call. I'm not 100% sure how the interaction between the debugger and the evaluator (which is part of the compiler) works for these visualizers, but batching up as much work as possible may help. Try building up a more complicated expression before calling Evaluate after the loop. You may need to use some escaping or delimiting convention to unpack the results. For example, imagine what an expression that built the list of field values and returned them as a comma separated string would look like - but you would need to escape commas in the values themselves.
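A sketch of that batching for the field values, modelled on the question's own trick for the field names. Assumptions: the evaluator accepts string concatenation of AsString properties (as it does for FieldName in the question), it returns the result as a single quoted string (which the question's Copy(s, 2, Length(s) - 2) implies), and the hypothetical delimiter cDelim never occurs in the data. The escaping mentioned above is not done here:

```pascal
program BatchedEvaluate;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

const
  cDelim: Char = '|';  // assumed never to appear inside a field value

// Builds ONE expression such as
//   DS.Fields[0].AsString+'|'+DS.Fields[1].AsString
// so that the debugger evaluates all values in a single call.
function BuildValuesExpression(const ADataSetExpr: string; AFieldCount: Integer): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to AFieldCount - 1 do
  begin
    if I > 0 then
      Result := Result + '+''' + cDelim + '''+';
    Result := Result + Format('%s.Fields[%d].AsString', [ADataSetExpr, I]);
  end;
end;

// Unpacks the evaluator's reply, assumed to look like 'a|b|c' (quoted).
procedure SplitEvaluatedResult(const AReply: string; AValues: TStrings);
var
  S, Part: string;
begin
  AValues.Clear;
  S := AnsiDequotedStr(AReply, '''');
  for Part in S.Split([cDelim]) do
    AValues.Add(Part);
end;

var
  Values: TStringList;
begin
  Writeln(BuildValuesExpression('DS', 2));
  Values := TStringList.Create;
  try
    SplitEvaluatedResult('''a|b|c''', Values);
    Writeln(Values.Count);
  finally
    Values.Free;
  end;
end.
```

In the visualizer, the result of BuildValuesExpression(FExpression, iCount) would go to a single Evaluate call, and SplitEvaluatedResult would fill FValues from the reply.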
Because the Delphi IDE runs in a different process than your debugged exe, you cannot directly use the memory pointers of your exe, so you need to use .Evaluate for everything.
You can use 2 different approaches:
Add a special debug dump function to the executable, which retrieves all values in one call
Inject a special DLL into the exe which does the same as 1 (more hacking etc.)
I got option 1 working; 2 should also be possible but is a little more complicated and "ugly" because of the hacking involved...
With the code below (just add the unit to your .dpr), you can use:
Result := 'Dump=' + Evaluate('TObjectDumper.SpecialDump(' + FExpression + ')');
Demo code for option 1; change it for your TDataSet (maybe return a CSV string of all values?):
unit Unit1;
interface
type
  TObjectDumper = class
  public
    class function SpecialDump(aObj: TObject): string;
  end;
implementation
class function TObjectDumper.SpecialDump(aObj: TObject): string;
begin
  Result := '';
  if aObj <> nil then
    Result := 'Special dump: ' + aObj.Classname;
end;
initialization
  //dummy call, just to ensure it is linked, i.e. used by the compiler
  TObjectDumper.SpecialDump(nil);
end.
Edit: in case someone is interested: I got option 2 working too (bpl injection)
I have not had a chance to play with the debug visualizers yet, so I do not know if this works, but have you tried using Evaluate() to convert FExpression into its actual memory address? If you can do that, then type-cast that memory address to a TDataSet pointer and use its properties normally without going through additional Evaluate() calls. For example:
procedure TDataSetViewerFrame.mFillData;
var
  DS: TDataSet;
  I: Integer;
  // sw: TStopwatch;
begin
  // sw := TStopwatch.StartNew;
  DS := TDataSet(StrToInt(Evaluate(FExpression))); // this line may need tweaking
  for I := 0 to DS.Fields.Count - 1 do
  begin
    with DS.Fields[I] do begin
      FFields.Add(FieldName);
      FValues.Add(VarToStr(Value));
    end;
  end;
  {
  sw.Stop;
  s := sw.Elapsed;
  Application.MessageBox(Pchar(s), '');
  }
end;
I have to write a Unix-compatible Windows Delphi routine that confirms whether a file name exists in the file system in exactly the same case as requested, e.g. that "John.txt" is there, not "john.txt".
If I check FileExists('john.txt'), it is always true for John.txt and JOHN.TXT, because Windows file names are case-insensitive.
How can I create a FileExistsCaseSensitive(myfile) function to confirm that a file is really what it's supposed to be?
Delphi's SysUtils.FileExists uses the following function to check whether the file is there. How can I change it to also check that the file name on the file system has the expected case (e.g. lowercase) and exists?
function FileAge(const FileName: string): Integer;
var
  Handle: THandle;
  FindData: TWin32FindData;
  LocalFileTime: TFileTime;
begin
  Handle := FindFirstFile(PChar(FileName), FindData);
  if Handle <> INVALID_HANDLE_VALUE then
  begin
    Windows.FindClose(Handle);
    if (FindData.dwFileAttributes and FILE_ATTRIBUTE_DIRECTORY) = 0 then
    begin
      FileTimeToLocalFileTime(FindData.ftLastWriteTime, LocalFileTime);
      if FileTimeToDosDateTime(LocalFileTime, LongRec(Result).Hi,
        LongRec(Result).Lo) then Exit;
    end;
  end;
  Result := -1;
end;
function FileExistsEx(const FileName: string): Integer;
var
  Handle: THandle;
  FindData: TWin32FindData;
  LocalFileTime: TFileTime;
begin
  Handle := FindFirstFile(PChar(FileName), FindData);
  if Handle <> INVALID_HANDLE_VALUE then
  begin
    Windows.FindClose(Handle);
    if (FindData.dwFileAttributes and FILE_ATTRIBUTE_DIRECTORY) = 0 then
    begin
      FileTimeToLocalFileTime(FindData.ftLastWriteTime, LocalFileTime);
      if FileTimeToDosDateTime(LocalFileTime, LongRec(Result).Hi, LongRec(Result).Lo) then
        if AnsiSameStr(FindData.cFileName, ExtractFileName(FileName)) then Exit;
    end;
  end;
  Result := -1;
end;
Tom, I'm also intrigued by your use case. I tend to agree with Motti that it would be counter-intuitive and might strike your users as odd.
On Windows, file names are not case-sensitive, so I don't see what you can gain from treating them as if they were.
In any case, you can't have two files named "John.txt" and "john.txt" in the same folder, and failing to find "John.txt" when "john.txt" exists will probably leave your users very puzzled.
Trying to enforce case sensitivity in this context is unintuitive, and I can't see a viable use case for it (if you have one, I'll be happy to hear what it is).
I dealt with this issue a while back, and even if I'm sure that there are neater solutions out there, I just ended up doing an extra check to see if the given filename was equal to the name of the found file, using the case sensitive string comparer...
I ran into a similar problem using Java. Ultimately I ended up pulling up a list of the directory's contents (which loaded the correct case of filenames for each file) and then doing string compare on the filenames of each of the files.
It's an ugly hack, but it worked.
Edit: I tried doing what Banang describes, but in Java at least, if you open the file "a.txt", your program will stubbornly report it as "a.txt" even if the underlying file system names it "A.txt".
You can implement the approach mentioned by Kris using Delphi's FindFirst and FindNext routines.
See this article
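A minimal sketch of that comparison with FindFirst, assuming only the last path component needs to match in case (the directory part is not checked) and that the file name contains no wildcards. FileExistsCaseSensitive is the name from the question; Windows reports the stored spelling in SearchRec.Name, and SameStr compares case-sensitively:

```pascal
program CaseSensitiveExists;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// True only if a file exists whose stored name matches ExtractFileName(AFileName)
// exactly, including letter case. The directory part of the path is not checked.
function FileExistsCaseSensitive(const AFileName: string): Boolean;
var
  SR: TSearchRec;
begin
  Result := False;
  // faDirectory is not requested, so directories are skipped
  if FindFirst(AFileName, faAnyFile and not faDirectory, SR) = 0 then
  try
    Result := SameStr(SR.Name, ExtractFileName(AFileName));
  finally
    FindClose(SR);
  end;
end;

begin
  if ParamCount > 0 then
    Writeln(FileExistsCaseSensitive(ParamStr(1)));
end.
```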
I am loading a file into an array in binary form, and this seems to take a while.
Is there a better, faster, more efficient way to do this?
I am using a similar method for writing back to the file.
procedure openfile(fname:string);
var
  myfile: file;
  filesizevalue,i:integer;
begin
  assignfile(myfile,fname);
  filesizevalue:=GetFileSize(fname); //my method
  SetLength(dataarray, filesizevalue);
  i:=0;
  Reset(myFile, 1);
  while not Eof(myFile) do
  begin
    BlockRead(myfile,dataarray[i], 1);
    i:=i+1;
  end;
  CloseFile(myfile);
end;
If you really want to read a binary file fast, let Windows worry about buffering ;-) by using memory-mapped files. With them you can simply map a file to a memory location and read it as if it were an array.
Your function would become:
procedure openfile(fname:string);
var
  InputFile: TMappedFile;
begin
  InputFile := TMappedFile.Create;
  try
    InputFile.MapFile(fname);
    SetLength(dataarray, InputFile.Size);
    Move(PByteArray(InputFile.Content)[0], dataarray[0], InputFile.Size);
  finally
    InputFile.Free;
  end;
end;
But I would suggest not using the global variable dataarray, but either pass it as a var in the parameter, or use a function which returns the resulting array.
// TBytes is a dynamic array; SysUtils.TByteArray is a fixed-size static array, so SetLength cannot be used on it
procedure ReadBytesFromFile(const AFileName : String; var ADestination : TBytes);
var
  InputFile : TMappedFile;
begin
  InputFile := TMappedFile.Create;
  try
    InputFile.MapFile(AFileName);
    SetLength(ADestination, InputFile.Size);
    Move(PByteArray(InputFile.Content)[0], ADestination[0], InputFile.Size);
  finally
    InputFile.Free;
  end;
end;
The TMappedFile is from my article Fast reading of files using Memory Mapping, this article also contains an example of how to use it for more "advanced" binary files.
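TMappedFile comes from that article and is not part of the RTL. For readers without it, here is a rough equivalent built on the Win32 calls such a class would presumably wrap (CreateFile, CreateFileMapping, MapViewOfFile); this is my own sketch, not the article's code, and ReadBytesMapped is a hypothetical name. Copying the view into a TBytes, as the answer does, gives up part of the benefit, since the data could also be read directly through the mapped pointer:

```pascal
program MappedRead;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils;

// Maps the whole file read-only and copies it into a dynamic byte array.
function ReadBytesMapped(const AFileName: string): TBytes;
var
  FileHandle, MapHandle: THandle;
  Size: Int64;
  View: Pointer;
begin
  Result := nil;
  FileHandle := CreateFile(PChar(AFileName), GENERIC_READ, FILE_SHARE_READ, nil,
    OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if FileHandle = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  try
    Size := FileSeek(FileHandle, Int64(0), 2);  // origin 2 = end: returns the size
    if Size < 0 then
      RaiseLastOSError;
    if Size = 0 then
      Exit;  // an empty file cannot be mapped
    MapHandle := CreateFileMapping(FileHandle, nil, PAGE_READONLY, 0, 0, nil);
    if MapHandle = 0 then
      RaiseLastOSError;
    try
      View := MapViewOfFile(MapHandle, FILE_MAP_READ, 0, 0, 0);
      if View = nil then
        RaiseLastOSError;
      try
        SetLength(Result, Size);
        Move(View^, Result[0], Size);  // the OS pages the file in on demand
      finally
        UnmapViewOfFile(View);
      end;
    finally
      CloseHandle(MapHandle);
    end;
  finally
    CloseHandle(FileHandle);
  end;
end;

begin
  if ParamCount > 0 then
    Writeln(Length(ReadBytesMapped(ParamStr(1))), ' bytes');
end.
```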
You generally shouldn't read files byte for byte. Use BlockRead with a larger value (512 or 1024 often are best) and use its return value to find out how many bytes were read.
If the size isn't too large (and your use of SetLength seems to support this), you can also use one BlockRead call reading the complete file at once. So, modifying your approach, this would be:
AssignFile(myfile,fname);
filesizevalue := GetFileSize(fname);
Reset(myFile, 1);
SetLength(dataarray, filesizevalue);
BlockRead(myFile, dataarray[0], filesizevalue);
CloseFile(myfile);
Perhaps you could also change the procedure to a boolean function named OpenAndReadFile and return false if the file couldn't be opened or read.
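It depends on the file format. If it consists of several identical records, you can read it in units of that record type.
For example:
type
  TMyRecord = record
    fieldA: integer;
    ..
  end;
const
  cRecsPerRead = 100; // records fetched per BlockRead call
var
  F: file; // untyped file: BlockRead does not accept typed files, and "file" is a reserved word
  dataarray: array of TMyRecord;
  i, NumRead: Integer;
begin
  AssignFile(F, filename);
  Reset(F, SizeOf(TMyRecord)); // one "block" = one record
  try
    SetLength(dataarray, FileSize(F)); // FileSize counts records here
    i := 0;
    while i < Length(dataarray) do begin
      BlockRead(F, dataarray[i], Min(cRecsPerRead, Length(dataarray) - i), NumRead); // Min is from the Math unit
      if NumRead = 0 then Break;
      Inc(i, NumRead);
    end;
  finally
    CloseFile(F);
  end;
end;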
If it's a long enough file that reading it this way takes a noticeable amount of time, I'd use a stream instead. The block read will be a lot faster, and there are no loops to worry about. Something like this:
procedure openfile(fname:string);
var
  myfile: TFileStream;
  filesizevalue:integer;
begin
  filesizevalue:=GetFileSize(fname); //my method
  SetLength(dataarray, filesizevalue);
  myFile := TFileStream.Create(fname, fmOpenRead or fmShareDenyWrite); // Create requires an open mode
  try
    myFile.seek(0, soFromBeginning);
    if filesizevalue > 0 then
      myFile.ReadBuffer(dataarray[0], filesizevalue);
  finally
    myFile.free;
  end;
end;
It appears from your code that your record size is 1 byte long. If not, then change the read line to:
myFile.ReadBuffer(dataarray[0], filesizevalue * SIZE);
or something similar.
Look for a buffered TStream descendant. It will make your code a lot faster, because the disk is read in large blocks, while you can still loop through the buffer easily. There are various ones available, or you can write your own.
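Recent Delphi versions (10.1 Berlin and later, as far as I know) ship such a class in the RTL, System.Classes.TBufferedFileStream. A sketch of the question's byte-at-a-time loop rewritten on top of it; ReadBytesBuffered is a hypothetical name and the 64 KB buffer size is an arbitrary choice. Each one-byte Read is normally served from the stream's internal buffer rather than from the disk:

```pascal
program BufferedRead;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Byte-by-byte loop as in the question, but through a buffered stream:
// the disk is read in 64 KB chunks behind the scenes.
function ReadBytesBuffered(const AFileName: string): TBytes;
var
  Stream: TBufferedFileStream;
  B: Byte;
  Count: Integer;
begin
  Stream := TBufferedFileStream.Create(AFileName, fmOpenRead or fmShareDenyWrite, 64 * 1024);
  try
    SetLength(Result, Stream.Size);
    Count := 0;
    while (Count < Length(Result)) and (Stream.Read(B, 1) = 1) do
    begin
      Result[Count] := B;
      Inc(Count);
    end;
    SetLength(Result, Count);  // in case the file was shorter than reported
  finally
    Stream.Free;
  end;
end;

begin
  if ParamCount > 0 then
    Writeln(Length(ReadBytesBuffered(ParamStr(1))), ' bytes');
end.
```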
If you're feeling very bitheaded, you can bypass Win32 altogether and call the NT Native API function ZwOpenFile() which in my informal testing does shave a tiny bit off. Otherwise, I'd use Davy's Memory Mapped File solution above.