I have a directory with 14,000 files. 20-50 files will be added to this directory every day.
I want to perform an action on all new files placed in this directory and I only want this action to be performed once per file.
I already made a routine and it works, but it really sucks.
It goes like this:
1. Get all files in the directory into a Listbox.
2. Load a text file listing all processed files into another Listbox.
3. Compare the Listboxes and extract all new files.
4. Perform the action on the new files.
5. Save the text file with all processed files.
This is the code for no. 3:
for i := 0 to FileListDir.Items.Count - 1 do
  if FileListHandled.Items.IndexOf(FileListDir.Items[i]) = -1 then
    FilesNeedHandling.Items.Add(FileListDir.Items[i]);
The routine takes about 30-35 seconds to complete.
2 Questions:
1. How can I make my routine faster?
2. Is it possible to get only the "new" files in the directory without using my routine?
The "find" command can be used for this.
find <path of directory> -mtime -1 -print
This prints the files under the given directory that were modified within the past day.
find <path of directory> -mtime -1 -exec <command> {} \;
This finds the files modified within the past day and runs the given command on each of the files
find <path of directory> -mmin -60 -exec <command> {} \;
This handles files modified within the last 60 minutes.
Okay, this is not Python but Delphi (the tag was removed?)
I had to rethink it all, again. And this time I succeeded.
I wrote my own routine to load files into a listbox and made a bubble sort on it.
After that I could extract the filenames not handled.
Test scenario: 4628 Files
Test profiling goes like this:
Load filelist into:
TFileListBox Time: 2,24
TFileListBoxEx Time: 1,52
TJvFileListBox Time: 59,28 (Wtf is wrong with the JVCL libraries?)
Own routine to load files with datetime info and bubble sort into:
TListBox Time: 1,61
// Get files list
if FindFirst(C_MailIncomingDir+'\*.eml', faAnyFile, Rec) = 0 then
try
  repeat
    SetLength(FileList, Length(FileList) + 1);
    SetLength(DateList, Length(DateList) + 1);
    FileList[High(FileList)]:= Rec.Name;
    DateList[High(DateList)]:= FileDateToDateTime(Rec.Time);
  until FindNext(Rec) <> 0;
finally
  // Only release the search handle if FindFirst succeeded
  FindClose(Rec);
end;
// Sort by date (bubble sort)
repeat
  Done:= True;
  for i:= 0 to High(FileList) - 1 do
    if DateList[i] > DateList[i + 1] then
    begin
      Done:= False;
      TempName:= FileList[i];
      FileList[i]:= FileList[i + 1];
      FileList[i + 1]:= TempName;
      TempDate:= DateList[i];
      DateList[i]:= DateList[i + 1];
      DateList[i + 1]:= TempDate;
    end;
until Done;
// Show in list
FilesInDir.Clear;
for i:= 0 to High(FileList) do
  FilesInDir.Items.Add(FileList[i] + ' ' + DateTimeToStr(DateList[i]));
By sorting the files by datetime, it is possible to diff out all files not already handled with this line (0,06 instead of 10 secs for 4500 files):
for i := FileListHandled.Count to FilesInDir.Count-1 do
  FilesNeedHandling.Items.Add(FilesInDir.Items[i]);
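The date-sorted diff above relies on the handled files always being the oldest entries in the directory. As an alternative that does not depend on ordering, the slow listbox comparison from the question can be replaced by a hash-based set lookup. The following is only a language-neutral sketch in Python, not Delphi; the function names `find_new_files` and `mark_handled` and the one-name-per-line log format are assumptions made for illustration, and the log file is assumed to live outside the watched directory.

```python
import os


def find_new_files(directory, handled_log):
    """Return the file names in `directory` that are not listed in `handled_log`."""
    handled = set()
    if os.path.exists(handled_log):
        with open(handled_log, encoding="utf-8") as log:
            handled = {line.rstrip("\n") for line in log}
    # Set membership is O(1) on average, so the comparison is roughly O(n)
    # instead of the O(n^2) cost of calling IndexOf once per file.
    current = {entry.name for entry in os.scandir(directory) if entry.is_file()}
    return sorted(current - handled)


def mark_handled(handled_log, names):
    """Append processed names to the log so they are skipped on the next run."""
    with open(handled_log, "a", encoding="utf-8") as log:
        for name in names:
            log.write(name + "\n")
```

In Delphi, a comparable effect can usually be had by loading the handled names into a TStringList with Sorted set to True, since IndexOf then uses a binary search.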
I have a nested file structure where a parent folder contains multiple folders of different types of data. I am using an ImageJ macro script to batch process all of the image files within one of those folders. I currently need to process each folder separately, but I would like to batch process over the folders. I have looked up some batch processing of multiple folders, but it appears that the code processes all folders and all files within all of the folders. I only need to process one folder within each directory (all named the same). The images come from the instrument without any metadata, so the files are saved this way to separate the experiments, where all data for an experiment is contained within the parent folder. Also, I have two different scripts that I need to run, one after the other. It would be great if I could merge those, but I don't know how to do that either.
An example of the structure is:
Experiment1/variable1/processed
Experiment1/variable2/processed
I am currently running my macro on each of the "processed" folders individually. I would like to batch each "processed" folder within each of the "variable" folders.
Any help would be greatly appreciated, I am really new to coding and am just really trying to learn and automate as much as possible.
Thank you!
Did you try the batch processing scripts you came across? Reading the batch processing example which is provided with ImageJ leads me to believe it would work for your example. If you haven't tested it, you should do so (you can put a command like "print(list[i])" in place of your actual macro while you test that you've got the file-finding section working).
To merge two different scripts, the simplest option would be to make them individual functions. i.e.:
// function to scan folders/subfolders/files to find files with correct suffix
// (input, output and suffix are assumed to be defined earlier in the script, as in the ImageJ batch template)
function processFolder(input) {
    list = getFileList(input);
    list = Array.sort(list);
    for (i = 0; i < list.length; i++) {
        if(File.isDirectory(input + File.separator + list[i]))
            processFolder(input + File.separator + list[i]);
        // braces are needed so that both functions run only on matching files
        if(endsWith(list[i], suffix)) {
            processFile(input, output, list[i]);
            processOtherWay(input, output, list[i]);
        }
    }
}
function processFile(input, output, file) {
    // Do the processing here by adding your own code.
    // Leave the print statements until things work, then remove them.
    print("Processing: " + input + File.separator + file);
    print("Saving to: " + output);
}
function processOtherWay(input, output, file) {
    // Do the processing here by adding your own code.
    // Leave the print statements until things work, then remove them.
    print("Processing: " + input + File.separator + file);
    print("Saving to: " + output);
}
If the goal isn't to run them on the exact same image, then again make them standalone functions, and have the folder sorting section of the script be in two parts, one for function 1, one for function 2.
You can always just take the code you have and nest it in another for loop or two.
numVariables = ; // number of folders of interest
for(i = 1; i <= numVariables; i++) // here i starts at 1 because you indicated that is the first folder of interest, but this could be any number
{
    openPath = "Experiment1/variable" + i + "/processed";
    files = getFileList(openPath);
    for(count = 0; count < files.length; count++) // here count starts at 0 so that the loop begins at the first file in the folder
    {
        // all your other code, etc.
    }
}
That should just about do it I think.
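The folder selection itself, that is, visiting only the "processed" folder inside each "variable" folder and running two steps on every matching file, can be sketched as follows. This is a plain Python illustration of the traversal logic, not ImageJ macro code; `process_file` and `process_other_way` are hypothetical stand-ins for the two existing macros, and the `.tif` suffix is an assumption.

```python
from pathlib import Path


def process_file(path):
    # Hypothetical stand-in for the first macro.
    return "first:" + path.name


def process_other_way(path):
    # Hypothetical stand-in for the second macro.
    return "second:" + path.name


def batch_processed_folders(experiment_dir, suffix=".tif"):
    """Run both steps on every matching file in <experiment>/<variable>/processed."""
    results = []
    # The pattern "*/processed" matches only an entry named "processed"
    # directly inside each variable folder; sibling folders are ignored.
    for folder in sorted(Path(experiment_dir).glob("*/processed")):
        if not folder.is_dir():
            continue
        for image in sorted(folder.iterdir()):
            if image.is_file() and image.suffix == suffix:
                results.append(process_file(image))
                results.append(process_other_way(image))
    return results
```

Because the folder name is matched by pattern rather than by a counter, this variant does not need to know how many variable folders exist.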
The goal of my Makefile is to build a static library (*.a) from Fortran 77 files plus some *.c and *.h files. A specific subset of the headers has to be precompiled with a company-internal precompiler, which is provided as an executable; all you have to hand over is the path name plus file name.
Let's call the precompiler CPreComp.
The files needing precompilation are named *_l.h.
So I first want to collect all the headers I need to precompile and then hand them over to a script which does some magic (env variables, blubb blubb) and calls the precompiler.
Here you go with my Makefile:
SHELL=/usr/bin/bash
.SHELLFLAGS:= -ec
SOURCE_PATH = ./src
CPRECOMP = ./tools/cprecomp.exe
DO_CPreComp = $(SOURCE_PATH)/do_cprec
HDREXT = .h
PREC_HEADERS = $(foreach d, $(SOURCE_PATH), $(wildcard $(addprefix $(d)/*, $(HDREXT))))
.PHONY: all prereq
all:: \
	prereq \
	lib.a
prereq:
	chmod 777 $(DO_CPreComp)
	echo $(PREC_HEADERS) >> makefileTellMeWhatYouHaveSoFar.txt
lib.a: \
	obj/file1.o \
	obj/file2.o
	ar -r lib.a $?
obj/file1.o:
	# do some fortran precompiling stuff here for a specific file
obj/file2.o: $(SOURCE_PATH)/*.h precomp_path/*.h $(SOURCE_PATH)/file2.c precomp_path/%_l.h
	cc -c -g file2.c
precomp_path/%_l.h : DatabaseForPreComp.txt
precomp_path/%_l.h :
	$(foreach i , $(PREC_HEADERS) , $(DO_CPreComp) $(i) $(CPRECOMP);)
So that is my Makefile, the script for the DO_CPreComp looks as follows:
#!/bin/bash
filename="$(basename "$1")"
dir="$(dirname "$1")"
cprecomptool="$2"
echo ${dir} ${filename} ${cprecomptool} >> scriptTellMeWhatYouKnow.txt
"${cprecomptool}" "precomp_path/${filename}.1" >&cprecomp.err
cp "precomp_path/${filename}.1" "precomp_path/${filename}"
So according to makefileTellMeWhatYouHaveSoFar.txt I collect all the headers, obviously including the ones not ending in _l.h. This has room for improvement, but the precompiler is smart enough to skip the files which are not suitable. So makefileTellMeWhatYouHaveSoFar.txt looks like this:
header1.h header2.h header2_l.h headerx_l.h headery_l.h headerz.h
The Error tells me:
path_to_here/do_cprec : line xy: $2: unbound variable
make[2]: *** [precomp_path/%_l.h] Error 1
make[1]: *** [lib.a] Error 2
scriptTellMeWhatYouKnow.txt shows me the script knows nothing and it is not even created. If I modify cprecomptool and directly add it in the script hardcoded the scriptTellMeWhatYouKnow.txt shows me the argument $(CPRECOMP) twice as file name and path name and the hardcoded precompiler. And ofc it ends up with Segmentation fault, so the header name was never handed over.
Additionally:
If I do not call the script in the second foreach but let $(i) be printed out with echo in another file it is empty.
Perhaps I am just too blind. And please if you are able to help me , explain it to me for dumb people, such that for the next time I stumble over a problem I am smarter because I know what I am doing. :)
OK, now that the main issue is solved, let's have a look at make coding styles. The make way of accomplishing what you want is not exactly using foreach in recipes. There are several drawbacks with this approach like, for instance, the fact that make cannot run parallel jobs, while it is extremely good at this. And on modern multi-core architectures, it can really make a difference. Or the fact that things are always redone while they are potentially up to date.
Assuming the result of the pre-compilation of foo_l.h file is a foo.h (we will look at other options later), the make way is more something like:
SOURCE_PATH := ./src
CPRECOMP := ./tools/cprecomp.exe
DO_CPreComp := $(SOURCE_PATH)/do_cprec
HDREXT := .h
PREC_HEADERS := $(wildcard $(addsuffix /*_l$(HDREXT),$(SOURCE_PATH)))
PRECOMPILED_HEADERS := $(patsubst %_l.h,%.h,$(PREC_HEADERS))
$(PRECOMPILED_HEADERS): %.h: %_l.h DatabaseForPreComp.txt
	$(DO_CPreComp) $@ $(CPRECOMP)
($@ expands as the target). This is a static pattern rule. With this coding style only the headers that need to be pre-compiled (because they are older than their prerequisites) are re-built. And if you run make in parallel mode (make -j4 for 4 jobs in parallel) you should see a nice speed-up factor on a multi-core processor.
But what if the pre-compilation modifies the foo_l.h file itself? In this case you need another dummy (empty) file to keep track of when a file has been pre-compiled:
SOURCE_PATH := ./src
CPRECOMP := ./tools/cprecomp.exe
DO_CPreComp := $(SOURCE_PATH)/do_cprec
HDREXT := .h
PREC_HEADERS := $(wildcard $(addsuffix /*_l$(HDREXT),$(SOURCE_PATH)))
PREC_TAGS := $(patsubst %,%.done,$(PREC_HEADERS))
$(PREC_TAGS): %.done: % DatabaseForPreComp.txt
	$(DO_CPreComp) $< $(CPRECOMP) && \
	touch $@
($< expands as the first prerequisite). The trick here is that the empty foo_l.h.done file is a marker. Its last modification time records the last time foo_l.h was pre-compiled. If foo_l.h or DatabaseForPreComp.txt has changed since, then foo_l.h.done is out of date and make re-builds it, that is, pre-compiles foo_l.h and then touches foo_l.h.done to update its last modification time. Of course, if you use this, you must tell make that some other targets depend on $(PREC_TAGS).
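The up-to-date check that make applies to such a marker file can be sketched roughly as follows. This is a simplified Python illustration of the timestamp comparison only, not a description of how GNU Make is implemented; `needs_rebuild` is a hypothetical helper name.

```python
import os


def needs_rebuild(target, prerequisites):
    """Return True if `target` is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # The target is out of date as soon as one prerequisite is newer than it.
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

With the rule above, `needs_rebuild("src/foo_l.h.done", ["src/foo_l.h", "DatabaseForPreComp.txt"])` would correspond to the decision of whether foo_l.h has to be pre-compiled again.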
With the help of @Renaud Pacalet I was able to find a solution.
In the comments you can read about further trial and error.
I am using GNU Make 3.82 Built for x86_64-redhat-linux-gnu. It seems that foreach does not like the space after the i, or rather takes the space as part of the variable name.
# ... like beforehand check out in the question
PREC_HEADERS=$(shell find $(SOURCE_PATH) -iname '*_l.h')
# nothing changed here in between...
$(foreach i,$(PREC_HEADERS),$(DO_CPC) $i $(CPC);)
This has the advantage that I only precompile the headers which have the _l.h - ending. Having the brackets $(i) around the $i or not, doesn't make a change. What really changed everything was the space behind the first i .
Good luck!
Say that in a folder I have 100+ .dat files with no common generic file names. For instance, the files are not named f001, f002, f003 (you get the pattern). The names are random. I need to parse these .dat files into SAS files. Each file has the same columns/attributes. I use the following code to parse one of the .dat files:
data have;
  infile 'C:\SAS\have.dat' dsd dlm='|';
  input var1 var2 var3 $;
run;
The code is the same for each .dat file. Is there a way in SAS to simply parse all the files in a folder and name the resulting SAS files the same as their original .dat files? I want all the files to be kept separate and not combined into one SAS file.
[UPDATE]
I first start by reading all the filenames in my folder using the following SAS command:
data yfiles;
  keep filename;
  length fref $8 filename $80;
  rc = filename(fref, 'Y:\Files\Exchanges');
  if rc = 0 then do;
    did = dopen(fref);
    rc = filename(fref);
  end;
  else do;
    length msg $200.;
    msg = sysmsg();
    put msg=;
    did = .;
  end;
  if did <= 0 then putlog 'ERR' 'OR: Unable to open directory.';
  dnum = dnum(did);
  do i = 1 to dnum;
    filename = dread(did, i);
    /* If this entry is a file, then output. */
    fid = mopen(did, filename);
    if fid > 0 then output;
  end;
  rc = dclose(did);
run;
In yfiles I have all the names of my .dat files.
Now, how can I loop through each .dat file name in my yfiles dataset to apply the above parsing code?
Use CALL EXECUTE and a Data Step to loop through the file names. You use the Data Step to build and execute the SAS statements.
data _null_;
  set yfiles;
  format outStr $200.;
  outStr = 'data have' || strip(put(_N_,best.)) || ';';
  call execute(outStr);
  outStr = "infile 'C:\SAS\" || strip(filename) || "' dsd dlm='|';";
  call execute(outStr);
  call execute("input var1 var2 var3$;run;");
run;
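The step above numbers the output datasets (have1, have2, ...). If the datasets should instead be named after the original .dat files, the generated statements need a dataset name derived from each file name. The sketch below shows that string-building logic in Python rather than SAS, purely as an illustration; `sas_dataset_name` and `build_sas_step` are hypothetical helpers, and the sanitizing rule (replace anything other than letters, digits and underscores, prefix an underscore if the name does not start with a letter or underscore, truncate to 32 characters) is an assumption based on the usual SAS naming rules.

```python
import re


def sas_dataset_name(filename):
    """Derive a SAS-style dataset name from a file name (illustrative rule)."""
    stem = filename.rsplit(".", 1)[0]
    name = re.sub(r"[^A-Za-z0-9_]", "_", stem)
    # SAS names must start with a letter or an underscore.
    if not re.match(r"[A-Za-z_]", name):
        name = "_" + name
    return name[:32]


def build_sas_step(folder, filename):
    """Build the text of one DATA step that reads `filename` from `folder`."""
    dataset = sas_dataset_name(filename)
    return (
        f"data {dataset};\n"
        f"  infile '{folder}\\{filename}' dsd dlm='|';\n"
        "  input var1 var2 var3 $;\n"
        "run;"
    )
```

Note that two different file names can map to the same dataset name under this rule, so a real implementation would need to check for collisions.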
Try using a pipe file ref with the command doing a directory listing. Parse the output and loop over the directory contents.
I use this code to log the many lines (SQL.Add) that make up the complex scripts I have to build:
Ex:
[...]
SQL.Add('ENTITY_ID, PRO_CODE, PHASE_CODE, TASK_CODE, PERIOD_REF');
SQL.Add('from ' + trim(SourceJrnl) + ' where');
SQL.Add('MASTER_ID = ' + IntToStr(TranID) + ' and');...
[...]
{ for debugging only }
for i := 0 to SQL.Count-1 do
ShowMessage('Line #' + IntToStr(i+1) + ' : '+ SQL.Strings[i]);
Is there any simple way (function) to have the lines written to a file out of a stringlist or memo?
[EDIT] Sorry. NO memo or stringlist but a simple log file.
Calling SQL.SaveToFile will write the query to a file, but it will clobber the previous file contents, so you can only see one query and no other logs. Instead, read the SQL.Text property to get all the lines in a single string, and then write it to your log file using whatever logging technique you have for the rest of your program. In a pinch, a simple way to write a line of text to a file is to call Writeln, but people have asked about real logging libraries before.
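The difference between overwriting and appending can be sketched as follows. This is a language-neutral illustration in Python, not Delphi code; `log_lines`, the header format and the "SQL" label are assumptions made for the example.

```python
from datetime import datetime


def log_lines(log_path, lines, label="SQL"):
    """Append `lines` to `log_path` under a single timestamped header."""
    # Mode "a" appends to the file, so earlier log entries are preserved.
    with open(log_path, "a", encoding="utf-8") as log:
        stamp = datetime.now().isoformat(timespec="seconds")
        log.write(f"--- {label} {stamp} ---\n")
        for number, line in enumerate(lines, start=1):
            log.write(f"Line #{number} : {line}\n")
```

Calling it once per query keeps every query in the same file, each under its own header, which is the behaviour SaveToFile alone does not give you.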
I am looking for a simple method of zipping and compressing with Delphi. I have already looked at the components at Torry Delphi: http://www.torry.net/pages.php?s=99. They all seem as though they would accomplish what I want; however, none of them run in Delphi 2009, and they are very complex, which makes it difficult for me to port them to Delphi 2009. Besides, the documentation on them is scarce, at least to me. I need basic zipping functionality without the overhead of using a bunch of DLLs. My quest led me to FSCTL_SET_COMPRESSION, which I thought would have settled the issue, but unfortunately this did not work either. CREATEFILE looked promising until I tried it; it yielded the same result as FSCTL_SET... I know that there is some limited native zipping capability on Windows. For instance, if one right-clicks a file or folder and selects Send To -> zipped folder, a zipped archive is smartly created. I think if I were able to access that capability from Delphi, it would be a solution. On a side issue, does Linux have its own native zipping functions that can be used in a similar way?
TurboPower's excellent Abbrevia can be downloaded for D2009 here, D2010 support is underway and already available in svn according to their forum.
Abbrevia used to be a commercial (for $$$) product, which means that the documentation is quite complete.
I use Zipforge. Why are there problems porting these to D2009? Is it because of the 64bit??
Here is some sample code
procedure ZipIt;
var
  Archiver: TZipForge;
begin
  // Create the object before entering try, so Free is never called on an uninitialized reference
  Archiver := TZipForge.Create(nil);
  try
    with Archiver do
    begin
      // Path of the archive to create (FileName is a property of the archiver)
      FileName := 'c:\temp\myzip.zip';
      // Create a new archive file
      OpenArchive(fmCreate);
      // Set BaseDir to the folder containing the file to add
      BaseDir := 'c:\temp\';
      // Add the file to the archive
      AddFiles('myfiletozip.txt');
      // Close the archive
      CloseArchive;
    end;
  finally
    Archiver.Free;
  end;
end;
If you can "do" COM from Delphi, then you can take advantage of the built-in zip capability of the Windows shell. It gives you good basic capability.
In VBScript it looks like this:
Sub CreateZip(pathToZipFile, dirToZip)
    WScript.Echo "Creating zip (" & pathToZipFile & ") from folder (" & dirToZip & ")"
    Dim fso
    Set fso = WScript.CreateObject("Scripting.FileSystemObject")
    If fso.FileExists(pathToZipFile) Then
        WScript.Echo "That zip file already exists - deleting it."
        fso.DeleteFile pathToZipFile
    End If
    If Not fso.FolderExists(dirToZip) Then
        WScript.Echo "The directory to zip does not exist."
        Exit Sub
    End If
    ' NewZip is a helper that is not shown here; it must create an empty zip file at pathToZipFile
    NewZip pathToZipFile
    Dim sa
    Set sa = CreateObject("Shell.Application")
    Dim zip
    Set zip = sa.NameSpace(pathToZipFile)
    WScript.Echo "opening dir (" & dirToZip & ")"
    Dim d
    Set d = sa.NameSpace(dirToZip)
    For Each s In d.items
        WScript.Echo s
    Next
    ' http://msdn.microsoft.com/en-us/library/bb787866(VS.85).aspx
    ' ===============================================================
    ' 4 = do not display a progress box
    ' 16 = Respond with "Yes to All" for any dialog box that is displayed.
    ' 128 = Perform the operation on files only if a wildcard file name (*.*) is specified.
    ' 256 = Display a progress dialog box but do not show the file names.
    ' 2048 = Version 4.71. Do not copy the security attributes of the file.
    ' 4096 = Only operate in the local directory. Don't operate recursively into subdirectories.
    WScript.Echo "copying files..."
    zip.CopyHere d.items, 4
    ' wait until finished
    Do Until d.Items.Count <= zip.Items.Count
        WScript.Sleep(1000)
    Loop
End Sub
COM also allows you to use DotNetZip, which is a free download, that does password-encrypted zips, zip64, self-extracting archives, Unicode, spanned zips, and other things.
Personally I use VCL Zip, which runs with D2009 and D2010 perfectly fine. It does cost $120 at the time of this post but is very simple, flexible and most of all FAST.
Have a look at VCLZIP and download the trial if you're interested.
code wise:
VCLZip1.ZipName := 'myfiles.zip';
VCLZip1.FilesList.add('c:\mydirectory\*.*');
VCLZip1.Zip;
is all you need for a basic zip, you can of course set compression levels, directory structures, zip streams, unzip streams and much more.
Hope this is of some assistance.
RE
Take a look at this OpenSource SynZip unit. It's even faster for decompression than the default unit shipped with Delphi, and it will generate a smaller exe (crc tables are created at startup).
No external dll is needed. Works from Delphi 6 up to XE. No problem with Unicode version of Delphi. All in a single unit.
I just made some changes to handle Unicode file names inside Zip content, not only Win-Ansi charset but any Unicode chars. Feedback is welcome.