Merge multiple text files line by line using ANT

How do I merge multiple text files into one file? I want to read text line by line from each file and write the merged text into the final output file. Sample text in my files:
File 1 :
aaa
bbb
ccc
File 2 :
ddd
eee
fff
File 3 :
ggg
hhh
iii
Expected Output:
aaa -->from file 1
ddd -->from file 2
ggg -->from file 3
bbb
eee
hhh
ccc
fff
iii
I have tried the target below
<target name="mergeappvars" >
<concat destfile="${out.dir}/ApplicationGV.txt" force="no">
<fileset dir="${work.dir}/application"
includes="*.txt"/>
</concat>
</target>
My logic appends one file after another, so I got this output:
aaa
bbb
ccc
ddd
eee
fff
ggg
hhh
iii

You need to write your own logic. This link will show you how to load a file and read from it using Ant:
How to read data line by line from a file using ant script?
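Outside Ant, the same round-robin interleave can be sketched with the standard `paste` utility (a minimal sketch using the question's sample data; an Ant build could invoke it via an `<exec>` task):

```shell
cd "$(mktemp -d)"

# Recreate the three sample files from the question.
printf 'aaa\nbbb\nccc\n' > file1.txt
printf 'ddd\neee\nfff\n' > file2.txt
printf 'ggg\nhhh\niii\n' > file3.txt

# paste reads one line from each file per output row; with a newline
# as the delimiter, each row is emitted as three separate lines,
# which yields exactly the round-robin interleave.
paste -d '\n' file1.txt file2.txt file3.txt > merged.txt

# First three lines are aaa, ddd, ggg - one from each file.
cat merged.txt
```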

Related

How to delete multiple columns with grep (only first column)?

I have a file like this (delimited by \t):
AAED1 Previous_symbol PRXL2C
AARS Previous_symbol AARS1
ABP1 Previous_symbol AOC1
ACN9 Previous_symbol SDHAF3
ADCY3 Previous_symbol ADCY8
AK3 Previous_symbol AK4
AK8 Previous_symbol AK3
I want to delete the rows that contain AAED1 and AK3 in the first column. In reality my file has thousands of lines and I want to delete hundreds of rows. I have a file with the patterns I want to search for (this is an example):
AAED1
AK3
I tried this:
grep -wvf pattern.txt file.txt
Expected output:
AARS Previous_symbol AARS1
ABP1 Previous_symbol AOC1
ACN9 Previous_symbol SDHAF3
ADCY3 Previous_symbol ADCY8
AK8 Previous_symbol AK3
The result I obtained:
AARS Previous_symbol AARS1
ABP1 Previous_symbol AOC1
ACN9 Previous_symbol SDHAF3
ADCY3 Previous_symbol ADCY8
The last row is also deleted because it contains AK3 in the third column. Is there a way to grep only the first column?
In the current setup, the patterns match any occurrence anywhere in a line, so the line:
AK8 Previous_symbol AK3
will also match AK3
You need to add a start-of-line marker to the patterns to ensure that they are anchored at the start of each line, like so:
^AAED1
^AK3
If you cannot directly edit the file with the patterns, use the following (note that -v is kept from your original attempt, so that matching rows are deleted rather than kept):
grep -wvf <(sed 's/^/^/' pattern.txt) file.txt
Here pattern.txt is the file with the patterns and file.txt is the file to search. The sed command prefixes every pattern with ^, and process substitution feeds the result back into grep as the pattern list.
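An end-to-end check of the anchored approach on a cut-down version of the question's data (the process substitution `<(...)` requires bash):

```shell
cd "$(mktemp -d)"

# Three tab-delimited rows; AAED1 should be deleted, AK8 kept
# even though its third column contains AK3.
printf 'AAED1\tPrevious_symbol\tPRXL2C\nAARS\tPrevious_symbol\tAARS1\nAK8\tPrevious_symbol\tAK3\n' > file.txt
printf 'AAED1\nAK3\n' > pattern.txt

# Prefix every pattern with ^ so it matches only at line start,
# then invert (-v) to delete the matching rows.
grep -wvf <(sed 's/^/^/' pattern.txt) file.txt > kept.txt

# Only the AARS and AK8 rows survive.
cat kept.txt
```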

How to count the occurrences of a string in a file, for all files in a directory, and output into a new file with shell

I have hundreds of files in a directory that I would like to count the occurrence of a string in each file.
I would like the output to be a summary file that contains the original file name plus the count (ideally on the same line)
for example
file1 6
file2 3
file3 4
etc
Thanks for your consideration
CAUTION: I am pretty much an enthusiastic amateur, so take everything with a grain of salt.
Several questions for you - depending on your answers, the solution below may need some adjustments.
Are all your files in the same directory, or do you also need to look through subdirectories and sub-subdirectories, etc.? Below I make the simplest assumption - that all your files are in a single directory.
Are all your files text files? In the example below, the directory contains text files, executable files, symbolic links, and directories; the count will only be given for text files. (Whatever Linux believes to be text files, anyway.)
There may be files that do not contain the searched-for string at all. Those are not included in the output below. Do you need to show them too, with a count of 0?
I assume by "count occurrences" you mean all of them - even if the string appears more than once on the same line. (Which is why a simple grep -c won't cut it, as that only counts lines that contain the substring, no matter how many times each.)
Do you need to include hidden files (whose name begins with a period)? In my code below I assumed you don't.
Do you care that the count appears first, and then the file name?
OK, so here goes.
[oracle@localhost test]$ ls -al
total 20
drwxr-xr-x. 3 oracle oinstall 81 Apr 3 18:42 .
drwx------. 39 oracle oinstall 4096 Apr 3 18:42 ..
-rw-r--r--. 1 oracle oinstall 40 Apr 3 17:44 aa
lrwxrwxrwx. 1 oracle oinstall 2 Apr 3 18:04 bb -> aa
drwxr-xr-x. 2 oracle oinstall 6 Apr 3 17:40 d1
-rw-r--r--. 1 oracle oinstall 38 Apr 3 17:56 f1
-rw-r--r--. 1 oracle oinstall 0 Apr 3 17:56 f2
-rwxr-xr-x. 1 oracle oinstall 123 Apr 3 18:15 zfgrep
-rw-r--r--. 1 oracle oinstall 15 Apr 3 18:42 .zz
Here's the command to count 'waca' in the text files in this directory (not recursive). I define a variable substr to hold the desired string. (Note that it could also be a regular expression, more generally - but I didn't test that so you will have to, if that's your use case.)
[oracle@localhost test]$ substr=waca
[oracle@localhost test]$ find . -maxdepth 1 -type f \
> -exec grep -osHI "$substr" {} \; | sed "s/^\.\/\(.*\):$substr$/\1/" | uniq -c
8 aa
2 f1
1 .zz
Explanation: I use find to find just the files in the current directory (excluding directories, links, and whatever other trash I may have in the directory). This will include the hidden files, and it will include binary files, not just text. In this example I find in the current directory, but you can use any path instead of . I limit the depth to 1, so the command only applies to files in the current directory - the search is not recursive. Then I pass the results to grep. -o means find all matches (even if multiple matches per line of text) and show each match on a separate line. -s is for silent mode (just in case grep thinks of printing messages), -H is to include file names (even when there is only one file matching the substring), and -I is to ignore binary files.
Then I pass this to sed so that from each row output by grep I keep just the file name, without the leading ./ and without the trailing :waca. This step may not be necessary - if you don't mind the output like this:
8 ./aa:waca
2 ./f1:waca
1 ./.zz:waca
Then I pass the output to uniq -c to get the counts.
You can then redirect the output to a file, if that's what you need. (Left as a trivial exercise - since I forgot that was part of the requirement, sorry.)
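The line-versus-occurrence distinction from point 5 above is easy to reproduce with a throwaway file (file name invented for the demo):

```shell
cd "$(mktemp -d)"

# Two occurrences of 'waca' on the first line, one on the second.
printf 'waca stuff waca\nwaca again\nno match here\n' > aa

# -c counts matching LINES, so repeats within a line are missed...
grep -c waca aa          # -> 2

# ...while -o emits one line per MATCH, so piping through wc -l
# counts every occurrence, even several on the same line.
grep -o waca aa | wc -l  # -> 3
```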
Thanks for the detailed answer it provides me with ideas for future projects.
In my case the files were all the same format (output from another script) and the only files in the directory.
I found the answer in another thread
grep -c -R 'xxx'

Search for a missing string in many files of which only some are interested (are target files)

I have more than 1000 files in one directory, but only half of them are of interest to me! I very often want to search just those files for certain things. Unfortunately it is not possible to match them by FILENAME.
The only thing I can think of is to match some string that is always present in the files of interest (in the example: "special"), somehow build a LIST of these files, and then search only in them. Maybe some stacked grep commands?
My attempt to search for the missing string "line4" in all files:
grep -c 'line4' * | grep -P ':0$'
file2.log:0
file3.log:0
file4.log:0
file5.log:0
# -c count lines that match the string and grep zero ones..
For example we have 5 files in one folder:
ls -l
file1.log # file of interest (always contains the word "special" on some line)
file2.log # file of interest (always contains the word "special" on some line)
file3.log # file of interest (always contains the word "special" on some line)
file4.log # NOT a file of interest
file5.log # NOT a file of interest
cat file1.log
line1
special
line2
line3
line4
line5
cat file2.log
line1
line2
special
line3
cat file3.log
line1
line2
special
line3
line5
cat file4.log
line1
line2
line3
line5
cat file5.log
line1
line2
line3
line5
The result should be only file2 and file3, because they contain the marker string "special" and the word "line4" is NOT present in them:
file2.log
file3.log
You can do this by "merging" two grep commands:
grep -wl special `grep -wL line4 *`
The inner grep (-L, files-without-match) generates the list of files that do not contain the word "line4". The outer grep then searches for the word "special" in that list of files.
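Reproducing this on the question's five files (a sketch; note it uses grep -L, which lists files *without* a match, for the inner step):

```shell
cd "$(mktemp -d)"

# Recreate the sample files from the question.
printf 'line1\nspecial\nline2\nline3\nline4\nline5\n' > file1.log
printf 'line1\nline2\nspecial\nline3\n' > file2.log
printf 'line1\nline2\nspecial\nline3\nline5\n' > file3.log
printf 'line1\nline2\nline3\nline5\n' > file4.log
printf 'line1\nline2\nline3\nline5\n' > file5.log

# Inner grep -L lists files containing NO 'line4';
# outer grep -l keeps only those that DO contain 'special'.
grep -wl special $(grep -wL line4 *.log) > result.txt

# file1 is dropped (has line4); file4/file5 are dropped (no special).
cat result.txt
```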
With GNU awk for ENDFILE and word boundaries:
awk '/\<special\>/{x=1} /\<line4\>/{y=1} ENDFILE{if (x && y) print FILENAME; x=y=0}' *

Grep in multiple files prints matching lines with the file name

I'm using grep to find lines from one file that match in two other files. It finds the matching lines just fine from File1 in File2 and File3, but as soon as there is more than one file to search, it prints the name of the file the match was found in next to each line.
grep -w -f File1 File2 File3
Output:
File2: pattern
File2: pattern
File3: pattern
Is there an option to avoid the print of File2: and File3:?
grep --no-filename -w -f File1 File2 File3
If you're on a UNIX system, please refer to the man pages. Whenever you encounter a problem, your first step should be man $programName. In this case, man grep. It appears that you want the "-h" option. Here's an excerpt from the man page:
-h, --no-filename
Suppress the prefixing of file names on output. This is the default when there is only one file (or only standard input) to search.
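A quick illustration of the default behaviour versus -h (file names and contents invented for the demo):

```shell
cd "$(mktemp -d)"

# File1 holds the patterns; File2 and File3 are searched.
printf 'pattern\n' > File1
printf 'pattern\nother\n' > File2
printf 'other\npattern\n' > File3

# With more than one file to search, grep prefixes each hit
# with the file name:
grep -w -f File1 File2 File3
# File2:pattern
# File3:pattern

# -h (--no-filename) suppresses the prefix:
grep -h -w -f File1 File2 File3
# pattern
# pattern
```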

Combine zip archives in ANT with both filtering and case-sensitivity

I want to combine several zip files together using ANT, but I've got three restrictions that cause the standard techniques to fail:
There are files (with known filenames) that I do not want included in the final archive.
Some of the source archives contain files with the same name, but different capitalization.
The machine that runs the script uses a case-insensitive filesystem.
To make my problem concrete, here's an example source archive. I do not know the file names represented by a.txt and A.txt, but I do know the filename b.txt.
$ touch a.txt ; zip src.zip a.txt ; rm a.txt
$ touch A.txt ; zip src.zip A.txt ; rm A.txt
$ touch b.txt ; zip src.zip b.txt ; rm b.txt
$ unzip -l src.zip
Archive: src.zip
Length Date Time Name
-------- ---- ---- ----
0 09-23-11 11:35 a.txt
0 09-23-11 11:35 A.txt
0 09-23-11 11:36 b.txt
-------- -------
0 3 files
And here's what I want: (everything from the original archive except b.txt)
$ ant
$ unzip -l expected.zip
Archive: expected.zip
Length Date Time Name
-------- ---- ---- ----
0 09-23-11 11:35 a.txt
0 09-23-11 11:35 A.txt
-------- -------
0 2 files
The two techniques that I've found recommended on the internet are:
<target name="unzip-then-rezip">
<!-- Either a.txt or A.txt is lost during unzip and
does not appear in out.zip -->
<delete dir="tmp"/>
<delete file="out.zip"/>
<mkdir dir="tmp"/>
<unzip src="src.zip" dest="tmp"/>
<zip destfile="out.zip" basedir="tmp" excludes="b.txt"/>
</target>
<target name="direct-zip">
<!-- Have not found a way to exclude b.txt from out.zip -->
<delete file="out.zip"/>
<zip destfile="out.zip">
<zipgroupfileset dir="." includes="*.zip" />
</zip>
</target>
Using unzip-then-rezip, I lose either a.txt or A.txt because the underlying filesystem is case-insensitive and cannot store both files. Using direct-zip seems like the right way to go, but I have yet to find a way to filter out the files I don't want included.
I'm about to resort to creating my own ANT task to do the job, but I'd much rather use standard ANT tasks (or even ant-contrib), even if there's a performance or readability penalty.
I ended up creating a custom ANT task to solve the problem. The task accepts nested excludes elements which provide regular expressions that are matched against the entries in the source zip file.
As an added bonus, I was also able to address another problem: renaming zip entries with regular expressions via a nested rename element.
The ANT code looks something like this:
<filter-zip srcfile="tmp.zip" tgtfile="target.zip">
<exclude pattern="^b\..*$"/>
<rename pattern="^HELLO/(.*)" replacement="hello/$1"/>
</filter-zip>
The kernel of the ANT task looks something like this:
zIn = new ZipInputStream(new FileInputStream(srcFile));
zOut = new ZipOutputStream(new FileOutputStream(tgtFile));
ZipEntry entry = null;
while ((entry = zIn.getNextEntry()) != null) {
    for (Rename renameClause : renameClauses) {
        ...
    }
    for (Exclude excludeClause : excludeClauses) {
        ...
    }
    zOut.putNextEntry(...);
    // Copy zIn to zOut
    zOut.closeEntry();
    zIn.closeEntry();
}
In my original question, I said I wanted to combine several zip files together. This is pretty straightforward using the 'direct-zip' method in the original question. I use this to create an intermediate zip file (tmp.zip) which I then use as the source to my filter-zip task:
<zip destfile="tmp.zip">
<zipgroupfileset dir="." includes="*.zip" />
</zip>
At the moment my filter-zip task runs a little slower than the zip (assemble all the zips) task... so the performance is (probably) pretty close to ideal. Combining the two steps would be a nice little exercise, but not very high ROI for me.
Have a look at Ant's Resource Collections, especially things like restrict that allow you to filter files (and zip file contents etc.) in quite flexible ways.
This snippet seems to do what you want (on my machine at least - OSX):
<project default="combine">
<target name="combine">
<delete file="expected.zip" />
<zip destfile="expected.zip">
<restrict>
<zipfileset src="src.zip" />
<not>
<name name="b.txt" />
</not>
</restrict>
</zip>
</target>
</project>
The input file:
$ unzip -l src.zip
Archive: src.zip
Length Date Time Name
-------- ---- ---- ----
0 09-24-11 00:55 a.txt
0 09-24-11 00:55 A.txt
0 09-24-11 00:55 b.txt
-------- -------
0 3 files
The output file:
$ unzip -l expected.zip
Archive: expected.zip
Length Date Time Name
-------- ---- ---- ----
0 09-24-11 00:55 A.txt
0 09-24-11 00:55 a.txt
-------- -------
0 2 files
