I want to find files containing some text of interest. I want those files to have the extension .h or .cc. Is there a faster way than typing this twice:
grep -r "some stuff" * --include="*.h"
grep -r "some stuff" * --include="*.cc"
?
I have a bash function defined in my .bashrc which searches the current directory recursively but skips files and directories known not to be of interest:
function cgrep () {
    egrep -nrI --color=auto --exclude="*.svn-base" --exclude=".svn" --exclude="entries" --exclude=".*.d" --exclude="cscope.out" --exclude="*.syms" --exclude="*.dis" --exclude="*.d" "$@" .
}
Call it via
> cgrep uint64_t
./fs/nfsd/nfs4xdr.c:3016: uint64_t minor_id = 0;
./fs/nfs/callback.h:67: uint64_t size;
./fs/nfs/callback.h:68: uint64_t change_attr;
./fs/nfs/fscache-index.c:188: uint64_t *size)
The searched pattern is actually colored! :-)
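To answer the original question more directly: GNU grep accepts --include more than once, and a file only has to match one of the patterns, so a single invocation should cover both extensions:
grep -r "some stuff" . --include="*.h" --include="*.cc"
In bash you can shorten that with brace expansion, since --include=*.{h,cc} expands to the same two options before grep ever sees them.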
Unix shell ksh
I created a file list and am currently trying to copy each file to its correct path.
(mylist)
-1111
-2222
-3333
-4444
-5555
current directory
/sample/dir/unknown/
-1111fileneeded.txt
-2222fileneeded.txt
-3333fileneeded.txt
-4444fileneeded.txt
-5555fileneeded.txt
-6666dontneed.txt
-7777dontneed.txt
-8888dontneed.txt
...etc
The first 4 characters of each filename match the directory the file needs to go to.
/sample/dir/1111/
/sample/dir/2222/
/sample/dir/3333/
/sample/dir/4444/
So here is what I currently have..
for i in `cat mylist`
do echo "$i"
find /sample/dir/unknown/mylist*
This is where I am stuck: I'm trying to figure out what needs to be done to move each file into its correct directory.
This should work:
#!/bin/ksh
while IFS=\| read -r line; do
    dir=$(echo "$line" | cut -c 2-5)
    mv "$line" "/sample/$dir/$line"
done < filelist.txt
Setting IFS to a literal | keeps read from splitting or trimming the line on whitespace, just in case.
cut -c 2-5 takes characters 2 through 5 (because there is a dash at the start of your file name).
Let me know if there is something else you don't understand.
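If your shell is ksh93, you can also pull those characters out with the built-in substring expansion instead of spawning echo and cut for every line; a sketch, under the same assumption that the leading dash is part of the file name:
while read -r line; do
    dir=${line:1:4}    # characters 2-5; the offset is zero-based
    mv "$line" "/sample/$dir/$line"
done < filelist.txt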
I'd like to calculate MD5 for all files in a tar archive. I tried tar with --to-command.
tar -xf abc.tar --to-command='md5sum'
It outputs something like this:
cb6bf052c851c1c30801ef27c9af1968 -
f509549ab4eeaa84774a4af0231cccae -
Then I wanted to replace '-' with the file name:
tar -xf abc.tar --to-command='md5sum | sed "s#-#$TAR_FILENAME#"'
But it reports errors:
md5sum: |: No such file or directory
md5sum: sed: No such file or directory
md5sum: s#-#./bin/busybox#: No such file or directory
tar: 23255: Child returned status 1
You don't have a shell here, so this won't work (you can also see that the | reaches md5sum as an argument). One way out is to invoke the shell yourself, but there is some hassle with nested quotes:
tar xf some.tar --to-command 'sh -c "md5sum | sed \"s|-|\$TAR_FILENAME|\""'
First, it's better to avoid sed, not only because it's slow, but because $TAR_FILENAME can contain characters that are magic to sed (you already noticed that, having to use # instead of / in the substitution command, didn't you?). Use a foolproof solution, like head, followed by echoing the actual filename.
Then, as Patrick mentions in his answer, you can't use complex commands without wrapping them in a shell, but for convenience I suggest using the shell's built-in quoting ability; for bash that's printf '%q' "something". The final command looks like:
tar xf some.tar \
--to-command="sh -c $(printf '%q' 'md5sum | head -c 34 && printf "%s\n" "$TAR_FILENAME"')"
"34" is number of bytes before file name in md5sum output format; && instead of ; to allow md5sum's error code (if any) reach tar; printf instead of echo used because filenames with leading "-" may be interpreted by echo as options.
I use the following command to my liking, but perfection is better;-)
grep -w -i -r -n -f all.txt . > output.txt
./index.php:86:complete paragraph1
./index.php:89:complete paragraph2
With this:
grep -w -i -r -o -n -f all.txt . > output.txt
We get :
./index.php:86:match1
./index.php:89:match2
Is it also possible to get a combination of that? Like this:
./index.php:86:match1:complete paragraph1
./index.php:89:match2:complete paragraph2
That would be great; even better still would be just a part of the paragraph, but I guess that is a little much to ask of such a simple tool ;-)
Thanks!
grep doesn't have a facility for this, but it's easy to reimplement the useful parts in a simple Awk script.
awk 'NR==FNR { p[++i] = tolower($0); next }
{ line = tolower($0); for (j=1; j<=i; ++j) if (match(line, p[j]))
{ printf "%s:%i:%s:%s\n", FILENAME, FNR, substr($0, RSTART, RLENGTH), $0;
next } }' all.txt files...
The NR==FNR condition matches on the first input file. Each line in that file is converted to lowercase and read into the array p.
The second action only applies to the second and subsequent files. It loops over the items in p and checks whether the current line matches. If so, a match message is printed, and we skip to the next input line.
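Since the original grep command used -r, the script also needs a recursive file list. One way to supply it (my sketch; the '*.php' filter is an assumption based on the sample output, and it also keeps all.txt itself out of the search) is find:
find . -type f -name '*.php' -exec awk 'NR==FNR { p[++i] = tolower($0); next }
{ line = tolower($0); for (j=1; j<=i; ++j) if (match(line, p[j]))
{ printf "%s:%i:%s:%s\n", FILENAME, FNR, substr($0, RSTART, RLENGTH), $0;
next } }' all.txt {} +
Because all.txt is passed first in every invocation, the NR==FNR trick keeps working even if find splits a long file list into several awk runs.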
I'm trying to setup a grep command, that searches my current directory, but excludes a directory, only if it's the root directory.
So for the following directories, I want #1 to be excluded, and #2 to be included
1) vendor/phpunit
2) app/views/vendor
I originally started with the below command
grep -Ir --exclude-dir=vendor keywords *
I tried variations like ^vendor and ^vendor/, but nothing seems to work.
Is there a way to do this with grep? I was looking to try to do it with one grep call, but if I have to, I can pipe the results to a second grep.
With pipes:
grep -Ir keywords * | grep -v '^vendor/'
The problem with --exclude-dir is that it tests the name of the directory, not its path, before descending into it, so it cannot distinguish between two vendor directories at different depths.
Here is a better solution, which will actually ignore the specified directory:
function grepex(){
    excludedir="$1"
    shift
    for i in *; do
        if [ "$i" != "$excludedir" ]; then
            grep "$@" "$i"
        fi
    done
}
You use it as a drop-in replacement for grep: just pass the excluded dir as the first argument and leave the * off the end. So your command would look like:
grepex vendor -Ir keywords
It's not perfect, but as long as you don't have any really weird folders (e.g. with names like -- or something), it will cover most use cases. Feel free to refine it if you want something more elaborate.
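Another option (my suggestion, not from the answer above) is to let find prune the directory by path, which sidesteps the name-versus-path limitation of --exclude-dir entirely; ./vendor only matches at the top level, so app/views/vendor still gets searched:
find . -path ./vendor -prune -o -type f -exec grep -In keywords {} +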
I am trying to use a shell script (well a "one liner") to find any common lines between around 50 files.
Edit: Note I am looking for a line (lines) that appears in all the files
So far I've tried grep -v -x -f file1.sp *, which just matches that file's contents across ALL the other files.
I've also tried grep -v -x -f file1.sp file2.sp | grep -v -x -f - file3.sp | grep -v -x -f - file4.sp | grep -v -x -f - file5.sp etc., but I believe that passes the files to be searched as standard input, not as the pattern to match on.
Does anyone know how to do this with grep or another tool?
I don't mind if it takes a while to run. I've got to add a few lines of code to around 500 files and wanted to find a common line in each of them to insert 'after' (they were originally just copied and pasted from one file, so hopefully there are some common lines!)
Thanks for your time,
When I first read this I thought you were trying to find 'any common lines'. I took this as meaning "find duplicate lines". If this is the case, the following should suffice:
sort *.sp | uniq -d
Upon re-reading your question, it seems that you are actually trying to find lines that 'appear in all the files'. If this is the case, you will need to know the number of files in your directory:
find . -type f -name "*.sp" | wc -l
If this returns the number 50, you can then use awk like this:
WHINY_USERS=1 awk '{ array[$0]++ } END { for (i in array) if (array[i] == 50) print i }' *.sp
You can consolidate this process and write a one-liner like this:
WHINY_USERS=1 awk -v find="$(find . -type f -name "*.sp" | wc -l)" '{ array[$0]++ } END { for (i in array) if (array[i] == find) print i }' *.sp
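For what it's worth, WHINY_USERS=1 is an old undocumented gawk switch whose only effect here is to make the for (i in array) traversal come out sorted; other awk implementations ignore it. A portable variant (my tweak, same logic) drops it and sorts afterwards:
awk '{ array[$0]++ } END { for (i in array) if (array[i] == 50) print i }' *.sp | sort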
old, bash answer (O(n); opens 2 * n files)
From @mjgpy3's answer, you just have to add a for loop and use comm, like this:
#!/bin/bash
tmp1="/tmp/tmp1$RANDOM"
tmp2="/tmp/tmp2$RANDOM"
sort "$1" > "$tmp1"    # comm needs sorted input
shift
for file in "$@"
do
    comm -1 -2 "$tmp1" <(sort "$file") > "$tmp2"
    mv "$tmp2" "$tmp1"
done
cat "$tmp1"
rm "$tmp1"
Save it as comm.sh, make it executable, and call
./comm.sh *.sp
assuming all your filenames end with .sp.
Updated answer: python, opens each file only once
Looking at the other answers, I wanted to give one that opens each file only once, without using any temporary file, and supports duplicated lines. Additionally, it processes the files in parallel.
Here you go (in python3):
#!/usr/bin/env python3
import argparse
import sys
import multiprocessing
import os

EOLS = {'native': os.linesep.encode('ascii'), 'unix': b'\n', 'windows': b'\r\n'}

def extract_set(filename):
    with open(filename, 'rb') as f:
        return set(line.rstrip(b'\r\n') for line in f)

def find_common_lines(filenames):
    pool = multiprocessing.Pool()
    line_sets = pool.map(extract_set, filenames)
    return set.intersection(*line_sets)

if __name__ == '__main__':
    # usage info and argument parsing
    parser = argparse.ArgumentParser()
    parser.add_argument("in_files", nargs='+',
                        help="find common lines in these files")
    parser.add_argument('--out', type=argparse.FileType('wb'),
                        help="the output file (default stdout)")
    parser.add_argument('--eol-style', choices=EOLS.keys(), default='native',
                        help="(default: native)")
    args = parser.parse_args()

    # actual stuff
    common_lines = find_common_lines(args.in_files)

    # write results to output
    to_print = EOLS[args.eol_style].join(common_lines)
    if args.out is None:
        # find out stdout's encoding, utf-8 if absent
        encoding = sys.stdout.encoding or 'utf-8'
        sys.stdout.write(to_print.decode(encoding))
    else:
        args.out.write(to_print)
Save it as find_common_lines.py, and call
python ./find_common_lines.py *.sp
More usage info with the --help option.
Combining these two answers (ans1 and ans2), I think you can get the result you need without sorting the files:
#!/bin/bash
ans="matching_lines"
for file1 in *
do
    for file2 in *
    do
        if [ "$file1" != "$ans" ] && [ "$file2" != "$ans" ] && [ "$file1" != "$file2" ]; then
            echo "Comparing: $file1 $file2 ..." >> "$ans"
            perl -ne 'print if ($seen{$_} .= @ARGV) =~ /10$/' "$file1" "$file2" >> "$ans"
        fi
    done
done
Simply save it, give it execute permission (chmod +x compareFiles.sh), and run it. It will take all the files in the current working directory, do an all-vs-all comparison, and leave the results in the "matching_lines" file.
Things to be improved (the first two are sketched below):
Skip directories
Avoid comparing every pair of files twice (file1 vs file2 and then file2 vs file1)
Maybe add the line number next to the matching string
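A possible tweak for those first two points (my sketch, not part of the original script): test for regular files, and use strict lexicographic ordering so each pair is visited only once:
#!/bin/bash
ans="matching_lines"
for file1 in *
do
    [ -f "$file1" ] || continue    # skip directories
    for file2 in *
    do
        [ -f "$file2" ] || continue
        # strict < ordering visits each pair once and also skips file1 == file2
        if [ "$file1" \< "$file2" ] && [ "$file1" != "$ans" ] && [ "$file2" != "$ans" ]; then
            echo "Comparing: $file1 $file2 ..." >> "$ans"
            perl -ne 'print if ($seen{$_} .= @ARGV) =~ /10$/' "$file1" "$file2" >> "$ans"
        fi
    done
done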
Hope this helps.
See this answer. I originally thought a diff sounded like what you were asking for, but this answer seems much more appropriate.