I want to search a directory (excluding paths that contain any of certain words, ideally a regex pattern) and find all files with contents that match my query (ideally a regex pattern, which I'd make case-insensitive) and were modified between 2 specific dates.
Based on this answer, my current command is:
find /mnt/c/code -type f -mtime -100 -mtime +5 -print0 |
xargs -0 grep -l -v "firstUnwantedTerm" 'mySearchTerm'
Apparently this query does not exclude all paths that contain "firstUnwantedTerm".
Also, I'd love if the results could be sorted by modified datetime descending, displaying: their modified time, the full file name, and the search query (maybe in a different color in the console) surrounded by some context where it was seen.
grep -rnwl --exclude='*firstUnwantedTerm*' '/mnt/c/code' -e "mySearchTerm" from here also seemed like a step in the right direction, in the sense that it correctly excludes my exclusion term, but it doesn't filter by modified datetime and doesn't output all the desired fields, of course.
This is just quick & dirty and without sorting by date, but with 3 lines of context before/after each match and coloured matches:
find ~/mnt/c/code -type f -mtime -100 -mtime +5 | grep -v 'someUnwantedPath' | xargs -I '{}' sh -c "ls -l '{}' && grep --color -C 3 -h 'mySearchTerm' '{}'"
Broken down into pieces with some explanation:
# Find regular files between 100 and 5 days old (modification time)
find ~/mnt/c/code -type f -mtime -100 -mtime +5 |
# Remove unwanted files from list
grep -v 'someUnwantedPath' |
# List each file, then find search term in each file,
# highlighting matches and
# showing 3 lines of context above and below each match
xargs -I '{}' sh -c "ls -l '{}' && grep --color -C 3 -h 'mySearchTerm' '{}'"
I think you can take it from here. Of course this can be made more beautiful and extended to fulfill all your requirements, but I only had a couple of minutes, so I leave it to the UNIX gurus to beat me and make this whole thing 200% better.
Update: version 2 without xargs and with only one grep command:
find ~/mnt/c/code -type f -mtime -30 -mtime +25 ! -path '*someUnwantedPath*' -exec stat -c "%y %s %n" {} \; -exec grep --color -C 3 -h 'mySearchTerm' {} \;
! -path '*someUnwantedPath*' filters out unwanted paths, and the two -exec subcommands list candidate files and then show the grep results (which could also be empty), just like before. Please note that I changed from using ls -l to stat -c "%y %s %n" in order to list file date, size and name (just modify as you wish).
Again, with additional line breaks for readability:
find ~/mnt/c/code
-type f
-mtime -30 -mtime +25
! -path '*someUnwantedPath*'
-exec stat -c "%y %s %n" {} \;
-exec grep --color -C 3 -h 'mySearchTerm' {} \;
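To also get the results sorted by modification time descending, as the question requested, one option is to have find print a sortable timestamp first. This is only a sketch, assuming GNU find, sort and stat, and file names without embedded newlines:
find ~/mnt/c/code -type f -mtime -30 -mtime +25 ! -path '*someUnwantedPath*' \
    -printf '%T@\t%p\n' |    # print epoch mtime, a tab, then the path
  sort -rn |                 # newest first
  cut -f2- |                 # drop the sort key, keep the path
  while IFS= read -r f; do
    if grep -qi 'mySearchTerm' "$f"; then          # keep only files that match
      stat -c '%y %s %n' "$f"                      # modified time, size, name
      grep --color -i -C 3 -h 'mySearchTerm' "$f"  # highlighted match with context
    fi
  done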
I am trying to use grep or find to look for 2 specific words in each file in a directory. Only if more than one file contains that combination should I print those file names to a CSV file.
Here is what I tried so far:
find /dir/test -type f -printf "%f\n" | xargs grep -r -l -e 'ABCD1' -e 'ABCD2' > log1.csv
But this will provide all file names that contain "ABCD1" and "ABCD2". In other words, this command will print a filename even if only one file has this combo.
I need to grep the entire directory for those 2 words, and both words MUST appear in more than one file before the filenames are written to the CSV. I should also be able to include subdirectories.
Any help would be great!
Thanks
find + GNU grep solution:
find . -type f -exec grep -qPz 'ABCD1[\s\S]*ABCD2|ABCD2[\s\S]*ABCD1' {} \; -printf "%f\n" \
| tee /tmp/flist | [[ $(wc -l) -gt 1 ]] && cat /tmp/flist > log1.csv
Alternative way:
grep -lr 'ABCD2' /dir/test/* | xargs grep -l 'ABCD1' | tee /tmp/flist \
| [[ $(wc -l) -gt 1 ]] && sed 's/.*\/\([^\/]*\)$/\1/' /tmp/flist > log1.csv
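If you'd rather avoid the temp file, the same logic can be written with a bash array. This is just a sketch, assuming GNU grep and xargs and file names without newlines:
# Collect the files that contain both words, then write their basenames
# to the CSV only if more than one file qualifies
mapfile -t flist < <(grep -rl 'ABCD1' /dir/test | xargs -rd '\n' grep -l 'ABCD2')
if (( ${#flist[@]} > 1 )); then
    printf '%s\n' "${flist[@]##*/}" > log1.csv
fi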
How do I efficiently find all the files in the system whose contents start with \x0000000000 (5 NUL bytes)?
I tried the following:
$ find . -type f -exec grep -m 1 -ovP "[^\x00]" {} \;
$ find . -type f -exec grep -m 1 -vP "^\00{5}" {} \;
but the first variant works only for all-NUL files, and the last one searches through the whole file, not only the first 5 bytes, which makes it very slow and gives many false positives.
Try this (GNU grep; -P is needed because basic regular expressions do not interpret \x escapes, and -l prints just the matching file names):
grep -rlP '^\x00{5}' .
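That still scans every line of every file, though. To honour the efficiency requirement, you can inspect only the first 5 bytes of each file. A sketch, assuming bash and GNU head/od:
# Hex-dump the first 5 bytes of each file and compare against 5 NUL bytes;
# -size +4c skips files shorter than 5 bytes
find . -type f -size +4c -exec bash -c '
    for f; do
        [ "$(head -c 5 -- "$f" | od -An -tx1 | tr -d " ")" = 0000000000 ] &&
            printf "%s\n" "$f"
    done
' bash {} +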
I want to grep -R a directory but exclude symlinks. How do I do it?
Maybe something like grep -R --no-symlinks or something?
Thank you.
GNU grep v2.11-8 and later, when invoked with -r, excludes symlinks that were not specified on the command line, and includes them when invoked with -R.
If you already know the name(s) of the symlinks you want to exclude:
grep -r --exclude-dir=LINK1 --exclude-dir=LINK2 PATTERN .
If the name(s) of the symlinks vary, maybe exclude symlinks with a find command first, and then grep the files that this outputs:
find . -type f -a -exec grep -H PATTERN '{}' \;
The -H option to grep adds the filename to the output (this is the default when grep searches recursively, but not here, where grep is handed individual file names).
I commonly want to modify grep to exclude source control directories. That is most efficiently done by the initial find command:
find . -name .git -prune -o -type f -a -exec grep -H PATTERN '{}' \;
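As an aside, terminating -exec with + instead of \; passes many file names to each grep invocation, which is much faster on large trees:
find . -name .git -prune -o -type f -exec grep -H PATTERN '{}' +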
For now, here is how I would exclude symbolic links when using grep.
If you want just file names matching your search:
for f in $(grep -Rl 'search' *); do if [ ! -h "$f" ]; then echo "$f"; fi; done;
Explanation:
grep -R # recursive
grep -l # file names only
if [ ! -h "file" ] # bash if not a symbolic link
If you want the matched content output, how about a double grep:
srch="whatever"; for f in $(grep -Rl "$srch" *); do if [ ! -h "$f" ]; then
echo -e "\n## $f";
grep -n "$srch" "$f";
fi; done;
Explanation:
echo -e # enable interpretation of backslash escapes
grep -n # adds line numbers to output
It's not perfect, of course, but it could get the job done!
If you're using an older grep that does not have the -r behavior described in Aryeh Leib Taurog's answer, you can use a combination of find, xargs and grep:
find . -type f -print0 | xargs -0 grep "text-to-search-for"
(The -print0 and -0 flags keep file names containing spaces or other special characters intact.)
If you are using BSD grep (macOS), the following works similarly to the -r option of GNU grep.
grep -OR <PATTERN> <PATH> 2> /dev/null
From the man page:
-O If -R is specified, follow symbolic links only if they were explicitly listed on the command line.
My script is not matching exact words only. For example, 12312312Alachua21321 or Alachuas would both match Alachua.
KEYWORDS=("Alachua" "Gainesville" "Hawthorne")
IFS=$'\n'
find . -size +1c -type f ! -exec grep -qF "${KEYWORDS[*]}" {} \; -exec truncate -s 0 {} \;
If you want grep to match exact words, use grep -w.
You may also want to read the grep manual by running man grep.
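Applied to the script above, it is a one-flag change. A sketch (keep in mind that -w treats letters, digits and underscore as word characters):
KEYWORDS=("Alachua" "Gainesville" "Hawthorne")
IFS=$'\n'
# -w makes grep match whole words only, so 12312312Alachua21321 and
# Alachuas no longer count as matches for Alachua
find . -size +1c -type f ! -exec grep -qwF "${KEYWORDS[*]}" {} \; -exec truncate -s 0 {} \;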
So far I have gotten to this:
prompt$ find path/to/project -type f | grep -v '*.ori|*.pte|*.uh|*.mna' | xargs dos2unix 2> log.txt
However, the files with extensions .ori, .pte, .uh and .mna still show up.
It is better to leave the excluding to find; see Birei's answer.
The problem with your grep pattern is that you have specified it as a shell glob. By default grep expects basic regular expressions (BRE) as its first argument. So if you replace your grep pattern with: .*\.\(ori\|pte\|uh\|mna\)$ it should work. Or if you would rather use extended regular expressions (ERE), you can enable them with -E. Then you can express the same exclusion like this: .*\.(ori|pte|uh|mna)$.
Full command-line:
find . -type f | grep -vE '.*\.(ori|pte|uh|mna)$'
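Plugged back into the original pipeline (like the original, this assumes file names without spaces or newlines):
find path/to/project -type f | grep -vE '.*\.(ori|pte|uh|mna)$' | xargs dos2unix 2> log.txt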
One way:
find path/to/project -type f ! \( -name '*.ori' -o -name '*.pte' -o -name '*.uh' -o -name '*.mna' \) |
    xargs dos2unix 2> log.txt