find and grep or another way? - grep

I am using Linux Mint
I would like to find files smaller than 90k that contain the text "media".
I have tried;
find /directory -size -90k -exec grep -lir --text "media" {} \+
This also finds files greater than 90k.
What is the best way of finding only small files containing certain text?

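The -r makes grep recurse into every directory that find hands to it (directories themselves are usually smaller than 90k), so files of any size end up being searched. I would restrict the search to regular files (-type f), then use xargs to feed the result list to grep, with the -print0 and --null options on both sides of the pipe, in case some path contains spaces: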
find /directory -type f -size -90k -print0 | xargs --null grep -il --text "media"
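A minimal sketch of the same idea using -exec ... + instead of xargs, run against a throwaway directory. The file names and the 200 KiB filler size are made up for illustration; only the find options come from the answer above.

```shell
# Hypothetical demo tree: one small and one large file, both containing "media".
demo=$(mktemp -d)
echo "social media links" > "$demo/small.txt"
# Roughly 200 KiB of filler followed by the search word.
{ dd if=/dev/zero bs=1024 count=200 2>/dev/null | tr '\0' 'x'; echo " media"; } > "$demo/big.txt"

# -type f keeps directories away from grep, and grep is run without -r,
# so only the files that pass the -size test are ever opened.
result=$(find "$demo" -type f -size -90k -exec grep -il --text "media" {} +)
echo "$result"
```

Only the small file should be listed, even though both files contain the word.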

Related

grep -v no longer excluding pattern after migration

One of our shared hosting sites got moved recently. New server is Red Hat 4.8.5-36. The other binaries' versions are grep (GNU grep) 2.20 and find (GNU findutils) 4.5.11
This cron job had previously functioned fine for at least 6 years and gave us a list of updated files which did not match logs, cache etc.
find /home/example/example.com/public_html/ -mmin -12 \
| grep -v 'error_log|logs|cache'
After the move the -v seems to be ineffectual and we get results like
/home/example/example.com/public_html/products/cache/ssu/pc/d/5/c
The change in results occurred immediately after the move. Anyone have an idea why it is now broken? Additionally - how do I restore the filtered output?
If you would like to exclude a group of words, pass each one with its own -e:
grep -v -e 'error_log' -e 'logs' -e 'cache' file
With awk you can do:
awk '!/error_log|logs|cache/' file
It will exclude all lines with these words.
grep -v 'error_log|logs|cache'
only excludes strings that contain literally error_log|logs|cache. To use alternation, use extended regular expressions:
grep -Ev 'error_log|logs|cache'
GNU grep supports alternation as an extension to Basic Regular Expressions, but | needs to be escaped, so this might work as well:
grep -v 'error_log\|logs\|cache'
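To see the difference without touching the real site, the two forms can be compared on a handful of made-up paths (the /site/... names below are purely illustrative):

```shell
# Sample input standing in for the output of find.
paths='/site/index.php
/site/error_log
/site/products/cache/ssu/pc/d/5/c
/site/logs/today.txt'

# Basic regex: the pattern is one literal string containing '|',
# which occurs in none of the lines, so nothing is filtered out.
bre_out=$(printf '%s\n' "$paths" | grep -v 'error_log|logs|cache')

# Extended regex: '|' means alternation, so every line containing
# any one of the three words is dropped.
ere_out=$(printf '%s\n' "$paths" | grep -Ev 'error_log|logs|cache')

printf 'without -E:\n%s\n\nwith -E:\n%s\n' "$bre_out" "$ere_out"
```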
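However, grep isn't required in the first place; we can use (GNU) find to do all the work. Note the use of -path rather than -name: -name tests only the last path component, so it would not exclude a file that merely sits inside a cache directory, such as the example result in the question.
find /home/example/example.com/public_html/ -mmin -12 \
-not \( -path '*error_log*' -or -path '*logs*' -or -path '*cache*' \)
or, using only the POSIX operators \! and -o:
find /home/example/example.com/public_html/ -mmin -12 \
\! \( -path '*error_log*' -o -path '*logs*' -o -path '*cache*' \)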
or, if your find supports -regex (both GNU and BSD find do):
find /home/example/example.com/public_html/ -mmin -12 \
-not -regex '.*\(error_log\|logs\|cache\).*'
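A quick way to try the find-only approach is a scratch directory. This sketch assumes a find that supports -mmin (GNU and BSD find do). It tests the whole path with -path, because the sample tree contains a file nested inside a cache directory, which a test on the file name alone would not catch. The tree layout and the added -type f are assumptions made for the demo.

```shell
# Hypothetical tree; all files are freshly created, so -mmin -12 matches them.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir -p products/cache/ssu logs
touch index.php error_log logs/today.txt products/cache/ssu/c products/item.php

# Keep recently modified regular files whose path contains none of the words.
recent=$(find . -type f -mmin -12 \
    ! \( -path '*error_log*' -o -path '*logs*' -o -path '*cache*' \) | LC_ALL=C sort)
echo "$recent"
```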

grep without extended pattern option on finding files that have characters after the pattern

I have a set of files in a directory. In some of them, the pattern config_dict["backup.moduleDir"] is followed by more characters on the same line. In the others, the pattern appears exactly at the end of the line (no characters follow it). Note that the pattern appears exactly once in each of these files.
Now, I want to find those file names which have some characters following a matching pattern. I use the below code:
find . -type f -name "*.py" -exec grep -El 'config_dict\["backup.moduleDir"].+$' {} \;
Actually I want to avoid the use of regex character '+' and extended pattern option -E of grep. So I tried using the grep -v logic by the following 2 ways, but it did not give me the expected result. What really went wrong in the below 2 methods?
grep -vl 'config_dict\["backup.moduleDir"\]$' `find . -type f -name "*.py" -exec grep -l 'backup.moduleDir' {} \;`
find . -type f -name "*.py" -exec grep -l 'backup.moduleDir' {} \; | xargs grep -vl 'config_dict["backup.moduleDir"]$'
Surprisingly, in the working code above I have to escape only the opening square bracket '[', whereas escaping is optional for the closing square bracket ']', for the double quotes, and for the dot between "backup" and "moduleDir". How is this possible?
Using a simple dot without + does the job:
grep 'config_dict\["backup.moduleDir"].' *.py
This will find config_dict["backup.moduleDir"] followed by at least 1 character, in all python scripts.
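As for what went wrong with the two -v attempts: -v selects lines that do not match, and -l lists a file as soon as it has one selected line, so any file with at least one other line gets listed. In the second attempt the unescaped [ also opens a bracket expression, so ["backup.moduleDir"] matches a single character from that set rather than the literal text. Outside a bracket expression ] and " are not special, and an unescaped . matches any character, including a literal dot, which is why escaping them is optional. The following sketch, with two made-up files, shows both behaviours:

```shell
# Hypothetical sample files: the pattern ends the line in one, not in the other.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'import os\nx = config_dict["backup.moduleDir"]\n' > end.py
printf 'import os\nx = config_dict["backup.moduleDir"] + "/tmp"\n' > more.py

# -v -l: both files are listed, because "import os" is a non-matching line in each.
inverted=$(grep -vl 'config_dict\["backup.moduleDir"]$' end.py more.py)

# A single '.' after the pattern requires one more character on the line.
wanted=$(grep -l 'config_dict\["backup.moduleDir"].' end.py more.py)

printf 'with -v:\n%s\n\nwith trailing dot:\n%s\n' "$inverted" "$wanted"
```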

How can I grep hidden files?

I am searching through a Git repository and would like to include the .git folder.
grep does not include this folder if I run
grep -r search *
What would be a grep command to include this folder?
Please refer to the solution at the end of this post as a better alternative to what you're doing.
You can explicitly include hidden files (a directory is also a file).
grep -r search * .[^.]*
The * will match all files except hidden ones, and .[^.]* will match only hidden files, leaving out the special entries . and .. (the parent directory). However, this will fail if there are either no non-hidden files or no hidden files in a given directory. You could of course explicitly add .git instead of .[^.]*.
However, if you simply want to search in a given directory, do it like this:
grep -r search .
The . will match the current path, which will include both non-hidden and hidden files.
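The difference is easy to reproduce in a scratch directory. The layout below (a .git folder with one file plus one visible file) is made up for the demonstration:

```shell
# Hypothetical layout: one hidden directory and one visible file, both matching.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir .git
echo "search me" > .git/config
echo "search me too" > visible.txt

# The shell expands * before grep runs, and * skips names starting with a dot.
star_out=$(grep -rl search *)

# Starting from . lets grep walk everything, hidden entries included.
dot_out=$(grep -rl search . | LC_ALL=C sort)

printf 'grep -r search *:\n%s\n\ngrep -r search .:\n%s\n' "$star_out" "$dot_out"
```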
I just ran into this problem, and based on @bitmask's answer, here is my simple modification to avoid the problem pointed out by @sehe:
grep -r search_string * .[^.]*
Perhaps you will prefer to combine "grep" with the "find" command for a complete solution like:
find . -exec grep -Hn search {} \;
This command will also search inside hidden files and directories for the string "search" and list every match in this output format:
file path:line number:matching line
./foo/bar:42:search line
./foo/.bar:42:search line
./.foo/bar:42:search line
./.foo/.bar:42:search line
To prevent matching . and .. which are not hidden files, you can use grep with ls -A like in this example:
ls -A | grep "^\."
^\. states that the first character must be .
The -A or --almost-all option excludes the results . and .. so that only hidden files and directories are matched.
You may want to use this approach, assuming you're searching the current directory (otherwise replace . with the desired directory):
find . -type f | xargs grep search
or if you just want to search at the top level (which is quicker to test if you're trying these out):
find . -maxdepth 1 -type f | xargs grep search
UPDATE: I modified the examples in response to Scott's comments. I also added "-type f".
To search within ONLY all hidden files and directories from your current location:
find . -name ".*" -exec grep -rs search {} \;
ONLY all hidden files:
find . -name ".*" -type f -exec grep -s search {} \;
ONLY all hidden directories:
find . -name ".*" -type d -exec grep -rs search {} \;
All the other answers are better. This one might be easy to remember:
find . -type f | xargs grep search
It finds only files (including hidden) and greps each file.
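One caveat with a plain find | xargs pipe is that file names containing spaces get split into several arguments. A sketch of two common ways around that, assuming a find and xargs that support -print0 and -0 (GNU and BSD versions do); the file names are made up:

```shell
# Hypothetical files: one with a space in its name, one hidden.
demo=$(mktemp -d)
cd "$demo" || exit 1
echo "search term" > "my notes.txt"
echo "search term" > .hidden

# NUL-separated names survive spaces on their way through xargs.
safe_out=$(find . -type f -print0 | xargs -0 grep -l search | LC_ALL=C sort)

# Letting find call grep directly avoids the pipe altogether.
exec_out=$(find . -type f -exec grep -l search {} + | LC_ALL=C sort)

echo "$safe_out"
```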
To find only within a certain folder you can use:
ls -al | grep " \."
It is a very simple command to list and pipe to grep.
In addition to Tyler's suggestion, here is a command to grep files recursively, including hidden ones. Note that -name "*.*" only matches names that contain a dot, so files without one (such as Makefile) are skipped:
find . -name "*.*" -exec grep -li 'search' {} \;
You can also restrict the search to hidden files with a specific name, for example .directory files (quote the glob so the shell does not expand it):
grep -r --include='*.directory' "search-string" .
This may work better than some of the other options. The other options that worked can be too slow.

Searching HTML files in a directory for text

Ok, I'm very new to programming but am understanding how to conceptualize and talk about what I want and need to learn and find better.
Right now I am working with a directory /Food and have .html pages that I've downloaded from several sites.
I'd like to create a script to basically use the directory /Food and all files in this folder and its sub-directories, and compare the text for files that contain the same strings I input.
So something like:
commandforsearchingtextfiles [option for directory]/food *.[or command for all files following this directory path]
salt (string1)
sugar (string 2)
flour (string 3)
echo results
The results/output should be the files that contain the strings... and if you can add extra ideas on how to organize the output
Again, if this is covered, please just point me in the right locations of where to learn about this but if you have any quick advice or a quick script, that would be great too.
Are you on Linux? Or could you use Cygwin (if on Windows)?
... if so, the basic Linux commands would cope with this pretty well.
e.g. to search for all files containing salt...
find Food/ -type f -name "*.html" -print0 | xargs -0 grep salt
can narrow/widen the search by adding more switches to the various commands, eg case insensitive:
find Food/ -type f -name "*.html" -print0 | xargs -0 grep -i salt
or just the filenames (not the matched text)
find Food/ -type f -name "*.html" -print0 | xargs -0 grep -l salt
for more options check "grep --help" or "man grep".
Multi-word phrases are possible
find Food/ -type f -name "*.html" -print0 | xargs -0 grep -i "the quick brown fox"
But there is an added complication: HTML itself doesn't care about whitespace, so the phrase could be split over multiple lines. This means the whitespace in the documents could be different from your search, e.g. the above won't match
the quick
brown fox
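but it is valid HTML. A regex copes with varying whitespace within a line:
find Food/ -type f -name "*.html" -print0 | xargs -0 grep -iE "the[[:space:]]+quick[[:space:]]+brown[[:space:]]+fox"
but it's starting to get messy. Note also that grep matches line by line, so even this pattern will not find a phrase that is split across two lines.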
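Since grep looks at one line at a time, one possible workaround for phrases split across lines is to collapse all whitespace before searching. This is a different technique from the regex above, not something the answer proposes. The helper name contains_phrase and the sample files are made up for this sketch:

```shell
# Hypothetical sample pages: the phrase is split across lines in one of them.
demo=$(mktemp -d)
printf '<p>the quick\n   brown fox</p>\n' > "$demo/split.html"
printf '<p>nothing here</p>\n' > "$demo/other.html"

# contains_phrase PHRASE FILE: succeeds if FILE contains PHRASE once every
# run of whitespace (newlines included) is squeezed into a single space.
contains_phrase() {
    tr -s '[:space:]' ' ' < "$2" | grep -qiF -- "$1"
}

hits=""
for f in "$demo"/*.html; do
    if contains_phrase "the quick brown fox" "$f"; then
        hits="$hits${f##*/} "
    fi
done
echo "$hits"
```

The phrase passed to the helper should use single spaces between words, because that is what the normalised text contains.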
You could put this in a .sh script so you don't have to type all of that.
e.g.
#!/bin/sh
find Food/ -type f -name "*.html" -print0 | xargs -0 grep -i "$*"
When saved as a file (here find_in_food) and made executable, it can be run to do a test search:
find_in_food salt
which will display the matching lines together with their filenames.
(this is of course barely touching the surface of what's possible with this!)

use grep to return a list of files, given multiple keywords (like google returns a list of webpages)

I need to find ALL files that have multiple keywords anywhere in the file (not necessarily on the same line), given a starting directory like ~/. Does "grep -ro" do this?
(I'm using Unix, Mac OSX 10.4)
You can use the -l option to get a list of filenames with matches, so it's just a matter of finding all of the files that have the first keyword and then filtering that list down to the files that also have the second keyword:
grep -rl first_keyword basedir | xargs grep -l second_keyword
To search just *.txt
find ~/. -name "*.txt" | xargs grep -l first_keyword | xargs grep -l second_keyword
Thanks Adam!
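A small sketch of the chained -l approach on made-up files, to show that the keywords may sit on different lines. The names alpha and beta stand in for the two keywords; file names with spaces would need NUL-separated output instead:

```shell
# Hypothetical files: only both.txt contains both keywords.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'alpha\nfiller\nbeta\n' > both.txt
echo alpha > only_first.txt
echo beta > only_second.txt

# First grep lists files containing "alpha"; the second keeps those
# from that list that also contain "beta".
matches=$(grep -rl alpha . | xargs grep -l beta)
echo "$matches"
```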

Resources