Ok, I'm very new to programming, but I'm getting better at conceptualizing and describing what I want and need to learn.
Right now I am working with a directory /Food and have .html pages that I've downloaded from several sites.
I'd like to create a script to basically use the directory /Food and all files in this folder and its sub-directories, and compare the text for files that contain the same strings I input.
So something like:
commandforsearchingtextfiles [option for directory]/food *.[or command for all files following this directory path]
salt (string1)
sugar (string 2)
flour (string 3)
echo results
The results/output should be the files that contain the strings... and if you can add extra ideas on how to organize the output
Again, if this is covered, please just point me in the right locations of where to learn about this but if you have any quick advice or a quick script, that would be great too.
You on linux? Or could use cygwin (if on windows)?
... if so the basic linux commands would cope with this pretty well.
eg to search for all files containing salt...
find Food/ -type f -name "*.html" -print0 | xargs -0 grep salt
can narrow/widen the search by adding more switches to the various commands, eg case insensitive:
find Food/ -type f -name "*.html" -print0 | xargs -0 grep -i salt
or just the filenames (not the matched text)
find Food/ -type f -name "*.html" -print0 | xargs -0 grep -l salt
for more, check "grep --help" or "man grep".
Multi-word phrases are possible
find Food/ -type f -name "*.html" -print0 | xargs -0 grep -i "the quick brown fox"
But there is an added complication: HTML itself doesn't care about whitespace, so the phrase could be split over multiple lines, which means the whitespace in the documents could differ from your search. E.g. the above won't match
the quick
brown fox
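but it's valid HTML. Use a regex to tolerate varying whitespace:
find Food/ -type f -name "*.html" -print0 | xargs -0 grep -iE "the[[:space:]]+quick[[:space:]]+brown[[:space:]]+fox"
but it's starting to get messy. Note that grep matches one line at a time, so this pattern copes with extra spaces or tabs within a line; a phrase that is actually split across a newline only matches once the line breaks have been squeezed out first (for example with tr).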
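One way to handle phrases split across lines is to normalize the whitespace before grep sees it. This is only a rough sketch: contains_phrase is a made-up helper name, not something from the thread, and it assumes a tr that pads the shorter second set (GNU and BSD tr both do).

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): succeed if FILE contains PHRASE,
# ignoring case and treating any run of whitespace, newlines included,
# as a single space.
# Usage: contains_phrase PHRASE FILE
contains_phrase() {
    # tr -s turns every whitespace character into a space and squeezes
    # repeats, so the file reaches grep as one long line.
    tr -s '[:space:]' ' ' < "$2" | grep -Fqi -e "$1"
}

# Example use (assumes file names without newlines):
#   find Food/ -type f -name '*.html' | while IFS= read -r f; do
#       if contains_phrase "the quick brown fox" "$f"; then printf '%s\n' "$f"; fi
#   done
```

The search phrase itself should use single spaces between words, since that is what the normalized text contains.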
You could put this in a .sh script so you don't have to type all of that.
eg
#!/bin/sh
find Food/ -type f -name "*.html" -print0 | xargs -0 grep -i "$*"
Once this is saved as a file (here find_in_food) and made executable, you can just run it to do a test search:
find_in_food salt
will display the matching lines, each prefixed with its filename.
(this is of course barely touching the surface of whats possible with this!)
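For the original question (files that contain salt AND sugar AND flour), something along these lines might do. It is a sketch only: files_with_all is a made-up name, it only looks at *.html files, and it assumes file names without newlines.

```shell
#!/bin/sh
# Hypothetical helper: list the .html files under DIR that contain ALL of
# the given strings (case-insensitive, fixed strings rather than regexes).
# Usage: files_with_all DIR WORD [WORD...]
files_with_all() {
    dir=$1
    shift
    # Start with every .html file under DIR, one path per line.
    list=$(find "$dir" -type f -name '*.html')
    for word in "$@"; do
        # Nothing left to filter: stop early.
        [ -n "$list" ] || break
        # Keep only the files that also contain this word.
        list=$(printf '%s\n' "$list" | while IFS= read -r f; do
            if grep -Fqi -e "$word" "$f"; then printf '%s\n' "$f"; fi
        done)
    done
    # Print the survivors in sorted order, if there are any.
    if [ -n "$list" ]; then
        printf '%s\n' "$list" | sort
    fi
}
```

It would be called as: files_with_all Food salt sugar flour. Redirecting the output to a file (> results.txt) is one simple way to organize the results.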
Related
I am using Linux Mint
I would like to find files smaller than 90k that contain the text "media".
I have tried:
find /directory -size -90k -exec grep -lir --text "media" {} \+
This also finds files larger than 90k.
What is the best way of finding only small files containing certain text?
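I would restrict the search to regular files (-type f) and drop grep's -r: without -type f, find also passes small directories to grep, and -r then searches everything beneath them regardless of size. Then use xargs to feed the result list to grep, with -print0 on the find side and --null on the xargs side of the pipe, in case some path contains a space: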
find /directory -type f -size -90k -print0 | xargs --null grep -il --text "media"
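The same idea can be wrapped in a small function that uses find's own -exec ... + instead of xargs. This is a sketch, assuming GNU find and grep (as on Linux Mint); small_files_with is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: list regular files under DIR that are smaller than
# 90 KiB and contain TEXT (case-insensitive).
# Usage: small_files_with DIR TEXT
small_files_with() {
    # -type f keeps directories away from grep, and grep runs without -r,
    # so only the files that passed the size test are ever read.
    # "{} +" batches many file names into one grep call, like xargs does.
    find "$1" -type f -size -90k -exec grep -il --text -e "$2" {} +
}
```

Note that find rounds file sizes up to whole kilobytes before comparing, so -size -90k selects files of at most 89 KiB.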
I have a set of files in a directory. In a few of them, the pattern config_dict["backup.moduleDir"] is followed by more characters. In a few others, the pattern appears exactly at the end of the line (no characters follow it). Note that the pattern appears exactly once in each of these files.
Now I want to find the names of the files in which some characters follow the pattern. I use the code below:
find . -type f -name "*.py" -exec grep -El 'config_dict\["backup.moduleDir"].+$' {} \;
Actually, I want to avoid the regex character '+' and grep's extended pattern option -E. So I tried the grep -v logic in the following two ways, but neither gave me the expected result. What went wrong in these two methods?
grep -vl 'config_dict\["backup.moduleDir"\]$' `find . -type f -name "*.py" -exec grep -l 'backup.moduleDir' {} \;`
find . -type f -name "*.py" -exec grep -l 'backup.moduleDir' {} \; | xargs grep -vl 'config_dict["backup.moduleDir"]$'
Surprisingly, in the working code above I only have to escape the opening square bracket '['; escaping is optional for the closing square bracket ']', for the double quotes, and for the dot between "backup" and "moduleDir". How is this possible?
Using a simple dot without + does the job:
grep 'config_dict\["backup.moduleDir"].' *.py
This will find config_dict["backup.moduleDir"] followed by at least 1 character, in all python scripts.
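As for why the grep -v attempts misbehave: -v inverts the match line by line, so -vl lists every file that has at least one line NOT matching the pattern, which is nearly every file. The option that means "no line matches at all" is -L (supported by GNU and BSD grep, though not required by POSIX). A sketch, with followed_by_text as a made-up name and assuming file names without newlines:

```shell
#!/bin/sh
# Hypothetical helper: among the .py files under DIR that mention
# backup.moduleDir, list those where the pattern is NOT at the end of a line.
# Usage: followed_by_text DIR
followed_by_text() {
    find "$1" -type f -name '*.py' -exec grep -l 'backup\.moduleDir' {} + |
        while IFS= read -r f; do
            # -L prints the name only if NO line in the file matches;
            # -v -l would print it as soon as ANY line fails to match.
            grep -L 'config_dict\["backup\.moduleDir"]$' "$f"
        done
}
```

This relies on the pattern appearing exactly once per file, as stated in the question. Trailing whitespace or a carriage return after the pattern counts as "characters following" here.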
I am searching through a Git repository and would like to include the .git folder.
grep does not include this folder if I run
grep -r search *
What would be a grep command to include this folder?
Please refer to the solution at the end of this post as a better alternative to what you're doing.
You can explicitly include hidden files (a directory is also a file).
grep -r search * .[^.]*
The * will match all files except hidden ones, and .[^.]* will match only hidden files, without matching "..". However, this will fail if a given directory has either no non-hidden files or no hidden files. You could of course explicitly add .git instead of .[^.]*.
However, if you simply want to search in a given directory, do it like this:
grep -r search .
The . will match the current path, which will include both non-hidden and hidden files.
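A tiny wrapper shows the difference in practice. It is a sketch only, assuming a grep that supports -r (GNU and BSD grep do); grep_tree is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: list files under DIR, hidden ones included,
# that contain TEXT.
# Usage: grep_tree DIR TEXT
grep_tree() {
    # Handing grep the directory itself (rather than a glob such as *)
    # means the shell never gets a chance to filter out dot-names like .git.
    grep -rl -e "$2" "$1"
}
```

For example, grep_tree . search run from the top of a repository would also report matches under .git.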
I just ran into this problem, and based on @bitmask's answer, here is my simple modification to avoid the problem pointed out by @sehe:
grep -r search_string * .[^.]*
Perhaps you will prefer to combine "grep" with the "find" command for a complete solution like:
find . -exec grep -Hn search {} \;
This command will search inside hidden files and directories for the string "search" and list every match in this output format:
File path:Line number:matching line
./foo/bar:42:search line
./foo/.bar:42:search line
./.foo/bar:42:search line
./.foo/.bar:42:search line
To prevent matching . and .. which are not hidden files, you can use grep with ls -A like in this example:
ls -A | grep "^\."
^\. states that the first character must be .
The -A or --almost-all option excludes the results . and .. so that only hidden files and directories are matched.
You may want to use this approach, assuming you're searching the current directory (otherwise replace . with the desired directory):
find . -type f | xargs grep search
or if you just want to search at the top level (which is quicker to test if you're trying these out):
find . -maxdepth 1 -type f | xargs grep search
UPDATE: I modified the examples in response to Scott's comments. I also added "-type f".
To search within ONLY all hidden files and directories from your current location:
find . -name ".*" -exec grep -rs search {} \;
ONLY all hidden files:
find . -name ".*" -type f -exec grep -s search {} \;
ONLY all hidden directories:
find . -name ".*" -type d -exec grep -rs search {} \;
All the other answers are better. This one might be easy to remember:
find . -type f | xargs grep search
It finds only files (including hidden) and greps each file.
To find only within a certain folder you can use:
ls -al | grep " \."
It is a very simple command to list and pipe to grep.
In addition to Tyler's suggestion, here is a command to grep files recursively, including hidden ones (note that -name "*.*" only matches names that contain a dot):
find . -name "*.*" -exec grep -li 'search' {} \;
You can also search for specific types of hidden files like so for hidden directory files:
grep -r --include=*.directory "search-string"
This may work better than some of the other options. The other options that worked can be too slow.
I've been googling around, and I can't find the answer I'm looking for.
Say I have a file, text1.txt, in directory mydir whose contents are:
one
two
and another called text2.txt, also in mydir, whose contents are:
two
three
four
I'm trying to get a list of files (for a given directory) which contain all (not any) patterns I search for. In the example I provided, I'm looking for output somewhere along the lines of:
./text1.txt
or
./text1.txt:one
./text1.txt:two
The only things I've been able to find are concerning matching any patterns in a file, or matching multiple patterns in a single file (which I tried extending to a whole directory, but received grep usage errors).
Any help is much appreciated.
Edit-Things I've tried
grep "pattern1" < ./* | grep "pattern2" ./*
"ambiguous redirect"
grep 'pattern1'|'pattern2' ./*
returns files that match either pattern
One way could be like this:
find . | xargs grep 'pattern1' -sl | xargs grep 'pattern2' -sl
I think this is what you need (you can add easily more patterns)
grep -EH 'pattern1|pattern2' mydir
To refine brain's answer:
find . -type f -print0 | xargs -0 grep 'pattern1' -slZ | xargs -0 grep 'pattern2' -sl
This will keep grep from trying to search directories, and can properly handle filenames with spaces, if you pass the -Z flag to grep for all but the last pattern and pass -0 to xargs.
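To get output in the second form the question asks for (file name plus matching line), a final grep can be appended to the chain. This is a sketch assuming GNU or BSD tools (-print0, xargs -0 and grep -Z are extensions to POSIX); lines_in_files_with_both is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: for files under DIR that contain BOTH patterns,
# print every line matching either one, prefixed with the file name.
# Usage: lines_in_files_with_both DIR PATTERN1 PATTERN2
lines_in_files_with_both() {
    # Stages 1 and 2 narrow the NUL-separated file list one pattern at a
    # time; stage 3 prints the matching lines from the surviving files.
    find "$1" -type f -print0 |
        xargs -0 grep -slZ -e "$2" |
        xargs -0 grep -slZ -e "$3" |
        xargs -0 grep -sH -e "$2" -e "$3"
}
```

With the text1.txt and text2.txt files from the question, lines_in_files_with_both mydir one two should print only lines from text1.txt.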
I need to find ALL files that have multiple keywords anywhere in the file (not necessarily on the same line), given a starting directory like ~/. Does "grep -ro" do this?
(I'm using Unix, Mac OSX 10.4)
You can use the -l option to get a list of filenames with matches, so it's just a matter of finding all of the files that have the first keyword and then filtering that list down to the files that also have the second keyword:
grep -rl first_keyword basedir | xargs grep -l second_keyword
To search just *.txt
find ~/. -name "*.txt" | xargs grep -l first_keyword | xargs grep -l second_keyword
Thanks Adam!