Why doesn't recursive grep appear to work?

The man page for grep says:
-r, --recursive
Read all files under each directory, recursively
OK, then how is this possible:
# grep -r BUILD_AP24_TRUE apache
# grep BUILD_AP24_TRUE apache/Makefile.in
#BUILD_AP24_TRUE#mod_shib_24_la_DEPENDENCIES = $(am__DEPENDENCIES_1) \
(...)

There are two likely causes of this type of problem:
grep is aliased to something that excludes the files you are interested in.
The file of interest is in a symlinked directory or is itself a symlink.
In your case, it appears to be the second cause. One solution is to use grep -R in place of grep -r. From man grep:
-r, --recursive
Read all files under each directory, recursively, following symbolic
links only if they are on the command line. Note that if no file
operand is given, grep searches the working directory.
This is equivalent to the -d recurse option.
-R, --dereference-recursive
Read all files under each directory, recursively.
Follow all symbolic links, unlike -r.
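
The difference can be reproduced in a scratch directory. What follows is a minimal sketch, assuming GNU grep 2.12 or later (other grep implementations may treat -r differently); the names real, tree, and link are hypothetical and only serve the illustration.

```shell
#!/bin/sh
# Scratch layout (all names hypothetical):
#   real/Makefile.in   regular file containing the pattern
#   tree/link          symlink pointing at real/
tmp=$(mktemp -d)
mkdir -p "$tmp/real" "$tmp/tree"
echo 'BUILD_AP24_TRUE' > "$tmp/real/Makefile.in"
ln -s ../real "$tmp/tree/link"

# -r skips symlinks it meets while walking tree/ (GNU grep behaviour).
r_out=$(cd "$tmp" && grep -r BUILD_AP24_TRUE tree || true)
# -R follows them, so the file behind the symlink is searched.
R_out=$(cd "$tmp" && grep -R BUILD_AP24_TRUE tree || true)

echo "grep -r: ${r_out:-<no match>}"
echo "grep -R: ${R_out:-<no match>}"
rm -rf "$tmp"
```

Under those assumptions, only the -R run should report tree/link/Makefile.in.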

Related

grep - difference between ways of specifying directory and globs

Say I'm in a project folder and want to grep a keyword using grep -rni. What's the difference between these 3 commands?
grep -rni . -e "keyword"
grep -rni * -e "keyword"
grep -rni **/* -e "keyword"
I tested this and noticed that the first two commands return the same number of matches, although in a different order. The third one, however, returned significantly more matches.
Is there ever a reason to use the third one? Is it returning more matches because of duplicates?
First of all, the difference has nothing to do with the arguments -n and -i.
From grep man page:
-n, --line-number
Prefix each line of output with the 1-based line number within its input file.
-i, --ignore-case
Ignore case distinctions in patterns and input data, so that characters that differ only in case match each other.
-r, --recursive
Read all files under each directory, recursively, following symbolic links only if they are on the command line. Note that if no file operand is given, grep searches the working directory. This is equivalent to the -d recurse option.
So the difference actually lies in how the shell interprets the strings * and **/*.
With . you pass the current directory as an argument to grep. No mystery here: grep itself walks the current working directory.
With * you pass every entry in the current directory as an argument to grep (this includes directories but, by default, not names that start with a dot).
Now, suppose you have the following directory structure:
├── file.txt
├── one
│   └── file.txt
└── two
    └── file.txt
Running grep -rni * -e keyword is translated to:
grep -rni file.txt one two -e keyword
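grep therefore processes the files and directories in exactly that order, descending into each directory as it reaches it.
Finally, with recursive globbing enabled in the shell (for example, shopt -s globstar in bash), grep -rni **/* -e keyword translates to this command line: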
grep -rni file.txt one one/file.txt two two/file.txt -e keyword
The problem with this last approach is that some files are processed more than once. For instance, one/file.txt is processed twice: once because it is explicitly in the argument list, and again because it belongs to the directory one, which is also in the argument list.
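
The duplication is easy to confirm by counting matches. The sketch below recreates the example tree in a temporary directory and assumes each file.txt holds exactly one matching line; the count helper is hypothetical. The argument list that **/* produces is written out by hand, so the script does not depend on globstar support.

```shell
#!/bin/sh
# Recreate the example tree; every file.txt holds one matching line.
tmp=$(mktemp -d)
mkdir -p "$tmp/one" "$tmp/two"
for f in file.txt one/file.txt two/file.txt; do
    echo 'keyword' > "$tmp/$f"
done
cd "$tmp" || exit 1

# Hypothetical helper: count the matching lines printed for an argument list.
count() { grep -rni -e keyword "$@" | wc -l | tr -d ' '; }

dot_count=$(count .)
star_count=$(count *)
# What **/* expands to for this tree, spelled out explicitly.
globstar_count=$(count file.txt one one/file.txt two two/file.txt)

echo ". -> $dot_count, * -> $star_count, **/* -> $globstar_count"
cd / && rm -rf "$tmp"
```

For this tree the first two counts are 3 and the last is 5, because one/file.txt and two/file.txt are each searched twice.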

Mingw64 shell's grep ignores -r option?

I'm trying to do a grep in Microsoft Windows, using the MINGW64 shell v4.4.23(1). (That's what the title bar says. I assume this means MingW-W64.)
I want to list all files in a specified directory tree that have a certain filename extension and do not contain a certain string.
With the current directory set to the top of the tree I entered
grep -r -L thestring *.theextension
It lists only files in the current directory, not the tree.
I tried some variations and determined that grep is simply ignoring the -r option. It ignores --recursive, too.
But when I enter grep --help, it lists both -r and --recursive as valid options, with the expected meaning.
Is this a bug in the shell, or am I doing something stupid?
With grep -r -L thestring *.theextension you are telling grep to search recursively in any file or folder matching *.theextension. If no folders match that pattern, grep has no directories to descend into, so only files in the current directory are searched. The -L flag does not make grep look at anything that does not match *.theextension; maybe that is what was confusing you.
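
One way to get the listing the question asks for is to start the search at . and let grep filter file names itself. This is a sketch, assuming GNU grep (for --include); the file names and the .txt extension are hypothetical stand-ins for the real tree.

```shell
#!/bin/sh
# Scratch tree (hypothetical names): b.txt and sub/c.txt are the only
# .txt files that lack the string.
tmp=$(mktemp -d)
mkdir -p "$tmp/sub"
echo 'has thestring'  > "$tmp/a.txt"
echo 'nothing here'   > "$tmp/b.txt"
echo 'nothing either' > "$tmp/sub/c.txt"
echo 'nothing either' > "$tmp/sub/d.log"

# Start at "." so grep has a directory to descend into, and quote the glob so
# the shell hands it to grep unexpanded. "|| true" is there because the exit
# status of -L has varied between GNU grep releases.
missing=$(cd "$tmp" && { grep -r -L thestring --include='*.txt' . || true; } | sort)
echo "$missing"
rm -rf "$tmp"
```

The output lists ./b.txt and ./sub/c.txt; a.txt is excluded because it contains the string, and d.log because of its extension.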

How to use grep to search for a word recursively in all directories named /src and only *.js files

In a react project, I want to search for the case insensitive word "tolowercase" in all /src directories and only search in *.js files.
Use
grep -RHlie tolowercase --include '*.js' /src
The options are elaborated on from running man grep:
-R, -r, --recursive
Recursively search subdirectories listed.
-H Always print filename headers with output lines.
-l, --files-with-matches
Only the names of files containing selected lines are written to
standard output. grep will only search a file until a match has
been found, making searches potentially less expensive. Pathnames
are listed once per file searched. If the standard input
is searched, the string ``(standard input)'' is written.
-i, --ignore-case
Perform case insensitive matching. By default, grep is case
sensitive.
-e pattern, --regexp=pattern
Specify a pattern used during the search of the input: an input
line is selected if it matches any of the specified patterns.
This option is most useful when multiple -e options are used to
specify multiple patterns, or when a pattern begins with a dash
(`-').
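
Note that the command above searches the single directory /src. If the goal is every directory named src beneath the project root, one possible approach is to let find pick the directories and hand them to grep. The sketch below assumes GNU grep (for --include) and uses a hypothetical project layout.

```shell
#!/bin/sh
# Hypothetical project layout for illustration.
tmp=$(mktemp -d)
mkdir -p "$tmp/app/src" "$tmp/app/lib" "$tmp/pkg/src"
echo 'name.toLowerCase()' > "$tmp/app/src/util.js"
echo 'name.toLowerCase()' > "$tmp/app/src/util.ts"   # wrong extension
echo 'name.toLowerCase()' > "$tmp/app/lib/other.js"  # not under a src directory
echo 'no match here'      > "$tmp/pkg/src/index.js"

# find selects every directory named src; grep then searches each one.
# sort -u drops duplicates that would arise from a src nested inside a src.
hits=$(cd "$tmp" && find . -type d -name src \
        -exec grep -R -l -i -e tolowercase --include='*.js' {} + | sort -u)
echo "$hits"
rm -rf "$tmp"
```

Only ./app/src/util.js is reported for this layout.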

grep recursive filename matching (grep -ir "xyz" *.cpp) does not work

The command
grep -ir "xyz" * recursively searches through the directories and tells me that the text is present in ./x/y/z/abc.cpp
However,
grep -ir "xyz" *.cpp offers no result.
Isn't the second command supposed to recursively grep all cpp files inside the directory?
What am I missing here?
Grep will recurse through any directories you match with your glob pattern. (In your case, you probably do not have any directories that match the pattern "*.cpp".) You could explicitly specify them: grep -ir "xyz" *.cpp */*.cpp */*/*.cpp */*/*/*.cpp, etc. You can also use the --include option (see the example below).
If you are using GNU grep, then you can use the following:
grep -ir --include "*.cpp" "xyz" .
The command above says to search recursively starting in current directory ignoring case on the pattern and to only search in files that match the glob pattern "*.cpp".
OR if you are on some other Unix platform, you can use this:
find ./ -type f -name "*.cpp" -print0 | xargs -0 grep -i "xyz"
If you are sure that none of your files have spaces in their names, you can omit the -print0 argument to find and the -0 argument to xargs.
The command above says the following: find all files (-type f) under the current directory (./) that match the name glob/wildcard "*.cpp" (-name "*.cpp") and then print them out delimited by a null (-print0). That list of files found should be written to the stdin of the next command: xargs. xargs should read from stdin (default behavior) and split its input on nulls (-0) and then call the grep command with the specified options (grep -i "xyz") on that list of files.
If you are interested in learning more about why grep -ir "xyz" *.cpp does not work the way you think it should, search for "shell globbing". I'll also try to provide a quick explanation.
When you type the command grep -ir "xyz" *.cpp and hit enter, two programs are involved in executing it. The first is your shell (unless you've customized things, you are probably using the bash shell; if you've never heard of a shell or bash, that's where you should start looking, and there are plenty of good articles). Suffice it to say that a shell is a program designed to let you navigate the filesystem on your computer and run other programs. (In Windows, when you double-click an icon to launch a program or open a folder to access a file, the program you are running is explorer.exe, the Windows graphical shell.)
So, when you type grep -ir "xyz" *.cpp, the shell reads your command and does a few things before grep is run. One of the things it does is expand glob patterns (things like *.txt or [0-9]*.pdf). The thing to take away is that the grep command never sees the *.cpp. The shell looks in the current directory for any files or directories whose names match the pattern *.cpp and substitutes those names for the pattern on the command line BEFORE it runs grep. (If nothing matches, the shell leaves *.cpp in place and grep sees it, but because grep does not normally do glob matching, that does not help you.)
By contrast, when you type grep -ir "xyz" *, the shell replaces the * with the name of every file and directory in the current directory (because * matches anything). Say you had a directory that contained file1, file2, dir1, and dir2. The shell would perform its replacements and then execute a command that looks like this: grep -ir "xyz" file1 file2 dir1 dir2. grep would then search file1 and file2 for a line containing the string xyz and, because of the -r, also search recursively through dir1 and dir2 and any files found there.
Lastly, if you've followed everything so far, it will make sense that grep has its own way to apply glob patterns in recursive searches: the --include option, as in the command I described earlier (grep -ir --include "*.cpp" "xyz" .). The reason we put the *.cpp in quotes in that command is to prevent the shell from expanding the glob pattern before it runs the command.
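
The expansion described above can be observed directly by capturing what the shell would pass to grep. This is a sketch for sh or bash with default globbing settings and GNU grep; it recreates the question's x/y/z/abc.cpp in a temporary directory, with hypothetical file contents.

```shell
#!/bin/sh
tmp=$(mktemp -d)
mkdir -p "$tmp/x/y/z"
echo 'int xyz = 1;' > "$tmp/x/y/z/abc.cpp"
cd "$tmp" || exit 1

# Let the shell expand the globs and record what grep would receive.
set -- *.cpp
cpp_args=$*     # nothing in the current directory matches, so it stays *.cpp
set -- *
star_args=$*    # expands to the only top-level entry: x

# Quoted glob: the shell leaves it alone and grep applies it to file names.
found=$(grep -ir --include '*.cpp' "xyz" .)

echo "*.cpp -> $cpp_args"
echo "*     -> $star_args"
echo "$found"
cd / && rm -rf "$tmp"
```

Here *.cpp reaches grep as the literal string *.cpp, * reaches it as x, and only the --include form finds ./x/y/z/abc.cpp.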

To understand the practical use of Grep's option -H in different situations

This question is based on this answer.
Why do you get the same output from the both commands?
Command A
$ sudo grep muel * /tmp
masi:muel
Command B
$ sudo grep -H muel * /tmp
masi:muel
Rob's comment suggests to me that Command A should not give me masi:, but only muel.
In short, what is the practical purpose of -H?
Grep will list the filenames by default if more than one filename is given. The -H option makes it do that even if only one filename is given. In both your examples, more than one filename is given.
Here's a better example:
$ grep Richie notes.txt
Richie wears glasses.
$ grep -H Richie notes.txt
notes.txt:Richie wears glasses.
It's more useful when you're giving it a wildcard for an unknown number of files, and you always want the filenames printed even if the wildcard only matches one file.
If you grep a single file, -H makes a difference:
$ grep muel masi
muel
$ grep -H muel masi
masi:muel
This could be significant in various scripting contexts. For example, a script (or a non-trivial piped series of commands) might not be aware of how many files it's actually dealing with: one, or many.
When you grep from multiple files, by default it shows the name of the file where the match was found. If you specify -H, the file name will always be shown, even if you grep from a single file. You can specify -h to never show the file name.
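
The three behaviours can be compared side by side. A minimal sketch, assuming a grep that supports -H and -h (GNU and BSD grep both do); more.txt and its contents are hypothetical additions to the notes.txt example above.

```shell
#!/bin/sh
tmp=$(mktemp -d)
echo 'Richie wears glasses.' > "$tmp/notes.txt"
echo 'Richie plays chess.'   > "$tmp/more.txt"
cd "$tmp" || exit 1

plain=$(grep Richie notes.txt)                # one file: no filename prefix
with_H=$(grep -H Richie notes.txt)            # -H forces the prefix
two=$(grep Richie notes.txt more.txt)         # several files: prefix by default
with_h=$(grep -h Richie notes.txt more.txt)   # -h suppresses the prefix

printf '%s\n' "$plain" "$with_H" "$two" "$with_h"
cd / && rm -rf "$tmp"
```

In a script that builds the file list dynamically, passing -H or -h explicitly keeps the output format the same whether one file or many are searched.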
Emacs has a grep interface (M-x grep, M-x lgrep, M-x rgrep). If you ask Emacs to search for foo in the current directory, Emacs calls grep, processes the grep output, and presents you with the results as clickable links, just like Google.
To do this, Emacs passes two options to grep, -n (show line numbers) and -H (show filenames even if there is only one file; the point is consistency), and then turns the output into clickable links.
In general, consistency makes for a good API, but consistency conflicts with DWIM ("do what I mean").
When you use grep directly, you want DWIM, so you don't pass -H.
