How to not conflict with filename when combining grep commands?

Imagine you want to grep recursively for string1 but not string1_suffix. The trivial approach would be:
grep -r string1 | grep -v string1_suffix
But what if the file names can contain string1_suffix?
Because grep -r prefixes each matching line with its file name, a line such as string1_suffix_data.json: blabla string1 would be filtered out by the second grep.
Is there a way around this? In this trivial example I could simply swap the first and second parts, but what about the general case?

If your grep supports PCRE via the -P option, you can use a negative lookahead: string1(?!_suffix)
For the general case, use ^(?!.*str2).*str1 to match lines that contain str1 but not str2.
With find+awk (tested on GNU awk, not sure about other implementations):
find . -type f -exec awk '/str1/ && !/str2/{print FILENAME ":" $0}' {} +
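A minimal sketch reproducing the problem, assuming a POSIX shell plus mktemp -d (not POSIX, but available on GNU and BSD systems); the temporary directory and both file names are hypothetical. The naive pipeline loses a genuine match because of the file-name prefix, while the find+awk variant only tests the line content:

```shell
#!/bin/sh
# Hypothetical demo directory: the file NAME contains "string1_suffix",
# but its CONTENT contains only "string1".
dir=$(mktemp -d)
printf 'blabla string1\n' > "$dir/string1_suffix_data.json"
printf 'string1_suffix only\n' > "$dir/other.txt"

# Naive pipeline: grep -r prefixes each line with its file name, so the
# second grep also drops the genuine match in string1_suffix_data.json.
naive=$(grep -r string1 "$dir" | grep -v string1_suffix)

# find+awk: the regexes are tested against the line ($0) only; FILENAME
# is printed but never matched.
fixed=$(find "$dir" -type f -exec awk '/string1/ && !/string1_suffix/ {print FILENAME ":" $0}' {} +)

printf 'naive: [%s]\n' "$naive"
printf 'fixed: [%s]\n' "$fixed"
rm -r "$dir"
```

The naive result should be empty, while the awk result keeps the line from string1_suffix_data.json and still drops the line that really contains string1_suffix.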

Related

grep without extended pattern option on finding files that have characters after the pattern

I have a set of files in a directory. A few of them contain the pattern config_dict["backup.moduleDir"] followed by some characters. In a few other files the pattern appears exactly at the end of the line (no characters follow it). Note that the pattern appears exactly once in each of these files.
Now I want to find the names of the files in which some characters follow the pattern. I use the code below:
find . -type f -name "*.py" -exec grep -El 'config_dict\["backup.moduleDir"].+$' {} \;
However, I want to avoid the regex character + and grep's extended pattern option -E. So I tried the grep -v logic in the following two ways, but neither gave me the expected result. What went wrong in the two methods below?
grep -vl 'config_dict\["backup.moduleDir"\]$' `find . -type f -name "*.py" -exec grep -l 'backup.moduleDir' {} \;`
find . -type f -name "*.py" -exec grep -l 'backup.moduleDir' {} \; | xargs grep -vl 'config_dict["backup.moduleDir"]$'
Surprisingly, in the working code above I only have to escape the opening square bracket [, whereas escaping is optional for the closing square bracket ], for the double quotes, and for the dot between "backup" and "moduleDir". How is this possible?

Using a simple dot without + does the job:
grep 'config_dict\["backup.moduleDir"].' *.py
This will find config_dict["backup.moduleDir"] followed by at least 1 character, in all python scripts.
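The two follow-up questions can be checked with a small sketch (hypothetical file contents; POSIX shell plus mktemp -d). grep -v -l lists every file that has at least one line NOT matching the pattern, so a file whose only occurrence ends the line is still listed as long as it has any other line. On escaping: in a basic regex an unescaped [ opens a bracket expression, while a ] with no open bracket expression is an ordinary character, " is never special, and . matches any single character (including a literal dot). That is also why the second method differs: its unescaped [ turns ["backup.moduleDir"] into a bracket expression matching one single character.

```shell
#!/bin/sh
# Hypothetical sample files: a.py has text after the pattern, b.py does not.
dir=$(mktemp -d)
printf 'import os\nx = config_dict["backup.moduleDir"] + "/sub"\n' > "$dir/a.py"
printf 'import os\nx = config_dict["backup.moduleDir"]\n' > "$dir/b.py"

# -v -l: lists files with at least one NON-matching line, so b.py is
# listed too, because its "import os" line does not match.
vl=$(grep -vl 'config_dict\["backup.moduleDir"\]$' "$dir"/*.py)

# One dot after the closing bracket requires at least one more character.
dot=$(grep -l 'config_dict\["backup.moduleDir"].' "$dir"/*.py)

printf 'grep -vl lists:\n%s\n' "$vl"
printf 'dot version lists:\n%s\n' "$dot"
rm -r "$dir"
```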

Find parameters of a method with grep

I need some help with a grep command (in Bash).
In my source files, I want to list all unique parameters of a function. Background: I want to search through all files to see which permissions (perm("abc")) are used.
Example.txt:
if (x) perm("this"); else perm("that");
perm("what");
I'd like to have my grep output:
this
that
what
If I do my grep with this search expression
perm\(\"(.*?)\"\)
I get perm("this"), perm("that"), etc., but I'd like to have just the permissions: this, that and what.
How can I do that?

Use a look-behind:
$ grep -Po '(?<=perm\(")[^"]*' file
this
that
what
This looks for all the text occurring after perm(" and until another " is found.
Note that -P enables Perl-compatible regular expressions (needed for the look-behind), and -o prints only the matched part instead of the whole line.

Here is a GNU awk version (GNU awk is needed because RS is longer than one character):
awk -v RS='perm\\("' -F\" 'NR>1 {print $1}' file
this
that
what
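Since the question asks for unique parameters, a sketch that also removes duplicates may help. It swaps the look-behind for a grep -o plus sed combination, so it also works with greps that lack -P (grep -o itself is a GNU/BSD extension, not POSIX). The temporary file and its extra duplicate line are illustrative:

```shell
#!/bin/sh
# Illustrative input: the question's example plus one duplicate call.
file=$(mktemp)
printf '%s\n' 'if (x) perm("this"); else perm("that");' \
    'perm("what");' 'perm("this");' > "$file"

# grep -o prints every perm("...") call on its own line (in a basic
# regex the parentheses are literal), sed strips the perm(" and ")
# wrapper, and sort -u keeps each permission once.
perms=$(grep -o 'perm("[^"]*")' "$file" | sed 's/^perm("//; s/")$//' | sort -u)
printf '%s\n' "$perms"
rm -f "$file"
```

Note that sort -u returns the names in sorted order (that, this, what) rather than in order of appearance.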

How to match a non string in gnu grep

I'll use an example to illustrate my problem. Suppose we have a file named 'file.txt' that contains the following string:
AooYoZooYZoAoooooYZ
I'd like to use grep to find all substrings that begin with 'A' and end with 'YZ' but do not contain 'YZ' between the 'A' and the final 'YZ'. The desired output would be:
AooYoZooYZ
AoooooYZ
My best guess is to do the following:
$ grep -E -o 'A[^(YZ)]*YZ' file.txt
But the output is only:
AoooooYZ
I'd like the parentheses to keep their meaning for YZ, but I read in the GNU grep manual (http://www.gnu.org/software/grep/manual/grep.html) that:
"Most meta-characters lose their special meaning inside bracket expressions." I've also tried:
$ grep -E -o 'A.*YZ' file.txt
But this outputs the entire line:
AooYoZooYZoAoooooYZ
Is there a way to override this or another way of solving my problem?

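Inside a bracket expression, [^(YZ)] means "any single character other than (, Y, Z or )", so the match starting at the first A stops at its first Y. Instead, you can use a non-greedy match, which Perl-compatible regular expressions support:
echo 'AooYoZooYZoAoooooYZ' | grep -P -o 'A.*?YZ'
However, note that the GNU grep manual (at least in older versions) describes the -P option as highly experimental.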
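If -P is unavailable, a sketch using only a POSIX extended regex (a different technique from the answer above) can express "no YZ before the final YZ": every Y inside the match must be followed by a character other than Y or Z, and the match ends at the first run of Y's followed by Z.

```shell
#!/bin/sh
# ([^Y]|Y+[^YZ])* consumes text that never contains "YZ";
# Y+Z then ends the match at the first "YZ".
out=$(echo 'AooYoZooYZoAoooooYZ' | grep -E -o 'A([^Y]|Y+[^YZ])*Y+Z')
printf '%s\n' "$out"
```

This should print AooYoZooYZ and AoooooYZ on separate lines, matching the desired output.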

How to escape parenthesis in grep

I want to grep for a function call 'init()' in all JavaScript files in a directory. How do I do this using grep?
In particular, how do I escape the parentheses ()?

It depends. If you use regular grep, you don't escape:
echo '(foo)' | grep '(fo*)'
You actually have to escape them if you want the parentheses to act as grouping.
If you use extended regular expressions, you do escape:
echo '(foo)' | grep -E '\(fo*\)'

If you want to search for exactly the string "init()", use grep -F "init()" (or fgrep "init()", which recent GNU grep versions report as obsolescent).
Both do fixed-string matching, i.e. they treat the pattern as a plain string to search for rather than as a regex. I believe this is also faster than a regex search.
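A small sketch comparing the options discussed above on hypothetical input lines: in a basic regex the parentheses are literal, in an extended regex they must be escaped, and -F turns regex interpretation off entirely. All three should print only init():

```shell
#!/bin/sh
# Hypothetical sample input.
input='init()
init(test)
initialize'

# Basic regex (default): ( and ) are ordinary characters.
bre=$(printf '%s\n' "$input" | grep 'init()')
# Extended regex (-E): ( and ) group, so escape them to match literally.
ere=$(printf '%s\n' "$input" | grep -E 'init\(\)')
# Fixed string (-F): the pattern is not a regex at all.
fixed=$(printf '%s\n' "$input" | grep -F 'init()')

printf '%s\n' "$bre" "$ere" "$fixed"
```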

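$ echo "init()" | grep -Ein 'init\([^)]*\)'
1:init()
$ echo "init(test)" | grep -Ein 'init\([^)]*\)'
1:init(test)
$ echo "initwhat" | grep -Ein 'init\([^)]*\)'
To search a directory tree instead of standard input, add -r and a directory, e.g. grep -Erin 'init\([^)]*\)' . (with -r and no file operand, GNU grep searches the current directory rather than reading standard input).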

Change to the directory that contains your JavaScript files, then run:
grep 'init()' *.js

Can grep show only words that match search pattern?

Is there a way to make grep output "words" from files that match the search expression?
If I want to find all the instances of, say, "th" in a number of files, I can do:
grep "th" *
but the output will be something like:
some-text-file : the cat sat on the mat
some-other-text-file : the quick brown fox
yet-another-text-file : i hope this explains it thoroughly
What I want it to output, using the same search, is:
the
the
the
this
thoroughly
Is this possible using grep? Or using another combination of tools?

Try grep -o:
grep -oh "\w*th\w*" *
Edit: matching from Phil's comment.
From the docs:
-h, --no-filename
Suppress the prefixing of file names on output. This is the default
when there is only one file (or only standard input) to search.
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.

Cross-distribution safe answer (including Windows MinGW?)
grep -h "[[:alpha:]]*th[[:alpha:]]*" 'filename' | tr ' ' '\n' | grep -h "[[:alpha:]]*th[[:alpha:]]*"
If you are using an older grep (such as 2.4.2) that lacks the -o option, use the command above; otherwise, use the simpler, easier-to-maintain version below.
Linux cross-distribution safe answer
grep -oh "[[:alpha:]]*th[[:alpha:]]*" 'filename'
To summarize: -oh outputs only the regular-expression matches from the file content (without the file name), much as you would expect a regular expression to behave in vim and similar tools. Which word or regular expression you search for is up to you, as long as you stick to POSIX rather than Perl syntax (see below).
More from the manual for grep
-o Print each match, but only the match, not the entire line.
-h Never print filename headers (i.e. filenames) with output lines.
-w The expression is searched for as a word (as if surrounded by
`[[:<:]]' and `[[:>:]]').
The reason why the original answer does not work for everyone
The usage of \w varies from platform to platform, as it is an extended "Perl" syntax. Grep installations limited to POSIX character classes therefore need [[:alpha:]] instead of \w (strictly, \w corresponds to [[:alnum:]_], so it also matches digits and underscores). See the Wikipedia page on regular expressions for more.
Ultimately, the POSIX answer above is a lot more reliable across platforms.
As for greps without the -o option: the first grep outputs the relevant lines, tr splits them into one word per line, and the final grep keeps only the matching words.
(PS: I know most platforms have probably been patched to support \w by now, but there are always some that lag behind.)
Credit for the -o workaround goes to @AdamRosenfield's answer.
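The two POSIX variants above can be compared on the question's sample text with a quick sketch (the temporary file is illustrative); both should produce the same five words:

```shell
#!/bin/sh
# The question's sample text, written to a temporary file.
f=$(mktemp)
printf '%s\n' 'the cat sat on the mat' 'the quick brown fox' \
    'i hope this explains it thoroughly' > "$f"

pattern='[[:alpha:]]*th[[:alpha:]]*'
# With -o: print each match on its own line (-h drops file names).
with_o=$(grep -oh "$pattern" "$f")
# Without -o: keep matching lines, split them into one word per line,
# then keep only the words that match.
without_o=$(grep -h "$pattern" "$f" | tr ' ' '\n' | grep "$pattern")
rm -f "$f"

printf '%s\n' "$with_o"
```

Note that the tr fallback only splits on spaces, so punctuation attached to a word would stay in the output.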

It's simpler than you think. Try this:
egrep -wo 'th.[a-z]*' filename.txt #### (Case Sensitive)
egrep -iwo 'th.[a-z]*' filename.txt ### (Case Insensitive)
Where:
egrep: runs grep with extended regular expressions (same as grep -E).
-w: matches only whole words instead of substrings.
-o: displays only the matched part instead of the whole line.
-i: ignores case.

You could translate spaces to newlines and then grep, e.g.:
cat * | tr ' ' '\n' | grep th

Just awk; no need to combine tools.
# awk '{for(i=1;i<=NF;i++){if($i~/^th/){print $i}}}' file
the
the
the
this
thoroughly

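A grep command using only-matching output and a Perl regex (the lookahead stops at the next space without including it, and also catches words at the end of a line):
grep -o -P 'th.*?(?= |$)' filename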

I was unsatisfied with awk's hard-to-remember syntax, but I liked the idea of using one utility to do this.
It seems like ack (or ack-grep if you use Ubuntu) can do this easily:
# ack-grep -ho "\bth.*?\b" *
the
the
the
this
thoroughly
If you omit the -h flag you get:
# ack-grep -o "\bth.*?\b" *
some-other-text-file
1:the
some-text-file
1:the
the
yet-another-text-file
1:this
thoroughly
As a bonus, you can use the --output flag to do this for more complex searches with just about the easiest syntax I've found:
# echo "bug: 1, id: 5, time: 12/27/2010" > test-file
# ack-grep -ho "bug: (\d*), id: (\d*), time: (.*)" --output '$1, $2, $3' test-file
1, 5, 12/27/2010

cat *-text-file | grep -Eio "th[a-z]+"

You can also try pcregrep. There is also a -w option in grep, but in some cases it doesn't work as expected.
From Wikipedia:
cat fruitlist.txt
apple
apples
pineapple
apple-
apple-fruit
fruit-apple
grep -w apple fruitlist.txt
apple
apple-
apple-fruit
fruit-apple

I had a similar problem: I wanted a grep regex whose output is only the matched pattern.
In the end I used egrep with the -o option (the same regex with grep -e or grep -G did not give me the same result as egrep, since those use basic regular expressions, in which | is not an alternation operator).
So I think it could be something like this (I'm NOT a regex master):
egrep -o "the*|this{1}|thoroughly{1}" filename

To find all the words that start with "icon-", the following command works perfectly. I am using ack here, which is similar to grep but with better options and nicer formatting.
ack -oh --type=html "\w*icon-\w*" | sort | uniq

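You could pipe your grep output into Perl like this (-h keeps file names such as yet-another-text-file, which itself contains "th", out of the text that Perl scans):
grep -h "th" * | perl -n -e'while(/(\w*th\w*)/g) {print "$1\n"}'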

grep --color -o -E "Begin.{0,}?End" file.txt
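? - Match as few as possible until the End
Tested in the macOS terminal. POSIX extended regular expressions define no lazy quantifiers, so this may not carry over to other greps; with GNU grep, grep -o -P "Begin.*?End" file.txt is the dependable way to get a non-greedy match.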

$ grep -w
Excerpt from grep man page:
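-w: Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.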

ripgrep
Here is an example using ripgrep:
rg -o "(\w+)?th(\w+)?"
It'll match all words containing th.

Develop Reference
