Grep: looking for pattern only if *not behind* another pattern

I am trying to get grep to find a pattern only if it is not behind another pattern.
So, for instance, in the three lines below, I'm looking for foo only if it is not behind a # (you can guess why :-)
1./make_maps_meteo_ecmwf.pl:# foo
2./make_maps_meteo_ecmwf.pl: foo
3./make_maps_meteo_ecmwf.pl: foo #
I need lines 2 and 3, not 1.
This does not help:
grep '[^#].*foo'
A piped grep won't help either, because it would also exclude line 3:
grep 'foo' | grep -v '#'
Any ideas?

No # before foo:
$ grep "^[^#]*foo" file
2./make_maps_meteo_ecmwf.pl: foo
3./make_maps_meteo_ecmwf.pl: foo #
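The anchored pattern works because [^#]* can only consume non-# characters between the start of the line and foo. A minimal Python sketch of the difference follows; Python's re module stands in for grep here, and the sample lines are the three from the question:

```python
import re

lines = [
    "1./make_maps_meteo_ecmwf.pl:# foo",
    "2./make_maps_meteo_ecmwf.pl: foo",
    "3./make_maps_meteo_ecmwf.pl: foo #",
]

# Unanchored attempt: any single non-# character, then anything, then foo.
unanchored = [l for l in lines if re.search(r"[^#].*foo", l)]

# Anchored fix: only non-# characters may appear between line start and foo.
anchored = [l for l in lines if re.search(r"^[^#]*foo", l)]

print(len(unanchored))  # the unanchored pattern matches every line
print(anchored)
```

The unanchored pattern is satisfied by any non-# character anywhere before foo, which is why it cannot exclude line 1.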

Related

How can I make grep show a line ignoring the words I want?

I am trying to use grep with the pwd command.
So, if I enter pwd, it shows me something like:
/home/hrq/my-project/
But for a script I am writing, I need to use it with grep so that it prints only what comes after hrq/. That is, I always need to hide my home folder (the /home/hrq/ part) and show only what follows (in this case, only my-project).
Is it possible?
I tried something like
pwd | grep -ov 'home', since I saw that the -v flag is equivalent to a NOT operator, and I combined it with the -o (only matching) flag. But it didn't work.
Given:
$ pwd
/home/foo/tmp
$ echo "$PWD"
/home/foo/tmp
Depending on what you really want to do, either of these is probably a better fit than trying to use grep:
$ basename "$PWD"
tmp
$ echo "${PWD#/home/foo/}"
tmp
Use grep -Po 'hrq/\K.*', for example:
grep -Po 'hrq/\K.*' <<< '/home/hrq/my-project/'
my-project/
Here, grep uses the following options:
-P : Use Perl regexes.
-o : Print the matches only (1 match per line), not the entire lines.
\K : Cause the regex engine to "keep" everything it had matched prior to the \K and not include it in the match. Specifically, ignore the preceding part of the regex when printing the match.
SEE ALSO:
grep manual
perlre - Perl regular expressions
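If the script can call Python instead of grep, the same two ideas (strip a known prefix, or keep only what follows a marker) can be sketched as below. This is only an illustration: Python's re module does not support \K, so a capture group is used in its place, and the path is the example value from the question.

```python
import re

path = "/home/hrq/my-project/"

# Counterpart of ${PWD#/home/hrq/}: drop a known leading prefix (Python 3.9+).
stripped = path.removeprefix("/home/hrq/")

# Counterpart of grep -Po 'hrq/\K.*': capture only what follows "hrq/".
match = re.search(r"hrq/(.*)", path)
after_marker = match.group(1) if match else ""

print(stripped)
print(after_marker)
```

Both variables end up holding the part of the path after /home/hrq/.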

grep or ripgrep: How to find only files that match multiple patterns (not only on the same line)?

I'm searching for a fast method to find all files in a folder which contain 2 or more patterns
grep -l -e foo -e bar ./*
or
rg -l -e foo -e bar
These show all files containing 'foo' AND 'bar' on the same line, or 'foo' OR 'bar' on different lines, but I want only files that have at least one 'foo' match AND one 'bar' match on different lines. Files that have only 'foo' matches or only 'bar' matches should be filtered out.
I know I could chain the grep calls but this will be too slow.
rg with multiline does work; however, it prints everything in between the two criteria, which is sometimes not useful.
For the use case of chaining searches (in e.g. html, json, etc.), where the 1st criterion just narrows down the files and the 2nd criterion is what I am actually looking for, this is a possible solution:
rg -0 -l crit1 | xargs -0 -I % rg -H crit2 %
Alternatively, I have just discovered ugrep, which supports combining multiple criteria using boolean operators at both the line and the file level. This is quite something. It's a bit slower than rg + xargs; however, it nicely prints all lines matching all criteria from the files (instead of just showing matches for the last criterion, as above):
ugrep --files -e crit1 --and -e crit2
If you want to search for two or more words that occur on multiple lines, you can use ripgrep's --multiline-dotall option, in addition to providing -U/--multiline. You also need to search for both foo before bar and bar before foo, using the | operator:
rg -lU --multiline-dotall 'foo.*bar|bar.*foo' .
For any number of words you'll need to | all permutations of those words. For that I use a small Python script (which I called rga) that searches the current directory (and downwards) for files that contain all arguments given on the command line:
#! /opt/util/py310/bin/python
import sys
import subprocess
from itertools import permutations
rgarg = '|'.join(('.*'.join(x) for x in permutations(sys.argv[1:])))
cmd = ['rg', '-lU', '--multiline-dotall', rgarg, '.']
# print(' '.join(cmd))
proc = subprocess.run(cmd, capture_output=True)
sys.stdout.write(proc.stdout.decode('utf-8'))
I have searched successfully with six arguments; above that the command line becomes too long. There are probably ways around that, such as saving the pattern to a file and passing -f file_name, but I never needed or investigated that.
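To see why the command line grows so quickly, the pattern-building step of the script can be looked at in isolation. The sketch below wraps it in a function, build_pattern, which is a name made up for this illustration; as in the script, the words are used as-is and are not regex-escaped.

```python
from itertools import permutations
from math import factorial

def build_pattern(words):
    """Join every ordering of words with '.*' and OR the orderings together."""
    return "|".join(".*".join(p) for p in permutations(words))

# Two words give the two orderings used in the rg command above.
print(build_pattern(["foo", "bar"]))

# Six words already produce 6! alternatives in a single argument.
six = build_pattern(["a", "b", "c", "d", "e", "f"])
print(len(six.split("|")), factorial(6))
```

The number of alternatives is the factorial of the number of words, so each extra word multiplies the length of the argument.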
$ cat f1
afoot
2bar
$ cat f2
foo bar
$ cat f3
foot
$ cat f4
bar
$ cat f5
barred
123
foo3
$ rg -Ul '(?s)foo.*?\n.*?bar|bar.*?\n.*?foo'
f5
f1
You can use -U option to match across lines. The s flag will enable . to match newlines as well. Since you want the matches to be across different lines, you need to match a newline character in between the search terms as well.
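The "both patterns, but on different lines" condition can also be checked line by line without a multiline regex. The following is a rough Python sketch of that check, not a replacement for rg: on_different_lines is a hypothetical helper, and the in-memory strings mirror the sample files f1 to f5 above.

```python
import re

def on_different_lines(text, first, second):
    """True if some line matches `first` and a different line matches `second`."""
    lines = text.splitlines()
    first_hits = [i for i, l in enumerate(lines) if re.search(first, l)]
    second_hits = [i for i, l in enumerate(lines) if re.search(second, l)]
    return any(i != j for i in first_hits for j in second_hits)

# Contents of the sample files, held in memory for the sketch.
files = {
    "f1": "afoot\n2bar\n",
    "f2": "foo bar\n",
    "f3": "foot\n",
    "f4": "bar\n",
    "f5": "barred\n123\nfoo3\n",
}

matching = sorted(name for name, text in files.items()
                  if on_different_lines(text, "foo", "bar"))
print(matching)
```

As with the rg command, f2 is rejected because its only foo and bar sit on the same line.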
So this doesn't perfectly answer the question, but, this is the StackOverflow question that pops up every time I google "ripgrep multiple patterns". So I'm leaving my answer here for the future googler (including myself)...
I primarily work in PowerShell, so this is how I perform an AND search with ripgrep in PowerShell. It also matches files where both patterns occur on the same line, which is why it's not a perfect answer, but it identifies files that match both patterns and runs relatively quickly:
rg -l 'SecondSearchPattern' (rg -l 'FirstSearchPattern')
Explanation:
First the parens run: rg -l 'FirstSearchPattern', which searches all files for the pattern FirstSearchPattern. By using -l it returns a list of file paths only.
Because it is placed in (parentheses), the whole inner command runs first, and PowerShell then "splats" its results into the outer rg command.
The outer rg command is now run like this:
rg -l 'SecondSearchPattern' "file.txt" "directory\file.txt"
And yes, it does put them into quotes, so it handles paths with spaces. This searches all provided files that match the pattern SecondSearchPattern. Thus returning only files that match both patterns.
You can go one step further and add on | Get-Item (| gi) to return filesystem objects, and | % FullName to get the full path.
rg -l 'SecondSearchPattern' (rg -l 'FirstSearchPattern') | gi | % FullName

Escaping < and > with grep

I'm missing something here. If I have a file with contents:
foo
<foo>
And I do grep "\<foo\>" file_name, shouldn't it match only the second line? It is also matching the first.
I'm not very good with grep so I'm probably messing things up.
Escaping them activates their meta-character properties and turns them into word boundaries in GNU grep:
$ grep 'foo' file
foo
<foo>
foobar
$ grep '\<foo\>' file
foo
<foo>
The 2nd grep above isn't looking for the string <foo>, it's looking for the string foo NOT preceded or succeeded immediately by word-constituent characters.
In general it's not safe to escape characters without knowing exactly what it means to do so. Here's another example:
$ printf 'aa\na{2}b\n'
aa
a{2}b
$ printf 'aa\na{2}b\n' | grep 'a{2}'
a{2}b
$ printf 'aa\na{2}b\n' | grep 'a\{2\}'
aa
The \{..\} above activates the braces' metacharacter properties as regexp interval delimiters.
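For comparison, here is the same word-boundary idea in Python, as a small sketch. Python's re module has no \< or \>; its \b assertion plays a similar role, and angle brackets are ordinary characters there. The three lines are the sample file from this answer.

```python
import re

lines = ["foo", "<foo>", "foobar"]

# \b is the word-boundary assertion in Python's re; it plays the role that
# \< and \> play in GNU grep.
whole_word = [l for l in lines if re.search(r"\bfoo\b", l)]

# The angle brackets are ordinary characters here, so this finds the literal text.
literal = [l for l in lines if re.search(r"<foo>", l)]

print(whole_word)
print(literal)
```

The word-boundary search keeps both foo and <foo>, just like the second grep above, while the unescaped pattern finds only the literal <foo>.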

Can I use grep to show only the matched line, and not the file it appeared in?

I sometimes want to grep for a function to see examples of how it is used in context, eg. what sort of parameters it is called with. When I am doing this, the name of the file the match appears in becomes useless clutter. Is there any way to instruct grep to not include it? (Or a grep alternative that solves the same problem?)
You can tell grep not to indicate the filename in the output with the option -h:
-h, --no-filename
Suppress the prefixing of file names on output. This is the
default when there is only one file (or only standard input) to
search.
Test
$ echo "hello" > f1
$ echo "hello man" > f2
$ grep "hello" f*
f1:hello
f2:hello man
$ grep -h "hello" f*
hello
hello man

Can grep show only words that match search pattern?

Is there a way to make grep output "words" from files that match the search expression?
If I want to find all the instances of, say, "th" in a number of files, I can do:
grep "th" *
but the output will be something like:
some-text-file : the cat sat on the mat
some-other-text-file : the quick brown fox
yet-another-text-file : i hope this explains it thoroughly
What I want it to output, using the same search, is:
the
the
the
this
thoroughly
Is this possible using grep? Or using another combination of tools?
Try grep -o:
grep -oh "\w*th\w*" *
Edit: matching from Phil's comment.
From the docs:
-h, --no-filename
Suppress the prefixing of file names on output. This is the default
when there is only one file (or only standard input) to search.
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
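What -o does with this pattern can be mimicked in a few lines of Python, shown here only as an illustration of the matching behaviour. re.findall returns just the matched parts, and the input lines are the sample sentences from the question.

```python
import re

lines = [
    "the cat sat on the mat",
    "the quick brown fox",
    "i hope this explains it thoroughly",
]

# Collect only the matched words, one per match, as grep -o would print them.
words = [w for line in lines for w in re.findall(r"\w*th\w*", line)]
print("\n".join(words))
```

This prints the five words the question asks for, one per line.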
Cross distribution safe answer (including windows minGW?)
grep -h "[[:alpha:]]*th[[:alpha:]]*" 'filename' | tr ' ' '\n' | grep -h "[[:alpha:]]*th[[:alpha:]]*"
If you're using older versions of grep (like 2.4.2) which do not include the -o option, then use the above. Else use the simpler to maintain version below.
Linux cross distribution safe answer
grep -oh "[[:alpha:]]*th[[:alpha:]]*" 'filename'
To summarize: -oh outputs the regular expression matches from the file content (and not the filename), just as you would expect a regular expression to work in vim etc. Which word or regular expression you search for is then up to you, as long as you stick to POSIX and not Perl syntax (see below).
More from the manual for grep
-o Print each match, but only the match, not the entire line.
-h Never print filename headers (i.e. filenames) with output lines.
-w The expression is searched for as a word (as if surrounded by
`[[:<:]]' and `[[:>:]]';
The reason why the original answer does not work for everyone
The usage of \w varies from platform to platform, as it's an extended "perl" syntax. As such, grep installations that are limited to POSIX character classes need [[:alpha:]] rather than its rough Perl counterpart \w (which also matches digits and the underscore). See the Wikipedia page on regular expressions for more.
Ultimately, the POSIX answer above will be a lot more reliable regardless of platform (being the original) for grep.
As for support of grep without -o option, the first grep outputs the relevant lines, the tr splits the spaces to new lines, the final grep filters only for the respective lines.
(PS: I know most platforms by now would have been patched for \w.... but there are always those that lag behind)
Credit for the "-o" workaround goes to @AdamRosenfield's answer.
It's more simple than you think. Try this:
egrep -wo 'th.[a-z]*' filename.txt #### (Case Sensitive)
egrep -iwo 'th.[a-z]*' filename.txt ### (Case Insensitive)
Where,
egrep : grep with extended regular expressions (same as grep -E).
-w : Match only whole words instead of substrings.
-o : Display only the matched pattern instead of the whole line.
-i : Ignore case.
You could translate spaces to newlines and then grep, e.g.:
cat * | tr ' ' '\n' | grep th
Just awk; no need for a combination of tools.
# awk '{for(i=1;i<=NF;i++){if($i~/^th/){print $i}}}' file
the
the
the
this
thoroughly
grep with only-matching output and Perl regexes:
grep -o -P 'th.*? ' filename
I was unsatisfied with awk's hard to remember syntax but I liked the idea of using one utility to do this.
It seems like ack (or ack-grep if you use Ubuntu) can do this easily:
# ack-grep -ho "\bth.*?\b" *
the
the
the
this
thoroughly
If you omit the -h flag you get:
# ack-grep -o "\bth.*?\b" *
some-other-text-file
1:the
some-text-file
1:the
the
yet-another-text-file
1:this
thoroughly
As a bonus, you can use the --output flag to do this for more complex searches with just about the easiest syntax I've found:
# echo "bug: 1, id: 5, time: 12/27/2010" > test-file
# ack-grep -ho "bug: (\d*), id: (\d*), time: (.*)" --output '$1, $2, $3' test-file
1, 5, 12/27/2010
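The capture-group templating that --output provides has a close analogue in Python's Match.expand, sketched below on the same test line. This is only a comparison, not how ack works internally; note that Python templates use \1 where ack uses $1.

```python
import re

line = "bug: 1, id: 5, time: 12/27/2010"

match = re.search(r"bug: (\d*), id: (\d*), time: (.*)", line)
# expand() fills a template with the captured groups, much like --output.
formatted = match.expand(r"\1, \2, \3") if match else ""
print(formatted)
```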
cat *-text-file | grep -Eio "th[a-z]+"
You can also try pcregrep. There is also a -w option in grep, but in some cases it doesn't work as expected.
From Wikipedia:
cat fruitlist.txt
apple
apples
pineapple
apple-
apple-fruit
fruit-apple
grep -w apple fruitlist.txt
apple
apple-
apple-fruit
fruit-apple
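The result above follows from what counts as a word character: letters, digits and the underscore do, the hyphen does not. A rough Python approximation of the -w test makes this visible; the lookarounds are an assumption chosen for the sketch, not grep's actual implementation.

```python
import re

fruits = ["apple", "apples", "pineapple", "apple-", "apple-fruit", "fruit-apple"]

# Approximation of grep -w: the match may not touch a word character
# (letter, digit, underscore) on either side.
whole_word = re.compile(r"(?<!\w)apple(?!\w)")

selected = [f for f in fruits if whole_word.search(f)]
print(selected)
```

apples and pineapple are dropped because a letter touches the match, while the hyphenated entries are kept.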
I had a similar problem: I was looking for a grep regex that outputs only the matched pattern.
In the end I used egrep with the -o option (the same regex with grep -e or -G didn't give me the same result as egrep),
so I think it could be something similar to this (I'm NOT a regex master):
egrep -o "the*|this{1}|thoroughly{1}" filename
To search for all the words that start with "icon-", the following command works perfectly. I am using ack here, which is similar to grep but with better options and nice formatting.
ack -oh --type=html "\w*icon-\w*" | sort | uniq
You could pipe your grep output into Perl like this:
grep "th" * | perl -n -e'while(/(\w*th\w*)/g) {print "$1\n"}'
grep --color -o -E "Begin.{0,}?End" file.txt
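The ? after the quantifier makes it lazy: match as few characters as possible up to End.
Tested in the macOS terminal. Lazy quantifiers are not part of POSIX ERE, so this may not behave the same with other grep implementations; with GNU grep, -P supports them.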
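The effect of the lazy ? is easiest to see next to the greedy form. A small Python sketch follows; Python's re supports lazy quantifiers directly, and the sample text is made up for the illustration.

```python
import re

# Made-up sample with two Begin...End spans on one line.
text = "Begin a End Begin b End"

# Greedy: .* runs to the last End on the line.
greedy = re.findall(r"Begin.*End", text)

# Lazy: .*? stops at the first End it can reach.
lazy = re.findall(r"Begin.*?End", text)

print(greedy)
print(lazy)
```

The greedy pattern returns one long match spanning both spans, while the lazy one returns each Begin...End span separately.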
$ grep -w
Excerpt from grep man page:
-w: Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character.
ripgrep
Here is an example using ripgrep:
rg -o "(\w+)?th(\w+)?"
It'll match all words containing th.
