Linux RegEx Grep Repeat character from n to m times

Linux RegEx Grep Repeat character from n to m times - grep

I have a problem with this Linux command:
ls | grep -E 'i{2,3}'
.It should take a file that has at least 2 i and max 3 i, but it doesn't work.
This is the output
ls:
life.py, viiva.txt, viiiiiiiiiva.txt
grep:
viiva.txt, viiiiiiiiiva.txt (with the first 3 I highlighted)
Thanks for the help.

Issue with OP's attempt grep -E 'i{2,3}' will match two or three consecutive occurrences of i anywhere in the input, so 4 or more consecutive i is also a valid match.
Parsing ls output is not recommended, see Why not parse ls (and what to do instead)?. If you wish to pass the filenames after filtering to some other command, find is a good option.
$ ls
1i2i3i.txt aibi.txt II.txt life.py viiiiiiiiiva.txt viiva.txt
$ # files with 2 or 3 consecutive i
$ # note that the regex will act on entire filename, thus anchors are not needed
$ find -type f -regextype egrep -regex '[^i]*i{2,3}[^i]*'
./viiva.txt
$ # files with 2 or 3 i anywhere in the name
$ find -type f -regextype egrep -regex '[^i]*i[^i]*i[^i]*(i[^i]*)?'
./aibi.txt
./1i2i3i.txt
./viiva.txt
$ # files with 2 or 3 i anywhere in the name, ignoring case
$ find -type f -regextype egrep -iregex '[^i]*i[^i]*i[^i]*(i[^i]*)?'
./II.txt
./aibi.txt
./1i2i3i.txt
./viiva.txt
If filenames won't cause an issue, you can grep -xE or grep -ixE with above regex, where x option will make sure the regex matches the whole line, instead of anywhere in the line. Or you can also use awk:
$ # NF will give number of fields after splitting on i
$ ls | awk -F'i' 'NF>=3 && NF<=4'
1i2i3i.txt
aibi.txt
viiva.txt
$ ls | awk -F'[iI]' 'NF>=3 && NF<=4'
1i2i3i.txt
aibi.txt
II.txt
viiva.txt

Related

show filename with matching word from grep only

I am trying to find which words happened in logfiles plus show the logfilename for anything that matches following pattern:
'BA10\|BA20\|BA21\|BA30\|BA31\|BA00'
so if file dummylogfile.log contains BA10002 I would like to get a result such as:
dummylogfile.log:BA10002
it is totally fine if the logfile shows up twice for duplicate matches.
the closest I got is:
for f in $(find . -name '*.err' -exec grep -l 'BA10\|BA20\|BA21\|BA30\|BA31\|BA00' {} \+);do printf $f;printf ':';grep -o 'BA10\|BA20\|BA21\|BA30\|BA31\|BA00' $f;done
but this gives things like:
./register-05-14-11-53-59_24154.err:BA10
BA10
./register_mdw_files_2020-05-14-11-54-32_24429.err:BA10
BA10
./process_tables.2020-05-18-11-18-09_11428.err:BA30
./status_load_2020-05-18-11-35-31_9185.err:BA30
so,
1) there are empty lines with only the second match and
2) the full match (e.g., BA10004) is not shown.
thanks for the help

There are a couple of options you can pass to grep:
-H: This will report the filename and the match
-o: only show the match, not the full line
-w: The match must represent a full word (string build from [A-Za-z0-9_])
If we look at your regex, you use BA01, this will match only BA01 which can appear anywhere in the text, also mid word. If you want the regex to match a full word, it should read BA01[[:alnum:]_]* which adds any sequence of word-constituent characters (equivalent to [A-Za-z0-9_]). You can test this with
$ echo "foo BA01234 barBA012" | grep -Ho "BA01"
(standard input):BA01
(standard input):BA01
$ echo "foo BA01234 barBA012" | grep -How "BA01"
$ echo "foo BA01234 barBA012" | grep -How "BA01[[:alnum:]_]*"
(standard input):BA01234
So your grep should look like
grep -How "\('BA10\|BA20\|BA21\|BA30\|BA31\|BA00'\)[[:alnum:]_]*" *.err

From your example it seems that all files are in one directory. So the following works right away:
grep -l 'BA10\|BA20\|BA21\|BA30\|BA31\|BA00' *.err
If the files are in different directories:
find . -name '*.err' -print | xargs -I {} grep 'BA10\|BA20\|BA21\|BA30\|BA31\|BA00' {} /dev/null
Explanation: the addition of /dev/null to the filename {} forces grep to report the matching filename

grep: Find all files containing the word `star`, but not the word `start`

I have a bunch of files: some contain the word star, some contain the word start, some contain both.
I'd like to grep for files that contain the word star, but not the word start.
How can this be accomplished using only grep?

grep has some options for inverting the matches at the line or file level. You want the latter option, with the -L switch. The following will print the names of all the files in a folder that don't contain the text start:
grep -LF start *
-F tells grep that start is a literal string and not a regex. It's optional here, but might speed things up a tiny bit.
You can use the resulting list to search for files that contain star:
grep -lF star $(grep -LF start *)
-l prints only the names of files containing a match, not any line-by-line or match-by-match details. If this is not exactly what you want, man grep is your friend.
This uses an additional shell construct to run the inverted match, but it technically doesn't call any additional programs that aren't grep.
Update
Since you mention wanting to look through all the files starting with a given root folder, change -LF to -LFr. Replace * with your root folder if you don't want to change working directories.
-r tells grep to recurse into directories, and search every file it finds along the way.

With GNU grep for -w:
$ cat file
foo star bar
oof start rab
$ grep -w star *
foo star bar
or if you just want the names of the files containing star:
$ grep -lw star *
file
and to just find files to look in:
$ find . -maxdepth 1 -type f -exec grep -w 'star' {} \;
foo star bar

How do I 'grep -c' and avoid printing files with zero '0' count

The command 'grep -c blah *' lists all the files, like below.
% grep -c jill *
file1:1
file2:0
file3:0
file4:0
file5:0
file6:1
%
What I want is:
% grep -c jill * | grep -v ':0'
file1:1
file6:1
%
Instead of piping and grep'ing the output like above, is there a flag to suppress listing files with 0 counts?
SJ

How to grep nonzero counts:
grep -rIcH 'string' . | grep -v ':0$'
-r Recurse subdirectories.
-I Ignore binary files (thanks #tongpu, warlock).
-c Show count of matches. Annoyingly, includes 0-count files.
-H Show file name, even if only one file (thanks #CraigEstey).
'string' your string goes here.
. Start from the current directory.
| grep -v ':0$' Remove 0-count files. (thanks #LaurentiuRoescu)
(I realize the OP was excluding the pipe trick, but this is what works for me.)

Just use awk. e.g. with GNU awk for ENDFILE:
awk '/jill/{c++} ENDFILE{if (c) print FILENAME":"c; c=0}' *

How to grep with a list of words

I have a file A with 100 words in it separated by new lines. I would like to search file B to see if ANY of the words in file A occur in it.
I tried the following but does not work to me:
grep -F A B

You need to use the option -f:
$ grep -f A B
The option -F does a fixed string search where as -f is for specifying a file of patterns. You may want both if the file only contains fixed strings and not regexps.
$ grep -Ff A B
You may also want the -w option for matching whole words only:
$ grep -wFf A B
Read man grep for a description of all the possible arguments and what they do.

To find a very long list of words in big files, it can be more efficient to use egrep:
remove the last \n of A
$ tr '\n' '|' < A > A_regex
$ egrep -f A_regex B

Ignoring directories from a file

I am in the process of creating a script that lists all files opened via lsof output. I would like to checksum specific files and ignore directories from that output but am at a loss to do so EFFECTIVELY. For example: (I'm using FreeBSD btw)
lsof | awk '/\//{print $9}' | sort -u | head -n 5
prints:
/
/bin/sleep
/dev/bpf
What I'd like to do is: FROM that output, ignore any directories and perform an md5 on FILES (not directories).
Any pointers?

Give a try to following perl command:
lsof | perl -MDigest::MD5=md5_hex -ane '
$f = $F[ $#F ];
-f $f and printf qq|%s %s\n|, $f, md5_hex( $f )
'
It filters lsof output to plain files (-f). Take a look into perlfunc to change it to add different kind of files.
It outputs each file and its md5 separated by a space character. An example in my system is like:
/usr/lib/libm-2.17.so a2d3b2de9a1f59fb99427714fefb49ca
/usr/lib/libdl-2.17.so d74d8ac16c2d13128964353d4be7061a
/usr/lib/libnsl-2.17.so 34b6909ec60c337c21b044642b9baa3d
/usr/lib/ld-2.17.so 3d0e7b5b5c4e59c5c4b6a858cc79fcf1
/usr/sbin/lsof b9b8fbc8f296e47969713f6369d97c0d
/usr/lib/locale/locale-archive 3ea56273193198a718b9a5de33d553db
/usr/lib/libc-2.17.so ba51eeb4025b7f5d7f400f1968f4b5f9
/usr/lib/ld-2.17.so 3d0e7b5b5c4e59c5c4b6a858cc79fcf1
...

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Linux RegEx Grep Repeat character from n to m times - grep

I have a problem with this Linux command: ls | grep -E 'i{2,3}' .It should take a file that has at least 2 i and max 3 i, but it doesn't work. This is the output ls: life.py, viiva.txt, viiiiiiiiiva.txt grep: viiva.txt, viiiiiiiiiva.txt (with the first 3 I highlighted) Thanks for the help.

Related

show filename with matching word from grep only

grep: Find all files containing the word `star`, but not the word `start`

How do I 'grep -c' and avoid printing files with zero '0' count

How to grep with a list of words

Ignoring directories from a file

Categories

Resources