grep exact word from input file - grep

My input files:
text.txt - The file to be searched
patterns.txt - File containing a list of words (one word per line) that are to be searched for in the text.txt file.
text.txt is:
abc def ghi
jkl mno pqr ; stu
zzz yyy xxx
jkl abs abc1 ; mno
jjj aaa abc1M1
and patterns.txt is:
abc
pq
abc1M1
If I do:
| => grep -f patterns.txt text.txt
abc def ghi
jkl mno pqr ; stu
jkl abs abc1 ; mno
jjj aaa abc1M1
However, only two lines should be returned:
abc def ghi
jjj aaa abc1M1
i.e. only those lines that match the complete words "abc" and "abc1M1" as given in the patterns.txt file. How should I structure my query?
thanks for your help!
Edit: Cyrus suggested trying the '-Fwf' options, but that still doesn't give me what I want:
| => grep -Fwf patterns.txt text.txt
abc def ghi
Also, I am running the grep on my mac:
| => grep --version
grep (BSD grep) 2.5.1-FreeBSD

With GNU grep:
grep -Fwf patterns.txt text.txt
Output:
abc def ghi
jjj aaa abc1M1
-F: Interpret PATTERNS as fixed strings, not regular expressions.
-w: Select only those lines containing matches that form whole words.
See: man grep
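If the BSD grep shipped with macOS still returns only one line, a portable fallback is to let awk do the word comparison. This is only a sketch and not part of the answer above: it assumes a "word" is a whitespace-separated field (grep -w uses a letters/digits/underscore definition instead), and the scratch directory and the names tmpdir, want and result are made up for the demonstration.

```shell
#!/bin/sh
# Recreate the question's sample files in a scratch directory (hypothetical location).
tmpdir=$(mktemp -d)
printf '%s\n' 'abc def ghi' 'jkl mno pqr ; stu' 'zzz yyy xxx' \
    'jkl abs abc1 ; mno' 'jjj aaa abc1M1' > "$tmpdir/text.txt"
printf '%s\n' abc pq abc1M1 > "$tmpdir/patterns.txt"

# While reading the first file (NR==FNR), remember every pattern as an array key.
# While reading the second file, print a line as soon as one field equals a pattern.
result=$(awk 'NR == FNR { want[$0]; next }
    { for (i = 1; i <= NF; i++) if ($i in want) { print; next } }' \
    "$tmpdir/patterns.txt" "$tmpdir/text.txt")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

On the sample data this prints the two expected lines, "abc def ghi" and "jjj aaa abc1M1".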

Related

Search file for usernames, and sort number of instances for each user in file?

I am tasked with taking a file that has line entries that include string username=xxxx:
$ cat file.txt
Yadayada username=jdoe blablabla
Yadayada username=jdoe blablabla
Yadayada username=jdoe blablabla
Yadayada username=dsmith blablabla
Yadayada username=dsmith blablabla
Yadayada username=sjones blablabla
And finding how many times each user in the file shows up, which I can do manually by feeding username=jdoe for example:
$ grep -r "username=jdoe" file.txt | wc -l | tr -d ' '
3
What's the best way to report each user in the file, and the number of lines for each user, sorted from highest to lowest instances:
3 jdoe
2 dsmith
1 sjones
Been thinking of how to approach this, but drawing blanks, figured I'd check with our gurus on this forum. :)
TIA,
Don
In GNU awk:
$ awk '
BEGIN { RS="[ \n]" }
/=/ {
split($0,a,"=")
u[a[2]]++ }
END {
PROCINFO["sorted_in"]="#val_num_desc"
for(i in u)
print u[i],i
}' file
3 jdoe
2 dsmith
1 sjones
Using grep :
$ grep -o 'username=[^ ]*' file | cut -d "=" -f 2 | sort | uniq -c | sort -nr
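For reference, here is that pipeline run against the sample data from the question. It is a sketch under a few assumptions: the temporary file and the variable names tmpfile and counts are invented for the demonstration, and the trailing awk is only there to strip the leading padding that uniq -c adds.

```shell
#!/bin/sh
# Rebuild the six sample lines from the question in a temporary file.
tmpfile=$(mktemp)
printf 'Yadayada username=%s blablabla\n' jdoe jdoe jdoe dsmith dsmith sjones > "$tmpfile"

# Extract "username=..." tokens, keep the part after "=", then count and rank them.
counts=$(grep -o 'username=[^ ]*' "$tmpfile" \
    | cut -d '=' -f 2 \
    | sort \
    | uniq -c \
    | sort -nr \
    | awk '{ print $1, $2 }')   # normalise the padding from uniq -c

printf '%s\n' "$counts"
rm -f "$tmpfile"
```

This should print "3 jdoe", "2 dsmith" and "1 sjones", one per line.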
Awk alone:
awk '
{sub(/.*username=/,""); sub(/ .*/,"")}
{a[$0]++}
END {for(i in a) printf "%d\t%s\n",a[i],i | "sort -nr"}
' file.txt
This uses awk's sub() function to achieve what grep -o does in other answers. It embeds the call to sort within the awk script. You could of course use that pipe after the awk script rather than within it if you prefer.
Oh, and unlike the other awk solutions presented here, this one (1) is portable to non-GNU-awk environments (like BSD, macOS) and (2) doesn't depend on the username being in a predictable location on each line (i.e. $2).
Why might awk be a better choice than simpler tools like uniq? It probably wouldn't, for a super simple requirement like this. But good to have in your toolbox if you want something with the capability of a little more text processing.
Using sed, uniq, and sort:
sed 's/.*username=\([^ ]*\).*/\1/' file.txt | sort | uniq -c | sort -nr
If there are lines without usernames:
sed -n 's/.*username=\([^ ]*\).*/\1/p' input | sort | uniq -c | sort -nr
$ awk -F'[= ]' '{print $3}' file | sort | uniq -c | sort -nr
3 jdoe
2 dsmith
1 sjones
The following awk may also help.
awk -F"[ =]" '{a[$3]++} END{for(i in a){print a[i],i | "sort -nr"}}' Input_file

Grep only exact last 4 digits from Number file

Grep only exact last 4 digits from Number file.
$ cat test
12298700077
56198700770
23192604888
34198701041
89198701285
$ cat test | grep 0077
12298700077
56198700770
Required output is just this
12298700077
Use a regex, specifically the '$' anchor (see man 7 regex), which matches the null string at the end of a line:
$ grep '0077$' test
12298700077
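A small self-contained check of the anchored pattern, fed from printf instead of a file. The variable name last4 is just an assumption for the example; the pattern is single-quoted so the shell leaves the $ alone.

```shell
#!/bin/sh
# Feed the sample numbers to grep on stdin; '0077$' only matches when
# the digits 0077 are the final four characters of the line.
last4=$(printf '%s\n' 12298700077 56198700770 23192604888 34198701041 89198701285 \
    | grep '0077$')

printf '%s\n' "$last4"
```

56198700770 contains 0077 but ends in 0770, so only 12298700077 is printed.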

numbers from egrep result in one line

I use egrep to output some lines with platform names:
XXX | egrep "i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$"
[30] i686-nptl-linux-gnu
[34] i686-w64-mingw32
[75] x86_64-unknown-linux-gnu
[77] x86_64-w64-mingw32
what I need is:
export PLATNUMS=30,34,75,77
How can I pipe the egrep command to sed / awk / bash script?
Try:
$ command | awk -F'[][ \t]+' '/i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$/{printf "%s%s",(f?",":"export PLATNUMS="),$2; f=1} END{print""}'
export PLATNUMS=30,34,75,77
How it works
-F'[][ \t]+'
Use any number of spaces, tabs, or [ or ] as field separators.
/i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$/{...}
For the lines of interest, perform the commands in curly braces.
printf "%s%s",(f?",":"export PLATNUMS="),$2; f=1
For the lines of interest, print what we want.
The variable f marks whether this is the first line of interest.
END{print""}
After reading all lines, print a newline.
Creating a shell variable
export PLATNUMS=$(command | awk -F'[][ \t]+' '/i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$/{printf "%s%s",(f?",":""),$2; f=1} END{print""}')
For example, if the file input contains your data:
$ export PLATNUMS=$(awk -F'[][ \t]+' '/i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$/{printf "%s%s",(f?",":""),$2; f=1} END{print""}' input)
$ declare -p PLATNUMS
declare -x PLATNUMS="30,34,75,77"
For those who prefer their commands spread out over multiple lines:
export PLATNUMS=$(command | awk -F'[][ \t]+' '
/i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$/{
printf "%s%s",(f?",":""),$2
f=1
}
END{
print""
}
')
Perhaps this way; I can't test it with your egrep.
export PLATNUMS=$(XXX | egrep "i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$" | sed ':A;s/\[\([[0-9]*\)].*/\1/;$bB;N;bA;:B;s/\n/,/g')
echo $PLATNUMS
How does this work?
Your egrep command returns multi-line text,
so sed reads this text line by line, like this:
sed '
:A # label A
# here, with your example,
# on the first line the pattern space looks like this:
# [30] i686-nptl-linux-gnu
# on the second line the pattern space looks like this:
# 30
# [34] i686-w64-mingw32
s/\[\([[0-9]*\)].*/\1/ # replace the bracketed number and the rest of the line with just the digits
# on the first line the pattern space becomes
# 30
# on the second line the pattern space becomes
# 30
# 34
# and so on for each line
$bB # on the last line, jump to B
N # append the next input line to the pattern space
bA # it is not the last line, so jump back to A
:B # label B
# here we have read all the lines
# the pattern space looks like this (without the #):
# 30
# 34
# 75
# 77
s/\n/,/g' # substitute every \n with a comma
# the pattern space becomes
# 30,34,75,77
# $(XXX | egrep .... | sed ...) returns 30,34,75,77 in the variable PLATNUMS
# It is better not to use all capital letters in your variable name
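A shorter route to the same comma-separated list swaps the sed loop for paste, which joins lines by itself. This is a different technique from the one explained above, sketched under the assumption that every input line starts with a bracketed number; the variable name nums is invented, and printf stands in for the XXX | egrep ... part of the pipeline.

```shell
#!/bin/sh
# printf stands in for the "XXX | egrep ..." output from the question.
nums=$(printf '%s\n' \
        '[30] i686-nptl-linux-gnu' \
        '[34] i686-w64-mingw32' \
        '[75] x86_64-unknown-linux-gnu' \
        '[77] x86_64-w64-mingw32' \
    | sed 's/^\[\([0-9]*\)].*/\1/' \
    | paste -sd, -)
# sed keeps only the digits between the brackets;
# paste -s joins all lines into one, using "," as the delimiter.

printf '%s\n' "$nums"
```

With the four lines above, nums ends up as 30,34,75,77.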
With GNU sed and tr:
$ XXX | egrep "i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$" | sed -E 's,]\s+.+$,,g' | sed 's,^\[,,g' | tr '\n' ',' | sed -E 's,(^.+$),export PLATNUMS=\1,' | sed 's/,$//' && echo
I'm not sure what you want to achieve but you might want to automatically eval the output export:
$ eval $(XXX | egrep "i686-nptl-linux-gnu$|i686-w64-mingw32$|x86_64-unknown-linux-gnu$|x86_64-w64-mingw32$" | sed -E 's,]\s+.+$,,g' | sed 's,^\[,,g' | tr '\n' ',' | sed -E 's,(^.+$),export PLATNUMS=\1,' | sed 's/,$//' && echo)
$ echo $PLATNUMS
30,34,75,77
If you ever think you need grep+sed or 2 greps or 2 seds or any other combination then you should use 1 call to awk instead, and you never need grep or sed when you're using awk:
export PLATNUMS=$(XXX | awk -F'[][]' '/(i686-nptl-linux-gnu|i686-w64-mingw32|x86_64-unknown-linux-gnu|x86_64-w64-mingw32)$/{p=(p ? p "," : "") $2} END{print p}')
Btw in case it's useful, here's a couple of briefer regexps:
(i686-(nptl-linux-gnu|w64-mingw32)|x86_64-(unknown-linux-gnu|w64-mingw32))$
((i686-nptl|x86_64-unknown)-linux-gnu|(i686|x86_64)-w64-mingw32)$
and depending on your input data (since this will include combinations not provided by the above) you MIGHT only need:
(i686|x86_64)-(nptl|unknown|w64)-(linux-gnu|mingw32)$
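To see one of the shorter regexps in action, here is a sketch that runs the second one through the awk command from this answer. The arm-unknown-linux-gnueabi line is a hypothetical extra entry added only to show that non-matching platforms are skipped; platforms and platnums are made-up variable names.

```shell
#!/bin/sh
# Sample input: the four lines from the question plus one hypothetical
# platform ([40]) that must not be selected.
platforms='[30] i686-nptl-linux-gnu
[34] i686-w64-mingw32
[40] arm-unknown-linux-gnueabi
[75] x86_64-unknown-linux-gnu
[77] x86_64-w64-mingw32'

# Split on "[" or "]" so that $2 is the number; append it to p with a
# comma separator whenever the line ends in one of the wanted platforms.
platnums=$(printf '%s\n' "$platforms" | awk -F'[][]' '
    /((i686-nptl|x86_64-unknown)-linux-gnu|(i686|x86_64)-w64-mingw32)$/ {
        p = (p ? p "," : "") $2
    }
    END { print p }')

printf '%s\n' "$platnums"
```

The hypothetical [40] line is ignored and the result is 30,34,75,77.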

grep that match around the first match

I would like to grep a specific word 'foo' inside specific files, then get the N lines around my match and show only the blocks that contain a second grep.
I found this but it doesn't really work...
find . | grep -E '.*?\.(c|asm|mac|inc)$' | \
xargs grep --color -C3 -rie 'foo' | \
xargs -n1 --delimiter='--' | grep --color -l 'bar'
For instance I have the file 'a':
a
b
c
d
bar
f
foo
g
h
i
j
bar
l
The file b:
a
bar
c
d
e
foo
g
h
i
j
k
I expect this for grep -C2 on both files, because bar is contained in the -C2 range of foo. I do not get any match for ./bar because bar is not in the -C2 range of foo...
--
./foo- bar
./foo- f
./foo- **foo**
./foo- g
./foo- h
--
Any ideas?
You could do this pretty simply with a "while read line" loop:
find -regextype posix-extended -regex "./file[a-z]" | while IFS= read -r line; do grep -nHC2 "foo" "$line" | grep --color bar; done
Output:
./filea-5-bar
./filec-46-... host pwns.me [94.23.120.252]: 451 4.7.1 Local bar configuration error ...
In this example, I created the following files:
filea - your example a
fileb - your example b
filec - some random exim log output with foo and bar tossed in 2 lines apart
filed - the same exim log output, but with foo and bar tossed in 3 lines apart
You could also pipe the output after done, to alter the format:
; done | sed -E 's/-([0-9]{1,6})-/: line: \1 ::: /'
Formatted output
./filea: line: 5 ::: bar
./filec: line: 46 ::: ... host pwns.me [94.23.120.252]: 451 4.7.1 Local bar configuration error ...
I think I only understand the first line of your question and this does what I think you mean!
#!/bin/bash
N=2
pattern1=a
pattern2=z
matchinglines=( $(awk -v p="$pattern1" '$0~p{print NR}' file) ) # Generate array of matching line numbers
for x in "${matchinglines[@]}"
do
((start=x-N))
[[ $start -lt 1 ]] && start=1 # Avoid passing zero or negative line numbers to sed
((end=x+N))
echo DEBUG: Checking block between lines $start and $end
sed -ne "${start},${end}p" file | grep -q "$pattern2"
[[ $? -eq 0 ]] && sed -ne "${start},${end}p" file
done
You need to set pattern1 and pattern2 at the start of the script. It uses awk to build an array of the line numbers that match your first pattern. Then it loops through the array and sets the start and end of the range to +/-N lines either side of each matching line number. It then uses sed to extract that block and passes it through grep to see whether it contains pattern2, printing the block if it does. It may not be the most efficient, but it is easy enough to understand and maintain.
It assumes your file is called file
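The same idea can also be sketched as a single awk pass that keeps the whole file in memory, which avoids re-reading it with sed for every match. This is an alternative under stated assumptions, not the script above: the file is assumed small enough to buffer, both patterns are treated as regular expressions, and the names p1, p2, n, line, tmpfile and blocks are invented for the example. The input is file 'a' from the question.

```shell
#!/bin/sh
# Recreate file 'a' from the question in a temporary file.
tmpfile=$(mktemp)
printf '%s\n' a b c d bar f foo g h i j bar l > "$tmpfile"

blocks=$(awk -v p1=foo -v p2=bar -v n=2 '
    { line[NR] = $0 }                       # buffer every line by number
    END {
        for (i = 1; i <= NR; i++) {
            if (line[i] !~ p1) continue     # only look around p1 matches
            start = (i - n < 1)  ? 1  : i - n
            end   = (i + n > NR) ? NR : i + n
            found = 0
            for (j = start; j <= end; j++)  # does the window contain p2?
                if (line[j] ~ p2) found = 1
            if (found)
                for (j = start; j <= end; j++) print line[j]
        }
    }' "$tmpfile")

printf '%s\n' "$blocks"
rm -f "$tmpfile"
```

For file 'a' this prints the block bar, f, foo, g, h, because bar sits two lines above foo.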
pipe it twice
grep "[^foo\n]" | grep "\n{ntimes}foo\n{ntimes}"

using OR in egrep

How do I select only the lines that start with any digit, or with the word "** SETTLE" preceded by a few stars?
The following returns the lines starting with a number, but does not return the lines with the word SETTLE.
# cat somefile.txt | egrep "(^[0-9]|'^*************** SETTLE ')"
egrep "^(([0-9])|([*]{3,} SETTLE))"
$ egrep '^([0-9]|\*+ SETTLE )' somefile.txt
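A quick way to check such an alternation is to run it on a few sample lines. In the attempt from the question, the single quotes inside the double-quoted pattern are literal characters, so that branch asks for a ' before the start-of-line anchor and never matches. The sketch below assumes "a few stars" means at least three; the sample lines and the variable name matches are invented, and grep -E is used as the equivalent of egrep.

```shell
#!/bin/sh
# Hypothetical sample input: one line starting with a digit, one with many
# stars before SETTLE, one unrelated line, and one with only two stars.
matches=$(printf '%s\n' \
        '123 first' \
        '*************** SETTLE done' \
        'other line' \
        '** SETTLE short' \
    | grep -E '^([0-9]|\*{3,} SETTLE )')
# ^ anchors the whole group, so each branch must match at the line start:
#   [0-9]            a leading digit
#   \*{3,} SETTLE    three or more literal stars, then " SETTLE "

printf '%s\n' "$matches"
```

Only the first two sample lines are printed; the two-star line is rejected by {3,}.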

Resources