Grep exact match in line - grep

File1:
Btr_0449a 447
Btr_0449 447
Desired output:
Btr_0449 447
I want grep to find 'Btr_0449', not 'Btr_0449a'. Seems I'm doing something wrong since:
grep -F "Btr_0449"
Btr_0449a 447
Btr_0449 447

This should do it:
grep -Fw "Btr_0449"
From the grep manpage:
"-w Select only those lines containing matches that form whole words. "

If you insist on using only the '-F' flag, adding a space after your string will do.
grep -F "Btr_0449 "
In the long run you will get better results with regex patterns, so for the above query you could do:
grep -e "Btr_0449\s"
...which will match your string followed by any whitespace character (space, tab, carriage return...). Note that \s is a GNU extension; [[:space:]] is the portable equivalent.
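The options discussed above can be compared side by side. This is a minimal sketch: the first two sample lines come from the question, the third (xBtr_0449) is made up to show a match that starts mid-word, and the anchored pattern assumes the identifier is always the first field.

```shell
# Two lines from the question plus one hypothetical line (xBtr_0449).
data=$(printf 'Btr_0449a 447\nBtr_0449 447\nxBtr_0449 12')

# -F alone is a substring search: all three lines match.
substring_count=$(printf '%s\n' "$data" | grep -cF 'Btr_0449')

# -w requires non-word characters (or line edges) on both sides of the match.
whole_word=$(printf '%s\n' "$data" | grep -Fw 'Btr_0449')

# A trailing space excludes Btr_0449a but still matches xBtr_0449.
trailing_space_count=$(printf '%s\n' "$data" | grep -cF 'Btr_0449 ')

# Anchoring at line start plus a POSIX whitespace class (no GNU-only \s).
anchored=$(printf '%s\n' "$data" | grep '^Btr_0449[[:space:]]')

echo "$substring_count"        # 3
echo "$whole_word"             # Btr_0449 447
echo "$trailing_space_count"   # 2
echo "$anchored"               # Btr_0449 447
```

Keep in mind that -w only counts letters, digits and the underscore as word characters, so a name such as Btr_0449-x would still match.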

Related

Regex for line containing one or more spaces or dashes

I have a .txt file with city names, one per line. Some names consist of several words separated by one or more spaces, or of words joined with '-'. I need a bash command that prints those lines. Currently I'm using cat piped into grep, but I can't get both spaces and dashes into one search, and I had problems checking for multiple spaces.
print lines with dash:
cat file.txt | grep ".*-.*"
print lines with spaces:
cat file.txt | grep ".*\s.*"
But when I try:
cat file.txt | grep ".*\s+.*"
I get nothing.
Thanks for help
Something like this should work:
grep -E -- ' |-' file.txt
Explanation:
-E: to interpret patterns as extended regular expressions
--: to signify the end of command options
' |-': the line contains either a space or a dash (the dash needs no backslash)
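A small check of this approach on made-up city names (the list is an assumption for illustration, not the asker's file). The alternation is written with a plain dash, since a dash needs no escaping in an extended regex:

```shell
# Hypothetical city list: one plain name, one with a space,
# one with dashes, one with runs of several spaces.
cities=$(printf 'Paris\nNew York\nStratford-upon-Avon\nRio  de  Janeiro')

# Print every line that contains a space or a dash.
matches=$(printf '%s\n' "$cities" | grep -E -- ' |-')

echo "$matches"
```

Only Paris is filtered out; the other three lines are printed unchanged.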
This does not directly address your question, but is too much to put in a comment.
You don't need the .* in your patterns. .* at the beginning or end of a pattern is useless, because it means "0 or more of any character" and so will always match.
These commands are all equivalent (the -- keeps grep from parsing a pattern that starts with - as an option):
cat file.txt | grep ".*-.*"
cat file.txt | grep -- "-.*"
cat file.txt | grep -- "-"
Plus you don't need to cat and pipe:
grep -- "-" file.txt
When a grep pattern matches, the default action is to print the whole line, so the .* in all your patterns is redundant and can be deleted. Also, you don't have to use cat file |, since you can give grep the file directly after the pattern, i.e. grep 'pattern' file.txt.
Here are some more details:
grep ".*-.*" = grep -- "-" - returns any lines containing a - char (-- signals the end of options; the next argument is the pattern)
grep ".*\s.*" = grep "\s" - matches and returns lines containing a whitespace char (GNU grep only)
grep ".*\s+.*" = grep "\s+" - returns lines containing a whitespace char followed by a literal + char (since you are using a POSIX BRE regex here, the unescaped + matches a literal plus symbol).
You want
grep "[[:space:]-]" file.txt
Demo:
#!/bin/bash
s='abc - def
ghi
jkl mno'
grep '[[:space:]-]' <<< "$s"
Output:
abc - def
jkl mno
The [[:space:]-] POSIX BRE and ERE (enabled with -E option) compliant pattern matches either any whitespace (with the [:space:] POSIX character class) or a hyphen.
Note that [\s-] won't work since \s inside a bracket expression is not treated as a regex escape sequence but as a mere \ or s.
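The difference between the basic and extended syntax for + can be checked directly. A minimal sketch on made-up city names, using [[:space:]] rather than \s so it does not depend on GNU grep:

```shell
# Hypothetical input: "New  York" contains two consecutive spaces.
cities=$(printf 'Paris\nNew  York\nSaint-Denis')

# BRE: "+" is an ordinary character, so this looks for whitespace
# followed by a literal plus sign and finds nothing.
# (grep -c prints 0 but exits non-zero, hence the "|| true".)
bre_count=$(printf '%s\n' "$cities" | grep -c '[[:space:]]+' || true)

# ERE (-E): "+" means "one or more", so runs of whitespace match.
ere_match=$(printf '%s\n' "$cities" | grep -E '[[:space:]]+')

# Bracket expression for "whitespace or dash"; the dash goes last
# so that it is taken literally.
either=$(printf '%s\n' "$cities" | grep '[[:space:]-]')

echo "$bre_count"   # 0
echo "$ere_match"   # New  York
echo "$either"      # New  York, then Saint-Denis
```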

How to make "grep" output complete word that includes the match?

I would like grep to print out all complete words that include the match.
Google did not help me. Here is what I tried:
cat file.txt
21676 Mm.24685 NM_009346 ENSMUSG00000055320
20349 Mm.134093 NM_011348 ENSMUSG00000063531
12456 Mm.134000 NM_011228 GM415666
grep -o "ENSMUS" file.txt
ENSMUS
ENSMUS
Desired output:
ENSMUSG00000055320
ENSMUSG00000063531
Thanks for your help!
You may use:
grep -wo "ENSMUS[^[:blank:]]*" file.txt
ENSMUSG00000055320
ENSMUSG00000063531
Here [^[:blank:]]* will match 0 or more characters that are not blanks (spaces or tabs). -w will ensure full word matches.
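The grep pattern above only extends the match to the right. If the search string may also sit in the middle of a word, one alternative (awk swapped in for grep here, not part of the answer above) is to print every whitespace-separated field that contains it. A sketch using the lines from the question plus one made-up line with a prefixed ID:

```shell
# Lines from the question, plus a hypothetical fourth line where
# the search string is not at the start of the word.
data=$(printf '%s\n' \
  '21676 Mm.24685 NM_009346 ENSMUSG00000055320' \
  '20349 Mm.134093 NM_011348 ENSMUSG00000063531' \
  '12456 Mm.134000 NM_011228 GM415666' \
  '99999 Mm.1 NM_1 preENSMUSG00000000001')

# index() does a literal substring test on each field.
words=$(printf '%s\n' "$data" |
  awk '{ for (i = 1; i <= NF; i++) if (index($i, "ENSMUS")) print $i }')

echo "$words"
```

This prints the two IDs from the question followed by preENSMUSG00000000001.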
To extract ENSEMBL mouse accession numbers without the version number:
grep -Po 'ENSMUS\w+' in_file
With the version number:
grep -Po 'ENSMUS\S+' in_file
Here,
\w+ : 1 or more word characters ([A-Za-z0-9_]).
\S+ : 1 or more non-whitespace characters (you can also be more restrictive and use [\w.]+, which is 1 or more word character or literal dot).
Here, GNU grep uses the following options:
-P : Use Perl regexes.
-o : Print only the matched parts (each match on its own output line), not the entire lines.
SEE ALSO:
grep manual
perlre - Perl regular expressions
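Where grep was built without -P support, the same two extractions can be approximated with -E and POSIX character classes. A sketch, assuming a hypothetical input line that carries a version suffix (.5) after the accession:

```shell
# Hypothetical line; the ".5" version suffix is made up for illustration.
line='21676 Mm.24685 NM_009346 ENSMUSG00000055320.5'

# Counterpart of \w+ : letters, digits and underscore only.
no_version=$(printf '%s\n' "$line" | grep -oE 'ENSMUS[[:alnum:]_]+')

# Counterpart of \S+ : everything up to the next whitespace.
with_version=$(printf '%s\n' "$line" | grep -oE 'ENSMUS[^[:space:]]+')

echo "$no_version"     # ENSMUSG00000055320
echo "$with_version"   # ENSMUSG00000055320.5
```

The -o option is not part of POSIX, but both GNU and BSD grep provide it.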

Grep -w is ignoring hyphen[-]

I have text file sample.txt like following
ID=Sam-S-PA.path1;Name=Sam-S-PA 23 Hz42
ID=GlcAT-S-PA.path1;Name=GlcAT-S-PA 45 iu7s
ID=TfIIA-S-PA.path1;Name=TfIIA-S-PA 76 5ghz
ID=S-PA.path1;Name=S-PA 69 ivcs
ID=TyrRS-PA.path1;Name=TyrRS-PA 51 Pqas
ID=HisRS-PA.path1;Name=HisRS-PA 32 Majs
I would like to extract the row containing only S-PA using grep. I tried the following command:
grep -w "S-PA" sample.txt
But its output included all the entries, which I don't want. I want the following output:
ID=S-PA.path1;Name=S-PA 69 ivcs
Kindly guide me. Thanks in advance.
Using negative look-ahead and look-behind.
$ grep -P '(?<![\w-])S-PA(?![\w-])' sample.txt
ID=S-PA.path1;Name=S-PA 69 ivcs
Effectively you include - into the "word" for word boundary considerations.
(?<![\w-]) ensures that S-PA is not preceded with a word character or -.
Similarly (?![\w-]) ensures the same for the following characters.
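The same idea can be written without -P by spelling out the boundaries in an extended regex: S-PA must be preceded by the start of the line or a character that is neither a word character nor a dash, and followed by such a character or the end of the line. A sketch on three of the lines from the question:

```shell
# Three of the sample lines from the question.
data=$(printf '%s\n' \
  'ID=Sam-S-PA.path1;Name=Sam-S-PA 23 Hz42' \
  'ID=S-PA.path1;Name=S-PA 69 ivcs' \
  'ID=TyrRS-PA.path1;Name=TyrRS-PA 51 Pqas')

# -w sees the dash in "Sam-S-PA" as a word boundary, so that line matches too.
w_count=$(printf '%s\n' "$data" | grep -cw 'S-PA')

# Treat the dash as part of the "word" on both sides of S-PA.
strict=$(printf '%s\n' "$data" |
  grep -E '(^|[^[:alnum:]_-])S-PA([^[:alnum:]_-]|$)')

echo "$w_count"   # 2
echo "$strict"    # ID=S-PA.path1;Name=S-PA 69 ivcs
```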
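Using an extended regex. Note that this prints the text from S-PA onward and also matches inside longer names such as Sam-S-PA:
grep -oE "S-PA (.+)" sample.txt
or, with the older egrep spelling of grep -E:
egrep -o "S-PA (.+)" sample.txt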
It seems you want to match =S-PA followed by a space. Use
grep '=S-PA ' sample.txt
or
grep '=S-PA[[:blank:]]' sample.txt
where [[:blank:]] matches either a regular space or a tab char.

How to grep in one line starting from particular string to end with particular string

I want to grep "[calluid]=aab01b055-89e3-49f3-839e-507bb128d07e&smscresponse"
in the file below:
2014-10-15 18:38:32,831 plivo-rest[2781]: INFO: Fetching GET http://*******/outbound_callback.aspx with smscresponse[to]=8912722fsf9&smscresponse[ALegUUID]=5bb516fsd64-546c-11e4-879f-551816a551303677&smscresponse[calluid]=aab01b055-89e3-49f3-839e-507bb128d07e&smscresponse[direction]=outbosund&smscresfdsponse[endreason]=UNALLOCATED_NUMBER&smscresponse[from]=83339995896999&smscresponse[starttime]=0&smscresponse[ALegRequestUUID]=5bb4bafc-546c-11e4-891d-000c29ec6e41&smscresponse[RequestUUID]=5bb4bafc-546c-11e4-891d-000c29ec6e41&smscresponse[callstatus]=completed&smscresponse[endtime]=1413378509&smscresponse[ScheduledHangupId]=5bb4c15a-546c-11e4-891d-000c29ec6e41&smscresponse[event]=missed_call_hangup
I used this command
$ grep -oP '(calluid).*$'
This matches up to the end of the line.
I also tried this command
$ grep -oP '(calluid).{40}'
It fetches 40 characters, but I have thousands of calluid values, each with a different number of characters.
So please guide me to grep the exact calluid data.
Use a lookahead to force the regex engine to match up to a specific character or a boundary.
$ grep -oP '\[calluid\][^\]\[]*(?=\[|$)' file
[calluid]=aab01b055-89e3-49f3-839e-507bb128d07e&smscresponse
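If only the value itself is wanted, without the [calluid]= prefix and the trailing &smscresponse, a sed substitution is one option that does not need -P (sed is swapped in for grep here). A sketch on a shortened copy of the log line, assuming the value never contains an ampersand:

```shell
# Shortened version of the log line from the question (assumption:
# the calluid value ends at the next "&").
line='smscresponse[to]=8912722fsf9&smscresponse[calluid]=aab01b055-89e3-49f3-839e-507bb128d07e&smscresponse[direction]=outbosund'

# Capture everything between "[calluid]=" and the next "&".
calluid=$(printf '%s\n' "$line" |
  sed -n 's/.*\[calluid]=\([^&]*\).*/\1/p')

echo "$calluid"   # aab01b055-89e3-49f3-839e-507bb128d07e
```

With -n and the p flag, lines that contain no calluid field print nothing.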
Here is a GNU awk version (GNU awk is needed because RS contains multiple characters):
awk -v RS="[[]calluid[]]=" -F[ 'NR==2 {print $1}' file
aab01b055-89e3-49f3-839e-507bb128d07e&smscresponse
You can also set RS like this: RS="\\\[calluid]="

Use grep to report back only line numbers

I have a file that possibly contains bad formatting (in this case, the occurrence of the pattern \\backslash). I would like to use grep to return only the line numbers where this occurs (as in, the match was here, go to line # x and fix it).
However, there doesn't seem to be a way to print the line number (grep -n) and not the match or line itself.
I can use another regex to extract the line numbers, but I want to make sure grep cannot do it by itself. grep -no comes closest, I think, but still displays the match.
Try:
grep -n "text to find" file.ext | cut -f1 -d:
If you're open to using AWK:
awk '/textstring/ {print FNR}' textfile
In this case, FNR is the line number. AWK is a great tool when you're looking at grep|cut, or any time you're looking to take grep output and manipulate it.
All of these answers require grep to generate the entire matching lines, then pipe it to another program. If your lines are very long, it might be more efficient to use just sed to output the line numbers:
sed -n '/pattern/=' filename
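The three approaches so far give the same result. A quick sketch on made-up input read from standard input (the sample lines and the pattern bad are placeholders):

```shell
# Hypothetical four-line input; lines 2 and 4 contain the pattern.
data=$(printf 'good line\nbad line\ngood line\nbad again')

# grep prints "N:line"; cut keeps only the number before the first colon.
via_cut=$(printf '%s\n' "$data" | grep -n 'bad' | cut -d: -f1)

# awk prints the record number of each matching line.
via_awk=$(printf '%s\n' "$data" | awk '/bad/ {print FNR}')

# sed's "=" command prints the current line number.
via_sed=$(printf '%s\n' "$data" | sed -n '/bad/=')

echo "$via_cut"   # 2 and 4, one per line
```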
Bash version (this keeps only the first matching line number):
lineno=$(grep -n "pattern" filename)
lineno=${lineno%%:*}
I recommend the answers with sed and awk for just getting the line number, rather than using grep to get the entire matching line and then removing that from the output with cut or another tool. For completeness, you can also use Perl:
perl -nE 'say $. if /pattern/' filename
or Ruby:
ruby -ne 'puts $. if /pattern/' filename
using only grep:
grep -n "text to find" file.ext | grep -Po '^[^:]+'
When grep searches several files, or is run with -H, each output line is prefixed with the file name, so the line number is the second field, not the first:
grep -Hn "text to find" file.txt | cut -f2 -d:
To count the number of lines matching the pattern:
grep -n "Pattern" in_file.ext | wc -l
To print the lines matching the pattern:
sed -n '/pattern/p' file.ext
To display the line numbers on which the pattern matched:
grep -n "pattern" file.ext | cut -f1 -d:
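For counting, grep can also report the number of matching lines itself with -c, which avoids the pipe to wc. A minimal sketch on made-up input:

```shell
# Hypothetical input with two matching lines.
data=$(printf 'good line\nbad line\ngood line\nbad again')

# Count via wc (tr strips the padding some wc implementations add).
count_wc=$(printf '%s\n' "$data" | grep -n 'bad' | wc -l | tr -d ' ')

# Count directly with grep -c.
count_c=$(printf '%s\n' "$data" | grep -c 'bad')

echo "$count_wc"   # 2
echo "$count_c"    # 2
```

Note that -c counts matching lines, not individual matches, so a line with two occurrences is counted once.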
