I would like to search for a certain pattern (say Bar line) but also print lines above and below (i.e 1 line) the pattern or 2 lines above and below the pattern.
Foo line
Bar line
Baz line
....
Foo1 line
Bar line
Baz1 line
....
Use grep with the parameters -A and -B to indicate the number a of lines After and Before you want to print around your pattern:
grep -A1 -B1 yourpattern file
An stands for n lines "after" the match.
Bm stands for m lines "before" the match.
If both numbers are the same, just use -C:
grep -C1 yourpattern file
Test
$ cat file
Foo line
Bar line
Baz line
hello
bye
hello
Foo1 line
Bar line
Baz1 line
Let's grep:
$ grep -A1 -B1 Bar file
Foo line
Bar line
Baz line
--
Foo1 line
Bar line
Baz1 line
To get rid of the group separator, you can use --no-group-separator:
$ grep --no-group-separator -A1 -B1 Bar file
Foo line
Bar line
Baz line
Foo1 line
Bar line
Baz1 line
From man grep:
-A NUM, --after-context=NUM
Print NUM lines of trailing context after matching lines.
Places a line containing a group separator (--) between
contiguous groups of matches. With the -o or --only-matching
option, this has no effect and a warning is given.
-B NUM, --before-context=NUM
Print NUM lines of leading context before matching lines.
Places a line containing a group separator (--) between
contiguous groups of matches. With the -o or --only-matching
option, this has no effect and a warning is given.
-C NUM, -NUM, --context=NUM
Print NUM lines of output context. Places a line containing a
group separator (--) between contiguous groups of matches. With
the -o or --only-matching option, this has no effect and a
warning is given.
grepis the tool for you, but it can be done with awk
awk '{a[NR]=$0} $0~s {f=NR} END {for (i=f-B;i<=f+A;i++) print a[i]}' B=1 A=2 s="Bar" file
NB this will also find one hit.
or with grep
grep -A2 -B1 "Bar" file
Related
I have a corpus file and the rules file. I am trying to find matching words where the word from rule appear in corpus.
# cat corpus.txt
this is a paragraph number one
second line
third line
# cat rule.txt
a
b
c
This returns 2 lines
# grep -F0 -f rule.txt corpus.txt
this is a paragraph number one
second line
But I am expecting 4 words like this...
a
paragraph
number
second
Trying to achive these results using grep or awk.
Assuming words are seperated by white spaces
awk '{print "\\S*" $1 "\\S*"}' rule.txt | grep -m 4 -o -f - corpus.txt
I have a fasta file like the test one here:
>HWI-D00196:168:C66U5ANXX:3:1106:16404:19663 1:N:0:GCCAAT
CCTAGCACCATGATTTAATGTTTCTTTTGTACGTTCTTTCTTTGGAAACTGCACTTGTTGCAACCTTGCAAGCCATATAAACACATTTCAGATATAAGGCT
>HWI-D00196:168:C66U5ANXX:3:1106:16404:19663 2:N:0:GCCAAT
AAAACATAAATTTGAGCTTGACAAAAATTAAAAATGAGCCCAGCCTTATATCTGAAATGTGTTTATATGGCTTGCAAGGTTGCAACAAGTGCAGTTTCCAA
>HWI-D00196:168:C66U5ANXX:4:1304:10466:100132 1:N:0:GCCAAT
ATATTTGAATTATCAGAAATAAACACAAAGAAAACCTAGAACAGATAATTTCTTCCACATTATTGATCAGATACAGATTTCAAGGGTACCGTTGTGAATTG
>HWI-D00196:168:C66U5ANXX:4:1304:10466:100132 2:N:0:GCCAAT
AAACGATTGATAGATCTATTTGCATTATAAAAACATTAAAAAAACAAAATACTGATTAAATGTCGTCTTTCTATTCCACAATTTTATAGATCTCACTGTAT
>HWI-D00196:168:C66U5ANXX:4:1307:12056:64030 1:N:0:GCCAAT
CTTACTTTGCCTCTCTCAGCCAATGTCTCCTGAGTCTAATTTTTTGGAGGCTAAGCTATGAGCTAATGATGGGTTCCATTTGGGGCCAATGCTTCAGCCTG
>HWI-D00196:168:C66U5ANXX:4:1307:12056:64030 2:N:0:GCCAAT
CTATTAGTTCTTATCTTTGCCTGCAAATATAAGACTAGCGCTTGAGTAGCTGACAGAGACAAAGTAAGCTGGAGTGTTTATCACCTGGTCACTCCAATTGT
When i type in a simple grep command like:
grep -B1 "CTT" test.fasta
I get a really strange output in which "--" is sometimes placed on a newline above the grep hit like so:
>HWI-D00196:168:C66U5ANXX:4:1304:10466:100132 2:N:0:GCCAAT
AAACGATTGATAGATCTATTTGCATTATAAAAACATTAAAAAAACAAAATACTGATTAAATGTCGTCTTTCTATTCCACAATTTTATAGATCTCACTGTAT
--
>HWI-D00196:168:C66U5ANXX:4:1307:12056:64030 2:N:0:GCCAAT
CTATTAGTTCTTATCTTTGCCTGCAAATATAAGACTAGCGCTTGAGTAGCTGACAGAGACAAAGTAAGCTGGAGTGTTTATCACCTGGTCACTCCAATTGT
I can't figure out why some fasta entries have this and others don't. I don't get this problem when i remove the -B1. I can remove those lines from my file with a grep -v "--" statement, but I'd really like to understand what's going on here.
You are asking for one line of leading context by using the -B1 option. This means grep will display both the line which matched and the line directly before it. Each match will be separated by -- on a line by itself as shown below:
$ man grep | grep -B1 context
-A num, --after-context=num
Print num lines of trailing context after each match. See also
--
-B num, --before-context=num
Print num lines of leading context before each match. See also
--
-C[num, --context=num]
Print num lines of leading and trailing context surrounding each
--
--context[=num]
Print num lines of leading and trailing context. The default is
The reason you aren't seeing -- between every match is that the context is only displayed above a sequence of consecutive matches. So see the following example:
seq 13 | grep -B1 1
1
--
9
10
11
12
13
The seq command produces all the numbers between 1 and 13. Only the first line and the lines from 10 on contain a 1, so you see the 1 in its own group, then --, then the one line context, then the group of consecutive matching lines.
GREP_COLORS section of the grep manpage says :
Specifies the colors and other attributes used to highlight various > parts of the output. Its value is a colon-separated list
of capabilities that defaults to
ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36 with the rv and
ne boolean capabilities omitted (i.e., false).
and
se=36 SGR substring for separators that are inserted between
selected line fields (:), between context line fields, (-), and
between groups of adjacent lines when nonzero context is
specified (--). The default is a cyan text foreground over the
terminal's default background.
Consider file sample.txt :
$cat sample.txt
ABBB
AAB
AAB
S
S
S
AABB
ABAA
BAA
CCC
$grep -B2 'AAB' sample.txt
ABBB
AAB
AAB
--
S
S
AABB
Here -- is the way of grep to tell you that AAB before -- and S after -- are not adjacent lines in the actual file.
I have several files that goes like that:
abcd
several lines
abcd
several lines
abcd
several lines
.
.
.
what I want to do (preferably using grep) is to get the 20 lines immediately following the LAST abcd line.
Any help is appreciated.
Thanks
Use -A option:
-A NUM, --after-context=NUM
Print NUM lines of trailing context after matching lines. Places a line
containing a group separator (--) between contiguous groups of matches.
With the -o or --only-matching option, this has no effect and a warning
is given.
So:
$ grep -A 20 abcd file.txt
will give you abcd lines + 20 lines after each. To get that last 21 lines, use tail:
$ grep -A 20 abcd file.txt | tail -21
You can do this:
awk '/abcd/ {n=NR} {a[NR]=$0} END {for (i=n;i<=n+20;i++) print a[i]}' file
It will search for pattern abcd and update n so only last will be stored.
It also store all line in array a
Then it print 20 lines form last pattern found in the END section.
I have a big txt file and I am looking for seq id that starts with species name "ABS". When I do grep "ABS", I only get the list of ABS but not seq id followed by that word. For example list what I am looking for is like this:
ABS|contig05671,
ABS|contig04453,
ABS|CL5170Contig1,
ABS|contig02526,
But, when I do, grep "ABS" filename.txt, I get the result like this:
ABS,
ABS,
ABS,
ABS,
Any help is greatly appreciated. Thanks in advance.
From man grep:
Context Line Control
-A NUM, --after-context=NUM
Print NUM lines of trailing context after matching lines.
Places a line containing a group separator (--) between
contiguous groups of matches. With the -o or --only-matching
option, this has no effect and a warning is given.
-B NUM, --before-context=NUM
Print NUM lines of leading context before matching lines.
Places a line containing a group separator (--) between
contiguous groups of matches. With the -o or --only-matching
option, this has no effect and a warning is given.
-C NUM, -NUM, --context=NUM
Print NUM lines of output context. Places a line containing a
group separator (--) between contiguous groups of matches. With
the -o or --only-matching option, this has no effect and a
warning is given.
So if you need the matching line and the following one, you do grep -A1 ABS file.txt, and similarly for the preceding line with -B1.
However, if you want to format the results in another way (e.g. put the two lines on one and separate by the pipe character) you need a different tool than grep. grep does searching, whereas you also want editing.
I've used a grep command with sed and cut filters that basically turns my output to something similar to this
this line 1
this line 2
another line 3
another line 4
I'm trying to get an output without the spaces in between the lines and in front of the lines so it'd look like
this line 1
this line 2
another line 3
another line 4
I'd like to add another | filter
Add this filter to remove whitespace from the beginning of the line and remove blank lines, notice that it uses two sed commands, one to remove leading whitespace and another to delete lines with no content
| sed -e 's/^\s*//' -e '/^$/d'
There is an example in the Wikipedia article for sed which uses the d command to delete lines that are either blank or only contain spaces, my solution uses the escape sequence \s to match any whitespace character (space, tab, and so on), here is the Wikipedia example:
sed -e '/^ *$/d' inputFileName
The caret (^) matches the beginning of the line.
The dollar sign ($) matches the end of the line.
The asterisk (*) matches zero or more occurrences of the previous character.
This can be done with the tr command as well. Like so
| tr -s [:space:]
or alternatively
| tr -s \\n
if you want to remove the line breaks only, without the space chars in the beginning of each line.
I would do this, short and simple:
sed 's: ::g'
Add this at the end of your command, and all whitespace will go poof. For example try this command:
cat /proc/meminfo | sed 's: ::g'
You can also use grep:
... | grep -o '[^$(printf '\t') ].*'
Here we print lines that have at least one character that isn't white space. By using the "-o" flag, we print only the match, and we force the match to start on a non white space character.
EDIT: Changed command so it can remove the leading white space characters.
Hope this helps =)
Use grep "^." filename to remove blank lines while printing.Here,the lines starting with any character is matched so that the blank lines are left out.
^ indicates start of the line.
. checks for any character.
(whateverproducesthisoutput)|sed -E 's/^[[:space:]]+//'|grep -v '^$'
(depending on your sed, you can replace [[:space:]] with \s).