I have several files that look like this:
abcd
several lines
abcd
several lines
abcd
several lines
.
.
.
What I want to do (preferably using grep) is get the 20 lines immediately following the LAST abcd line.
Any help is appreciated.
Thanks
Use the -A option:
-A NUM, --after-context=NUM
Print NUM lines of trailing context after matching lines. Places a line
containing a group separator (--) between contiguous groups of matches.
With the -o or --only-matching option, this has no effect and a warning
is given.
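So:
$ grep -A 20 abcd file.txt
will give you every abcd line plus the 20 lines after each. To keep only the last group (the final abcd line and the 20 lines after it, 21 lines in total), pipe the output through tail:
$ grep -A 20 abcd file.txt | tail -n 21
Use tail -n 20 instead if you do not want the abcd line itself. This assumes at least 20 lines follow the last abcd.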
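If fewer than 20 lines follow the last abcd, counting back from the end of grep's output also pulls in lines that belong to an earlier match. A minimal sketch that avoids this by looking up the line number of the last match first. The helper name lines_after_last and the demo file contents are made up for illustration:

```shell
# Print up to COUNT lines that follow the last line matching PATTERN.
# lines_after_last is a hypothetical helper name, not a standard tool.
lines_after_last() {
  pattern=$1 count=$2 file=$3
  # grep -n prefixes each matching line with "LINENO:"; keep the last one.
  last=$(grep -n -e "$pattern" "$file" | tail -n 1 | cut -d: -f1)
  [ -n "$last" ] || return 1          # no match at all
  # tail -n +K starts output at line K; head caps it at COUNT lines.
  tail -n "+$((last + 1))" "$file" | head -n "$count"
}

# Demo on a throwaway file (contents are invented for this example).
demo=$(mktemp)
printf '%s\n' abcd one abcd two three abcd four five six > "$demo"
result=$(lines_after_last abcd 2 "$demo")
printf '%s\n' "$result"
rm -f "$demo"
```

For the original question the call would be lines_after_last abcd 20 file.txt.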
You can do this:
awk '/abcd/ {n=NR} {a[NR]=$0} END {for (i=n;i<=n+20;i++) print a[i]}' file
It searches for the pattern abcd and updates n on every match, so only the line number of the last match is kept.
It also stores every line in the array a.
In the END section it then prints the last matching line and the 20 lines after it (start the loop at i=n+1 to skip the abcd line itself).
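The command above keeps the whole file in memory. A sketch of a variant that only buffers the lines after the most recent match and throws the buffer away whenever a new match appears. The function name after_last_awk and the demo data are assumptions for illustration:

```shell
# Keep only the lines that follow the most recent match, at most MAX of them.
# after_last_awk is a hypothetical helper name.
after_last_awk() {
  awk -v pat="$1" -v max="$2" '
    $0 ~ pat { found = 1; n = 0; next }   # new match: restart the buffer
    found && n < max { buf[++n] = $0 }    # collect lines after the match
    END { for (i = 1; i <= n; i++) print buf[i] }
  ' "$3"
}

# Demo on a throwaway file (contents are invented for this example).
demo=$(mktemp)
printf '%s\n' abcd one abcd two three abcd four five six > "$demo"
result=$(after_last_awk abcd 2 "$demo")
printf '%s\n' "$result"
rm -f "$demo"
```

Unlike the original command, this prints only the following lines, not the abcd line, and prints nothing when the pattern never occurs.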
I have a corpus file and a rules file. I am trying to find the words in the corpus that contain any of the strings from the rules file.
# cat corpus.txt
this is a paragraph number one
second line
third line
# cat rule.txt
a
b
c
This returns 2 lines
# grep -F0 -f rule.txt corpus.txt
this is a paragraph number one
second line
But I am expecting 4 words like this...
a
paragraph
number
second
I am trying to achieve these results using grep or awk.
Assuming words are separated by whitespace (\S is a GNU grep extension):
awk '{print "\\S*" $1 "\\S*"}' rule.txt | grep -m 4 -o -f - corpus.txt
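Note that -m 4 limits the number of matching lines, not the number of words printed. A sketch of an alternative that avoids \S by first putting every word on its own line and then doing a plain fixed-string match. The temporary directory is only there to make the example self-contained:

```shell
# Recreate the two files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'this is a paragraph number one' 'second line' 'third line' > "$workdir/corpus.txt"
printf '%s\n' a b c > "$workdir/rule.txt"

# tr turns every run of whitespace into one newline (one word per line);
# grep -F -f then keeps the words containing any string from rule.txt.
words=$(tr -s '[:space:]' '\n' < "$workdir/corpus.txt" | grep -F -f "$workdir/rule.txt")
printf '%s\n' "$words"
rm -rf "$workdir"
```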
I'm trying to print lines containing a 2 or 3 digit number, along with the rest of the line. I came up with this code:
grep -P '[[:digit:]]{2,3}' address
But this also prints the line that has 4 digits. I don't know why this is happening.
Output:
This code doesn't work either:
grep -E '[0-9]{2,3}' address
Here is the file containing address text:
12 main st
123 main street
1234 main street
I have already limited the match to 2 or 3 digits with {2,3}, but the filter still doesn't work and the line with more than 3 digits is printed. Can anyone assist me with this? Thank you so much.
You can use inverted grep (-v) to filter out all lines with 4 digits (or more):
grep -vE '[0-9]{4}' address
EDIT:
I noticed you only want lines with 2 or 3 digits, and the first command also keeps lines with a single digit (or with none at all).
Here's the fix, again using the same method:
grep -E '[0-9]{2,3}' address | grep -vE '[0-9]{4}'
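The original pattern matches the 4-digit line because the regex is not anchored: [0-9]{2,3} is satisfied by the first two or three digits of 1234. A sketch of a single pattern that requires the run of digits to be bounded by a non-digit (or the line edge) on both sides. The temporary file just reproduces the address file from the question:

```shell
demo=$(mktemp)
printf '%s\n' '12 main st' '123 main street' '1234 main street' > "$demo"

# A run of 2-3 digits with no further digit directly before or after it.
matches=$(grep -E '(^|[^0-9])[0-9]{2,3}([^0-9]|$)' "$demo")
printf '%s\n' "$matches"
rm -f "$demo"
```

This is not identical to the two-step pipeline: a line holding both a 2-digit and a separate 4-digit number is kept here but dropped by the grep -v step.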
I have a fasta file like the test one here:
>HWI-D00196:168:C66U5ANXX:3:1106:16404:19663 1:N:0:GCCAAT
CCTAGCACCATGATTTAATGTTTCTTTTGTACGTTCTTTCTTTGGAAACTGCACTTGTTGCAACCTTGCAAGCCATATAAACACATTTCAGATATAAGGCT
>HWI-D00196:168:C66U5ANXX:3:1106:16404:19663 2:N:0:GCCAAT
AAAACATAAATTTGAGCTTGACAAAAATTAAAAATGAGCCCAGCCTTATATCTGAAATGTGTTTATATGGCTTGCAAGGTTGCAACAAGTGCAGTTTCCAA
>HWI-D00196:168:C66U5ANXX:4:1304:10466:100132 1:N:0:GCCAAT
ATATTTGAATTATCAGAAATAAACACAAAGAAAACCTAGAACAGATAATTTCTTCCACATTATTGATCAGATACAGATTTCAAGGGTACCGTTGTGAATTG
>HWI-D00196:168:C66U5ANXX:4:1304:10466:100132 2:N:0:GCCAAT
AAACGATTGATAGATCTATTTGCATTATAAAAACATTAAAAAAACAAAATACTGATTAAATGTCGTCTTTCTATTCCACAATTTTATAGATCTCACTGTAT
>HWI-D00196:168:C66U5ANXX:4:1307:12056:64030 1:N:0:GCCAAT
CTTACTTTGCCTCTCTCAGCCAATGTCTCCTGAGTCTAATTTTTTGGAGGCTAAGCTATGAGCTAATGATGGGTTCCATTTGGGGCCAATGCTTCAGCCTG
>HWI-D00196:168:C66U5ANXX:4:1307:12056:64030 2:N:0:GCCAAT
CTATTAGTTCTTATCTTTGCCTGCAAATATAAGACTAGCGCTTGAGTAGCTGACAGAGACAAAGTAAGCTGGAGTGTTTATCACCTGGTCACTCCAATTGT
When I type in a simple grep command like:
grep -B1 "CTT" test.fasta
I get a really strange output in which "--" is sometimes placed on a newline above the grep hit like so:
>HWI-D00196:168:C66U5ANXX:4:1304:10466:100132 2:N:0:GCCAAT
AAACGATTGATAGATCTATTTGCATTATAAAAACATTAAAAAAACAAAATACTGATTAAATGTCGTCTTTCTATTCCACAATTTTATAGATCTCACTGTAT
--
>HWI-D00196:168:C66U5ANXX:4:1307:12056:64030 2:N:0:GCCAAT
CTATTAGTTCTTATCTTTGCCTGCAAATATAAGACTAGCGCTTGAGTAGCTGACAGAGACAAAGTAAGCTGGAGTGTTTATCACCTGGTCACTCCAATTGT
I can't figure out why some fasta entries have this and others don't. I don't get this problem when I remove the -B1. I can remove those lines from my file with a grep -v "--" statement, but I'd really like to understand what's going on here.
You are asking for one line of leading context by using the -B1 option. This means grep will display both the line which matched and the line directly before it. Each match will be separated by -- on a line by itself as shown below:
$ man grep | grep -B1 context
-A num, --after-context=num
Print num lines of trailing context after each match. See also
--
-B num, --before-context=num
Print num lines of leading context before each match. See also
--
-C[num, --context=num]
Print num lines of leading and trailing context surrounding each
--
--context[=num]
Print num lines of leading and trailing context. The default is
The reason you aren't seeing -- between every match is that the separator is only printed between groups of output lines that are not adjacent in the input. Consecutive matches, together with their context, form a single group. See the following example:
seq 13 | grep -B1 1
1
--
9
10
11
12
13
The seq command produces all the numbers between 1 and 13. Only the first line and the lines from 10 on contain a 1, so you see the 1 in its own group, then --, then the one line context, then the group of consecutive matching lines.
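If the separator lines get in the way, GNU grep can suppress or change them. A short sketch, assuming GNU grep (the --no-group-separator and --group-separator options are not part of POSIX grep):

```shell
# Default behaviour: "--" between non-adjacent groups.
with_sep=$(seq 13 | grep -B1 1)

# GNU grep only: drop the separator line entirely.
without_sep=$(seq 13 | grep -B1 --no-group-separator 1)

# GNU grep only: use a custom separator string instead of "--".
custom_sep=$(seq 13 | grep -B1 --group-separator='=====' 1)

printf '%s\n' "$without_sep"
```

This is safer than filtering the output afterwards, since a later filter cannot tell a separator from a real input line that happens to consist of two dashes.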
The GREP_COLORS section of the grep man page says:
Specifies the colors and other attributes used to highlight various
parts of the output. Its value is a colon-separated list
of capabilities that defaults to
ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36 with the rv and
ne boolean capabilities omitted (i.e., false).
and
se=36 SGR substring for separators that are inserted between
selected line fields (:), between context line fields, (-), and
between groups of adjacent lines when nonzero context is
specified (--). The default is a cyan text foreground over the
terminal's default background.
Consider the file sample.txt:
$ cat sample.txt
ABBB
AAB
AAB
S
S
S
AABB
ABAA
BAA
CCC
$ grep -B2 'AAB' sample.txt
ABBB
AAB
AAB
--
S
S
AABB
Here -- is grep's way of telling you that the AAB before the -- and the S after it are not adjacent lines in the actual file.
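Adding -n makes the gap visible, and it also shows the other two separators from the man page excerpt: matching lines get a colon after the line number, context lines get a dash. A small sketch that recreates sample.txt in a temporary file:

```shell
demo=$(mktemp)
printf '%s\n' ABBB AAB AAB S S S AABB ABAA BAA CCC > "$demo"

# -n prints line numbers: "N:" marks a matching line, "N-" a context line.
numbered=$(grep -n -B2 'AAB' "$demo")
printf '%s\n' "$numbered"
rm -f "$demo"
```

The output jumps from line 3 to line 5, which is exactly where the -- appears.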
I want to find out whether a text contains similar lines that directly follow each other. For example, I want to find any lines containing "cccc" that come one right after the other.
aaaaaaaa
bbbbaaaa
ccccxxxx
ddddaaaa
eeeeaaaa
ccccxxxx <---
ccccyyyy <---
ddddaaaa
eeeeaaaa
So I should print out only the double cccc**** lines.
I tried something like:
grep "cccc" -A1 file.txt
but got all "cccc*" lines.
Simple problem I know...
Another example:
Search for duplicates of "Finland":
Iceland
Germany
FinlandsIsNiceButNoMatch
France
FinlandWillMatchTHisTime <---
FinlandWillAlsoMatch <---
Hungary
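This will match a pair of consecutive lines when the second line begins with a three-letter sequence that also occurs in the first (-P enables Perl regular expressions, -z treats the whole file as one record so that \n can be matched, and -o prints only the matched part):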
grep -Pzo "([a-zA-Z]{3}).*\n\1.*" file.txt
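The grep command above finds any pair of lines sharing three letters, not specifically "cccc", and it depends on GNU grep's -P and -z. A sketch of an awk alternative that prints only runs of two or more consecutive lines matching a given pattern. The function name consecutive_matches and the temporary file are assumptions for illustration:

```shell
# Print lines matching PATTERN only when they occur in a run of 2 or more.
# consecutive_matches is a hypothetical helper name.
consecutive_matches() {
  awk -v pat="$1" '
    $0 ~ pat {
      if (run == 1) print prev   # second line of a run: emit the first one too
      if (run >= 1) print        # every later line of the run
      run++
      prev = $0
      next
    }
    { run = 0 }                  # a non-matching line ends the run
  ' "$2"
}

# Demo using the first example from the question.
demo=$(mktemp)
printf '%s\n' aaaaaaaa bbbbaaaa ccccxxxx ddddaaaa eeeeaaaa \
              ccccxxxx ccccyyyy ddddaaaa eeeeaaaa > "$demo"
pairs=$(consecutive_matches '^cccc' "$demo")
printf '%s\n' "$pairs"
rm -f "$demo"
```

For the second example, consecutive_matches '^Finland' file.txt would skip the isolated FinlandsIsNiceButNoMatch line and print the two adjacent Finland lines.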
I have a big txt file and I am looking for the seq ids that start with the species name "ABS". When I do grep "ABS", I only get a list of ABS without the seq id that follows that word. For example, the list I am looking for looks like this:
ABS|contig05671,
ABS|contig04453,
ABS|CL5170Contig1,
ABS|contig02526,
But when I do grep "ABS" filename.txt, I get a result like this:
ABS,
ABS,
ABS,
ABS,
Any help is greatly appreciated. Thanks in advance.
From man grep:
Context Line Control
-A NUM, --after-context=NUM
Print NUM lines of trailing context after matching lines.
Places a line containing a group separator (--) between
contiguous groups of matches. With the -o or --only-matching
option, this has no effect and a warning is given.
-B NUM, --before-context=NUM
Print NUM lines of leading context before matching lines.
Places a line containing a group separator (--) between
contiguous groups of matches. With the -o or --only-matching
option, this has no effect and a warning is given.
-C NUM, -NUM, --context=NUM
Print NUM lines of output context. Places a line containing a
group separator (--) between contiguous groups of matches. With
the -o or --only-matching option, this has no effect and a
warning is given.
So if you need the matching line and the following one, you do grep -A1 ABS file.txt, and similarly for the preceding line with -B1.
However, if you want to format the results in another way (e.g. put the two lines on one and separate by the pipe character) you need a different tool than grep. grep does searching, whereas you also want editing.
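As an illustration of the editing step, here is a minimal awk sketch that joins each matching line with the line after it, separated by a pipe character. It assumes the species name and the seq id sit on two consecutive lines, which is a guess about the file layout, and the demo data is invented:

```shell
# Hypothetical input layout: name on one line, seq id on the next.
demo=$(mktemp)
printf '%s\n' 'ABS' 'contig05671,' 'XYZ' 'contig00001,' 'ABS' 'contig04453,' > "$demo"

# For each line starting with ABS, read the following line into nxt
# and print both on one line, joined by "|".
joined=$(awk '/^ABS/ { if ((getline nxt) > 0) print $0 "|" nxt }' "$demo")
printf '%s\n' "$joined"
rm -f "$demo"
```

If the real file is laid out differently, the pattern and the join logic would need to change accordingly.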