I have a fasta file like the test one here:
>HWI-D00196:168:C66U5ANXX:3:1106:16404:19663 1:N:0:GCCAAT
CCTAGCACCATGATTTAATGTTTCTTTTGTACGTTCTTTCTTTGGAAACTGCACTTGTTGCAACCTTGCAAGCCATATAAACACATTTCAGATATAAGGCT
>HWI-D00196:168:C66U5ANXX:3:1106:16404:19663 2:N:0:GCCAAT
AAAACATAAATTTGAGCTTGACAAAAATTAAAAATGAGCCCAGCCTTATATCTGAAATGTGTTTATATGGCTTGCAAGGTTGCAACAAGTGCAGTTTCCAA
>HWI-D00196:168:C66U5ANXX:4:1304:10466:100132 1:N:0:GCCAAT
ATATTTGAATTATCAGAAATAAACACAAAGAAAACCTAGAACAGATAATTTCTTCCACATTATTGATCAGATACAGATTTCAAGGGTACCGTTGTGAATTG
>HWI-D00196:168:C66U5ANXX:4:1304:10466:100132 2:N:0:GCCAAT
AAACGATTGATAGATCTATTTGCATTATAAAAACATTAAAAAAACAAAATACTGATTAAATGTCGTCTTTCTATTCCACAATTTTATAGATCTCACTGTAT
>HWI-D00196:168:C66U5ANXX:4:1307:12056:64030 1:N:0:GCCAAT
CTTACTTTGCCTCTCTCAGCCAATGTCTCCTGAGTCTAATTTTTTGGAGGCTAAGCTATGAGCTAATGATGGGTTCCATTTGGGGCCAATGCTTCAGCCTG
>HWI-D00196:168:C66U5ANXX:4:1307:12056:64030 2:N:0:GCCAAT
CTATTAGTTCTTATCTTTGCCTGCAAATATAAGACTAGCGCTTGAGTAGCTGACAGAGACAAAGTAAGCTGGAGTGTTTATCACCTGGTCACTCCAATTGT
When I type a simple grep command like:
grep -B1 "CTT" test.fasta
I get really strange output in which "--" is sometimes placed on its own line above the grep hit, like so:
>HWI-D00196:168:C66U5ANXX:4:1304:10466:100132 2:N:0:GCCAAT
AAACGATTGATAGATCTATTTGCATTATAAAAACATTAAAAAAACAAAATACTGATTAAATGTCGTCTTTCTATTCCACAATTTTATAGATCTCACTGTAT
--
>HWI-D00196:168:C66U5ANXX:4:1307:12056:64030 2:N:0:GCCAAT
CTATTAGTTCTTATCTTTGCCTGCAAATATAAGACTAGCGCTTGAGTAGCTGACAGAGACAAAGTAAGCTGGAGTGTTTATCACCTGGTCACTCCAATTGT
I can't figure out why some FASTA entries have this and others don't. I don't get this problem when I remove the -B1. I can remove those lines from my output with a grep -v -- "--" statement, but I'd really like to understand what's going on here.
You are asking for one line of leading context by using the -B1 option. This means grep will display both the line that matched and the line directly before it. Groups of output lines are separated by -- on a line by itself, as shown below:
$ man grep | grep -B1 context
-A num, --after-context=num
Print num lines of trailing context after each match. See also
--
-B num, --before-context=num
Print num lines of leading context before each match. See also
--
-C[num, --context=num]
Print num lines of leading and trailing context surrounding each
--
--context[=num]
Print num lines of leading and trailing context. The default is
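The reason you aren't seeing -- between every match is that matches (together with their context lines) that are adjacent in the input are merged into one group, and -- is only printed between groups that are not adjacent. In your FASTA file, that means -- appears wherever one or more records whose sequence does not contain CTT were skipped. Consider the following example: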
seq 13 | grep -B1 1
1
--
9
10
11
12
13
The seq command produces all the numbers from 1 to 13. Only the first line and the lines from 10 on contain a 1, so you see the 1 in its own group, then --, then the one line of context (9), then the group of consecutive matching lines.
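To see the same effect on FASTA data, and two ways to avoid the -- lines, here is a small sketch. The three records are made up for illustration, and it assumes each sequence sits on a single line, as in test.fasta. --no-group-separator is a GNU grep option; the awk version should work with any awk.

```shell
# Hypothetical sample: the middle record (read2) has no CTT in its sequence.
fasta='>read1
AACTTGG
>read2
AAAAGGG
>read3
GGCTTAA'

# Plain -B1: read2 is skipped, so the two groups are not adjacent and grep
# prints "--" between them.
printf '%s\n' "$fasta" | grep -B1 'CTT'

# GNU grep: same selection, but without the "--" lines.
printf '%s\n' "$fasta" | grep -B1 --no-group-separator 'CTT'

# Portable alternative: remember the latest header and print it only when
# the sequence line after it contains CTT.
printf '%s\n' "$fasta" | awk '/^>/ { hdr = $0; next } /CTT/ { print hdr; print }'
```

The awk version also avoids a subtle issue with -B1: it never prints a header on its own just because the header itself happens to match.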
The GREP_COLORS section of the grep manpage says:
Specifies the colors and other attributes used to highlight various parts of the output. Its value is a colon-separated list
of capabilities that defaults to
ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36 with the rv and
ne boolean capabilities omitted (i.e., false).
and
se=36 SGR substring for separators that are inserted between
selected line fields (:), between context line fields, (-), and
between groups of adjacent lines when nonzero context is
specified (--). The default is a cyan text foreground over the
terminal's default background.
Consider the file sample.txt:
$ cat sample.txt
ABBB
AAB
AAB
S
S
S
AABB
ABAA
BAA
CCC
$ grep -B2 'AAB' sample.txt
ABBB
AAB
AAB
--
S
S
AABB
Here, -- is grep's way of telling you that the AAB line before it and the S line after it are not adjacent lines in the actual file.
I would like to grep for a process path that contains a variable part. Example:
This is one of the running processes:
/var/www/vhosts/rcsdfg/psd_folr/rcerr-m-deve-udf-172/bin/magt queue:consumers:start customer.import_proditns --single-thread --max-messages=1000
I would like to grep for "psd_folr/rcerr-m-deve-udf-172/bin/magt queue" in the list of running processes.
The catch is that the number 172 keeps changing, but it is always a 3-digit number. Please suggest a fix; I tried the command below, but it does not return any output:
sudo ps axu | grep "psd_folr/rcerr-m-deve-udf-'^[0-9]$'/bin/magt queue"
The most relevant part of your regular expression is -'^[0-9]$'/, which has the following problems:
the apostrophes have no special meaning here: inside the double quotes they are passed to grep as literal apostrophes, which do not occur in the process line
the caret ^ matches the beginning of a line, but there is no beginning of a line at this position in ps's output
the dollar $ matches the end of a line, but there is no end of a line at this position in ps's output
you want to match 3 digits, but [0-9] only matches a single one
Thus, this part of your expression should be changed to -[0-9]+/ to match any number of digits (+ matches the preceding item one or more times), or to -[0-9]{3}/ to match exactly three digits ({n} matches the preceding item exactly n times).
When you change your command, give grep the -E flag so it uses extended regular expressions; otherwise you need to escape the plus or the braces (\+ in GNU grep, \{3\}):
sudo ps axu | grep -E "psd_folr/rcerr-m-deve-udf-[0-9]+/bin/magt queue"
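A sketch you can try without the running process: it feeds the sample command line from the question to grep instead of ps output (so line is only a stand-in), and shows the exactly-three-digits version with -E next to the same pattern written as a basic regex with escaped braces.

```shell
# Stand-in for one line of `ps axu` output, copied from the question.
line='/var/www/vhosts/rcsdfg/psd_folr/rcerr-m-deve-udf-172/bin/magt queue:consumers:start customer.import_proditns --single-thread --max-messages=1000'

# Extended regex: {3} requires exactly three digits between "udf-" and "/bin".
printf '%s\n' "$line" | grep -E 'psd_folr/rcerr-m-deve-udf-[0-9]{3}/bin/magt queue'

# The same pattern as a basic regex (no -E): the braces must be escaped.
printf '%s\n' "$line" | grep 'psd_folr/rcerr-m-deve-udf-[0-9]\{3\}/bin/magt queue'
```

Both commands print the whole line; a four-digit number such as udf-1723 would not match the {3} version. On Linux, pgrep -f with the same extended regex is an alternative that prints the matching PIDs and never reports itself.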
I have several files that look like this:
abcd
several lines
abcd
several lines
abcd
several lines
.
.
.
What I want to do (preferably using grep) is get the 20 lines immediately following the LAST abcd line.
Any help is appreciated.
Thanks
Use the -A option:
-A NUM, --after-context=NUM
Print NUM lines of trailing context after matching lines. Places a line
containing a group separator (--) between contiguous groups of matches.
With the -o or --only-matching option, this has no effect and a warning
is given.
So:
$ grep -A 20 abcd file.txt
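will give you the abcd lines plus the 20 lines after each one. To get only the last 21 lines, use tail (this assumes the last abcd has at least 20 lines after it; otherwise tail also picks up lines from the previous group, including the -- separator):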
$ grep -A 20 abcd file.txt | tail -21
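A sketch of another grep-based approach that does not depend on how many lines follow the last abcd. It assumes GNU tac (coreutils) is available; on macOS, tail -r is the usual replacement. The function name last_abcd_block is just for illustration.

```shell
# Print the LAST abcd line plus the (up to) 20 lines that follow it.
# tac reverses the input, so the last abcd is the first one grep sees;
# -m 1 stops after that match, and -B 20 takes the 20 lines "before" it,
# which are the lines that came after it in the original order.
# The second tac restores the original order.
last_abcd_block() {
    tac | grep -m 1 -B 20 'abcd' | tac
}

# Usage: last_abcd_block < file.txt
```

If you want only the 20 following lines without the abcd line itself, append | tail -n +2.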
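You can do this:
awk '/abcd/ {n=NR} {a[NR]=$0} END {if (n) for (i=n; i<=n+20 && i<=NR; i++) print a[i]}' file
It searches for the pattern abcd and updates n on every match, so only the line number of the last match is kept.
It also stores every line in the array a.
Then, in the END section, it prints the last abcd line and the (up to) 20 lines after it, stopping at the end of the file and printing nothing if abcd never occurs.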
I would like to search for a certain pattern (say, Bar line) but also print the lines above and below it: either 1 line above and below, or 2 lines above and below the pattern.
Foo line
Bar line
Baz line
....
Foo1 line
Bar line
Baz1 line
....
Use grep with the options -A and -B to set the number of lines After and Before your pattern that you want to print:
grep -A1 -B1 yourpattern file
-A n stands for n lines "after" the match.
-B m stands for m lines "before" the match.
If both numbers are the same, just use -C:
grep -C1 yourpattern file
Test
$ cat file
Foo line
Bar line
Baz line
hello
bye
hello
Foo1 line
Bar line
Baz1 line
Let's grep:
$ grep -A1 -B1 Bar file
Foo line
Bar line
Baz line
--
Foo1 line
Bar line
Baz1 line
To get rid of the group separator, you can use --no-group-separator:
$ grep --no-group-separator -A1 -B1 Bar file
Foo line
Bar line
Baz line
Foo1 line
Bar line
Baz1 line
From man grep:
-A NUM, --after-context=NUM
Print NUM lines of trailing context after matching lines.
Places a line containing a group separator (--) between
contiguous groups of matches. With the -o or --only-matching
option, this has no effect and a warning is given.
-B NUM, --before-context=NUM
Print NUM lines of leading context before matching lines.
Places a line containing a group separator (--) between
contiguous groups of matches. With the -o or --only-matching
option, this has no effect and a warning is given.
-C NUM, -NUM, --context=NUM
Print NUM lines of output context. Places a line containing a
group separator (--) between contiguous groups of matches. With
the -o or --only-matching option, this has no effect and a
warning is given.
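GNU grep also lets you choose the separator instead of dropping it, via --group-separator=SEP. A small sketch that rebuilds the test file from above in a variable (assuming GNU grep; ==== is an arbitrary choice):

```shell
# Same contents as the test file used above.
file_contents='Foo line
Bar line
Baz line
hello
bye
hello
Foo1 line
Bar line
Baz1 line'

# Same context as -A1 -B1, but groups are separated by "====" instead of "--".
printf '%s\n' "$file_contents" | grep --group-separator='====' -A1 -B1 'Bar'
```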
grep is the tool for you, but it can also be done with awk:
awk '{a[NR]=$0} $0~s {f=NR} END {for (i=f-B;i<=f+A;i++) print a[i]}' B=1 A=2 s="Bar" file
NB: this only prints the context around the last hit.
or with grep:
grep -A2 -B1 "Bar" file
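If you want awk to print context around every hit rather than only the last one, one way (a sketch, not the only one) is to mark the lines to keep while reading and print them in order at the end. The function name awk_context and its argument order are made up for illustration; B, A and s mean the same as in the one-liner above. Overlapping windows are merged and no -- separator is printed.

```shell
# Usage: awk_context PATTERN BEFORE AFTER [FILE...]  (reads stdin if no FILE)
awk_context() {
    pat=$1 before=$2 after=$3
    shift 3
    awk -v s="$pat" -v B="$before" -v A="$after" '
        { line[NR] = $0 }                                        # remember every line
        $0 ~ s { for (i = NR - B; i <= NR + A; i++) keep[i] = 1 }  # mark the window
        END { for (i = 1; i <= NR; i++) if (i in keep) print line[i] }
    ' "$@"
}

# Example: awk_context Bar 1 2 file
```

On the test file this prints the same lines as grep --no-group-separator -A2 -B1 Bar file.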
I have a big text file and I am looking for sequence IDs that start with the species name "ABS". When I run grep "ABS", I only get a list of ABS, not the sequence ID that follows that word. For example, what I am looking for is a list like this:
ABS|contig05671,
ABS|contig04453,
ABS|CL5170Contig1,
ABS|contig02526,
But when I run grep "ABS" filename.txt, I get results like this:
ABS,
ABS,
ABS,
ABS,
Any help is greatly appreciated. Thanks in advance.
From man grep:
Context Line Control
-A NUM, --after-context=NUM
Print NUM lines of trailing context after matching lines.
Places a line containing a group separator (--) between
contiguous groups of matches. With the -o or --only-matching
option, this has no effect and a warning is given.
-B NUM, --before-context=NUM
Print NUM lines of leading context before matching lines.
Places a line containing a group separator (--) between
contiguous groups of matches. With the -o or --only-matching
option, this has no effect and a warning is given.
-C NUM, -NUM, --context=NUM
Print NUM lines of output context. Places a line containing a
group separator (--) between contiguous groups of matches. With
the -o or --only-matching option, this has no effect and a
warning is given.
So if you need the matching line and the following one, you do grep -A1 ABS file.txt, and similarly for the preceding line with -B1.
However, if you want to format the results in another way (e.g. put the two lines on one and separate by the pipe character) you need a different tool than grep. grep does searching, whereas you also want editing.
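As an illustration of that editing step, here is a sketch that joins each ABS line with the line after it using a pipe character. It assumes (a guess about the file layout, not something the question confirms) that each sequence id sits on the line right after its ABS line and that ABS lines are never adjacent; the sample data is made up. --no-group-separator is a GNU grep option.

```shell
# Hypothetical layout: each ABS line is followed by the line holding its id.
sample='ABS
contig05671,
other
ABS
contig04453,'

# grep selects each ABS line plus the next one (without "--" lines);
# paste then joins every two consecutive lines with "|".
printf '%s\n' "$sample" | grep -A1 --no-group-separator '^ABS' | paste -d'|' - -
```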
I am writing a csh script that will extract a line from a file xyz.
The xyz file contains a number of lines of code, and the line I am interested in appears 2-3 lines into the file.
I tried the following code:
set product1 = `grep -e '<product_version_info.*/>' xyz`
I want it to work so that as soon as the script finds that line, it saves the line in a variable as a string and stops reading the file immediately, i.e. it should not read any further after extracting the line.
Please help!
grep has an -m or --max-count flag that tells it to stop after a specified number of matches. Hopefully your version of grep supports it.
set product1 = `grep -m 1 -e '<product_version_info.*/>' xyz`
From the grep man page:
-m NUM, --max-count=NUM
Stop reading a file after NUM matching lines. If the input is
standard input from a regular file, and NUM matching lines are
output, grep ensures that the standard input is positioned to
just after the last matching line before exiting, regardless of
the presence of trailing context lines. This enables a calling
process to resume a search. When grep stops after NUM matching
lines, it outputs any trailing context lines. When the -c or
--count option is also used, grep does not output a count
greater than NUM. When the -v or --invert-match option is also
used, grep stops after outputting NUM non-matching lines.
As an alternative, you can always use the command below to check just the first few lines (since the line always occurs in the first 2-3 lines):
set product1 = `head -3 xyz | grep -e '<product_version_info.*/>'`
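If your grep lacks -m, a sketch of another way to stop reading at the first match is awk's exit statement. It is wrapped in an sh function here (the name first_version_line is made up) only so it is easy to try; in the csh script you would put the awk command inside the backquotes directly, as in set product1 = `awk '/<product_version_info.*\/>/ { print; exit }' xyz`.

```shell
# Print the first line matching the pattern, then exit, so awk does not
# read the rest of the input. Reads the given files, or stdin if none.
first_version_line() {
    awk '/<product_version_info.*\/>/ { print; exit }' "$@"
}

# Usage: first_version_line xyz
```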
I think you're asking to return the first matching line in the file. If so, one solution is to pipe the grep result to head:
set product1 = `grep -e '<product_version_info.*/>' xyz | head -1`