I'm trying to extract a specific sequence from a FASTQ file, using grep to search for the sequence ID.
less all_barcode03.fastq.gz
#3cb04ae7-2c7b-4da8-8d09-59edb5b8f45c_t runid=7204dc15205b93bfd6430ca0f3a0218f11ce0787 read=10 ch=120 start_time=2019-04-12T13:55:25Z
TCGGTAGCCACTTCGTTCAGTCAATTTGGGTTGTTTAACCGAGTCTTGTGTGTCCCAGTTACCAGGGTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTCGTGCGCCGCTTCAGTGATCAGTGAAGATGGGTTTGTGGTGGAATACTCTTGCTGCTCATGGCAAACTTTATGTTGGTTTTCTCATGCATTTGTTTCTCGTAATCCCATACGTCATCCAAAGTCATCTGAAAAAGAGGGAAGGGGTGGATTGTGGGTGAAATGTTGTGTACTCCTCTATAATGGGGCTCAGTTGACAAACAGGTGGAGGAGAGGATCATTTGCTTAAAGGGGTGAGTGAAGCGGAGTTTAAGGATAATTCAAGCTTTTAAAAGTGGCTTTAGAGGTAAAGGGTTAGCTCCCATGACCCACAGGATTTATAGGAGATGGCTCTGAACAAACCAGAGCCACACACACA
+
-%&&($$%%#%,-*),-5(&,$$%$%+).'-(+-4-(')%%$*+-,3...14,7/))/03.06-./-3:8.0(*,/7+*,966006.,(*(,-(&(*,./+--902/./),,,0,-/./,4(+0/,0).0-7048,(+*',*/.)*#(((.0--10764+('(%.3/+$&%&'./4'0.;:6.895778+0/*(28/),(+-/404/*'(),.16517&83+*/0/0.--033**$&'*,''*/,,,/..0.*0*0$##*((($/6&('-,.230/01/2+4,,::8719(*.4.'.26/0(*))0*+,(*+-,-+-.4765-$%&.'%.*/')(&''#-()*21,-.;+3).*,,'557686+(-7;-2:8))(&%%'*)**%&&).6&,*(.-'$'(*2+*0587:0+*+)/*/63--/*('#&)-68664&%534)/13.))'14*+**%%$$#
#69e7e435-a78c-4ec8-94cd-b0c1f3c40c11_t runid=7204dc15205b93bfd6430ca0f3a0218f11ce0787 read=15 ch=465 start_time=2019-04-12T13:55:25Z
TCGGTACTTCGTTCGGTTGGAGAAGGTGGTGTTGCCGAGTCTTGTGTCCCAGTTACCAGGGTTTTCGCATTTATCGTGGCTTGCTGCGTTTTCGTGCGCCACCGCTTCATGTGTGTGTGTGTGTCTGGTGTTATTACTCACTTGGCAAGCGTGTCTGGACAGCAGCTGTTTGAGTGTTGAGAGCGCTTCTTCTCCAGGAGAAGCGGTTGAGCCTAAGCTGAATCCCCGTCCGTCTTTATCTTCGGACATGCTCTGGATATGCCTGAGGAGGACAATGGAGGAACAGAACAGATGGATGAAGAGCTCATAAAACTGGCACACATGCATCAAAGCCCACCTTCGTCACTCTGATGACCAGTGACTGCCGTTTATTACTGCGATTTACCATGAAGTTATCTGCTTTTTGGGTCAGTTAGTGTGTGTGTGTGTGTGTGTGTGTGTGCCTTTTCTGTCCTCCAGATACTCAGTACTACAGAGGAGCTATTAATACTTACTACATCGATATGTTATGTAATATCATTCTAGCCTGCTACTCCTGTCTTCTGTATACAACTGTCGTCTGTCCCGAATAGCTCCTGGGTGCCCTCTCCTCCATAGTAGCCACAGTTACAGGAATATTACTCTTTATCATAGAAGCGGTATCTAGTAGAACAGTCCTTAGTTAAAATAATAACGGGGTGTGGGCATGTACAGCCTCTGGTATTCCGTTGCTCAGCAGAGCCTCATAACTCTCCTAGTGGCTCAGGAAGGCTGAAACAGGCTGTGTGCACCCAGCCAGCTGGAACTGTGTTTGAGTGCCATCTTGGAATACTGTTTATAAGCGCTCTTAAGTTATATGTGAGGATGGTGGTATTAGATATGGAAGTGTGTAGGAGGAGAAAGAGGAAATAGTGTCATGTTGATATGAACAGTTTGGTCAGTAAAATGAGGGCAGTAAAAAAGTGTTTTAAGCGTTTTGTCGGTCGACAATATGATAATAAAATGCATTTGGTTCACGATAACAAGAAAACAGAAAAGACCAGCAATGAATATTTAGCATTTTTTGTTTGAAAGATGAAACAAATAATTGAAATAGCTGCCAAATATTTGTGAAATGTACTAAATGGTCAGAGTGAAGATGCAGCTTTGAAAAGAAGATTCGGA
+
+&$%)'./-0,*1(&&&%#%&$(&)'%&&%$"#$&+,'*-*1+++5-73+)*/+,32/46552:/-+2025/+-057,$#$$&)/01,)433/2732'&$#&$"$'$((+*+),+,,,*+,,-11)*'&((*"0#"&*((,*.--.&.+-*,)-17861+&%'%)),73:60-/-32:++(('.')+56894,4+)./'%')%$&-,('%#41.'$%&')$0))/2.*04632,20)(+'&&,+7.97825-++**166678950-))%*+,-26-.6,*/(4.$+'+-5/0/.-02/-+)'%+73//245+(&(%%'))(&$#&&(7.:2-0;7014354398')-83/00/04:*330))&#)))-5/(-*++5#./+50-(,0765/1,,8//05/0.:0/%#$&)--+4+)+5575312+1&-')).'+&*%)(,,,((%++/,.2486112'&#$&##$%'(*+,1/)/+...+-.1312/1+**-(-.8---,*+,-.5,1,(+%..1,)--.8;441019.1780000313658;99621-,,.++)#,-.011537%#&-2,',-,86)(.''%(.2+/24,.23/./+*$)4--.0.340/+())0..62019-7:+).2(/*%),&--30/32*)&)%)$%')+2;829%*)'4:;401/,-71%.,'(*+)2837653/0-&/63861'(*-6*()5:.3--'%')',)2977&(%(%'+-/**-0727112246..*1,-..3&/.4535-3+3.00,7*%'1+12311321.35567:93&)*))'-/,2-7-.6/,..-4;6/3/&(&%**03745+-.-.::95544467..--))'*)#('*+,..(%)&'(%%&-+'++)*/1&&'$%&+*&())$()(,%+'$&'&($&'2.44:0..++#%).78*(((/1'($$&-:;98.(*00;;2-''),053.//3+&))+14-8&**,..01.2:;743425:7(,*.((+*,,-+'&*'+057,*(.53-(+3703/210.06256;.+,01.5<<5,06;:+.7)')3,$(+'4;.,*'*'*-4--)+-*)+&--,*$(+&(-$*,''/2778:;9/.857+%%'()*((*11-,)+-5-+,31/#&%$%5)-#%#
Then I try to show one of the sequences by searching for its sequence ID:
grep '#3cb04ae7-2c7b-4da8-8d09-59edb5b8f45c_t' all_barcode03.fastq.gz
grep '*#3cb04ae7-2c7b-4da8-8d09-59edb5b8f45c_t*' all_barcode03.fastq.gz
grep #3cb04ae7-2c7b-4da8-8d09-59edb5b8f45c_t all_barcode03.fastq.gz
grep *#3cb04ae7-2c7b-4da8-8d09-59edb5b8f45c_t* all_barcode03.fastq.gz
All of the above grep commands return no results; however, there is a line in the file starting with #3cb04ae7-2c7b-4da8-8d09-59edb5b8f45c_t.
Use zgrep, not grep, on .gz files. Plain grep scans the compressed bytes, so the plain-text ID is never found.
zgrep - search possibly compressed files for a regular expression
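A minimal sketch of the zgrep approach, using a small made-up FASTQ-like file (the read IDs and the demo.fastq.gz name are hypothetical, not taken from the question). It assumes a grep that supports -A, as GNU and BSD grep do, so that the three lines after the header are printed too; -F treats the ID as a fixed string rather than a regex.

```shell
# Build a tiny gzip-compressed sample with two four-line records (hypothetical data).
tmpdir=$(mktemp -d)
printf '%s\n' \
  '#read-0001_t runid=demo read=10' 'ACGTACGT' '+' '!!!!!!!!' \
  '#read-0002_t runid=demo read=15' 'TTTTGGGG' '+' '########' \
  | gzip > "$tmpdir/demo.fastq.gz"

# zgrep decompresses on the fly; -A 3 also prints the 3 lines after the header.
record=$(zgrep -A 3 -F '#read-0001_t' "$tmpdir/demo.fastq.gz")
printf '%s\n' "$record"

rm -rf "$tmpdir"
```

Quoting the ID matters here: an unquoted word that begins with # is treated by the shell as the start of a comment.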
The command 'grep -c blah *' lists every file, including those with a count of 0, as shown below.
% grep -c jill *
file1:1
file2:0
file3:0
file4:0
file5:0
file6:1
%
What I want is:
% grep -c jill * | grep -v ':0'
file1:1
file6:1
%
Instead of piping and grep'ing the output like above, is there a flag to suppress listing files with 0 counts?
SJ
How to grep nonzero counts:
grep -rIcH 'string' . | grep -v ':0$'
-r Recurse subdirectories.
-I Ignore binary files (thanks #tongpu, warlock).
-c Show count of matches. Annoyingly, includes 0-count files.
-H Show file name, even if only one file (thanks #CraigEstey).
'string' your string goes here.
. Start from the current directory.
| grep -v ':0$' Remove 0-count files. (thanks #LaurentiuRoescu)
(I realize the OP was excluding the pipe trick, but this is what works for me.)
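To illustrate the pipeline above, here is a small sketch run against a throwaway directory (the file names a.txt, b.txt and sub/c.txt are made up for the example). It assumes a grep with -r and -I, which GNU and BSD grep provide; the sort is only there to make the output order predictable.

```shell
# Create a small sample tree (hypothetical file names and contents).
tmpdir=$(mktemp -d)
mkdir "$tmpdir/sub"
printf 'jill\njack\n' > "$tmpdir/a.txt"
printf 'jack\n'       > "$tmpdir/b.txt"
printf 'jill\njill\n' > "$tmpdir/sub/c.txt"

# Count matching lines per file, then drop the files whose count is exactly 0.
counts=$(cd "$tmpdir" && grep -rIcH 'jill' . | grep -v ':0$' | sort)
printf '%s\n' "$counts"

rm -rf "$tmpdir"
```

The $ anchor keeps the second grep from discarding a file merely because its name happens to contain ":0".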
Just use awk. e.g. with GNU awk for ENDFILE:
awk '/jill/{c++} ENDFILE{if (c) print FILENAME":"c; c=0}' *
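If GNU awk is not available, the same idea can be sketched without ENDFILE by flushing the previous file's count whenever a new file starts. This is one possible portable variant, not the answer's own code; the sample files file1, file2 and file3 are invented for the demonstration. Like grep -c, it counts matching lines rather than individual matches.

```shell
# Sample input (hypothetical contents).
tmpdir=$(mktemp -d)
printf 'jill\njack\n' > "$tmpdir/file1"
printf 'jack\n'       > "$tmpdir/file2"
printf 'jill\njill\n' > "$tmpdir/file3"

counts=$(cd "$tmpdir" && awk '
  # First line of a later file: report the previous file if it had matches.
  FNR == 1 && NR > 1 { if (c) print prev ":" c; c = 0 }
  { prev = FILENAME }      # remember which file the current line belongs to
  /jill/ { c++ }           # count matching lines
  END { if (c) print prev ":" c }   # report the last file
' file1 file2 file3)
printf '%s\n' "$counts"

rm -rf "$tmpdir"
```

Empty files produce no records, so they are simply never reported, which matches the goal of hiding zero counts.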
I am trying to use grep to just capture a number in a string but I am having difficulty.
echo "There are <strong>54</strong> cities" | grep -o "([0-9]+)"
How am I supposed to just have it return "54"? I have tried the above grep command and it doesn't work.
echo "You have <strong>54</strong>" | grep -o '[0-9]' seems to sort of work but it prints
5
4
instead of 54
Don't parse HTML with regex; use a proper parser:
$ echo "There are <strong>54</strong> cities " |
xmllint --html --xpath '//strong/text()' -
OUTPUT:
54
See also the question "RegEx match open tags except XHTML self-contained tags".
You need to use the -E option for extended regex support (or use egrep). On my Mac (OS X):
$ echo "There are <strong>54</strong> cities" | grep -Eo "[0-9]+"
54
You also need to consider whether a line can contain more than one number. What should the behavior be then?
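As a small illustration of the multiple-numbers case, the sketch below runs the same grep -Eo on a line with two numbers. Taking only the first match with head is just one possible policy, shown here as an assumption about what might be wanted.

```shell
line='There are <strong>54</strong> 12 cities'

# -o prints every match on its own line, so both numbers come back.
all_numbers=$(printf '%s\n' "$line" | grep -Eo '[0-9]+')
printf '%s\n' "$all_numbers"

# One possible policy: keep only the first number on the line.
first_number=$(printf '%s\n' "$line" | grep -Eo '[0-9]+' | head -n 1)
printf '%s\n' "$first_number"
```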
EDIT 1: since you have now specified the requirement to be a number between <strong> tags, I would recommend using sed. On my platform, grep does not have the "P" option for perl style regexes. On my other box, the version of grep specifies that this is an experimental feature so I would go with sed in this case.
$ echo "There are <strong>54</strong> 12 cities" | sed -rn 's/^.*<strong>\s*([0-9]+)\s*<\/strong>.*$/\1/p'
54
Here -r enables extended regular expressions in GNU sed; -E is the equivalent spelling, which BSD sed also accepts.
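For systems where \s is not understood by sed (it is a GNU extension), a more portable sketch of the same substitution uses -E and the POSIX class [[:space:]]. This is a variation on the command above, not a different technique.

```shell
line='There are <strong>54</strong> 12 cities'

# -n suppresses default output; the p flag prints only lines where the substitution succeeded.
strong_number=$(printf '%s\n' "$line" \
  | sed -En 's/^.*<strong>[[:space:]]*([0-9]+)[[:space:]]*<\/strong>.*$/\1/p')
printf '%s\n' "$strong_number"
```

Only the number inside the <strong> tags is captured, so the stray 12 later in the line is ignored.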
EDIT 2: If you have the "PCRE" option in your version of grep, you could also utilize the following with positive lookbehinds and lookaheads.
$ echo "There are <strong>54 </strong> 12 cities" | grep -o -P "(?<=<strong>)\s*([0-9]+)\s*(?=<\/strong>)"
54
Okay I have a file that contains numbers like this:
L21479
What I am trying to do is use grep (or a similar tool) to find all the strings in a file that have the format:
L#####
Each # stands for a digit, so the format is an L followed by 5 digits.
Is this even possible in grep? Should I load the file and perform regex?
You can do this with grep, for example with the following command:
grep -E -o 'L[0-9]{5}' name_of_file
For example, given a file with the text:
kasdhflkashl143112343214L232134614
3L1431413543454L2342L3523269ufoidu
gl9983ugsdu8768IUHI/(JHKJASHD/(888
The command above will output:
L23213
L14314
L35232
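Note that the pattern also matches the first five digits of a longer run, as in L232134614 above. If only an L followed by exactly five digits should count, one option (an assumption about the requirement, not something the question states) is to add -w so that the match must stand alone as a word. The sample string below is made up.

```shell
sample='L21479 L232134614 xL99999 L12345,'

# -w requires a non-word character (or line boundary) on both sides of the match.
exact=$(printf '%s\n' "$sample" | grep -Eow 'L[0-9]{5}')
printf '%s\n' "$exact"
```

Here L232134614 is rejected because more digits follow, and xL99999 is rejected because a letter precedes the L.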
If it is just in a single file, you can do something along the lines of:
grep -E 'L[0-9]{5}' filename
If you need to search all files in a directory for these strings:
find . -type f | xargs grep -E 'L[0-9]{5}'
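Piping find into xargs splits on whitespace, so file names containing spaces can break it. A hedged alternative sketch lets find run grep itself with -exec ... {} +; the directory contents and file names below are invented, and -H and -o assume GNU or BSD grep.

```shell
# Sample directory with a file name that contains a space (hypothetical).
tmpdir=$(mktemp -d)
printf 'id L21479 ok\n'     > "$tmpdir/my notes.txt"
printf 'nothing here L12\n' > "$tmpdir/other.txt"

# find passes the file names to grep directly, so no word splitting occurs.
matches=$(cd "$tmpdir" && find . -type f -exec grep -EoH 'L[0-9]{5}' {} +)
printf '%s\n' "$matches"

rm -rf "$tmpdir"
```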
I'm using grep to find lines in two different files that match patterns from another file. It finds the matching lines from File1 in File2 and File3 just fine, but as soon as there is more than one file to search, it prints the name of the file in which the line was found next to the line.
grep -w -f File1 File2 File3
Output:
File2: pattern
File2: pattern
File3: pattern
Is there an option to avoid printing File2: and File3:?
grep --no-filename -w -f File1 File2 File3
If you're on a UNIX system, please refer to the man pages. Whenever you encounter a problem, your first step should be man $programName. In this case, man grep. It appears that you want the "-h" option. Here's an excerpt from the man page:
-h, --no-filename
Suppress the prefixing of file names on output. This is the default when there is only one file (or only standard input) to search.
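A short sketch comparing the output with and without -h, using three small made-up files named after the ones in the question (their contents are assumptions for the demonstration):

```shell
tmpdir=$(mktemp -d)
printf 'pattern\n'            > "$tmpdir/File1"   # list of patterns
printf 'pattern one\nother\n' > "$tmpdir/File2"
printf 'a pattern\n'          > "$tmpdir/File3"

# Default with several files: each line is prefixed with its file name.
with_names=$(cd "$tmpdir" && grep -w -f File1 File2 File3)
printf '%s\n' "$with_names"

# -h (--no-filename in GNU grep) drops the prefix.
without_names=$(cd "$tmpdir" && grep -h -w -f File1 File2 File3)
printf '%s\n' "$without_names"

rm -rf "$tmpdir"
```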