how to use grep to match partial occurrence of searched string

how to use grep to match partial occurrence of searched string - grep

my text file :
vol0 0.9 vol2 1.2 vol3 3 vol1_1 1.4 voll_2 5
vol1 1 vol2 1.2 vol3 3 vol1_1 1.4 voll_2 5
vol2 1.3 vol2 1.8 vol3 3.2 vol1_1 1.2 voll_2 4.8
....
How to grep lines with vol1 ,i.e only 2nd line. From what i am doing, it is taking all the three line.

I'm assuming you only want to match "vol1" as a distinct word, not "vol1_1",
With grep, use the -w flag (and since you're not searching for a regex, add -F)
grep -Fw vol1 filename
With Tcl, regular expressions can include the \y zero-width constraint that matches at the start or end of a word:
set fh [open $filename r]
while {[gets $fh line] != -1} {
if {[regexp {\yvol1\y} $line]} {
puts $line
}
}
close $fh

Related

extract the adjacent character of selected letter

I have this text file:
# cat letter.txt
this
is
just
a
test
to
check
if
grep
works
The letter "e" appear in 3 words.
# grep e letter.txt
test
check
grep
Is there any way to return the letter printed on left of the selected character?
expected.txt
t
h
r

With shown samples in awk, could you please try following.
awk '/e/{print substr($0,index($0,"e")-1,1)}' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
/e/{ ##Looking if current line has e in it then do following.
print substr($0,index($0,"e")-1,1)
##Printing sub string from starting value of index e-1 and print 1 character from there.
}
' Input_file ##Mentioning Input_file name here.

You can use positive lookahead to match a character that is followed by an e, without making the e part of the match.
cat letter.txt | grep -oP '.(?=e)'

With sed:
sed -nE 's/.*(.)e.*/\1/p' letter.txt

Assuming you have this input file:
cat file
this
is
just
a
test
to
check
if
grep
works
egg
element
You may use this grep + sed solution to find letter or empty string before e:
grep -oE '(^|.)e' file | sed 's/.$//'
t
h
r
l
m
Or alternatively this single awk command should also work:
awk -F 'e' 'NF > 1 {
for (i=1; i<NF; i++) print substr($i, length($i), 1)
}' file

This might work for you (GNU sed):
sed -nE '/(.)e/{s//\n\1\n/;s/^[^\n]*\n//;P;D}' file
Turn off implicit printing and enable extended regexp -nE.
Focus only on lines that meet the requirements i.e. contain a character before e.
Surround the required character by newlines.
Remove any characters before and including the first newline.
Print the first line (up to the second newline).
Delete the first line (including the newline).
Repeat.
N.B. The solution will print each such character on a separate line.
To print all such characters on their own line, use:
sed -nE '/(.e)/{s//\n\1/g;s/^/e/;s/e[^\n]*\n?//g;s/\B/ /g;p}' file
N.B. Remove the s/\B /g if space separation is not needed.

With GNU awk you can use empty string as FS to split the input as individual characters:
awk -v FS= '/[e]/ {for(i=2;i<=NF;i++) if ($i=="e") print $(i-1)}' file
t
h
r
Excluding "e" at the beginning in the for loop.
edited
empty string if e is the first character in the word.
For example, this input:
cat file2
grep
erroneously
egg
Wednesday
effectively
awk -v FS= '/^[e]/ {print ""} /[e]/ {for(i=2;i<=NF;i++) if ($i=="e") print $(i-1)}' file2
r
n
W
n
f
v

Unix: Strings beginning and ending with certain characters and writing them in a text doc

So basically, the task is to count the number of students from who (command in putty) which have a student ID starting with 15 or 16 and ending with even number are currently logged into the machine, and write the output in the file sample.txt.
I've tried this command and it doesn't seem to work...
grep '^15\|16.*[02468]' | who | wc -l > sample.txt
Example:
155053
165054
175055
155056
Num of lines: 2
Any ideas?

You need
grep '^1[56].*[02468]$'
See the grep demo online.
Detais
^ - start of string
1 - 1
[56] - 5 or 6
.* - ant 0+ chars (replace with [0-9]* if you want to make sure pnly digit-only strings are matched)
[02468] - 0, 2, 4, 6, 8
$ - end of string.

Counting specific lines that don't contain specific word

Please I have question: I have a file like this
#HWI-ST273:296:C0EFRACXX:2:2101:17125:145325/1
TTAATACACCCAACCAGAAGTTAGCTCCTTCACTTTCAGCTAAATAAAAG
+
8?8A;DDDD;#?++8A?;C;F92+2A#19:1*1?DDDECDE?B4:BDEEI
#BBBB-ST273:296:C0EFRACXX:2:1303:5281:183410/1
TAGCTCCTTCGCTTTCAGCTAAATAAAAGCCCAGTACTTCTTTTTTACCA
+
CCBFFFFFFHHHHJJJJJJJJJIIJJJJJJJJJJJJJJJJJJJIJJJJJI
#HWI-ST273:296:C0EFRACXX:2:1103:16617:140195/1
AAGTTAGCTCCTTCGCTTTCAGCTAAATAAAAGCCCAGTACTTCTTTTTT
+
#C#FF?EDGFDHH#HGHIIGEGIIIIIEDIIGIIIGHHHIIIIIIIIIII
#HWI-ST273:296:C0EFRACXX:2:1207:14316:145263/1
AATACACCCAACCAGAAGTTAGCTCCTTCGCTTTCAGCTAAATAAAAGCC
+
CCCFFFFFHHHHHJJJJJJJIJJJJJJJJJJJJJJJJJJJJJJJJJJJIJ
I
I'm interested just about the line that starts with '#HWI', but I want to count all the lines that are not starting with '#HWI'. In the example shown, the result will be 1 because there's one line that starts with '#BBB'.
To be more clear: I just want to know know the number of the first line of the patterns (that are 4 line that repeated) that are not '#HWI'; I hope I'm clear enough. Please tell me if you need more clarification

With GNU sed, you can use its extended address to print every fourth line, then use grep to count the ones that don't start with #HWI:
sed -n '1~4p' file.fastq | grep -cv '^#HWI'
Otherwise, you can use e.g. Perl
perl -ne 'print if 1 == $. % 4' -- file.fastq | grep -cv '^#HWI'
$. contains the current line number, % is the modulo operator.
But once we're running Perl, we don't need grep anymore:
perl -lne '++$c if 1 == $. % 4; END { print $c }' -- file.fastq
-l removes newlines from input and adds them to output.

How can I extract some data out of the middle of a noisy file using Perl 6?

I would like to do this using idiomatic Perl 6.
I found a wonderful contiguous chunk of data buried in a noisy output file.
I would like to simply print out the header line starting with Cluster Unique and all of the lines following it, up to, but not including, the first occurrence of an empty line. Here's what the file looks like:
</path/to/projects/projectname/ParameterSweep/1000.1.7.dir> was used as the working directory.
....
Cluster Unique Sequences Reads RPM
1 31 3539 3539
2 25 2797 2797
3 17 1679 1679
4 21 1636 1636
5 14 1568 1568
6 13 1548 1548
7 7 1439 1439
Input file: "../../filename.count.fa"
...
Here's what I want parsed out:
Cluster Unique Sequences Reads RPM
1 31 3539 3539
2 25 2797 2797
3 17 1679 1679
4 21 1636 1636
5 14 1568 1568
6 13 1548 1548
7 7 1439 1439

One-liner version
.say if /Cluster \s+ Unique/ ff^ /^\s*$/ for lines;
In English
Print every line from the input file starting with the once containing the phrase Cluster Unique and ending just before the next empty line.
Same code with comments
.say # print the default variable $_
if # do the previous action (.say) "if" the following term is true
/Cluster \s+ Unique/ # Match $_ if it contains "Cluster Unique"
ff^ # Flip-flop operator: true until preceding term becomes true
# false once the term after it becomes true
/^\s*$/ # Match $_ if it contains an empty line
for # Create a loop placing each element of the following list into $_
lines # Create a list of all of the lines in the file
; # End of statement
Expanded version
for lines() {
.say if (
$_ ~~ /Cluster \s+ Unique/ ff^ $_ ~~ /^\s*$/
)
}
lines() is like <> in perl5. Each line from each file listed on the command line is read in one at a time. Since this is in a for loop, each line is placed in the default variable $_.
say is like print except that it also appends a newline. When written with a starting ., it acts directly on the default variable $_.
$_ is the default variable, which in this case contains one line from the file.
~~ is the match operator that is comparing $_ with a regular expression.
// Create a regular expression between the two forward slashes
\s+ matches one or more spaces
ff is the flip-flop operator. It is false as long as the expression to its left is false. It becomes true when the expression to its left is evaluated as true. It becomes false when the expression to its right becomes true and is never evaluated as true again. In this case, if we used ^ff^ instead of ff^, then the header would not be included in the output.
When ^ comes before (or after) ff, it modifies ff so that it is also false the iteration that the expression to its left (or right) becomes true.
/^\*$/ matches an empty line
^ matches the beginning of a string
\s* matches zero or more spaces
$ matches the end of a string
By the way, the flip-flop operator in Perl 5 is .. when it is in a scalar context (it's the range operator in list context). But its features are not quite as rich as in Perl 6, of course.

I would like to do this using idiomatic Perl 6.
In Perl, the idiomatic way to locate a chunk in a file is to read the file in paragraph mode, then stop reading the file when you find the chunk you are interested in. If you are reading a 10GB file, and the chunk is found at the top of the file, it's inefficient to continue reading the rest of the file--much less perform an if test on every line in the file.
In Perl 6, you can read a paragraph at a time like this:
my $fname = 'data.txt';
my $infile = open(
$fname,
nl => "\n\n", #Set what perl considers the end of a line.
); #Removed die() per Brad Gilbert's comment.
for $infile.lines() -> $para {
if $para ~~ /^ 'Cluster Unique'/ {
say $para.chomp;
last; #Quit reading the file.
}
}
$infile.close;
# ^ Match start of string.
# 'Cluster Unique' By default, whitespace is insignificant in a perl6 regex. Quotes are one way to make whitespace significant.
However, in perl6 rakudo/moarVM the open() function does not read the nl argument correctly, so you currently can't set paragraph mode.
Also, there are certain idioms that are considered by some to be bad practice, like:
Postfix if statements, e.g. say 'hello' if $y == 0.
Relying on the implicit $_ variable in your code, e.g. .say
So, depending on what side of the fence you live on, that would be considered a bad practice in Perl.

grep Top n Matches Across Files

I'm using grep to extract lines across a set of files:
grep somestring *.log
Is it possible to limit the maximum number of matches per file? Ideally I'd just to print out n lines from each of the *.log files.

To limit 11 lines per file:
grep -m11 somestring *.log

Here is an alternate way of simulating it with awk:
awk 'f==10{f=0; nextfile; exit} /regex/{++f; print FILENAME":"$0}' *.log
Explanation:
f==10 : f is a flag we set and check if the value of it is equal to 10. You can configure it depending on the number of lines
you wish to match.
nextfile : Moves processing to the next file.
exit : Breaks out of awk.
/regex/ : You're search regex or pattern.
{++f;print FILENAME":"$0} : We increment the flag and print the filename and line.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

how to use grep to match partial occurrence of searched string - grep

my text file : vol0 0.9 vol2 1.2 vol3 3 vol1_1 1.4 voll_2 5 vol1 1 vol2 1.2 vol3 3 vol1_1 1.4 voll_2 5 vol2 1.3 vol2 1.8 vol3 3.2 vol1_1 1.2 voll_2 4.8 .... How to grep lines with vol1 ,i.e only 2nd line. From what i am doing, it is taking all the three line.

Related

extract the adjacent character of selected letter

Unix: Strings beginning and ending with certain characters and writing them in a text doc

Counting specific lines that don't contain specific word

How can I extract some data out of the middle of a noisy file using Perl 6?

grep Top n Matches Across Files

Categories

Resources