Pattern matching using grep

Assuming we have one input string like
Nice
And we have the pattern
D*A*C*N*a*g*.h*ca*e
then "Nice" will match the pattern. (* means zero or more occurrences of the preceding character; . means any one character.)
I think grep may be a better fit than Java in this case. How can I do this with grep?

Use the same regular expression:
grep 'D*A*C*N*a*g*.h*ca*e' <<EOF
Nice
EOF
If the input is "Nicely" it still prints it! How does it work?
The current regex looks for the pattern anywhere on the line. If it must match exactly (the whole line), then add anchors to start (^) and end ($) of line:
grep '^D*A*C*N*a*g*.h*ca*e$' <<EOF
Nice
Nicely
Darce
Darcy
Darcey
EOF
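For reference, with that input the anchored pattern should print only Nice and Darce. A minimal sketch of the same check using grep's -x option, which requires the whole line to match and so behaves like wrapping the pattern in ^ and $ (the variable name whole_line_matches is only for illustration):

```shell
# -x makes grep match whole lines only, like adding ^ and $ to the pattern.
whole_line_matches=$(printf '%s\n' Nice Nicely Darce Darcy Darcey |
  grep -x 'D*A*C*N*a*g*.h*ca*e')
printf '%s\n' "$whole_line_matches"
```

Nicely, Darcy and Darcey are rejected because the pattern must end in e, and Darce passes because a* absorbs the a and . absorbs the r.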

Related

How to type AND in regex word matching

I'm trying to do a word search with regex and wonder how to type AND for multiple criteria.
For example, how to type the following:
(Start with a) AND (Contains p) AND (Ends with e), such as the word apple?
Input
apple
pineapple
avocado
Code
grep -E "regex expression here" input.txt
Desired output
apple
What should the regex expression be?
In general you can't express AND in a single regexp (though you can express THEN with .*), but you can in a multi-regexp condition using a tool that supports one.
To really exercise AND, your example should have been "starts with a and includes p and includes l and ends with e", with input that includes alpine. That case isn't trivial to express in a single regexp by just putting .*s between the characters, but it is trivial in a multi-regexp condition:
$ cat file
apple
pineapple
avocado
alpine
Using &&s will find both words regardless of the order of p and l as desired:
$ awk '/^a/ && /p/ && /l/ && /e$/' file
apple
alpine
but, as you can see, you can't just use .*s to implement and:
$ grep '^a.*p.*l.*e$' file
apple
If you had to use a single regexp then you'd have to do something like:
$ grep -E '^a.*(p.*l|l.*p).*e$' file
apple
alpine
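If awk isn't an option, a similar AND can be sketched by chaining several grep calls, one per condition. This is an alternative not taken from the answer above; the sample words are the ones from its file, and the variable name and_matches is hypothetical:

```shell
# Each grep in the pipeline filters the previous one's output,
# so a line must satisfy every condition to survive (a logical AND).
and_matches=$(printf '%s\n' apple pineapple avocado alpine |
  grep '^a' | grep 'p' | grep 'l' | grep 'e$')
printf '%s\n' "$and_matches"
```

This should print apple and alpine, the same lines the awk && version selects, at the cost of starting one process per condition.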
There are two ways you can do it.
First, by De Morgan's law a chain of &&s is the same as negating an || of the negated conditions, so you can write the reverse of what you want and negate the whole thing.
Second, at the single-bit level AND is the same as multiplying the bits. So if you think the chain of &&s is overly verbose, you can "multiply" the patterns together directly:
awk '/^a/ * /p/ * /e$/'
Multiplying them performs all the logical ANDs at once. (Only use this shorthand if the inputs aren't too large, or when the savings from early exit are known to be negligible: unlike &&, multiplication does not short-circuit.)
Don't think of them as merely regex patterns. It's easier to think of anything outside an action block, what's typically referred to as the pattern, as any combination of items that can be evaluated to a boolean TRUE or FALSE in the end.
For example, POSIX-compliant expressions that work in that position include sprintf(), field assignments, etc. (even decrementing NR, if there's such a need), but not statements like next, print, printf or delete array, or any of the loop structures.
Surprisingly, though, getline can be used directly in the pattern position (with some wrapper workaround).
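A small sketch of both forms described above, run against the sample words from the earlier answer. The regexes are parenthesised here only to make the operands of * unambiguous; the variable names are hypothetical:

```shell
words=$(printf '%s\n' apple pineapple avocado alpine)

# Multiplication form: each parenthesised regex evaluates to 1 or 0 against $0,
# and the product is non-zero only when all three match.
by_product=$(printf '%s\n' "$words" | awk '(/^a/) * (/p/) * (/e$/)')

# De Morgan form: "not (any condition fails)" is the same as "all conditions hold".
by_negation=$(printf '%s\n' "$words" | awk '!(!/^a/ || !/p/ || !/e$/)')

printf '%s\n' "$by_product" "$by_negation"
```

Both variables should end up holding apple and alpine, the same selection as /^a/ && /p/ && /e$/.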

Can grep print its matches to multiple lines, even if found on the same line?

For example, with the following string:
[:variable_one] == options[:variable_two]
and the following grep argument:
grep -Eo "\[\:.*?\]"
It will show the output of:
[:variable_one] == options[:variable_two]
but instead, I'm looking to get an output of:
[:variable_one]
[:variable_two]
Is there a way to "split" each match into a separate line, even if it finds multiple matches on a single line? Basically looking for the opposite answer of this: Print multiple regex matches using grep on the same line
The : and ] characters (when ] is not part of a bracket expression) are not special in a regex pattern, so they need no escaping. In a POSIX ERE pattern, *? is treated as *, so the match is greedy and runs to the rightmost ] on the line.
A POSIX BRE compliant regex for use with grep can look like
#!/bin/bash
s='[:variable_one] == options[:variable_two]'
grep -o "\[:[^][]*]" <<< "$s"
Output:
[:variable_one]
[:variable_two]
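To see the difference side by side, here is a minimal sketch that runs the greedy pattern and a negated-bracket pattern over the same string with grep -Eo. It assumes a grep that supports -o (GNU and BSD grep do); the variable names greedy and each are only for illustration:

```shell
s='[:variable_one] == options[:variable_two]'

# Greedy: .* runs to the last ] on the line, so there is a single long match.
greedy=$(printf '%s\n' "$s" | grep -Eo '\[:.*]')

# Negated bracket expression: [^]]* cannot cross a ], so each match stops early.
each=$(printf '%s\n' "$s" | grep -Eo '\[:[^]]*]')

printf 'greedy:\n%s\neach:\n%s\n' "$greedy" "$each"
```

The greedy variant returns the whole input line as one match, while the negated-bracket variant returns the two bracketed names on separate lines.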

is there a sophisticated way to grep this file

I have one file. Written in BNF it could be
<line>:== ((<ISBN10>|<ISBN13>)([a-Z/0-9]*)) {1,4})
For example
123456789X/abscd/1234567890123/djfkldsfjj
How can I grep only one ISBN10 or ISBN13 per line, even when the line contains more ISBNs? If there are several ISBNs in a line, it should take only the first one.
When I grep this way
grep -Po "[0-9]{9,13}X{0,1}" file
I get more lines than the file originally has (as there can be up to 4 ISBNs per line).
I also need the line count of the grep result to equal the line count of the file.
Any advice?
Well, in case the other answer is wrong to assume that the first ISBN is at the start of the line, you could always try Perl.
#!/usr/bin/perl
use strict;
use warnings;
while (<>) {
chomp;
my ( $first_isbn, @rest ) = m/(\d{9,13}X{0,1})/g;
print $., ":", $first_isbn, "\n" if $first_isbn;
}
$. is the line number in perl, and so we print that and the match if there's a match. <> says read and iterate either filenames or STDIN much like grep does. So you could invoke this in a similar way to grep:
perl myscript.pl <filename>
Or:
cat <filename> | ./myscript.pl
This would one-liner-ify as:
perl -lne 'my ( $first_isbn ) = m/(\d{9,13}X{0,1})/g; print $., ":", $first_isbn if $first_isbn;' <filename>
One trivial solution is to include the beginning of the line in your regex:
grep -Po "^[0-9]{9,13}X{0,1}" file
This ensures that matches after the first do not satisfy the regex. It does seem from your BNF that the ISBNs, if present, are guaranteed to be the first characters of the line.
Another way is to use sed:
sed -n "s/\([0-9]\{9,13\}X\{0,1\}\).*/\1/p" file
This matches your pattern along with the rest of the line, but only prints your pattern. You could then use another utility to add line numbers. E.g. pipe your output to nl -nrz -w9.
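If the original line numbers matter and the ISBN is not guaranteed to start the line, one possible sketch is to let grep -on label every match with its line number and then keep only the first match per line with awk. This assumes a grep that supports -o (GNU and BSD grep do); the sample data and variable names are made up for illustration:

```shell
sample=$(printf '%s\n' \
  '123456789X/abscd/1234567890123/djfkldsfjj' \
  'abc/1234567890123/123456789X' \
  'no isbn on this line')

# grep -on prints every match as "lineno:match";
# awk then keeps only the first record seen for each line number.
first_isbns=$(printf '%s\n' "$sample" |
  grep -onE '[0-9]{9,13}X?' |
  awk -F: '!seen[$1]++')
printf '%s\n' "$first_isbns"
```

The numbers printed are those of the input lines, whereas nl numbers the lines it receives and so drifts if some input lines contain no ISBN.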

Filter a specific letter within a word using Grep

I've been trying to find a way to filter a specific letter within a word using a regular expression. For example, filtering the letter "a" in the word "latin". Filtering a standalone letter would be simple using something like:
grep "\ba\b"
but I can't find a way to get the "a" only in a certain word.
Thanks for your help!
You can pipe to another grep, like this:
grep "\ba\b" /path/to/input/file | grep -o "a"
The latter part of the pipe uses the -o flag, which outputs only the matched part. Alternatively, grep -o "a" on its own returns every a in the input.
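If the goal is to pick out the a's that occur inside the word latin specifically, a sketch along the same lines is to extract the word first and then the letter. This is one reading of the question, not something the answer states; it assumes a grep with -o and -w (GNU and BSD grep have both), and the sample sentence is invented:

```shell
# -w matches "latin" only as a whole word; -o prints just the matched text.
# The second grep then prints one line per "a" found inside those words.
letters=$(printf '%s\n' 'a latin text has a latin alphabet' |
  grep -ow 'latin' | grep -o 'a')
printf '%s\n' "$letters"
```

Only the two a's inside the two occurrences of latin are printed. The standalone a's and those in has and alphabet are ignored.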

Confusion in Linux grep command

I have a very basic confusion about grep. Suppose I have a following file to grep in:
test.txt:
This is an article
from some newspaper
Article is good
newspaper is not.
Now if I grep with following expression
grep -P "is\s*g" test.txt
I get the line:
Article is good
However if I do this:
grep -P "is*g" test.txt
I don't get anything. My question is: since the asterisk (*) is a wildcard which represents 0 or more repetitions of the previous character, shouldn't the output of grep be the same? Why do zero or more repetitions of 's' not give any output?
What am I missing here? Thanks for the help!
Because there's nothing in your input that matches i, then 0 or more repetitions of s, then g. "Article is good" can't match because it has a space after the s, not a g. The pattern is\s*g matches because \s is a special pattern that matches any sort of whitespace — so the overall pattern is is, then any amount of space, then g, which naturally matches "is g".
I see no ig, isg, issg, issssg in your input...
Since I don't know what you wanted to match, here is my best guess:
grep -P "is.*g" test.txt
You should read up on regular expressions before using grep. You will also find them useful with other commands: http://www.regular-expressions.info/
It's 0 or more repetition of the previous regex atom, and that atom is \s. So \s* can match tab-space-tab-space-space.
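The behaviour described in these answers can be checked with a short sketch. It uses the POSIX class [[:space:]] in place of \s so that it does not depend on grep -P; the variable names are hypothetical:

```shell
text='This is an article
from some newspaper
Article is good
newspaper is not.'

# is*g: "i", zero or more "s", then "g" immediately. Nothing in the file fits.
star_on_s=$(printf '%s\n' "$text" | grep 'is*g' || true)

# is[[:space:]]*g: "is", zero or more whitespace characters, then "g".
star_on_space=$(printf '%s\n' "$text" | grep 'is[[:space:]]*g' || true)

# Strings that is*g does accept: the "s" may repeat, but no space is allowed.
accepted=$(printf '%s\n' ig isg issg 'is g' | grep 'is*g')

printf 'is*g on file: [%s]\n' "$star_on_s"
printf 'is[[:space:]]*g on file: [%s]\n' "$star_on_space"
printf '%s\n' "$accepted"
```

The first search comes back empty, the second finds "Article is good", and the last one accepts ig, isg and issg but not "is g".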
