Confusion in Linux grep command


I am confused about something very basic in grep. Suppose I have the following file to grep in:
test.txt:
This is an article
from some newspaper
Article is good
newspaper is not.
Now if I grep with the following expression:
grep -P "is\s*g" test.txt
I get the line:
Article is good
However if I do this:
grep -P "is*g" test.txt
I don't get anything. Since the asterisk (*) is a wildcard that represents 0 or more repetitions of the previous character, shouldn't the output of grep be the same? Why don't zero or more repetitions of 's' give any output?
What am I missing here? Thanks for the help!

Because there's nothing in your input that matches i, then 0 or more repetitions of s, then g. "Article is good" can't match because it has a space after the s, not a g. The pattern is\s*g matches because \s is a special escape that matches any whitespace character, so the overall pattern is the letters "is", then any amount of whitespace, then g, which naturally matches "is g".
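A minimal reproduction of the question, assuming GNU grep built with PCRE support (required for -P). The file is recreated in a temporary directory so the example is self-contained:

```shell
# Recreate the question's test.txt in a throwaway directory.
dir=$(mktemp -d)
cat > "$dir/test.txt" <<'EOF'
This is an article
from some newspaper
Article is good
newspaper is not.
EOF

# is*g: "i", then zero or more "s", then "g". No line contains ig, isg, issg, ...
star_s=$(grep -cP 'is*g' "$dir/test.txt")

# is\s*g: "is", then zero or more whitespace characters, then "g".
star_space=$(grep -P 'is\s*g' "$dir/test.txt")

printf 'lines matching is*g: %s\n' "$star_s"
printf 'line matching the whitespace pattern: %s\n' "$star_space"

rm -r "$dir"
```

The first count is 0 and the second search prints only "Article is good", exactly as described in the question.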

I see no ig, isg, issg, issssg in your input...
Since I don't know what you wanted to match, here is my best guess:
grep -P "is.*g" test.txt
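The guess is.*g is much looser than is\s*g, because .* lets any characters, including other words, sit between "is" and "g". A sketch on made-up lines (GNU grep with -P support assumed):

```shell
# Hypothetical sample lines, not taken from the question.
sample='Article is good
This is a long story
newspaper is not.'

# .* allows anything between "is" and "g", e.g. the "g" in "long".
dot_star=$(printf '%s\n' "$sample" | grep -P 'is.*g')

# \s* only allows whitespace between "is" and "g".
space_only=$(printf '%s\n' "$sample" | grep -P 'is\s*g')

printf 'is.*g matches:\n%s\n\n' "$dot_star"
printf 'whitespace-only pattern matches:\n%s\n' "$space_only"
```

Here is.*g also picks up "This is a long story", which the whitespace-only pattern rejects.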

You should learn regular expressions before using grep; you will also find them useful with other commands: http://www.regular-expressions.info/

It's 0 or more repetitions of the previous regex atom, and in is\s*g that atom is \s, not the letter s. So \s* can match any run of whitespace, such as tab-space-tab-space-space.
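The scope of * can be checked on a few made-up strings. This sketch uses grep -E with the POSIX class [[:space:]] in place of \s, so it does not depend on PCRE support:

```shell
# Made-up inputs; the last one has tab, space, tab, space, space between "is" and "g".
inputs=$(printf 'ig\nissssg\nis g\nis\t \t  g\n')

# In is*g the * applies only to the letter s.
letter_star=$(printf '%s\n' "$inputs" | grep -E 'is*g')

# In is[[:space:]]*g the * applies to the whole whitespace class.
space_star=$(printf '%s\n' "$inputs" | grep -E 'is[[:space:]]*g')

printf 'is*g matched:\n%s\n' "$letter_star"
printf 'is[[:space:]]*g matched:\n%s\n' "$space_star"
```

The first pattern only accepts "ig" and "issssg"; the second only accepts the two lines with whitespace between "is" and "g".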

Related

grep for path in process(ps) containing number

I would like to grep for a process path that contains a variable part. Example:
This is one of the running processes:
/var/www/vhosts/rcsdfg/psd_folr/rcerr-m-deve-udf-172/bin/magt queue:consumers:start customer.import_proditns --single-thread --max-messages=1000
I would like to grep for "psd_folr/rcerr-m-deve-udf-172/bin/magt queue" from the running processes.
The catch is that the number 172 keeps changing, but it will always be a 3-digit number. Please suggest a fix; I tried the following, but it does not return any output:
sudo ps axu | grep "psd_folr/rcerr-m-deve-udf-'^[0-9]$'/bin/magt queue"

The most relevant section of your regular expression is -'^[0-9]$'/ which has the following problems:
the apostrophes have no special meaning to grep; they only match literal apostrophes
the caret ^ matches the beginning of a line, but there is no beginning of a line at this position in ps's output (in plain grep, a ^ in the middle of a pattern is even treated as a literal character; either way it cannot match here)
the dollar $ matches the end of a line, but there is no end of a line at this position in ps's output (the same caveat applies to a $ in the middle of a basic regular expression)
you want to match 3 digits, but [0-9] only matches a single one
Thus, that part of your expression should be changed to -[0-9]+/ to match any number of digits (+ matches the preceding item one or more times), or to -[0-9]{3}/ to match exactly three digits ({n} matches the preceding item exactly n times).
When you change your command, give grep the -E flag so it uses extended regular expressions; otherwise you need to escape the plus or the braces:
sudo ps axu | grep -E "psd_folr/rcerr-m-deve-udf-[0-9]+/bin/magt queue"
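A quick check of both suggestions, plus the escaped-brace form for plain grep. The process line from the question stands in for live ps output, and the second path with a two-digit number is a hypothetical line added for contrast:

```shell
# The process line from the question stands in for `ps axu` output.
line='/var/www/vhosts/rcsdfg/psd_folr/rcerr-m-deve-udf-172/bin/magt queue:consumers:start customer.import_proditns --single-thread --max-messages=1000'
# Hypothetical extra line with a two-digit number, for contrast.
other='/var/www/vhosts/rcsdfg/psd_folr/rcerr-m-deve-udf-17/bin/magt queue:consumers:start'

# One or more digits (ERE +): both lines match.
plus=$(printf '%s\n' "$line" "$other" | grep -cE 'psd_folr/rcerr-m-deve-udf-[0-9]+/bin/magt queue')

# Exactly three digits (ERE interval): only the 172 line matches.
three=$(printf '%s\n' "$line" "$other" | grep -cE 'psd_folr/rcerr-m-deve-udf-[0-9]{3}/bin/magt queue')

# Same interval in basic regex syntax: without -E the braces must be escaped.
three_bre=$(printf '%s\n' "$line" "$other" | grep -c 'psd_folr/rcerr-m-deve-udf-[0-9]\{3\}/bin/magt queue')

printf 'plus=%s three=%s three_bre=%s\n' "$plus" "$three" "$three_bre"
```

If the number is guaranteed to have three digits, {3} is the stricter choice; + is more forgiving if that ever changes.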

Grep: "{n} The preceding item is matched exactly n times" is not clear to me in the case of "Hair", "Haair" and "Haaair"

Suppose there are three strings, "Hair", "Haair" and "Haaair". When I use grep -E '^Ha{1}', it returns all three words, whereas I was expecting only "Hair", since I asked for lines that start with H followed by the letter 'a' exactly once.

grep does not check that its input matches the given search expression as a whole. It finds substrings of each input line that match the search.
See:
grep test <<< 'This is a test.'
The input does not exactly match test. Only part of it, the word "test" at the end of "This is a test.", matches, but that is enough for grep to output the whole line.
Similarly, when you say
grep -E '^Ha{1}' <<< Haaair
the input does not exactly match the search, but a part of it, the leading "Ha" of "Haaair", does, and that is enough. Note that {n,m} syntax is purely a convenience: Ha{1} is exactly equivalent to Ha, Ha{3,} is Haaa+, Ha{2,5} is Haa(a?){3} is Haaa?a?a?, etc. In other words, {1} does not mean "exactly once and never more"; it just means "once", and whatever follows is not checked.
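The equivalences above can be checked mechanically. This sketch (GNU grep assumed; the test strings are made up) uses -x so that each pattern has to match a whole line, because otherwise a substring match would be enough:

```shell
# Made-up input: "H" followed by zero to seven a's, one string per line.
data=$(printf '%s\n' H Ha Haa Haaa Haaaa Haaaaa Haaaaaa Haaaaaaa)

# -x forces whole-line matches, so the repetition counts really are checked.
interval=$(printf '%s\n' "$data" | grep -xE 'Ha{2,5}')
grouped=$(printf '%s\n' "$data" | grep -xE 'Haa(a?){3}')
optional=$(printf '%s\n' "$data" | grep -xE 'Haaa?a?a?')

# Without -x, any line containing "Haa" is enough for Ha{2,5}.
substring_count=$(printf '%s\n' "$data" | grep -cE 'Ha{2,5}')

if [ "$interval" = "$grouped" ] && [ "$grouped" = "$optional" ]; then
  echo "all three forms select the same lines:"
fi
printf '%s\n' "$interval"
printf 'without -x, Ha{2,5} matches %s lines\n' "$substring_count"
```

All three forms select Haa through Haaaaa; without -x the interval form also accepts the longer strings, which is the same effect seen with ^Ha{1}.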
What you want to do is match a Ha that is not followed by another a. You have two options:
If your grep supports PCRE, you can use a negative lookahead:
grep -P '^Ha(?!a)'
(?!a) is a zero-length assertion, like ^. It doesn't match any characters; it simply causes the match to fail if there is an a after the first one.
Or, you can keep it simple and use a negative []:
grep -E '^Ha([^a]|$)'
Where [^a] matches any single character that is not a, and the alternation with $ handles the case of no character at all.
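Both approaches can be compared on the three words from the question. A sketch assuming GNU grep; the -P line needs a build with PCRE support:

```shell
words='Hair
Haair
Haaair'

# {1} only asks for one "a" after the H; further a's are simply not examined.
interval=$(printf '%s\n' "$words" | grep -cE '^Ha{1}')

# Bracket-expression approach: the next character is not "a", or the line ends.
ere=$(printf '%s\n' "$words" | grep -E '^Ha([^a]|$)')

# Lookahead approach: fail if another "a" follows (needs grep -P support).
pcre=$(printf '%s\n' "$words" | grep -P '^Ha(?!a)')

printf 'lines matching ^Ha{1}: %s\n' "$interval"
printf 'ERE result: %s\nPCRE result: %s\n' "$ere" "$pcre"
```

The interval pattern accepts all three lines, while both corrected patterns return only "Hair".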

Getting only grep exact matches

I am trying to grep a file for the exact occurrence of a match, but I also get longer, spurious matches:
grep CAT1717O99 myfile.txt -F -w
Output:
CAT1717O99
CAT1717O99.5
I would like to output only the first exactly matching line. Is there any way to get rid of the second line?
Thanks in advance.
Arturo
This is the file 'myfile.txt':
CAT1717O99
CAT1717O99.5

This will do the work for you.
grep -Fx "CAT1717O99" myfile.txt
-F means the pattern is a fixed string, not a regular expression
-x means the whole line must match exactly
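A sketch comparing -w and -x on the question's myfile.txt (recreated in a temporary directory), assuming GNU grep:

```shell
dir=$(mktemp -d)
# The question's myfile.txt.
printf 'CAT1717O99\nCAT1717O99.5\n' > "$dir/myfile.txt"

# -w: "." is not a word character, so CAT1717O99.5 still contains the whole word.
word=$(grep -Fw 'CAT1717O99' "$dir/myfile.txt")

# -x: the entire line must equal the fixed string.
line=$(grep -Fx 'CAT1717O99' "$dir/myfile.txt")

printf 'with -Fw:\n%s\nwith -Fx:\n%s\n' "$word" "$line"
rm -r "$dir"
```

Only -x drops CAT1717O99.5; it also means a line with anything else on it (for example "id CAT1717O99") would no longer match.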

Use the power of Perl-compatible regular expression (PCRE) and search the matches to the given pattern:
grep -Po "\bCAT1717O99(\s|$)" myfile.txt
(\s|$): an alternation group; it only accepts CAT1717O99 when it is followed by whitespace or placed at the end of the line
-P option: interprets the pattern as a Perl-compatible regular expression
-o option: prints only the matched parts of matching lines
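A sketch of this pattern on the question's two lines plus a hypothetical third line where the ID sits mid-line (GNU grep with PCRE support assumed). One side effect worth knowing: when \s matches, that whitespace character becomes part of what -o prints:

```shell
# The question's two lines plus a hypothetical third one.
data='CAT1717O99
CAT1717O99.5
id CAT1717O99 ok'

# Word boundary before, whitespace or end of line after: CAT1717O99.5 is rejected.
lines=$(printf '%s\n' "$data" | grep -P '\bCAT1717O99(\s|$)')

# With -o only the matched text is printed, including a trailing space matched by \s.
parts=$(printf '%s\n' "$data" | grep -Po '\bCAT1717O99(\s|$)')

printf 'matching lines:\n%s\n' "$lines"
printf 'matched parts (brackets show the extent):\n'
printf '%s\n' "$parts" | sed 's/.*/[&]/'
```

The bracketed output shows "[CAT1717O99]" and "[CAT1717O99 ]"; drop -o if you want whole lines instead.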

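You'll need to explicitly require a space (or the start or end of the line) on each side, so that punctuation such as the . in CAT1717O99.5 no longer counts as the end of the word:
grep -E '(^| )CAT1717O99( |$)' myfile.txt
From the grep manual: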
-w, --word-regexp
Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.
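The manual's definition explains why -w is too permissive here: the dot in CAT1717O99.5 is not a word-constituent character. This sketch (the question's lines plus two hypothetical ones, GNU grep assumed) also shows the trade-off, since the space-delimited pattern rejects an ID wrapped in parentheses too:

```shell
# The question's lines plus two hypothetical ones.
data='CAT1717O99
CAT1717O99.5
id CAT1717O99 ok
(CAT1717O99)'

# -w accepts any non-word neighbour (".", "(", ")", space) or the line edge.
w_count=$(printf '%s\n' "$data" | grep -cw 'CAT1717O99')

# Requiring a space or the line edge on both sides rejects ".5" and the parentheses.
spaced=$(printf '%s\n' "$data" | grep -E '(^| )CAT1717O99( |$)')

printf 'grep -w matched %s lines\n' "$w_count"
printf 'space-delimited pattern matched:\n%s\n' "$spaced"
```

grep -w matches all four lines, while the space-delimited pattern keeps only the first and third.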

Grep: First word in line that begins with ? and ends with?

I'm trying to write a grep command that finds all lines in a file whose first word begins with "as" and also ends with "ng".
How would I go about doing this using grep?

This should just about do it:
$ grep '^as\w*ng\b' file
Regexplanation:
^ # Matches start of the line
as # Matches literal string as
\w # Matches characters in word class
* # Quantifies \w to match either zero or more
ng # Matches literal string ng
\b # Matches word boundary
May have missed the odd corner case.
If you only want to print the words that match and not the whole lines then use the -o option:
$ grep -o '^as\w*ng\b' file
Read man grep for all information on the available options.
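To see the pattern in action, here is a sketch on a few made-up lines. It assumes GNU grep, since \w and \b are GNU extensions in basic regular expressions:

```shell
# Hypothetical sample lines.
data='asking for help
astonishing news
as long as
asingle word
Asking loudly'

# Whole lines whose first word starts with "as" and ends with "ng".
lines=$(printf '%s\n' "$data" | grep '^as\w*ng\b')

# -o prints only the matched first word.
words=$(printf '%s\n' "$data" | grep -o '^as\w*ng\b')

printf 'lines:\n%s\n\nwords:\n%s\n' "$lines" "$words"
```

"as long as" is rejected because \w* cannot cross the space to reach "long", "asingle" has its "ng" mid-word, and "Asking" fails only because the search is case-sensitive.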

I am pretty sure this should work:
grep "^as[a-zA-Z]*ng\b" <filename>
Hard to say without seeing samples from the actual input file.

sudo has already covered it well, but I wanted to throw out one more simple one:
grep -i '^as[^ ]*ng\b' <file>
-i to make grep case-insensitive
[^ ]* matches zero or more of any character, except a space
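The two character classes differ on punctuation. In this sketch (made-up lines, GNU grep assumed for \w and \b), [^ ]* lets a hyphenated first word such as as-long through, while \w* stops at the hyphen:

```shell
# Hypothetical sample lines.
data='Asking loudly
as-long as
asking for help'

# [^ ]* accepts any non-space character, including the hyphen.
bracket=$(printf '%s\n' "$data" | grep -i '^as[^ ]*ng\b')

# \w* only accepts letters, digits and underscore, so "as-long" is not one word here.
word_class=$(printf '%s\n' "$data" | grep -i '^as\w*ng\b')

printf '[^ ]* version:\n%s\n\n' "$bracket"
printf '\\w* version:\n%s\n' "$word_class"
```

Which behaviour is right depends on whether hyphenated or punctuated words should count as a single "first word" in your input.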

^ anchors the match to the start of the line, so you can search for lines beginning with "as" with:
grep '^as' [file]
\w matches a word character, so \w* would match any number of word characters:
grep '^as\w*' [file]
\b means 'a boundary between a word character and a non-word character (or the start or end of the line)', which you can use to ensure that you're matching the 'ng' letters at the end of the word instead of somewhere in the middle:
grep '^as\w*ng\b' [file]
If you omit the [file], simply pipe the text into it:
cat [file] | grep '^as\w*ng\b'
or
echo [some text here] | grep '^as\w*ng\b'
Is that what you're looking for?

Pattern matching using grep

Assuming we have one input string like
Nice
And we have the pattern
D*A*C*N*a*g*.h*ca*e
then "Nice" will match the pattern (* means 0 or more occurrences of the preceding character, . means any single character).
I think grep may be a better fit than Java in this case. How can I do it in grep?

Use the same regular expression:
grep 'D*A*C*N*a*g*.h*ca*e' <<EOF
Nice
EOF
If the input is "Nicely", it still prints it! How does that work?
The current regex looks for the pattern anywhere on the line. If it must match exactly (the whole line), then add anchors to start (^) and end ($) of line:
grep '^D*A*C*N*a*g*.h*ca*e$' <<EOF
Nice
Nicely
Darce
Darcy
Darcey
EOF
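grep's -x option also forces a whole-line match, so it should be interchangeable with the explicit ^ and $ anchors. A sketch comparing the anchored, -x and unanchored variants on the sample words above (GNU grep assumed):

```shell
# The sample words from the answer above.
data='Nice
Nicely
Darce
Darcy
Darcey'

# Explicit anchors: the whole line must match.
anchored=$(printf '%s\n' "$data" | grep '^D*A*C*N*a*g*.h*ca*e$')

# -x asks grep to match whole lines, which has the same effect as ^...$.
whole=$(printf '%s\n' "$data" | grep -x 'D*A*C*N*a*g*.h*ca*e')

# Without anchors, any substring match is enough, so "Nicely" is printed too.
unanchored=$(printf '%s\n' "$data" | grep 'D*A*C*N*a*g*.h*ca*e')

printf 'anchored:\n%s\n\n-x:\n%s\n\nunanchored:\n%s\n' "$anchored" "$whole" "$unanchored"
```

Because every D*, A*, C*, N*, a* and g* can match nothing, the unanchored pattern effectively only needs a substring like "ice" or "rce", which is why "Nicely" and "Darcey" slip through.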

Develop Reference
