Grep regular expression - Pattern issue


I'm trying to use grep to find words that match a given pattern. For instance, I have these lines:
A secret word: CoolKapplan
A secret word: Kapplan
A secret word: Bungyjump
Suppose I know the first and last letter of a word. In this example, they are 'K' and 'n':
PATTERN='K.....n'
I do this: grep -w -r -H --color=always "^$PATTERN" *
I expect it to give me only the lines containing a word that starts with K. But that command also includes the first line, so the result is:
A secret word: CoolKapplan
A secret word: Kapplan
How do I make it match the pattern only as a whole word, and not when it is part of another word?

After some more trial and error, I found out that adding the '-o' flag makes it work.
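Below is a minimal sketch of the -o approach. It assumes GNU grep and feeds the three sample lines on stdin instead of searching files with -r. The helper name find_secret_word is hypothetical. The sketch drops the ^ from the question's command, because ^ anchors the match to the start of the line, and these lines start with "A secret word:". The -w flag rejects the Kapplan inside CoolKapplan because a letter precedes it. The -o flag then prints only the matched word.

```shell
#!/bin/sh
# find_secret_word is a hypothetical helper name, not part of grep.
# Reads lines on stdin and prints only the whole words that match $1.
find_secret_word() {
    # -w: the match must not be preceded or followed by a letter, digit or underscore
    # -o: print only the matched text, not the whole line
    grep -ow -- "$1"
}

PATTERN='K.....n'   # first letter K, last letter n, five characters in between

printf '%s\n' 'A secret word: CoolKapplan' \
              'A secret word: Kapplan' \
              'A secret word: Bungyjump' | find_secret_word "$PATTERN"
# prints: Kapplan
```

Note that . matches any character, including a space. A stricter variant would spell out letters, for example grep -owE 'K[[:alpha:]]{5}n'.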

Related

avoid following dash in grep results

I am trying to select the line that has the name "paul" in it.
!grep -w '^paul' some_file
This also returns the lines starting with paul-henri. How do I select the single line that starts with the word 'paul' only?
(In other words, lines where paul is followed by a dash -, a slash / or a dot . are also selected.)
Update:
Thanks to Tim, this worked:
grep -w '^paul' some_file | grep -vE 'paul[-./?]'

You could match on the pattern ^paul[^-]:
!grep -w '^paul[^-]' some_file
This would match any line starting with paul that is followed by a character other than a dash. If you also need to match lines that contain only paul, you might need to use a negative lookahead:
^paul(?!-)
But lookaheads require Perl-compatible regular expressions (grep -P), and your version of grep might not support them.
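If grep -P is not available, an ERE alternation can do the same job. This is a swapped-in alternative to the lookahead, not the answer's exact method. In this sketch, paul must be followed either by a character that is neither a dash nor a word character, or by the end of the line. Excluding word characters takes over the job of -w, so paulette is rejected. The helper name starts_with_paul and the sample lines are hypothetical.

```shell
#!/bin/sh
# starts_with_paul is a hypothetical helper name.
# Prints lines that start with the word "paul" when it is not directly followed by a dash.
starts_with_paul() {
    # [^-[:alnum:]_] : one character that is not a dash, letter, digit or underscore
    # $              : or "paul" is the whole line
    grep -E '^paul([^-[:alnum:]_]|$)'
}

printf '%s\n' 'paul' 'paul smith' 'paul-henri' 'paulette' 'jean-paul' | starts_with_paul
# prints:
# paul
# paul smith
```

Adding . and / inside the brackets, as in [^-./[:alnum:]_], would also reject paul.x and paul/x, which is what the update in the question asks for.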

Grep {n} The preceding item is matched exactly n times, is not clear to me in case of "Hair", "Haair" and "Haaair"

Suppose there are three strings, "Hair", "Haair" and "Haaair". When I use grep -E '^Ha{1}', it returns all three words. I was expecting only "Hair", since I asked for lines that start with H followed by the letter 'a' exactly once.

grep does not check that its whole input matches the given search expression. Instead, grep finds substrings of the input that match the search.
See:
grep test <<< 'This is a test.'
The input does not exactly match test. Only part of the input matches (the word test in "This is a test."), but that is enough for grep to output the whole line.
Similarly, when you say
grep -E '^Ha{1}' <<< Haaair
The input does not exactly match the search, but a part of it (the leading Ha in Haaair) does, and that is enough. Note that {n,m} syntax is purely a convenience: Ha{1} is exactly equivalent to Ha, Ha{3,} is Haaa+, Ha{2,5} is Haa(a?){3}, which is Haaa?a?a?, etc. In other words, {1} does not mean "exactly once"; it just means "once".
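One way to see which substring grep actually matched is the -o option, which prints only the matched part. This small sketch assumes a grep that supports -o (GNU and BSD grep both do). The helper name show_match is hypothetical.

```shell
#!/bin/sh
# show_match is a hypothetical helper: print only the part of stdin that matches the ERE $1.
show_match() {
    grep -oE -- "$1"
}

printf '%s\n' 'This is a test.' | show_match 'test'   # prints: test
printf '%s\n' 'Haaair' | show_match '^Ha{1}'          # prints: Ha
printf '%s\n' 'Haaair' | show_match '^Ha{3}'          # prints: Haaa
```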
What you want to do is match a Ha that is not followed by another a. You have two options:
If your grep supports PCRE, you can use a negative lookahead:
grep -P '^Ha(?!a)'
(?!a) is a zero-length assertion, like ^. It doesn't match any characters; it simply causes the match to fail if there is an a after the first one.
Or, you can keep it simple and use a negative []:
grep -E '^Ha([^a]|$)'
Here [^a] matches any single character that is not a, and the alternation with $ handles the case where nothing follows the a.
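This sketch runs the [^a] option against the question's three words. It adds a bare Ha, which is a hypothetical extra input included to exercise the $ branch. The helper name ha_once is hypothetical too.

```shell
#!/bin/sh
# ha_once is a hypothetical helper: lines starting with "H" followed by exactly one "a".
ha_once() {
    # ([^a]|$): the single "a" must be followed by a non-"a" character or the end of the line
    grep -E '^Ha([^a]|$)'
}

printf '%s\n' Hair Haair Haaair Ha | ha_once
# prints:
# Hair
# Ha
```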

Getting only grep exact matches

I am trying to grep a file for the exact occurrence of a match, but I also get longer, spurious matches:
grep CAT1717O99 myfile.txt -F -w
Output:
CAT1717O99
CAT1717O99.5
I would like to output only the first exactly matching line. Is there any way to get rid of the second line?
Thanks in advance.
Arturo
This is the file 'myfile.txt':
CAT1717O99
CAT1717O99.5

This will do the work for you:
grep -Fx "CAT1717O99" myfile.txt
-F means the pattern is a fixed string, not a regular expression
-x means the whole line must match exactly
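Here is a minimal sketch of the -Fx answer, using printf in place of myfile.txt. The helper name exact_line is hypothetical.

```shell
#!/bin/sh
# exact_line is a hypothetical helper: print lines that are exactly the string $1.
exact_line() {
    # -F: treat $1 as a literal string (the "." in other lines has no special meaning)
    # -x: the match must span the entire line
    grep -Fx -- "$1"
}

printf '%s\n' 'CAT1717O99' 'CAT1717O99.5' | exact_line 'CAT1717O99'
# prints: CAT1717O99
```

Because -x compares the whole line, a line with trailing spaces or a Windows CR line ending would not match.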

Use a Perl-compatible regular expression (PCRE) to search for matches of the given pattern:
grep -Po "\bCAT1717O99(\s|$)" myfile.txt
(\s|$): an alternation group that accepts CAT1717O99 only if it is followed by whitespace or sits at the end of the line
-P option: interprets the pattern as a Perl-compatible regular expression
-o option: prints only the matched parts of matching lines

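You'll need to explicitly require spaces (or the start or end of the line) around the string, so that special characters such as the dot are not treated as word boundaries.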
grep -E '(^| )CAT1717O99( |$)' myFile.txt
From the grep manual:
-w, --word-regexp
Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.
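The following comparison sketch shows why -w alone is not enough here. It uses printf in place of myfile.txt, and the helper name sample is hypothetical.

```shell
#!/bin/sh
# sample is a hypothetical helper that emits the question's file contents.
sample() { printf '%s\n' 'CAT1717O99' 'CAT1717O99.5'; }

# -w accepts CAT1717O99.5: "." is not a word-constituent character, so the word ends there.
sample | grep -Fw 'CAT1717O99'
# prints both lines

# Requiring a space or a line boundary on each side rejects the ".5" suffix.
sample | grep -E '(^| )CAT1717O99( |$)'
# prints: CAT1717O99
```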

Pattern matching using grep

Assuming we have one input string like
Nice
And we have the pattern
D*A*C*N*a*g*.h*ca*e
then "Nice" will match the pattern (* means zero or more occurrences of the preceding character, and . means any single character).
I think grep may be a better fit than Java in this case. How can I do it with grep?

Use the same regular expression:
grep 'D*A*C*N*a*g*.h*ca*e' <<EOF
Nice
EOF
If the input is "Nicely", it still prints it! How does that work?
The current regex looks for the pattern anywhere on the line. If it must match exactly (the whole line), add anchors for the start (^) and end ($) of the line:
grep '^D*A*C*N*a*g*.h*ca*e$' <<EOF
Nice
Nicely
Darce
Darcy
Darcey
EOF
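This sketch contrasts the unanchored and anchored patterns on the same five inputs. The helper name words is hypothetical.

```shell
#!/bin/sh
# words is a hypothetical helper that emits the sample inputs from the answer.
words() { printf '%s\n' Nice Nicely Darce Darcy Darcey; }

# Unanchored: a match anywhere in the line is enough.
words | grep 'D*A*C*N*a*g*.h*ca*e'
# prints Nice, Nicely, Darce and Darcey (one per line)

# Anchored with ^ and $ (equivalently, grep -x): the whole line must match.
words | grep '^D*A*C*N*a*g*.h*ca*e$'
# prints Nice and Darce (one per line)
```

Nicely and Darcey pass the unanchored search only because they contain a matching substring (Nice and Darce).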

Confusion in Linux grep command

I have a very basic question about grep. Suppose I have the following file to grep in:
test.txt:
This is an article
from some newspaper
Article is good
newspaper is not.
Now if I grep with the following expression:
grep -P "is\s*g" test.txt
I get the line:
Article is good
However if I do this:
grep -P "is*g" test.txt
I don't get anything. Since the asterisk (*) represents zero or more repetitions of the previous character, shouldn't the output of grep be the same? Why do zero or more repetitions of 's' not give any output?
What am I missing here? Thanks for the help!

Because there's nothing in your input that matches i, then 0 or more repetitions of s, then g. "Article is good" can't match because it has a space after the s, not a g. The pattern is\s*g matches because \s is a special pattern that matches any sort of whitespace — so the overall pattern is is, then any amount of space, then g, which naturally matches "is g".

I see no ig, isg, issg, issssg in your input...
Since I don't know what you wanted to match, here is my best guess:
grep -P "is.*g" test.txt

You should learn regular expressions before you use grep; you will also find them useful with other commands: http://www.regular-expressions.info/

It's zero or more repetitions of the previous regex atom, and in is\s*g that atom is \s. So \s* can match, for example, tab-space-tab-space-space.
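This sketch runs the three patterns discussed above against the question's test.txt contents, using ERE instead of -P. It uses [[:space:]], the POSIX class for whitespace, in place of \s, so the sketch does not depend on -P. The helper name article is hypothetical.

```shell
#!/bin/sh
# article is a hypothetical helper that emits the question's test.txt contents.
article() {
    printf '%s\n' 'This is an article' 'from some newspaper' 'Article is good' 'newspaper is not.'
}

# "i", zero or more "s", then "g" immediately: the input has no ig, isg, issg, ...
article | grep -E 'is*g'              # prints nothing (exit status 1)

# [[:space:]]* is any run of whitespace, like \s* in the -P version.
article | grep -E 'is[[:space:]]*g'   # prints: Article is good

# .* allows any characters between "is" and "g".
article | grep -E 'is.*g'             # prints: Article is good
```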

Develop Reference
