Does grep on AIX support the ? - grep

I am trying to use the following grep command
grep "Fred\(eric\)\? Smith" names.txt
where names.txt contains Fred Smith and Frederic Smith. However, grep only matches Frederic Smith. So, is ? supported in grep, where ? means the preceding item occurs 0 or 1 times?

AIX uses UNIX grep, so it supports basic REs. But I suggest you use extended REs with the -E option. Then you don't have to escape those special characters.
grep -E "Fred(eric)? Smith" names.txt
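If you need to stay within POSIX basic regular expressions, the interval \{0,1\} means the same thing as ?. A minimal sketch, recreating names.txt from the question in a scratch directory (the temporary path and variable names are only for illustration):

```shell
# Recreate the sample file from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf 'Fred Smith\nFrederic Smith\n' > "$tmpdir/names.txt"

# Extended RE: ? and ( ) need no backslashes.
ere_out=$(grep -E 'Fred(eric)? Smith' "$tmpdir/names.txt")

# Basic RE, POSIX only: \{0,1\} is the portable spelling of "zero or one".
bre_out=$(grep 'Fred\(eric\)\{0,1\} Smith' "$tmpdir/names.txt")

printf '%s\n' "$ere_out"
printf '%s\n' "$bre_out"
rm -rf "$tmpdir"
```

Both commands should print both names, since neither form relies on a vendor extension.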

Related

grep "?" does not match valid matches

I want to match tags in files (with optional brackets) ... easy one would think ... the regex is something like ^\[?MyTag\]?. But ... Grep doesn't like it. None of the lines that would be valid matches are actually matched.
The interesting part is: if I replace the ? with a * (so zero to infinite matches, not zero or one) it matches everything like it should, but really that would mean the feature is broken and I don't believe that.
Any input?
Using grep (GNU grep) 2.22 on Windows.
PS: so grep is like this ...
grep -e "^\[?MyTag\]?" file.txt
and my test file is like this
[MyTag] hello
NotMyTag ugly
[NotMyTag] dumb
MyTag world
which obviously should result in 1st and 4th line showing but shows nothing.
First off, an unescaped ? is not a quantifier in vanilla grep (which uses basic regular expressions by default), so you need the -E flag to enable extended regex. You can easily verify this by running grep '?' <<< 'a' and grep -E '?' <<< 'a'. Only the latter will match. -e just explicitly marks what your regex is. It is not the same as -E.
Your initial command works fine if you change the -e to upper case:
grep -E '^\[?MyTag\]?'
Example:
$ grep -E '^\[?MyTag\]?' <<< '[MyTag] hello
> NotMyTag ugly
> [NotMyTag] dumb
> MyTag world'
Output:
[MyTag] hello
MyTag world
Credit goes to the answers of this question on SuperUser.
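To make the -e / -E distinction concrete: -e only introduces a pattern (and may be repeated to give several), while -E changes how the pattern is interpreted. A small sketch with made-up input, using printf instead of a here-string so it also runs in plain sh:

```shell
# Made-up sample lines for this demonstration.
input=$(printf 'MyTag world\n[MyTag] hello\nOther line\n')

# -e marks the next argument as a pattern; repeating it means "match any of these".
# The patterns are still basic REs, so no ? is used here.
# (] needs no escape outside a bracket expression.)
with_e=$(printf '%s\n' "$input" | grep -e '^MyTag' -e '^\[MyTag]')

# -E switches to extended REs, where an unescaped ? means "zero or one".
with_E=$(printf '%s\n' "$input" | grep -E '^\[?MyTag]?')

printf '%s\n' "$with_e"
printf '%s\n' "$with_E"
```

Both variables should end up holding the two MyTag lines; the first gets there by listing alternatives, the second by making the brackets optional.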
? is not part of the basic regular expressions, which are what grep uses by default. GNU grep supports it as an extension, but you have to escape it:
$ grep '^\[\?MyTag\]\?' file.txt
[MyTag] hello
MyTag world
Or, as pointed out, use grep -E to enable extended regular expressions.
For GNU grep, the only difference between grep and grep -E, i.e., using basic and extended regular expressions, is what you have to escape and what not.
Basic regular expressions
Capture groups and quantifying have to be escaped: \( \) and \{ \}
Zero or one (?), one or more (+) and alternation (|) are not part of BRE, but supported by GNU grep as an extension (but need to be escaped: \? \+ \|)
Extended regular expressions
Capture groups and quantifying don't have to be escaped: ( ) and { }
?, + and | are supported and don't need to be escaped
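The summary above can be checked side by side. This sketch assumes GNU grep (the \| and \+ forms are extensions and may not work elsewhere); the sample words are invented for the demonstration:

```shell
# Invented sample input.
sample=$(printf 'cat\ndog\nbird\ncaat\n')

# Alternation: \| in GNU basic REs, plain | in extended REs.
bre_alt=$(printf '%s\n' "$sample" | grep 'cat\|dog')
ere_alt=$(printf '%s\n' "$sample" | grep -E 'cat|dog')

# One or more: \+ in GNU basic REs, plain + in extended REs.
bre_plus=$(printf '%s\n' "$sample" | grep 'ca\+t')
ere_plus=$(printf '%s\n' "$sample" | grep -E 'ca+t')

printf '%s\n' "$bre_alt" "$ere_alt" "$bre_plus" "$ere_plus"
```

With GNU grep each basic/extended pair should select the same lines, which is the point: only the escaping differs.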

How do I grep for a string excluding some other string

I want to do sth like:
grep -A 10 'myString' && NOT 'anotherString'
If I didn't need -A 10 I know I could pipe greps and use -v, but it would not work like that in this case. So I would do sth like that:
grep "myString" | grep -v "anotherString"
Any ideas?
Try to invert and place the grep with the -A 10 argument in the end. Like this:
grep -v 'anotherString' | grep -A 10 'myString'
The only POSIX supported options for grep are -EFcefilnqsvx so be aware that the -A option may not be present on all implementations of grep. And even on GNU grep there is no option to specify "match OR match" and there is no regex that can emulate this as all it can do is provide additional matches, but can not withhold them. Essentially the only way to accomplish this with grep alone is to use a pipe.
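A quick way to see what the suggested pipeline does, using a hypothetical five-line log and -A 2 instead of -A 10 to keep the output short (this relies on a grep that has -A, such as GNU or BSD grep):

```shell
# Hypothetical log used only for this demonstration.
log=$(printf 'myString start\ndetail one\nanotherString noise\ndetail two\nunrelated\n')

# Drop the unwanted lines first, then ask for context around the match.
result=$(printf '%s\n' "$log" | grep -v 'anotherString' | grep -A 2 'myString')
printf '%s\n' "$result"
```

Note that the context lines are counted after filtering, so the two lines printed after the match are "detail one" and "detail two". A line containing both strings would be dropped entirely.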

Grep: Capture just number

I am trying to use grep to just capture a number in a string but I am having difficulty.
echo "There are <strong>54</strong> cities" | grep -o "([0-9]+)"
How am I supposed to just have it return "54"? I have tried the above grep command and it doesn't work.
echo "You have <strong>54</strong>" | grep -o '[0-9]' seems to sort of work but it prints
5
4
instead of 54
Don't parse HTML with regex, use a proper parser :
$ echo "There are <strong>54</strong> cities " |
xmllint --html --xpath '//strong/text()' -
OUTPUT:
54
Check RegEx match open tags except XHTML self-contained tags
You need to use the "E" option for extended regex support (or use egrep). On my Mac OSX:
$ echo "There are <strong>54</strong> cities" | grep -Eo "[0-9]+"
54
You also need to think if there are going to be more than one occurrence of numbers in the line. What should be the behavior then?
EDIT 1: since you have now specified the requirement to be a number between <strong> tags, I would recommend using sed. On my platform, grep does not have the "P" option for perl style regexes. On my other box, the version of grep specifies that this is an experimental feature so I would go with sed in this case.
$ echo "There are <strong>54</strong> 12 cities" | sed -rn 's/^.*<strong>\s*([0-9]+)\s*<\/strong>.*$/\1/p'
54
Here "r" is for extended regex (this is the GNU sed spelling; BSD and macOS sed use -E instead).
EDIT 2: If you have the "PCRE" option in your version of grep, you could also utilize the following with positive lookbehinds and lookaheads.
$ echo "There are <strong>54 </strong> 12 cities" | grep -o -P "(?<=<strong>)\s*([0-9]+)\s*(?=<\/strong>)"
54
RegEx Demo
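For a variant of the sed approach that avoids GNU-only pieces, the following sketch uses -E (accepted by both GNU and BSD sed) and the POSIX class [[:space:]] in place of \s. The variable names are just for illustration:

```shell
line='There are <strong>54</strong> 12 cities'

# -n together with the p flag prints only lines where the substitution
# succeeded; the capture group keeps just the digits between the tags.
number=$(printf '%s\n' "$line" |
  sed -En 's/.*<strong>[[:space:]]*([0-9]+)[[:space:]]*<\/strong>.*/\1/p')

printf '%s\n' "$number"
```

This assumes at most one <strong> element per line; with several, the greedy leading .* would pick the last one.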

Simple Grep Issue

I am trying to parse items out of a file I have. I can't figure out how to do this with grep.
here is the syntax
<FQDN>Compname.dom.domain.com</FQDN>
<FQDN>Compname1.dom.domain.com</FQDN>
<FQDN>Compname2.dom.domain.com</FQDN>
I want to spit out just the bits between the > and the <
can anyone assist?
Thanks
grep can do some text extraction. However, I'm not sure if this is what you want:
grep -Po "(?<=>)[^<]*"
test
kent$ echo "<FQDN>Compname.dom.domain.com</FQDN>
dquote>
dquote> <FQDN>Compname1.dom.domain.com</FQDN>
dquote>
dquote> <FQDN>Compname2.dom.domain.com</FQDN>"|grep -Po "(?<=>)[^<]*"
Compname.dom.domain.com
Compname1.dom.domain.com
Compname2.dom.domain.com
Grep isn't what you are looking for.
Try sed with a regular expression : http://unixhelp.ed.ac.uk/CGI/man-cgi?sed
You can do it like you want with grep :
grep -oP '<FQDN>\K[^<]+' FILE
Output:
Compname.dom.domain.com
Compname1.dom.domain.com
Compname2.dom.domain.com
As others have said, grep is not the ideal tool for this. However:
$ echo '<FQDN>Compname.dom.domain.com</FQDN>' | egrep -io '[a-z]+\.[^<]+'
Compname.dom.domain.com
Remember that grep's purpose is to MATCH things. The -o option shows you what it matched. In order to make regex conditions that are not part of the expression that is returned, you'd need to use lookahead or lookbehind, which most command-line grep does not support because it's part of PCRE rather than ERE.
$ echo '<FQDN>Compname.dom.domain.com</FQDN>' | grep -Po '(?<=>)[^<]+'
Compname.dom.domain.com
The -P option will work in most Linux environments, but not in *BSD or OSX or Solaris, etc.
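Where -P is unavailable, the sed route mentioned above can be sketched with nothing but POSIX basic regular expressions. This assumes one <FQDN> element per line, as in the question's sample:

```shell
# Two of the sample lines from the question.
xml=$(printf '<FQDN>Compname.dom.domain.com</FQDN>\n<FQDN>Compname1.dom.domain.com</FQDN>\n')

# \( \) captures the text between the tags; -n with the p flag prints
# only the lines where the substitution matched.
names=$(printf '%s\n' "$xml" | sed -n 's/.*<FQDN>\([^<]*\)<\/FQDN>.*/\1/p')

printf '%s\n' "$names"
```

Lines without an <FQDN> element are skipped silently, which is usually what you want when pulling values out of a larger file.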

Can grep show only words that match search pattern?

Is there a way to make grep output "words" from files that match the search expression?
If I want to find all the instances of, say, "th" in a number of files, I can do:
grep "th" *
but the output will be something like (bold is by me);
some-text-file : the cat sat on the mat
some-other-text-file : the quick brown fox
yet-another-text-file : i hope this explains it thoroughly
What I want it to output, using the same search, is:
the
the
the
this
thoroughly
Is this possible using grep? Or using another combination of tools?
Try grep -o:
grep -oh "\w*th\w*" *
Edit: matching from Phil's comment.
From the docs:
-h, --no-filename
Suppress the prefixing of file names on output. This is the default
when there is only one file (or only standard input) to search.
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
Cross distribution safe answer (including windows minGW?)
grep -h "[[:alpha:]]*th[[:alpha:]]*" 'filename' | tr ' ' '\n' | grep -h "[[:alpha:]]*th[[:alpha:]]*"
If you're using older versions of grep (like 2.4.2) which do not include the -o option, then use the above. Else use the simpler to maintain version below.
Linux cross distribution safe answer
grep -oh "[[:alpha:]]*th[[:alpha:]]*" 'filename'
To summarize: -oh outputs only the text matched by the regular expression, without the filename, just as you would expect a regular expression to work in vim etc. Which word or regular expression you search for is up to you, as long as you stay with POSIX and not Perl syntax (see below).
More from the manual for grep
-o Print each match, but only the match, not the entire line.
-h Never print filename headers (i.e. filenames) with output lines.
-w The expression is searched for as a word (as if surrounded by
`[[:<:]]' and `[[:>:]]';
The reason why the original answer does not work for everyone
The usage of \w varies from platform to platform, as it's an extended "perl" syntax. As such, grep installations that are limited to POSIX character classes need [[:alpha:]] rather than its rough Perl-style counterpart \w (which also matches digits and the underscore). See the Wikipedia page on regular expressions for more.
Ultimately, the POSIX answer above will be a lot more reliable regardless of platform (being the original) for grep
As for support of grep without -o option, the first grep outputs the relevant lines, the tr splits the spaces to new lines, the final grep filters only for the respective lines.
(PS: I know most platforms by now would have been patched for \w.... but there are always those that lag behind)
Credit for the "-o" workaround goes to @AdamRosenfield's answer
It's more simple than you think. Try this:
egrep -wo 'th.[a-z]*' filename.txt #### (Case Sensitive)
egrep -iwo 'th.[a-z]*' filename.txt ### (Case Insensitive)
Where,
egrep: Grep will work with extended regular expression.
w : Matches only word/words instead of substring.
o : Display only matched pattern instead of whole line.
i : If you want to ignore case.
You could translate spaces to newlines and then grep, e.g.:
cat * | tr ' ' '\n' | grep th
Just awk; no need for a combination of tools.
# awk '{for(i=1;i<=NF;i++){if($i~/^th/){print $i}}}' file
the
the
the
this
thoroughly
grep command for only matching and perl
grep -o -P 'th.*? ' filename
I was unsatisfied with awk's hard to remember syntax but I liked the idea of using one utility to do this.
It seems like ack (or ack-grep if you use Ubuntu) can do this easily:
# ack-grep -ho "\bth.*?\b" *
the
the
the
this
thoroughly
If you omit the -h flag you get:
# ack-grep -o "\bth.*?\b" *
some-other-text-file
1:the
some-text-file
1:the
the
yet-another-text-file
1:this
thoroughly
As a bonus, you can use the --output flag to do this for more complex searches with just about the easiest syntax I've found:
# echo "bug: 1, id: 5, time: 12/27/2010" > test-file
# ack-grep -ho "bug: (\d*), id: (\d*), time: (.*)" --output '$1, $2, $3' test-file
1, 5, 12/27/2010
cat *-text-file | grep -Eio "th[a-z]+"
You can also try pcregrep. There is also a -w option in grep, but in some cases it doesn't work as expected.
From Wikipedia:
cat fruitlist.txt
apple
apples
pineapple
apple-
apple-fruit
fruit-apple
grep -w apple fruitlist.txt
apple
apple-
apple-fruit
fruit-apple
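The output above follows from how -w defines a word: letters, digits and the underscore are word characters, so a hyphen counts as a boundary. If the goal is lines that consist of exactly the word, -x (a POSIX option that matches whole lines) is one alternative. A sketch using the same fruit list:

```shell
# The fruit list from the example above.
fruits=$(printf 'apple\napples\npineapple\napple-\napple-fruit\nfruit-apple\n')

# -w: the match must be bounded by non-word characters or line edges;
# "-" is a non-word character, so the hyphenated entries still match.
word_matches=$(printf '%s\n' "$fruits" | grep -w apple)

# -x: the pattern must match the entire line.
line_matches=$(printf '%s\n' "$fruits" | grep -x apple)

printf '%s\n' "$word_matches"
printf '%s\n' "$line_matches"
```

Here -w keeps the four entries shown above, while -x keeps only the bare "apple" line.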
I had a similar problem, looking for grep/pattern regex and the "matched pattern found" as output.
At the end I used egrep (same regex on grep -e or -G didn't give me the same result of egrep) with the option -o
so, I think that could be something similar to (I'm NOT a regex Master) :
egrep -o "the*|this{1}|thoroughly{1}" filename
To search all the words with start with "icon-" the following command works perfect. I am using Ack here which is similar to grep but with better options and nice formatting.
ack -oh --type=html "\w*icon-\w*" | sort | uniq
You could pipe your grep output into Perl like this:
grep "th" * | perl -n -e'while(/(\w*th\w*)/g) {print "$1\n"}'
grep --color -o -E "Begin.{0,}?End" file.txt
? - Match as few as possible until the End
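Tested in the macOS terminal. Non-greedy quantifiers are not part of POSIX extended regular expressions, so other grep implementations may treat this ? differently.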
$ grep -w
Excerpt from grep man page:
-w: Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character.
ripgrep
Here is an example using ripgrep:
rg -o "(\w+)?th(\w+)?"
It'll match all words containing th.

Resources