grep: repetition-operator operand invalid - grep

I have this regular expression, (?<=heads\/)(.*?)(?=\n), and you can see it working here:
http://regexr.com?347dm
I need this regex to work in the grep command but I'm getting this error.
$ grep -Eio '(?<=heads\/)(.*?)(?=\n)' text.txt
grep: repetition-operator operand invalid
It works great in ack, but I don't have ack on the machine I need to run this on.
ack text.txt -o --match '(?<=heads\/)(.*?)(?=\n)'
text.txt
74f3649af36984e1b784e46502fe318e91d29570 HEAD
06d4463ab47a6246e6bd94dc3b9267d59fc16c2e refs/heads/ARC
0597e13c22b6397a1b260951f9d064f668b26f08 refs/heads/LocationAge
e7e1ed942d15efb387c878b9d0335b37560c8807 refs/heads/feature/311-312-breaking-banner-updates
d0b2632b465702d840a358d0b192198ae505011c refs/heads/gulf-news
509173eafc6792739787787de0d23b0c804d4593 refs/heads/jbb-new-applicationdidfinishlaunching
1e7b03ce75b1a7ba47ff4fb5128bc0bf43a7393b refs/heads/locationdebug
74f3649af36984e1b784e46502fe318e91d29570 refs/heads/master
5d2ede384325877c24db7ba1ba0338dc7b7f84fb refs/heads/mixed-media
3f3b6a81dd3baea8744aec6b95c2fe4aaeb20ea3 refs/heads/post-onezero
4198a43aab2dfe72d7ae9e9e53fbb401fc9dac1f refs/heads/whitelabel
76741013b3b2200de29f53800d51dfd6dc7bac5e refs/tags/r10
fc53b1a05dad3072614fb397a228819a67615b82 refs/tags/r10^{}
afdcfd970c9387f6fda0390ef781c2776aa666c3 refs/tags/r11

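grep -E does not support the lookbehind (?<=...), lazy quantifier *?, or lookahead (?=...) operators; they are Perl-compatible (PCRE) constructs rather than part of POSIX extended regular expressions. See this table.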

$ grep -Pio '(?<=heads\/)(.*?)(?=\n)' text.txt # P option instead of E
If you use GNU grep, you can use -P or --perl-regexp options.
In case you are using OS X, you need to install GNU grep.
$ brew install grep
Due to recent changes, Homebrew installs GNU grep with a 'g' prefix, so to use it on macOS you either have to prepend the command name with a 'g':
$ ggrep -Pio '(?<=heads\/)(.*?)(?=\n)' text.txt # P option instead of E
Or change the path name

Try this
grep -Eoh 'heads/.*' text.txt | grep -Eoh '/.*' | grep -Eoh '[a-zA-Z].*'
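
If neither GNU grep nor ack is available, the same extraction can probably be done with POSIX sed, which ships on both BSD and GNU systems. The following is only a minimal sketch, not one of the answers above: the scratch directory and the variable name branches are made up for illustration, and the sample file is a four-line excerpt of the text.txt from the question.

```shell
# Recreate a small excerpt of text.txt in a scratch directory (illustrative only).
workdir=$(mktemp -d)
cat > "$workdir/text.txt" <<'EOF'
74f3649af36984e1b784e46502fe318e91d29570 HEAD
06d4463ab47a6246e6bd94dc3b9267d59fc16c2e refs/heads/ARC
e7e1ed942d15efb387c878b9d0335b37560c8807 refs/heads/feature/311-312-breaking-banner-updates
76741013b3b2200de29f53800d51dfd6dc7bac5e refs/tags/r10
EOF

# -n suppresses the default output. The substitution strips the hash and the
# "refs/heads/" prefix, and the p flag prints only lines where that succeeded.
branches=$(sed -n 's|^[^ ]* refs/heads/||p' "$workdir/text.txt")
printf '%s\n' "$branches"
rm -r "$workdir"
```

Unlike the original pattern, this sketch is case-sensitive and anchors on the full "refs/heads/" prefix, so the HEAD line and the tags are skipped.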

Related

grep workaround with pattern files on MacOS 10.13

I am having a perplexing problem with grep that I can't debug. This is reproducible on Mac OS High Sierra, but the problem does not occur on a current Ubuntu (where it works as expected).
I have three files:
cat haystack
apple
aardvark
cow
cat pattern1
a
aardvark
animal
cat pattern2
c
b
apple
You can create these 3 files with:
perl -e 'print "a\naardvark\nanimal"' > pattern1;
perl -e 'print "c\nb\napple"' > pattern2;
perl -e 'print "apple\naardvark\ncow"' > haystack;
Here's the problem: This yields the expected response:
grep -iowFf pattern2 haystack
apple
To explain the grep options:
-i = case insensitive
-o = display the match
-w = word match <== this is the option which is breaking it
The expression is searched for as a word (as if surrounded by `[[:<:]]' and `[[:>:]]')
-F = fast grep (fixed strings)
-f = read pattern from file
This returns nothing:
grep -iowFf pattern1 haystack
But I would expect "pattern1" to return "aardvark".
I was experimenting with this small testbed, but my real project is much larger. And I found that when I change the sequence of the lines in the patternN files, the results change.
sort -r pattern1 > pattern1.reverse
grep -iowFf pattern1.reverse haystack
That returns "aardvark"
What am I missing? I've been banging my head on this. Is it a bug in MacOS 10.13? Is there a workaround? (yes, one workaround is to replace the -w parameter with \b....\b in my patterns and turn off -F, but I am working on very large files, and I want the performance.)
On MacOSX:
$ grep -V
grep (BSD grep) 2.5.1-FreeBSD
On CentOS 7, for example:
$ grep -V
grep (GNU grep) 2.20
The two implementations behave differently, as you noticed. To work around this, you can install the GNU version of grep on macOS with brew install grep, which installs GNU grep with the prefix g (as ggrep). Now you can do:
$ ggrep -iowFf pattern1 haystack
aardvark
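
If installing GNU grep is not an option, a hash lookup in awk may be a usable substitute for -iowFf. The following is a rough sketch under that assumption, not something proposed in the thread: the scratch directory and the variable name matches are illustrative, and "word" here means a whitespace-separated field, which is looser than grep's -w.

```shell
# Recreate the haystack and pattern1 files from the question in a scratch directory.
workdir=$(mktemp -d)
printf 'apple\naardvark\ncow\n' > "$workdir/haystack"
printf 'a\naardvark\nanimal\n'  > "$workdir/pattern1"

# While reading the first file (NR == FNR), store each lower-cased pattern as a key.
# While reading the second file, print every field that equals one of the patterns.
matches=$(awk 'NR == FNR { pat[tolower($0)] = 1; next }
               { for (i = 1; i <= NF; i++) if (tolower($i) in pat) print $i }' \
              "$workdir/pattern1" "$workdir/haystack")
printf '%s\n' "$matches"
rm -r "$workdir"
```

Because each field costs one hash lookup regardless of how many patterns there are, this should stay reasonably fast on large pattern files, although it has not been benchmarked against grep -F here.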

How do I 'grep -c' and avoid printing files with zero '0' count

The command 'grep -c blah *' lists all the files, like below.
% grep -c jill *
file1:1
file2:0
file3:0
file4:0
file5:0
file6:1
%
What I want is:
% grep -c jill * | grep -v ':0'
file1:1
file6:1
%
Instead of piping and grep'ing the output like above, is there a flag to suppress listing files with 0 counts?
How to grep nonzero counts:
grep -rIcH 'string' . | grep -v ':0$'
-r Recurse subdirectories.
-I Ignore binary files (thanks #tongpu, warlock).
-c Show count of matches. Annoyingly, includes 0-count files.
-H Show file name, even if only one file (thanks #CraigEstey).
'string' your string goes here.
. Start from the current directory.
| grep -v ':0$' Remove 0-count files. (thanks #LaurentiuRoescu)
(I realize the OP was excluding the pipe trick, but this is what works for me.)
Just use awk. e.g. with GNU awk for ENDFILE:
awk '/jill/{c++} ENDFILE{if (c) print FILENAME":"c; c=0}' *
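
Where GNU awk is not available, ENDFILE can be avoided by counting into an array keyed by FILENAME. This is a sketch along those lines rather than a drop-in replacement: the three sample files and the variable name counts are invented for the example, and the output is piped through sort because awk's for-in order is unspecified.

```shell
# Three throwaway files: two contain "jill", one does not.
workdir=$(mktemp -d)
printf 'jack and jill\n' > "$workdir/file1"
printf 'nobody here\n'   > "$workdir/file2"
printf 'jill\njack\n'    > "$workdir/file6"

# Count matching lines per file. A file with no matching line never gets an
# entry in count[], so it is never printed.
counts=$(cd "$workdir" && awk '/jill/ { count[FILENAME]++ }
                               END { for (f in count) print f ":" count[f] }' file* | sort)
printf '%s\n' "$counts"
rm -r "$workdir"
```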

How to grep in one line starting from particular string to end with particular string

I want to extract "[calluid]=aab01b055-89e3-49f3-839e-507bb128d07e&smscresponse"
from the log line below:
2014-10-15 18:38:32,831 plivo-rest[2781]: INFO: Fetching GET http://*******/outbound_callback.aspx with smscresponse[to]=8912722fsf9&smscresponse[ALegUUID]=5bb516fsd64-546c-11e4-879f-551816a551303677&smscresponse[calluid]=aab01b055-89e3-49f3-839e-507bb128d07e&smscresponse[direction]=outbosund&smscresfdsponse[endreason]=UNALLOCATED_NUMBER&smscresponse[from]=83339995896999&smscresponse[starttime]=0&smscresponse[ALegRequestUUID]=5bb4bafc-546c-11e4-891d-000c29ec6e41&smscresponse[RequestUUID]=5bb4bafc-546c-11e4-891d-000c29ec6e41&smscresponse[callstatus]=completed&smscresponse[endtime]=1413378509&smscresponse[ScheduledHangupId]=5bb4c15a-546c-11e4-891d-000c29ec6e41&smscresponse[event]=missed_call_hangup
I used this command
$ grep -oP '(calluid).*$'
but this matches all the way to the end of the line.
I also tried this command
$ grep -oP '(calluid).{40}'
which fetches a fixed 40 characters, but I have thousands of calluid values and each has a different number of characters.
Please guide me on how to grep exactly the calluid data.
Use a lookahead to make the regex engine stop the match at a specific character or boundary.
$ grep -oP '\[calluid\][^\]\[]*(?=\[|$)' file
[calluid]=aab01b055-89e3-49f3-839e-507bb128d07e&smscresponse
Here is a GNU awk version (GNU awk is needed because RS is more than one character):
awk -v RS="[[]calluid[]]=" -F[ 'NR==2 {print $1}' file
aab01b055-89e3-49f3-839e-507bb128d07e&smscresponse
You can also set RS like this: RS="\\\[calluid]="
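
The lookahead is arguably not required for this particular match, since a negated bracket expression already stops at the next bracket. As a hedged sketch that avoids -P (so it should also work with BSD grep), the following uses a shortened stand-in for the log line; the variable names line and value are illustrative.

```shell
# Shortened stand-in for the log line in the question.
line='GET http://*******/outbound_callback.aspx with smscresponse[to]=8912722fsf9&smscresponse[calluid]=aab01b055-89e3-49f3-839e-507bb128d07e&smscresponse[direction]=outbosund'

# \[calluid] matches the literal key; [^][]* then consumes every character
# that is neither "]" nor "[", so the match ends just before the next key.
value=$(printf '%s\n' "$line" | grep -oE '\[calluid][^][]*')
printf '%s\n' "$value"
```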

Strange behaviour of grep command

When I do the following grep I get results I cannot explain to myself:
host:/usr/local/tomcat > grep '-XX:PermSize=256m' *
RELEASE-NOTES: http://www.apache.org/licenses/LICENSE-2.0
RUNNING.txt: http://www.apache.org/licenses/LICENSE-2.0
Afaik, none of the characters in my regular expression have a special meaning (inside square brackets, - has one, but there are none). I also put it into single quotes so it shouldn’t be modified by the shell.
Grep version: grep (GNU grep) 2.5.1
Tomcat version: 6.0.36 (binary distro)
Since your pattern begins with a minus sign -, grep interprets it as an option rather than as the pattern.
You could say:
grep -- '-XX:PermSize=256m' *
-- tells grep to stop processing command-line options; everything after it is treated as the pattern and file names.
Alternatively, you could say:
grep -- '[-]XX:PermSize=256m' *
(When you write [-], the hyphen is interpreted as a literal. Since you mentioned that - has a special meaning inside square brackets, it seemed worth clarifying.)
The shell expands the command to the following:
grep -XX:PermSize=256m LICENSE NOTICE RELEASE-NOTES RUNNING.txt bin conf lib …
Grep accepts a secret parameter -Xoptarg without reporting an invalid option. In the source code it reads
/* -X is undocumented on purpose. */
However, the next token, LICENSE, is then taken as the regular expression. Typing grep -X * takes the LICENSE token as the argument to the parameter -X and greps for NOTICE in RELEASE-NOTES and RUNNING.txt.
The meaning of the secret grep capital X parameter is to set the internal mode, i.e.
grep -X grep → behave normally (grep -G)
grep -X egrep → behave like grep -E
grep -X awk → uses RE_SYNTAX_AWK
grep -X fgrep → behave like grep -F
grep -X perl → behave like grep -P
grep -X 'unrecognized string' → behave normally
My grep complains about
grep: invalid matcher X:PermSize=256m
but what you can see here is that grep considers -X... as an option. To make it stop interpreting options, use --, i.e.
grep -- -XX:PermSize=256m *
The single-quotes are not necessary.
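
Besides --, the standard -e option marks its argument as a pattern, which also prevents a leading hyphen from being parsed as an option. A small sketch comparing the two follows; the file name setenv.sh and its contents are hypothetical and are not taken from the Tomcat directory in the question.

```shell
# A throwaway file containing the string being searched for.
workdir=$(mktemp -d)
printf 'JAVA_OPTS="-Xmx512m -XX:PermSize=256m"\n' > "$workdir/setenv.sh"

# -e marks the next argument as the pattern, even though it starts with "-".
with_e=$(grep -e '-XX:PermSize=256m' "$workdir/setenv.sh")
# -- ends option parsing, so what follows is the pattern and then file names.
with_dashes=$(grep -- '-XX:PermSize=256m' "$workdir/setenv.sh")
printf '%s\n%s\n' "$with_e" "$with_dashes"
rm -r "$workdir"
```

Both commands print the same matching line.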

Simple Grep Issue

I am trying to parse items out of a file. I can't figure out how to do this with grep.
Here is the format:
<FQDN>Compname.dom.domain.com</FQDN>
<FQDN>Compname1.dom.domain.com</FQDN>
<FQDN>Compname2.dom.domain.com</FQDN>
I want to print just the text between the > and the <.
Can anyone assist?
Thanks
grep can do some text extraction; however, I'm not sure if this is what you want:
grep -Po "(?<=>)[^<]*"
Test:
kent$ echo "<FQDN>Compname.dom.domain.com</FQDN>
<FQDN>Compname1.dom.domain.com</FQDN>
<FQDN>Compname2.dom.domain.com</FQDN>" | grep -Po "(?<=>)[^<]*"
Compname.dom.domain.com
Compname1.dom.domain.com
Compname2.dom.domain.com
Grep isn't what you are looking for.
Try sed with a regular expression : http://unixhelp.ed.ac.uk/CGI/man-cgi?sed
You can do what you want with grep:
grep -oP '<FQDN>\K[^<]+' FILE
Output:
Compname.dom.domain.com
Compname1.dom.domain.com
Compname2.dom.domain.com
As others have said, grep is not the ideal tool for this. However:
$ echo '<FQDN>Compname.dom.domain.com</FQDN>' | egrep -io '[a-z]+\.[^<]+'
Compname.dom.domain.com
Remember that grep's purpose is to MATCH things. The -o option shows you what it matched. To impose regex conditions that are not part of the text that is returned, you need lookahead or lookbehind, which many grep implementations do not support because they are part of PCRE rather than ERE.
$ echo '<FQDN>Compname.dom.domain.com</FQDN>' | grep -Po '(?<=>)[^<]+'
Compname.dom.domain.com
The -P option will work in most Linux environments, but not in *BSD or OSX or Solaris, etc.
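
On systems without -P, the sed route suggested earlier can be sketched as follows. This is one possible form, assuming each element sits on a single line as in the question; the file name hosts.xml and the variable name names are made up for the example.

```shell
# Recreate the sample input in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/hosts.xml" <<'EOF'
<FQDN>Compname.dom.domain.com</FQDN>
<FQDN>Compname1.dom.domain.com</FQDN>
<FQDN>Compname2.dom.domain.com</FQDN>
EOF

# Capture the text between <FQDN> and </FQDN>, replace the whole line with
# the captured group, and print only lines where the substitution happened.
names=$(sed -n 's|.*<FQDN>\([^<]*\)</FQDN>.*|\1|p' "$workdir/hosts.xml")
printf '%s\n' "$names"
rm -r "$workdir"
```

This uses only POSIX basic regular expressions, so it does not depend on GNU extensions.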

Resources