grep matching control -ie vs -ei and patterns - grep

I don't understand the matching control with grep.
grep --include "*.md" -rnw -ie 'word1.*word2'
vs
grep --include "*.md" -rnw . -ie 'word1.*word2'
vs
grep --include "*.md" -rnw -ei 'word1.*word2'
vs
grep --include "*.md" -rnw . -ei 'word1.*word2'
I do not understand why these grep expressions are not equivalent.
The 1st one outputs what I expected:
any content which starts with word1, followed by any run of characters, then word2.
This expression seems to be equivalent to the 2nd one.
The output of the last expression seems to be equivalent to that of
grep --include "*.md" -rnw . -ei
and shows isolated occurrences of i, such as "i.e", i) or -i.
I do not understand the output of the 3rd expression.
Thanks
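The difference comes from how grep parses bundled short options: -e takes an argument, so whatever follows the e in the same bundle is used as the pattern. In -ie 'word1.*word2' the bundle is -i plus -e, and the pattern is word1.*word2. In -ei the pattern is the single letter i, and 'word1.*word2' is left over as a file or directory name to search. A minimal sketch, assuming GNU grep and using a throwaway file demo.md (a made-up name) in a temporary directory:

```shell
# Work in a scratch directory so nothing real is touched.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' 'Word1 comes before word2' 'i.e. an isolated i' 'nothing here' > demo.md

# -ie PATTERN: -i (ignore case) plus -e with PATTERN as its argument.
ie_out=$(grep -rnw --include '*.md' . -ie 'word1.*word2')

# -ei PATTERN: -e swallows "i" as the pattern; 'word1.*word2' becomes a
# nonexistent path, so grep complains about it on stderr and exits non-zero.
ei_out=$(grep -rnw --include '*.md' . -ei 'word1.*word2' 2>/dev/null) || true

# What -ei really asked for, spelled out.
explicit_out=$(grep -rnw --include '*.md' . -e i)

printf 'ie: %s\nei: %s\nexplicit: %s\n' "$ie_out" "$ei_out" "$explicit_out"
```

This would also explain the 3rd expression: with no . operand, the only path left to search is the nonexistent 'word1.*word2', so the only thing grep can report is an error about that path.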

Related

How to grep repeated strings on a single line?

I have a file, file.txt, with one line whose content is
/app/jdk/java/bin/java -server -Xms3g -Xmx3g -XX:MaxPermSize=256m -Dweblogic.Name=O2pPod8_mapp_msrv1_1 -Djava.security.policy=/app/Oracle/Middleware/Oracle_Home/wlserver/server/lib/weblogic.policy -Djava.security.egd=file:/dev/./urandom -Dweblogic.ProductionModeEnabled=true -Dweblogic.system.BootIdentityFile=/app/Oracle/Middleware/Oracle_Home/user_projects/domains/O2pPod8_domain/servers/O2pPod8_mapp_msrv1_1/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.nmservice.RotationEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=false -Dweblogic.ReverseDNSAllowed=false -Xms8192m -Xmx8192m -XX:MaxPermSize=2048m -XX:NewSize=1300m -XX:MaxNewSize=1300m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
and when I do
cat file.txt | grep -io "Xms.*" | awk '{FS" "; print $1} ' | cut -d "s" -f2
output:
3g
Why is grep not reading the second occurrence? I expect 3g and 8192m.
In fact, how do I print only 8192m in this case?
Your regex just says "find Xms followed by any character repeated 0 to n times". That matches the rest of the line from the first Xms onward, so the second occurrence is swallowed by the first match.
What you actually want is something like "find Xms followed by any run of non-space characters":
grep -io "Xms[^ ]*" file.txt | awk '{FS" "; print $1} ' | cut -d "s" -f2
Inside a bracket expression, a leading ^ means "not", so [^ ] matches any single character that is not a space.
I'm not really sure what you are trying to achieve here, but if you want the endings of all space-separated strings starting with -Xms, plain awk will do:
$ awk -v RS=" " '/^-Xms/{print substr($0,5)}' file
3g
8192m
Explained:
$ awk -v RS=" " ' # space separated records
/^-Xms/ { # strings starting with -Xms
print substr($0,5) # print starting from 5th position
}' file
If you wanted something else (the word "repeated" in the title puzzles me a bit), please update the question with more detailed requirements.
Edit: I just noticed "how do I print only 8192m in this case" (maybe that's the "repeated" part). Let's add a counter c and not print the first instance:
$ awk -v RS=" " '/^-Xms/&&++c>1{print substr($0,5)}' file
8192m
You could use grep -io "Xms[0-9]*[a-zA-Z]" instead of grep -io "Xms.*" to match a sequence of digits followed by a single letter, rather than the entire rest of the line:
cat file.txt | grep -io "Xms[0-9]*[a-zA-Z]" | awk '{FS" "; print $1} ' | cut -d "s" -f2
Hope this helps!
The .* in your regexp is matching the rest of the line; you need [^ ]* instead. Look:
$ grep -o 'Xms.*' file
Xms3g -Xmx3g -XX:MaxPermSize=256m -Dweblogic.Name=O2pPod8_mapp_msrv1_1 -Djava.security.policy=/app/Oracle/Middleware/Oracle_Home/wlserver/server/lib/weblogic.policy -Djava.security.egd=file:/dev/./urandom -Dweblogic.ProductionModeEnabled=true -Dweblogic.system.BootIdentityFile=/app/Oracle/Middleware/Oracle_Home/user_projects/domains/O2pPod8_domain/servers/O2pPod8_mapp_msrv1_1/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.nmservice.RotationEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=false -Dweblogic.ReverseDNSAllowed=false -Xms8192m -Xmx8192m -XX:MaxPermSize=2048m -XX:NewSize=1300m -XX:MaxNewSize=1300m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
$ grep -o 'Xms[^ ]*' file
Xms3g
Xms8192m
$ grep -o 'Xms[^ ]*' file | cut -d's' -f2
3g
8192m
$ grep -o 'Xms[^ ]*' file | cut -d's' -f2 | tail -1
8192m
or, to print only the last occurrence more concisely (the leading .* is greedy, so it runs up to the final Xms):
$ sed 's/.*Xms\([^ ]*\).*/\1/' file
8192m
The positive lookbehind of PCRE (the form: (?<=RE1)RE2) can resolve the problem easily:
$ grep -oP '(?<=Xms)\S+' file.txt
3g
8192m
Explanation:
-o: show only the part of a line matching PATTERN.
-P: PATTERN is a Perl-compatible regular expression (a GNU grep option, not available in every grep).
(?<=Xms)\S+: matches each run of consecutive non-whitespace characters that immediately follows the string Xms.

How to grep two patterns at once

Oftentimes I have to do some command-line thing where I pipe to grep and want matches for two different expressions (an OR match: A OR B).
For example, I want to grep the output of generate_out for either foo[0-9]+ or bar[0-9]+. I could of course just run it twice:
generate_out| grep "foo[0-9]+"
generate_out| grep "bar[0-9]+"
but generate_out is often expensive, and I would rather not run it twice (or store its output). I would rather use a single expression:
generate_out| grep "foo[0-9]+ OR bar[0-9]+"
Of course this will not work, but I would like an equivalent expression that does.
Use grep's -e option to specify multiple patterns that are "OR'ed":
$ seq 15 | grep -e 5 -e 3
3
5
13
15
Use an alternation in your regex:
generate_out | grep -E '(foo|bar)[0-9]+'
Using -E enables ERE features, of which alternation is one. (By default, grep only supports BRE; some BRE implementations, such as GNU's, have special syntax for enabling ERE features; in the GNU case, \| in BRE is equivalent to | in ERE. However, relying on such extensions is not portable; it is better to turn on ERE properly.)
egrep is a backwards-compatibility synonym for grep -E; however, only the latter is specified as a requirement by POSIX.
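One detail worth spelling out about the attempts in the question: without -E, the + in foo[0-9]+ is an ordinary character in basic regular expressions, so the pattern only matches a literal plus sign after the digit. A small sketch, with a shell function standing in for generate_out (the sample lines are invented for the demo):

```shell
# Stand-in for the expensive command; the sample lines are invented.
generate_out() {
    printf '%s\n' 'foo12' 'bar7' 'baz3' 'foo9+'
}

# BRE (default): "+" is literal, so only the line containing "foo9+" matches.
bre_out=$(generate_out | grep 'foo[0-9]+')

# ERE (-E): "+" means "one or more" and "|" is alternation.
ere_out=$(generate_out | grep -E '(foo|bar)[0-9]+')

# Same result with two -e patterns in a single ERE invocation.
multi_out=$(generate_out | grep -E -e 'foo[0-9]+' -e 'bar[0-9]+')

printf 'BRE:\n%s\nERE:\n%s\n' "$bre_out" "$ere_out"
```

Either ERE form reads the output of generate_out only once, which is the point of the question.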
Use awk for simplicity:
generate_out| awk '/foo[0-9]+/ || /bar[0-9]+/'
which of course could be simplified in this particular case to:
generate_out| awk '/(foo|bar)[0-9]+/'
but in general you want to use awk for simple, consistent ORs and ANDs of regexps. Each pair below shows the grep version followed by the awk equivalent:
foo AND bar:
cmd | grep -E 'foo.*bar|bar.*foo'
cmd | awk '/foo/ && /bar/'
foo AND NOT bar:
cmd | grep 'foo' | grep -v 'bar'
cmd | awk '/foo/ && !/bar/'
foo OR bar:
cmd | grep -E 'foo|bar'
cmd | awk '/foo/ || /bar/' (or awk '/foo|bar/')
foo XOR bar:
cmd | grep -E 'foo|bar' | grep -E -v 'foo.*bar|bar.*foo'
cmd | awk '(/foo/ && !/bar/) || (/bar/ && !/foo/)'
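To check the grep and awk forms above against each other, here is a small sketch that runs both on the same made-up input (cmd is a hypothetical stand-in function, not a real command):

```shell
# Hypothetical stand-in for cmd; the sample lines are invented.
cmd() {
    printf '%s\n' 'foo only' 'bar only' 'foo then bar' 'bar then foo' 'neither'
}

# foo AND bar
and_grep=$(cmd | grep -E 'foo.*bar|bar.*foo')
and_awk=$(cmd | awk '/foo/ && /bar/')

# foo AND NOT bar
not_grep=$(cmd | grep 'foo' | grep -v 'bar')
not_awk=$(cmd | awk '/foo/ && !/bar/')

# foo XOR bar (exactly one of the two)
xor_grep=$(cmd | grep -E 'foo|bar' | grep -E -v 'foo.*bar|bar.*foo')
xor_awk=$(cmd | awk '(/foo/ && !/bar/) || (/bar/ && !/foo/)')

printf 'AND:\n%s\nAND NOT:\n%s\nXOR:\n%s\n' "$and_awk" "$not_awk" "$xor_awk"
```

The awk conditions stay readable as more terms are added, while the grep forms need one alternative per ordering of the terms.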

How to grep for filenames found by find in other files?

How can I grep for the result of find within another pattern?
That's how I get all filenames with a certain pattern (in my case ending with "ext1")
find . -name *ext1 -printf "%f\n"
And then I want to grep for these filenames with another pattern (in my case ending on "ext2"):
grep -r '[filename]' *ext2
I tried with
find . -name *ext1 -printf "%f\n" | xargs grep -r *ext2
But this only makes grep tell me that it can not find the files found by find.
You would tell grep that the patterns are in a file with the -f option, and use the "stdin filename" -:
find ... | grep -r -f - *ext2
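A self-contained sketch of that pipeline on a throwaway directory tree (all file names below are invented). It quotes the -name glob so the shell does not expand it before find sees it, and it adds -F, which is an assumption beyond the answer above: -F makes grep treat each file name as a fixed string, so the dot in a name is not read as a regex wildcard. Like the question, it relies on the -printf action of GNU find.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
mkdir -p src refs
: > src/alpha.ext1
: > src/beta.ext1
printf '%s\n' 'uses alpha.ext1' 'uses gamma.ext1' > refs/list.ext2
printf '%s\n' 'mentions beta.ext1 here' > refs/other.ext2

# find prints the bare file names; grep reads them as patterns from
# stdin (-f -) and searches the *.ext2 files for any of them.
matches=$(find . -name '*.ext1' -printf '%f\n' | grep -r -F -f - refs/*.ext2 | sort)

printf '%s\n' "$matches"
```

The line mentioning gamma.ext1 is not reported, because no file with that name was found by find.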

Search and replace in xib file

I am trying to search for some text in some of the xibs in my project and replace it with other text. I am using the command below, but it prints
"grep: warning: recursive search of stdin" and then waits indefinitely.
grep -i -r --include=*.xib “$MSAwLjMxMTU4NDA0NDMgMC4wOTczNjMxNzM3NQA" myProjectPath | sort | uniq | xargs perl -e “s/$MSAwLjMxMTU4NDA0NDMgMC4wOTczNjMxNzM3NQA/$MC4xNTI5NDExODIzIDAuODA3ODQzMjA4MyAwLjE4MDM5MjE2MQA/" -pi
Please let me know where I am going wrong.
Thanks in advance.
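The shell is expanding $MSAwLjMxMTU4NDA0NDMgMC4wOTczNjMxNzM3NQA as a variable because it is not inside single quotes; single quotes plus a backslash before the $ keep it literal. In addition, grep prints its matches in the form "file:matching line", which sort | uniq does not turn into a list of file names for xargs; grep -l prints only the names of the matching files.
grep -l -i -r --include='*.xib' '\$MSAwLjMxMTU4NDA0NDMgMC4wOTczNjMxNzM3NQA' myProjectPath | xargs perl -pi -e 's/\$MSAwLjMxMTU4NDA0NDMgMC4wOTczNjMxNzM3NQA/\$MC4xNTI5NDExODIzIDAuODA3ODQzMjA4MyAwLjE4MDM5MjE2MQA/'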

grep multiple extension current and subfolders

I'm trying to grep multiple extensions within the current and all sub-folders.
grep -i -r -n 'hello' somepath/*.{php,html}
This is only grepping the current folder but not sub-folders.
What would be a good way of doing this?
Using only grep:
grep -irn --include='*.php' --include='*.html' 'hello' somepath/
One of these:
find somepath '(' -name '*.php' -o -name '*.html' ')' -exec grep -i -n hello {} +
find somepath '(' -name '*.php' -o -name '*.html' ')' -print0 | xargs -0 grep -i -n hello
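To see why the glob in the question misses sub-folders while the --include and find forms do not, here is a small sketch on a made-up tree (directory and file names are invented for the demo):

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
mkdir -p somepath/deep
printf '%s\n' 'Hello top' > somepath/index.php
printf '%s\n' 'hello nested' > somepath/deep/page.html
printf '%s\n' 'hello ignored' > somepath/deep/notes.txt

# The shell expands somepath/*.php before grep runs, so only top-level
# files are ever handed to grep; -r has no directory to descend into.
glob_out=$(grep -i -r -n 'hello' somepath/*.php)

# --include filters by file name while -r walks the whole tree.
include_out=$(grep -irn --include='*.php' --include='*.html' 'hello' somepath/ | sort)

# The find variant, with the starting directory given explicitly.
find_out=$(find somepath '(' -name '*.php' -o -name '*.html' ')' -exec grep -i -n hello {} + | sort)

printf 'glob:\n%s\ninclude:\n%s\nfind:\n%s\n' "$glob_out" "$include_out" "$find_out"
```

Both recursive forms report the match in deep/page.html and skip notes.txt, while the glob form only ever sees index.php.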
I was looking for the same thing, and when I decided to write a bash script and started with vim codesearch, surprise: I had already written it before!
#!/bin/bash
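# Usage: codesearch PATH STRING [CONTEXT_LINES]
context="${3:-5}"  # lines of context around each match, default 5
# GREP_COLORS keys: sl = selected lines, mc = match in context lines,
# ms = match in selected lines, ln = line numbers
export GREP_COLORS="sl=32:mc=00;33:ms=05;40;31:ln="
# -a treats binary files as text; the excludes skip media, archives and dumps.
# Each continuation needs a space before the backslash, or the options fuse.
grep --color=always -n -a -R -i -C"$context" \
    --exclude='*.mp*' \
    --exclude='*.avi' \
    --exclude='*.flv' \
    --exclude='*.png' \
    --exclude='*.gif' \
    --exclude='*.jpg' \
    --exclude='*.wav' \
    --exclude='*.rar' \
    --exclude='*.zip' \
    --exclude='*.gz' \
    --exclude='*.sql' "$2" "$1" | less -R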
Paste this code into a file named codesearch and set its mode to 700 or 770 with chmod.
I am posting it here so I can find it the next time I forget.
The script shows the matches and their surrounding context in color:
./codesearch '/full/path' 'string to search'
Optionally, pass the number of context lines as a third argument (default 5):
./codesearch '/full/path' 'string to search' 3
I edited the code and added some eye candy, for example:
./codesearch ./ 'eval' 2
With "allow blinking text" enabled in the terminal, the matches blink.

Resources