grep matching control -ie vs -ei and patterns

I don't understand the matching control with grep.
grep --include "*.md" -rnw -ie 'word1.*word2'
vs
grep --include "*.md" -rnw . -ie 'word1.*word2'
vs
grep --include "*.md" -rnw -ei 'word1.*word2'
vs
grep --include "*.md" -rnw . -ei 'word1.*word2'
I do not understand why these grep expressions are not equivalent.
The 1st one outputs what I expected:
any content which starts with word1, followed by any set of chars, then word2.
This expression seems to be equivalent to the 2nd one.
The output of the last expression seems to be equivalent to
grep --include "*.md" -rnw . -ei
and shows isolated i, such as "i.e" or i) or -i.
I do not understand the output of the 3rd expression.
Thanks
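A possible explanation, sketched under the assumption of the usual getopt-style option parsing: -ie is read as -i -e PATTERN, while -ei is read as -e with the argument i. In the second form the pattern becomes the single letter i, and 'word1.*word2' is left over as a file operand. The scratch directory and file contents below are made up for the demo, and the . operand is placed after the options so the example does not rely on option reordering.

```shell
#!/bin/sh
# Hypothetical scratch directory and sample file, used only for this demo.
dir=$(mktemp -d)
printf 'word1 then word2\nfoo i bar\n' > "$dir/a.md"

# -ie PATTERN parses as -i -e PATTERN: a case-insensitive search for the regex.
out_ie=$(cd "$dir" && grep --include '*.md' -rnw -ie 'WORD1.*WORD2' .)

# -ei parses as -e i: the pattern is the letter "i" (as a whole word, due to -w).
# 'word1.*word2' is now a file operand that does not exist, so grep complains
# about it on stderr (discarded here) and still searches ".".
out_ei=$(cd "$dir" && grep --include '*.md' -rnw -ei 'word1.*word2' . 2>/dev/null)

printf '%s\n' "-ie: $out_ie" "-ei: $out_ei"
rm -rf "$dir"
```

Under this reading, the 3rd command has no . operand, so grep would only try to read a file literally named word1.*word2, which would explain its confusing output.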

Related

How to grep repeated strings on a single line?

I have a file.txt with one line, whose content is
/app/jdk/java/bin/java -server -Xms3g -Xmx3g -XX:MaxPermSize=256m -Dweblogic.Name=O2pPod8_mapp_msrv1_1 -Djava.security.policy=/app/Oracle/Middleware/Oracle_Home/wlserver/server/lib/weblogic.policy -Djava.security.egd=file:/dev/./urandom -Dweblogic.ProductionModeEnabled=true -Dweblogic.system.BootIdentityFile=/app/Oracle/Middleware/Oracle_Home/user_projects/domains/O2pPod8_domain/servers/O2pPod8_mapp_msrv1_1/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.nmservice.RotationEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=false -Dweblogic.ReverseDNSAllowed=false -Xms8192m -Xmx8192m -XX:MaxPermSize=2048m -XX:NewSize=1300m -XX:MaxNewSize=1300m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
and when I do
cat file.txt | grep -io "Xms.*" | awk '{FS" "; print $1} ' | cut -d "s" -f2
output:
3g
Why is grep not reading the second occurrence? I expect 3g and 8192m.
In fact, how do I print only 8192m in this case?

Your regex just says "find Xms followed by any character repeated 0 to n times". That returns the rest of the line from Xms onward.
What you actually want is something like "find Xms followed by any non-space character repeated 0 to n times", so the match stops at the next space.
grep -io "Xms[^ ]*" file.txt | awk '{FS" "; print $1} ' | cut -d "s" -f2
In [^ ], the leading ^ means "not", so the bracket expression matches any character except a space.

I'm not really sure what you are trying to achieve here but if you want the endings of all space-separated strings starting with -Xms, using bare awk is:
$ awk -v RS=" " '/^-Xms/{print substr($0,5)}' file
3g
8192m
Explained:
$ awk -v RS=" " ' # space separated records
/^-Xms/ { # strings starting with -Xms
print substr($0,5) # print starting from 5th position
}' file
If you wanted something else (word repeated in the title puzzles me a bit), please update the question with more detailed requirements.
Edit: I just noticed how do I print only 8192m in this case (that's the repeated maybe). Let's add a counter c and not print the first instance:
$ awk -v RS=" " '/^-Xms/&&++c>1{print substr($0,5)}' file
8192m

You could use grep -io "Xms[0-9]*[a-zA-Z]" instead of grep -io "Xms.*" to match a sequence of digits followed by a single letter, rather than the rest of the line in one match:
cat file.txt | grep -io "Xms[0-9]*[a-zA-Z]" | awk '{FS" "; print $1} ' | cut -d "s" -f2
Hope this helps!

The .* in your regexp matches the rest of the line; you need [^ ]* instead. Look:
$ grep -o 'Xms.*' file
Xms3g -Xmx3g -XX:MaxPermSize=256m -Dweblogic.Name=O2pPod8_mapp_msrv1_1 -Djava.security.policy=/app/Oracle/Middleware/Oracle_Home/wlserver/server/lib/weblogic.policy -Djava.security.egd=file:/dev/./urandom -Dweblogic.ProductionModeEnabled=true -Dweblogic.system.BootIdentityFile=/app/Oracle/Middleware/Oracle_Home/user_projects/domains/O2pPod8_domain/servers/O2pPod8_mapp_msrv1_1/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.nmservice.RotationEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=false -Dweblogic.ReverseDNSAllowed=false -Xms8192m -Xmx8192m -XX:MaxPermSize=2048m -XX:NewSize=1300m -XX:MaxNewSize=1300m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
$ grep -o 'Xms[^ ]*' file
Xms3g
Xms8192m
$ grep -o 'Xms[^ ]*' file | cut -d's' -f2
3g
8192m
$ grep -o 'Xms[^ ]*' file | cut -d's' -f2 | tail -1
8192m
or more concisely, since the greedy leading .* leaves only the last occurrence for the capture group:
$ sed 's/.*Xms\([^ ]*\).*/\1/' file
8192m

The positive lookbehind of PCRE (the form: (?<=RE1)RE2) can resolve the problem easily:
$ grep -oP '(?<=Xms)\S+' file.txt
3g
8192m
Explanation:
-o: show only the part of a line matching PATTERN.
-P: interpret PATTERN as a Perl-compatible regular expression (PCRE).
(?<=Xms)\S+: matches each run of non-whitespace characters that immediately follows the string Xms.

How to grep two patterns at once

I often have to do some command-line task where I pipe to grep and want matches for two different expressions (an OR match: A OR B).
For example, I want to grep the output of generate_out for either foo[0-9]+ or bar[0-9]+. Of course, I could just run it twice:
generate_out| grep "foo[0-9]+"
generate_out| grep "bar[0-9]+"
but generate_out is often expensive and I would rather not run it twice (or store its output). Instead, I would like to use just one expression:
generate_out| grep "foo[0-9]+ OR bar[0-9]+"
Of course this will not work, but I would like an equivalent expression that does.

Use grep's -e option to specify multiple patterns that are OR'ed together:
$ seq 15 | grep -e 5 -e 3
3
5
13
15

Use an alternation in your regex:
generate_out | grep -E '(foo|bar)[0-9]+'
The use of -E enables ERE features, of which this is one. (By default, grep only supports BRE; some implementations of BRE, such as GNU's, may have special syntax for enabling ERE features. In the GNU case, \| in BRE is equivalent to | in ERE; however, it's not portable to rely on such extensions instead of just turning on ERE properly.)
egrep is a backwards-compatibility synonym for grep -E; however, only the latter is specified as a requirement by POSIX.
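To make the BRE/ERE difference concrete, here is a small sketch that compares the ERE alternation with a BRE-only spelling. The sample input is made up and stands in for the output of generate_out. Note that in POSIX BRE an unescaped + is an ordinary character, so the BRE version spells "one or more digits" as [0-9][0-9]* and uses one -e per alternative.

```shell
#!/bin/sh
# Hypothetical sample data standing in for the output of generate_out.
input=$(printf 'foo1\nbar22\nbaz3\nfoo\n')

# ERE: | is alternation and + means "one or more".
ere=$(printf '%s\n' "$input" | grep -E '(foo|bar)[0-9]+')

# BRE without extensions: one -e per pattern, [0-9][0-9]* instead of [0-9]+.
bre=$(printf '%s\n' "$input" | grep -e 'foo[0-9][0-9]*' -e 'bar[0-9][0-9]*')

printf 'ERE result:\n%s\n' "$ere"
printf 'BRE result:\n%s\n' "$bre"
```

Both commands read the input once and select the same lines, so either avoids running the expensive producer twice.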

Use awk for simplicity:
generate_out| awk '/foo[0-9]+/ || /bar[0-9]+/'
which of course could be simplified in this particular case to:
generate_out| awk '/(foo|bar)[0-9]+/'
but in general you want to use awk for simple, consistent ORs and ANDs of regexps:
cmd | grep -E 'foo.*bar|bar.*foo'
cmd | awk '/foo/ && /bar/'
cmd | grep 'foo' | grep -v 'bar'
cmd | awk '/foo/ && !/bar/'
cmd | grep -E 'foo|bar'
cmd | awk '/foo/ || /bar/' (or awk '/foo|bar/')
cmd | grep -E 'foo|bar' | grep -E -v 'foo.*bar|bar.*foo'
cmd | awk '(/foo/ && !/bar/) || (/bar/ && !/foo/)'
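The pairs above can be tried on a few sample lines. This is only a sketch: the input lines are invented for the demo, and the variable names are arbitrary.

```shell
#!/bin/sh
# Hypothetical sample lines standing in for the output of cmd.
input=$(printf 'foo only\nbar only\nfoo and bar\nbar then foo\nneither\n')

# AND: lines containing both foo and bar, in any order.
both=$(printf '%s\n' "$input" | awk '/foo/ && /bar/')

# foo AND NOT bar.
foo_not_bar=$(printf '%s\n' "$input" | awk '/foo/ && !/bar/')

# OR: lines containing foo or bar (or both).
either=$(printf '%s\n' "$input" | awk '/foo/ || /bar/')

# XOR: lines containing exactly one of foo and bar.
exactly_one=$(printf '%s\n' "$input" | awk '(/foo/ && !/bar/) || (/bar/ && !/foo/)')

# The grep spelling of AND from the list above, for comparison.
both_grep=$(printf '%s\n' "$input" | grep -E 'foo.*bar|bar.*foo')

printf 'AND:\n%s\nXOR:\n%s\n' "$both" "$exactly_one"
```

The awk conditions stay the same length as more terms are added, while the grep alternation for AND grows with every permutation of the terms.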

How to grep for filenames found by find in other files?

How can I grep for the result of find within another pattern?
That's how I get all filenames with a certain pattern (in my case ending with "ext1"):
find . -name *ext1 -printf "%f\n"
And then I want to grep for these filenames in files matching another pattern (in my case ending with "ext2"):
grep -r '[filename]' *ext2
I tried
find . -name *ext1 -printf "%f\n" | xargs grep -r *ext2
but this only makes grep tell me that it cannot find the files found by find.

You would tell grep that the patterns are in a file with the -f option, and use the "stdin filename" -:
find ... | grep -r -f - *ext2
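A minimal end-to-end sketch of this approach, with made-up file names in a scratch directory. Two details are additions rather than part of the answer: the -name pattern is quoted so the shell does not expand it before find sees it, and -F makes grep treat the file names as fixed strings, so the dot in a name is not a regex wildcard. The -printf action is the GNU find feature already used in the question.

```shell
#!/bin/sh
# Hypothetical scratch directory and file names, used only for this demo.
dir=$(mktemp -d)
touch "$dir/alpha.ext1" "$dir/beta.ext1"
printf 'uses alpha.ext1 here\nnothing\n' > "$dir/one.ext2"
printf 'mentions gamma.ext1\n' > "$dir/two.ext2"

# find prints the bare file names; grep reads them as patterns from stdin (-f -)
# and searches the *.ext2 files for any of them.
hits=$(cd "$dir" && find . -name '*.ext1' -printf '%f\n' | grep -F -f - ./*.ext2)

printf '%s\n' "$hits"
rm -rf "$dir"
```

Only one.ext2 mentions a file that actually exists as *.ext1, so it is the only hit.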

Search and replace in xib file

I am trying to search for some text in some of the xibs in my project and replace it with other text. I am using the command below, but it prints
"grep: warning: recursive search of stdin" and then waits forever.
grep -i -r --include=*.xib “$MSAwLjMxMTU4NDA0NDMgMC4wOTczNjMxNzM3NQA" myProjectPath | sort | uniq | xargs perl -e “s/$MSAwLjMxMTU4NDA0NDMgMC4wOTczNjMxNzM3NQA/$MC4xNTI5NDExODIzIDAuODA3ODQzMjA4MyAwLjE4MDM5MjE2MQA/" -pi
Please let me know where I am going wrong.
Thanks in advance.

The shell is expanding $MSAwLjMxMTU4NDA0NDMgMC4wOTczNjMxNzM3NQA as a variable because it is inside double quotes, and grep is producing output in the form "file: matching line", which sort | uniq does not reduce to file names. Use single quotes, escape the $, and pass -l so grep prints only the names of matching files:
grep -l -i -r --include='*.xib' '\$MSAwLjMxMTU4NDA0NDMgMC4wOTczNjMxNzM3NQA' myProjectPath | xargs perl -pi -e 's/\$MSAwLjMxMTU4NDA0NDMgMC4wOTczNjMxNzM3NQA/\$MC4xNTI5NDExODIzIDAuODA3ODQzMjA4MyAwLjE4MDM5MjE2MQA/'
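The quoting effect can be reproduced on a small scale. In this sketch, $Abc123 is a short hypothetical stand-in for the long token, the scratch files are invented, and -F (fixed-string matching) is used as an alternative to escaping the $ in the pattern.

```shell
#!/bin/sh
# Make sure the stand-in name is unset, as the long token would be in practice.
unset Abc123
dir=$(mktemp -d)
printf 'color="$Abc123"\n' > "$dir/a.xib"
printf 'color="other"\n' > "$dir/b.xib"

# Single quotes keep the $ literal; -l prints only the names of matching files.
quoted=$(grep -rl --include='*.xib' -F '$Abc123' "$dir")

# Double quotes let the shell expand the unset variable to an empty pattern,
# which matches every line, so every .xib file is listed.
expanded=$(grep -rl --include='*.xib' "$Abc123" "$dir" | wc -l)

printf 'single-quoted pattern matched: %s\n' "$quoted"
printf 'files matched by the expanded (empty) pattern: %s\n' "$expanded"
rm -rf "$dir"
```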

grep multiple extension current and subfolders

I'm trying to grep multiple extensions within the current and all sub-folders.
grep -i -r -n 'hello' somepath/*.{php,html}
This is only grepping the current folder but not sub-folders.
What would be a good way of doing this?

Using only grep:
grep -irn --include='*.php' --include='*.html' 'hello' somepath/

One of these:
find somepath '(' -name '*.php' -o -name '*.html' ')' -exec grep -i -n hello {} +
find somepath '(' -name '*.php' -o -name '*.html' ')' -print0 | xargs -0 grep -i -n hello
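As a quick check that the grep-only form and the find form select the same files, here is a sketch on a made-up directory tree. It uses -l to list file names instead of -n, purely to make the two results easy to compare.

```shell
#!/bin/sh
# Hypothetical scratch tree: one match at the top level, one in a sub-folder,
# and one in a file whose extension should be ignored.
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf 'Hello top\n' > "$dir/a.php"
printf 'hello nested\n' > "$dir/sub/b.html"
printf 'hello ignored\n' > "$dir/sub/c.txt"

# grep walks the tree itself and filters file names with --include.
with_include=$(grep -irl --include='*.php' --include='*.html' 'hello' "$dir" | sort)

# find selects the files and hands them to grep in batches.
with_find=$(find "$dir" '(' -name '*.php' -o -name '*.html' ')' -exec grep -il hello {} + | sort)

printf '%s\n' "$with_include"
rm -rf "$dir"
```

The brace glob in the question only expands to files directly under somepath, which is why the sub-folder match was missed there.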

I was looking for the same thing, and when I decided to write a bash script I started with vim codesearch and, surprise, I had already done this before!
#!/bin/bash
# Usage: codesearch PATH PATTERN [CONTEXT_LINES]
context="${3:-5}" # lines of context around each match, default 5
# sl = selected lines, mc = match in a context line, ms = match in a selected line, ln = line number
export GREP_COLORS="sl=32:mc=00;33:ms=05;40;31:ln="
# Each continuation needs a space before the backslash, or the options run together.
grep --color=always -n -a -R -i -C"$context" --exclude='*.mp*' \
  --exclude='*.avi' \
  --exclude='*.flv' \
  --exclude='*.png' \
  --exclude='*.gif' \
  --exclude='*.jpg' \
  --exclude='*.wav' \
  --exclude='*.rar' \
  --exclude='*.zip' \
  --exclude='*.gz' \
  --exclude='*.sql' "$2" "$1" | less -R
Paste this code into a file named codesearch and chmod it to 700 or 770.
I am posting it here so I can find it the next time I forget.
The script shows the matches and the surrounding context in color:
./codesearch '/full/path' 'string to search'
Optionally, pass the number of context lines (default 5):
./codesearch '/full/path' 'string to search' 3
I edited the code and added some eye candy, for example:
./codesearch ./ 'eval' 2
The matches blink when "allow blinking text" is enabled in the terminal.
