Why is \s not matching whitespace in my grep?

Both of the regexes below work in my case.
grep \s
grep ^[[:space:]]
However, all of those below fail. I tried in both Git Bash and PuTTY.
grep ^\s
grep ^\s*
grep -E ^\s
grep -P ^\s
grep ^[\s]
grep ^(\s)
The last one even produces a syntax error.
If I try ^\s in debuggex it works.
How do I find lines starting with whitespace characters with grep? Do I have to use [[:space:]]?

grep \s appears to work only because your input contains the letter s. In an unquoted argument the shell removes the backslash, so grep receives the plain pattern s and matches a literal s; it never sees a whitespace escape. With grep ^\\s, the shell turns \\ into a single literal \, so grep receives ^\s and matches lines that start with whitespace.
A better idea is to quote the pattern so the shell leaves the backslash alone, here combined with -E to enable POSIX ERE syntax:
grep -E '^\s' <<< "$s"
See the online demo:
s=' word'
grep ^\\s <<< "$s"
# => word
grep -E '^\s' <<< "$s"
# => word
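To see why the unquoted forms fail, it can help to print what the shell actually hands to grep. The sketch below assumes bash and GNU grep; the variable names are only for illustration.

```shell
# What grep receives once the shell has processed each argument (bash assumed).
unquoted=$(printf '%s' ^\s)      # shell drops the backslash: ^s
escaped=$(printf '%s' ^\\s)      # \\ collapses to one backslash: ^\s
quoted=$(printf '%s' '^\s')      # single quotes pass it through: ^\s

printf 'unquoted=%s escaped=%s quoted=%s\n' "$unquoted" "$escaped" "$quoted"

# Portable alternative that needs no backslash at all: a POSIX bracket class.
hits=$(printf ' word\nword\n\tword\n' | grep -c '^[[:space:]]')
echo "lines starting with whitespace: $hits"
```

So the answer to the last question is no: [[:space:]] is not required, but it is the form that does not depend on quoting or on \s being supported.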

Related

How to display two substring from a line?

How can we find two substrings within a line in particular order using grep?
For example:
grep -c "word1" | grep -r "word2" logs
matches if a line contains both word1 and word2, in any order. I am looking for lines in which word1 comes before word2, i.e. "... word1.... word2..."
Try a regex in grep like grep -E "word1.*word2"
$ echo -e 'both word1 and word2. \nI hich\n has "... word1.... word2..."' | grep -E "word1.*word2"
both word1 and word2.
has "... word1.... word2..."
You may need a better regex to match exactly the words, but that is not your question.
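As a rough illustration of the "better regex" remark, the sketch below compares the plain ordered match with a whole-word variant. It assumes GNU grep, where \< and \> mark the start and end of a word; the sample lines are made up.

```shell
# Three sample lines: right order, wrong order, and word1 buried in "password1".
lines=$(printf '%s\n' 'word1 then word2' 'word2 then word1' 'password1 and word2')

# Order only: word1 somewhere before word2 on the same line.
ordered=$(printf '%s\n' "$lines" | grep -E 'word1.*word2')

# Order plus whole words: \< and \> anchor the start and end of a word (GNU grep).
whole=$(printf '%s\n' "$lines" | grep -E '\<word1\>.*\<word2\>')

printf 'ordered:\n%s\nwhole words:\n%s\n' "$ordered" "$whole"
```

The first pattern also accepts the "password1" line; the second keeps only the line where both words stand alone.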

How to grep with regex lookahead

I can't see what I'm missing in my grep command, can you?
http://regexr.com/5shri
echo "2021-05-09 15:38:56.888 T:1899877296 NOTICE: VideoPlayer::OpenFile:plugin://plugin.video.arteplussept/play/SHOW/069083-002-A" | grep -oE "\w+(?=\/play)/g" -
Expect: arteplussept
You need to:
Use the PCRE regex engine with the -P option, not -E (which stands for POSIX ERE), because lookaheads are a PCRE feature.
Remove /g: grep -o already extracts all matches, so there is no need to embed this modifier in the pattern.
Stop escaping /, since it is not a special character in the pattern.
So, you can just use
grep -oP '\w+(?=/play)'
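A minimal sketch of that command against the log line from the question, assuming a GNU grep built with PCRE support. The second variant is an assumed fallback for a grep without -P, not part of the original answer.

```shell
line='2021-05-09 15:38:56.888 T:1899877296 NOTICE: VideoPlayer::OpenFile:plugin://plugin.video.arteplussept/play/SHOW/069083-002-A'

# PCRE lookahead: match word characters only where /play follows, without consuming it.
with_pcre=$(printf '%s\n' "$line" | grep -oP '\w+(?=/play)')

# Without -P: match the word plus /play, then strip the suffix with sed.
without_pcre=$(printf '%s\n' "$line" | grep -oE '[[:alnum:]_]+/play' | sed 's|/play$||')

echo "$with_pcre"
echo "$without_pcre"
```

Both variants should print arteplussept for this input.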

How to grep repeated strings on a single line?

I have a file.txt with one line, whose content is
/app/jdk/java/bin/java -server -Xms3g -Xmx3g -XX:MaxPermSize=256m -Dweblogic.Name=O2pPod8_mapp_msrv1_1 -Djava.security.policy=/app/Oracle/Middleware/Oracle_Home/wlserver/server/lib/weblogic.policy -Djava.security.egd=file:/dev/./urandom -Dweblogic.ProductionModeEnabled=true -Dweblogic.system.BootIdentityFile=/app/Oracle/Middleware/Oracle_Home/user_projects/domains/O2pPod8_domain/servers/O2pPod8_mapp_msrv1_1/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.nmservice.RotationEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=false -Dweblogic.ReverseDNSAllowed=false -Xms8192m -Xmx8192m -XX:MaxPermSize=2048m -XX:NewSize=1300m -XX:MaxNewSize=1300m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
and when I do
cat file.txt | grep -io "Xms.*" | awk '{FS" "; print $1} ' | cut -d "s" -f2
output:
3g
Why is grep not reading the second occurrence? I expect 3g and 8192m.
In fact, how do I print only 8192m in this case?
Your regex just says "find Xms followed by anything repeated 0 to n times". That returns the rest of the row from Xms onward.
What you actually want is something like "find Xms followed by any run of characters that are not a space", so that the match stops at the next space:
grep -io "Xms[^ ]*" file.txt | awk '{FS" "; print $1} ' | cut -d "s" -f2
Inside the brackets, a leading ^ means "not", so [^ ] matches any character that is not a space.
I'm not really sure what you are trying to achieve here, but if you want the endings of all space-separated strings starting with -Xms, plain awk can do it:
$ awk -v RS=" " '/^-Xms/{print substr($0,5)}' file
3g
8192m
Explained:
$ awk -v RS=" " ' # space separated records
/^-Xms/ { # strings starting with -Xms
print substr($0,5) # print starting from 5th position
}' file
If you wanted something else (word repeated in the title puzzles me a bit), please update the question with more detailed requirements.
Edit: I just noticed how do I print only 8192m in this case (that's the repeated maybe). Let's add a counter c and not print the first instance:
$ awk -v RS=" " '/^-Xms/&&++c>1{print substr($0,5)}' file
8192m
You could use grep -io "Xms[0-9]*[a-zA-Z]" instead of grep -io "Xms.*" to match a sequence of digits followed by a single letter, rather than the entire rest of the line:
cat file.txt | grep -io "Xms[0-9]*[a-zA-Z]" | awk '{FS" "; print $1} ' | cut -d "s" -f2
Hope this helps!
The .* in your regexp is matching the rest of the line, you need [^ ]* instead. Look:
$ grep -o 'Xms.*' file
Xms3g -Xmx3g -XX:MaxPermSize=256m -Dweblogic.Name=O2pPod8_mapp_msrv1_1 -Djava.security.policy=/app/Oracle/Middleware/Oracle_Home/wlserver/server/lib/weblogic.policy -Djava.security.egd=file:/dev/./urandom -Dweblogic.ProductionModeEnabled=true -Dweblogic.system.BootIdentityFile=/app/Oracle/Middleware/Oracle_Home/user_projects/domains/O2pPod8_domain/servers/O2pPod8_mapp_msrv1_1/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.nmservice.RotationEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=false -Dweblogic.ReverseDNSAllowed=false -Xms8192m -Xmx8192m -XX:MaxPermSize=2048m -XX:NewSize=1300m -XX:MaxNewSize=1300m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
$ grep -o 'Xms[^ ]*' file
Xms3g
Xms8192m
$ grep -o 'Xms[^ ]*' file | cut -d's' -f2
3g
8192m
$ grep -o 'Xms[^ ]*' file | cut -d's' -f2 | tail -1
8192m
or more concisely:
$ sed 's/.*Xms\([^ ]*\).*/\1/' file
8192m
The positive lookbehind of PCRE (the form: (?<=RE1)RE2) can resolve the problem easily:
$ grep -oP '(?<=Xms)\S+' file.txt
3g
8192m
Explanation:
-o: show only the part of a line matching PATTERN.
-P: PATTERN is a Perl regular expression.
(?<=Xms)\S+: matches each run of non-whitespace characters that immediately follows the string Xms, without including Xms in the match.
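The lookbehind can be tried on a shortened stand-in for the command line, as sketched below. The sample line is abbreviated from the question and the variable names are illustrative; GNU grep with PCRE support is assumed.

```shell
# Shortened stand-in for the java command line in file.txt.
line='java -server -Xms3g -Xmx3g -Dweblogic.ReverseDNSAllowed=false -Xms8192m -Xmx8192m'

# Lookbehind keeps Xms out of the match; \S+ stops at the next whitespace.
all=$(printf '%s\n' "$line" | grep -oP '(?<=Xms)\S+')

# grep -o prints one match per line, so tail picks the last occurrence.
last=$(printf '%s\n' "$all" | tail -n 1)

printf 'all:\n%s\nlast: %s\n' "$all" "$last"
```

Piping through tail -n 1 is one way to answer the "only 8192m" part of the question without changing the pattern.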

Represent digits using a regular expression

grep -w "ing_[0-9][0-9][0-9][0-9]"
The command mentioned above is working. But is there a shorter way to write 4 digits?
This does not work:
grep -w "ing_[0-9]\+ {4}"
By default, grep uses Basic Regular Expressions (BRE). In BRE, you need to escape the curly braces so that they are treated as a repetition quantifier.
grep -w "ing_[0-9]\{4\}" file
Example:
$ echo 'ing_6786 says' | grep -w "ing_[0-9]{4}"
$ echo 'ing_6786 says' | grep -w "ing_[0-9]\{4\}"
ing_6786 says
If your grep supports Perl-compatible regular expressions (PCRE), the -P option accepts the braces without escaping:
grep -wP "ing_[0-9]{4}"
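For comparison, here is a small sketch of the same four-digit match in BRE and in ERE (-E), which also takes unescaped braces. The five-digit sample is an added assumption to show what -w contributes.

```shell
sample='ing_6786 says'

# BRE (grep's default): the interval braces must be escaped.
bre=$(printf '%s\n' "$sample" | grep -w 'ing_[0-9]\{4\}')

# ERE (-E): braces are quantifiers without escaping.
ere=$(printf '%s\n' "$sample" | grep -wE 'ing_[0-9]{4}')

# -w rejects a fifth digit, because the match must end at a word boundary.
five=$(printf '%s\n' 'ing_67860 says' | grep -cw 'ing_[0-9]\{4\}' || true)

printf 'bre=%s\nere=%s\nfive-digit matches=%s\n' "$bre" "$ere" "$five"
```

Single quotes are used so the shell passes the backslashes and braces to grep unchanged.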

Match specific word with grep -e

I am trying to use grep to print only lines that start with a specific pattern. Here is an example
$SERVER_IP = 2.2.2.2
$SERVER_IP_PORT = 1111
$SERVER_IP_XXX = blablabla
I want grep to print only $SERVER_IP = 2.2.2.2 and not the other two lines.
I tried the command below but it did not work
grep -e "^\s*\$SERVER_IP$"
If I try:
grep -e "^\s*\$SERVER_IP"
grep will print all three lines
How can I accomplish this using grep -e or egrep? Thank you
grep -e "^\s*\$SERVER_IP\>"
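The \> matches the empty string at the end of a word, i.e. the place where a word character is followed by a non-word character. Since _ is a word character, $SERVER_IP_PORT and $SERVER_IP_XXX do not match.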
Use grep -e '^\$SERVER_IP =' to match any line that starts with $SERVER_IP =
If you have awk, you can do:
awk '$1=="$SERVER_IP"' file
$SERVER_IP = 2.2.2.2
The == makes it match only when field 1 is exactly $SERVER_IP
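The three suggestions can be checked side by side with a throwaway file, as in this sketch. It assumes GNU grep (for \s and \>) and uses a temporary file created with mktemp in place of the real config; patterns are single-quoted so the shell does not expand $SERVER_IP.

```shell
conf=$(mktemp)
cat > "$conf" <<'EOF'
$SERVER_IP = 2.2.2.2
$SERVER_IP_PORT = 1111
$SERVER_IP_XXX = blablabla
EOF

# End-of-word anchor: _ counts as a word character, so _PORT and _XXX are rejected.
by_boundary=$(grep -e '^\s*\$SERVER_IP\>' "$conf")

# Literal prefix including the " =" that follows the name.
by_prefix=$(grep -e '^\$SERVER_IP =' "$conf")

# Exact comparison of the first whitespace-separated field.
by_field=$(awk '$1=="$SERVER_IP"' "$conf")

rm -f "$conf"
printf '%s\n' "$by_boundary" "$by_prefix" "$by_field"
```

Each of the three commands should select only the $SERVER_IP = 2.2.2.2 line.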
