how to match a whole substring with delimiter in bash - grep

I have a String like this
"ABCD EFGH IJKL MNOP"
I take user input and want it to match one of these substrings as a whole, not as a part.
How do I do this?
This is what I am doing currently
printf "\n Enter user input:"
read userinput
INPUT=$userinput
if echo "ABCD EFGH IJKL MNOP" | grep -q "$INPUT"; then
echo "Matching...";
else
echo "Invalid entry ";
fi
The problem with the above code is that it also matches partial substrings such as "ABC" or "GH", which I do not want. I need the user input to be compared only against whole substrings separated by the delimiter.

Use grep's -w option to match only whole words:
echo "ABCD EFGH IJKL MNOP" | grep -w "$INPUT";
Example
>>> INPUT=ABC
>>> echo "ABCD EFGH IJKL MNOP" | grep -w "$INPUT";
>>>
>>> INPUT=ABCD
>>> echo "ABCD EFGH IJKL MNOP" | grep -w "$INPUT";
ABCD EFGH IJKL MNOP

grep -w
-w, --word-regexp
Select only those lines containing matches that form whole
words. The test is that the matching substring must either be
at the beginning of the line, or preceded by a non-word
constituent character. Similarly, it must be either at the end
of the line or followed by a non-word constituent character.
Word-constituent characters are letters, digits, and the
underscore.
Similar link: grep -w with only space as delimiter
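Note that -w treats every non-word character as a boundary, so with a list such as "AB-CD EFGH" the input AB would still be accepted. If only the space should count as a delimiter, a minimal sketch along these lines may help. is_whole_token is a hypothetical helper name, not part of grep or the shell, and the sketch assumes the tokens are separated by single spaces.

```shell
# Hypothetical helper: succeed only if $1 equals one space-separated token of $2.
is_whole_token() {
    # Reject empty input and input containing a space (it could span two tokens).
    case $1 in
        ''|*' '*) return 1 ;;
    esac
    # Pad the list with spaces so every token is surrounded by delimiters.
    # The quoted parts of the pattern are literal, so input such as "A*" is not a glob.
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

for candidate in ABCD ABC GH "EFGH IJKL"; do
    if is_whole_token "$candidate" "ABCD EFGH IJKL MNOP"; then
        echo "$candidate: Matching..."
    else
        echo "$candidate: Invalid entry"
    fi
done
```

Because the comparison is a plain string test, regex metacharacters typed by the user need no escaping.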

Related

How to display two substring from a line?

How can we find two substrings within a line in particular order using grep?
For example:
grep -c "word1" | grep -r "word2" logs
reports whether a string contains both word1 and word2. I am looking for strings of the form "... word1.... word2...".
Try a regex in grep like grep -E "word1.*word2"
$ echo -e 'both word1 and word2. \nI hich\n has "... word1.... word2..."' | grep -E "word1.*word2"
both word1 and word2.
has "... word1.... word2..."
You may need a better regex to match exactly the words, but that is not your question.
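If the two words are literal strings rather than patterns, one possible alternative is to compare positions with awk instead of building a regex. The sketch below is only an illustration: ordered_match is a hypothetical helper name, and it checks for plain substrings, not whole words. Note that awk -v interprets backslash escapes in the values it is given.

```shell
# Hypothetical helper: print the lines of stdin in which the literal string $1
# occurs and the literal string $2 occurs somewhere after it.
ordered_match() {
    awk -v a="$1" -v b="$2" '
        {
            i = index($0, a)    # position of the first a, 0 if absent
            # look for b only in the text that follows that first a
            if (i && index(substr($0, i + length(a)), b))
                print
        }'
}

printf '%s\n' 'both word1 and word2.' 'word2 before word1' 'only word1 here' |
    ordered_match word1 word2
# prints: both word1 and word2.
```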

Why these patterns return same result?

I saw this question: count (non-blank) lines-of-code in bash
I understand this pattern is correct.
grep -vc ^$ filename
Why does this pattern return the same result?
grep -c '[^ ]' filename
What is the trick in '[^ ]'?
$ printf 'foo 123\n \nxyz\n\t\n' > ip.txt
$ cat -T ip.txt
foo 123
xyz
^I
$ grep -vc '^$' ip.txt
4
$ grep -c '[^ ]' ip.txt
3
$ grep -c '[^[:blank:]]' ip.txt
2
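grep -vc '^$' counts every line that is not completely empty, so lines holding only spaces or tabs are still counted (4 here). grep -c '[^ ]' counts any line that has at least one non-space character: foo 123 is counted because letters are not spaces, and so is the tab-only line, but the space-only line is not (3). grep -c '[^[:blank:]]' also ignores the tab-only line (2). The patterns therefore agree only when the file has no whitespace-only lines; which one to use depends on whether such lines should be counted.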
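For comparison, the same "ignore whitespace-only lines" count can be sketched with awk, whose default field splitting already treats spaces and tabs as separators. count_nonblank is a hypothetical helper name used only for this illustration.

```shell
# Hypothetical helper: count the lines of file $1 that contain at least one
# character other than a space or a tab (NF is 0 for such lines).
count_nonblank() {
    awk 'NF { n++ } END { print n + 0 }' "$1"
}

tmp=$(mktemp)
printf 'foo 123\n \nxyz\n\t\n' > "$tmp"
count_nonblank "$tmp"    # prints 2, like grep -c '[^[:blank:]]'
rm -f "$tmp"
```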

How to grep repeated strings on a single line?

I have a file.txt with one line, whose content is
/app/jdk/java/bin/java -server -Xms3g -Xmx3g -XX:MaxPermSize=256m -Dweblogic.Name=O2pPod8_mapp_msrv1_1 -Djava.security.policy=/app/Oracle/Middleware/Oracle_Home/wlserver/server/lib/weblogic.policy -Djava.security.egd=file:/dev/./urandom -Dweblogic.ProductionModeEnabled=true -Dweblogic.system.BootIdentityFile=/app/Oracle/Middleware/Oracle_Home/user_projects/domains/O2pPod8_domain/servers/O2pPod8_mapp_msrv1_1/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.nmservice.RotationEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=false -Dweblogic.ReverseDNSAllowed=false -Xms8192m -Xmx8192m -XX:MaxPermSize=2048m -XX:NewSize=1300m -XX:MaxNewSize=1300m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
and when I do
cat file.txt | grep -io "Xms.*" | awk '{FS" "; print $1} ' | cut -d "s" -f2
output:
3g
Why is grep not reporting the second occurrence? I expect 3g and 8192m.
In fact, how do I print only 8192m in this case?
Your regex just says "find Xms followed by anything repeated 0 to n times". That returns the rest of the row from Xms onward.
What you actually want is something like "find Xms followed by anything until there's a whitespace repeated 0 to n times".
grep -io "Xms[^ ]*" file.txt | awk '{FS" "; print $1} ' | cut -d "s" -f2
Inside a bracket expression a leading ^ means "not", so [^ ] matches any single character that is not a space.
I'm not really sure what you are trying to achieve here but if you want the endings of all space-separated strings starting with -Xms, using bare awk is:
$ awk -v RS=" " '/^-Xms/{print substr($0,5)}' file
3g
8192m
Explained:
$ awk -v RS=" " ' # space separated records
/^-Xms/ { # strings starting with -Xms
print substr($0,5) # print starting from 5th position
}' file
If you wanted something else (word repeated in the title puzzles me a bit), please update the question with more detailed requirements.
Edit: I just noticed how do I print only 8192m in this case (that's the repeated maybe). Let's add a counter c and not print the first instance:
$ awk -v RS=" " '/^-Xms/&&++c>1{print substr($0,5)}' file
8192m
You could use grep -io "Xms[0-9]*[a-zA-Z]" instead of grep -io "Xms.*" so that each match is a run of digits followed by a single letter rather than the rest of the line:
cat file.txt | grep -io "Xms[0-9]*[a-zA-Z]" | awk '{FS" "; print $1} ' | cut -d "s" -f2
Hope this helps!
The .* in your regexp is matching the rest of the line, you need [^ ]* instead. Look:
$ grep -o 'Xms.*' file
Xms3g -Xmx3g -XX:MaxPermSize=256m -Dweblogic.Name=O2pPod8_mapp_msrv1_1 -Djava.security.policy=/app/Oracle/Middleware/Oracle_Home/wlserver/server/lib/weblogic.policy -Djava.security.egd=file:/dev/./urandom -Dweblogic.ProductionModeEnabled=true -Dweblogic.system.BootIdentityFile=/app/Oracle/Middleware/Oracle_Home/user_projects/domains/O2pPod8_domain/servers/O2pPod8_mapp_msrv1_1/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.nmservice.RotationEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=false -Dweblogic.ReverseDNSAllowed=false -Xms8192m -Xmx8192m -XX:MaxPermSize=2048m -XX:NewSize=1300m -XX:MaxNewSize=1300m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
$ grep -o 'Xms[^ ]*' file
Xms3g
Xms8192m
$ grep -o 'Xms[^ ]*' file | cut -d's' -f2
3g
8192m
$ grep -o 'Xms[^ ]*' file | cut -d's' -f2 | tail -1
8192m
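or, to print only the last occurrence with a single command (the leading .* is greedy, so it consumes everything up to the final Xms):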
$ sed 's/.*Xms\([^ ]*\).*/\1/' file
8192m
The positive lookbehind of PCRE (the form: (?<=RE1)RE2) can resolve the problem easily:
$ grep -oP '(?<=Xms)\S+' file.txt
3g
8192m
Explanation:
-o: show only the part of a line matching PATTERN.
-P: PATTERN is a Perl regular expression.
(?<=Xms)\S+: matches each run of non-whitespace characters that immediately follows the string Xms.
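Where grep -P is not available, roughly the same result can be sketched with tr and sed: split the command line into one token per line, then strip the -Xms prefix from the tokens that have it. xms_values is a hypothetical helper name, and the cmdline value below is an abbreviated stand-in for the long line in file.txt.

```shell
# Hypothetical helper: read a command line from stdin and print the value of
# every token that starts with -Xms, one per line.
xms_values() {
    tr -s ' ' '\n' | sed -n 's/^-Xms//p'
}

# Abbreviated sample, assumed to have the same shape as the line in file.txt.
cmdline='/app/jdk/java/bin/java -server -Xms3g -Xmx3g -Xms8192m -Xmx8192m'

printf '%s\n' "$cmdline" | xms_values               # prints 3g and 8192m
printf '%s\n' "$cmdline" | xms_values | tail -n 1   # prints only 8192m
```

Anchoring on the leading dash means an unrelated token that merely contains the text Xms is not picked up.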

Grep find pattern but print another line

I want to match all cyrillic characters, but print the ID to file. For example:
Author: Doe, John
Title: Оптимизация ресурсного потенциала промышленности города с учетом его конкурентных преимуществ
ID: 1234567
My current approach is to grep for cyrillic characters:
grep -i -r --include=*{rdf,redif,rdf~} --color="auto" -P -n '[\x{0400}-\x{04FF}]' > cyrillic.txt
How can I just print the ID line to a file and not the matching line?
Use the -A1 option if the ID: line comes right after the matching line, then pipe the result to another grep to keep only the ID: line.
grep -A1 -i -r --include=*{rdf,redif,rdf~} --color="auto" -P -n '[\x{0400}-\x{04FF}]' \
| grep 'ID: ' > cyrillic.txt
Use grep's -h flag to suppress the file names in the output. With -n you will then get output like:
4:string with matching pattern
5:string with matching pattern
7:string with matching pattern
Now you can pipe this output into awk and print only the first column, which is the line number of each match:
{your_grep} | awk -F ':' '{print $1}' > cyrillic.txt
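If the ID: line is not guaranteed to sit directly after the matching line, a small state machine in awk is one possible approach. The sketch below is an illustration only. ids_of_matching_records is a hypothetical helper name, the second record in the demo is made up, and the code assumes that every record ends with its ID: line and that the input is UTF-8.

```shell
# Hypothetical helper: read records from stdin and print the "ID: " line of
# every record in which an earlier line matched the regex given as $1.
# Assumes each record ends with its "ID: " line.
ids_of_matching_records() {
    LC_ALL=C awk -v pat="$1" '
        /^ID: / { if (hit) print; hit = 0; next }   # end of record: report, reset
        $0 ~ pat { hit = 1 }                        # remember a match in this record
    '
}

# Bytes 0xD0-0xD3 (octal 320-323) start every UTF-8 encoded character in the
# range U+0400-U+04FF, so in the C locale this class looks for Cyrillic text.
printf '%s\n' \
    'Author: Doe, John' \
    'Title: Оптимизация ресурсного потенциала' \
    'ID: 1234567' \
    'Author: Roe, Jane' \
    'Title: Plain ASCII title' \
    'ID: 7654321' |
    ids_of_matching_records '[\320-\323]'
```

The byte-range trick avoids a dependency on grep -P, at the cost of working only on UTF-8 input.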

grep is not working inside while loop

I have two files
File1
area a
area b
areaf
File2
area a :aaaa
area b:bbbb
area3:abc
areaf:hsg
area4:uhg
area5:yutr
while read -r line
do
grep -w ^$line File2 | cut -d ":" -f2
done < File1
Desired output
aaaa
bbbb
hsg
actual output
grep: can't open a
area a
grep: cant open b
area3:abc
areaf:hsg
area4:uhg
area5:yutr
But when I run grep -w ^"area a" File2 | cut -d ":" -f2 it gives the correct output:
aaaa
Please help me with this. I also tried a for loop, with no success. grep does not seem to work inside the loop.
Your variable line might contain special characters: for example, a space that the shell interprets as an argument separator, or characters that grep interprets as pattern metacharacters.
You need both to use fgrep (equivalent to grep -F) and to quote your variable (I'm not sure -w adds anything to that command; why do you feel the need for it?):
fgrep -w "$line"
But doing so you lose the ability to anchor the match at the first character of the line.
Another option, if the "start of line" match is required, is to escape the search string:
while read -r line
do
line=$(echo "$line" | sed -e 's/[]\/$*.^|[]/\\&/g')
grep -w "^$line" File2 | cut -d ":" -f2
done < File1
You can achieve the same result without a loop, since grep can read patterns from a file via the -f option (input1 and input2 below stand for File1 and File2). This will be more robust:
grep -f input1 input2 | cut -d: -f2
Gives:
aaaa
bbbb
hsg
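Keep in mind that grep -f treats each line of the first file as a pattern that may match anywhere in a line, so a short key such as area would select every line. If an exact comparison of the text before the colon is wanted, a lookup in awk is one possible sketch. lookup_values is a hypothetical helper name, and the code assumes each line of the second file has the form key:value with optional blanks before the colon.

```shell
# Hypothetical helper: $1 holds one key per line, $2 holds "key:value" lines.
# Prints the value of every line in $2 whose key appears verbatim in $1.
lookup_values() {
    awk -F':' '
        NR == FNR { keys[$0]; next }    # first file: remember each key line
        {
            key = $1
            sub(/[ \t]+$/, "", key)     # tolerate "area a :aaaa"
            if (key in keys) print $2
        }' "$1" "$2"
}

tmp=$(mktemp -d)
printf '%s\n' 'area a' 'area b' 'areaf' > "$tmp/File1"
printf '%s\n' 'area a :aaaa' 'area b:bbbb' 'area3:abc' 'areaf:hsg' 'area4:uhg' 'area5:yutr' > "$tmp/File2"
lookup_values "$tmp/File1" "$tmp/File2"    # prints aaaa, bbbb, hsg
rm -r "$tmp"
```

The values come out in the order of the second file, and no key is ever interpreted as a regular expression.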
