How to grep repeated strings on a single line?

I have a file, file.txt, with one line, whose content is
/app/jdk/java/bin/java -server -Xms3g -Xmx3g -XX:MaxPermSize=256m -Dweblogic.Name=O2pPod8_mapp_msrv1_1 -Djava.security.policy=/app/Oracle/Middleware/Oracle_Home/wlserver/server/lib/weblogic.policy -Djava.security.egd=file:/dev/./urandom -Dweblogic.ProductionModeEnabled=true -Dweblogic.system.BootIdentityFile=/app/Oracle/Middleware/Oracle_Home/user_projects/domains/O2pPod8_domain/servers/O2pPod8_mapp_msrv1_1/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.nmservice.RotationEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=false -Dweblogic.ReverseDNSAllowed=false -Xms8192m -Xmx8192m -XX:MaxPermSize=2048m -XX:NewSize=1300m -XX:MaxNewSize=1300m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
and when I do
cat file.txt | grep -io "Xms.*" | awk '{FS" "; print $1} ' | cut -d "s" -f2
output:
3g
Why is grep not reading the second occurrence? I expect 3g and 8192m.
In fact, how do I print only 8192m in this case?

Your regex just says "find Xms followed by any character, repeated 0 to n times". Because .* is greedy, that matches the rest of the line from the first Xms onward, so grep -o reports a single match.
What you actually want is "find Xms followed by any non-space character, repeated 0 to n times", so that each match stops at the next space:
grep -io "Xms[^ ]*" file.txt | cut -d "s" -f2
Inside a bracket expression, a leading ^ means "not", so [^ ] matches any single character except a space.
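The difference between the two patterns can be checked on a shortened line. This is only a sketch: the sample line below is a made-up stand-in for the contents of file.txt, and it assumes a grep that supports -o (GNU and BSD grep do).

```shell
# Hypothetical, shortened stand-in for the single line in file.txt
line='java -server -Xms3g -Xmx3g -Dfoo=bar -Xms8192m -Xmx8192m'

# Greedy: .* runs to the end of the line, so there is only one match
greedy=$(printf '%s\n' "$line" | grep -o 'Xms.*')

# Bounded: [^ ]* stops at the next space, so each occurrence matches separately
bounded=$(printf '%s\n' "$line" | grep -o 'Xms[^ ]*')

# Keep only the last occurrence and strip the Xms prefix
last=$(printf '%s\n' "$bounded" | tail -n 1 | sed 's/^Xms//')

printf '%s\n' "$greedy" "$bounded" "$last"
```

Using sed 's/^Xms//' instead of cut -d "s" avoids depending on where the letter s happens to appear in the match.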

I'm not really sure what you are trying to achieve here, but if you want the endings of all space-separated strings starting with -Xms, a plain awk solution is:
$ awk -v RS=" " '/^-Xms/{print substr($0,5)}' file
3g
8192m
Explained:
$ awk -v RS=" " ' # space separated records
/^-Xms/ { # strings starting with -Xms
print substr($0,5) # print starting from 5th position
}' file
If you wanted something else (the word "repeated" in the title puzzles me a bit), please update the question with more detailed requirements.
Edit: I just noticed "how do I print only 8192m in this case" (maybe that's the "repeated" part). Let's add a counter c and skip the first instance:
$ awk -v RS=" " '/^-Xms/&&++c>1{print substr($0,5)}' file
8192m
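If the number of occurrences is not known in advance, a variation on the counter idea is to remember each value and print only the last one. A minimal sketch, assuming a made-up, shortened line in place of file.txt:

```shell
# Hypothetical, shortened stand-in for the single line in file.txt
line='java -server -Xms3g -Xmx3g -Xms8192m -Xmx8192m'

# RS=' ' makes every space-separated token its own record;
# v is overwritten on each -Xms token, so END sees the last one
last_xms=$(printf '%s\n' "$line" |
    awk -v RS=' ' '/^-Xms/ { v = substr($0, 5) } END { print v }')

printf '%s\n' "$last_xms"
```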

You could use grep -io "Xms[0-9]*[a-zA-Z]" instead of grep -io "Xms.*" to match a sequence of digits followed by a single letter, rather than the entire rest of the line:
cat file.txt | grep -io "Xms[0-9]*[a-zA-Z]" | awk '{FS" "; print $1} ' | cut -d "s" -f2
Hope this helps!

The .* in your regexp is matching the rest of the line; you need [^ ]* instead. Look:
$ grep -o 'Xms.*' file
Xms3g -Xmx3g -XX:MaxPermSize=256m -Dweblogic.Name=O2pPod8_mapp_msrv1_1 -Djava.security.policy=/app/Oracle/Middleware/Oracle_Home/wlserver/server/lib/weblogic.policy -Djava.security.egd=file:/dev/./urandom -Dweblogic.ProductionModeEnabled=true -Dweblogic.system.BootIdentityFile=/app/Oracle/Middleware/Oracle_Home/user_projects/domains/O2pPod8_domain/servers/O2pPod8_mapp_msrv1_1/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.nmservice.RotationEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=false -Dweblogic.ReverseDNSAllowed=false -Xms8192m -Xmx8192m -XX:MaxPermSize=2048m -XX:NewSize=1300m -XX:MaxNewSize=1300m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
$ grep -o 'Xms[^ ]*' file
Xms3g
Xms8192m
$ grep -o 'Xms[^ ]*' file | cut -d's' -f2
3g
8192m
$ grep -o 'Xms[^ ]*' file | cut -d's' -f2 | tail -1
8192m
or more concisely (the leading greedy .* consumes everything up to the last Xms, so only the last value is printed):
$ sed 's/.*Xms\([^ ]*\).*/\1/' file
8192m

The positive lookbehind of PCRE (the form: (?<=RE1)RE2) can resolve the problem easily:
$ grep -oP '(?<=Xms)\S+' file.txt
3g
8192m
Explanation:
-o: show only the part of a line matching PATTERN.
-P: PATTERN is a Perl regular expression.
(?<=Xms)\S+: matches all continuous non-whitespace strings which are just following the string Xms.

Related

Why do these patterns return the same result?

I saw this question: count (non-blank) lines-of-code in bash
I understand that this pattern is correct:
grep -vc ^$ filename
Why does this pattern return the same result?
grep -c '[^ ]' filename
What is the trick in '[^ ]'?

$ printf 'foo 123\n \nxyz\n\t\n' > ip.txt
$ cat -T ip.txt
foo 123
xyz
^I
$ grep -vc '^$' ip.txt
4
$ grep -c '[^ ]' ip.txt
3
$ grep -c '[^[:blank:]]' ip.txt
2
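grep -c '[^ ]' counts any line that contains at least one character other than a space. For example, foo 123 is counted because letters are not space characters, while an empty line is not counted because it has no characters at all. The two commands therefore give the same count whenever the file has no lines consisting only of spaces. Which one to use depends on whether such lines should be counted or not.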

grep for pattern with special character and output only matched string

Team,
I want to grep for a substring containing - and then output only that string, not the whole line. How can I do that? I know I can split on spaces with awk and pull the field using $, but I want to know how to do it in grep.
echo $test_pods_info | grep -F 'test-'
output
test-78ac951e-89a6-4199-87a4-db8a1b8b054f export-9b55f0d5-071d-431-1d2ux0-avexport-xavierisp-sjc4--a4dd85-102 1/1 Running 0 19h
expected output
test-78ac951e-89a6-4199-87a4-db8a1b8b054f

awk is more suitable for this, since you want the first field of a matching line:
awk '/test-/{print $1}' <<< "$test_pods_info"
test-78ac951e-89a6-4199-87a4-db8a1b8b054f
If you really want to use grep, then this might be what you're looking for:
grep -o 'test-\S*' <<< "$test_pods_info"
or:
grep -o 'test-[^[:space:]]*' <<< "$test_pods_info"
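As a self-contained check of the grep -o approach, here is a small sketch. The first sample line is the one from the question; the second line is made up to show that non-matching lines produce no output. The variable name test_pods_info is taken from the question.

```shell
# Sample data: the first line is from the question, the second is made up
test_pods_info='test-78ac951e-89a6-4199-87a4-db8a1b8b054f export-9b55f0d5-071d-431-1d2ux0-avexport-xavierisp-sjc4--a4dd85-102 1/1 Running 0 19h
other-pod-1234 something-else 1/1 Running 0 2h'

# Quote the variable so the newline between the two lines survives;
# [^[:space:]]* extends the match up to the next whitespace character
pod=$(printf '%s\n' "$test_pods_info" | grep -o 'test-[^[:space:]]*')

printf '%s\n' "$pod"
```

The bracket form [^[:space:]] is used here rather than \S because \S is a GNU extension.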

Try
echo $test_pods_info | grep -o 'test-'
the -o option is:
show[ing] only the part of a line matching PATTERN
according to grep --help. Of course, this will only print test-, so you'll need to extend the regex to cover the rest of the name:
grep -oE 'test-[^[:space:]]+'

Figured it out:
echo $test_pods_info | grep -o "test-\w*-\w*-\w*-\w*-\w*"
output
test-78ac951e-89a6-4199-87a4-db8a1b8b054f
But I wish there were a simpler way, like test-*.

How to grep lines non-repeatedly for same command?

I have a space-separated file that looks like this:
$ cat in_file
GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1 Chal_sti_synt_C
GCF_000046845.1_ASM4684v1_protein.faa WP_004927566.1 Chal_sti_synt_C
GCF_000046845.1_ASM4684v1_protein.faa WP_004919950.1 FAD_binding_3
GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1 FAD_binding_3
I am using the following shell script utilizing grep to search for strings:
$ cat search_script.sh
grep "GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1" Pfam_anntn_temp.txt
grep "GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1" Pfam_anntn_temp.txt
The problem is that I want each grep command to return only the first instance of the string it finds exclusive of the previous identical grep command's output.
I need an output which would look like this:
$ cat out_file
GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1 Chal_sti_synt_C
GCF_000046845.1_ASM4684v1_protein.faa WP_004920342.1 FAD_binding_3
in which line 1 is exclusively the output of the first grep command and line 2 is exclusively the output of the second grep command. How do I do it?
P.S. I am running this on a big file (>125,000 lines). So, search_script.sh is mostly composed of unique grep commands. It is the identical commands' execution that is messing up my downstream analysis.

I'm assuming you are generating search_script.sh automatically from the contents of in_file. If you can count how many times you'll repeat the same grep command you can just use grep once and use head, for example if you know you'll be using it 2 times:
grep "foo" bar.txt | head -2
Will output the first 2 occurrences of "foo" in bar.txt.
If you have to do the grep commands separately, for example if you have other code in between the grep commands, you can mix head and tail:
grep "foo" bar.txt | head -1 | tail -1
Some other commands...
grep "foo" bar.txt | head -2 | tail -1
head -n displays the first n lines of the input
tail -n displays the last n lines of the input
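The head/tail combination can also be written as a single awk call that stops reading as soon as the wanted match is found. This is a sketch only: nth_match is a hypothetical helper name, and the sample data is a made-up, shortened version of in_file.

```shell
# nth_match N STRING FILE (hypothetical helper): print only the Nth line
# of FILE that contains STRING as a literal substring, then stop reading.
nth_match() {
    awk -v n="$1" -v s="$2" 'index($0, s) && ++seen == n { print; exit }' "$3"
}

# Made-up sample data in the same shape as in_file
tmp=$(mktemp)
printf '%s\n' \
    'a.faa WP_1 Chal_sti_synt_C' \
    'a.faa WP_2 Chal_sti_synt_C' \
    'a.faa WP_1 FAD_binding_3' > "$tmp"

first=$(nth_match 1 'a.faa WP_1' "$tmp")
second=$(nth_match 2 'a.faa WP_1' "$tmp")
rm -f "$tmp"

printf '%s\n' "$first" "$second"
```

index() compares literally, so the dots in the search string are not treated as regex wildcards. Note that awk still interprets backslash escapes in values passed with -v.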
If you really MUST always use the same command, but ensure that the outputs always differ, the only way I can think of to achieve this is using temporary files and a complex sequence of commands:
cat foo.bar.txt.tmp 2>&1 | xargs -I xx echo "| grep -v \\'xx\\' " | tr '\n' ' ' | xargs -I xx sh -c "grep 'foo' bar.txt xx | head -1 | tee -a foo.bar.txt.tmp"
So to explain this command, given foo as a search string and bar.txt as the filename, then foo.bar.txt.tmp is a unique name for a temporary file. The temporary file will hold the strings that have already been output:
cat foo.bar.txt.tmp 2>&1 : outputs the contents of the temporary file. If none is present, will output an error message to stdout, (important because if the output was empty the rest of the command wouldn't work.)
xargs -I xx echo "| grep -v \\'xx\\' " adds | grep -v to the start of each line in the temporary file, grep -v something excludes lines that include something.
tr '\n' ' ' replaces newlines with spaces, to have on a single string a sequence of grep -vs.
xargs -I xx sh -c "grep 'foo' bar.txt xx | head -1 | tee -a foo.bar.txt.tmp" runs a new command, grep 'foo' bar.txt xx | head -1 | tee -a foo.bar.txt.tmp, replacing xx with the previous output. xx should be the sequence of grep -vs that exclude previous outputs.
head -1 makes sure only one line is output at a time
tee -a foo.bar.txt.tmp appends the new output to the temporary file.
Just be sure to clear the temporary files, rm *.tmp, at the end of your script.

If I understand the question correctly and you want to remove duplicates based on the last field of each line, try the following (this is an easy task for awk):
awk '!a[$NF]++' Input_file

grep -v under double quotes query

We have a portion of code which states,
"diff file1 file2 | /usr/bin/grep -v "#" | /usr/bin/grep ^\> | /usr/bin/awk '{print $3}' | /usr/bin/xargs mkdir"
The whole statement is enclosed in double quotes (this is a requirement of the application syntax). When the application reaches this stage, it gives the grep error.
The statement works well on the command line, but when run through the application it gives this grep error:
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.
So I am not sure whether the first grep or the second grep is the problem.

Seems like a problem with double quotes. Try changing your first grep to /usr/bin/grep -v '#' and the second grep to /usr/bin/grep '^>'

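Your second grep uses ^\> outside any quotes, and to the shell an unquoted > means "redirect".
If you for example do:
grep ^>output
all the output will be stored in the file output.
So what you need to do is quote ^> so that it is interpreted as the pattern you are looking for. Because the whole statement is itself wrapped in double quotes, use single quotes inside it so the inner quotes do not end the outer string:
"diff file1 file2 | /usr/bin/grep -v '#' | /usr/bin/grep '^>' | /usr/bin/awk '{print $3}' | /usr/bin/xargs mkdir"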
By the way, note that both greps can be folded into the awk command:
diff file1 file2 | awk '!/#/ && /^>/ {print $3}' | /usr/bin/xargs mkdir
Here !/#/ keeps lines that do not contain # (like grep -v "#") and /^>/ keeps lines that start with > (like grep '^>').
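The original pipeline keeps lines that do not contain # and that do start with >, so a single awk condition with the same meaning has to combine a negated match and an anchored match with &&. The sketch below compares both forms on made-up input that only imitates the shape of diff output:

```shell
# Made-up lines imitating diff output; only the 1st and 4th should survive
input='> new dir1
> # dir2
< old dir3
> new dir4'

# Original form: two greps followed by awk
via_grep=$(printf '%s\n' "$input" | grep -v '#' | grep '^>' | awk '{print $3}')

# Folded form: both conditions inside one awk pattern
via_awk=$(printf '%s\n' "$input" | awk '!/#/ && /^>/ {print $3}')

printf '%s\n' "$via_grep" "$via_awk"
```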

grep is not working inside while loop

I have two files
File1
area a
area b
areaf
File2
area a :aaaa
area b:bbbb
area3:abc
areaf:hsg
area4:uhg
area5:yutr
while read -r line
do
grep -w ^$line File2 | cut -d ":" -f2
done < File1
Desired output
aaaa
bbbb
hsg
actual output
grep: can't open a
area a
grep: cant open b
area3:abc
areaf:hsg
area4:uhg
area5:yutr
But when I run grep -w ^"area a" File2 | cut -d ":" -f2 it gives the correct output:
aaaa
Please assist me with this. I also tried a for loop, with no success. grep is not working inside the loop.

Your variable line might contain "special characters": for example, a space, which the shell interprets as an argument separator when the variable is unquoted, or characters that grep interprets as pattern metacharacters.
You need both to use fgrep (grep -F) and to quote your variable (I'm not sure -w adds anything to that command; why do you feel the need for it?):
fgrep -w "$line" File2
But doing so, you lose the ability to anchor the match at the start of the line.
Another option, if the "start of line" match is required, is to escape the search string:
while read -r line
do
line=$(echo "$line" | sed -e 's/[]\/$*.^|[]/\\&/g')
grep -w "^$line" File2 | cut -d ":" -f2
done < File1

You can achieve the same result without a loop, since grep can read patterns from a file via the -f option. This will be more robust:
grep -f File1 File2 | cut -d: -f2
Gives:
aaaa
bbbb
hsg
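Both the quoted loop and the loop-free -f form can be tried on the data from the question. The sketch below recreates File1 and File2 in a temporary directory (the directory is an assumption made only to keep the example self-contained):

```shell
# Recreate the two files from the question in a scratch directory
dir=$(mktemp -d)
printf '%s\n' 'area a' 'area b' 'areaf' > "$dir/File1"
printf '%s\n' 'area a :aaaa' 'area b:bbbb' 'area3:abc' \
    'areaf:hsg' 'area4:uhg' 'area5:yutr' > "$dir/File2"

# Loop version: quoting "^$line" keeps "area a" as a single argument
loop_out=$(while IFS= read -r line; do
    grep -w "^$line" "$dir/File2" | cut -d ':' -f2
done < "$dir/File1")

# Loop-free version: -f reads one pattern per line from File1
file_out=$(grep -f "$dir/File1" "$dir/File2" | cut -d ':' -f2)

rm -rf "$dir"
printf '%s\n' "$loop_out" "$file_out"
```

In both forms each line of File1 is still treated as a regular expression, so lines containing characters such as . or * would need escaping or grep -F.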
