awk version issue - convert hex to decimal - parsing

I usually write scripts on my Mac and, once they are ready, I sftp them to my test box at work. The issue I am facing is that I have a stream of data that is an IP address in hex format. I am using a mix of sed and awk to parse it and convert it into a more readable format.
$ echo $content12
cb5c860100000000000000000000000000
[DoD#MBP-13~] echo $content12 |
sed -e 's/../&./g' -e 's/.$//' | sed 's/[0-9a-z][0-9a-z]/0x&/g' |
awk -F"." '{for (i=1;i<NF;i++) printf ("%d\n", $i)}' |
awk '{if (NR<5) printf $0; printf "."}' | sed 's/\.\.*$//'
203.92.134.1
When I ported this to my test box at work, the script did not work as expected.
$ echo $content12 |
sed -e 's/../&./g' -e 's/.$//' | sed 's/[0-9a-z][0-9a-z]/0x&/g' |
awk -F"." '{for (i=1;i<NF;i++) printf ("%d\n", $i)}' |
awk '{if (NR<5) printf $0; printf "."}' | sed 's/\.\.*$//'
0.0.0.0
Version of awk and uname on my mac -
[DoD#MBP-13~] awk --version
awk version 20070501
[DoD#MBP-13~] uname -a
Darwin MBP-13.local 11.2.0 Darwin Kernel Version 11.2.0: Tue Aug 9 20:54:00 PDT 2011;
root:xnu-1699.24.8~1/RELEASE_X86_64 x86_64
Version of awk and uname on my test box at work -
$ awk --version
GNU Awk 3.1.5
Copyright (C) 1989, 1991-2005 Free Software Foundation
$ uname -a
Linux 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010
x86_64 x86_64 x86_64 GNU/Linux
Is this something I can fix with minor changes? I am still very new to the UNIX environment, so my one-liner may seem abnormally long to you. Any suggestions would be greatly appreciated.

You can use the --non-decimal-data option of gawk to cause it to handle octal and hex numbers in the input:
$ echo 0x10 | gawk --non-decimal-data '{ printf "%d", $1 }'
16
versus:
$ echo 0x10 | gawk '{ printf "%d", $1 }'
0
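If the --non-decimal-data flag is not available, one option is to convert the hex digits by hand so the result does not depend on how a particular awk parses 0x strings. The following is a minimal sketch, not the original poster's method; the names hex2dec and hex_to_ip are made up for illustration, and it assumes the input starts with at least eight hex digits.

```shell
# Hypothetical helper: convert the first four hex byte pairs to a dotted quad.
hex_to_ip() {
    printf '%s\n' "$1" | awk '
    # Convert a hex string to a number one digit at a time.
    function hex2dec(h,    i, n) {
        n = 0
        h = tolower(h)
        for (i = 1; i <= length(h); i++)
            n = n * 16 + index("0123456789abcdef", substr(h, i, 1)) - 1
        return n
    }
    {
        printf "%d.%d.%d.%d\n",
            hex2dec(substr($0, 1, 2)), hex2dec(substr($0, 3, 2)),
            hex2dec(substr($0, 5, 2)), hex2dec(substr($0, 7, 2))
    }'
}

hex_to_ip cb5c860100000000000000000000000000   # prints 203.92.134.1
```

Because it only uses substr, index, and tolower, this sketch should behave the same under gawk, mawk, and the awk shipped with macOS.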

In essence, this problem boils down to feeding printf a string of parameters. printf is a shell builtin, so:
echo "cb5c860100000000000000000000000000" |
sed 's/\(.\{8\}\).*/\1/;s/../"0x&" /g;s/^/printf "%d.%d.%d.%d\n" /'|sh
203.92.134.1
In GNU sed you can evaluate the pattern space, like so:
echo "cb5c860100000000000000000000000000" |
sed 's/\(.\{8\}\).*/\1/;s/../"0x&" /g;s/^/printf "%d.%d.%d.%d" /e'
203.92.134.1
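Since the shell's printf accepts 0x-prefixed arguments for %d, the same conversion can also be written without generating a command line through sed. This is a minimal sketch under that assumption; the function name hex_to_ip_sh is hypothetical, and only the first eight hex digits of the input are used.

```shell
# Hypothetical helper: build four 0x-prefixed arguments and hand them to printf.
hex_to_ip_sh() {
    hex=$1
    set --                      # reuse the positional parameters as an argument list
    for pos in 1 3 5 7; do
        # cut out one two-character byte, e.g. characters 1-2, 3-4, ...
        octet=$(printf '%s\n' "$hex" | cut -c "$pos-$((pos + 1))")
        set -- "$@" "0x$octet"
    done
    printf '%d.%d.%d.%d\n' "$@"
}

hex_to_ip_sh cb5c860100000000000000000000000000   # prints 203.92.134.1
```

Unlike the sed | sh form, this never evaluates the input as shell code, which may matter if the hex string comes from an untrusted source.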

Apparently the GNU awk(1) implementation doesn't handle 0x11 as an argument to printf() as you've implemented it:
$ echo cb5c860100000000000000000000000000 | sed -e 's/../&./g' -e 's/.$//' |
sed 's/[0-9a-z][0-9a-z]/0x&/g'
0xcb.0x5c.0x86.0x01.0x00.0x00.0x00.0x00.0x00.0x00.0x00.0x00.0x00.0x00.0x00.0x00.0x00
$ echo cb5c860100000000000000000000000000 | sed -e 's/../&./g' -e 's/.$//' |
sed 's/[0-9a-z][0-9a-z]/0x&/g' |
awk -F"." '{for (i=1;i<NF;i++) printf ("%d\n", $i)}'
0
0
0
...
The mawk(1) installed on my system (by Mike Brennan) -- an alternative to GNU awk(1) that claims to be smaller, faster, and still POSIX 1003.2 (draft 11.3) compliant -- does interpret this as you expected:
$ echo cb5c860100000000000000000000000000 | sed -e 's/../&./g' -e 's/.$//' |
sed 's/[0-9a-z][0-9a-z]/0x&/g' |
mawk -F"." '{for (i=1;i<NF;i++) printf ("%d\n", $i)}' |
mawk '{if (NR<5) printf $0; printf "."}' | sed 's/\.\.*$//'
203.92.134.1$
If you're lucky enough to also have mawk(1) installed and available, this solution may be suitable.

Related

extract part of grep line using regex

I'm using the following command to get java.home path
java -XshowSettings:properties -version 2>&1 > /dev/null | grep 'java.home'
the command above returns
java.home = /usr/lib/jvm/java-11-openjdk-amd64
How can I get it to only return "/usr/lib/jvm/java-11-openjdk-amd64"
Just use awk to cut out the last field:
java -XshowSettings:properties -version 2>&1 >/dev/null | grep 'java.home' | awk '{print $NF}'
Or a little shorter:
java -XshowSettings:properties -version 2>&1 | awk '/java.home/ {print $NF}'
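Note that $NF is the last whitespace-separated field, so it would return only part of a path that contains spaces. A hedged alternative is to split on the literal " = " separator instead. In this sketch the function name extract_java_home is hypothetical, and the sample line (including its path) is made up to stand in for the real java output.

```shell
# Hypothetical helper: print everything after " = " on the java.home line.
extract_java_home() {
    awk -F' = ' '/java\.home/ { print $2 }'
}

# Sample input standing in for the java output; the path is invented.
printf '    java.home = /opt/my jdk/java-11\n' | extract_java_home
# prints /opt/my jdk/java-11
```

With a real JVM the function would be used as: java -XshowSettings:properties -version 2>&1 | extract_java_home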

How to grep repeated strings on a single line?

I have a file.txt with one line, whose content is
/app/jdk/java/bin/java -server -Xms3g -Xmx3g -XX:MaxPermSize=256m -Dweblogic.Name=O2pPod8_mapp_msrv1_1 -Djava.security.policy=/app/Oracle/Middleware/Oracle_Home/wlserver/server/lib/weblogic.policy -Djava.security.egd=file:/dev/./urandom -Dweblogic.ProductionModeEnabled=true -Dweblogic.system.BootIdentityFile=/app/Oracle/Middleware/Oracle_Home/user_projects/domains/O2pPod8_domain/servers/O2pPod8_mapp_msrv1_1/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.nmservice.RotationEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=false -Dweblogic.ReverseDNSAllowed=false -Xms8192m -Xmx8192m -XX:MaxPermSize=2048m -XX:NewSize=1300m -XX:MaxNewSize=1300m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
and when I do
cat file.txt | grep -io "Xms.*" | awk '{FS" "; print $1} ' | cut -d "s" -f2
output:
3g
Why is grep not reading the second occurrence? I expect 3g and 8192m.
In fact, how do I print only 8192m in this case?
Your regex just says "find Xms followed by anything repeated 0 to n times". That returns the rest of the row from Xms onward.
What you actually want is something like "find Xms followed by any non-space character repeated 0 to n times", so that each match stops at the next space.
grep -io "Xms[^ ]*" file.txt | awk '{FS" "; print $1} ' | cut -d "s" -f2
Inside a bracket expression a leading ^ means "not", so [^ ] matches any character except a space.
I'm not really sure what you are trying to achieve here but if you want the endings of all space-separated strings starting with -Xms, using bare awk is:
$ awk -v RS=" " '/^-Xms/{print substr($0,5)}' file
3g
8192m
Explained:
$ awk -v RS=" " ' # space separated records
/^-Xms/ { # strings starting with -Xms
print substr($0,5) # print starting from 5th position
}' file
If you wanted something else (word repeated in the title puzzles me a bit), please update the question with more detailed requirements.
Edit: I just noticed how do I print only 8192m in this case (that's the repeated maybe). Let's add a counter c and not print the first instance:
$ awk -v RS=" " '/^-Xms/&&++c>1{print substr($0,5)}' file
8192m
You could use grep -io "Xms[0-9]*[a-zA-Z]" instead of grep -io "Xms.*" to match a sequence of digits followed by a single letter, rather than the rest of the line:
cat file.txt | grep -io "Xms[0-9]*[a-zA-Z]" | awk '{FS" "; print $1} ' | cut -d "s" -f2
Hope this helps!
The .* in your regexp is matching the rest of the line, you need [^ ]* instead. Look:
$ grep -o 'Xms.*' file
Xms3g -Xmx3g -XX:MaxPermSize=256m -Dweblogic.Name=O2pPod8_mapp_msrv1_1 -Djava.security.policy=/app/Oracle/Middleware/Oracle_Home/wlserver/server/lib/weblogic.policy -Djava.security.egd=file:/dev/./urandom -Dweblogic.ProductionModeEnabled=true -Dweblogic.system.BootIdentityFile=/app/Oracle/Middleware/Oracle_Home/user_projects/domains/O2pPod8_domain/servers/O2pPod8_mapp_msrv1_1/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.nmservice.RotationEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=false -Dweblogic.ReverseDNSAllowed=false -Xms8192m -Xmx8192m -XX:MaxPermSize=2048m -XX:NewSize=1300m -XX:MaxNewSize=1300m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
$ grep -o 'Xms[^ ]*' file
Xms3g
Xms8192m
$ grep -o 'Xms[^ ]*' file | cut -d's' -f2
3g
8192m
$ grep -o 'Xms[^ ]*' file | cut -d's' -f2 | tail -1
8192m
or more concisely:
$ sed 's/.*Xms\([^ ]*\).*/\1/' file
8192m
The positive lookbehind of PCRE (the form: (?<=RE1)RE2) can resolve the problem easily:
$ grep -oP '(?<=Xms)\S+' file.txt
3g
8192m
Explanation:
-o: show only the part of a line matching PATTERN.
-P: PATTERN is a Perl regular expression.
(?<=Xms)\S+: matches every run of non-whitespace characters that immediately follows the string Xms.
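grep -P is a GNU extension and is not available in every grep. Where it is missing, a sketch using only tr, sed, and tail gives the same values. The function names xms_values and last_xms are made up for this example, and it assumes the options are separated by single spaces, as in file.txt.

```shell
# Hypothetical helpers: read a command line on stdin, print the -Xms values.
xms_values() {
    # one option per line, then keep only lines that start with -Xms
    tr ' ' '\n' | sed -n 's/^-Xms//p'
}

last_xms() {
    xms_values | tail -n 1
}

# Shortened stand-in for the contents of file.txt
printf '%s\n' 'java -server -Xms3g -Xmx3g -Xms8192m -Xmx8192m' | last_xms
# prints 8192m
```

Against the real file this would be run as: last_xms < file.txt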

How to grep two patterns at once

Oftentimes I have to do some command-line thing where I pipe to grep and want matches for two different expressions (an OR match: A OR B).
For example I want to grep the output of generate_out for either foo[0-9]+ or bar[0-9]+. I of course could just execute twice:
generate_out| grep "foo[0-9]+"
generate_out| grep "bar[0-9]+"
but often generate_out is expensive and I would rather not run it twice (or store its output). Rather, I would like to just use one expression:
generate_out| grep "foo[0-9]+ OR bar[0-9]+"
of course this will not work but I would like the equivalent expression which will.
Use grep's -e option to specify multiple patterns that are "OR'ed":
$ seq 15 | grep -e 5 -e 3
3
5
13
15
Use an alternation in your regex:
generate_out | grep -E '(foo|bar)[0-9]+'
The use of -E enables ERE features, of which this is one. (By default, grep only supports BRE; some implementations of BRE, such as GNU's, may have special syntax for enabling ERE features. In the GNU case, \| in BRE is equivalent to | in ERE. However, it's not portable to rely on such extensions instead of just turning on ERE properly.)
egrep is a backwards-compatibility synonym for grep -E; however, only the latter is specified as a requirement by POSIX.
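The BRE/ERE distinction also affects the + in a pattern such as "foo[0-9]+": without -E, the + is an ordinary character rather than a repetition operator. A small sketch with made-up sample lines (the helper name sample_lines is hypothetical):

```shell
# Hypothetical sample data for comparing BRE and ERE behaviour.
sample_lines() {
    printf '%s\n' 'foo1' 'foo1+' 'bar22' 'baz'
}

# BRE: + is literal, so only the line containing "1+" matches
sample_lines | grep 'foo[0-9]+'

# ERE: + means "one or more", and | is alternation
sample_lines | grep -E '(foo|bar)[0-9]+'
```

The first command prints only foo1+, while the second prints foo1, foo1+, and bar22.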
Use awk for simplicity:
generate_out| awk '/foo[0-9]+/ || /bar[0-9]+/'
which of course could be simplified in this particular case to:
generate_out| awk '/(foo|bar)[0-9]+/'
but in general you want to use awk for simple, consistent ORs and ANDs of regexps:
cmd | grep -E 'foo.*bar|bar.*foo'
cmd | awk '/foo/ && /bar/'
cmd | grep 'foo' | grep -v 'bar'
cmd | awk '/foo/ && !/bar/'
cmd | grep -E 'foo|bar'
cmd | awk '/foo/ || /bar/' (or awk '/foo|bar/')
cmd | grep -E 'foo|bar' | grep -E -v 'foo.*bar|bar.*foo'
cmd | awk '(/foo/ && !/bar/) || (/bar/ && !/foo/)'
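To see what each awk combination selects, the pairs above can be tried on a few sample lines. This is only an illustrative sketch; the helper name lines and its four input lines are made up.

```shell
# Hypothetical sample input: one line per case of interest.
lines() {
    printf '%s\n' 'foo' 'bar' 'foo bar' 'baz'
}

lines | awk '/foo/ && /bar/'                            # AND: prints "foo bar"
lines | awk '/foo/ && !/bar/'                           # foo but not bar: prints "foo"
lines | awk '/foo/ || /bar/'                            # OR: prints "foo", "bar", "foo bar"
lines | awk '(/foo/ && !/bar/) || (/bar/ && !/foo/)'    # exactly one: prints "foo", "bar"
```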

How do I extract an integer value from a string in Unix

When I type this command
/usr/local/afs7/bin/afs_paftools -a about.afs | grep TOTAL_DOCUMENTS
I get a result
TOTAL_DOCUMENTS = 74195
How can I extract the integer (74195) after the = using the grep command?
One way is to use grep:
$ echo "TOTAL_DOCUMENTS = 74195" | grep -o '[0-9]\+'
74195
or, since you know that it's the last field, use awk:
$ echo "TOTAL_DOCUMENTS = 74195" | awk '{print $NF}'
74195
or just use awk for the lot:
your-command -a about.afs | awk '/TOTAL_DOCUMENTS/{print $NF}'
If there are no spaces:
TOTAL_DOCUMENTS=74195
use this awk command:
echo "TOTAL_DOCUMENTS=74195" | awk -F= '{print $NF}'
74195
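The two cases (with and without spaces around the =) can also be handled by a single pattern that is anchored on the key name, so unrelated digits elsewhere in the output are ignored. A minimal sketch; the function name extract_total is hypothetical.

```shell
# Hypothetical helper: print the number that follows TOTAL_DOCUMENTS and "=".
extract_total() {
    sed -n 's/^TOTAL_DOCUMENTS[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

echo 'TOTAL_DOCUMENTS = 74195' | extract_total   # prints 74195
echo 'TOTAL_DOCUMENTS=74195'   | extract_total   # prints 74195
```

Because sed -n ... p only prints lines where the substitution succeeded, the separate grep TOTAL_DOCUMENTS step is not needed with this approach.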

xargs: String concatenation

zgrep -i XXX XXX | grep -o "RID=[0-9|A-Z]*" |
uniq | cut -d "=" -f2 |
xargs -0 -I string echo "RequestID="string
My output is
RequestID=121212112
8127127128
8129129812
But my requirement is to have RequestID= prefixed to every line of the output.
Any help is appreciated
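I had a similar task and this worked for me. It might be what you are looking for. The -0 option is dropped because it makes xargs split its input on NUL characters, so the newline-separated IDs were being treated as a single item: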
zgrep -i XXX XXX | grep -o "RID=[0-9|A-Z]*" |
uniq | cut -d "=" -f2 |
xargs -I {} echo "RequestID="{}
Try the -n option of xargs.
-n max-args
Use at most max-args arguments per command line. Fewer than max-args arguments will be used if the size (see the -s option) is exceeded, unless the -x option is given, in which case xargs will exit.
Example:
$ echo -e '1\n2' | xargs echo 'str ='
str = 1 2
$ echo -e '1\n2' | xargs -n 1 echo 'str ='
str = 1
str = 2
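Note that echo puts a space between its arguments, so the -n 1 example yields "str = 1" with a space before the value. For a prefix such as RequestID= that must sit directly against the ID, one option is to let xargs call printf instead. A minimal sketch; the function name prefix_ids is hypothetical, and the IDs are the ones from the question.

```shell
# Hypothetical helper: run printf once per input item, with no separating space.
prefix_ids() {
    xargs -n 1 printf 'RequestID=%s\n'
}

printf '%s\n' 121212112 8127127128 8129129812 | prefix_ids
# prints:
# RequestID=121212112
# RequestID=8127127128
# RequestID=8129129812
```

If xargs is not otherwise needed, sed 's/^/RequestID=/' gives the same result for line-oriented input.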
