Linux Text Extraction GREP - grep

I am running a Teamspeak 3 server on a Ubuntu server and I would like to fetch the clients currently connected using a script.
The script currently outputs this from the Teamspeak Server Query:
clid=1 cid=11 client_database_id=161 client_nickname=Music client_type=1|clid=3 cid=11 client_database_id=153 client_nickname=Music\sBot client_type=0|clid=5 cid=1 client_database_id=68 client_nickname=Unknown\sfrom\s127.0.0.1:52537 client_type=1|clid=12 cid=11 client_database_id=3 client_nickname=FriendlyMan client_type=0|clid=16 cid=11 client_database_id=161 client_nickname=Windows\s10\sUser client_type=0|clid=20 cid=11 client_database_id=225 client_nickname=3C2J0N47H4N client_type=0
How can I extract the nicknames from this mess?
More Specifically only the ones that contain "client_type=0".
Played around with GREP (grep -E -o 'client_nickname=\w+'), close to what I want.
client_nickname=Music
client_nickname=Music
client_nickname=Unknown
client_nickname=FriendlyMan
client_nickname=Windows
client_nickname=3C2J0N47H4N
Desired Output:
Music Bot,FriendlyMan,Windows 10 User,3C2J0N47H4N

Our input consists of a single line:
$ cat file
clid=1 cid=11 client_database_id=161 client_nickname=Music client_type=1|clid=3 cid=11 client_database_id=153 client_nickname=Music\sBot client_type=0|clid=5 cid=1 client_database_id=68 client_nickname=Unknown\sfrom\s127.0.0.1:52537 client_type=1|clid=12 cid=11 client_database_id=3 client_nickname=FriendlyMan client_type=0|clid=16 cid=11 client_database_id=161 client_nickname=Windows\s10\sUser client_type=0|clid=20 cid=11 client_database_id=225 client_nickname=3C2J0N47H4N client_type=0
Using grep + sed
Here is one approach that starts with grep and then uses sed to cleanup to the final format:
$ grep -oP '(?<=client_nickname=)[^=]+(?=client_type=0)' file | sed -nE 's/\\s/ /g; H;1h; ${x; s/ *\n/,/g;p}'
Music Bot,FriendlyMan,Windows 10 User,3C2J0N47H4N
Using awk
Here is another approach that just uses awk:
$ awk -F'[= ]' '/client_type=0/{gsub(/\\s/, " ", $8); printf (f?",":"")$8; f=1} END{print ""}' RS='|' file
Music Bot,FriendlyMan,Windows 10 User,3C2J0N47H4N
The awk code uses | as the record separator and awk reads in one record at a time. Each record is divided into fields with the field separator being either a space or an equal sign. If the record contains the text client_type=0, then we replace all occurrences of \s in field 8 with space and then print the resulting field 8.
Using bash
#!/bin/bash
sep=
( cat file; echo "|"; ) | while read -r -d\| clid cid db name type misc
do
[ "$type" = "client_type=0" ] || continue
name=${name//\\s/ }
printf "%s%s" "$sep" "${name#client_nickname=}"
sep=,
done
echo ""
This produces the output:
Music Bot,FriendlyMan,Windows 10 User,3C2J0N47H4N

Related

print the rest of input along with matching line

I am new to linux and I am experimenting with basic terminal commands. I found out that I can list all users using compgen -u but what if I only want to display the bottom line outputs ?
Ok lets say the output of compgen -u goes like this:
extra
extra
extra
extra
extra
extra
extra
extra
extra
John
William
Kate
Harold
I can only use grep to find a single text (ex. compgen -u | grep John). But what if I want to use grep to display John as well as all the remaining entries after it ?
sed or awk solution would be easier, but if you can only use grep, then the option --after-context (or -A) might do:
grep -A 5 John file
The drawback is that you need to know the number of lines to display after the matching (or use an arbitrary big number for the rest of the file).
compgen -u | grep -A$(compgen -u| wc -l) John
Explanation:
From man grep
-A NUM, --after-context=NUM
Print NUM lines of trailing context after matching lines. Places a line containing a group separator (described under --group-separator) between
contiguous groups of matches.
grep -A -- print number of rows after pattern
$() -- Execute unix command
compgen -u| wc -l --> Get total number of rows of output of command.
You can use the following one-liner :
n=$( compgen -u | grep -n John | head -1 | cut -d ":" -f 1 ) && compgen -u | tail -n +$n
This finds out the line number for first occurrence of John, and prints everything starting that line.

Cutting a length of specific string with grep

Let's say we have a string "test123" in a text file.
How do we cut out "test12" only or let's say there is other garbage behind "test123" such as test123x19853 and we want to cut out "test123x"?
I tried with grep -a "test123.\{1,4\}" testasd.txt and so on, but just can't get it right.
I also looked for example, but never found what I'm looking for.
expr:
kent$ x="test123x19853"
kent$ echo $(expr "$x" : '\(test.\{1,4\}\)')
test123x
What you need is -o which print out matched things only:
$ echo "test123x19853"|grep -o "test.\{1,4\}"
test123x
$ echo "test123x19853"|grep -oP "test.{1,4}"
test123x
-o, --only-matching show only the part of a line matching PATTERN
If you are ok with awkthen try following(not this will look for continuous occurrences of alphabets and then continuous occurrences of digits, didn't limit it to 4 or 5).
echo "test123x19853" | awk 'match($0,/[a-zA-Z]+[0-9]+/){print substr($0,RSTART,RLENGTH)}'
In case you want to look for only 1 to 4 digits after 1st continuous occurrence of alphabets then try following(my awk is old version so using --re-interval you could remove it in case you have latest version of ittoo).
echo "test123x19853" | awk --re-interval 'match($0,/[a-zA-Z]+[0-9]{1,4}/){print substr($0,RSTART,RLENGTH)}'

Grep: Capture just number

I am trying to use grep to just capture a number in a string but I am having difficulty.
echo "There are <strong>54</strong> cities | grep -o "([0-9]+)"
How am I suppose to just have it return "54"? I have tried the above grep command and it doesn't work.
echo "You have <strong>54</strong>" | grep -o '[0-9]' seems to sort of work but it prints
5
4
instead of 54
Don't parse HTML with regex, use a proper parser :
$ echo "There are <strong>54</strong> cities " |
xmllint --html --xpath '//strong/text()' -
OUTPUT:
54
Check RegEx match open tags except XHTML self-contained tags
You need to use the "E" option for extended regex support (or use egrep). On my Mac OSX:
$ echo "There are <strong>54</strong> cities" | grep -Eo "[0-9]+"
54
You also need to think if there are going to be more than one occurrence of numbers in the line. What should be the behavior then?
EDIT 1: since you have now specified the requirement to be a number between <strong> tags, I would recommend using sed. On my platform, grep does not have the "P" option for perl style regexes. On my other box, the version of grep specifies that this is an experimental feature so I would go with sed in this case.
$ echo "There are <strong>54</strong> 12 cities" | sed -rn 's/^.*<strong>\s*([0-9]+)\s*<\/strong>.*$/\1/p'
54
Here "r" is for extended regex.
EDIT 2: If you have the "PCRE" option in your version of grep, you could also utilize the following with positive lookbehinds and lookaheads.
$ echo "There are <strong>54 </strong> 12 cities" | grep -o -P "(?<=<strong>)\s*([0-9]+)\s*(?=<\/strong>)"
54
RegEx Demo

Use awk to parse and modify every CSV field

I need to parse and modify a each field from a CSV header line for a dynamic sqlite create table statement. Below is what works from the command line with the appropriate output:
echo ",header1,header2,header3"| awk 'BEGIN {FS=","}; {for(i=2;i<=NF;i++){printf ",%s text ", $i}; printf "\n"}'
,header1 text ,header2 text ,header3 text
Well, it breaks when it is run from within a bash shell script. I got it to work by writing the output to a file like below:
echo $optionalHeaders | awk 'BEGIN {FS=","}; {for(i=2;i<=NF;i++){printf ",%s text ", $i}; printf "\n"}' > optionalHeaders.txt
This sucks! There are a lot of examples that show how to parse/modify specific Nth fields. This issue requires each field to be modified. Is there a more concise and elegant Awk one liner that can store its contents to a variable rather than writing to a file?
sed is usually the right tool for simple substitutions on a single line. Take your pick:
$ echo ",header1,header2,header3" | sed 's/[^,][^,]*/& text/g'
,header1 text,header2 text,header3 text
$ echo ",header1,header2,header3" | sed -r 's/[^,]+/& text/g'
,header1 text,header2 text,header3 text
The last 1 above requires GNU sed to use EREs instead of BREs. You can do the same in awk using gsub() if you prefer:
$ echo ",header1,header2,header3" | awk '{gsub(/[^,]+/,"& text")}1'
,header1 text,header2 text,header3 text
I found the problem and it was me... I forgot to echo the contents of the variable to the Awk command. Brianadams comment was so simple that forced me to re-look at my code and find the problem! Thanks!
I am ok with resolving this but if anyone wants to propose a more concise and elegant Awk one liner - that would be cool.
You can try the following:
#! /bin/bash
header=",header1,header2,header3"
newhead=$(awk 'BEGIN {FS=OFS=","}; {for(i=2;i<=NF;i++) $i=$i" text"}1' <<<"$header")
echo "$newhead"
with output:
,header1 text,header2 text,header3 text
Instead of modifying fields one by one, another option is with a simple substitution:
echo ",header1,header2,header3" | awk '{gsub(/[^,]+/, "& text", $0); print}'
That is, replace a sequence of non-comma characters with text appended.
Another alternative would be replacing the commas, but due to the irregularities of your header line (first comma must be left alone, no comma at the end), that's a bit less easy:
echo ",header1,header2,header3" | awk '{gsub(/,/, " text,", $0); sub(/^ text,/, "", $0); print $0 " text"}'
Btw, the rough equivalent of the two commands in sed:
echo ",header1,header2,header3" | sed -e 's/[^,]\{1,\}/& text/g'
echo ",header1,header2,header3" | sed -e 's/\(.\),/\1 text,/g' -e 's/$/ text/'

Ignoring directories from a file

I am in the process of creating a script that lists all files opened via lsof output. I would like to checksum specific files and ignore directories from that output but am at a loss to do so EFFECTIVELY. For example: (I'm using FreeBSD btw)
lsof | awk '/\//{print $9}' | sort -u | head -n 5
prints:
/
/bin/sleep
/dev/bpf
What I'd like to do is: FROM that output, ignore any directories and perform an md5 on FILES (not directories).
Any pointers?
Give a try to following perl command:
lsof | perl -MDigest::MD5=md5_hex -ane '
$f = $F[ $#F ];
-f $f and printf qq|%s %s\n|, $f, md5_hex( $f )
'
It filters lsof output to plain files (-f). Take a look into perlfunc to change it to add different kind of files.
It outputs each file and its md5 separated by a space character. An example in my system is like:
/usr/lib/libm-2.17.so a2d3b2de9a1f59fb99427714fefb49ca
/usr/lib/libdl-2.17.so d74d8ac16c2d13128964353d4be7061a
/usr/lib/libnsl-2.17.so 34b6909ec60c337c21b044642b9baa3d
/usr/lib/ld-2.17.so 3d0e7b5b5c4e59c5c4b6a858cc79fcf1
/usr/sbin/lsof b9b8fbc8f296e47969713f6369d97c0d
/usr/lib/locale/locale-archive 3ea56273193198a718b9a5de33d553db
/usr/lib/libc-2.17.so ba51eeb4025b7f5d7f400f1968f4b5f9
/usr/lib/ld-2.17.so 3d0e7b5b5c4e59c5c4b6a858cc79fcf1
...

Resources