Parsing Apache vhost files with grep and awk

I'm trying to do a simple scripting task, but I have a serious lack of knowledge of awk and I can't work out exactly how to accomplish this silly task.
Basically, I have a very big vhost.conf with hundreds of domains.
The idea is just to iterate over (or parse) this single file and get a list of ServerName and DocumentRoot values.
The file is divided into multiple parts. If I run this command I get output like:
grep -E "DocumentRoot|ServerName" /etc/httpd/conf.d/vhost-pro.conf | awk '!/#/{print $2}'
/home/webs/t2m/PRO/default
t2m.net
/home/webs/uoc/PRO/default
uoc.com
So... now, how do I process this output? If I could concatenate the path and the domain name onto a single line, maybe I could store them in an array or in a file and then take the info piece by piece. But I simply don't know how to do it.
Any clue or tip about how to proceed?
Thanks!

I would suggest something like this:
awk '/<VirtualHost/ { sn=""; dr="" }    # entering a vhost block: reset both values
/ServerName/ { sn=$2 }                  # remember the ServerName
/DocumentRoot/ { dr=$2 }                # remember the DocumentRoot
/\/VirtualHost/ { print dr, sn }        # leaving the block: print the pair
' /etc/httpd/conf.d/vhost-pro.conf
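Run against the vhosts from the question, that should print each pair on one line, e.g.:
/home/webs/t2m/PRO/default t2m.net
/home/webs/uoc/PRO/default uoc.com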

I appreciate your comments; I will try your command as well. I'm going to share how I did it. I was lucky to find some kind of awk magic. To be honest, it worked, but I don't know what I'm doing, and that's bad for me :P
I had this output:
/home/webs/t2m/PRO/default
t2m.net
/home/webs/uoc/PRO/default
uoc.com
So what I did to turn it into just one line was:
grep -E "DocumentRoot|ServerName" /etc/httpd/conf.d/vhost-pro.conf | awk '{key=$2;getline;print key " " $2}'
this way my new output is something like:
/home/webs/t2m/PRO/default t2m.net
/home/webs/uoc/PRO/default uoc.com
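As far as I can tell, key=$2 saves the second field of the DocumentRoot line, getline then reads the next line (the ServerName one), and print outputs both saved values on one line.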
Then I think I will store this in a temp file to iterate over afterwards, reading each pair into variables, as sketched below.
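A minimal sketch of that last step, assuming the paths contain no spaces (vhosts.txt is just a hypothetical temp file name):

grep -E "DocumentRoot|ServerName" /etc/httpd/conf.d/vhost-pro.conf |
awk '{key=$2; getline; print key " " $2}' > vhosts.txt

while read -r docroot domain; do
  echo "domain=$domain docroot=$docroot"   # do something with each pair
done < vhosts.txt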
Thanks!

Related

How can I find files that match a two-line pattern using grep?

I created a test file with the following:
<cert>
</cert>
I'm now trying to find this with grep using the following command, but it takes forever to run.
How can I quickly search for files that contain adjacent lines like these?
tr -d '\n' | grep '<cert></cert>' test.test
So, from the comments, you're trying to get the names of the files that contain an empty <cert>..</cert> element. You're using several tools wrong. As @iiSeymour pointed out, tr only reads from standard input, so if you want to use it to select among lots of files, you'll need a loop. And grep prints matching lines, not filenames, though you can use grep -l to print the filenames instead.
But you're only joining lines because grep works one line at a time, so let's use a better tool. Here's how to search with awk:
awk '/<cert>/ { started=1; }
/<\/cert>/ { if (started) { print FILENAME; nextfile;} }
!/<cert>/ { started = 0; }' file1 file2 *.txt
It checks each line and keeps track of whether the previous line matched <cert>. (!/pattern/ sets the flag back to zero on lines not matching /pattern/.) Call it with all your files (or with a wildcard like *.txt).
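If your files are spread across directories, find can hand them to the same script in batches (a sketch; note that nextfile is originally a GNU awk feature, though most current awk implementations support it):

find . -name '*.txt' -exec awk '/<cert>/ { started=1; }
/<\/cert>/ { if (started) { print FILENAME; nextfile;} }
!/<cert>/ { started = 0; }' {} +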
And a friendly suggestion: next time, try each command separately and have a quick look at the manual for the tools you want to use (you've been stuck on this for hours and you still don't know what grep does?). Unix tools are usually too complex for simple trial and error.

Match pattern ending with a certain character in grep

This is a common problem I encounter when using grep. Say the pattern is 'chr1' in the third column of a file. When I do the following:
grep 'chr1' file
How can I avoid getting results that include chr10, chr11, chr13, etc. as well?
Thanks!
It seems this works:
grep -w 'chr1' file
Since you're interested in values in specific columns, you're much better off using awk:
awk '$3 == "chr1"' file

How to extract a certain part of a line that's between quotes

For example, if I have file.txt with the following:
object = {
'name' : 'namestring',
'type' : 'type',
'real' : 'yes',
'version' : '2.0',
}
and I want to extract just the version so the output is 2.0, how would I go about doing this?
I would suggest that grep is probably the wrong tool for this. Nevertheless, it is possible, using grep twice.
grep 'version' input.txt | grep -Eo '[0-9.]+'
The first grep isolates the line you're interested in, and the second one prints only the characters of the line that match the regex, in this case numbers and periods. For your input data, this should work.
However, this solution is weak in a few areas. It doesn't handle cases where multiple version lines exist, and it's hugely dependent on the structure of the file (i.e., I suspect your file would still be syntactically valid if all the lines were joined into one long line). It also uses a pipe, and in general, if there's a way to achieve something with a pipe and a way without one, you choose the latter.
One compromise might be to use awk, assuming you're always going to have things split by line:
awk '/version/ { gsub(/[^0-9.]/,"",$NF); print $NF; }' input.txt
This is pretty much identical in functionality to the dual grep solution above.
If you wanted to process multiple variables within that section of file, you might do something like the following with awk:
BEGIN {
FS=":";
}
/{/ {
inside=1;
next;
}
/}/ {
inside=0;
print a["version"];
# do things with other variables too
#for(i in a) { printf("i=%s / a=%s\n", i, a[i]); } # for example
delete a;
}
inside {
sub(/^ *'/,"",$1); sub(/' *$/,"",$1); # strip whitespace and quotes
sub(/^ *'/,"",$2); sub(/',$/,"",$2); # strip whitespace and quotes
a[$1]=$2;
}
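Saved to a file, say extract.awk (the name is arbitrary), the script can be run as:

awk -f extract.awk file.txt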
A better solution would be to use a tool that actually understands the file format you're using.
A simple and clean solution using grep and cut:
grep version file.txt | cut -d \' -f4
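Here cut splits the matching line on single quotes, so in 'version' : '2.0', the fourth field is 2.0.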

How can I grep a file for multiple unique values?

I have some firewall logs and I want to find multiple unique values. I need to find every unique combination of source IP and destination port, which appear in this format in /var/log/iptables:
SRC=123.123.123.123
DPT=137
So, if source IP 123.123.123.123 appears multiple times on multiple ports, I want to see that, but just once for each SRC/DPT combo.
Thanks!
This awk solution might help. The first awk command joins each pair of successive SRC and DPT lines into a single line. Its output is then piped to a second awk command, which de-duplicates it while retaining the original order:
awk '/^SRC|^DPT/{ORS=$0 ~ /^SRC/?" ":"\n"; print}' file.* | awk '!a[$0]++'
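For the sample lines in the question, the first command sets the output record separator to a space after each SRC line and back to a newline after each DPT line, so the pairs come out as:

SRC=123.123.123.123 DPT=137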
If multiple SRC and DPT entries exist per line, the following should work:
grep -oE 'SRC=[[:digit:].]+[[:space:]]+DPT=[[:digit:].]+' file.txt | awk '!a[$0]++'
You can try "grep AND", see examples from the link:
http://www.thegeekstuff.com/2011/10/grep-or-and-not-operators/

Recursively grep results and pipe back

I need to find some matching conditions from a file and then recursively find the next conditions in the previously matched files. I have something like this:
input.txt:
123
22
33
The above terms need to be found in the following files. The challenge is that if 123 is found in, say, 10 files, then 22 should be searched in those 10 files only, and so on...
The files are named like f1, f2, f3, f4 ..... f1200,
so it is like I need to do grep -w "123" f* | grep -w "22" | .....
It's not possible to list them manually, so is there an easier way?
You can solve this with an awk script. I've encountered a similar problem, and this works fine:
awk 'NR == 1 { printf("grep -lw %s f*", $1); next }
{ printf(" | xargs grep -lw %s", $1) }
END { print "" }' input.txt | sh
What it does:
it reads input.txt line by line;
for the first term it prints grep -lw 123 f* (no pipe in front);
for every later term it appends | xargs grep -lw <term> (note the pipe);
grep -l prints only the names of the matching files, so each stage narrows the set of files the next stage searches;
when the input ends, the assembled pipeline is sent to the shell for execution.
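For the input.txt above, the generated pipeline is:

grep -lw 123 f* | xargs grep -lw 22 | xargs grep -lw 33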
Perhaps taking a meta-programming viewpoint would help: have grep output a series of grep commands. Or write a little Perl program. Maybe Ruby, if the mood suits.
You can use grep -lw to get the list of file names that matched (note that it stops reading each file after its first match, so it's fast).
You capture the list of file names and use that for the next iteration in a loop.
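A rough sketch of that loop, assuming the filenames contain no whitespace (files is just a hypothetical variable name):

files=$(ls f*)                      # initial candidate set
while read -r term; do
  files=$(grep -lw "$term" $files)  # keep only files containing the current term
  [ -n "$files" ] || break          # stop early if nothing is left
done < input.txt
echo "$files"                       # files containing every term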
