How to extract certain part of line that's between quotes - grep

For example if I have file.txt with the following
object = {
'name' : 'namestring',
'type' : 'type',
'real' : 'yes',
'version' : '2.0',
}
and I want to extract just the version, so that the output is 2.0. How would I go about doing this?

I would suggest that grep is probably the wrong tool for this. Nevertheless, it is possible, using grep twice.
grep 'version' input.txt | grep -Eo '[0-9.]+'
The first grep isolates the line you're interested in, and the second one prints only the characters of the line that match the regex, in this case numbers and periods. For your input data, this should work.
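To see what each stage contributes, here is a small sketch that recreates the sample data in a temporary file and captures the result in a variable. The `mktemp` file and the variable names `input` and `version` are illustrative assumptions, not part of the question.

```shell
# Recreate the sample data from the question in a throwaway file.
input=$(mktemp)
cat > "$input" <<'EOF'
object = {
'name' : 'namestring',
'type' : 'type',
'real' : 'yes',
'version' : '2.0',
}
EOF

# Stage 1 keeps only the lines containing "version".
# Stage 2 (-o) prints just the runs of digits and periods found on those lines.
version=$(grep 'version' "$input" | grep -Eo '[0-9.]+')
echo "$version"

rm -f "$input"
```

If several lines contained "version", the variable would hold one match per line, which is one of the weaknesses of this approach.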
However, this solution is weak in a few areas. It doesn't handle cases where multiple version lines exist, and it depends heavily on the structure of the file (I suspect your file would still be syntactically valid if all the lines were joined into a single long line, in which case this approach would break). It also uses a pipe; in general, if there is a way to achieve something with a pipe and a way without one, choose the latter.
One compromise might be to use awk, assuming you're always going to have things split by line:
awk '/version/ { gsub(/[^0-9.]/,"",$NF); print $NF; }' input.txt
This is pretty much identical in functionality to the dual grep solution above.
If you wanted to process multiple variables within that section of file, you might do something like the following with awk:
BEGIN {
    FS=":";
}
/{/ {
    inside=1;
    next;
}
/}/ {
    inside=0;
    print a["version"];
    # do things with other variables too
    #for(i in a) { printf("i=%s / a=%s\n", i, a[i]); } # for example
    delete a;
}
inside {
    sub(/^ *'/,"",$1); sub(/' *$/,"",$1); # strip whitespace and quotes
    sub(/^ *'/,"",$2); sub(/',$/,"",$2); # strip whitespace and quotes
    a[$1]=$2;
}
A better solution would be to use a tool that actually understands the file format you're using.

A simple and clean solution using grep and cut
grep version file.txt | cut -d \' -f4
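Why field 4? A short sketch, using one line of the sample data held in a variable (the names `line`, `key` and `value` are illustrative), shows how cut numbers the fields when the single quote is the delimiter:

```shell
line="'version' : '2.0',"

# Splitting on ' gives: field 1 = "" (before the first quote),
# field 2 = version, field 3 = " : ", field 4 = 2.0, field 5 = ","
key=$(printf '%s\n' "$line" | cut -d \' -f2)
value=$(printf '%s\n' "$line" | cut -d \' -f4)
echo "$key=$value"
```

This only holds while each line has exactly this quoting layout; a quote inside a value would shift the field numbers.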


Parsing apache files grep awk

I'm trying to do a simple script task but I have a very serious lack of knowledge in AWK and I'm not able to understand exactly how to accomplish this silly task.
Basically I have a very big regular vhost.conf with hundreds of domains.
The idea is just to iterate over or parse this single file and get a list of ServerName and DocumentRoot values.
The file is divided in multiple parts. If I run this command I get an output like:
grep -E "DocumentRoot|ServerName" /etc/httpd/conf.d/vhost-pro.conf | awk '!/#/{print $2}'
/home/webs/t2m/PRO/default
t2m.net
/home/webs/uoc/PRO/default
uoc.com
So now, how do I process this output? If I could concatenate the path and the domain name into a single line, maybe I could store them in an array or in a file and then take the info piece by piece. But I simply don't know how to do it.
Any clue or tip about how to proceed?
Thanks!
I would suggest something like this:
awk '/<VirtualHost/ { sn=""; dr="" }
/ServerName/ { sn=$2 }
/DocumentRoot/ { dr = $2 }
/\/VirtualHost/ { print dr, sn }' /etc/httpd/conf.d/vhost-pro.conf
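As a rough check of the idea, the sketch below runs the same awk program against a small made-up config. The two VirtualHost blocks are assumptions modelled on the output in the question, not the real vhost-pro.conf, and the second block deliberately lists the directives in the opposite order.

```shell
# Hypothetical sample config; the real file will differ.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<VirtualHost *:80>
    ServerName t2m.net
    DocumentRoot /home/webs/t2m/PRO/default
</VirtualHost>
<VirtualHost *:80>
    DocumentRoot /home/webs/uoc/PRO/default
    ServerName uoc.com
</VirtualHost>
EOF

# Reset at each opening tag, remember both values, print at the closing tag.
pairs=$(awk '/<VirtualHost/ { sn=""; dr="" }
/ServerName/ { sn=$2 }
/DocumentRoot/ { dr=$2 }
/\/VirtualHost/ { print dr, sn }' "$conf")
echo "$pairs"

rm -f "$conf"
```

Because the values are printed only when the closing tag is reached, the order of ServerName and DocumentRoot inside a block does not matter.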
I appreciate your comments. I will try the pr command as well. I'm going to share how I did it. I was lucky to find some kind of awk magic. To be honest, it worked, but I don't know what I'm doing and this is bad for me :P
I had this output:
/home/webs/t2m/PRO/default
t2m.net
/home/webs/uoc/PRO/default
uoc.com
So what I did to turn it into just one line was:
grep -E "DocumentRoot|ServerName" /etc/httpd/conf.d/vhost-pro.conf | awk '{key=$2;getline;print key " " $2}'
this way my new output is something like:
/home/webs/t2m/PRO/default t2m.net
/home/webs/uoc/PRO/default uoc.com
then I think I will store this into a temp file to iterate after and store it to a var.
Thanks!
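For the "iterate afterwards" step, a temp file is not strictly needed. Here is a minimal sketch, assuming the paired output is already in a variable (called `pairs` here for illustration) and that neither the paths nor the domains contain whitespace:

```shell
# Paired output as produced above: "<DocumentRoot> <ServerName>" per line.
pairs='/home/webs/t2m/PRO/default t2m.net
/home/webs/uoc/PRO/default uoc.com'

summary=""
# read splits each line on whitespace into the two named variables.
# Feeding the loop from a here-document keeps it in the current shell,
# so variables set inside the loop are still visible afterwards.
while read -r docroot domain; do
    echo "domain=$domain docroot=$docroot"
    summary="${summary}${domain} -> ${docroot};"
done <<EOF
$pairs
EOF
```

Piping into `while` would also work, but in many shells the loop then runs in a subshell and anything assigned inside it is lost.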

How can I find files that match a two-line pattern using grep?

I created a test file with the following:
<cert>
</cert>
I'm now trying to find this with grep and the following command, but it takes forever to run.
How can I search quickly for files that contain adjacent lines like these?
tr -d '\n' | grep '<cert></cert>' test.test
So, from the comments, you're trying to get the filenames that contain an empty <cert>..</cert> element. You're using several tools wrong. As @iiSeymour pointed out, tr only reads from standard input, so if you want to use it to select from lots of filenames, you'll need a loop. grep prints out matching lines, not filenames, though you could use grep -l to see the filenames instead.
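A minimal sketch of the loop described above, using two throwaway files in a temporary directory (the directory and the file names `empty.txt` and `full.txt` are made up for the demonstration):

```shell
dir=$(mktemp -d)
printf '<cert>\n</cert>\n' > "$dir/empty.txt"        # adjacent tags
printf '<cert>\ndata\n</cert>\n' > "$dir/full.txt"   # something in between

matches=""
for f in "$dir"/*.txt; do
    # tr reads only standard input, so each file is redirected into it.
    # grep -q prints nothing and reports a match through its exit status.
    if tr -d '\n' < "$f" | grep -q '<cert></cert>'; then
        echo "$f"
        matches="${matches}${f##*/} "
    fi
done

rm -rf "$dir"
```

This only matches when nothing at all, not even indentation, sits between the two tags once the newlines are removed.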
But you're only joining lines because grep works one line at a time; so let's use a better tool. Here's how to search with awk:
awk '/<cert>/ { started=1; }
/<\/cert>/ { if (started) { print FILENAME; nextfile;} }
!/<cert>/ { started = 0; }' file1 file2 *.txt
It checks each line and keeps track of whether the previous line matched <cert>. (!/pattern/ sets the flag back to zero on lines not matching /pattern/.) Call it with all your files (or with a wildcard like *.txt).
And a friendly suggestion: Next time, try each command separately (you've been stuck on this for hours and you still don't know what grep does?). And have a quick look at the manual for the tools you want to use. Unix tools are usually too complex for simple trial and error.

Grep Filenames from ls for specific part of them

I want to extract a specific part out of the filenames to work with them.
Example:
ls -1
REZ-Name1,Surname1-02-04-2012.png
REZ-Name2,Surname2-07-08-2013.png
....
So I want to get only the part with the name.
How can this be achieved ?
There are several ways to do this. Here's a loop:
for file in REZ-*-??-??-????.png
do
    name=${file#*-}
    name=${name%-??-??-????.png}
    echo "($name)"
done
Given a variety of filenames with all sorts of edge cases from spacing, additional hyphens and line feeds:
REZ-Anna-Maria,de-la-Cruz-12-32-2015.png
REZ-Bjørn,Dæhlie-01-01-2015.png
REZ-First,Last-12-32-2015.png
REZ-John Quincy,Adams-11-12-2014.png
REZ-Ridiculous example # this is one filename
is ridiculous,but fun-22-11-2000.png # spanning two lines
it outputs:
(Anna-Maria,de-la-Cruz)
(Bjørn,Dæhlie)
(First,Last)
(John Quincy,Adams)
(Ridiculous example
is ridiculous,but fun)
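The two parameter expansions do all the work in that loop. Taking one of the filenames above as a plain string (the variable names `step1` and `step2` are illustrative), the intermediate values look like this:

```shell
file='REZ-Anna-Maria,de-la-Cruz-12-32-2015.png'

# ${var#pattern} removes the shortest matching prefix: "*-" matches "REZ-".
step1=${file#*-}
# ${var%pattern} removes the shortest matching suffix: each ? matches one
# character, so this strips "-12-32-2015.png".
step2=${step1%-??-??-????.png}

echo "$step1"
echo "$step2"
```

Because only the fixed-shape prefix and suffix are removed, hyphens inside the name itself survive.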
If you're less concerned with correctness, you can simplify it further:
$ ls | grep -o '[^-]*,[^-]*'
Maria,de
Bjørn,Dæhlie
First,Last
John Quincy,Adams
is ridiculous,but fun
In this case, cut makes more sense than grep:
ls -1 | cut -f2 -d-
This cuts the second field from the input, using '-' as the field delimiter. (Note that it is ls -1, one name per line; with the long listing of ls -l, the hyphens in the permissions column would be split instead.) That other guy's answer will correctly handle some cases mine will not, but for one-off uses, I generally find the semantics of cut to be much easier to remember.
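To make the trade-off concrete, here is a small sketch that feeds two of the filenames from this thread to cut as plain strings instead of real files (the variable names are illustrative):

```shell
# A name without extra hyphens: field 2 is the whole name part.
simple=$(printf '%s\n' 'REZ-Name1,Surname1-02-04-2012.png' | cut -f2 -d-)

# A name containing hyphens: field 2 ends at the first hyphen inside the name.
hyphenated=$(printf '%s\n' 'REZ-Anna-Maria,de-la-Cruz-12-32-2015.png' | cut -f2 -d-)

echo "$simple"
echo "$hyphenated"
```

The first prints the full name part, the second only "Anna", which is the kind of case the loop-based answer handles and this one does not.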

grep from beginning of found word to end of word

I am trying to grep the output of a command that outputs unknown text and a directory per line. Below is an example of what I mean:
.MHuj.5.. /var/log/messages
The text and directory may be different from time to time or system to system. All I want to do though is be able to grep the directory out and send it to a variable.
I have looked around but cannot figure out how to grep to the end of a word. I know I can start the search phrase looking for a "/", but I don't know how to tell grep to stop at the end of the word, or if it will consider the next "/" a new word or not. The directories listed could change, so I can't assume the same number of directories will be listed each time. In some cases, there will be multiple lines listed and each will have a directory listed in its output. Thanks for any help you can provide!
If your directory paths do not contain spaces then you can do:
$ echo '.MHuj.5.. /var/log/messages' | awk '{print $NF}'
/var/log/messages
It's not clear from a single example whether we can generalize that e.g. the first occurrence of a slash marks the beginning of the data you want to extract. If that holds, try
grep -o '/.*' file
To fetch everything after the last space, try
grep -o '[^ ]*$' file
For more advanced pattern matching and extraction, maybe look at sed, or Awk or Perl or Python.
Your line can be described as:
^\S+\s+(\S+)$
That's assuming whitespace is your delimiter between the random text and the directory. It simply separates the whitespace from the non-whitespace and captures the second part.
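One way to apply that pattern from the shell is with sed. The sketch below swaps `\S` and `\s`, which are not part of POSIX regular expressions, for the equivalent bracket expressions `[^[:space:]]` and `[[:space:]]`; the variable names are illustrative.

```shell
line='.MHuj.5.. /var/log/messages'

# -E enables extended regular expressions; \1 is the captured second column.
path=$(printf '%s\n' "$line" |
    sed -E 's/^[^[:space:]]+[[:space:]]+([^[:space:]]+)$/\1/')
echo "$path"
```

Lines that do not have exactly two whitespace-separated columns are passed through unchanged by this command, so the input shape matters.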
Or you might want to look into the word boundary anchor: \b.
I know you said to use grep, but I can't help mentioning that this is trivially done using awk:
awk '{ print $NF }' input.txt
This assumes that whitespace is the delimiter and that the path does not contain any whitespace.
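Since the goal was to get the result into a variable, here is a sketch of that last step. The `report` function is a hypothetical stand-in for the real command, and its second output line is invented to show the multi-line case.

```shell
# Hypothetical stand-in for the command whose output is being parsed.
report() {
    printf '%s\n' '.MHuj.5.. /var/log/messages' 'S.5....T. /etc/hosts'
}

# Command substitution captures the last field of every line.
paths=$(report | awk '{ print $NF }')

# Walk the captured paths one line at a time.
count=0
while IFS= read -r p; do
    count=$((count + 1))
    echo "path $count: $p"
done <<EOF
$paths
EOF
```

With a single line of output, `paths` simply holds one path and the loop runs once.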

Opposite of "only-matching" in grep?

Is there any way to do the opposite of showing only the matching part of strings in grep (the -o flag), that is, show everything except the part that matches the regex?
That is, the -v flag is not the answer, since it would not show the lines containing the match at all; I want to show these lines, but without the part of the line that matches.
EDIT: I wanted to use grep over sed, since it can do "only-matching" matches on multi-line, with:
cat file.xml|grep -Pzo "<starttag>.*?(\n.*?)+.*?</starttag>"
This is a rather unusual requirement; I don't think grep can alter the strings like that. You can achieve this with sed, though (the double quotes let the shell expand the PATTERN variable):
sed -n "s/$PATTERN//gp" file
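A small sketch of the sed approach, assuming the regex is held in a shell variable named PATTERN. The variable has to sit inside double quotes so that the shell expands it before sed sees the command; the sample input lines are invented, and `-E` is added only so that `+` can be used in the pattern.

```shell
PATTERN='[0-9]+'

# -n suppresses the default output; the p flag prints only lines where a
# substitution happened, so non-matching lines are dropped, as with grep.
result=$(printf '%s\n' 'abc123def' 'no digits here' 'x9y' |
    sed -En "s/$PATTERN//gp")
echo "$result"
```

A pattern containing a slash would need a different delimiter, for example `s|$PATTERN||gp`.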
EDIT in response to OP's edit:
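You can do multiline matching with sed, too, if the file is small enough to load it all into memory. Note that sed's regular expressions have no non-greedy quantifiers, so .*? still matches greedily here: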
sed -rn ':r;$!{N;br};s/<starttag>.*?(\n.*?)+.*?<\/starttag>//gp' file.xml
You can do that with a little help from sed:
grep "pattern" input_file | sed 's/pattern//g'
I don't think there is a way in grep.
If you use ack, you could output Perl's special variables $` and $' to show everything before and after the match, respectively (the backtick must be escaped inside double quotes, or the shell treats it as command substitution):
ack string --output="\$\`\$'"
Similarly if you wanted to output what did match along with other text, you could use $& which contains the matched string;
ack string --output="Matched: $&"
