grep pattern match with begin and end - grep

I have the following text (single line) returned from a call to an API:
data=$(gcloud dns record-sets list --zone=production-internal | grep proj-name-name-dp)
echo $data
proj-name-name-dp.int.proj-name.abc.title.com. CNAME 300 proj-name-name-dp.int.proj-name.abc.title.com.
However, I would like to get just proj-name-name-dp.int.proj-name.abc.title.com.
Everything from the dot after com onward should not be stored in the data variable.
grep -o didn't help.
Any help is appreciated.
Thanks

If you are OK with awk, could you please try the following (note the explicit print so that only the matching line is output, as with the original grep):
data=$(gcloud dns record-sets list --zone=production-internal | awk '/proj-name-name-dp/{sub(/\.com.*/,".com"); print}')
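Alternatively, since the record name is the first whitespace-separated field of the output, you can take field 1 and strip its trailing dot. A minimal sketch on the sample line from the question (the real command would feed the gcloud pipeline into awk instead of printf):

```shell
# sample line as returned by the gcloud | grep pipeline above
line='proj-name-name-dp.int.proj-name.abc.title.com. CNAME 300 proj-name-name-dp.int.proj-name.abc.title.com.'

# take the first field and strip its trailing dot
data=$(printf '%s\n' "$line" | awk '{sub(/\.$/, "", $1); print $1}')
echo "$data"    # proj-name-name-dp.int.proj-name.abc.title.com
```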


grep -o search stop at first instance of second expression, rather then last? Greedy?

Not sure how to phrase this question.
This is an example line.
30/Oct/2019:00:17:22 +0000|v1|177.95.140.78|www.somewebsite.com|200|162512|-|-|0.000|GET /product/short-velvet-cloak-with-hood/?attribute_pa_color=dark-blue&attribute_pa_accent-color=gold&attribute_pa_size=small HTTP/1.0|0|0|-
I need to extract attribute_pa_color=
So I have
cat somewebsite.access.log.2.csv | grep -o "?.*=" > just-parameters.txt
which works, but if there are multiple parameters in the URL it returns all of them.
So instead of stopping the match at the first instance of "=", it's taking the last instance of "=" in the line.
How can I make it stop at the first?
I tried this
cat somewebsite.access.log.2.csv | grep -o "?(.*?)=" > just-parameters2.txt
cat somewebsite.access.log.2.csv | grep -o "\?(.*?)=" > just-parameters2.txt
Both return nothing
Also I need each unique parameter so once I created the file I ran
sort just-parameters.txt | uniq > clean.txt
Which does not appear to work. Is it possible to remove duplicates and have it be part of the same command?
You can try something like this with awk:
awk -F'[?&]' '{print $2}' somewebsite.access.log.2.csv|sort -u > clean.txt
This will work if attribute_pa_color is the first parameter in the URL.
If you want to extract only the text attribute_pa_color=, you can try something like:
awk -F'[?&]' '{print $2}' somewebsite.access.log.2.csv|awk -F\= '{print $1"="}'|sort -u > clean.txt
Instead of using a second awk you can try something like:
awk -F'[?&]' '{split($2,a,"=");print a[1]}' somewebsite.access.log.2.csv|sort -u > clean.txt
This splits internally in awk using = as the delimiter.
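For the record, the `(.*?)` attempts return nothing because standard grep (BRE/ERE) has no lazy quantifiers; `grep -P` enables PCRE where `.*?` works, but a negated character class does the same job portably, and `sort -u` deduplicates in the same pipeline. A sketch on a shortened version of the example line (the real command would read the log file instead):

```shell
line='GET /product/short-velvet-cloak-with-hood/?attribute_pa_color=dark-blue&attribute_pa_size=small HTTP/1.0'

# [^=]* cannot cross an "=", so the match stops at the first one;
# sort -u removes duplicates in the same pipeline
printf '%s\n' "$line" | grep -o '?[^=]*=' | sort -u
# ?attribute_pa_color=
```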

how to search "." in a string using grep command

I have some strings which include "." as part of them, for example:
VCAT.VSCH.VIVEK
VIVEK
I want to grep the strings which include ".vivek". I tried using grep -iw ".vivek" but it returns no data.
Please help me find the string.
Thanks in advance
Vivek
You should remove -w, escape the dot, and use
s="VCAT.VSCH.VIVEK
VIVEK"
grep -i '\.vivek' <<< "$s"
# => VCAT.VSCH.VIVEK
Or, with a word boundary at the end to match vivek and not viveks:
grep -i '\.vivek\b' <<< "$s"
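Alternatively, a fixed-string match sidesteps the escaping question entirely: with -F (POSIX), the pattern is taken literally, so the dot matches only a dot:

```shell
# -F treats .vivek as a literal string, so the dot needs no escaping
printf 'VCAT.VSCH.VIVEK\nVIVEK\n' | grep -iF '.vivek'
# VCAT.VSCH.VIVEK
```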

How to grep out substring which can change?

Basically I have a very large text file and each line contains
tag=yyyyy;id=xxxxx;db_ref=zzzzz;
What I want is to grep out the id, but the id can change in length and form. I was wondering if it's possible to use grep -o, then grep for "id=" and extract everything that comes after it up until the semicolon?
You could do:
$ grep -o 'id=[^;]*' file
And if you don't want to include the id= part you can use a positive look-behind:
$ grep -Po '(?<=id=)[^;]*' file
Via grep:
grep -o 'id=[^;]*'
Via awk:
awk -F';' '{ print $2}' testlog
id=xxxxx
Edit: see sudo_O's answer for the look-behind; it's more to the point of your question, IMO.
You could try this awk. It should also work if there are multiple id= entries per line and it would not give a false positive for ...;pid=blabla;...
awk '/^id=/' RS=\; file
Try the following:
grep -oP 'id=\K[^;]*' file
perl -lne 'print $1 if(/id=([^\;]*);/)' your_file
tested:
> echo "tag=yyyyy;id=xxxxx;db_ref=zzzzz; "|perl -lne 'print $1 if(/id=([^\;]*);/)'
xxxxx
>
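If neither GNU grep's -P nor perl is available, a sed capture group does the same extraction. A sketch on the sample line (note it shares the caveat mentioned above: a key like pid= would also match the bare id= pattern):

```shell
# capture everything between "id=" and the following ";"
printf 'tag=yyyyy;id=xxxxx;db_ref=zzzzz;\n' \
  | sed -n 's/.*id=\([^;]*\);.*/\1/p'
# xxxxx
```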

How do I get the URLs out of an HTML file?

I need to get a long list of valid URLs for testing my DNS server. I found a web page that has a ton of links in it that would probably yield quite a lot of good links (http://www.cse.psu.edu/~groenvel/urls.html), and I figured that the easiest way to do this would be to download the HTML file and simply grep for the URLs. However, I can't get it to list out my results with only the link.
I know there are lots of ways to do this. I'm not picky how it's done.
Given the URL above, I want a list of all of the URLs (one per line) like this:
http://www.cse.psu.edu/~groenvel/
http://www.acard.com/
http://www.acer.com/
...
Method 1
Step1:
wget "http://www.cse.psu.edu/~groenvel/urls.html"
Step2:
perl -0ne 'print "$1\n" while (/a href=\"(.*?)\">.*?<\/a>/igs)' /PATH_TO_YOUR/urls.html | grep 'http://' > /PATH_TO_YOUR/urls.txt
Just replace the "/PATH_TO_YOUR/" with your filepath. This would yield a text file with only urls.
Method 2
If you have lynx installed you could simply do this in 1 step:
Step1:
lynx --dump http://www.cse.psu.edu/~groenvel/urls.html | awk '/(http|https):\/\// {print $2}' > /PATH_TO_YOUR/urls.txt
Method 3
Using curl:
Step1
curl http://www.cse.psu.edu/~groenvel/urls.html 2>&1 | egrep -o "(http|https):.*\">" | awk 'BEGIN {FS="\""};{print $1}' > /PATH_TO_YOUR/urls.txt
Method 4
Using wget:
wget -qO- http://www.cse.psu.edu/~groenvel/urls.html 2>&1 | egrep -o "(http|https):.*\">" | awk 'BEGIN {FS="\""};{print $1}' > /PATH_TO_YOUR/urls.txt
You need wget, grep, and sed.
I will try a solution and update my post later.
Update:
wget [the_url];
cat urls.html | egrep -i '<a href=".*">' | sed -e 's/.*<A HREF="\(.*\)">.*/\1/i'
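With GNU grep built with PCRE support, \K can drop the href=" prefix from the match, which avoids the sed step entirely. A sketch on an inline sample (the real command would read urls.html):

```shell
html='<a href="http://www.acard.com/">ACARD</a> <a href="#top">top</a>'

# \K discards everything matched so far, leaving only the URL;
# the second grep keeps absolute http(s) links only
printf '%s\n' "$html" | grep -oPi 'href="\K[^"]+' | grep -E '^https?://'
# http://www.acard.com/
```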

basic grep

I have a large file where each line contains a substring such as ABC123. If I execute
grep ABC file.txt
or
grep ABC1 file.txt
I get those lines back as expected, but if I execute
grep ABC12 file.txt
grep fails to find the corresponding lines.
This seems pretty trivial functionality, but I'm not a heavy user of grep so perhaps I'm missing some gotcha.
Use something like
od -x -a < filename
to dump out the file contents in hex. That'll immediately show you if what you have in your file is what you expect. Which I suspect it isn't :-)
Note: od has lots of useful options to help you here. Too many to list, in fact.
Is there a chance your file contains some hidden character, such as 0x00 ?
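To illustrate the hidden-character theory: a stray control character (a carriage return here, but a NUL or zero-width character behaves similarly) in the middle of the string makes the longer pattern fail while shorter prefixes still match. A contrived example:

```shell
# a carriage return hides between "ABC1" and "23"
printf 'ABC1\r23\n' > file.txt

grep -c 'ABC1'  file.txt   # 1  (the prefix still matches)
grep -c 'ABC12' file.txt   # 0  (the \r breaks the longer match)

od -c file.txt             # reveals the \r
```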
This doesn't make sense. Are you sure the file contains "ABC123"?
You can verify this by running following command in a shell
echo "ABC123" | grep ABC12
If the lines contain ABC123, then "grep ABC12" should get them. Do you perhaps mean that you want to match several different strings, such as ABC1, ABC2 and ABC3? In that case you can try this:
grep -E 'ABC1|ABC2|ABC3'
I'm not sure what the problem is; grep works exactly as it should. For example, the contents of my test file:
$ cat file.txt
ABC
ABC1
ABC12
ABC123
...and grepping for ABC, ABC1, ABC12, ABC123:
$ grep ABC file.txt
ABC
ABC1
ABC12
ABC123
$ grep ABC1 file.txt
ABC1
ABC12
ABC123
$ grep ABC12 file.txt
ABC12
ABC123
$ grep ABC123 file.txt
ABC123
grep is basically a filter: any line containing the first argument (ABC, ABC1, etc.) will be displayed. If a line doesn't contain the entire string, it will not be displayed.
