grep wildcards inside file - grep

I am using the following code to filter SPAM on my mail server:
if 822field from | grep -qiFf "/email_filters/from.txt"; then exit 99; fi
and also:
if 822field subject | grep -qiFf "email_filters/bad_subject.txt"; then exit 99; fi
Inside of the two above-referenced files, I have lists of SPAM email sources and subjects (e.g., pfizer, cialis, viagra, etc.), each on its own line in the text file.
How can I have wildcards within the file? For example, I have recently begun receiving fake U.S. Postal Service notifications from
status_38#unikmetal.ru", so I'd like to block "#_.ru".

When you add -F to grep, it processes a fixed string not a regular expression. To use wildcards you must use regular expressions as far as I know.
Remove the -F in the grep command
grep -qif "/email_filters/from.txt"
To block your russian email addresses you can add something like this to your filters
#.*\.ru
Explanation
# - match '#' character literally
.* - match any character 0 or more times
\. - match '.' literally
ru - match 'ru' literally

Related

Linux Grep Command - Extract multiple texts between strings

Context;
After running the following command on my server:
zgrep "ResCode-5005" /loggers1/PCRF*/_01_03_2022 > analisis.txt
I get a text file with thousands of lines like this example:
loggers1/PCRF1_17868/PCRF12_01_03_2022_00_15_39.log:[C]|01-03-2022:00:18:20:183401|140404464875264|TRACKING: CCR processing Compleated for SubId-5281181XXXXX, REQNO-1, REQTYPE-3,
SId-mscp01.herpgwXX.epc.mncXXX.mccXXX.XXXXX.org;25b8510c;621dbaab;3341100102036XX-27cf0XXX,
RATTYPE-1004, ResCode-5005 |processCCR|ProcessingUnit.cpp|423
(X represents incrementing numbers)
Problem:
The output is filled with unnecessary data. The only string portions I need are the MSISDN,IMSI comma separated for each line, like this:
5281181XXXXX,3341100102036XX
Steps I tried
zgrep "ResCode-5005" /loggers1/PCRF*/_01_03_2022| grep -o -P
'(?<=SubId-).*?(?=, REQ)' > analisis1.txt
This gave me the first part of the solution
5281181XXXXX
However, when I tried to get the second string located between '334110' and "-"
zgrep "ResCode-5005" /loggers1/PCRF*/_01_03_2022| grep -o -P
'(?<=SubId-).?(?=, REQ)' | grep -o -P '(?<=334110).?(?=-)' >
analisis1.txt
it doesn't work.
Any input will be appreciated.
To get 5281181XXXXX or the second string located between '334110' and "-" you can use a pattern like:
\b(?:SubId-|334110)\K[^,\s-]+
The pattern matches:
\b A word boundary to prevent a partial word match
(?: Non capture group to match as a whole
SubId- Match literally
| Or
334110 Match literally
) Close the non capture group
\K Forget what is matched so far
[^,\s-]+ Match 1+ occurrences of any char except a whitespace char , or -
See the matches in this regex demo.
That will match:
5281181XXXXX
0102036XX
The command could look like
zgrep "ResCode-5005" /loggers1/PCRF*/_01_03_2022 | grep -oP '\b(?:SubId-|334110)\K[^,\s-]+' > analisis1.txt

grep file with a large array

Hi i have a few archive of FW log and occasionally im required to compare them with a series of IP addresses (thousand of them) to get the date and time if the ip addresses matches. my current script is as follow:
#input the list of ip into array
mapfile -t -O 1 var < ip.txt while true
do
#check array is not null
if [[-n "${var[i]}"]] then
zcat /.../abc.log.gz | grep "${var[i]}"
((i++))
It does work but its way too slow and i would think that grep-ping a line with multiple strings would be faster than zcat on every ip line. So my question is is there a way to generate a 'long grep search string' from the ip.txt? or is there a better way to do this
Sure. One thing is that using cat is usually slightly inefficient. I'd recommend using zgrep here instead. You could generate a regex as follows
IP=`paste -s -d ' ' ip.txt`
zgrep -E "(${IP// /|})" /.../abc.log.gz
The first line loads the IP addresses into IP as a single line. The second line builds up a regex that looks something like (127.0.0.1|8.8.8.8) by replacing spaces with |'s. It then uses zgrep to search through abc.log.gz once, with that -Extended regex.
However, I recommend that you do not do this. Firstly, you should escape strings put into a regex. Even if you know that ip.txt really contains IP addresses (e.g. not controlled by a malicious user), you should still escape the periods. But rather than building up a search string and then escape it, just use the -Fixed strings and -file features of grep. Then you get the simple and fast one-liner:
zgrep -F -f ip.txt /.../abc.log.gz

Grep looking for something that fits one paramter but NOT the other

In my data set, we have a multitude of emails that must be parsed (alongside a myriad of other unrelated information like phone numbers and addresses and such.)
I am attempting to look for something that meets the criteria of an email, but does not have the proper format of an email. So, I tried using grep's "AND" function, wherein it fits the second parameter but not the first.
grep -E -c -v "^[a-mA-M][a-zA-Z]*\.#[A-Za-z]+\.[A-Za-z]{2,6}"Data.bash | grep # Data.bash
How should I be implementing this? As this just finds anything with an # in it (as the first parameter returns 0 and the second is just finding everything with an # in it).
In short, How do I AND two conditions together in grep?
EDIT: Sample Data
An email address has a user-id and domain names can consist of letters, numbers,
periods, and dashes.
Matches:
saltypickle#gmail.com
saltypickle#g-mail.com
No Match:
saltypickle#g^mail.com
saltypickle#.
#saltyPickle#
saltyPickle#
grep -P '^\w+#[[:alnum:]-.]+.com' inputfile
saltypickle#gmail.com
saltypickle#g-mail.com
This will allow any alpha ,number, - or . as domain name.
Following will print invalid email addresses:
grep -vP '^\w+#[[:alnum:]-.]+.com' inputfile
saltypickle#g^mail.com
saltypickle#.
#saltyPickle#
saltyPickle#

grep from beginning of found word to end of word

I am trying to grep the output of a command that outputs unknown text and a directory per line. Below is an example of what I mean:
.MHuj.5.. /var/log/messages
The text and directory may be different from time to time or system to system. All I want to do though is be able to grep the directory out and send it to a variable.
I have looked around but cannot figure out how to grep to the end of a word. I know I can start the search phrase looking for a "/", but I don't know how to tell grep to stop at the end of the word, or if it will consider the next "/" a new word or not. The directories listed could change, so I can't assume the same amount of directories will be listed each time. In some cases, there will be multiple lines listed and each will have a directory list in it's output. Thanks for any help you can provide!
If your directory paths does not have spaces then you can do:
$ echo '.MHuj.5.. /var/log/messages' | awk '{print $NF}'
/var/log/messages
It's not clear from a single example whether we can generalize that e.g. the first occurrence of a slash marks the beginning of the data you want to extract. If that holds, try
grep -o '/.*' file
To fetch everything after the last space, try
grep -o '[^ ]*$' file
For more advanced pattern matching and extraction, maybe look at sed, or Awk or Perl or Python.
Your line can be described as:
^\S+\s+(\S+)$
That's assuming whitespace is your delimiter between the random text and the directory. It simply separates the whitespace from the non-whitespace and captures the second part.
Or you might want to look into the word boundary character class: \b.
I know you said to use grep, but I can't help to mention that this is trivially done using awk:
awk '{ print $NF }' input.txt
This is assuming that a whitespace is the delimiter and that the path does not contain any whitespaces.

How to filter using grep on a selected word

grep (GNU grep) 2.14
Hello,
I have a log file that I want to filter on a selected word. However, it tends to filter on many for example.
tail -f gateway-* | grep "P_SIP:N_iptB1T1"
This will also find words like this:
"P_SIP:N_iptB1T10"
"P_SIP:N_iptB1T11"
"P_SIP:N_iptB1T12"
etc
However, I don't want to display anything after the 1. grep is picking up 11, 12, 13, etc.
Many thanks for any suggestions,
You can restrict the word to end at 1:
tail -f gateway-* | grep "P_SIP:N_iptB1T1\>"
This will work assuming that you have a matching case which is only "P_SIP:N_iptB1T1".
But if you want to extract from P_SIP:N_iptB1T1x, and display only once, then you need to restrict to show only first match.
grep -o "P_SIP:N_iptB1T1"
-o, --only-matching show only the part of a line matching PATTERN
More info
At least two approaches can be tried:
grep -w pattern matches for full words. Seems to work for this case too, even though the pattern has punctuation.
grep pattern -m 1 to restrict the output to first match. (Also doable with grep xxx | head -1)
If the lines contains the quotes as in your example, just use the -E option in grep and match the closing quote with \". For example:
grep -E "P_SIP:N_iptB1T1\"" file
If these quotes aren't in the text file, and there's blank spaces or endlines after the word, you can match these too:
# The word is followed by one or more blanks
grep -E "P_SIP:N_iptB1T1\s+" file
# Match lines ending with the interesting word
grep -E "P_SIP:N_iptB1T1$" file

Resources