I have a list of email addresses. I also have a list of common first and last names. I want to filter the email list against the list of common names, printing only the emails that contain a common first or last name (or both) to the output file.
So, I tried:
cat file | egrep -e -i < whitelist | tee emails_with_common_first_and_last_names.txt
At first, this seemed to be working. Then, after examining the output, it appeared not to have done anything.
wc -l input output
This revealed that nothing was filtered.
So, how else can I do this or what am I doing incorrectly?
Here is a sample of the file that I would like filtered:
aynz@falskdf.com
8zlkhsdf0@fmail.com
afjsg@domain.com
Here is a sample of the whitelist that I would like to use as a reference to filter the file:
ALEX
johnson
WINTERS
miles
christina
tonya
jackson
schmidt
jake
So, if an email contains any of these, grep (or whatever) needs to print it to an output file.
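For reference, a minimal sketch of a corrected command, assuming (as in your pipeline) that the emails are in file and the names are in whitelist:

# -i: ignore case; -F: treat the names as fixed strings, not regexes; -f: read patterns from whitelist
grep -iFf whitelist file | tee emails_with_common_first_and_last_names.txt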
Related
I have a txt file that has only information about locations (location.txt).
Another large txt file (all.txt) has a lot of information, like an id and a location; location.txt is a subset of all.txt (some records are common to both).
I want to search for the contents of location.txt in the other file (all.txt) with grep
and print all common records (but with all the information, as in all.txt).
I tried to grep with:
grep -f location.txt all.txt
The problem is that grep gives me just the last location, not all of them.
How can I print all locations?
I'm assuming you mean to use one of the files as a set of patterns for grep. If this is the case, and you are looking for a way to print all lines in one file not found in the other, this is what you want:
grep -vFf file_with_patterns other_file
Explanation
-F means to interpret the pattern(s) literally, giving no particular meaning to regex metacharacters (like * and +, for example)
-f means to read the patterns from the file named as its argument (file_with_patterns in this case).
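For this question, though, where you want the common records rather than the missing ones, drop the -v and use the question's file names:

grep -Ff location.txt all.txt

This prints every line of all.txt that contains one of the locations listed in location.txt.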
I am trying to grab some activities from a logfile with grep.
The logfile looks like the line below:
[07/16/2019 14:09:26:516 CEST] 00018e0a main I [APPName][DEBUG][Various Activity Name][SERVICE] node: <resultSet recordCount="3" columnCount="6">
So my goal is to grep over the logfile and get all the activity names (various and sometimes unknown names).
I tried with regex but I can't reach the final goal :-P.
How can I grep something like this (here comes my pseudocode):
grep "[AppName][DEBUG] [*]" logfile.log
You're probably better off using sed for this. Just make sure that you escape the characters properly. Something like this should work, and you can pipe it to sort and uniq to get a unique list of activity names:
sed -n 's/.*\[APPName\]\[DEBUG\]\[\([^]]*\)\].*/\1/p' logfile.log | sort | uniq
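If your grep supports PCRE (GNU grep's -P flag), a hedged alternative is to extract just the bracketed field, using \K to discard the matched prefix; this assumes, as in the sample line, that the activity name is always the bracket group immediately following [DEBUG]:

# -o prints only the matched part; \K drops the "[DEBUG][" prefix from the match
grep -oP '\[DEBUG\]\[\K[^]]*' logfile.log | sort -u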
In my data set, we have a multitude of emails that must be parsed (alongside a myriad of other unrelated information, like phone numbers and addresses).
I am attempting to look for something that meets the criteria of an email but does not have the proper format of an email. So, I tried using grep's "AND" function, where a line matches the second pattern but not the first.
grep -E -c -v "^[a-mA-M][a-zA-Z]*\.@[A-Za-z]+\.[A-Za-z]{2,6}" Data.bash | grep @ Data.bash
How should I be implementing this? As written, this just finds anything with an @ in it (the first parameter returns 0 and the second simply finds everything with an @ in it).
In short, How do I AND two conditions together in grep?
EDIT: Sample Data
An email address has a user-id and a domain name; domain names can consist of letters, numbers, periods, and dashes.
Matches:
saltypickle@gmail.com
saltypickle@g-mail.com
No Match:
saltypickle@g^mail.com
saltypickle@.
@saltyPickle@
saltyPickle@
grep -P '^\w+@[[:alnum:].-]+\.com' inputfile
saltypickle@gmail.com
saltypickle@g-mail.com
This allows any letter, number, - or . in the domain name.
The following will print the invalid email addresses:
grep -vP '^\w+@[[:alnum:].-]+\.com' inputfile
saltypickle@g^mail.com
saltypickle@.
@saltyPickle@
saltyPickle@
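As for the general question of how to AND two conditions in grep, the usual idiom is to pipe one grep into another, since each stage narrows the stream further. A sketch for this case, reusing the file name and pattern from above: keep only the lines containing an @, then invert-match the well-formed addresses, leaving the malformed candidates:

grep '@' Data.bash | grep -vP '^\w+@[[:alnum:].-]+\.com'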
I have literally been at this for 5 hours. I have BusyBox on my device, and I unfortunately do not have -X in grep to make my life easier.
Edit:
I have two lists, both of which contain MAC addresses. Essentially, I just want to achieve offline MAC address lookup so I don't have to keep looking them up online.
list.txt has the vendor MAC prefixes. Of course this isn't the complete list, but just an example:
00:13:46
00:15:E9
00:17:9A
00:19:5B
00:1B:11
00:1C:F0
scan will have a list of different MAC addresses, with no indication of which vendor they belong to; these are full-length MAC addresses. Whenever there is a match, I want the line in scan to be output.
Pretty much it does that, but it outputs everything from the scan file and then outputs the matching one at the end, causing duplicates. I tried sort -u, but it has no effect. It is as if there were two different outputs from two different methods; I say that because it instantly outputs the scan file with everything in it, and a couple of seconds later it outputs the matching one.
From searching, I came across this:
#!/bin/bash
# For each prefix in list.txt, print the lines of scan that contain it literally.
while read line; do
  grep -F "$line" scan
done < list.txt
which displays the duplicate result when/if found; the output pretty much echoes my scan file and then displays the matched pattern, thus creating duplicates.
It is frustrating that I have not found a solution after clicking on all the links in Google up to page 9.
Please, someone help me.
I don't know if the BusyBox sed supports this out of the box, but if not, it should be easy to do in Awk or Perl instead.
Create a sed script to print lines from file2 which are covered by a prefix in file1 by transforming each line in file1 into a sed command to print a match for that regular expression:
sed 's%.*%/&/p%' file1 | sed -n -f - file2
The same in Awk:
awk 'NR==FNR { a[++i]="^" $0; next }
{ for (j=1; j<=i; ++j) if ($0 ~ a[j]) print }' file1 file2
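Applied to the file names from the question (list.txt holding the prefixes, scan holding the full addresses), the sed version would run as:

sed 's%.*%/&/p%' list.txt | sed -n -f - scan

Note that the awk version additionally anchors each prefix to the start of the line with "^"; the sed script as written matches a prefix anywhere in the line.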
OK guys, I did a nested for loop (probably very inefficient), but I got it working, printing the matching MAC addresses using this:
#!/usr/bin/bash
for scanlist in `cat scan | cut -d: -f1,2,3`
do
  for listt in `cat list`
  do
    if [[ $scanlist == $listt ]]; then
      grep "$scanlist" scan
    fi
  done
done
If anyone can make this more elegant, feel free; it works for me for now. I think the problem I had was that one list contained just 00:11:22 while my other list contained 00:11:22:33:44:55; that is why I cut the entries in my scanlist down to the same length as those in my other list. This way it only outputs the matches instead of producing duplicate output.
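For what it's worth, here is a possibly more elegant single-pass sketch in awk, assuming (as in your script) that the files are named list and scan and that the vendor prefix is always the first eight characters of a full address; toupper guards against the two files using different letter case:

awk 'NR==FNR { prefix[toupper($0)]; next }
     toupper(substr($0, 1, 8)) in prefix' list scan

The first block loads the prefixes from list into an array; the second prints each line of scan whose first eight characters appear in that array.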
I have some firewall logs and I want to find multiple unique values. I need to find every unique combination of source IP and destination port, which appear in this format in /var/log/iptables:
SRC=123.123.123.123
DPT=137
So, if source IP 123.123.123.123 makes multiple appearances on multiple ports, I want to see that, but just once for each SRC/DPT combination.
Thanks!
This awk solution might help. The first awk command combines each pair of successive SRC and DPT lines into a single line. The output from this command is then piped to the second awk command, which removes duplicates while retaining the original order:
awk '/^SRC|^DPT/{ORS=$0 ~ /^SRC/?" ":"\n"; print}' file.* | awk '!a[$0]++'
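For the two sample lines from the question, the first command joins them into one line, which the deduplication step then passes through only once:

SRC=123.123.123.123 DPT=137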
If multiple SRC and DPT entries exist per line, the following should work:
grep -oE 'SRC=[[:digit:].]+[[:space:]]+DPT=[[:digit:].]+' file.txt | awk '!a[$0]++'
You can try "grep AND"; see the examples at this link:
http://www.thegeekstuff.com/2011/10/grep-or-and-not-operators/