Hi, I have a few archives of firewall logs, and occasionally I'm required to compare them against a series of IP addresses (thousands of them) to get the date and time whenever an IP address matches. My current script is as follows:
#input the list of ips into an array (indices start at 1)
mapfile -t -O 1 var < ip.txt
i=1
while true
do
    #check the array element is not null
    if [[ -n "${var[i]}" ]]; then
        zcat /.../abc.log.gz | grep "${var[i]}"
        ((i++))
    else
        break
    fi
done
It does work, but it's way too slow, and I would think that grepping each line for multiple strings at once would be faster than running zcat for every IP. So my question is: is there a way to generate a 'long grep search string' from ip.txt? Or is there a better way to do this?
Sure. One thing is that piping zcat into grep is usually slightly inefficient; I'd recommend using zgrep here instead. You could generate a regex as follows:
IP=$(paste -s -d ' ' ip.txt)
zgrep -E "(${IP// /|})" /.../abc.log.gz
The first line joins the IP addresses into a single space-separated line in IP. The second line builds a regex that looks something like (127.0.0.1|8.8.8.8) by replacing the spaces with |'s, then uses zgrep to search through abc.log.gz once, with that extended (-E) regex.
However, I recommend that you not do this. Firstly, you should escape strings you put into a regex: even if you know that ip.txt really contains IP addresses (i.e. it is not controlled by a malicious user), you should still escape the periods, since an unescaped . matches any character. But rather than building up a search string and then escaping it, just use grep's -F (fixed strings) and -f (read patterns from a file) features. Then you get this simple and fast one-liner:
zgrep -F -f ip.txt /.../abc.log.gz
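If the logs span several archives, the same one-liner handles them all in one command; a minimal sketch, assuming the archives live under a hypothetical /var/log/fw/ directory:
zgrep -F -f ip.txt /var/log/fw/*.log.gz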
Related
In my data set, we have a multitude of emails that must be parsed (alongside a myriad of other unrelated information like phone numbers and addresses and such.)
I am attempting to look for something that meets the criteria of an email but does not have the proper format of an email. So, I tried using grep's "AND" function, wherein a line fits the second parameter but not the first.
grep -E -c -v "^[a-mA-M][a-zA-Z]*\.@[A-Za-z]+\.[A-Za-z]{2,6}" Data.bash | grep @ Data.bash
How should I be implementing this? As written, it just finds anything with an @ in it (the first parameter returns 0 and the second simply finds everything with an @ in it).
In short, how do I AND two conditions together in grep?
EDIT: Sample Data
An email address has a user-id and domain names can consist of letters, numbers,
periods, and dashes.
Matches:
saltypickle@gmail.com
saltypickle@g-mail.com
No Match:
saltypickle@g^mail.com
saltypickle@.
@saltyPickle@
saltyPickle@
grep -P '^\w+@[[:alnum:]-.]+\.com' inputfile
saltypickle@gmail.com
saltypickle@g-mail.com
This will allow any letter, number, - or . in the domain name.
The following will print the invalid email addresses:
grep -vP '^\w+@[[:alnum:]-.]+\.com' inputfile
saltypickle@g^mail.com
saltypickle@.
@saltyPickle@
saltyPickle@
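More generally, you can AND two grep conditions by piping one grep into another. For instance, to keep only the lines that contain an @ but fail the valid-address pattern (a sketch reusing this answer's regex):
grep '@' inputfile | grep -vP '^\w+@[[:alnum:]-.]+\.com'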
I have literally been at this for 5 hours. I have BusyBox on my device, and unfortunately I do not have -X in grep to make my life easier.
Edit:
I have two lists, both of them containing MAC addresses; essentially I am just trying to achieve an offline MAC address lookup, so I don't have to keep looking them up online.
list.txt has vendor MAC prefixes. Of course this isn't the complete list, but just an example:
00:13:46
00:15:E9
00:17:9A
00:19:5B
00:1B:11
00:1C:F0
scan will have a list of different MAC addresses, unknown as to which vendor they belong; these are full-length MAC addresses. Whenever there is a match, I want the matching line in scan to be output.
Pretty much it does that, but it outputs everything from the scan file and then outputs the matching line at the end, causing a duplicate. I tried sort -u, but it has no effect; it's as if there are two different outputs from two different methods. The reason I say that is because it instantly outputs the scan file with everything in it, and a couple of seconds later it outputs the matching line.
From searching I came across this:
#!/bin/bash
while read -r line; do
    grep -F "$line" scan
done < list.txt
It displays the duplicated result when/if a match is found; the output pretty much echoes my scan file and then displays the matched pattern, which creates the duplicate.
It is frustrating that I have not found a solution after clicking all the links in Google up to page 9.
Please, someone help me.
I don't know if the BusyBox sed supports this out of the box, but if not, it should be easy to do in Awk or Perl instead.
Create a sed script that prints the lines from file2 which are covered by a prefix in file1, by transforming each line of file1 into a sed command that prints any line matching that prefix as a regular expression:
sed 's%.*%/&/p%' file1 | sed -n -f - file2
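To see what that generates, run the first sed alone; with the prefix list from the question it emits one print command per prefix, which the second sed then executes against file2:
sed 's%.*%/&/p%' file1
/00:13:46/p
/00:15:E9/p
(and so on)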
The same in Awk:
awk 'NR==FNR { a[++i]="^" $0; next }
{ for (j=1; j<=i; ++j) if ($0 ~ a[j]) print }' file1 file2
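Either way, given the prefix list above and a scan file containing, say,
00:13:46:AA:BB:CC
00:99:00:11:22:33
only the first line is printed, since 00:13:46 is a known vendor prefix.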
OK guys, I did a nested for loop (probably very inefficient), but I got it working, printing the matching MAC addresses with this:
#!/usr/bin/bash
for scanlist in $(cut -d: -f1,2,3 scan)
do
    for listt in $(cat list)
    do
        if [[ $scanlist == $listt ]]; then
            grep "$scanlist" scan
        fi
    done
done
If anyone can make this more elegant, please do, but it works for me for now. I think the problem I had was that one list contained just 00:11:22 while my other list contained 00:11:22:33:44:55; that is why I cut my scanlist down to the same length as the other list. This only outputs the matches instead of producing duplicate output.
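Since every prefix should match at the start of a MAC address, a slightly more elegant variant anchors one grep per prefix and skips the inner comparison loop entirely; a sketch, assuming the list and scan file names above:
while read -r prefix; do
    grep "^$prefix" scan
done < list
Like the nested loop, this prints only the matching scan lines.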
I have some firewall logs in which I want to find multiple unique values. I need to find every unique combination of source IP and destination port, which appear in this format in /var/log/iptables:
SRC=123.123.123.123
DPT=137
So, if source IP 123.123.123.123 makes multiple appearances on multiple ports, I want to see that, but only once for each SRC/DPT combo.
Thanks!
This awk solution might help. The first awk command joins each pair of successive SRC and DPT lines into a single line. Its output is then piped to a second awk command, which de-duplicates the output while retaining the original order:
awk '/^SRC|^DPT/{ORS=$0 ~ /^SRC/?" ":"\n"; print}' file.* | awk '!a[$0]++'
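For example, given a log fragment like
SRC=123.123.123.123
DPT=137
SRC=123.123.123.123
DPT=137
the pipeline prints the combination exactly once:
SRC=123.123.123.123 DPT=137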
If multiple SRC and DPT entries exist per line, the following should work:
grep -oE 'SRC=[[:digit:].]+[[:space:]]+DPT=[[:digit:].]+' file.txt | awk '!a[$0]++'
You can try "grep AND"; see the examples at this link:
http://www.thegeekstuff.com/2011/10/grep-or-and-not-operators/
I am trying to grep the output of a command that prints unknown text and a directory per line. Below is an example of what I mean:
.MHuj.5.. /var/log/messages
The text and directory may differ from time to time or from system to system. All I want to do, though, is grep the directory out and send it to a variable.
I have looked around but cannot figure out how to grep to the end of a word. I know I can start the search pattern looking for a "/", but I don't know how to tell grep to stop at the end of the word, or whether it will consider the next "/" the start of a new word. The directories listed could change, so I can't assume the same number of directories will be listed each time. In some cases there will be multiple lines, each with a directory in its output. Thanks for any help you can provide!
If your directory paths do not have spaces, then you can do:
$ echo '.MHuj.5.. /var/log/messages' | awk '{print $NF}'
/var/log/messages
It's not clear from a single example whether we can generalize that e.g. the first occurrence of a slash marks the beginning of the data you want to extract. If that holds, try
grep -o '/.*' file
To fetch everything after the last space, try
grep -o '[^ ]*$' file
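Using the sample line from the question:
echo '.MHuj.5.. /var/log/messages' | grep -o '[^ ]*$'
/var/log/messages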
For more advanced pattern matching and extraction, maybe look at sed, or Awk or Perl or Python.
Your line can be described as:
^\S+\s+(\S+)$
That's assuming whitespace is your delimiter between the random text and the directory. It simply separates the whitespace from the non-whitespace and captures the second part.
Or you might want to look into the word-boundary assertion: \b.
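If you want only the captured directory from the command line, one option (a sketch, assuming GNU grep built with PCRE support) is \K, which discards everything matched before it from the reported match:
echo '.MHuj.5.. /var/log/messages' | grep -Po '^\S+\s+\K\S+$'
/var/log/messages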
I know you said to use grep, but I can't help mentioning that this is trivially done using awk:
awk '{ print $NF }' input.txt
This assumes that whitespace is the delimiter and that the path does not contain any whitespace.
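And since the goal is to land the path in a variable, wrap it in a command substitution (your_command is a placeholder for whatever produces the output):
dir=$(your_command | awk '{ print $NF }')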
I am trying to clean up a legacy database by dropping all procedures that are not used by the application. Using grep, I have been able to determine that a single procedure does not occur in the source code. Is there a way to do this for all of the procedures at once?
UPDATE: While using -E "proc1|proc2" produces an output of all lines in all files which match either pattern, this is not very useful. The legacy database has 2000+ procedures.
I tried to use the -o option thinking that I could use its output as the pattern for an inverse search on the original pattern. However, I found that there is no output when you use the -o option with more than one pattern.
Any other ideas?
UPDATE: After further experimenting, I found that it is the combination of the -i and -o options that prevents the output. Unfortunately, I need a case-insensitive search in this context.
Feed the list of stored procedures to egrep, separated by "|".
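For example, assuming the names sit one per line in a hypothetical procs.txt and the sources in source.sql:
egrep -i "$(paste -s -d '|' procs.txt)" source.sql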
or:
for stored_proc in $stored_procs
do
    grep "$stored_proc" "$source_file"
done
I've had to do this in the past as well. Don't forget about any procs that may be called from other procs.
If you are using SQL Server you can use this:
SELECT name,
text
FROM sysobjects A
JOIN syscomments B
ON A.id = B.id
WHERE xtype = 'P'
AND text LIKE '%< sproc name >%'
I get output under the circumstances described in your edit:
$ echo "aaaproc1bbb" | grep -Eo 'proc1|proc2'
proc1
$ echo $?
0
$ echo "aaabbb" | grep -Eo 'proc1|proc2'
$ echo $?
1
The exit code shows if there was no match.
You might also find these options to grep useful (-L may be specific to GNU grep):
-c, --count
Suppress normal output; instead print a count of matching lines
for each input file. With the -v, --invert-match option (see
below), count non-matching lines. (-c is specified by POSIX.)
-L, --files-without-match
Suppress normal output; instead print the name of each input
file from which no output would normally have been printed. The
scanning will stop on the first match.
-l, --files-with-matches
Suppress normal output; instead print the name of each input
file from which output would normally have been printed. The
scanning will stop on the first match. (-l is specified by
POSIX.)
-q, --quiet, --silent
Quiet; do not write anything to standard output. Exit
immediately with zero status if any match is found, even if an
error was detected. Also see the -s or --no-messages option.
(-q is specified by POSIX.)
Sorry for quoting the man page at you, but sometimes it helps to screen things a bit.
Edit:
For a list of filenames that do not contain any of the procedures (case insensitive):
grep -EiL 'proc1|proc2' *
For a list of filenames that contain any of the procedures (case insensitive):
grep -Eil 'proc1|proc2' *
To list the files and show the match (case insensitive):
grep -Eio 'proc1|proc2' *
Start with your list of procedure names. For easy re-use later, sort them and make them lowercase, like so:
tr "[:upper:]" "[:lower:]" < list_of_procedures | sort > sorted_list_o_procs
... now you have a sorted list of the procedure names. It sounds like you're already using GNU grep, so you've got the -o option.
fgrep -h -o -i -f sorted_list_o_procs source1 source2 ... > list_of_used_procs
Note the use of fgrep: these aren't regexps, really, so why treat them as such. The added -h suppresses the file-name prefix grep would otherwise attach to each match when searching several sources, which would pollute the list. Hopefully you will also find that this magically corrects your output issues ;). Now you have an ugly list of the used procedures. Let's clean them up as we did the original list above.
tr "[:upper:]" "[:lower:]" < list_of_used_procs | sort -u > short_list
Now you have a short list of the used procedures. Let's find the ones in the original list that aren't in the short list.
fgrep -v -x -f short_list sorted_list_o_procs
... and there they are (the -x makes each pattern match a whole line, so a procedure whose name is a substring of another's isn't filtered out by mistake).