Recursively grep results and pipe back - grep

I need to find some matching conditions from a file and recursively find the next conditions in previously matched files , i have something like this
input.txt
123
22
33
The files where you need to find above terms in following files, the challenge is if 123 is found in say 10 files , the 22 should be searched in these 10 files only and so on...
Example of files are like f1,f2,f3,f4.....f1200
so it is like i need to grep -w "123" f* | grep -w "123" | .....
its not possible to list them manually so any easier way?

You can solve this using awk script, i ve encountered a similar problem and this will work fine
awk '{ if(!NR){printf("grep -w %d f*|",$1)} else {printf("grep -w %d f*",$1)} }' input.txt | sh
What it Does?
it reads input.txt line by line
until it is at last record , it prints grep -w %d | (note there is a
pipe here)
which is then sent to shell for execution and results are piped back
to back
and when you reach the end the pipe is avoided

Perhaps taking a meta-programming viewpoint would help. Have grep output a series of grep commands. Or write a little PERL program. Maybe Ruby, if the mood suits.

You can use grep -lw to write the list of file names that matched (note that it will stop after finding the first match).
You capture the list of file names and use that for the next iteration in a loop.

Related

Grep words with exact two vowels

I have the following issue, I need to retrieve all words that contains exactly 2 vowels (in any order) from a file. The file only contains one word per line.
My current workaround is:
Grep1: Retrieve words such as earth, over, under, one...
grep -i "^[aeiou][^aeiou]*[aeiou][^aeiou]*$" genesis.words > A.txt
and
Grep2: Retrieve words such as formless, deep, said...
grep -i "^[^aeiou][^aeiou]*[aeiou][^aeiou]*[aeiou][^aeiou]*$" genesis.words > B.txt
the above solution works but when I concatenate both regexs into a single regex then return nothing!
Mother of Grep1 & Grep2: should retrieve everything!
grep -i "^[aeiou][^aeiou]*[aeiou][^aeiou]*$|^[^aeiou][^aeiou]*[aeiou][^aeiou]*[aeiou][^aeiou]*$" genesis.words
I think issue is around my implementation of ^$ in expression but have tried diff versions with no sucess!
Any help will be highly appreciated!
OS is AIX 6100-09-04-1441
You were close. This should work:
grep -i "^[^aeiou]*[aeiou][^aeiou]*[aeiou][^aeiou]*$" genesis.words > A.txt
So it should find all eight possibilities (two vowels identify three nonvowel sequence, each possibly empty; 2^3 is 8):
[ ]I[ ]o[ ]
[ ]e[ ]a[r]
[ ]e[r]a[ ]
[ ]e[l]a[n]
[T]e[ ]a[ ]
[D]e[ ]a[r]
[D]e[w]a[r]
[D]a[w]a[ ]
[H]a[w]a[y]
As for concatenation, | needs escaping. You can use a single anchoring:
^(regexp1\|regexp2)$
Since the * can match 0 times or more you should be able to start the string with [^aeiou]*: try
"^[^aeiou]*[aeiou][^aeiou]*[aeiou][^aeiou]*$"
As for fixing your regex, I think you need to escape the bar as \|, so
grep -i "^[aeiou][^aeiou]*[aeiou][^aeiou]*$\|^[^aeiou][^aeiou]*[aeiou][^aeiou]*[aeiou][^aeiou]*$" genesis.words
If you don't mind Perl, you could use this:
perl -lne '$m=$_; tr/[aeiou]//cd; print $m if length()==2;' /usr/share/dict/words
That says... "save the current line (word) in $m. Delete everything that is not a vowel. Print the original word if there are two things (i.e vowels) left."
Note that I am using the system dictionary as input for my tests.
You could do pretty much the same thing in awk.
If you're able to use an alternative to grep tr with wc works well:
words=/path/to/words.txt
while read -e word ; do
v=$(echo $word | tr -cd 'aeiou' | wc -c)
[[ ! $v -eq "2" ]] || echo $word >> output.txt
done < $words
This reads the original file line by line, counts the vowels & returns results with only 2 to output.txt.

duplicate grep output when comparing two files

I have literally been at this for 5 hours, I have busybox on my device, and I unfortunately do not have -X in grep to make my life easier.
edit;
I have two list both of them have mac addresses, essentially I am just wanting to achieve offline mac address lookup so I don't have to keep looking it up online
list.txt has vendor mac prefix of course this isn't the complete list but just for an example
00:13:46
00:15:E9
00:17:9A
00:19:5B
00:1B:11
00:1C:F0
scan will have list of different mac addresses unknown to which vendor they go to. Which will be full length mac addresses. when ever there is a match I want the line in scan to be output.
Pretty much it does that, but it outputs everything from the scan file, and then it will output matching one at the end, and causing duplicate. I tried sort -u, but it has no effect its as if there is two different output from two different methods, the reason why I say that is because it will instantly output scan file that has everything in it, and couple seconds later it will output the matching one.
From searching I came across this
#!/bin/bash
while read line; do
grep -F 'list' 'scan'
done < list.txt
which displays the duplicate result when/if found, the output is pretty much echoing my scan file then displaying the matched pattern, this creating duplicate
This is frustrating me that I have not found a solution after click on all the links in google up to page 9.
Please someone help me.
I don't know if the Busybox sed supports this out of the box, but it should be easy to do in Awk or Perl instead then.
Create a sed script to print lines from file2 which are covered by a prefix in file1 by transforming each line in file1 into a sed command to print a match for that regular expression:
sed 's%.*%/&/p%' file1 | sed -n -f - file2
The same in Awk:
awk 'NR==FNR { a[++i]="^" $0; next }
{ for (j=1; j<=i; ++j) if ($0 ~ a[j]) print }' file1 file2
Ok guys I did a nested for loop (probably very in efficient) but I got it working printing the matching mac addresses using this
#!/usr/bin/bash
for scanlist in `cat scan | cut -d: -f1,2,3`
do
for listt in `cat list`
do
if [[ $scanlist == $listt ]]; then
grep $scanlist scan
fi
done
done
if anyone can make this more elegant but it works for me for now. I think the problem I had was one list contained just 00:11:22 while my other list contained 00:11:22:33:44:55 that is why I cut it on my scanlist to make same length as my other list. So this only output the matches instead of doing duplicate output.

grep from beginning of found word to end of word

I am trying to grep the output of a command that outputs unknown text and a directory per line. Below is an example of what I mean:
.MHuj.5.. /var/log/messages
The text and directory may be different from time to time or system to system. All I want to do though is be able to grep the directory out and send it to a variable.
I have looked around but cannot figure out how to grep to the end of a word. I know I can start the search phrase looking for a "/", but I don't know how to tell grep to stop at the end of the word, or if it will consider the next "/" a new word or not. The directories listed could change, so I can't assume the same amount of directories will be listed each time. In some cases, there will be multiple lines listed and each will have a directory list in it's output. Thanks for any help you can provide!
If your directory paths does not have spaces then you can do:
$ echo '.MHuj.5.. /var/log/messages' | awk '{print $NF}'
/var/log/messages
It's not clear from a single example whether we can generalize that e.g. the first occurrence of a slash marks the beginning of the data you want to extract. If that holds, try
grep -o '/.*' file
To fetch everything after the last space, try
grep -o '[^ ]*$' file
For more advanced pattern matching and extraction, maybe look at sed, or Awk or Perl or Python.
Your line can be described as:
^\S+\s+(\S+)$
That's assuming whitespace is your delimiter between the random text and the directory. It simply separates the whitespace from the non-whitespace and captures the second part.
Or you might want to look into the word boundary character class: \b.
I know you said to use grep, but I can't help to mention that this is trivially done using awk:
awk '{ print $NF }' input.txt
This is assuming that a whitespace is the delimiter and that the path does not contain any whitespaces.

How to filter using grep on a selected word

grep (GNU grep) 2.14
Hello,
I have a log file that I want to filter on a selected word. However, it tends to filter on many for example.
tail -f gateway-* | grep "P_SIP:N_iptB1T1"
This will also find words like this:
"P_SIP:N_iptB1T10"
"P_SIP:N_iptB1T11"
"P_SIP:N_iptB1T12"
etc
However, I don't want to display anything after the 1. grep is picking up 11, 12, 13, etc.
Many thanks for any suggestions,
You can restrict the word to end at 1:
tail -f gateway-* | grep "P_SIP:N_iptB1T1\>"
This will work assuming that you have a matching case which is only "P_SIP:N_iptB1T1".
But if you want to extract from P_SIP:N_iptB1T1x, and display only once, then you need to restrict to show only first match.
grep -o "P_SIP:N_iptB1T1"
-o, --only-matching show only the part of a line matching PATTERN
More info
At least two approaches can be tried:
grep -w pattern matches for full words. Seems to work for this case too, even though the pattern has punctuation.
grep pattern -m 1 to restrict the output to first match. (Also doable with grep xxx | head -1)
If the lines contains the quotes as in your example, just use the -E option in grep and match the closing quote with \". For example:
grep -E "P_SIP:N_iptB1T1\"" file
If these quotes aren't in the text file, and there's blank spaces or endlines after the word, you can match these too:
# The word is followed by one or more blanks
grep -E "P_SIP:N_iptB1T1\s+" file
# Match lines ending with the interesting word
grep -E "P_SIP:N_iptB1T1$" file

grep for a string which has a specific number in the end

I want to grep for the string THREAD: 2. It has a space in between. Not able to figure out how.
I tried with grep "THREAD:[ \2]", but its not working
Please let me know.
Try grep "THREAD: 2" <filename>? You just want a literal '2', right?
If you are using GNU grep you could try using the alias egrep or grep -e with 'THREAD: 2$'
You might have to use '^.*THREAD: 2$'
grep reports back the entire line that has matched your pattern. If you wish to look at lines that contains THREAD: 2 then the following should work -
grep "THREAD: 2" filename
However, if you wish to fetch lines that could contain THREAD: and any number then you can use a character class. So in that case the answer would be -
grep "THREAD: [0-9]" filename
You can add + after the character class which means one or more numbers so that you can match numbers like 1,2,3 or 11,12,13 etc.
If you only want to fetch THREAD: 2 from your line then you will have to use an option of grep which is -o. It means show me only my pattern from the file not the entire line.
grep -o "THREAD: 2" filename
You can look up man page for grep and play around with all the options.

Resources