Grep words with exact two vowels

Grep words with exact two vowels - grep

I have the following issue, I need to retrieve all words that contains exactly 2 vowels (in any order) from a file. The file only contains one word per line.
My current workaround is:
Grep1: Retrieve words such as earth, over, under, one...
grep -i "^[aeiou][^aeiou]*[aeiou][^aeiou]*$" genesis.words > A.txt
and
Grep2: Retrieve words such as formless, deep, said...
grep -i "^[^aeiou][^aeiou]*[aeiou][^aeiou]*[aeiou][^aeiou]*$" genesis.words > B.txt
the above solution works but when I concatenate both regexs into a single regex then return nothing!
Mother of Grep1 & Grep2: should retrieve everything!
grep -i "^[aeiou][^aeiou]*[aeiou][^aeiou]*$|^[^aeiou][^aeiou]*[aeiou][^aeiou]*[aeiou][^aeiou]*$" genesis.words
I think issue is around my implementation of ^$ in expression but have tried diff versions with no sucess!
Any help will be highly appreciated!
OS is AIX 6100-09-04-1441

You were close. This should work:
grep -i "^[^aeiou]*[aeiou][^aeiou]*[aeiou][^aeiou]*$" genesis.words > A.txt
So it should find all eight possibilities (two vowels identify three nonvowel sequence, each possibly empty; 2^3 is 8):
[ ]I[ ]o[ ]
[ ]e[ ]a[r]
[ ]e[r]a[ ]
[ ]e[l]a[n]
[T]e[ ]a[ ]
[D]e[ ]a[r]
[D]e[w]a[r]
[D]a[w]a[ ]
[H]a[w]a[y]
As for concatenation, | needs escaping. You can use a single anchoring:
^(regexp1\|regexp2)$

Since the * can match 0 times or more you should be able to start the string with [^aeiou]*: try
"^[^aeiou]*[aeiou][^aeiou]*[aeiou][^aeiou]*$"
As for fixing your regex, I think you need to escape the bar as \|, so
grep -i "^[aeiou][^aeiou]*[aeiou][^aeiou]*$\|^[^aeiou][^aeiou]*[aeiou][^aeiou]*[aeiou][^aeiou]*$" genesis.words

If you don't mind Perl, you could use this:
perl -lne '$m=$_; tr/[aeiou]//cd; print $m if length()==2;' /usr/share/dict/words
That says... "save the current line (word) in $m. Delete everything that is not a vowel. Print the original word if there are two things (i.e vowels) left."
Note that I am using the system dictionary as input for my tests.
You could do pretty much the same thing in awk.

If you're able to use an alternative to grep tr with wc works well:
words=/path/to/words.txt
while read -e word ; do
v=$(echo $word | tr -cd 'aeiou' | wc -c)
[[ ! $v -eq "2" ]] || echo $word >> output.txt
done < $words
This reads the original file line by line, counts the vowels & returns results with only 2 to output.txt.

Related

duplicate grep output when comparing two files

I have literally been at this for 5 hours, I have busybox on my device, and I unfortunately do not have -X in grep to make my life easier.
edit;
I have two list both of them have mac addresses, essentially I am just wanting to achieve offline mac address lookup so I don't have to keep looking it up online
list.txt has vendor mac prefix of course this isn't the complete list but just for an example
00:13:46
00:15:E9
00:17:9A
00:19:5B
00:1B:11
00:1C:F0
scan will have list of different mac addresses unknown to which vendor they go to. Which will be full length mac addresses. when ever there is a match I want the line in scan to be output.
Pretty much it does that, but it outputs everything from the scan file, and then it will output matching one at the end, and causing duplicate. I tried sort -u, but it has no effect its as if there is two different output from two different methods, the reason why I say that is because it will instantly output scan file that has everything in it, and couple seconds later it will output the matching one.
From searching I came across this
#!/bin/bash
while read line; do
grep -F 'list' 'scan'
done < list.txt
which displays the duplicate result when/if found, the output is pretty much echoing my scan file then displaying the matched pattern, this creating duplicate
This is frustrating me that I have not found a solution after click on all the links in google up to page 9.
Please someone help me.

I don't know if the Busybox sed supports this out of the box, but it should be easy to do in Awk or Perl instead then.
Create a sed script to print lines from file2 which are covered by a prefix in file1 by transforming each line in file1 into a sed command to print a match for that regular expression:
sed 's%.*%/&/p%' file1 | sed -n -f - file2
The same in Awk:
awk 'NR==FNR { a[++i]="^" $0; next }
{ for (j=1; j<=i; ++j) if ($0 ~ a[j]) print }' file1 file2

Ok guys I did a nested for loop (probably very in efficient) but I got it working printing the matching mac addresses using this
#!/usr/bin/bash
for scanlist in `cat scan | cut -d: -f1,2,3`
do
for listt in `cat list`
do
if [[ $scanlist == $listt ]]; then
grep $scanlist scan
fi
done
done
if anyone can make this more elegant but it works for me for now. I think the problem I had was one list contained just 00:11:22 while my other list contained 00:11:22:33:44:55 that is why I cut it on my scanlist to make same length as my other list. So this only output the matches instead of doing duplicate output.

How can I know how many times a word is in a file using grep?

I have a file and I want to know how many times does a word is inside that file.(NOTE: A row can have the same word)

You can use this command. Hope this wil help you.
grep -o yourWord file | wc -l

Use the grep -c option to count the number of occurences of a search pattern.
grep -c searchString file

awk solution:
awk '{s+=gsub(/word/,"&")}END{print s}' file
test:
kent$ cat f
word word word
word
word word word
kent$ awk '{s+=gsub(/word/,"&")}END{print s}' f
7
you may want to add word boundary if you want to match an exact word.

Yes, i know you want a grep solution, but my favorite perl with the rolex operator can't missing here... ;)
perl -0777 -nlE 'say $n=()=m/\bYourWord\b/g' filename
# ^^^^^^^^
if yoy want match the YourWord surrounded with another letters like abcYourWordXYZ, use
perl -0777 -nlE 'say $n=()=m/YourWord/g' filename

How to filter using grep on a selected word

grep (GNU grep) 2.14
Hello,
I have a log file that I want to filter on a selected word. However, it tends to filter on many for example.
tail -f gateway-* | grep "P_SIP:N_iptB1T1"
This will also find words like this:
"P_SIP:N_iptB1T10"
"P_SIP:N_iptB1T11"
"P_SIP:N_iptB1T12"
etc
However, I don't want to display anything after the 1. grep is picking up 11, 12, 13, etc.
Many thanks for any suggestions,

You can restrict the word to end at 1:
tail -f gateway-* | grep "P_SIP:N_iptB1T1\>"
This will work assuming that you have a matching case which is only "P_SIP:N_iptB1T1".
But if you want to extract from P_SIP:N_iptB1T1x, and display only once, then you need to restrict to show only first match.

grep -o "P_SIP:N_iptB1T1"
-o, --only-matching show only the part of a line matching PATTERN
More info

At least two approaches can be tried:
grep -w pattern matches for full words. Seems to work for this case too, even though the pattern has punctuation.
grep pattern -m 1 to restrict the output to first match. (Also doable with grep xxx | head -1)

If the lines contains the quotes as in your example, just use the -E option in grep and match the closing quote with \". For example:
grep -E "P_SIP:N_iptB1T1\"" file
If these quotes aren't in the text file, and there's blank spaces or endlines after the word, you can match these too:
# The word is followed by one or more blanks
grep -E "P_SIP:N_iptB1T1\s+" file
# Match lines ending with the interesting word
grep -E "P_SIP:N_iptB1T1$" file

Recursively grep results and pipe back

I need to find some matching conditions from a file and recursively find the next conditions in previously matched files , i have something like this
input.txt
123
22
33
The files where you need to find above terms in following files, the challenge is if 123 is found in say 10 files , the 22 should be searched in these 10 files only and so on...
Example of files are like f1,f2,f3,f4.....f1200
so it is like i need to grep -w "123" f* | grep -w "123" | .....
its not possible to list them manually so any easier way?

You can solve this using awk script, i ve encountered a similar problem and this will work fine
awk '{ if(!NR){printf("grep -w %d f*|",$1)} else {printf("grep -w %d f*",$1)} }' input.txt | sh
What it Does?
it reads input.txt line by line
until it is at last record , it prints grep -w %d | (note there is a
pipe here)
which is then sent to shell for execution and results are piped back
to back
and when you reach the end the pipe is avoided

Perhaps taking a meta-programming viewpoint would help. Have grep output a series of grep commands. Or write a little PERL program. Maybe Ruby, if the mood suits.

You can use grep -lw to write the list of file names that matched (note that it will stop after finding the first match).
You capture the list of file names and use that for the next iteration in a loop.

grep: one pattern works but not the other

I have a teb-delimited file that has gene names in one column and expression values for these genes in the other. I want to delete certain genes from this file using grep. So, this:
"42261" "SNHG7" "20.2678"
"42262" "SNHG8" "25.3981"
"42263" "SNHG9" "0.488534"
"42264" "SNIP1" "7.35454"
"42265" "SNN" "2.05365"
"42266" "snoMBII-202" "0"
"42267" "snoMBII-202" "0"
"42268" "snoMe28S-Am2634" "0"
"42269" "snoMe28S-Am2634" "0"
"42270" "snoR26" "0"
"42271" "SNORA1" "0"
"42272" "SNORA1" "0"
becomes this:
"42261" "SNHG7" "20.2678"
"42262" "SNHG8" "25.3981"
"42263" "SNHG9" "0.488534"
"42264" "SNIP1" "7.35454"
"42265" "SNN" "2.05365"
I've used the following command that i've put together with my limited terminal knowledge:
grep -iv sno* <input.text> | grep -iv rp* | grep -iv U6* | grep -iv 7SK* > <output.txt>
So with this command, my output file lacks genes that start with sno, u6 and 7sk but somehow grep has deleted all the genes that has "r" in them instead of the ones that start with "rp". I'm very confused about this. Any ideas why sno* works but rp* not?
Thanks!

The grep command uses regular expressions, not globbing patterns.
The pattern rp* means "'r' followed by zero or more 'p'". What you really want is rp.*, or even better, "rp.* (or even just "rp, there's no point in trying to grep for anything after the "rp" after all). Likewise, sno* means "'sn' followed by zero or more 'o'". Again, you'd want sno.* or "sno.* (or even just "sno).

Although this doesn't directly answer your question, there is one thing in your sample command line that you may want to be careful with: Whenever you use a special shell metacharacter (like "*"), you need to escape or quote it. So your command line should look more like:
grep -iv 'sno*' <input.text> | grep -iv 'rp*' | grep -iv 'U6*' | grep -iv '7SK*' > <output.txt>
Often, shells are smart, and if no files match the glob, they will use the text as-is (so if you enter "foo*" but there are no filenames starting with "foo", then the string "foo*" will be passed to the command).

grep -iEv "sno|rp|U6|7SK" yourInput
test:
kent$ cat b
"42261" "SNHG7" "20.2678"
"42262" "SNHG8" "25.3981"
"42263" "SNHG9" "0.488534"
"42264" "SNIP1" "7.35454"
"42265" "SNN" "2.05365"
"42266" "snoMBII-202" "0"
"42267" "snoMBII-202" "0"
"42268" "snoMe28S-Am2634" "0"
"42269" "snoMe28S-Am2634" "0"
"42270" "snoR26" "0"
"42271" "SNORA1" "0"
"42272" "SNORA1" "0"
kent$ grep -iEv "sno|rp|U6|7SK" b
"42261" "SNHG7" "20.2678"
"42262" "SNHG8" "25.3981"
"42263" "SNHG9" "0.488534"
"42264" "SNIP1" "7.35454"
"42265" "SNN" "2.05365"

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Grep words with exact two vowels - grep

Related

duplicate grep output when comparing two files

How can I know how many times a word is in a file using grep?

How to filter using grep on a selected word

Recursively grep results and pipe back

grep: one pattern works but not the other

Categories

Resources