output lines without multiple patterns - grep

I used cat file | grep -v "pat1" | grep -v "pat2" | ... | grep -v "patN" to drop lines with any of a group of patterns. It looks awkward. Is there a better (concise) way to do that?

In case you are ok with awk you could try following. Created a variable named valIgnore which has all the values to be ignored, you can mention all values by comma separated fashion and have them in it. By doing this you can give N number of keywords in a single shot only in a variable itself. Moreover you can also create a shell variable which has all values to be ignored(in lines) make sure its comma separated and pass it to this awk program here. Since no samples are given so didn't test it but should work though.
awk -v valIgnore="pat1,pat2,pat3,pat4,pat5,pat6,pat7,pat8,pat9" '
BEGIN{
num=split(valIgnore,arr,",")
for(i=1;i<=num;i++){ ignoreVals[arr[i]] }
}
{
for(key in ignoreVals){
if(index($0,key)){ next }
}
}
1' Input_file

Related

How to remove directories in a list from two lists?

I am writing a c-shell script, where I am grepping two different directories in two strings. I wanted to remove the name of the directories which are the same. I only want the unique directory among the two by leaving out the duplicate ones. I am little confused about how to do this.
There are some common directories present in sta_views and pnr_views strings. I am grepped both of them using ls -l command as seen above stored them. Now what I want to do is "first we can loop over sta_view and check if that exist in the pnr_view list and if no , then put them in a separate list , if yes , then do not do anything" Hope this helps to understand you my question. Thank You!
Please let me know the approach to do it,
set pnr_views = `(ls -l pnr/rep/signoff_pnr/ | grep '^d' | awk '{print $9}'\n)`
set sta_views = `(ls -l sta/rep/ | grep '^d' | grep -v common | grep -v signoff.all.SAVEDESIGN | awk '{print $9}'\n)`
foreach i ($sta_view)
if [$i == $pnr_view] #just want to remove the list pnr_view from sta_view
then
echo ...
else
Here sta_views contains the directories that are present in pnr_views. How shall I remove the pnr_views directories from sta_views?
Loop over the directories in the first directory, and simply check whether the same entry exists in the second.
foreach i ( pnr/rep/signoff_pnr/*/ )
if ( -d sta/rep/"$i:h:t" ) then
switch ("$i:h:t")
case *'common'*:
case *'signoff.all.SAVEDESIGN'*:
breaksw
default:
echo sta/rep/"$i:h:t"
breaksw
endsw
endif
end
The Csh variable modifier $i:h returns the directory name part of the file name in $i and the modifier :t returns the last element from that.
Csh is extremely fickle and thus these days rather unpopular; you might have better luck if you switch to a modern Bourne-compatible shell, especially if you are only just learning. See also Csh considered harmful and the other links on https://www.shlomifish.org/open-source/anti/csh/. Looping over pairs of values in bash shows how to do something similar in Bash.

Grep with as least one matching value and at least one not matching

I have some files, and I want grep to return the lines, where I have at least one string Position:"Engineer" AND at least one string which does have Position not equal to "Engineer"
So in the below file should return only first line:
Position:"Engineer" Name:"Jes" Position:"Accountant" Name:"Criss"
Position:"Engineer" Name:"Eva" Position:"Engineer" Name:"Adam"
I could write something like
grep 'Position:"Engineer"' filename | grep 'Position:"Accountant"'
And this works fine (I get only first line), but the thing is I don't know what are all of the possible values in Position, so the grep needs to be generic something like
grep 'Position:"Engineer"' filename | grep -v 'Position:"Engineer"'
But this doesn't return anything (as both grep contradict each other)
Do you have any idea how this can be done?
This line works :
grep "^Position:\"Engineer\"" filename | grep -v " Position:\"Engineer\""
The first expresion with "$" catch only the Position at the begining of line, the second expression with " " space remove the second "Postion" expression.
You can avoid the pipe and additional subshell by using awk if that is allowed, e.g.
awk '
$1~/Engineer/ {if ($3~/Engineer/) next; print}
$3~/Engineer/ {if ($1~/Engineer/) next; print}
' file
Above just checks if the first field contains Engineer and if so checks if field 3 also contains Engineer, and if so skips the record, if not prints it. The second rule, just swaps the order of the tests. The result of the tests is that Engineer can only appear in one of the fields (either first or third, but not both)
Example Use/Output
With your sample input in file, you would have:
$ awk '
$1~/Engineer/ {if ($3~/Engineer/) next; print}
$3~/Engineer/ {if ($1~/Engineer/) next; print}
' file
Position:"Engineer" Name:"Jes" Position:"Accountant" Name:"Criss"
Use negative lookahead to exclude a pattern after match.
grep 'Position:"Engineer"' | grep -P 'Position:"(?!Engineer)'
With two greps in a pipe:
grep -F 'Position:"Engineer"' file | grep -Ev '(Position:"[^"]*").*\1'
or, perhaps more robustly
grep -F 'Position:"Engineer"' file | grep -v 'Position:"Engineer".*Position:"Engineer"'
In general case, if you want to print the lines with unique Position fields,
grep -Ev '(Position:"[^"]*").*\1' file
should do the job, assuming all the lines have the format specified. This will work also when there are more than two Position fields in the line.

How to extract certain part of line that's between quotes

For example if I have file.txt with the following
object = {
'name' : 'namestring',
'type' : 'type',
'real' : 'yes',
'version' : '2.0',
}
and I want to extract just the version so the output is 2.0 how would I go about doing this?
I would suggest that grep is probably the wrong tool for this. Nevertheless, it is possible, using grep twice.
grep 'version' input.txt | grep -Eo '[0-9.]+'
The first grep isolates the line you're interested in, and the second one prints only the characters of the line that match the regex, in this case numbers and periods. For your input data, this should work.
However, this solution is weak in a few areas. It doesn't handle cases where multiple version lines exist, it's hugely dependent on the structure of the file (i.e. I suspect your file would be syntactically valid if all the lines were joined into a single long line). It also uses a pipe, and in general, if there's a way to achieve something with a pipe, and a way without a pipe, you choose the latter.
One compromise might be to use awk, assuming you're always going to have things split by line:
awk '/version/ { gsub(/[^0-9.]/,"",$NF); print $NF; }' input.txt
This is pretty much identical in functionality to the dual grep solution above.
If you wanted to process multiple variables within that section of file, you might do something like the following with awk:
BEGIN {
FS=":";
}
/{/ {
inside=1;
next;
}
/}/ {
inside=0;
print a["version"];
# do things with other variables too
#for(i in a) { printf("i=%s / a=%s\n", i, a[i]); } # for example
delete a;
}
inside {
sub(/^ *'/,"",$1); sub(/' *$/,"",$1); # strip whitespace and quotes
sub(/^ *'/,"",$2); sub(/',$/,"",$2); # strip whitespace and quotes
a[$1]=$2;
}
A better solution would be to use a tool that actually understands the file format you're using.
A simple and clean solution using grep and cut
grep version file.txt | cut -d \' -f4

Recursively grep results and pipe back

I need to find some matching conditions from a file and recursively find the next conditions in previously matched files , i have something like this
input.txt
123
22
33
The files where you need to find above terms in following files, the challenge is if 123 is found in say 10 files , the 22 should be searched in these 10 files only and so on...
Example of files are like f1,f2,f3,f4.....f1200
so it is like i need to grep -w "123" f* | grep -w "123" | .....
its not possible to list them manually so any easier way?
You can solve this using awk script, i ve encountered a similar problem and this will work fine
awk '{ if(!NR){printf("grep -w %d f*|",$1)} else {printf("grep -w %d f*",$1)} }' input.txt | sh
What it Does?
it reads input.txt line by line
until it is at last record , it prints grep -w %d | (note there is a
pipe here)
which is then sent to shell for execution and results are piped back
to back
and when you reach the end the pipe is avoided
Perhaps taking a meta-programming viewpoint would help. Have grep output a series of grep commands. Or write a little PERL program. Maybe Ruby, if the mood suits.
You can use grep -lw to write the list of file names that matched (note that it will stop after finding the first match).
You capture the list of file names and use that for the next iteration in a loop.

grep: one pattern works but not the other

I have a teb-delimited file that has gene names in one column and expression values for these genes in the other. I want to delete certain genes from this file using grep. So, this:
"42261" "SNHG7" "20.2678"
"42262" "SNHG8" "25.3981"
"42263" "SNHG9" "0.488534"
"42264" "SNIP1" "7.35454"
"42265" "SNN" "2.05365"
"42266" "snoMBII-202" "0"
"42267" "snoMBII-202" "0"
"42268" "snoMe28S-Am2634" "0"
"42269" "snoMe28S-Am2634" "0"
"42270" "snoR26" "0"
"42271" "SNORA1" "0"
"42272" "SNORA1" "0"
becomes this:
"42261" "SNHG7" "20.2678"
"42262" "SNHG8" "25.3981"
"42263" "SNHG9" "0.488534"
"42264" "SNIP1" "7.35454"
"42265" "SNN" "2.05365"
I've used the following command that i've put together with my limited terminal knowledge:
grep -iv sno* <input.text> | grep -iv rp* | grep -iv U6* | grep -iv 7SK* > <output.txt>
So with this command, my output file lacks genes that start with sno, u6 and 7sk but somehow grep has deleted all the genes that has "r" in them instead of the ones that start with "rp". I'm very confused about this. Any ideas why sno* works but rp* not?
Thanks!
The grep command uses regular expressions, not globbing patterns.
The pattern rp* means "'r' followed by zero or more 'p'". What you really want is rp.*, or even better, "rp.* (or even just "rp, there's no point in trying to grep for anything after the "rp" after all). Likewise, sno* means "'sn' followed by zero or more 'o'". Again, you'd want sno.* or "sno.* (or even just "sno).
Although this doesn't directly answer your question, there is one thing in your sample command line that you may want to be careful with: Whenever you use a special shell metacharacter (like "*"), you need to escape or quote it. So your command line should look more like:
grep -iv 'sno*' <input.text> | grep -iv 'rp*' | grep -iv 'U6*' | grep -iv '7SK*' > <output.txt>
Often, shells are smart, and if no files match the glob, they will use the text as-is (so if you enter "foo*" but there are no filenames starting with "foo", then the string "foo*" will be passed to the command).
grep -iEv "sno|rp|U6|7SK" yourInput
test:
kent$ cat b
"42261" "SNHG7" "20.2678"
"42262" "SNHG8" "25.3981"
"42263" "SNHG9" "0.488534"
"42264" "SNIP1" "7.35454"
"42265" "SNN" "2.05365"
"42266" "snoMBII-202" "0"
"42267" "snoMBII-202" "0"
"42268" "snoMe28S-Am2634" "0"
"42269" "snoMe28S-Am2634" "0"
"42270" "snoR26" "0"
"42271" "SNORA1" "0"
"42272" "SNORA1" "0"
kent$ grep -iEv "sno|rp|U6|7SK" b
"42261" "SNHG7" "20.2678"
"42262" "SNHG8" "25.3981"
"42263" "SNHG9" "0.488534"
"42264" "SNIP1" "7.35454"
"42265" "SNN" "2.05365"

Resources