Double/half inverted egrep - grep

I'd like to show all lines except those containing foo, unless they also contain bar. Logically !(foo and (!bar)) === (!foo) or bar, so I can use two separate expressions. Can I do this sort of match with a single grep or egrep? -v doesn't work, since it negates both expressions, and I probably can't use Perl regex.
The following works, but it would be much less work to convert the code if it could be done in egrep:
$ echo '
foo
bar
moofoo
foobar
barbar' | grep -Pv '^((?!bar).)*foo((?!bar).)*$'
bar
foobar
barbar
The issue at hand is speed (looking for patterns in gigabytes of data).

If using awk is fine, the following gives the desired output:
awk '{
  if ($0 ~ /foo/) {
    if ($0 ~ /bar/) {
      print $0
    }
  } else {
    print $0
  }
}' FileContainingText.txt
Since awk works line by line and no pipes are involved, this should be fast.
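Since the condition reduces to (!foo) or bar, the if/else above can also be condensed to a single awk pattern expression with the same logic:

```shell
# print a line unless it contains foo without also containing bar
printf '%s\n' foo bar moofoo foobar barbar | awk '!/foo/ || /bar/'
# prints: bar, foobar, barbar
```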


Grep with at least one matching value and at least one not matching

I have some files, and I want grep to return the lines that contain at least one string Position:"Engineer" AND at least one Position string whose value is not "Engineer".
So for the file below it should return only the first line:
Position:"Engineer" Name:"Jes" Position:"Accountant" Name:"Criss"
Position:"Engineer" Name:"Eva" Position:"Engineer" Name:"Adam"
I could write something like
grep 'Position:"Engineer"' filename | grep 'Position:"Accountant"'
And this works fine (I get only the first line), but the problem is that I don't know all of the possible values of Position, so the grep needs to be generic, something like
grep 'Position:"Engineer"' filename | grep -v 'Position:"Engineer"'
But this returns nothing (the two greps contradict each other).
Do you have any idea how this can be done?
This line works:
grep "^Position:\"Engineer\"" filename | grep -v " Position:\"Engineer\""
The first expression is anchored with "^", so it only matches a Position:"Engineer" field at the beginning of the line; the second expression's leading space excludes lines where a later "Position" field is also "Engineer".
You can avoid the pipe and the extra grep process by using awk, if that is allowed, e.g.
awk '
$1~/Engineer/ {if ($3~/Engineer/) next; print}
$3~/Engineer/ {if ($1~/Engineer/) next; print}
' file
The above checks whether the first field contains Engineer and, if so, whether field 3 also contains Engineer; if both match, it skips the record, otherwise it prints it. The second rule just swaps the order of the tests. The net effect is that a line is printed only when Engineer appears in exactly one of the two fields (the first or the third, but not both).
Example Use/Output
With your sample input in file, you would have:
$ awk '
$1~/Engineer/ {if ($3~/Engineer/) next; print}
$3~/Engineer/ {if ($1~/Engineer/) next; print}
' file
Position:"Engineer" Name:"Jes" Position:"Accountant" Name:"Criss"
Use a negative lookahead to exclude the pattern after the match:
grep 'Position:"Engineer"' file | grep -P 'Position:"(?!Engineer)'
With two greps in a pipe:
grep -F 'Position:"Engineer"' file | grep -Ev '(Position:"[^"]*").*\1'
or, perhaps more robustly
grep -F 'Position:"Engineer"' file | grep -v 'Position:"Engineer".*Position:"Engineer"'
In the general case, if you want to print the lines with unique Position fields,
grep -Ev '(Position:"[^"]*").*\1' file
should do the job, assuming all the lines have the format specified. This also works when there are more than two Position fields on the line.
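To see the backreference version in action on the sample lines (note that \1 inside an -E pattern is a GNU grep extension, not standard ERE):

```shell
printf '%s\n' \
  'Position:"Engineer" Name:"Jes" Position:"Accountant" Name:"Criss"' \
  'Position:"Engineer" Name:"Eva" Position:"Engineer" Name:"Adam"' |
  grep -Ev '(Position:"[^"]*").*\1'
# only the Jes/Criss line survives; the Eva/Adam line repeats Position:"Engineer"
```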

output lines without multiple patterns

I used cat file | grep -v "pat1" | grep -v "pat2" | ... | grep -v "patN" to drop lines containing any of a group of patterns. It looks awkward. Is there a better (more concise) way to do that?
If you are OK with awk, you could try the following. It creates a variable named valIgnore that holds all the values to be ignored, comma-separated, so you can supply any number of keywords in one place. You could also put the comma-separated list into a shell variable and pass that to the awk program. Since no samples were given this is untested, but it should work.
awk -v valIgnore="pat1,pat2,pat3,pat4,pat5,pat6,pat7,pat8,pat9" '
BEGIN{
  num=split(valIgnore,arr,",")
  for(i=1;i<=num;i++){ ignoreVals[arr[i]] }
}
{
  for(key in ignoreVals){
    if(index($0,key)){ next }
  }
}
1' Input_file
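If the patterns are independent regexes (or plain strings), a single grep can also replace the whole pipeline: pass each pattern with -e, or put them one per line in a file and use -f (adding -F for fixed-string matching):

```shell
# one grep process instead of N piped ones
printf '%s\n' 'keep me' 'drop pat1 here' 'drop pat2 too' 'also keep' |
  grep -v -e 'pat1' -e 'pat2'
# prints: keep me, also keep
# equivalently, with one pattern per line in a file: grep -vFf patterns.txt
```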

How to match several patterns, but each only once

I know that if I have a file of patterns I can use
grep -f pat_file search_file
to search the file normally. How would you approach performing this task so that the command looks for each pattern only once?
I'm looking for efficiency, so it might be that simply writing a python program is the most efficient way to do it, but I bet there's something out there.
I would do this in awk:
FNR == NR { pattern[NR] = $0; next }
{
  for (i in pattern) {
    if ($0 ~ pattern[i]) {
      print
      delete pattern[i]
      break
    }
  }
}
To be called as follows:
awk -f script.awk patterns infile
where patterns contains your patterns and infile is the file you want to search.
The first rule reads the patterns into an array; the second (only executed for files after the first one) loops over the patterns and, on a match, prints the line, deletes the matched pattern from the array, and skips the rest of the patterns.
For an example input of
line with pattern1
another line with pattern1
line with pattern2
pattern1 again
pattern3 now
and pattern2
and a pattern file
pattern1
pattern2
pattern3
the output is
$ awk -f script.awk patterns infile
line with pattern1
line with pattern2
pattern3 now
To optimize, you could add a check after the delete statement to see if there are any patterns left and exit if not.
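That early exit might be sketched like this, using a counter n rather than taking the length of the array (length(array) is not available in every awk):

```shell
# exit as soon as every pattern has been matched once
awk '
FNR == NR { pattern[++n] = $0; next }
{
  for (i in pattern)
    if ($0 ~ pattern[i]) { print; delete pattern[i]; n--; break }
  if (n == 0) exit
}
' patterns infile
```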
This MAY be what you're looking for:
awk '
NR==FNR { regexps[$0]; next }
{
found = 0
for (regexp in regexps) {
if ($0 ~ regexp) {
found = 1
delete regexps[regexp]
}
}
}
found
' pat_file search_file
but since you haven't provided any testable sample input and expected output it's just an untested guess.
By the way - never use the word "pattern" to describe what type of matching you want as it's ambiguous, use "string" or "regexp", whichever you really mean.

How to extract certain part of line that's between quotes

For example if I have file.txt with the following
object = {
'name' : 'namestring',
'type' : 'type',
'real' : 'yes',
'version' : '2.0',
}
and I want to extract just the version so the output is 2.0 how would I go about doing this?
I would suggest that grep is probably the wrong tool for this. Nevertheless, it is possible, using grep twice.
grep 'version' input.txt | grep -Eo '[0-9.]+'
The first grep isolates the line you're interested in, and the second one prints only the characters of the line that match the regex, in this case numbers and periods. For your input data, this should work.
However, this solution is weak in a few areas. It doesn't handle cases where multiple version lines exist, and it depends heavily on the structure of the file (I suspect your file would still be syntactically valid if all the lines were joined into one long line). It also uses a pipe; in general, if something can be done both with and without a pipe, prefer the version without.
One compromise might be to use awk, assuming you're always going to have things split by line:
awk '/version/ { gsub(/[^0-9.]/,"",$NF); print $NF; }' input.txt
This is pretty much identical in functionality to the dual grep solution above.
If you wanted to process multiple variables within that section of file, you might do something like the following with awk:
BEGIN {
FS=":";
}
/{/ {
inside=1;
next;
}
/}/ {
inside=0;
print a["version"];
# do things with other variables too
#for(i in a) { printf("i=%s / a=%s\n", i, a[i]); } # for example
delete a;
}
inside {
sub(/^ *'/,"",$1); sub(/' *$/,"",$1); # strip whitespace and quotes
sub(/^ *'/,"",$2); sub(/',$/,"",$2); # strip whitespace and quotes
a[$1]=$2;
}
A better solution would be to use a tool that actually understands the file format you're using.
A simple and clean solution using grep and cut
grep version file.txt | cut -d \' -f4
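sed can also do the extraction in a single process, which avoids the pipe entirely. A sketch that assumes the key/value layout shown above:

```shell
# print the quoted value of the 'version' key
sed -n "s/.*'version' *: *'\([^']*\)'.*/\1/p" file.txt
# for the sample file above this prints: 2.0
```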

Searching tabs with grep

I have a file that might contain a line like this:
A B //separated by a tab
I want to return true to the terminal if the line is found, and false if it isn't.
When I do
grep 'A' 'file.tsv'
it returns the row (not true/false), but
grep 'A \t B' "File.tsv"
or
grep 'A \\t B' "File.tsv"
or
grep 'A\tB'
or
grep 'A<TAB>B' //pressing the Tab key
doesn't return anything.
How do I search tab-separated values with grep?
How do I return a boolean value with grep?
Use a literal Tab character, not the \t escape. (You may need to press Ctrl+V first.) Also, grep is not Perl 6 (or Perl 5 with the /x modifier); spaces are significant and will be matched literally, so even if \t worked A \t B with the extra spaces around the \t would not unless the spaces were actually there in the original.
As for the return value, know that you get three different kinds of responses from a program: standard output, standard error, and the exit code. The latter is 0 for success and non-zero for some error (for most programs that do matching, 1 means not found and 2 and up mean some kind of usage error). In traditional Unix you redirect the output from grep if you only want the exit code; POSIX greps also support the -q option for this, though some very old greps lack it. Both traditional and GNU grep allow -s to suppress standard error, but there are some differences in how the two handle it; the most portable form is grep PATTERN FILE >/dev/null 2>&1.
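Putting the exit-status advice together, a boolean check typically looks like this (using printf to embed the tab, so no literal Tab key is needed; File.tsv is the file from the question):

```shell
# grep's exit status drives the if; output is discarded portably
if grep "$(printf 'A\tB')" File.tsv >/dev/null 2>&1; then
  echo true
else
  echo false
fi
```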
Two methods:
use the -P option:
grep -P 'A\tB' "File.tsv"
press Ctrl+V first, then press the Tab key to insert a literal tab:
grep 'A B' "File.tsv"
Here's a handy way to create a variable with a literal tab as its value:
TAB=`echo -e "\t"`
Then, you can use it as follows:
grep "A${TAB}B" File.tsv
This way, there's no literal tab required. Note that with this approach, you'll need to use double quotes (not single quotes) around the pattern string, otherwise the variable reference won't be replaced.
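A more portable way to build that variable is printf, since echo -e is not guaranteed to interpret \t in every shell (File.tsv again being the file from the question):

```shell
TAB=$(printf '\t')      # printf's \t escape is specified by POSIX
grep "A${TAB}B" File.tsv
```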