grep or sed or awk + match WORD - grep

I do the following in order to get all WORD in file but not in lines that start with "//"
grep -v "//" file | grep WORD
Can I get some other elegant suggestion to find all occurrences of WORD in the file except lines that begin with //?
Remark: "//" does not necessarily exist at the beginning of the line; there could be some spaces before "//".
For example
// WORD
AA WORD
// ss WORD

grep -v "//" file | grep WORD
This will also exclude any lines with "//" after WORD, such as:
WORD // This line here
A better approach with GNU Grep would be:
grep -v '^[[:space:]]*//' file | grep 'WORD'
...which would first filter out any lines beginning with zero-or-more spaces and a comment string.
Trying to put these two conditions into a single regular expression is probably not more elegant.

awk '!/^[ \t]*\/\// && /WORD/{m=gsub("WORD","");total+=m}END{print total}' file

Related

extract the adjacent character of selected letter

I have this text file:
# cat letter.txt
this
is
just
a
test
to
check
if
grep
works
The letter "e" appear in 3 words.
# grep e letter.txt
test
check
grep
Is there any way to return the letter printed on left of the selected character?
expected.txt
t
h
r
With shown samples in awk, could you please try following.
awk '/e/{print substr($0,index($0,"e")-1,1)}' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
/e/{ ##Looking if current line has e in it then do following.
print substr($0,index($0,"e")-1,1)
##Printing sub string from starting value of index e-1 and print 1 character from there.
}
' Input_file ##Mentioning Input_file name here.
You can use positive lookahead to match a character that is followed by an e, without making the e part of the match.
cat letter.txt | grep -oP '.(?=e)'
With sed:
sed -nE 's/.*(.)e.*/\1/p' letter.txt
Assuming you have this input file:
cat file
this
is
just
a
test
to
check
if
grep
works
egg
element
You may use this grep + sed solution to find letter or empty string before e:
grep -oE '(^|.)e' file | sed 's/.$//'
t
h
r
l
m
Or alternatively this single awk command should also work:
awk -F 'e' 'NF > 1 {
for (i=1; i<NF; i++) print substr($i, length($i), 1)
}' file
This might work for you (GNU sed):
sed -nE '/(.)e/{s//\n\1\n/;s/^[^\n]*\n//;P;D}' file
Turn off implicit printing and enable extended regexp -nE.
Focus only on lines that meet the requirements i.e. contain a character before e.
Surround the required character by newlines.
Remove any characters before and including the first newline.
Print the first line (up to the second newline).
Delete the first line (including the newline).
Repeat.
N.B. The solution will print each such character on a separate line.
To print all such characters on their own line, use:
sed -nE '/(.e)/{s//\n\1/g;s/^/e/;s/e[^\n]*\n?//g;s/\B/ /g;p}' file
N.B. Remove the s/\B /g if space separation is not needed.
With GNU awk you can use empty string as FS to split the input as individual characters:
awk -v FS= '/[e]/ {for(i=2;i<=NF;i++) if ($i=="e") print $(i-1)}' file
t
h
r
Excluding "e" at the beginning in the for loop.
edited
empty string if e is the first character in the word.
For example, this input:
cat file2
grep
erroneously
egg
Wednesday
effectively
awk -v FS= '/^[e]/ {print ""} /[e]/ {for(i=2;i<=NF;i++) if ($i=="e") print $(i-1)}' file2
r
n
W
n
f
v

Grep with as least one matching value and at least one not matching

I have some files, and I want grep to return the lines, where I have at least one string Position:"Engineer" AND at least one string which does have Position not equal to "Engineer"
So in the below file should return only first line:
Position:"Engineer" Name:"Jes" Position:"Accountant" Name:"Criss"
Position:"Engineer" Name:"Eva" Position:"Engineer" Name:"Adam"
I could write something like
grep 'Position:"Engineer"' filename | grep 'Position:"Accountant"'
And this works fine (I get only first line), but the thing is I don't know what are all of the possible values in Position, so the grep needs to be generic something like
grep 'Position:"Engineer"' filename | grep -v 'Position:"Engineer"'
But this doesn't return anything (as both grep contradict each other)
Do you have any idea how this can be done?
This line works :
grep "^Position:\"Engineer\"" filename | grep -v " Position:\"Engineer\""
The first expresion with "$" catch only the Position at the begining of line, the second expression with " " space remove the second "Postion" expression.
You can avoid the pipe and additional subshell by using awk if that is allowed, e.g.
awk '
$1~/Engineer/ {if ($3~/Engineer/) next; print}
$3~/Engineer/ {if ($1~/Engineer/) next; print}
' file
Above just checks if the first field contains Engineer and if so checks if field 3 also contains Engineer, and if so skips the record, if not prints it. The second rule, just swaps the order of the tests. The result of the tests is that Engineer can only appear in one of the fields (either first or third, but not both)
Example Use/Output
With your sample input in file, you would have:
$ awk '
$1~/Engineer/ {if ($3~/Engineer/) next; print}
$3~/Engineer/ {if ($1~/Engineer/) next; print}
' file
Position:"Engineer" Name:"Jes" Position:"Accountant" Name:"Criss"
Use negative lookahead to exclude a pattern after match.
grep 'Position:"Engineer"' | grep -P 'Position:"(?!Engineer)'
With two greps in a pipe:
grep -F 'Position:"Engineer"' file | grep -Ev '(Position:"[^"]*").*\1'
or, perhaps more robustly
grep -F 'Position:"Engineer"' file | grep -v 'Position:"Engineer".*Position:"Engineer"'
In general case, if you want to print the lines with unique Position fields,
grep -Ev '(Position:"[^"]*").*\1' file
should do the job, assuming all the lines have the format specified. This will work also when there are more than two Position fields in the line.

Grep match only before ":"

Hello How can I grep only match before : mark?
If I run grep test1 file, it shows all three lines.
test1:x:29688:test1,test2
test2:x:22611:test1
test3:x:25163:test1,test3
But I would like to get an output test1:x:29688:test1,test2
I would appreciate any advice.
If the desired lines always start with test1 then you can do:
grep '^test1' file
If it's always followed by : but not the other (potential) matches then you can include it as part of the pattern:
grep 'test1:' file
As your data is in row, columns delimited by a character, you may consider awk:
awk -F: '$1 == "test1"' file
I think that you just need to add “:” after “test1”, see an example:
grep “test1:” file

is there a sophisticated way to grep this file

I have one file. Written in BNF it could be
<line>:== ((<ISBN10>|<ISBN13>)([a-Z/0-9]*)) {1,4})
For example
123456789X/abscd/1234567890123/djfkldsfjj
How can I grep the ISBN10 or ISBN13 ONLY one per line even when in the line are more ISBNs. If there are more ISBNs in the line it should take only the first in line.
When I grep that way
grep -Po "[0-9]{9,13}X{0,1}" file
then I get more lines than the file originally has. (As there could be max 4 ISBNs in line)
I would also need the linecount of file should be the linecount of the grepresult.
Any advices?
Well, assuming the other answer offered isn't correct in assuming that the 'first' ISBN isn't at the start of line, you could always try in perl.
#!/usr/bin/perl
use strict;
use warnings;
while (<>) {
chomp;
my ( $first_isbn, #rest ) = m/(\d{9,13}X{0,1})/g;
print $., ":", $first_isbn, "\n" if $first_isbn;
}
$. is the line number in perl, and so we print that and the match if there's a match. <> says read and iterate either filenames or STDIN much like grep does. So you could invoke this in a similar way to grep:
perl myscript.pl <filename>
Or:
cat <filename> | ./myscript.pl
This would one-liner-ify as:
perl -lne 'my ( $first_isbn ) = m/(\d{9,13}X{0,1})/g; print $., ":", $first_isbn, "\n" if $first_isbn;'
One trivial solution is to include the beginning of the line in your regex:
grep -Po "^[0-9]{9,13}X{0,1}" file
This ensures that matches after the first do not satisfy the regex. It does seem from your BNF that the ISBNs, if present, are guaranteed to be the first characters of the line.
Another way is to use sed:
sed -n "s/\([0-9]\{9,13\}X\).*/\1/p" file
This matches your pattern along with the rest of the line, but only prints your pattern. You could then use another utility to add line numbers. E.g. pipe your output to nl -nrz -w9.

Grep: First word in line that begins with ? and ends with?

I'm trying to do a grep command that finds all lines in a file whos first word begins "as" and whos first word also ends with "ng"
How would I go about doing this using grep?
This should just about do it:
$ grep '^as\w*ng\b' file
Regexplanation:
^ # Matches start of the line
as # Matches literal string as
\w # Matches characters in word class
* # Quantifies \w to match either zero or more
ng # Matches literal string ng
\b # Matches word boundary
May have missed the odd corner case.
If you only want to print the words that match and not the whole lines then use the -o option:
$ grep -o '^as\w*ng\b' file
Read man grep for all information on the available options.
I am pretty sure this should work:
grep "^as[a-zA-Z]*ng\b" <filename>
hard to say without seeing samples from the actual input file.
sudo has already covered it well, but I wanted to throw out one more simple one:
grep -i '^as[^ ]*ng\b' <file>
-i to make grep case-insensitive
[^ ]* matches zero or more of any character, except a space
^ finds the 'first character in a line', so you can search for that with:
grep '^as' [file]
\w matches a word character, so \w* would match any number of word characters:
grep '^as\w*' [file]
\b means 'a boundary between a word and whitespace' which you can use to ensure that you're matching the 'ng' letters at the end of the word, instead of just somewhere in the middle:
grep '^as\w*ng\b' [file]
If you choose to omit the [file], simply pipe your files into it:
cat [file] | grep '^as\w*ng\b'
or
echo [some text here] | grep '^as\w*ng\b'
Is that what you're looking for?

Resources