Grep: First word in line that begins with ? and ends with? - grep

I'm trying to do a grep command that finds all lines in a file whos first word begins "as" and whos first word also ends with "ng"
How would I go about doing this using grep?

This should just about do it:
$ grep '^as\w*ng\b' file
Regexplanation:
^ # Matches start of the line
as # Matches literal string as
\w # Matches characters in word class
* # Quantifies \w to match either zero or more
ng # Matches literal string ng
\b # Matches word boundary
May have missed the odd corner case.
If you only want to print the words that match and not the whole lines then use the -o option:
$ grep -o '^as\w*ng\b' file
Read man grep for all information on the available options.

I am pretty sure this should work:
grep "^as[a-zA-Z]*ng\b" <filename>
hard to say without seeing samples from the actual input file.

sudo has already covered it well, but I wanted to throw out one more simple one:
grep -i '^as[^ ]*ng\b' <file>
-i to make grep case-insensitive
[^ ]* matches zero or more of any character, except a space

^ finds the 'first character in a line', so you can search for that with:
grep '^as' [file]
\w matches a word character, so \w* would match any number of word characters:
grep '^as\w*' [file]
\b means 'a boundary between a word and whitespace' which you can use to ensure that you're matching the 'ng' letters at the end of the word, instead of just somewhere in the middle:
grep '^as\w*ng\b' [file]
If you choose to omit the [file], simply pipe your files into it:
cat [file] | grep '^as\w*ng\b'
or
echo [some text here] | grep '^as\w*ng\b'
Is that what you're looking for?

Related

how to grep a word with only one single capital letter?

The txt file is :
bar
quux
kabe
Ass
sBo
CcdD
FGH
I would like to grep the words with only one capital letter in this example, but when I use "grep [A-Z]", it shows me all words with capital letters.
Could anyone find the "grep" solution here? My expected output is
Ass
sBo
grep '\<[a-z]*[A-Z][a-z]*\>' my.txt
will match lines in the ASCII text file my.txt if they contain at least one word consisting entirely of ASCII letters, exactly one of which is upper case.
You seem to have a text file with each word on its own line.
You may use
grep '^[[:lower:]]*[[:upper:]][[:lower:]]*$' file
See the grep online demo.
The ^ matches the start of string (here, line since grep operates on a line by lin basis by default), then [[:lower:]]* matches 0 or more lowercase letters, then an [[:upper:]] pattern matches any uppercase letter, and then [[:lower:]]* matches 0+ lowercase letters and $ asserts the position at the end of string.
If you need to match a whole line with exactly one uppercase letter you may use
grep '^[^[:upper:]]*[[:upper:]][^[:upper:]]*$' file
The only difference from the pattern above is the [^[:upper:]] bracket expression that matches any char but an uppercase letter. See another grep online demo.
To extract words with a single capital letter inside them you may use word boundaries, as shown in mathguy's answer. With GNU grep, you may also use
grep -o '\b[^[:upper:]]*[[:upper:]][^[:upper:]]*\b' file
grep -o '\b[[:lower:]]*[[:upper:]][[:lower:]]*\b' file
See yet another grep online demo.

Getting only grep exact matches

I am trying to grep a file for the exact occurrence of a match, but I get also longer spurious matches:
grep CAT1717O99 myfile.txt -F -w
Output:
CAT1717O99
CAT1717O99.5
I would like to output only the first exactly matching line. Is there any way to get rid of the second line?
Thanks in advance.
Arturo
This is the file 'myfile.txt':
CAT1717O99
CAT1717O99.5
This will do the work for you.
grep -Fx "CAT1717O99" textfile
-F means Fixed
-x mean exact
Use the power of Perl-compatible regular expression (PCRE) and search the matches to the given pattern:
grep -Po "\bCAT1717O99(\s|$)" myfile.txt
(\s|$) - alternative group, ensures matching substring CAT1717O99 if it's followed by whitespace or placed at the end of the line
-P option, allows regular expressions
-o option, prints only matched parts of matching lines
You'll need explicitly request spaces in order to ignore special chars.
grep -E '(^| )CAT1717O99( |$)' myFile.txt
from grep manual :
-w, --word-regexp
Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.

How to search for whole words with grep?

If I have the file foo:
read_from_buffer
read_from_buffer_and_file
write_to_buffer
some_other_function
then using
cat foo | grep 'read_from_buffer'
will list 2 lines:
read_from_buffer
read_from_buffer_and_file
But I want only exact matches... How to tell grep that different character must come than character: 0-9a-zA-Z_
Use this:
grep -w 'read_from_buffer' foo
From man grep:
-w, --word-regexp: Select only those lines containing matches that form whole words. The test is that the matching substring must
either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be
either at the end of the line or followed by a non-word constituent character. Word-constituent characters are
letters, digits, and the underscore.
or
-x, --line-regexp: Select only those matches that exactly match the whole line. (-x is specified by POSIX.)

is there a sophisticated way to grep this file

I have one file. Written in BNF it could be
<line>:== ((<ISBN10>|<ISBN13>)([a-Z/0-9]*)) {1,4})
For example
123456789X/abscd/1234567890123/djfkldsfjj
How can I grep the ISBN10 or ISBN13 ONLY one per line even when in the line are more ISBNs. If there are more ISBNs in the line it should take only the first in line.
When I grep that way
grep -Po "[0-9]{9,13}X{0,1}" file
then I get more lines than the file originally has. (As there could be max 4 ISBNs in line)
I would also need the linecount of file should be the linecount of the grepresult.
Any advices?
Well, assuming the other answer offered isn't correct in assuming that the 'first' ISBN isn't at the start of line, you could always try in perl.
#!/usr/bin/perl
use strict;
use warnings;
while (<>) {
chomp;
my ( $first_isbn, #rest ) = m/(\d{9,13}X{0,1})/g;
print $., ":", $first_isbn, "\n" if $first_isbn;
}
$. is the line number in perl, and so we print that and the match if there's a match. <> says read and iterate either filenames or STDIN much like grep does. So you could invoke this in a similar way to grep:
perl myscript.pl <filename>
Or:
cat <filename> | ./myscript.pl
This would one-liner-ify as:
perl -lne 'my ( $first_isbn ) = m/(\d{9,13}X{0,1})/g; print $., ":", $first_isbn, "\n" if $first_isbn;'
One trivial solution is to include the beginning of the line in your regex:
grep -Po "^[0-9]{9,13}X{0,1}" file
This ensures that matches after the first do not satisfy the regex. It does seem from your BNF that the ISBNs, if present, are guaranteed to be the first characters of the line.
Another way is to use sed:
sed -n "s/\([0-9]\{9,13\}X\).*/\1/p" file
This matches your pattern along with the rest of the line, but only prints your pattern. You could then use another utility to add line numbers. E.g. pipe your output to nl -nrz -w9.

grep or sed or awk + match WORD

I do the following in order to get all WORD in file but not in lines that start with "//"
grep -v "//" file | grep WORD
Can I get some other elegant suggestion to find all occurrences of WORD in the file except lines that begin with //?
Remark: "//" does not necessarily exist at the beginning of the line; there could be some spaces before "//".
For example
// WORD
AA WORD
// ss WORD
grep -v "//" file | grep WORD
This will also exclude any lines with "//" after WORD, such as:
WORD // This line here
A better approach with GNU Grep would be:
grep -v '^[[:space:]]*//' file | grep 'WORD'
...which would first filter out any lines beginning with zero-or-more spaces and a comment string.
Trying to put these two conditions into a single regular expression is probably not more elegant.
awk '!/^[ \t]*\/\// && /WORD/{m=gsub("WORD","");total+=m}END{print total}' file

Resources