Escaping < and > with grep

I'm missing something here. If I have a file with contents:
foo
<foo>
And I do grep "\<foo\>" file_name shouldn't it match only the second line? I'm also matching the first.
I'm not very good with grep so I'm probably messing things up.

Escaping them activates their meta-character properties and turns them into word boundaries in GNU grep:
$ grep 'foo' file
foo
<foo>
foobar
$ grep '\<foo\>' file
foo
<foo>
The 2nd grep above isn't looking for the string <foo>; it's looking for the string foo NOT immediately preceded or followed by word-constituent characters.
In general it's not safe to escape characters without knowing exactly what it means to do so. Here's another example:
$ printf 'aa\na{2}b\n'
aa
a{2}b
$ printf 'aa\na{2}b\n' | grep 'a{2}'
a{2}b
$ printf 'aa\na{2}b\n' | grep 'a\{2\}'
aa
In the last command above, escaping the braces as \{..\} activates their metacharacter properties as regexp interval delimiters.
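To make the difference concrete, here is a minimal sketch that matches the literal string <foo> next to the word-boundary version. The sample input is assumed for illustration, and \< and \> are GNU extensions, so other grep implementations may behave differently.

```shell
# Sample input (assumed for illustration; mirrors the file used above)
input=$(printf 'foo\n<foo>\nfoobar\n')

# Unescaped < and > are ordinary characters: only the literal <foo> matches
literal=$(printf '%s\n' "$input" | grep '<foo>')

# -F treats the whole pattern as a fixed string, so no escaping rules apply
fixed=$(printf '%s\n' "$input" | grep -F '<foo>')

# Escaped \< and \> are word boundaries (GNU extension): foo and <foo> match
word=$(printf '%s\n' "$input" | grep '\<foo\>')

printf 'literal: %s\nfixed: %s\n' "$literal" "$fixed"
printf 'word:\n%s\n' "$word"
```

If the goal is just to find a fixed string, -F is usually the least surprising choice because no character is special.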

Related

Regex for line containing one or more spaces or dashes

I have a .txt file with city names, one per line. Some of them consist of several words separated by one or more spaces, or of words connected with '-'. I need a bash command that prints those lines. Currently I'm using cat piped into grep, but I can't get both spaces and dashes into one search, and I had problems checking for multiple spaces.
print lines with dash:
cat file.txt | grep ".*-.*"
print lines with spaces:
cat file.txt | grep ".*\s.*"
But when I try:
cat file.txt | grep ".*\s+.*"
I get nothing.
Thanks for help
Something like this should work:
grep -E -- ' |-' file.txt
Explanation:
-E: to interpret patterns as extended regular expressions
--: to signify the end of command options
' |-': the line contains either a space or a dash (the dash needs no backslash; GNU grep 3.8 and later warn about a stray \ before -)
This does not directly address your question, but is too much to put in a comment.
You don't need the .* in your patterns. .* at the beginning or end of a pattern is useless, because it means "0 or more of any character" and so will always match.
These commands are all equivalent (the -- keeps a pattern that starts with - from being parsed as an option):
cat file.txt | grep ".*-.*"
cat file.txt | grep -- "-.*"
cat file.txt | grep -- "-"
Plus you don't need to cat and pipe:
grep -- "-" file.txt
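A quick check of that claim, as a minimal sketch with made-up city names (the sample data is an assumption). It uses -e so that a pattern starting with - is not taken as an option:

```shell
# Made-up sample data, one city per line
names=$(printf 'Berlin\nBaden-Baden\nNew York\n')

# -e marks the next argument as the pattern, even when it starts with -
with_wildcards=$(printf '%s\n' "$names" | grep -e '.*-.*')
bare=$(printf '%s\n' "$names" | grep -e '-')

# Both commands select exactly the same lines
if [ "$with_wildcards" = "$bare" ]; then
    echo "same lines: $bare"
fi
```

The surrounding .* only changes what part of the line counts as the match, which matters for -o or --color but not for which lines are printed.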
When a grep pattern matches, the default action is to print the whole line, so the .* in all your patterns is redundant and can be deleted. Also, you don't need cat file |, since you can specify the file directly after the pattern, i.e. grep 'pattern' file.txt.
Here are some more details:
grep ".*-.*" = grep -- "-" - returns any lines containing a - char (-- signals the end of options; the next argument is the pattern)
grep ".*\s.*" = grep "\s" - matches and returns lines containing a whitespace char (GNU grep only)
grep ".*\s+.*" = grep "\s+" - returns lines containing a whitespace char followed by a literal + char (since you are using a POSIX BRE regex here, the unescaped + matches a literal plus symbol).
You want
grep "[[:space:]-]" file.txt
See the online demo:
#!/bin/bash
s='abc - def
ghi
jkl mno'
grep '[[:space:]-]' <<< "$s"
Output:
abc - def
jkl mno
The pattern [[:space:]-] is valid in both POSIX BRE and ERE (the latter enabled with the -E option) and matches either any whitespace character (via the [:space:] POSIX character class) or a hyphen.
Note that [\s-] won't work, since \s inside a bracket expression is not treated as a regex escape sequence but as a literal \ or s.
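For the "multiple spaces" part of the question, the sketch below contrasts how + behaves in BRE and ERE. The city list is made up, and the line "C +D" is a contrived entry added only to show the literal-plus case.

```shell
# Made-up sample data; "C +D" exists only to demonstrate the literal + case
cities=$(printf 'Berlin\nNew York\nBaden-Baden\nC +D\n')

# BRE: + is literal, so this needs whitespace followed by an actual plus sign
bre=$(printf '%s\n' "$cities" | grep '[[:space:]]+')

# ERE: + is a quantifier, so this matches one or more whitespace characters
ere=$(printf '%s\n' "$cities" | grep -E '[[:space:]]+')

# Bracket expression: whitespace or hyphen, valid in both BRE and ERE
either=$(printf '%s\n' "$cities" | grep '[[:space:]-]')

printf 'BRE:\n%s\nERE:\n%s\nspace or dash:\n%s\n' "$bre" "$ere" "$either"
```

Since grep prints any line with at least one match, a single whitespace character is enough to select a line that has several, so the quantifier is only needed when you care about the matched text itself.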

How to include optional space in Grep statement

I have some log files that I'm grepping through which contain entries in the following form
foo($abc) - sometext
foo ($xyz) - moretext
baz($qux) - moartext
I'm looking to use grep that would output the first two lines as matches, i.e.
foo($abc)
foo ($xyz)
I've tried the following grep statement
grep 'foo(\$' log.txt
which outputs the first match, but when I try to include an optional space, neither line is returned:
grep 'foo\s?(\$' log.txt
I'm using the optional space incorrectly, but I'm unsure how
You are using a POSIX BRE regex and foo\s?(\$ matches foo, a whitespace, a literal ?, a literal ( and a literal $.
You can use
grep -E 'foo\s?\(\$' log.txt
Here, -E makes the pattern POSIX ERE, and thus it now matches foo, then an optional whitespace, and a ($ substring.
See an online demo:
s='foo($abc) - sometext
foo ($xyz) - moretext
baz($qux) - moartext'
grep -E 'foo\s?\(\$' <<< "$s"
Output:
foo($abc) - sometext
foo ($xyz) - moretext
You may still use a more universal syntax like
grep 'foo[[:space:]]\{0,1\}(\$' log.txt
It is a POSIX BRE regex matching foo, zero or one whitespace character, and then the ($ substring.
You can either change the query slightly and use * instead of ?:
grep 'foo *(\$' log.txt
or use a literal whitespace and escape ?:
grep 'foo \?(\$' log.txt
Both solutions would work with GNU, busybox and FreeBSD grep.
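The variants can be compared side by side. This is a minimal sketch using the three log lines from the question. The last command is an addition of mine, assuming the goal is to print only the call itself (as in the desired output) rather than the whole line.

```shell
# The three log lines from the question
log=$(printf '%s\n' 'foo($abc) - sometext' 'foo ($xyz) - moretext' 'baz($qux) - moartext')

# ERE: ? is a quantifier, ( must be escaped to be literal
ere=$(printf '%s\n' "$log" | grep -E 'foo ?\(\$')

# BRE with a POSIX interval: zero or one whitespace character
interval=$(printf '%s\n' "$log" | grep 'foo[[:space:]]\{0,1\}(\$')

# BRE with *: zero or more spaces
star=$(printf '%s\n' "$log" | grep 'foo *(\$')

# -o prints only the matched text, here the call up to the closing parenthesis
calls=$(printf '%s\n' "$log" | grep -oE 'foo ?\(\$[^)]*\)')

printf '%s\n' "$ere" "$calls"
```

The first three commands select the same two lines; only the -o variant trims the output down to foo($abc) and foo ($xyz).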

grep exact match in colon delimited string

I am trying to extract the version from a colon delimited list. The value I want is for foo, however there is another value in the list called foo-bar causing both values to return. This is what I am doing:
LIST="foo:1.0.0
foo-bar:1.0.1"
VERSION=$(echo "${LIST}" | grep "\bfoo\b" | cut -s -d':' -f2)
echo -e "VERSION: ${VERSION}"
Output:
VERSION: 1.0.0
1.0.1
NOTE: Sometimes LIST will look like the following, which should result in version being empty (this is expected).
LIST="foo
foo-bar:1.0.1"
You may use a PCRE regex enabled with -P option and use a (?!-) negative lookahead that will fail the match in case there is a - after a whole word foo:
grep -P "\bfoo\b(?!-)"
This regex should extract any number and optional dots at the end of each line. If the line ends with a colon, then it won't match.
grep -oE '(([[:digit:]]+[.]*)+)$'
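Another option, which does not need -P, is to anchor the name at the start of the line and require the colon right after it. This is only a sketch: it assumes every entry has the form name:version, and version_for is a hypothetical helper name, not something from the question.

```shell
# Hypothetical helper: print the version recorded for key $2 in list $1.
# The key is used as a regex, so it should not contain metacharacters.
version_for() {
    printf '%s\n' "$1" | grep "^$2:" | cut -s -d':' -f2
}

LIST="foo:1.0.0
foo-bar:1.0.1"
VERSION=$(version_for "$LIST" foo)
OTHER=$(version_for "$LIST" foo-bar)
echo "VERSION: ${VERSION}"

# When foo has no version, nothing matches ^foo: and the result is empty
LIST="foo
foo-bar:1.0.1"
EMPTY=$(version_for "$LIST" foo)
echo "VERSION: ${EMPTY}"
```

The reason \bfoo\b matches foo-bar in the first place is that - is not a word character, so there is a word boundary between foo and the hyphen.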

Escape puzzle: why does grep ignore escaping of single quote?

(other quote/grep questions are about bash interpretation, this is not)
Apparently grep handles escaped single quotes differently than other escaped regex characters, but I don't understand why.
$ grep --version
grep (GNU grep) 2.25
$ cat data
a']
b']
c']
d]
e\']
$ cat patterns
a']
b\'\]
c'\]
d\']
e\']
$ grep -Ef patterns data
a']
c']
Because c is matched but b isn't, apparently grep does not interpret an escaped single quote \' as a single quote. But as what, then?
d isn't matched, so it is not ignored.
e isn't matched, so it is not taken literally.
TIA for solving this x-mas mystery! PS. Yes in this case I can use -F for literal matching, but my application requires regex.
\' in GNU tools means "end of string". See http://www.regular-expressions.info/gnu.html:
Additional GNU Extensions
....
The anchor \` (backtick) matches at the very start of the subject string,
while \' (single quote) matches at the very end.
Don't ask me why they introduced that as it seems to be exactly the same as $.
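Here is a small sketch of that behaviour, assuming GNU grep (other implementations may not support \' at all). The helper name try and the sample lines are made up for illustration.

```shell
# Hypothetical helper: report whether ERE $1 matches the single line $2
try() {
    if printf '%s\n' "$2" | grep -E -e "$1" >/dev/null; then
        echo "match"
    else
        echo "no match"
    fi
}

escaped=$(try "it\\'s" "it's")    # regex it\'s : \' is an anchor, so no s can follow it
plain=$(try "it's" "it's")        # regex it's  : an unescaped quote is literal
bracket=$(try "it[']s" "it's")    # regex it[']s: a bracket expression is literal too
anchor=$(try "it\\'" "visit")     # regex it\'  : "it" at the very end of the line
midline=$(try "it\\'" "it's")     # regex it\'  : "it" is not at the end here

printf '%s\n' "$escaped" "$plain" "$bracket" "$anchor" "$midline"
```

So in a pattern file, a literal single quote should be written unescaped or inside a bracket expression.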

How to escape parenthesis in grep

I want to grep for a function call 'init()' in all JavaScript files in a directory. How do I do this using grep?
Particularly, how do I escape parenthesis, ()?
It depends. If you use regular grep, you don't escape:
echo '(foo)' | grep '(fo*)'
You actually have to escape if you want to use the parentheses as grouping.
If you use extended regular expressions, you do escape:
echo '(foo)' | grep -E '\(fo*\)'
If you want to search for exactly the string "init()" then use fgrep "init()" or grep -F "init()".
Both of these will do fixed string matching, i.e. will treat the pattern as a plain string to search for and not as a regex. I believe it is also faster than doing a regex search.
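The three approaches can be checked against each other. This is a minimal sketch with made-up input; the second line is there to show that none of the patterns matches a bare "init" prefix.

```shell
# Made-up input: one real call and one line that merely starts with init
lines=$(printf 'init()\ninitwhat\n')

bre=$(printf '%s\n' "$lines" | grep 'init()')        # BRE: bare ( ) are literal
ere=$(printf '%s\n' "$lines" | grep -E 'init\(\)')   # ERE: \( \) are literal
fixed=$(printf '%s\n' "$lines" | grep -F 'init()')   # -F: nothing is special

printf 'BRE: %s\nERE: %s\nfixed: %s\n' "$bre" "$ere" "$fixed"
```

All three select only the init() line; which one to use mostly depends on whether the rest of the pattern needs regex features.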
$ echo "init()" | grep -Ein 'init\([^)]*\)'
1:init()
$ echo "init(test)" | grep -Ein 'init\([^)]*\)'
1:init(test)
$ echo "initwhat" | grep -Ein 'init\([^)]*\)'
Change to the directory that contains the JavaScript files, then run the following.
grep 'init()' *.js
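If the JavaScript files are spread over subdirectories, a recursive search may be more convenient. The sketch below builds a throwaway directory to demonstrate it; the file names and contents are made up, and --include is an option of GNU and BSD grep that may be missing elsewhere.

```shell
# Throwaway directory with made-up files, just for the demonstration
dir=$(mktemp -d)
mkdir "$dir/lib"
printf 'function init() {}\ninit();\ninitialize(x);\n' > "$dir/lib/app.js"
printf 'init()\n' > "$dir/notes.txt"

# -r: recurse, -n: line numbers, -F: fixed string,
# --include: only look at files whose name matches the glob
hits=$(grep -rnF --include='*.js' 'init()' "$dir")
printf '%s\n' "$hits"

rm -rf "$dir"
```

Only the two matching lines from lib/app.js are reported; notes.txt is skipped because of the --include filter, and initialize(x) does not contain the fixed string init().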
