Regex for line containing one or more spaces or dashes - grep

I have a .txt file with city names, one per line. Some names consist of several words separated by one or more spaces, or of words joined with '-'. I need a bash command that prints those lines. Currently I'm using cat piped into grep, but I can't combine spaces and dashes into one search, and I had problems matching multiple spaces.
print lines with dash:
cat file.txt | grep ".*-.*"
print lines with spaces:
cat file.txt | grep ".*\s.*"
But when I try:
cat file.txt | grep ".*\s+.*"
I get nothing.
Thanks for help

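Something like this should work:
grep -E -- ' |-' file.txt
Explanation:
-E: interpret the pattern as an extended regular expression
--: signify the end of command options
' |-': the line contains either a space or a dash (the dash needs no backslash outside a bracket expression, and newer GNU grep versions may warn about a stray \ before -)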

This does not directly address your question, but is too much to put in a comment.
You don't need the .* in your patterns. .* at the beginning or end of a pattern is useless, because it means "0 or more of any character" and so will always match.
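These commands all print the same lines (a longer pattern that starts with - needs -- before it, otherwise grep parses it as an option):
cat file.txt | grep ".*-.*"
cat file.txt | grep -- "-.*"
cat file.txt | grep "-"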
Plus you don't need to cat and pipe:
grep "-" file.txt
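A quick way to check the equivalence is the sketch below. The two city names are made-up sample data, and -e is used as an alternative to -- for passing a pattern that begins with a dash.

```shell
# Hypothetical sample data: one line with a dash, one without.
sample='Baden-Baden
Berlin'

# Original pattern with the redundant .* on both sides.
with_dotstar=$(printf '%s\n' "$sample" | grep '.*-.*')

# Same match with the bare dash; -e marks the next argument as the pattern.
with_e=$(printf '%s\n' "$sample" | grep -e '-')

# A pattern that starts with a dash and has more characters needs -- (or -e),
# otherwise grep tries to read it as a set of options.
with_ddash=$(printf '%s\n' "$sample" | grep -- '-.*')

printf '%s\n%s\n%s\n' "$with_dotstar" "$with_e" "$with_ddash"
```

All three variables should hold the same single line, the one containing the dash.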

When a grep pattern matches, the default action is to print the whole line, so the .* in all your patterns is redundant and you may delete it. Also, you don't have to use cat file |, as you may specify the file to grep directly after the pattern, i.e. grep 'pattern' file.txt.
Here are some more details:
grep ".*-.*" = grep -- "-" - returns any lines containing a - char (-- signals the end of options; the next argument is the pattern)
grep ".*\s.*" = grep "\s" - matches and returns lines containing a whitespace char (GNU grep only)
grep ".*\s+.*" = grep "\s+" - returns lines containing a whitespace char followed by a literal + char (this is a POSIX BRE regex, where an unescaped + matches a literal plus symbol).
You want
grep "[[:space:]-]" file.txt
See the online demo:
#!/bin/bash
s='abc - def
ghi
jkl mno'
grep '[[:space:]-]' <<< "$s"
Output:
abc - def
jkl mno
The pattern [[:space:]-] is valid in both POSIX BRE and ERE (the latter enabled with the -E option); it matches either any whitespace character (via the [:space:] POSIX character class) or a hyphen.
Note that [\s-] won't work, since \s inside a bracket expression is not treated as a regex escape sequence but as a literal \ or s.
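To see the BRE/ERE difference for + side by side, here is a small sketch. The city list is made-up sample data, and the last line ("A +B") is a contrived entry added only so that the BRE pattern has something to match. [[:space:]] is used instead of \s so the sketch does not depend on GNU grep.

```shell
# Hypothetical sample data; "A +B" is contrived to contain a space followed by a plus.
cities='New York
Paris
Stratford-upon-Avon
Rio  de Janeiro
A +B'

# BRE (default): an unescaped + is a literal plus sign.
bre_plus=$(printf '%s\n' "$cities" | grep '[[:space:]]+')

# ERE (-E): + is a quantifier meaning "one or more".
ere_plus=$(printf '%s\n' "$cities" | grep -E '[[:space:]]+')

# Bracket expression: whitespace or a hyphen, same meaning in BRE and ERE.
space_or_dash=$(printf '%s\n' "$cities" | grep '[[:space:]-]')

printf 'BRE +:\n%s\n\nERE +:\n%s\n\nspace or dash:\n%s\n' \
  "$bre_plus" "$ere_plus" "$space_or_dash"
```

The BRE run should print only the contrived "A +B" line, the ERE run every line with whitespace, and the bracket expression every line except "Paris".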

Related

How to include optional space in Grep statement

I have some log files that I'm grepping through which contain entries in the following form
foo($abc) - sometext
foo ($xyz) - moretext
baz($qux) - moartext
I'm looking to use grep that would output the first two lines as matches, i.e.
foo($abc)
foo ($xyz)
I've tried the following grep statement
grep 'foo(\$' log.txt
which outputs the first match, but when I tried to include an optional space, neither line was returned:
grep 'foo\s?(\$' log.txt
I'm using the optional space incorrectly, but I'm unsure how
You are using a POSIX BRE regex and foo\s?(\$ matches foo, a whitespace, a literal ?, a literal ( and a literal $.
You can use
grep -E 'foo\s?\(\$' log.txt
Here, -E makes the pattern POSIX ERE, and thus it now matches foo, then an optional whitespace, and a ($ substring.
See an online demo:
s='foo($abc) - sometext
foo ($xyz) - moretext
baz($qux) - moartext'
grep -E 'foo\s?\(\$' <<< "$s"
Output:
foo($abc) - sometext
foo ($xyz) - moretext
You may still use a more universal syntax like
grep 'foo[[:space:]]\{0,1\}(\$' log.txt
It is a POSIX BRE regex matching foo, one or zero whitespaces, and then ($ substring.
You can either change the query slightly and use * instead of ?:
grep 'foo *(\$' log.txt
or use a literal whitespace and escape ?:
grep 'foo \?(\$' log.txt
Both solutions would work with GNU, busybox and FreeBSD grep.
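The variants above are not fully interchangeable, which the following sketch tries to show. It uses a literal space rather than \s to stay portable, and it adds a made-up third log line with two spaces (not in the original question) to bring out the difference between "zero or one" and "zero or more".

```shell
# Sample log from the question plus one hypothetical line with two spaces.
log='foo($abc) - sometext
foo ($xyz) - moretext
foo  ($two) - twospaces
baz($qux) - moartext'

# ERE: ? makes the preceding space optional; ( must be escaped to be literal.
ere=$(printf '%s\n' "$log" | grep -E 'foo ?\(\$')

# BRE: \{0,1\} is the POSIX spelling of "zero or one"; ( is already literal.
bre=$(printf '%s\n' "$log" | grep 'foo \{0,1\}(\$')

# BRE with *: zero or more spaces, so the two-space line matches as well.
star=$(printf '%s\n' "$log" | grep 'foo *(\$')

printf 'ERE ?:\n%s\n\nBRE interval:\n%s\n\nBRE *:\n%s\n' "$ere" "$bre" "$star"
```

The first two commands should return the same two lines, while the * version also returns the two-space line, so pick it only if that looser match is acceptable.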

How to make "grep" output complete word that includes the match?

I would like grep to print out all complete words that include the match.
Google did not help me. Here is what I tried:
cat file.txt
21676 Mm.24685 NM_009346 ENSMUSG00000055320
20349 Mm.134093 NM_011348 ENSMUSG00000063531
12456 Mm.134000 NM_011228 GM415666
grep -o "ENSMUS" file.txt
ENSMUS
ENSMUS
Desired output:
ENSMUSG00000055320
ENSMUSG00000063531
Thanks for your help!
You may use:
grep -wo "ENSMUS[^[:blank:]]*" file.txt
ENSMUSG00000055320
ENSMUSG00000063531
Here [^[:blank:]]* will match 0 or more characters that are not blanks (spaces or tabs). -w will ensure full word matches.
To extract ENSEMBL mouse accession numbers without the version number:
grep -Po 'ENSMUS\w+' in_file
With the version number:
grep -Po 'ENSMUS\S+' in_file
Here,
\w+ : 1 or more word characters ([A-Za-z0-9_]).
\S+ : 1 or more non-whitespace characters (you can also be more restrictive and use [\w.]+, which is 1 or more word character or literal dot).
Here, GNU grep uses the following options:
-P : Use Perl regexes.
-o : Print only the matched parts, each match on its own output line, not the entire lines.
SEE ALSO:
grep manual
perlre - Perl regular expressions
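If -P is not available (it is GNU-specific), roughly the same extraction can be sketched with -E and POSIX character classes. Here [[:alnum:]_] stands in for \w and [^[:space:]] for \S. The ".3" version suffix on the sample line is made up for illustration.

```shell
# One line from the question's file, with a hypothetical ".3" version suffix added.
line='21676 Mm.24685 NM_009346 ENSMUSG00000055320.3'

# Word characters only: stops at the dot, so the version is dropped.
no_version=$(printf '%s\n' "$line" | grep -oE 'ENSMUS[[:alnum:]_]+')

# Any non-whitespace: runs to the next blank, so the version is kept.
with_version=$(printf '%s\n' "$line" | grep -oE 'ENSMUS[^[:space:]]+')

printf '%s\n%s\n' "$no_version" "$with_version"
```

Note that -o itself is not in POSIX either, although GNU, BSD and busybox grep all provide it.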

Escaping < and > with grep

I'm missing something here. If I have a file with contents:
foo
<foo>
And if I run grep "\<foo\>" file_name, shouldn't it match only the second line? It also matches the first.
I'm not very good with grep so I'm probably messing things up.
Escaping them activates their meta-character properties and turns them into word boundaries in GNU grep:
$ grep 'foo' file
foo
<foo>
foobar
$ grep '\<foo\>' file
foo
<foo>
The 2nd grep above isn't looking for the string <foo>, it's looking for the string foo NOT preceded or succeeded immediately by word-constituent characters.
In general it's not safe to escape characters without knowing exactly what it means to do so. Here's another example:
$ printf 'aa\na{2}b\n'
aa
a{2}b
$ printf 'aa\na{2}b\n' | grep 'a{2}'
a{2}b
$ printf 'aa\na{2}b\n' | grep 'a\{2\}'
aa
Above, escaping the braces as \{..\} activates their metacharacter properties as regexp interval delimiters.
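To match the literal string <foo>, then, the angle brackets can simply be left unescaped, or the pattern can be passed with -F so nothing in it is treated as a regex. A minimal sketch, using the same three sample lines as the answer above:

```shell
# Same sample lines as in the answer above.
data='foo
<foo>
foobar'

# Unescaped < and > are ordinary characters in grep patterns.
unescaped=$(printf '%s\n' "$data" | grep '<foo>')

# -F treats the whole pattern as a fixed string, so no escaping rules apply.
fixed=$(printf '%s\n' "$data" | grep -F '<foo>')

# -w matches foo only as a whole word, much like \<foo\> does in GNU grep.
whole_word=$(printf '%s\n' "$data" | grep -w 'foo')

printf 'unescaped:\n%s\n\nfixed:\n%s\n\nwhole word:\n%s\n' \
  "$unescaped" "$fixed" "$whole_word"
```

The first two should print only <foo>, while -w prints both foo and <foo> but not foobar.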

How to grep for a 7 digit hexadecimal string and return only that hexadecimal string?

I am trying to extract all the leading 7 digit hexadecimal strings in a file, that contains lines such as:
3fce110:: ..\Utilities\c\misc.c(431): YESFREED (120 bytes) Misc
egrep -o '^[0-9a-f]{7}\b' file.txt
egrep is the same as grep -E; it uses extended regexp (recent GNU grep releases flag egrep as obsolescent, so grep -E is the safer spelling).
-o prints only the matching part of each line.
^ anchors the match to the beginning of the line.
[0-9a-f]{7} matches seven hexadecimal characters. If you want to match uppercase letters add A-F here or add the -i flag.
\b checks for a word boundary; it ensures we don't match hex numbers more than 7 digits long.
If all the lines in the file follow the given format then a couple of methods:
$ grep -o '^[^:]*' file
3fce110
$ awk -F: '{print $1}' file
3fce110
$ cut -d: -f1 file
3fce110
$ sed 's/:.*//' file
3fce110
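The four one-liners above trust every line to follow the format. If that is not guaranteed, one possible sketch is to let sed check for exactly seven lowercase hex digits before the first colon and print only those. The second and third sample lines are made up to show what gets skipped.

```shell
# First line is from the question; the other two are hypothetical counterexamples.
lines='3fce110:: ..\Utilities\c\misc.c(431): YESFREED (120 bytes) Misc
3fce1101:: eight hex digits, skipped
not-hex:: no hex prefix, skipped'

# Capture exactly seven hex digits that are immediately followed by a colon,
# replace the whole line with the capture, and print only lines that matched.
hex=$(printf '%s\n' "$lines" | sed -n 's/^\([0-9a-f]\{7\}\):.*/\1/p')

printf '%s\n' "$hex"
```

This uses only POSIX BRE syntax, so it does not rely on the \b extension.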

Use grep to report back only line numbers

I have a file that possibly contains bad formatting (in this case, the occurrence of the pattern \\backslash). I would like to use grep to return only the line numbers where this occurs (as in, the match was here, go to line # x and fix it).
However, there doesn't seem to be a way to print the line number (grep -n) and not the match or line itself.
I can use another regex to extract the line numbers, but I want to make sure grep cannot do it by itself. grep -no comes closest, I think, but still displays the match.
try:
grep -n "text to find" file.ext | cut -f1 -d:
If you're open to using AWK:
awk '/textstring/ {print FNR}' textfile
In this case, FNR is the line number. AWK is a great tool when you're looking at grep|cut, or any time you're looking to take grep output and manipulate it.
All of these answers require grep to generate the entire matching lines, then pipe it to another program. If your lines are very long, it might be more efficient to use just sed to output the line numbers:
sed -n '/pattern/=' filename
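Bash version (this keeps only the first matching line number, because ${lineno%%:*} strips everything from the first colon onward):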
lineno=$(grep -n "pattern" filename)
lineno=${lineno%%:*}
I recommend the answers with sed and awk for just getting the line number, rather than using grep to get the entire matching line and then removing that from the output with cut or another tool. For completeness, you can also use Perl:
perl -nE 'say $. if /pattern/' filename
or Ruby:
ruby -ne 'puts $. if /pattern/' filename
using only grep:
grep -n "text to find" file.ext | grep -Po '^[^:]+'
If grep prints the file name as well (several files are searched, or -H is given), the line number is the second colon-separated field, not the first:
grep -Hn "text to find" file.txt | cut -f2 -d:
To count the number of lines that match the pattern:
grep -c "Pattern" in_file.ext
To print the matching lines themselves:
sed -n '/pattern/p' file.ext
To display line numbers on which pattern was matched
grep -n "pattern" file.ext | cut -f1 -d:
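The approaches in this thread can be compared on a throwaway file, as in the sketch below. The file contents and the pattern "bad" are made up, and the -H variant assumes the temporary path contains no colon.

```shell
# Hypothetical sample file; "bad" stands in for the pattern being hunted.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.txt"
printf 'good\nbad one\ngood\nbad two\n' > "$sample"

# Single file: grep -n prints "N:line", so the number is field 1.
from_cut=$(grep -n 'bad' "$sample" | cut -d: -f1)

# With -H: grep prints "file:N:line", so the number moves to field 2.
from_cut_named=$(grep -Hn 'bad' "$sample" | cut -d: -f2)

# sed and awk print the line numbers directly, with no second process.
from_sed=$(sed -n '/bad/=' "$sample")
from_awk=$(awk '/bad/ {print FNR}' "$sample")

# Number of matching lines.
match_count=$(grep -c 'bad' "$sample")

rm -rf "$tmpdir"
printf 'lines:\n%s\ncount: %s\n' "$from_sed" "$match_count"
```

All four line-number variables should end up with the same two numbers, one per line.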

Resources