dot in grep command being used as regex - grep

I'm trying to understand whether bash is doing something with the string before passing it to grep, or whether grep uses basic regex searching by default. The man page and other answers don't really clarify this.
ss -an | grep "8.02"
u_dgr UNCONN 0 0 * 820284002 * 820284001
u_str ESTAB 0 0 * 820283949 * 820287456
It looks like the . is being used in a regex fashion to match a single char. However, I would only expect this to happen when using grep -e or grep -E. If bash was intercepting the string I would expect special shell chars to be intercepted first such as * or ?.
The man entry states I am using GNU grep 3.1

Looks like I have immediately found the answer after RTFMing a little closer
-G, --basic-regexp
Interpret PATTERN as a basic regular expression (BRE, see below). This is the default.
"This is the default" - I assume means this is the default behaviour if no flags are passed?
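A minimal sketch of that default behaviour, assuming a POSIX-style shell. The two sample lines are made up for illustration (the first is modelled on the ss output above). It compares the default BRE match with a fixed-string match (-F) and with an escaped dot:

```shell
# Hypothetical sample input: one socket-style line and one line with a literal "8.02".
sample='u_dgr UNCONN 0 0 * 820284002 * 820284001
release 8.02 notes'

# Default mode (BRE): "." matches any single character, so "8202" matches too.
bre_count=$(printf '%s\n' "$sample" | grep -c "8.02")

# -F treats the pattern as a fixed string, so only a literal "8.02" matches.
fixed_count=$(printf '%s\n' "$sample" | grep -cF "8.02")

# Escaping the dot keeps regex mode but makes the dot literal.
escaped_count=$(printf '%s\n' "$sample" | grep -c '8\.02')

echo "$bre_count $fixed_count $escaped_count"
```

The quotes around the pattern only stop the shell from touching it; the dot is interpreted by grep itself.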

Related

Use shell variable in grep lookahead in csh

I am trying to utilize a grep lookahead to get a value at the end of a line for a project I'm working on. The main issue I'm having is that I'm not sure how to use a shell variable in the grep lookahead syntax in cshell
Here's the gist of what I'm trying to do.
There will be a dogfile.txt with several lines listing the names of dogs in the format below
genericDog2033, pomeranian
genericDog2034, greatDane
genericDog2035, Doberman
I wanted a way of retrieving the breed of the dog after the comma on each line so I thought a grep lookahead might be a good way of doing it. The project I'm working on isn't so hard-coded however, so I have no way of knowing what genericDog number I am searching for. There will be a shell variable in a greater while loop which will have access to the dog name.
For example if I set the dogNumber variable to the first dog in the file like so:
set dogNumber = genericDog2033
I then try to access the value of dogNumber in the grep lookahead
set dogBreed = `cat dogfile.txt | grep -oP '(?<=$dogNumber ,)[^ ]*'`
The problem with the line above is that I think grep is looking for the literal string "$dogNumber ," in the file, which obviously doesn't exist. Is there some sort of wrapper I can put around the shell variable so csh knows that dogNumber is a variable? I'm also open to other methods of doing this. Any help would be appreciated; this is literally the last line of code I need to finish my project and I'm at my wits' end.
Variables are expanded inside double quotes ("), but not inside single quotes ('):
% set var = 'hello'
% echo '$var'
$var
% echo "$var"
hello
Furthermore, you have an error in your regexp:
(?<=$dogNumber ,)[^ ]*
In your data, the space is after the comma, not before.
% set dogNumber = genericDog2033
% set dogBreed = `cat a | grep -oP "(?<=$dogNumber, )[^ ]*"`
% echo $dogBreed
pomeranian
The easiest way to debug this is to not use variables at all in the first place, and simply check if the grep works:
% grep -oP "(?<=genericDog2034 ,)[^ ].*" a
[no output]
Then first make the grep work with static data, add the variable to make that work, and then put it all together by assigning it to a variable.
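Since the question is open to other methods, here is a sketch of an alternative that avoids lookbehind (and therefore grep -P) altogether. It is written for a POSIX sh rather than csh, and the temporary file and its contents are stand-ins for dogfile.txt:

```shell
# Hypothetical sample file mirroring the format in the question.
tmpfile=$(mktemp)
printf '%s\n' 'genericDog2033, pomeranian' \
              'genericDog2034, greatDane' \
              'genericDog2035, Doberman' > "$tmpfile"

dogNumber=genericDog2034

# Double quotes let the shell expand $dogNumber before sed sees the script.
# -n suppresses the default output; the p flag prints only the line where the
# substitution succeeded, with the "name, " prefix already removed.
dogBreed=$(sed -n "s/^$dogNumber, //p" "$tmpfile")

echo "$dogBreed"
rm -f "$tmpfile"
```

The same double-quoted sed expression should work inside csh backticks as well, since the quoting rule shown above is the same in both shells.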

Trying to figure out why my regex command won't work [duplicate]

I have a problem to work on and was wondering why my regex won't work. It's a simple exercise: match the words in a dictionary file that consist only of letters from the top row of the keyboard. I believe I have a solution, but grep comes up blank every time:
grep ^[qwertyuiop]+$ /opt/~~~~~~/data/web2
this is my command, which does nothing, but if i just put:
grep [qwertyuiop] /opt/~~~~~~/data/web2
it matches words with letters from the top row. Can anybody tell me why it isn't working? Thank you all for your time.
you're super close.
With grep you want to use the -x flag to match the whole line.
grep -x '[qwertyuiop]\+' /usr/share/dict/american-english
then a simple escaped + to match multiple characters.
if you want to avoid the -x you can take your original approach like so:
grep '^[qwertyuiop]\+$' /usr/share/dict/american-english
With an escape and some quotes it works marvelously, although I think -x is more idiomatic. As some other people have commented, you can also get away with using -E (extended regular expressions, where + needs no backslash), although that can have some unintended consequences. I recommend man grep, which gives a nice overview.
I don't think grep treats + as special on its own (^ and $ do work as anchors in the default mode). You have to use grep -E or egrep for an unescaped + to mean "one or more".
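To compare the variants side by side, here is a small sketch, assuming GNU grep (the \+ form is a GNU extension to BRE). The word list is made up and stands in for the dictionary file:

```shell
# Hypothetical word list; "p+" is included to show what the original
# unescaped pattern actually matches in basic mode.
words='pepper
typewriter
quit
apple
p+'

# BRE with -x (whole-line match) and an escaped + as the quantifier.
bre_x=$(printf '%s\n' "$words" | grep -x '[qwertyuiop]\+')

# ERE: + is a quantifier without a backslash.
ere=$(printf '%s\n' "$words" | grep -E '^[qwertyuiop]+$')

# Plain BRE with an unescaped +: the + is a literal character, so this only
# matches a line made of one top-row letter followed by "+".
literal=$(printf '%s\n' "$words" | grep '^[qwertyuiop]+$')

printf 'BRE -x:\n%s\nERE:\n%s\nunescaped +:\n%s\n' "$bre_x" "$ere" "$literal"
```

The first two commands select the same three words, while the third selects only the line "p+", which is why the original command came up blank on a real dictionary.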

Get content inside brackets using grep

I have text that looks like this:
Name (OneData) [113C188D-5F70-44FE-A709-A07A5289B75D] (MoreData)
I want to use grep or some other way to get the ID inside [].
How to do it?
You can do something like this via bash (GNU grep required):
t="Name (OneData) [113C188D-5F70-44FE-A709-A07A5289B75D] (MoreData)"
echo "$t" | grep -Po "(?<=\[).*(?=\])"
The pattern will give you everything between the brackets, and uses a zero-width look-behind assertion (?<= ...) to eliminate the opening bracket and uses a zero-width look-ahead assertion (?= ...) to eliminate the closing bracket.
The -P flag enables Perl-compatible regular expressions, which provide the look-behind and look-ahead assertions used here. The -o flag prints only the matched text; because the assertions are zero-width, the brackets themselves are not part of the match.
If you don't have GNU grep available, you can solve the problem in two steps (there are probably also other solutions):
Get the ID with the brackets (\[.*\])
Remove the brackets (] and [, here via sed, for example)
echo "$t" | grep -o "\[.*\]" | sed 's/[][]//g'
As Cyrus commented, you can also use grep -oE '[0-9A-F-]{36}' and simply ignore the brackets, provided that all the IDs are exactly 36 characters long and the input contains no other runs of 36 or more characters consisting only of 0-9, A-F and -.
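The two approaches that do not need -P can be checked against each other with a short sketch like the one below. The [^]]* in the first pattern is my own substitution for .* (it stops at the first closing bracket, which matters only if a line has several bracketed groups), so treat it as an assumption rather than part of the answer:

```shell
t='Name (OneData) [113C188D-5F70-44FE-A709-A07A5289B75D] (MoreData)'

# Two-step approach: keep the brackets with grep -o, then strip them with sed.
# In a BRE, "\[" is a literal "[", "[^]]*" is any run of non-"]" characters,
# and a "]" outside a bracket expression is already literal.
two_step=$(printf '%s\n' "$t" | grep -o '\[[^]]*]' | sed 's/[][]//g')

# Shape-based approach: 36 characters drawn from 0-9, A-F and "-".
by_shape=$(printf '%s\n' "$t" | grep -oE '[0-9A-F-]{36}')

echo "$two_step"
echo "$by_shape"
```

Both commands print the same ID for this input.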

grep treating leading * as literal character?

I recently noticed this behavior of GNU grep:
$ grep '*' <<< 'aaa'; echo $?
1
$ grep '*' <<< 'aa**aa'
aa**aa
In the second command, both asterisks are output highlighted, meaning they're considered "matched" by grep.
As far as I know, GNU grep assumes POSIX BRE (as grep -G) without any options, and a single asterisk is an invalid BRE. However, it appears like grep treats a leading asterisk as a literal one:
$ grep '*?' <<< 'aaa***???bbb'
aaa***???bbb
^^
This may appear intuitive to non-regexers, but I'm finding it strange. I have gone through man grep but can't find any related description about this behavior.
Why doesn't grep complain about this invalid regex but instead treat inappropriately positioned metacharacters as literal ones?
In POSIX BRE, * is required to match a * character when it's found:
In a bracket expression
As the first character of an entire BRE (after an initial '^', if any)
As the first character of a subexpression (after an initial '^', if any);
So grep '*', grep '^*', grep '\(*\)', grep '\(^*\)', grep '[*]' are all required to match on a literal *.
It's different in POSIX EREs (as used with grep -E), where the behaviour is undefined if * (or +, ?, {x,y}) is used in those contexts. That allows some implementations to provide extended operators such as (?...) or (*...), though most actually report errors instead.
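The BRE rules listed above can be tried out with a small sketch. The matches function is a hypothetical helper introduced only for this illustration, and the input strings are made up:

```shell
# Hypothetical helper: prints "yes" if the BRE in $1 matches the text in $2.
matches() {
  if printf '%s\n' "$2" | grep -q -e "$1"; then echo yes; else echo no; fi
}

lead_none=$(matches '*' 'aaa')        # leading * is literal; no asterisk in the input
lead_star=$(matches '*' 'aa**aa')     # leading * is literal; the input has one
after_caret=$(matches '^*' '*abc')    # * right after ^ is literal and anchored
not_at_start=$(matches '^*' 'a*bc')   # the asterisk is not at the start of the line
in_group=$(matches '\(*\)' 'x*y')     # * as the first character of a subexpression
quantifier=$(matches 'a*' 'bbb')      # after an atom, * means "zero or more"

echo "$lead_none $lead_star $after_caret $not_at_start $in_group $quantifier"
```

The last case matches because a* is satisfied by an empty string, which is the behaviour of * as a quantifier rather than as a literal.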

grep: one pattern works but not the other

I have a tab-delimited file that has gene names in one column and expression values for these genes in the other. I want to delete certain genes from this file using grep. So, this:
"42261" "SNHG7" "20.2678"
"42262" "SNHG8" "25.3981"
"42263" "SNHG9" "0.488534"
"42264" "SNIP1" "7.35454"
"42265" "SNN" "2.05365"
"42266" "snoMBII-202" "0"
"42267" "snoMBII-202" "0"
"42268" "snoMe28S-Am2634" "0"
"42269" "snoMe28S-Am2634" "0"
"42270" "snoR26" "0"
"42271" "SNORA1" "0"
"42272" "SNORA1" "0"
becomes this:
"42261" "SNHG7" "20.2678"
"42262" "SNHG8" "25.3981"
"42263" "SNHG9" "0.488534"
"42264" "SNIP1" "7.35454"
"42265" "SNN" "2.05365"
I've used the following command that i've put together with my limited terminal knowledge:
grep -iv sno* <input.text> | grep -iv rp* | grep -iv U6* | grep -iv 7SK* > <output.txt>
So with this command, my output file lacks genes that start with sno, u6 and 7sk, but somehow grep has deleted all the genes that have "r" in them instead of the ones that start with "rp". I'm very confused about this. Any ideas why sno* works but rp* does not?
Thanks!
The grep command uses regular expressions, not globbing patterns.
The pattern rp* means "'r' followed by zero or more 'p'", so it matches any line containing an r. What you really want is rp.*, or even better, "rp.* with a leading literal double quote that anchors the match to the start of the quoted gene name (or even just "rp; there's no point in trying to grep for anything after the "rp" after all). Likewise, sno* means "'sn' followed by zero or more 'o'". Again, you'd want sno.* or "sno.* (or even just "sno).
Although this doesn't directly answer your question, there is one thing in your sample command line that you may want to be careful with: Whenever you use a special shell metacharacter (like "*"), you need to escape or quote it. So your command line should look more like:
grep -iv 'sno*' <input.text> | grep -iv 'rp*' | grep -iv 'U6*' | grep -iv '7SK*' > <output.txt>
Often, shells are smart, and if no files match the glob, they will use the text as-is (so if you enter "foo*" but there are no filenames starting with "foo", then the string "foo*" will be passed to the command).
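The difference between rp* and "rp is easy to see on a few lines of data. The rows below are made up for illustration and only loosely follow the format in the question:

```shell
# Hypothetical rows: only the second one holds a gene name starting with "RP".
genes='"1" "SNHG7" "20.2"
"2" "RPL3" "5.1"
"3" "snoR26" "0"
"4" "ARF1" "3.3"'

# rp* is "r" plus zero or more "p", so with -i every line containing r or R
# matches and is dropped by -v; -c counts the lines that remain.
loose=$(printf '%s\n' "$genes" | grep -ivc 'rp*')

# '"rp' requires a literal double quote directly before "rp", so only gene
# names that start with rp/RP are dropped.
anchored=$(printf '%s\n' "$genes" | grep -ivc '"rp')

echo "kept with rp*: $loose, kept with \"rp: $anchored"
```

Single quotes around the patterns keep both the * and the embedded double quote away from the shell.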
grep -iEv "sno|rp|U6|7SK" yourInput
test:
kent$ cat b
"42261" "SNHG7" "20.2678"
"42262" "SNHG8" "25.3981"
"42263" "SNHG9" "0.488534"
"42264" "SNIP1" "7.35454"
"42265" "SNN" "2.05365"
"42266" "snoMBII-202" "0"
"42267" "snoMBII-202" "0"
"42268" "snoMe28S-Am2634" "0"
"42269" "snoMe28S-Am2634" "0"
"42270" "snoR26" "0"
"42271" "SNORA1" "0"
"42272" "SNORA1" "0"
kent$ grep -iEv "sno|rp|U6|7SK" b
"42261" "SNHG7" "20.2678"
"42262" "SNHG8" "25.3981"
"42263" "SNHG9" "0.488534"
"42264" "SNIP1" "7.35454"
"42265" "SNN" "2.05365"
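One caveat with the single-command version is that the alternatives match anywhere on the line, not only at the start of a gene name. A possible refinement, sketched here under the assumption that every field is wrapped in double quotes as in the sample, is to put the opening quote in front of a group. The last two rows are made up for illustration:

```shell
# Two rows from the question plus two hypothetical rows.
genes='"42264" "SNIP1" "7.35454"
"42266" "snoMBII-202" "0"
"50001" "SERP1" "3.1"
"50002" "RPL3" "12.7"'

# Unanchored: "rp" also matches inside SERP1, so that row is removed too.
unanchored=$(printf '%s\n' "$genes" | grep -iEv 'sno|rp|U6|7SK')

# Anchored on the opening quote: only names that start with a prefix are removed.
anchored=$(printf '%s\n' "$genes" | grep -iEv '"(sno|rp|U6|7SK)')

printf 'unanchored:\n%s\nanchored:\n%s\n' "$unanchored" "$anchored"
```

Note that the quote anchors on any quoted field, so a value in another column that starts with one of the prefixes would also be removed.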
