Find a string between two characters with grep

I found in this answer a regex to find a string between two characters. In my case, I want to find every pattern between ‘ and ’. Here's the regex:
(?<=‘)(.*?)(?=’)
Indeed, it works when I try it on https://regex101.com/.
However, I want to use it with grep, and there it doesn't work:
grep -E '(?<=‘)(.*?)(?=’)' file
Is there anything missing?

Those are positive lookahead and lookbehind assertions, which grep's extended syntax (-E) does not support. You need to enable PCRE (Perl-Compatible Regular Expressions) with -P, and it is probably better to print only the matching part using the -o option in GNU grep:
grep -oP '(?<=‘)(.*?)(?=’)' file
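To see the effect of -o and -P together, here is a minimal sketch, assuming GNU grep built with PCRE support. The sample text and the variable names are made up for illustration.

```shell
# Hypothetical sample text; only the first line contains quoted strings.
sample='He said ‘hello’ and then ‘bye’.
no quotes on this line'

# -P: Perl-compatible regex (required for the lookarounds)
# -o: print only the matched text, one match per output line
between=$(printf '%s\n' "$sample" | grep -oP '(?<=‘).*?(?=’)')
printf '%s\n' "$between"
```

Each quoted string should come out on its own line, without the surrounding quote characters, because the lookarounds are not part of the match.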

Related

Match pattern ending with a certain character in grep

This is a common problem I encounter when using grep. Say the pattern is 'chr1' in the third column of a file, and I run the following:
grep 'chr1' file
How can I avoid getting the results including chr10, chr11, chr13 etc as well?
Thanks!
It seems this works:
grep -w 'chr1' file
Since you're interested in values in specific columns, you're much better off using awk:
awk '$3 == "chr1"' file
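The difference between the two approaches shows up when chr1 also appears in another column. A small sketch, assuming tab-separated input (the sample rows are invented):

```shell
# Hypothetical three-column, tab-separated sample data.
data=$(printf 'geneA\t100\tchr1\ngeneB\t200\tchr10\nchr1\t300\tchr11\n')

# grep -w matches chr1 as a whole word anywhere on the line,
# so the third row (chr1 in column 1) is counted as well.
word_hits=$(printf '%s\n' "$data" | grep -cw 'chr1')

# awk compares only the third field, so just the first row matches.
col_hits=$(printf '%s\n' "$data" | awk '$3 == "chr1" { n++ } END { print n + 0 }')

printf 'grep -w: %s line(s), awk: %s line(s)\n' "$word_hits" "$col_hits"
```

Here grep -w reports two lines while awk reports one, which is why the column-aware test is the safer choice for this kind of file.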

How to use grep to search for an exact word match in TextWrangler

TextWrangler lets you search using grep.
I want to find and replace the following word: bauvol, but not bauvolumen.
I tried typing ^bauvol$ into the search field, but that didn't do the trick: it didn't find anything, although the word is clearly there.
I think it's because, in grep, ^ and $ signify the start and end of a line, not of a word?!
You want to use \b as word boundaries, as @gromi08 said:
\bbauvol\b
If you want to reuse any portion of this word (so you can replace it, modify it, change the case, etc.), it is usually best to wrap it in parentheses, ( and ), so you can reference the group in the Replace box:
Find:
(\bbauvol\b)
Replace:
<some_tag>\1</some_tag>
Did you have anything specific you were trying to do with the result once you found it (cut it, duplicate it, etc.)?
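For anyone who wants to try the same find/replace pair outside the editor, a rough command-line analogue can be sketched with sed. This assumes GNU sed (where \b is supported as an extension) and is not TextWrangler's own engine; the input line is invented.

```shell
# Hypothetical input line containing both the word and a longer word.
line='bauvol und bauvolumen'

# -E enables extended regex, so ( ) group without backslashes;
# \b is a GNU extension, and \1 refers to the captured word.
# The | delimiter avoids having to escape the / in the closing tag.
tagged=$(printf '%s\n' "$line" | sed -E 's|\b(bauvol)\b|<some_tag>\1</some_tag>|g')
printf '%s\n' "$tagged"
```

Only the standalone word gets wrapped; bauvolumen is left alone because there is no word boundary after its first six letters.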
Use the -w option of grep (see the grep man page).
This option searches for the expression as a whole word.
Therefore the command will be:
grep -w bauvol file.txt
And yes, ^ and $ are for start and end of line.
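A quick way to convince yourself of the difference, using invented sample lines and command-line grep:

```shell
# Hypothetical sample: the word alone, inside a sentence, and as a prefix.
sample='bauvol
das bauvol ist gross
bauvolumen'

# ^...$ only matches lines that consist of nothing but the word.
anchored=$(printf '%s\n' "$sample" | grep -c '^bauvol$')

# -w matches the word wherever it stands alone on a line.
whole_word=$(printf '%s\n' "$sample" | grep -cw 'bauvol')

printf 'anchored: %s, whole word: %s\n' "$anchored" "$whole_word"
```

The anchored pattern finds one line, while -w finds two; neither matches bauvolumen.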

grep from beginning of found word to end of word

I am trying to grep the output of a command that outputs unknown text and a directory per line. Below is an example of what I mean:
.MHuj.5.. /var/log/messages
The text and directory may be different from time to time or system to system. All I want to do though is be able to grep the directory out and send it to a variable.
I have looked around but cannot figure out how to grep to the end of a word. I know I can start the search phrase looking for a "/", but I don't know how to tell grep to stop at the end of the word, or whether it will consider the next "/" a new word. The directories listed could change, so I can't assume the same number of directories will be listed each time. In some cases, there will be multiple lines listed, and each will have a directory in its output. Thanks for any help you can provide!
If your directory paths do not contain spaces, you can do:
$ echo '.MHuj.5.. /var/log/messages' | awk '{print $NF}'
/var/log/messages
It's not clear from a single example whether we can generalize that e.g. the first occurrence of a slash marks the beginning of the data you want to extract. If that holds, try
grep -o '/.*' file
To fetch everything after the last space, try
grep -o '[^ ]*$' file
For more advanced pattern matching and extraction, look at sed, Awk, Perl, or Python.
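The two patterns only disagree when the leading text itself contains a slash. A small sketch of that case, which also stores the result in a variable as the question asks (the second sample line is hypothetical):

```shell
simple='.MHuj.5.. /var/log/messages'
tricky='a/b.5.. /var/log/messages'   # hypothetical: slash in the leading text

# From the first slash to the end of the line.
from_slash=$(printf '%s\n' "$tricky" | grep -o '/.*')

# Everything after the last space.
last_field=$(printf '%s\n' "$tricky" | grep -o '[^ ]*$')

# On the original example both patterns give the same path.
dir=$(printf '%s\n' "$simple" | grep -o '[^ ]*$')

printf '%s\n' "$from_slash" "$last_field" "$dir"
```

With the tricky line, the first pattern picks up part of the leading text, while the second still returns only the path.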
Your line can be described as:
^\S+\s+(\S+)$
That's assuming whitespace is your delimiter between the random text and the directory. It simply separates the whitespace from the non-whitespace and captures the second part.
Or you might want to look into the word boundary character class: \b.
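grep cannot print a capture group by itself, but with GNU grep's -P mode the same idea can be sketched using \K, which discards everything matched so far from the reported match. This assumes GNU grep with PCRE support.

```shell
line='.MHuj.5.. /var/log/messages'

# ^\S+\s+ consumes the leading text and the separator,
# \K drops it from the match, and \S+$ is what -o prints.
dir=$(printf '%s\n' "$line" | grep -oP '^\S+\s+\K\S+$')
printf '%s\n' "$dir"
```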
I know you said to use grep, but I can't help mentioning that this is trivially done using awk:
awk '{ print $NF }' input.txt
This assumes that whitespace is the delimiter and that the path does not contain any whitespace.
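Since the question mentions that the command may print several lines, here is a sketch of collecting every path and looping over them. The function some_command is a stand-in for the real command, and its output lines are invented.

```shell
# Stand-in for the real command; the output lines are hypothetical.
some_command() {
  printf '%s\n' '.MHuj.5.. /var/log/messages' 'S.5....T. /etc/hosts'
}

# One path per line, taken from the last whitespace-separated field.
paths=$(some_command | awk '{ print $NF }')

count=0
while IFS= read -r dir; do
  count=$((count + 1))
  printf 'path %d: %s\n' "$count" "$dir"
done <<EOF
$paths
EOF
```

Feeding the loop from a here-document rather than a pipe keeps the count variable in the current shell, so it is still set after the loop ends.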

Opposite of "only-matching" in grep?

Is there any way to do the opposite of showing only the matching part of strings in grep (the -o flag), that is, show everything except the part that matches the regex?
That is, the -v flag is not the answer, since that would not show files containing the match at all, but I want to show these lines, but not the part of the line that matches.
EDIT: I wanted to use grep over sed, since it can do "only-matching" matches on multi-line, with:
cat file.xml|grep -Pzo "<starttag>.*?(\n.*?)+.*?</starttag>"
This is a rather unusual requirement; I don't think grep can alter the lines like that. You can achieve this with sed, though (note the double quotes, so that the shell expands $PATTERN):
sed -n "s/$PATTERN//gp" file
EDIT in response to OP's edit:
You can do multiline matching with sed, too, if the file is small enough to load entirely into memory. Note that sed has no lazy quantifiers, so with GNU sed the .*? below still matches greedily:
sed -rn ':r;$!{N;br};s/<starttag>.*?(\n.*?)+.*?<\/starttag>//gp' file.xml
You can do that with a little help from sed:
grep "pattern" input_file | sed 's/pattern//g'
I don't think there is a way in grep.
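As a small illustration of the sed approach on single lines, here is a sketch with an invented pattern and sample input:

```shell
# Hypothetical pattern: one or more digits (basic regex syntax).
pattern='[0-9][0-9]*'

# -n suppresses default output; the p flag prints only lines where a
# substitution happened, so non-matching lines are dropped as with grep.
rest=$(printf '%s\n' 'abc123def' 'no digits' 'x9y8z' | sed -n "s/$pattern//gp")
printf '%s\n' "$rest"
```

If the pattern itself contains a slash, it has to be escaped or a different delimiter chosen for the s command.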
If you use ack, you could output Perl's special variables $` and $' to show everything before and after the match, respectively:
ack string --output="\$\`\$'"
Similarly, if you wanted to output what did match along with other text, you could use $&, which contains the matched string:
ack string --output="Matched: $&"

How to grep to find all instances of a Java method call using a reference?

I am trying the following query, but without success:
grep -nr "[[:alnum:]]+\.[[:alnum:]]+\(\)" .
So, according to my logic, a method call would be one or more alphanumeric characters
[[:alnum:]]+
followed by a dot
\.
followed by one or more alphanumeric characters
[[:alnum:]]+
followed by parentheses (for calls without arguments only)
\(\)
But this query isn't working. How to write such a query?
grep provides several types of regex syntax.
Your pattern is written in the extended syntax and works with -E.
extended-regexp has an easier/better syntax, and perl-regexp is, well, quite powerful.
-E, --extended-regexp
-F, --fixed-strings
-G, --basic-regexp (the default)
-P, --perl-regexp
grep -nrE "[[:alnum:]]+\.[[:alnum:]]+\(\)" .
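Without -E, grep uses the basic syntax, where you need "\+" instead of "+"; otherwise "+" matches the literal character "+". Likewise, in the basic syntax the parentheses must be left unescaped to match literally.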
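The two spellings can be compared side by side. This is a sketch with invented Java-like lines; the \+ form relies on a GNU extension to basic regular expressions.

```shell
# Hypothetical source lines: one no-argument call, one call with an
# argument, and one expression containing a literal plus sign.
src='list.clear();
foo.bar(1);
int x = a+b;'

# Extended syntax: + is a quantifier, \( and \) are literal parentheses.
ere_hits=$(printf '%s\n' "$src" | grep -cE '[[:alnum:]]+\.[[:alnum:]]+\(\)')

# Basic syntax: \+ is the quantifier, ( and ) are literal as written.
bre_hits=$(printf '%s\n' "$src" | grep -c '[[:alnum:]]\+\.[[:alnum:]]\+()')

printf 'ERE: %s, BRE: %s\n' "$ere_hits" "$bre_hits"
```

Both forms should match only the line with the no-argument call.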
